Lesson 10 Discussion & Wiki (2019)

Do you mean by skipping some stats re-calculations? Or are you saying that replacing:

        self.sums.detach_()
        self.sqrs.detach_()

with:

        x = x.detach()

and not skipping any calculations has a detrimental impact on the outcome?

However it doesn’t actually skip any computation at the moment, since you’re checking self.batch%2, which is always true, since bs is even. You should instead create and check an iteration counter - if you do that, you’ll find you still hit the dreaded “Trying to backward through the graph a second time, but the buffers have already been freed” error!

Ah, yes, good catch - thank you, Jeremy! Let’s add a self.bc counter:

        self.bc = 0 # batch counter (in __init__)

    def update_stats(self, x):
        self.bc += 1

So, we need to get a good understanding of what should be in the graph and what shouldn’t.

With:

        x = x.detach()

none of the temp results are part of the graph. Only factor and offset are, since their calculation involves the mults and adds parameters. You can check:

    def forward(self, x):
        if self.training: self.update_stats(x)
        if self.bc < 2:
            l = "factor offset mults adds sums sqrs count means varns s ss c".split()
            print("not leaf ", list(filter(lambda x: not getattr(self,x).is_leaf, l)))
            print("want grad", list(filter(lambda x: getattr(self,x).requires_grad, l)))
        return x*self.factor + self.offset

which prints:

not leaf  ['factor', 'offset']
want grad ['factor', 'offset', 'mults', 'adds']

I guess the refactoring to reduce broadcasting added this complication of having factor and offset to be part of the graph. Perhaps this is wrong?

If you detach sums and sqrs, instead of x you end up with:

not leaf  ['factor', 'offset', 'sums', 'sqrs', 'means', 'varns', 's', 'ss']
want grad ['factor', 'offset', 'mults', 'adds', 'sums', 'sqrs', 'means', 'varns', 's', 'ss']

So now all those running temps will no longer do the right thing, since they will get changed by backprop, while we want them to stay fixed.

Moreover, you are detaching them in the wrong place. You detach them at the beginning of update_stats, but then you perform calculations on them that involve the undetached x, and they end up on the graph again! So you want to detach them after all calculations are done, if you don’t detach x. But as I have shown above this is not right either, since a whole bunch of other temps are now on the graph and will be “adjusted” by the net.
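
A quick PyTorch snippet to illustrate this ordering point (toy tensors, names made up - sums here just stands in for a running-stats buffer, not the notebook’s code):

```python
import torch

x = torch.randn(4, requires_grad=True)

# Detach first, then recompute: the new value is back on the graph,
# because it was computed from the still-attached x.
sums = torch.zeros(1)
sums.detach_()              # no effect: sums carries no history yet
sums = sums + x.sum()
print(sums.requires_grad)   # True - on the graph again

# Detach x instead: everything computed from it stays off the graph.
sums2 = torch.zeros(1) + x.detach().sum()
print(sums2.requires_grad)  # False
```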

Now going back to the very original implementation as it was presented in the class (with dbias), we get:

not leaf  ['sums', 'sqrs']
want grad ['mults', 'adds', 'sums', 'sqrs']

So it wasn’t detaching them either!

Only after you move them to the end of update_stats:

        self.sums.detach_()
        self.sqrs.detach_()

you get:

not leaf  []
want grad ['mults', 'adds']

I’m still trying to wrap my head around this detach thing, so please bear with me if I’m saying something incorrect. If what I described above is correct, then you were getting good results not because of the better BN (or at least not just because of it), but because your temps were actually backpropagated: the stats weren’t calculated on fixed running averages, but on running averages that were themselves learnable variables - i.e. the network was messing (in a good way) with numbers that we intended to be fixed. Does this make sense?

And this, in a roundabout way, answers why you get the error: you tried to skip calculations on variables that are on the graph, and that’s why the error appears.

If you detach all of those other temp vars, you won’t get the error. i.e. finish update_stats with:

        l = "sums sqrs count means varns s ss c".split()
        for a in l: getattr(self,a).detach_()

The originally proposed:

x = x.detach()

at the very beginning of update_stats does the same thing, but more efficiently, since the code doesn’t need to swap the temp vars back and forth between requiring grad and not.

To conclude: decide which variables are to be fixed and controlled only by you, and which are learnable, and then use detach accordingly. Perhaps a significant part of the magic of RunningBatchNorm is a side-effect of a coding mistake :slight_smile:
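
As a sketch of that conclusion (my own toy module, not the notebook’s RunningBatchNorm): learnables go in nn.Parameter, fixed running stats in a buffer, and the input is detached before the stats are touched, so nothing reachable from the loss is reused across batches:

```python
import torch
from torch import nn

class RunningStatsSketch(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.mults = nn.Parameter(torch.ones(nf))      # learnable
        self.adds  = nn.Parameter(torch.zeros(nf))     # learnable
        self.register_buffer('sums', torch.zeros(nf))  # fixed, ours to control

    def update_stats(self, x):
        x = x.detach()           # keeps every temp below off the graph
        self.sums += x.sum(0)    # in-place update with no autograd history

    def forward(self, x):
        if self.training: self.update_stats(x)
        return x * self.mults + self.adds

# Two batches, two backward passes: no "backward a second time" error.
m = RunningStatsSketch(3)
for _ in range(2):
    m(torch.randn(4, 3)).sum().backward()
print(m.sums.requires_grad)  # False - the buffer stays out of the graph
```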

Specifically to the error:

“Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.”

What I understood from Thomas is that pytorch compares the graph from the previous batch with the graph generated by the current batch - if the new graph lacks a node that was in the old graph that’s when you get this error.

That’s why you must repeat calculations that place all the variables that were in the graph on the very first batch, and they can’t be skipped.
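
For what it’s worth, the error is easy to reproduce in isolation (toy code, nothing to do with the notebook): carry a tensor that is still attached to the previous batch’s graph into the next batch’s loss, and the second backward has to walk through the first, already-freed graph:

```python
import torch

w = torch.ones(1, requires_grad=True)
state = torch.zeros(1)  # re-used, and left attached, across "batches"

err = None
for _ in range(2):
    state = state + w * w       # batch 1's graph hangs off batch 0's
    try:
        state.sum().backward()  # frees the traversed graph's buffers
    except RuntimeError as e:   # second pass re-enters the freed graph
        err = e
print('second time' in str(err))  # True - the familiar error
```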

I don’t know enough pytorch yet to explain why dynamically skipping a node is considered to be an error. So this is only a circumstantial explanation.

@t-v, please kindly correct me if my explanation is incorrect or incomplete.

5 posts were merged into an existing topic: Running batch norm tweaks

To get around my conflict issues in conda with the pytorch nightly build:

In the past we used local directories to import from, and there were no conda installs. So the basic use of datasets is to load data from the net. If fastai is cloned into your local environment - or even just the datasets part - and imported from there, you can work without a conda fastai install.

I took 3 items from the github clone:

datasets.py
core.py
the imports directory with all its contents

and placed them in a fastai directory at the same level as exp.

After following your step 1 above for the conda no-fastai environment, I installed these additions:

fastprogress
jupyter
pyyaml and yaml
requests

This gave me an environment with a minimal fastai, in which 08_data_block runs fine, at least up to the image of a man with a TENCH.

I also installed for my own use

pandas, pandas-summary, sklearn-pandas
scipy

I hope my memory serves me right here; in case it doesn’t, here is a package list for such an environment:

asn1crypto=0.24.0=py37_0
attrs=19.1.0=py37_1
backcall=0.1.0=py37_0
blas=1.0=mkl
bleach=3.1.0=py37_0
ca-certificates=2019.1.23=0
certifi=2019.3.9=py37_0
cffi=1.12.2=py37h2e261b9_1
chardet=3.0.4=py37_1
cryptography=2.6.1=py37h1ba5d50_0
cudatoolkit=10.0.130=0
cycler=0.10.0=py37_0
dbus=1.13.6=h746ee38_0
decorator=4.4.0=py37_1
defusedxml=0.5.0=py37_1
entrypoints=0.3=py37_0
expat=2.2.6=he6710b0_0
fastprogress=0.1.21=py_0
fontconfig=2.13.0=h9420a91_0
freetype=2.9.1=h8a8886c_1
glib=2.56.2=hd408876_0
gmp=6.1.2=h6c8ec71_1
gst-plugins-base=1.14.0=hbbd80ab_1
gstreamer=1.14.0=hb453b48_1
icu=58.2=h9c2bf20_1
idna=2.8=py37_0
intel-openmp=2019.3=199
ipykernel=5.1.0=py37h39e3cac_0
ipython=7.4.0=py37h39e3cac_0
ipython_genutils=0.2.0=py37_0
ipywidgets=7.4.2=py37_0
jedi=0.13.3=py37_0
jinja2=2.10.1=py37_0
jpeg=9b=h024ee3a_2
jsonschema=3.0.1=py37_0
jupyter=1.0.0=py37_7
jupyter_client=5.2.4=py37_0
jupyter_console=6.0.0=py37_0
jupyter_core=4.4.0=py37_0
kiwisolver=1.0.1=py37hf484d3e_0
libedit=3.1.20181209=hc058e9b_0
libffi=3.2.1=hd88cf55_4
libgcc-ng=8.2.0=hdf63c60_1
libgfortran-ng=7.3.0=hdf63c60_0
libpng=1.6.36=hbc83047_0
libsodium=1.0.16=h1bed415_0
libstdcxx-ng=8.2.0=hdf63c60_1
libtiff=4.0.10=h2733197_2
libuuid=1.0.3=h1bed415_2
libxcb=1.13=h1bed415_1
libxml2=2.9.9=he19cac6_0
markupsafe=1.1.1=py37h7b6447c_0
matplotlib=3.0.3=py37h5429711_0
mistune=0.8.4=py37h7b6447c_0
mkl=2019.3=199
mkl_fft=1.0.10=py37ha843d7b_0
mkl_random=1.0.2=py37hd81dba3_0
nbconvert=5.4.1=py37_3
nbformat=4.4.0=py37_0
ncurses=6.1=he6710b0_1
ninja=1.9.0=py37hfd86e86_0
notebook=5.7.8=py37_0
numpy=1.16.2=py37h7e9f1db_0
numpy-base=1.16.2=py37hde5b4d6_0
olefile=0.46=py37_0
openssl=1.1.1b=h7b6447c_1
pandas=0.24.2=py37he6710b0_0
pandas-summary=0.0.41=py_1
pandoc=2.2.3.2=0
pandocfilters=1.4.2=py37_1
parso=0.3.4=py37_0
pcre=8.43=he6710b0_0
pexpect=4.6.0=py37_0
pickleshare=0.7.5=py37_0
pillow=5.4.1=py37h34e0f95_0
pip=19.0.3=py37_0
prometheus_client=0.6.0=py37_0
prompt_toolkit=2.0.9=py37_0
ptyprocess=0.6.0=py37_0
pycparser=2.19=py37_0
pygments=2.3.1=py37_0
pyopenssl=19.0.0=py37_0
pyparsing=2.4.0=py_0
pyqt=5.9.2=py37h05f1152_2
pyrsistent=0.14.11=py37h7b6447c_0
pysocks=1.6.8=py37_0
python=3.7.3=h0371630_0
python-dateutil=2.8.0=py37_0
pytorch=1.0.1=py3.7_cuda10.0.130_cudnn7.4.2_2
pytorch-nightly=1.1.0.dev20190413=py3.7_cuda10.0.130_cudnn7.4.2_0
pytz=2018.9=py37_0
pyyaml=5.1=py37h7b6447c_0
pyzmq=18.0.0=py37he6710b0_0
qt=5.9.7=h5867ecd_1
qtconsole=4.4.3=py37_0
readline=7.0=h7b6447c_5
requests=2.21.0=py37_0
scikit-learn=0.20.3=py37hd81dba3_0
scipy=1.2.1=py37h7c811a0_0
send2trash=1.5.0=py37_0
setuptools=41.0.0=py37_0
sip=4.19.8=py37hf484d3e_0
six=1.12.0=py37_0
sklearn-pandas=1.8.0=pypi_0
sqlite=3.27.2=h7b6447c_0
terminado=0.8.1=py37_1
testpath=0.4.2=py37_0
tk=8.6.8=hbc83047_0
torchvision=0.2.2=py_3
tornado=6.0.2=py37h7b6447c_0
traitlets=4.3.2=py37_0
urllib3=1.24.1=py37_0
wcwidth=0.1.7=py37_0
webencodings=0.5.1=py37_1
wheel=0.33.1=py37_0
widgetsnbextension=3.4.2=py37_0
xz=5.2.4=h14c3975_4
yaml=0.1.7=had09818_2
zeromq=4.3.1=he6710b0_3
zlib=1.2.11=h7b6447c_3
zstd=1.3.7=h0b5b093_0

I’m not sure what problem you’re trying to solve, @RogerS49 - just install fastai in whatever way you like - conda, pip, local checkout and it just works with the part2 lessons.

Well, I got rid of my conflict issues in conda with the pytorch nightly build. This makes more sense to me, as what’s in those packages and dependencies is not really fastai, it seems, except around the data URLs. I managed to run the whole of the 08_data_block notebook; perhaps I will run into other dependency issues I am not aware of. Thanks for your reply.

You could always just download the source via github and place it under the lessons:

git clone https://github.com/fastai/fastai_docs
git clone https://github.com/fastai/fastai
cd fastai_docs/dev_course/dl2
ln -s ../../../fastai/fastai .

so now when you load a notebook from fastai_docs/dev_course/dl2 it will use these local fastai modules, since '' (the notebook dir) is always in sys.path. This way you don’t have any dependency conflicts to deal with, since you’re not using a package manager here.

This is more or less what you suggested above, just easier, since you don’t need to go fishing out specific files from the fastai modules.

I quickly plotted layer norm vs batch norm for the sunny vs foggy day images to double-check Jeremy’s thought on why layer norm doesn’t do well. The plots confirmed it.

Sunny road and foggy road before (top row) and after (bottom row) applying layer norm


Really hard to tell which one is sunny/foggy for the bottom 2 images.

For comparison, I did the same process for batch norm


Way better!
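
The effect is easy to check numerically too. In this toy sketch (random arrays standing in for the photos), the “sunny” image is just a brighter copy of the “foggy” one: per-image layer norm wipes out exactly that brightness difference, while normalizing over the whole batch (as batch norm does, per channel) preserves it:

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((8, 8))
sunny, foggy = base + 2.0, base + 0.5   # same content, different brightness
batch = np.stack([sunny, foggy])        # shape (2, 8, 8)

# Layer norm: each image normalized by its own mean/std
ln = (batch - batch.mean((1, 2), keepdims=True)) / batch.std((1, 2), keepdims=True)

# Batch-norm-style: one mean/std across the whole (single-channel) batch
bn = (batch - batch.mean()) / batch.std()

print(np.allclose(ln[0], ln[1]))    # True  - sunny/foggy now identical
print(bn[0].mean() > bn[1].mean())  # True  - sunny is still brighter
```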

Although I can’t offer a resource, I can offer empathy. I was fairly relieved when Jeremy noted that his utility function for loading images took a week to develop. I would’ve felt like throwing in the towel if he had said he wrote it while eating breakfast one morning.

I feel the same. :frowning:

the GoF book has shaped many software engineers (me included): https://en.wikipedia.org/wiki/Design_Patterns

Go for The Swift Programming Language, a nice resource.

You can read about design patterns but in real life nobody uses them.

What’s your PoV on these books, and also on newer ones like Clean Code or Refactoring?

Personally, while I totally get the idea of patterns and clean code, I find many books and articles on the subject verbose and sometimes a little dogmatic; I don’t necessarily agree with the details (typically I find they make simple things complex, honestly), and they’re always dry to read. Maybe it’s like writing a book about salsa dancing: style matters, but you just get it on the dancefloor (never with a book).

:slight_smile:

The Clean Code reference looks like good housekeeping rules.
Concerning Refactoring, I have not read the book. However, Martin Fowler is one of my heroes, and with a foreword by Erich Gamma (one of the authors of the GoF book) it doesn’t get better.

I think design patterns are important in the same way that we expect certain components to be standardised when building a house. It is just too much mental overhead (and often short-sighted) if everybody invents their own personal way of doing things. This is not to say that a design pattern is implemented in identical ways in every language, but the concept should transcend languages.

I know some people have this point of view, and that’s fine. Personally, however, I find the exact opposite - I’ve found that trying to shoehorn things into a set of predefined patterns limits my thinking and is harder for me to understand than avoiding that idea entirely.

Many design patterns (if not all, AFAIK) focus on the Object Oriented programming paradigm. We are dealing with a mix of Object Oriented, Functional and Dataflow paradigms. This makes OO patterns partially applicable, but not that useful within the bigger picture. We need a new methodology and new design patterns to emerge.

Fastai programming style gives us an interesting example and insights into what these patterns might be. Fastai offers examples of well thought-through use of decorators, closures, partials and compose. I wish Software Engineering methodology researchers paid more attention to it.

I like your point about the over-emphasis on OO patterns. I guess if I could find a book with coding patterns that looks beyond languages and specific paradigms, it would definitely be worth the read; for now, the fast.ai code is the best source I am aware of. I still suspect that programming, like speaking a language, is a skill where you can’t just learn grammar and some elegant ways to express yourself to become a master.

Good point concerning functional programming. The processor pattern in fastai is a good match for that.

If you find such a book, please let me know.

I see Jeremy here being complimentary about Fowler.

So maybe I’ll give his newly revised book on refactoring a closer look another time.
