Lesson 14 official topic

Absolutely! Typically a PyTorch tensor has the channel axis as the first index → channels first. This portion of the function checks whether the tensor it receives still has that orientation of its axes. If it does, it permutes the tensor so that the channel axis becomes the last index → channels last. This is the axis orientation that matplotlib expects in order to display an image.

In our notebook example, the img that gets passed in is a PyTorch tensor of size [1, 28, 28], so im.shape[0] < 5 checks that we still have our channels in the first dimension. If that is the case, the function permutes the tensor as needed for matplotlib.

Conversely, if you had passed in a tensor that was already permuted into the desired final shape ([28, 28, 1]), the if statement would return False, because the first axis has a size of 28 (not less than 5). No permutation is necessary, so the code continues along without changing the image tensor.
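For illustration, here is a minimal sketch of just that permute logic (to_channels_last is a hypothetical name; the function in the notebook does more than this):

    import torch

    def to_channels_last(im):
        # If a 3D tensor's first axis is small (< 5), assume it is the
        # channel axis (1, 3, or 4 channels) and move it to the end,
        # which is the layout matplotlib's imshow expects.
        if im.ndim == 3 and im.shape[0] < 5: im = im.permute(1, 2, 0)
        return im

    img = torch.rand(1, 28, 28)          # channels first, as in the notebook
    print(to_channels_last(img).shape)   # torch.Size([28, 28, 1])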

Hope this helps!

1 Like

Thanks @ali_baba! I did understand that the code was trying to get the tensor into a ‘channels last’ format; what I was looking to understand was the use of the 5 to determine whether an input image tensor is ‘channels first’. Now that I think of it, I suppose this must be because images are expected to have at most 4 channels.

2 Likes

I believe that is right; sorry for not thinking about and speaking to that point. A 3-channel image is in RGB format and a 4-channel one is CMYK. Here’s a nice visual comparison of the two: python - What is the 4th channel in an image? - Stack Overflow

2 Likes

The 4th could also be an alpha channel, in RGBA format (RGB with transparency).

2 Likes

train_dl = DataLoader(train_ds, batchs=train_samp, n_workers=2)
it = iter(train_dl)
xb,yb = next(it)
xb.shape,yb.shape

The multiprocessing DataLoader code seems to take forever to run on my local machine, whereas it works fine on Colab.

Any pointers as to why this might be happening?

In the minibatch_training notebook, the results after the random sampling part are not good.


Switching shuffle in training to False makes it learn better.

I’m not sure why this happens. I think it is a bug, because the results are still bad no matter how I change the learning rate and number of epochs.

1 Like

Silly comment, but I don’t know where else to put it. I think that the title of the page Practical Deep Learning for Coders - 14: Matrix multiplication should be “Backpropagation” rather than “Matrix multiplication”.

Oops! Thanks for letting us know. Will fix now.

1 Like

I also got similar results when testing this out. Do you know why this might be the case? It’s a bit strange at first sight.

I was confused by the DataLoader implementation in the random sampling section, and thought I’d share my confusion

From the notes

class DataLoader():
    def __init__(self, ds, batchs, collate_fn=collate): fc.store_attr()
    def __iter__(self): yield from (self.collate_fn(self.ds[i] for i in b) for b in self.batchs)

The implementation of __iter__ was confusing to me

    def __iter__(self): yield from (self.collate_fn(self.ds[i] for i in b) for b in self.batchs)

So I replaced it with an imperative version

    # REFACTOR: Imperative style
    def __iter__(self):
        for b in self.batchs: # b = [0, 1, ...]
            ds_batch = [self.ds[i] for i in b] # ds_batch = [(xb0, yb0), (xb1, yb1), ...]
            collated_ds_batch = self.collate_fn(ds_batch) # [[xb0, xb1, ...], [yb0, yb1, ...]]
            yield collated_ds_batch

Furthermore, Jeremy also explained that our datasets can accept a list of indices, which means the default collate_fn can be replaced

    # REFACTOR: direct indices access
    def __iter__(self):
         for b in self.batchs: # b = [0, 1, ...]
            ds_batch = self.ds[b] # ds_batch = [[xb0, xb1, ...], [yb0, yb1, ...]]
            yield ds_batch

Back to a one-liner, if you prefer that

    # REFACTOR: one liner with direct indices access
    def __iter__(self): yield from (self.ds[b] for b in self.batchs)
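To convince myself the direct-indices version works, here is a toy check (the Dataset class below mirrors the notebook’s, which fancy-indexes its underlying tensors when given a list of indices):

    import torch

    class Dataset():
        # Indexing with a list of ints fancy-indexes the tensors,
        # so ds[[0, 2, 4]] already returns a collated (stacked) batch.
        def __init__(self, x, y): self.x, self.y = x, y
        def __len__(self): return len(self.x)
        def __getitem__(self, i): return self.x[i], self.y[i]

    ds = Dataset(torch.arange(12.).reshape(6, 2), torch.arange(6))
    batchs = [[0, 2, 4], [1, 3, 5]]          # precomputed index batches

    # equivalent to the one-liner __iter__ above
    for xb, yb in (ds[b] for b in batchs):
        print(xb.shape, yb.shape)            # torch.Size([3, 2]) torch.Size([3])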

@dhoa @Najdorf I think Jeremy went through it in the stream:

def fit():
    for epoch in range(epochs):
        for xb,yb in train_dl:
            pred = model(xb) # <-- `pred` should be `preds`
            loss = loss_func(pred, yb) # <-- `pred` should be `preds`
            loss.backward()
            opt.step()
            opt.zero_grad()
        report(loss, preds, yb)
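For reference, here is the loop with that rename applied, so that the preds passed to report is actually defined:

    def fit():
        for epoch in range(epochs):
            for xb,yb in train_dl:
                preds = model(xb)
                loss = loss_func(preds, yb)
                loss.backward()
                opt.step()
                opt.zero_grad()
            report(loss, preds, yb)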

This is not reflected in the notebook though. I would be interested in making a PR to fix it, but I’m not sure if anyone can make PRs.

Yup anyone can make PRs - would be much appreciated! (And at-mention me please if/when you do)

1 Like

Thanks @jeremy! I just opened a PR! :blush:

I got this error when trying nbdev.nbdev_export():
InterpolationMissingOptionError           Traceback (most recent call last)
Cell In[60], line 1
----> 1 import nbdev; nbdev.nbdev_export()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\fastcore\script.py:110, in call_parse.<locals>._f(*args, **kwargs)
    107 @wraps(func)
    108 def _f(*args, **kwargs):
    109     mod = inspect.getmodule(inspect.currentframe().f_back)
--> 110     if not mod: return func(*args, **kwargs)
    111     if not SCRIPT_INFO.func and mod.__name__=="__main__": SCRIPT_INFO.func = func.__name__
    112     if len(sys.argv)>1 and sys.argv[1]=='': sys.argv.pop(1)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\nbdev\doclinks.py:140, in nbdev_export(path, **kwargs)
    138 for f in files: nb_export(f)
    139 add_init(get_config().lib_path)
--> 140 _build_modidx()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\nbdev\doclinks.py:98, in _build_modidx(dest, nbs_path, skip_exists)
     96 if idxfile.exists(): res = exec_local(idxfile.read_text(), 'd')
     97 else: res = dict(syms={}, settings={})
---> 98 res['settings'] = {k:v for k,v in get_config().d.items()
     99                    if k in ('doc_host','doc_baseurl','lib_path','git_url','branch')}
    100 code_root = dest.parent.resolve()
    101 for file in globtastic(dest, file_glob="*.py", skip_file_re='^_', skip_folder_re='.ipynb_checkpoints'):

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\nbdev\doclinks.py:98, in <dictcomp>(.0)
     96 if idxfile.exists(): res = exec_local(idxfile.read_text(), 'd')
     97 else: res = dict(syms={}, settings={})
---> 98 res['settings'] = {k:v for k,v in get_config().d.items()
     99                    if k in ('doc_host','doc_baseurl','lib_path','git_url','branch')}
    100 code_root = dest.parent.resolve()
    101 for file in globtastic(dest, file_glob="*.py", skip_file_re='^_', skip_folder_re='.ipynb_checkpoints'):

File <frozen _collections_abc>:861, in __iter__(self)

File ~\AppData\Local\Programs\Python\Python311\Lib\configparser.py:1274, in SectionProxy.__getitem__(self, key)
   1272 if not self._parser.has_option(self._name, key):
   1273     raise KeyError(key)
-> 1274 return self._parser.get(self._name, key)

File ~\AppData\Local\Programs\Python\Python311\Lib\configparser.py:815, in RawConfigParser.get(self, section, option, raw, vars, fallback)
    813     return value
    814 else:
--> 815     return self._interpolation.before_get(self, section, option, value,
    816                                           d)

File ~\AppData\Local\Programs\Python\Python311\Lib\configparser.py:396, in BasicInterpolation.before_get(self, parser, section, option, value, defaults)
    394 def before_get(self, parser, section, option, value, defaults):
    395     L = []
--> 396     self._interpolate_some(parser, option, L, value, section, defaults, 1)
    397     return ''.join(L)

File ~\AppData\Local\Programs\Python\Python311\Lib\configparser.py:435, in BasicInterpolation._interpolate_some(self, parser, option, accum, rest, section, map, depth)
    433     v = map[var]
    434 except KeyError:
--> 435     raise InterpolationMissingOptionError(
    436         option, section, rawval, var) from None
    437 if "%" in v:
    438     self._interpolate_some(parser, option, accum, v,
    439                            section, map, depth + 1)

InterpolationMissingOptionError: Bad value substitution: option 'lib_name' in section 'DEFAULT' contains an interpolation key 'repo' which is not a valid option name. Raw value: '%(repo)s'
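The last line means configparser could not substitute the ‘repo’ key into lib_name while reading the project’s settings. That behaviour is easy to reproduce in isolation with just the standard library:

    import configparser

    cfg = configparser.ConfigParser()  # uses BasicInterpolation by default
    cfg.read_string("[DEFAULT]\nlib_name = %(repo)s\n")
    try:
        cfg.get("DEFAULT", "lib_name")  # expanding %(repo)s fails: no 'repo' key
    except configparser.InterpolationMissingOptionError as e:
        print(e)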

I’ve been stuck for 2 days :frowning: Can anyone help me? Thanks!

1 Like

I found this post that describes my issue: Nbdev discussion - #18 by moon
but I still don’t know how to fix it :frowning:

Oh, running nbdev_export in the terminal works fine, but nbdev.nbdev_export() in a Jupyter notebook doesn’t.

1 Like

When I run the nb 05_datasets.ipynb, there is the following line in an early cell:

from miniai.training import *

That line causes the error:
ModuleNotFoundError: No module named 'miniai'

I tried pip install miniai, but I don’t think that miniai exists as a module that can be pip installed. Is there a simple solution to this problem that I am missing? Thanks in advance for any pointers.

The solution is to run the nbs from wherever the setup.py file is located.

Then also do not forget to pip install as below:
!pip install -e . (there is a dot there, for the current dir).

If you ensure those 2 things then you are good to import from miniai.
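In other words, from a notebook cell whose working directory is the repo root (the directory containing setup.py):

    # run from the repo root, where setup.py lives
    !pip install -e .                  # editable install; the dot means the current dir
    from miniai.training import *      # should now import cleanly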

2 Likes

Hi there. I’m trying to load the Fashion-MNIST dataset and found that the test split is not loading properly. The train split loaded fine.
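The loading code is the standard Hugging Face datasets call (a sketch; the variable name is mine):

    from datasets import load_dataset

    # downloads the data and generates the train/test splits
    dsd = load_dataset("fashion_mnist")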

Generating test split:   0%|          | 0/10000 [00:00<?, ? examples/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/.pyenv/versions/3.9.13/envs/fastai/lib/python3.9/site-packages/datasets/builder.py:1676, in GeneratorBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)
   1675 _time = time.time()
-> 1676 for key, record in generator:
   1677     if max_shard_size is not None and writer._num_bytes > max_shard_size:

File ~/.cache/huggingface/modules/datasets_modules/datasets/fashion_mnist/0a671f063342996f19779d38c0ab4abef9c64f757b35af8134b331c294d7ba48/fashion_mnist.py:135, in FashionMnist._generate_examples(self, filepath, split)
    134     _ = f.read(8)
--> 135     images = np.frombuffer(f.read(), dtype=np.uint8).reshape(size, 28, 28)
    137 # Labels

ValueError: cannot reshape array of size 436 into shape (1835802742,28,28)

I tested it using their mnist dataset and it seems to load fine:

Generating train split: 100%|██████████| 60000/60000 [00:05<00:00, 11644.26 examples/s]
Generating test split: 100%|██████████| 10000/10000 [00:00<00:00, 10814.39 examples/s]

I’m having trouble running the notebook from this lesson, which I got from the GitHub repo (07_convolutions.ipynb). JupyterLab is giving me this:

(screenshot: JupyterLab error dialog reporting that the kernel has died)

(I’ve encountered this Jupyter error before, in part 1 of the course. In that case the fix was just making sure I was using my GPU instead of the CPU, and making sure that I had sufficient VRAM to store the model/parameters.)

I’m at the point where we have defined a CNN using torch and we want to train it for a few epochs. Here is the line of code where everything goes wrong:

loss,acc = fit(5, simple_cnn.to(def_device), F.cross_entropy, opt, train_dl, valid_dl)

After I execute it, Windows shows my CUDA utilisation spiking to around 100% for a moment (with no discernible increase in memory used or CPU), and then the kernel dies and I get the error that I showed at the top of my post.

This fit function is a training loop that we have defined ourselves in the miniai library. (I installed the exact code from Jeremy’s GitHub repo using pip install -e .) Inside that training loop we have this loop:

        for xb,yb in train_dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()

And the problem is with:

model(xb)

This call to the torch model is what causes this problem.
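A minimal way to isolate that call outside of fit, using the same names from the notebook:

    # grab one batch and run just the forward pass on the GPU
    xb, yb = next(iter(train_dl))
    out = simple_cnn.to(def_device)(xb.to(def_device))
    print(out.shape)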

I’m running everything through WSL on Windows. My CUDA device is an RTX 3060. I had been running a bunch of different models just fine until I hit this notebook. Here is the output from pip freeze:

aiohttp==3.9.1
aiosignal==1.3.1
anyio==4.1.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.1.0
Babel==2.13.1
beautifulsoup4==4.12.2
bleach==6.1.0
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
cmake==3.27.9
comm==0.2.0
contourpy==1.2.0
cycler==0.12.1
datasets==2.15.0
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
diffusers==0.24.0
dill==0.3.7
einops==0.7.0
exceptiongroup==1.2.0
executing==2.0.1
fastcore==1.5.29
fastjsonschema==2.19.0
fastprogress==1.0.3
filelock==3.13.1
fonttools==4.46.0
fqdn==1.5.1
frozenlist==1.4.0
fsspec==2023.10.0
huggingface-hub==0.19.4
idna==3.6
importlib-metadata==7.0.0
ipykernel==6.27.1
ipython==8.18.1
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
jupyter-events==0.9.0
jupyter-lsp==2.2.1
jupyter_client==8.6.0
jupyter_core==5.5.0
jupyter_server==2.11.2
jupyter_server_terminals==0.4.4
jupyterlab==4.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
kiwisolver==1.4.5
lit==17.0.6
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
-e git+https://github.com/fastai/course22p2.git@df9323235bc395b5c2f58a3d08b83761947b9b93#egg=miniai
mistune==3.0.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
nbclient==0.9.0
nbconvert==7.12.0
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
notebook_shim==0.2.3
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
overrides==7.4.0
packaging==23.2
pandas==2.1.3
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.9.0
Pillow==10.1.0
platformdirs==4.1.0
prometheus-client==0.19.0
prompt-toolkit==3.0.41
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==14.0.1
pyarrow-hotfix==0.6
pycparser==2.21
Pygments==2.17.2
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.31.1
regex==2023.10.3
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.13.2
safetensors==0.4.1
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
stack-data==0.6.3
sympy==1.12
terminado==0.18.0
timm==0.9.12
tinycss2==1.2.1
tomli==2.0.1
torch==2.0.0
torcheval==0.0.7
torchvision==0.15.2
tornado==6.4
tqdm==4.66.1
traitlets==5.14.0
triton==2.0.0
types-python-dateutil==2.8.19.14
typing_extensions==4.8.0
tzdata==2023.3
uri-template==1.3.0
urllib3==2.1.0
wcwidth==0.2.12
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
xxhash==3.4.1
yarl==1.9.3
zipp==3.17.0

If you don’t know exactly how to help, but you have some tips/experience with hunting down similar issues, please contribute! Thanks!