Ok thank you. @wdhorton I am available if you need help.
Just finished:
- LanguageModelLoader (used behind the scenes by TextLMDataBunch) has now been replaced by LanguageModelPreLoader, which isn’t a DataLoader but an intermediate between the dataset and a pytorch DataLoader. It’s a Dataset and a Callback at the same time, and is responsible for reading a portion of the stream created by concatenating all the texts.
- This means we can now have pre-loaders that are Callbacks. The only events we can use are on_epoch_begin and on_epoch_end, since the multiprocessing in the pytorch DataLoader (with num_workers>=1) makes a copy of the underlying dataset that is only synchronized at the end of the iteration.
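To make the "Dataset plus Callback" idea concrete, here is a simplified sketch. It is not the fastai code: `ToyPreLoader` is a hypothetical class, it ignores multiprocessing, and it assumes the texts are plain lists of token ids. `on_epoch_begin` reshuffles the texts and rebuilds the concatenated stream; `__getitem__` reads one bptt-long portion of that stream.

```python
import random

class ToyPreLoader:
    """Hypothetical, simplified pre-loader: a map-style dataset over the stream made by
    concatenating all texts, with an on_epoch_begin hook that reshuffles the texts."""
    def __init__(self, texts, bs, bptt, shuffle=True, seed=None):
        self.texts, self.bs, self.bptt, self.shuffle = texts, bs, bptt, shuffle
        self.rng = random.Random(seed)
        self.on_epoch_begin()

    def on_epoch_begin(self):
        # Reshuffle the order of the texts and rebuild the concatenated stream.
        order = list(range(len(self.texts)))
        if self.shuffle: self.rng.shuffle(order)
        stream = [tok for i in order for tok in self.texts[i]]
        # Split the stream into bs contiguous rows of equal length (the remainder is dropped).
        self.row_len = len(stream) // self.bs
        self.rows = [stream[r * self.row_len:(r + 1) * self.row_len] for r in range(self.bs)]
        # Each step needs bptt + 1 tokens (inputs plus targets shifted by one).
        self.n_steps = (self.row_len - 1) // self.bptt

    def __len__(self): return self.bs * self.n_steps

    def __getitem__(self, k):
        # Item k lives in row k % bs at step k // bs, so a sequential sampler with batch
        # size bs yields batches whose rows continue exactly where the previous batch stopped.
        row, step = k % self.bs, k // self.bs
        start = step * self.bptt
        seq = self.rows[row][start:start + self.bptt + 1]
        return seq[:-1], seq[1:]
```

With `texts=[list(range(10)), list(range(10, 25)), list(range(25, 40))]`, `bs=2`, `bptt=3` and no shuffling, item 0 is `([0, 1, 2], [1, 2, 3])` and item 2 continues with `[3, 4, 5]`.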
This is really nice, and memory usage is down. Thanks!
Here is a small suggestion for def __getitem__(self, k:int): insert the line below just before the comment “#Returning the right portion”. It will allow users to provide the token ids in a dtype that matches the vocab size, e.g. np.uint16 for a vocab of size 64k.
if concat.dtype != np.int64: concat = concat.astype(np.int64)
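A hedged sketch of the idea behind that line: keep the stored stream in the smallest dtype that can hold the vocab (a 64k vocab fits in np.uint16, since 2**16 = 65536), and cast to int64 only when a portion is handed out. `compact_ids` and `window_for_model` are hypothetical helpers, not fastai functions.

```python
import numpy as np

def compact_ids(ids, vocab_size):
    """Store token ids in the smallest unsigned dtype that can represent vocab_size ids."""
    for dt in (np.uint8, np.uint16, np.uint32):
        if vocab_size <= np.iinfo(dt).max + 1:
            return np.asarray(ids, dtype=dt)
    return np.asarray(ids, dtype=np.int64)

def window_for_model(concat, start, end):
    """Slice a portion of the stream, then cast it as in the suggested line."""
    chunk = concat[start:end]
    if chunk.dtype != np.int64: chunk = chunk.astype(np.int64)
    return chunk
```

Storing uint16 costs 2 bytes per token instead of 8 for int64, so the stream takes a quarter of the memory; only the small slice returned for a batch pays the int64 cost.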
Will add.
Also note I removed the varying bptt because it doesn’t add anything now that we shuffle the texts at each batch (tested on WikiText-2).
Agreed, I could not measure any difference using p_bptt or my own uniform distribution.
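For reference, "varying bptt" means drawing a fresh sequence length for each batch instead of always using bptt. A minimal sketch in the AWD-LSTM style: usually centred on bptt, occasionally on bptt/2, with Gaussian jitter and a floor. The function name and the defaults (bptt=70, p_half=0.05, sigma=5, min_len=5) are my assumptions of typical values, not necessarily what fastai used.

```python
import random

def sample_seq_len(bptt=70, p_half=0.05, sigma=5, min_len=5, rng=random):
    """Sample a per-batch sequence length: centred on bptt with probability 1 - p_half,
    on bptt / 2 otherwise, jittered by N(0, sigma) and floored at min_len."""
    centre = bptt / 2 if rng.random() < p_half else bptt
    return max(min_len, int(rng.gauss(centre, sigma)))
```

With texts reshuffled every time, the batch boundaries already move around, which is consistent with the observation that this extra randomness no longer makes a measurable difference.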
I believe there is a ±1 offset issue between batches in the new version.
I had so many problems getting my own indexing of the jagged array to work that I created a test that generates jagged arrays with continuous numbers but a random layout. I feel confident when I can handle 10000 different layouts.
Here is a result showing the ±1 issue, from running the newest version of LanguageModelPreLoader in fastai dev: https://github.com/kasparlund/nlp/blob/master/test_languagemodelloader.ipynb
When I tried again, it failed because start became greater than end in __getitem__.
I can create an issue if you agree that there is one.
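The testing idea above (continuous numbers, random layout) can be sketched as follows. Because the tokens are simply 0..n-1, any correct read across array boundaries must return consecutive integers, so a ±1 offset shows up as a repeated or skipped number. `random_jagged`, `read_window` and `check_layouts` are hypothetical names for illustration, not the code in the linked notebook.

```python
import random

def random_jagged(n_tokens, max_len, rng):
    """Split 0..n_tokens-1 into consecutive arrays of random length 1..max_len."""
    arrays, start = [], 0
    while start < n_tokens:
        size = rng.randint(1, max_len)
        arrays.append(list(range(start, min(start + size, n_tokens))))
        start += size
    return arrays

def read_window(arrays, start, length):
    """Read `length` tokens beginning at global offset `start`, crossing array boundaries."""
    out, offset = [], 0
    for arr in arrays:
        if len(out) == length:
            break
        end = offset + len(arr)
        if end > start:  # this array overlaps the requested window
            lo = max(start - offset, 0)
            out.extend(arr[lo:lo + length - len(out)])
        offset = end
    return out

def check_layouts(n_layouts=1000, n_tokens=200, max_len=15, length=10, seed=0):
    """Read back-to-back windows over many random layouts; every window must be consecutive."""
    rng = random.Random(seed)
    for _ in range(n_layouts):
        arrays = random_jagged(n_tokens, max_len, rng)
        for start in range(0, n_tokens, length):
            window = read_window(arrays, start, length)
            assert window == list(range(start, min(start + length, n_tokens))), (start, arrays)
    return True
```

Raising `n_layouts` to 10000 gives the level of coverage described in the post.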
get_transforms() with default settings threw the error below for histology images
"RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead"
I’m using images from this Kaggle competition:
First I set all the arguments for get_transforms() to zero to get a baseline
tfms = get_transforms(do_flip=False,
flip_vert=False,
max_rotate=0.,
max_zoom=0.,
max_lighting=0.,
max_warp=0.,
p_affine=0.,
p_lighting=0.)
Everything I did below worked:
data = (ImageItemList.from_df(df=df, path=path, cols='fpaths')
.random_split_by_pct(valid_pct=0.2, seed=10)
.label_from_df(cols='class_label')
.transform(tfms, size=49)
.databunch(bs=128))
data.show_batch(rows=3, figsize=(7,7), hide_axis=False)
learn = create_cnn(data, models.resnet34, metrics=[error_rate, accuracy])
learn.fit_one_cycle(6)
Then I tried using the default options for get_transforms() and got the error:
tfms = get_transforms()
data = (ImageItemList.from_df(df=df, path=path, cols='fpaths')
.random_split_by_pct(valid_pct=0.2, seed=10)
.label_from_df(cols='class_label')
.transform(tfms, size=49)
.databunch(bs=128))
data.show_batch(rows=3, figsize=(7,7), hide_axis=False)
"RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead"
Finally, I narrowed the cause of the problem down to max_warp by manually entering all of the defaults and setting each one to zero, one at a time:
tfms = get_transforms(do_flip=True,
flip_vert=False,
max_rotate=10.,
max_zoom=1.1,
max_lighting=0.2,
max_warp=0.,
p_affine=0.75,
p_lighting=0.75)
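The "switch off one parameter at a time" search can be automated. In this sketch, `find_culprits` is a hypothetical helper and `run` stands for whatever reproduces the failure (e.g. building the DataBunch with `get_transforms(**kwargs)` and calling `show_batch`). `DEFAULTS` assumes the fastai v1 defaults, including max_warp=0.2, and `fake_run` is a toy stand-in that fails whenever max_warp > 0.

```python
def find_culprits(defaults, off_values, run):
    """Switch off one parameter at a time; return those whose removal alone makes `run` succeed."""
    culprits = []
    for name, off in off_values.items():
        kwargs = {**defaults, name: off}  # all defaults except one parameter
        try:
            run(kwargs)
        except Exception:
            continue  # still failing, so this parameter alone is not the cause
        culprits.append(name)
    return culprits

# Assumed fastai v1 defaults for get_transforms().
DEFAULTS = dict(do_flip=True, flip_vert=False, max_rotate=10., max_zoom=1.1,
                max_lighting=0.2, max_warp=0.2, p_affine=0.75, p_lighting=0.75)
# "Off" values, following the zeros used in the post.
OFF = dict(max_rotate=0., max_zoom=0., max_lighting=0., max_warp=0.)

def fake_run(kwargs):
    # Toy stand-in for the real pipeline: fails whenever warping is enabled.
    if kwargs['max_warp'] > 0:
        raise RuntimeError("B should have at least 2 dimensions, but has 1 dimensions instead")
```

`find_culprits(DEFAULTS, OFF, fake_run)` returns `['max_warp']`. With the real pipeline more than one switch can show up (for instance a probability that gates the failing transform), which is why the helper returns a list.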
Thought I’d share this in case anyone else ran into the same issue.
It is impressive that fastai knows that these images should not have max_warp applied! Is this a bug?
I wonder if this same error will be thrown when I want to look at 3D images of cells and organelles? … TBD
The full error text was:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-0a63e3fd5550> in <module>
----> 1 data.show_batch(rows=3, figsize=(7,7), hide_axis=False)
~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in show_batch(self, rows, ds_type, **kwargs)
151 def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, **kwargs)->None:
152 "Show a batch of data in `ds_type` on a few `rows`."
--> 153 x,y = self.one_batch(ds_type, True, True)
154 if self.train_ds.x._square_show: rows = rows ** 2
155 xs = [self.train_ds.x.reconstruct(grab_idx(x, i, self._batch_first)) for i in range(rows)]
~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm)
134 w = self.num_workers
135 self.num_workers = 0
--> 136 try: x,y = next(iter(dl))
137 finally: self.num_workers = w
138 if detach: x,y = to_detach(x),to_detach(y)
~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in __iter__(self)
68 def __iter__(self):
69 "Process and returns items from `DataLoader`."
---> 70 for b in self.dl:
71 y = b[1][0] if is_listy(b[1]) else b[1]
72 yield self.proc_batch(b)
~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
466 self.reorder_dict[idx] = batch
467 continue
--> 468 return self._process_next_batch(batch)
469
470 next = __next__ # Python 2 compatibility
~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
487 self._put_indices()
488 if isinstance(batch, _utils.ExceptionWrapper):
--> 489 raise batch.exc_type(batch.exc_msg)
490 return batch
491
RuntimeError: Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py", line 486, in __getitem__
x = x.apply_tfms(self.tfms, **self.tfmargs)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 113, in apply_tfms
else: x = tfm(x)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 498, in __call__
return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 445, in __call__
if args: return self.calc(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 450, in calc
if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 167, in coord
self.flow = func(self.flow, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 227, in symmetric_warp
return _perspective_warp(c, targ_pts, invert)
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 213, in _perspective_warp
return _apply_perspective(c, _find_coeffs(_orig_pts, targ_pts))
File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 194, in _find_coeffs
return torch.gesv(B,A)[0][:,0]
RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead
It’s a bug with the new version of PyTorch; it has been fixed in master, I believe.
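The traceback ends in `torch.gesv(B,A)` receiving a 1-D `B`, while PyTorch 1.0 demands at least 2 dimensions. The numpy sketch below performs the same kind of 8-coefficient perspective solve and shows the shape fix in isolation: passing the right-hand side as an explicit column. It is an illustration under that assumption, not necessarily what the fix in master looks like; `find_coeffs` is a hypothetical name. Note that `torch.gesv(B, A)` takes the right-hand side first, while `np.linalg.solve(A, B)` takes it second.

```python
import numpy as np

def find_coeffs(src_pts, dst_pts):
    """Solve for the 8 perspective coefficients a..h mapping each src (x, y) to dst (u, v):
    u = (a*x + b*y + c) / (g*x + h*y + 1),  v = (d*x + e*y + f) / (g*x + h*y + 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    A = np.array(rows, dtype=float)  # shape (8, 8)
    B = np.array(rhs, dtype=float)   # shape (8,): the 1-D right-hand side gesv complained about
    X = np.linalg.solve(A, B[:, None])  # B[:, None] is an explicit (8, 1) column
    return X[:, 0]                      # back to the 8 coefficients
```

Mapping the unit square onto itself gives the identity coefficients `[1, 0, 0, 0, 1, 0, 0, 0]`; shifting it by (2, 3) gives `[1, 0, 2, 0, 1, 3, 0, 0]`.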
split from Fastai v1 install issues thread
Trouble with tests/test_vision_data.py
The second problem is that when I run the tests on the latest pull ($ make test), I get an error related to pulling in some data using mnist = untar_data(URLs.COCO_TINY).
Looking at the ~/.fastai/data/ directory, it seems that sometimes the HTTP call fails to pull in the .tgz, which makes the untar fail. If I delete the empty .tgz, I (sometimes) succeed the second time around. I believe this is due to a socket.timeout.
My solution is to remove all the files in that directory (rm -r ~/.fastai/data/*) and try again. After a few tries, everything comes in and I can run all the tests. I don’t know why the timeout is an issue, and an intermittent one at that.
Show install information:
=== Software ===
python : 3.7.1
fastai : 1.0.40.dev0
fastprogress : 0.1.18
torch : 1.0.0
nvidia driver : 396.51
torch cuda : 9.0.176 / is available
torch cudnn : 7401 / is enabled
=== Hardware ===
nvidia gpus : 2
torch devices : 2
- gpu0 : 12194MB | TITAN Xp
- gpu1 : 12196MB | TITAN Xp
=== Environment ===
platform : Linux-4.15.0-32-generic-x86_64-with-debian-stretch-sid
distro : Ubuntu 16.04 Xenial Xerus
conda env : fai_v1_dev
python : /home/farzin/anaconda3/envs/fai_v1_dev/bin/python
sys.path :
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python37.zip
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7/lib-dynload
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7/site-packages
/home/farzin/fast_ai/fastai-fork
Fri Jan 11 14:32:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.51 Driver Version: 396.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:03:00.0 Off | N/A |
| 30% 47C P8 21W / 250W | 12MiB / 12194MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:04:00.0 On | N/A |
| 23% 36C P8 17W / 250W | 979MiB / 12196MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 1433 G /usr/lib/xorg/Xorg 661MiB |
| 1 2423 G compiz 302MiB |
| 1 22300 G /usr/lib/firefox/firefox 3MiB |
+-----------------------------------------------------------------------------+
I am willing to look deeper into this and try to fix/debug if anyone has ideas about where to look. I don’t quite know where to even start on this particular one.
Great, here is what needs to be done. If you can resolve these via a PR, that would be great. Thank you.
- the error is not user-friendly:
…/…/anaconda3/envs/fai_v1_dev/lib/python3.7/tarfile.py:2304: ReadError
It needs to catch this error and report that the file failed to download fully. Do note that the git version now stores a simple checksum (not md5, as it’s too slow) for each dataset, so use that checksum to validate that the file was fully downloaded.
But I think if solution (2) is implemented, (1) will no longer be needed, as it’ll include error handling. The checksum part will still have to be part of the solution.
- retry 5 times on failure, and if it still fails, have the function raise an exception that prints instructions on how to fix this manually, i.e.:
mkdir -p ~/.fastai/data
cd ~/.fastai/data
wget -c path/file.tgz
tar -xvzf file.tgz
and rerun the code/test, which should now work.
That way the user can find another way to get the dataset.
wget -c will download as much as it can in a single connection and then keep retrying until the file is fully downloaded. I’m not sure about the curl equivalent, but it has one.
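Point (2), combined with the checksum from point (1), could look roughly like this. Everything here is hypothetical: `quick_checksum` (file size plus md5 of only the first MiB) is one cheap scheme I'm assuming, not necessarily what the git version stores, and `fetch(url, fname)` is any download callable, for example a thin wrapper around `urllib.request.urlretrieve`.

```python
import hashlib
import os

def quick_checksum(fname, nbytes=2**20):
    """Cheap checksum: file size plus md5 of the first `nbytes` (an assumed scheme)."""
    size = os.path.getsize(fname)
    with open(fname, 'rb') as f:
        head = hashlib.md5(f.read(nbytes)).hexdigest()
    return size, head

def download_with_retries(url, fname, expected, fetch, retries=5):
    """Call fetch(url, fname) up to `retries` times until the checksum matches `expected`."""
    for _ in range(retries):
        try:
            fetch(url, fname)
            if quick_checksum(fname) == expected:
                return fname
        except OSError:  # socket.timeout and urllib's URLError are OSError subclasses
            pass
        if os.path.exists(fname):
            os.remove(fname)  # drop the partial file before the next attempt
    folder = os.path.dirname(fname)
    raise RuntimeError(
        f"Downloading {url} failed {retries} times. To fix this manually, run:\n"
        f"  mkdir -p {folder}\n"
        f"  cd {folder}\n"
        f"  wget -c {url}\n"
        f"  tar -xvzf {os.path.basename(fname)}\n"
        "and rerun your code.")
```

Removing the partial file on each failure mirrors the manual workaround described earlier in the thread, and the final exception carries the wget instructions.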
I will give it a shot and come back with questions.
I have tried twice to submit a PR with updated docs. The only file I have touched in both PRs is fastai/docs_src/core.ipynb.
Both times, my version had errors, which I was initially able to reproduce but can no longer get on my local system. I went through a lot of steps to “clean up” my conda install, and then to get the pip dev install working correctly in a clean environment. That seems to have addressed the issues on my side.
When I pull from master and run make test and docs_src/run_tests.sh, I get no errors (well, I do get the download errors, but I am working on those…).
How can I proceed to debug if I can’t reproduce the errors locally? I still feel like a noob in this space, so any feedback or help is greatly appreciated!
See my comments in https://github.com/fastai/fastai/pull/1457, specifically on your commit, i.e. you’re getting those numpy warnings locally.
I now realize why you’re not seeing this issue. The first code cells of the notebook are hidden, which is why you can’t see them. Use the Hide input extension to unhide them: https://docs.fast.ai/gen_doc.gen_notebooks.html#Installation. It’s a bit tricky, since once you have enabled that extension you need to click on the invisible cell and then on the unhide icon in the menu. You can also use Hide All, which toggles the visibility of all cells, but then it’s hard to go back and re-hide the ones that need to be hidden.
Once you unhide them you will see:
from fastai.gen_doc.nbdoc import *
from fastai.core import *
That’s where I suggested looking, and then reducing it to the smallest case that triggers the issue.
I noticed those because I looked at your commit in raw JSON form, and the warnings are in its first chunk.
@stas Thank you for all your help and patience here! The bug was between the keyboard and the chair!
I am a bit embarrassed, but here is what happened. I had updated the core.ipynb notebook and wanted to be sure that I did not lose my changes while debugging, so I copied it out into another folder. Then I went through all the installs and removals until I got to the problem with the Bottleneck version that manifested as the numpy bug. Once that was resolved, I copied the file back and did not re-run the notebook. So, while the problem was gone locally, and I passed all the tests and saw no errors on imports, the notebook still had the old cell outputs from the prior run.
Anyway, it’s resolved now. I also removed the help() calls in the documentation per @sgugger’s recommendation.
Sorry for all the trouble. Thanks again for all the help finding my problem.
all is good, @bfarzin. Glad you sorted it out.
Another common culprit with docs_src notebooks: remember to save the notebook after it has finished its run and before it gets committed.
Since I tend to forget to do that in my own projects, I always have this cell at the end, which automatically saves the notebook upon completion. Of course, it won’t run if you go back and re-run just a few cells rather than the whole notebook, so I always do a full re-run of the notebook when I’m ready to commit.
%%javascript # prevent committing an unsaved notebook
_=IPython.notebook.save_notebook()
I have a question about the checksum test. It can be easily included as an assert:
assert _check_file(fname) == _checks[url]
but that is not very verbose if it fails. Should I catch the checksum difference and then flag with a more verbose message to “delete the current file and re-download”? Any thoughts appreciated.
Second question: should I add a test for this in the ./tests/test_vision_data.py file? I am not quite sure how to mock up a failed connection, but I can google around and try to set that up correctly if it is important for adding any new code.
Use the second part of assert (its message):
assert _check_file(fname) == _checks[url], "nice user-friendly message goes here"
To test this while you develop it, I’d suggest overriding files.fast.ai in your /etc/hosts to point at localhost, and faking it with a local webserver that delivers a truncated file. Of course, this is just one way. Perhaps there is a quick way to do it at the Python level without changing the server configuration.
I am not sure how to mock this easily in the test suite, so unless you feel inclined to figure it out, feel free to leave this without a new test.
I managed to test locally (and I document here how I did it). I am not well versed in setting up tests, adding a requirement, etc., so any feedback on the matter is appreciated. I am going to open a PR with the checksum and retry as we discussed above (that is the code I tested below).
First create a local file that is truncated.
head -n2800 ~/.fastai/data/coco_tiny.tgz > ~/.fastai/mock_data/coco_tiny_truncated.tgz
Then pip install responses and set up a mocked response:
import pytest
import responses
from fastai.vision import *

@responses.activate
def test_trunc_download():
    # Read the truncated archive created with head above
    with open('/home/farzin/.fastai/mock_data/coco_tiny_truncated.tgz','rb') as cc_trunc:
        file_io = cc_trunc.read()
    mock_headers = {'Content-Type':'text/plain','Content-Length':'168168549'}
    # Serve the truncated bytes in place of the real dataset URL
    responses.add(responses.GET, 'http://files.fast.ai/data/examples/coco_tiny.tgz',
                  body=file_io, status=200, headers=mock_headers)
    coco = untar_data(URLs.COCO_TINY)
With an empty ~/.fastai/data, run pytest test_request.py and you get the expected output:
> assert _check_file(fname) == _checks[url], f"Downloaded file {fname} does not match checksum expected! Remove the file from ~/.fastai/data and try your code again."
E AssertionError: Downloaded file /home/farzin/.fastai/data/coco_tiny.tgz does not match checksum expected! Remove the file from ~/.fastai/data and try your code again.
fastai/datasets.py:158: AssertionError
-------------------------------------- Captured stdout call --------------------------------------
Downloading http://files.fast.ai/data/examples/coco_tiny
==================================== 1 failed in 2.78 seconds ====================================
Looking good, @bfarzin
requirements:
- add responses to dev_requirements in setup.py
test:
- in the test, use a string with a few bytes in it instead of a file; there’s no need for a truncated file if the mock uses a string.
- finally, try/except the assertion from untar_data and assert that the message is what you expect it to be.
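Following those review points, a test can feed the download a few bytes and check the assertion message with try/except. This sketch swaps in the standard library's unittest.mock instead of responses, and `download_and_check` is a hypothetical stand-in for the download-and-verify step inside untar_data (it uses a full-file md5 only to keep the example small).

```python
import hashlib
import os
import urllib.request
from unittest import mock

def download_and_check(url, dest, expected_md5):
    """Hypothetical helper: fetch `url` into `dest`, then assert its md5 matches."""
    with urllib.request.urlopen(url) as resp, open(dest, 'wb') as f:
        f.write(resp.read())
    with open(dest, 'rb') as f:
        got = hashlib.md5(f.read()).hexdigest()
    assert got == expected_md5, (f"Downloaded file {dest} does not match checksum expected! "
                                 "Remove the file from ~/.fastai/data and try your code again.")
    return dest

def test_truncated_download(tmpdir):
    full = b'0123456789' * 100  # the "real" archive is just a short byte string
    expected = hashlib.md5(full).hexdigest()
    dest = os.path.join(str(tmpdir), 'coco_tiny.tgz')
    with mock.patch('urllib.request.urlopen') as fake_urlopen:
        # urlopen(...) is used as a context manager; its read() returns the truncated body.
        fake_urlopen.return_value.__enter__.return_value.read.return_value = full[:7]
        try:
            download_and_check('http://files.fast.ai/data/examples/coco_tiny.tgz', dest, expected)
        except AssertionError as e:
            assert 'does not match checksum expected' in str(e)
        else:
            raise AssertionError('truncated download was not detected')
```

Under pytest, the built-in `tmpdir` fixture supplies the directory, so no truncated file has to exist on disk.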
@sgugger I have submitted a PR to LanguageModelPreLoader that 1) fixes the current issue with the concatenation running out of arrays and includes the optimisations I have made, and 2) includes a test case that verifies the batches are continuous when reading forward and backward through the jagged token arrays.
There is one use case that I do not understand, so I just copied it from the current version of fastai:
def __getitem__(self, k:int):
    if self.item is not None: return self.dataset[0]