What do you mean exactly? The model knows the labels for images in the training set. There cannot be any mislabeled image. Plus, if you train long enough, the losses on the training images will approach zero.
Hello @balnazzar. You are right about datasets carefully prepared by universities, like ImageNet. In that case, each image is correctly labeled (i.e., it sits in a folder with the right label, for example). But in real life (when you download images from Google, for example), that does not hold.
There are plenty of mislabeled images on the Web, as well as images that share a label but mean different things (e.g., the Portuguese word for the mango fruit is “manga”, which is also the label of Japanese drawings), and Google's AI cannot filter Web images with 100% accuracy.
So you must check the labels of your images, both train and val, not only val. That’s why I think plot_top_losses() would be a great tool for detecting mislabeled images in both the train and val sets, not only in the val set as it is today.
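To make the idea concrete, here is a minimal sketch of ranking samples by loss to surface label suspects. It is not the fastai implementation: `top_losses`, the probabilities and the labels are all made up for illustration. As noted above, a model trained long enough may fit wrong training labels too, so on the train set this probably works best after only a few epochs.

```python
import math

def top_losses(probs, labels, k):
    """Return (index, loss) pairs for the k samples with the highest cross-entropy loss.

    probs  : list of per-class probability lists, one per sample (hypothetical model output)
    labels : list of integer class labels as found in the dataset
    """
    # Cross-entropy of one sample = -log(probability assigned to its given label)
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(i, losses[i]) for i in order[:k]]

# Toy data: sample 2 is confidently predicted as class 0 but labeled 1,
# so it is the first candidate to inspect by eye.
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4]]
labels = [0, 1, 1, 0]
suspects = top_losses(probs, labels, k=2)
for idx, loss in suspects:
    print(f"sample {idx}: loss {loss:.3f}")
```

The same ranking works for any split; only the set of predictions and labels passed in changes.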
I didn’t actually think about that, Pierre. Thanks.
If you prefer not to do it yourself, I’ll try to implement your suggestions for plot_top_losses() and plot_multi_top_losses() on my local fork. Should the results be interesting, we could submit a PR.
Great ! Thanks Andrea
Does anybody here have a good understanding of Python internals? Currently we have an issue with the ipython autoreloader: at the very least, the Learner object doesn’t get updated to the address space of the newly reloaded modules.
It all started with getting:
PicklingError: Can't pickle <class 'fastai.basic_train.Recorder'>:
it's not the same object as fastai.basic_train.Recorder
after I had called learn.export(), right after editing fastai/basic_train.py and having jupyter autoreload it via the usual:
%reload_ext autoreload
%autoreload 2
I started digging into why pickle was failing. I reduced its main verification function, which in pickle is much longer and includes exception handling et al., to a simplified version relevant just to our situation:
import sys

def pickle_get_class(obj):
    # Look the class up by name in its module, the way pickle does when saving a class
    name = obj.__class__.__name__
    module_name = getattr(obj, '__module__', None)
    obj2 = sys.modules[module_name]
    for subpath in name.split('.'): obj2 = getattr(obj2, subpath)
    return obj2

#obj = learn.recorder
obj = learn
class1 = obj.__class__            # the class the live object was created from
class2 = pickle_get_class(obj)    # the class currently registered in sys.modules
print(f"class 1: {hex(id(class1))}")
print(f"class 2: {hex(id(class2))}")
print(class1 is class2)
When the notebook is run the first time after the kernel was restarted both print the same address, i.e. pointing to the same version of the class.
class 1: 0x5592541235a8
class 2: 0x5592541235a8
True
Then I’d modify, say, fastai/basic_train.py and rerun the cell, and now the addresses are not the same, and class 1 hasn’t changed.
class 1: 0x5592541235a8
class 2: 0x5592541245b8
False
So ipython reload magic failed to update the objects as it describes in caveats.
Actually, you can ignore pickle_get_class. If reload were to work correctly, hex(id(learn.__class__)) should be different after each autoreload (if the learner class was reloaded, directly or as a dependency).
You can see from my code that I first started with learn.recorder, as reported by pickle, but then I noticed learn had the same issue.
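For anyone who wants to reproduce the symptom without fastai or a notebook, here is a minimal sketch using plain importlib.reload on a throwaway module. The module name `demo_mod` and its `Recorder` class are invented for this demo, and it uses a manual reload rather than the ipython autoreload extension, so it only shows the class-identity mismatch that pickle complains about, not why autoreload leaves the instance behind.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny throwaway module (hypothetical name) and make it importable
    Path(tmp, "demo_mod.py").write_text("class Recorder:\n    pass\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_mod = importlib.import_module("demo_mod")
        obj = demo_mod.Recorder()      # instance created before the reload
        old_class = obj.__class__

        importlib.reload(demo_mod)     # re-executes the module, creating a new class object

        # The live instance still points at the old class object
        same_class = old_class is demo_mod.Recorder

        # pickle looks the class up by name in sys.modules and rejects the mismatch
        try:
            pickle.dumps(obj)
            pickled_ok = True
        except pickle.PicklingError as e:
            pickled_ok = False
            print(e)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_mod", None)

print("same class after reload:", same_class)
```

Objects created after the reload pickle fine; only instances that predate it are stuck with the stale class.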
This situation sucks, since it’s no longer possible to use autoreload with at least learner objects if their modules are modified, and the failure is silent, so you could still be working with the old version and wasting hours debugging the wrong thing. Sure, a kernel restart will remedy it, but it’ll make debugging much slower in situations that require pre-running extra steps.
When a new fastai function is developed, it’s easy to make a fast-running notebook, so autoreload is not a necessity in such cases. But when a user is debugging something failing in fastai code deep in their notebook, the restart approach doesn’t cut it.
So we need to understand, and hopefully fix, why the ipython reload magic fails to reload one or more fastai class objects.
The reload magic functionality (the actual update of objects) in ipython is here: https://github.com/ipython/ipython/blob/master/IPython/extensions/autoreload.py#L253 From skimming through the code I don’t see anything that would suggest that it silently ignores any parts of the reload.
I think I forgot to mention I developed a little tool to compare conda envs. Get it at https://github.com/stas00/conda-tools.
I needed it to compare one working env vs. one failing to see what packages were different, here is an example of its output. You just pass the names of 2 environments you want to compare:
$ conda-env-compare.pl work27 work36
Comparing installed packages in environments: work27 and work36
********************************************** Match: Differ **********************************************
environment work27 work36 work27 work36
package name version version source source
------------------------------------------------------------------------------------------------------------
ipykernel 4.10.0 5.1.0 py27_0/anaconda py36h39e3cac_0/anaconda
python 2.7.15 3.6.8 h9bab390_6/anaconda h0371630_0/anaconda
********************************************** Match: Missing **********************************************
environment work27 work36 work27 work36
package name version version source source
------------------------------------------------------------------------------------------------------------
backports 1.0 py27_1/anaconda
backports-abc 0.5 py27_0/anaconda
backports.shutil-get-terminal-size 1.0.0 py27_2/anaconda
configparser 3.5.0 py27_0/anaconda
enum34 1.1.6 py27_1/anaconda
functools32 3.2.3.2 py27_1/anaconda
futures 3.2.0 py27_0/anaconda
get-terminal-size 1.0.0 haa9412d_0/anaconda
ipaddress 1.0.22 py27_0/anaconda
pathlib2 2.3.3 py27_0/anaconda
scandir 1.9.0 py27h14c3975_0/anaconda
singledispatch 3.4.0.3 py27_0/anaconda
xz 5.2.4 h14c3975_4/anaconda
*********************************************** Match: Same ***********************************************
environment work27 work36 work27 work36
package name version version source source
------------------------------------------------------------------------------------------------------------
bleach 3.1.0 3.1.0 py27_0/anaconda py36_0/anaconda
ca-certificates 2018.3.7 2018.3.7 0/anaconda 0/anaconda
certifi 2018.11.29 2018.11.29 py27_0/anaconda py36_0/anaconda
decorator 4.3.0 4.3.0 py27_0/anaconda py36_0/anaconda
entrypoints 0.2.3 0.2.3 py27_2/anaconda py36_2/anaconda
gmp 6.1.2 6.1.2 h6c8ec71_1/anaconda h6c8ec71_1/anaconda
ipython 5.1.0 5.1.0 py27_0/anaconda py36_0/anaconda
ipython-genutils 0.2.0 0.2.0 py27_0/anaconda py36_0/anaconda
jinja2 2.10 2.10 py27_0/anaconda py36_0/anaconda
jsonschema 2.6.0 2.6.0 py27_0/anaconda py36_0/anaconda
jupyter-client 5.2.4 5.2.4 py27_0/anaconda py36_0/anaconda
jupyter-core 4.4.0 4.4.0 py27_0/anaconda py36_0/anaconda
libedit 3.1.20170329 3.1.20170329 h6b74fdf_2/anaconda h6b74fdf_2/anaconda
libffi 3.2.1 3.2.1 hd88cf55_4/anaconda hd88cf55_4/anaconda
libgcc-ng 8.2.0 8.2.0 hdf63c60_1/anaconda hdf63c60_1/anaconda
libsodium 1.0.16 1.0.16 h1bed415_0/anaconda h1bed415_0/anaconda
libstdcxx-ng 8.2.0 8.2.0 hdf63c60_1/anaconda hdf63c60_1/anaconda
markupsafe 1.1.0 1.1.0 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
mistune 0.8.4 0.8.4 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
nbconvert 5.3.1 5.3.1 py27_0/anaconda py36_0/anaconda
nbformat 4.4.0 4.4.0 py27_0/anaconda py36_0/anaconda
ncurses 6.1 6.1 he6710b0_1/anaconda he6710b0_1/anaconda
notebook 5.7.4 5.7.4 py27_0/anaconda py36_0/anaconda
openssl 1.1.1a 1.1.1a h7b6447c_0/anaconda h7b6447c_0/anaconda
pandoc 2.2.3.2 2.2.3.2 0/anaconda 0/anaconda
pandocfilters 1.4.2 1.4.2 py27_1/anaconda py36_1/anaconda
pexpect 4.6.0 4.6.0 py27_0/anaconda py36_0/anaconda
pickleshare 0.7.5 0.7.5 py27_0/anaconda py36_0/anaconda
pip 18.1 18.1 py27_0/anaconda py36_0/anaconda
prometheus-client 0.5.0 0.5.0 py27_0/anaconda py36_0/anaconda
prompt-toolkit 1.0.15 1.0.15 py27_0/anaconda py36_0/anaconda
ptyprocess 0.6.0 0.6.0 py27_0/anaconda py36_0/anaconda
pygments 2.3.1 2.3.1 py27_0/anaconda py36_0/anaconda
python-dateutil 2.7.5 2.7.5 py27_0/anaconda py36_0/anaconda
pyzmq 17.1.2 17.1.2 py27h14c3975_0/anaconda py36h14c3975_0/anaconda
readline 7.0 7.0 h7b6447c_5/anaconda h7b6447c_5/anaconda
send2trash 1.5.0 1.5.0 py27_0/anaconda py36_0/anaconda
setuptools 40.6.3 40.6.3 py27_0/anaconda py36_0/anaconda
simplegeneric 0.8.1 0.8.1 py27_2/anaconda py36_2/anaconda
six 1.12.0 1.12.0 py27_0/anaconda py36_0/anaconda
sqlite 3.26.0 3.26.0 h7b6447c_0/anaconda h7b6447c_0/anaconda
terminado 0.8.1 0.8.1 py27_1/anaconda py36_1/anaconda
testpath 0.4.2 0.4.2 py27_0/anaconda py36_0/anaconda
tk 8.6.8 8.6.8 hbc83047_0/anaconda hbc83047_0/anaconda
tornado 5.1.1 5.1.1 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
traitlets 4.3.2 4.3.2 py27_0/anaconda py36_0/anaconda
wcwidth 0.1.7 0.1.7 py27_0/anaconda py36_0/anaconda
webencodings 0.5.1 0.5.1 py27_1/anaconda py36_1/anaconda
wheel 0.32.3 0.32.3 py27_0/anaconda py36_0/anaconda
zeromq 4.2.5 4.2.5 hf484d3e_1/anaconda hf484d3e_1/anaconda
zlib 1.2.11 1.2.11 h7b6447c_3/anaconda h7b6447c_3/anaconda
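The grouping in that output boils down to a three-way split on package names and versions. Here is a rough Python sketch of that logic. It is not the Perl tool itself: `compare_envs` is a hypothetical helper, and the dictionaries hold only a handful of the packages shown above.

```python
def compare_envs(env_a, env_b):
    """Split package names into 'differ', 'missing' and 'same' groups.

    env_a, env_b: dicts mapping package name -> version string.
    """
    report = {"differ": [], "missing": [], "same": []}
    for name in sorted(set(env_a) | set(env_b)):
        if name not in env_a or name not in env_b:
            report["missing"].append(name)   # installed in only one environment
        elif env_a[name] != env_b[name]:
            report["differ"].append(name)    # installed in both, versions differ
        else:
            report["same"].append(name)      # installed in both, same version
    return report

# A few entries taken from the output above
work27 = {"python": "2.7.15", "ipykernel": "4.10.0", "enum34": "1.1.6", "six": "1.12.0"}
work36 = {"python": "3.6.8", "ipykernel": "5.1.0", "six": "1.12.0"}

report = compare_envs(work27, work36)
for group, names in report.items():
    print(f"{group}: {', '.join(names)}")
```

Like the output above, this compares versions only, so two builds of the same version (py27_0 vs py36_0) still count as "same".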
you’re such a fantastic toolsmith - fearless and thorough - we are lucky you are here
Since there was a lot of confusion, in DataBunch I’ve renamed the tfms argument to dl_tfms (people often used it for ds_tfms in computer vision).
I have built a small version of Beam Search that seems promising. In the process, I looked carefully at the LanguageLearner.predict() method. I am not sure if this is a bug or if I am misunderstanding how it works.
When you call predict(), you begin with an initial self.model.reset() that sets the hidden states to zero. Then you pass through the sample text and continue to append a new token each time to your list of generated tokens. However, your text is now the full set of tokens you have generated from the start, but you have not reset the state, so you are predicting from the end of the last prediction state.
What am I missing here?
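To illustrate the concern with something runnable, here is a toy sketch. It does not use fastai: `ToyStatefulModel` is an invented stand-in whose "hidden state" is simply the list of tokens it has consumed since the last reset, and both generation loops are hypothetical. It only shows how re-feeding the full text into a stateful model without a reset makes the state see earlier tokens repeatedly.

```python
class ToyStatefulModel:
    """Stand-in for a stateful RNN: the state is every token consumed since reset()."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.seen = []

    def __call__(self, tokens):
        self.seen.extend(tokens)   # the state accumulates across calls
        return "<new>"             # dummy "predicted" token

def generate_refeeding_everything(model, prompt, n_words):
    # The pattern described above: reset once, then feed the whole text each step
    model.reset()
    tokens = list(prompt)
    for _ in range(n_words):
        tokens.append(model(tokens))
    return tokens

def generate_feeding_only_new(model, prompt, n_words):
    # Alternative: reset once, feed the prompt, then feed only the newest token
    model.reset()
    tokens, pending = list(prompt), list(prompt)
    for _ in range(n_words):
        new_token = model(pending)
        tokens.append(new_token)
        pending = [new_token]
    return tokens

model = ToyStatefulModel()
prompt = ["the", "cat", "sat"]

generate_refeeding_everything(model, prompt, 3)
seen_refeed = len(model.seen)

generate_feeding_only_new(model, prompt, 3)
seen_incremental = len(model.seen)

print(seen_refeed, seen_incremental)
```

A third option would be to keep feeding the full text but call reset() before every step, which costs more compute but also keeps the state consistent with the text.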
I think we should add the parameter cut to unet_learner to be able to use a custom model.
The current signature of the function is unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True, norm_type:Optional[NormType]='NormType', split_on:Union[Callable, Collection[ModuleList], NoneType]=None, blur:bool=False, self_attention:bool=False, y_range:OptRange=None, last_cross:bool=True, bottle:bool=False, **kwargs:Any)
I propose unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True, norm_type:Optional[NormType]=NormType, split_on:Optional[SplitFuncOrIdxList]=None, blur:bool=False, self_attention:bool=False, y_range:Optional[Tuple[float,float]]=None, last_cross:bool=False, bottle:bool=False,cut:Union[int,Callable]=None, **kwargs:Any)->None:
and pass the parameter to create_body, as in create_cnn.
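For readers wondering what a cut of type Union[int, Callable] would mean in practice: as far as I understand it, an int is treated as a slice index into the model's child layers, and a callable is applied to the whole model to return the body. Here is a toy sketch with a list of layer names standing in for the real network. `apply_cut` is a hypothetical helper, not fastai API, and treating None as "keep everything" is my simplification.

```python
from typing import Callable, List, Optional, Union

def apply_cut(layers: List[str], cut: Optional[Union[int, Callable]] = None) -> List[str]:
    """Return the 'body' of a model given as a list of layer names."""
    if cut is None:
        return list(layers)        # assumption for this sketch: no cut keeps every layer
    if isinstance(cut, int):
        return list(layers[:cut])  # int: keep the layers before this index
    if callable(cut):
        return cut(layers)         # callable: let the user pick the body
    raise TypeError("cut must be None, an int or a callable")

layers = ["conv1", "block1", "block2", "pool", "fc"]

body_int = apply_cut(layers, -2)                   # drop the last two layers
body_fn = apply_cut(layers, lambda ls: ls[:1])     # custom rule: keep only the stem
print(body_int, body_fn)
```

With a custom architecture there is no built-in default for where the head starts, which is why being able to pass cut explicitly matters.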
I found this setup to debug PyTorch memory leaks on the Pyro forums: https://forum.pyro.ai/t/a-clever-trick-to-debug-tensor-memory/556
Maybe this is interesting for the library development.
That’s a nice version, @MicPie! Except it’s incomplete, it should be merged with this version: https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/24
We should put it somewhere in the docs for sure.
If you find other goodies please share!
Hi @sgugger, cutout for Data Augmentation was implemented in the previous fastai (before v1) but not in v1. Do you plan to add it to vision.transform, or is it not a relevant technique and will not be implemented? Thanks.
We just forgot. Will implement it when I have a bit of time next week, send me a PM if I forget!
Thanks Sylvain !
@sgugger @pierreguillou I was procrastinating (instead of training LMs) and implemented it: https://github.com/fastai/fastai/pull/1489
Figured Sylvain’s busy with text stuff and I’ll help out a bit.
@pierreguillou, you can test if this works for you by using my fork, or wait till it’s merged (or till Sylvain implements it himself if my code sucks). Ping me if you decide to try it out now and if you have any questions about it!
Thanks @xnutsive and @sgugger. (Just the letter t is missing at the end of the following phrase in fastai/docs_src/vision.transform.ipynb: “The normalization technique described in this paper: Improved Regularization of Convolutional Neural Networks with Cutou”.)
Hello, this message concerns 2 issues with show_batch().
Note: I link it to my previous messages about plot_top_losses(), as the two functions share an issue with the DatasetType they use.
1) DatasetType: only train?
The function show_batch() takes ds_type (i.e., the DatasetType) as an argument, with DatasetType.Train as the default. Right? But in its code (see below), self.train_ds is hard-coded. Does that mean we can’t use show_batch() to display a validation batch (we would need self.valid_ds in that case, no)?
def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, **kwargs)->None:
    "Show a batch of data in `ds_type` on a few `rows`."
    x,y = self.one_batch(ds_type, True, True)
    if self.train_ds.x._square_show: rows = rows ** 2
    xs = [self.train_ds.x.reconstruct(grab_idx(x, i)) for i in range(rows)]
    #TODO: get rid of has_arg if possible
    if has_arg(self.train_ds.y.reconstruct, 'x'):
        ys = [self.train_ds.y.reconstruct(grab_idx(y, i), x=x) for i,x in enumerate(xs)]
    else : ys = [self.train_ds.y.reconstruct(grab_idx(y, i)) for i in range(rows)]
    self.train_ds.x.show_xys(xs, ys, **kwargs)
2) When the batch size is one (bs=1), show_batch() does not work.
When the batch size is one (bs=1), data.show_batch() (data is an ImageDataBunch) gives the following error, which is expected, as the function tries by default to display 5x5=25 images from a train batch:
IndexError: index 1 is out of bounds for dimension 0 with size 1
However, data.show_batch(rows=1), which should display 1 image, gives an error as well:
TypeError: 'AxesSubplot' object is not iterable
And, even if the batch size is > 1, data.show_batch(rows=1) gives the same error.
So the minimum needed to make show_batch() work is bs=4 and data.show_batch(rows=2).
How can we solve this issue and make show_batch() work even for bs=1?
Thanks.
Hm, I can look into that tomorrow. I saw that show_batch doesn’t work for small batches (if you do show_batch(1) it’ll work): it just tries to show rows*cols elements, and a batch of 1 doesn’t have enough elements; the default rows is 5.
That’s an easy fix, and I could look into the valid_ds issue too.
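A rough sketch of what the two fixes might look like, kept independent of fastai so it runs on its own. The `DatasetType` enum below is a two-member stand-in for the real one, and `pick_dataset` and `n_items_to_show` are hypothetical helpers, not existing library functions.

```python
from enum import Enum

class DatasetType(Enum):
    # Stand-in for fastai's DatasetType, reduced to the two members discussed here
    Train = 1
    Valid = 2

def pick_dataset(train_ds, valid_ds, ds_type):
    """Choose the dataset matching ds_type instead of hard-coding the train set."""
    return valid_ds if ds_type is DatasetType.Valid else train_ds

def n_items_to_show(rows, batch_size, square_show=True):
    """Never ask for more items than the batch actually contains."""
    wanted = rows ** 2 if square_show else rows
    return min(wanted, batch_size)

# Default rows=5 with bs=1: show the single available item instead of indexing past it
print(n_items_to_show(rows=5, batch_size=1))
# Asking for the validation set returns the validation dataset
print(pick_dataset("train_ds", "valid_ds", DatasetType.Valid))
```

The rows=1 TypeError looks like a separate problem in the plotting code (a single axes object being iterated over), so it would need its own fix.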