Update: I fixed the dataset on our S3 bucket so it should run smoothly now. Just be sure to remove imdb/ and imdb.tgz from your .fastai/data/ folder to trigger the download of the new dataset.
Just released 1.0.20. Main changes likely to impact folks:
- DataBunch.dl replaces the various is_train approaches with a single consistent enum. (h/t @zachcaceres)
- download_url reads the get request with iter_content, which is robust to ‘content-length’ errors. (h/t @lesscomfortable)
- create_cnn should work with models other than resnet now
- Data blocks API support for fastai.text
- QRNN seems to actually be working properly again.
Hi fastai devs, while working on a classification problem I found that plot_top_losses is sometimes confusing, and I would personally like to see the current prediction probability as well. It has come up in many local meetups too. I have made the changes locally and they work fine, but I would ask the devs here to advise whether we can add this to the master repo.
This being my first contribution, I ask the devs here to excuse any protocol breach and guide me onto the correct path.
First real namespace conflict in fastai notebooks?!
I just had a case where the from fastai import * strategy really caused problems.
When running the lesson3-planets notebook (but I guess this would happen in many other notebooks too):
Suddenly data.classes no longer existed and gave me errors, as did many other data.xxx attributes I tried. After some trial and error, it turned out the cause was rerunning the top cell (with the imports) after adding an additional import:
data = xxxx is the name of the DataBunch we create using the new (awesome!!) block API.
After running the imports again, data is now the fastai.vision.data module, and of course there is no data.classes anymore…
So this is a clear naming/namespace conflict.
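The clobbering can be reproduced without fastai at all. The sketch below builds a fake package (all names here are made up for illustration) whose wildcard export includes a submodule called data, exactly the situation described above:

```python
import sys, types

# Stand-in for fastai.vision: a package whose `import *` exports include
# a submodule named `data` (module names are hypothetical).
fake_vision = types.ModuleType("fake_vision")
fake_vision.data = types.ModuleType("fake_vision.data")
fake_vision.__all__ = ["data"]
sys.modules["fake_vision"] = fake_vision

data = "my DataBunch"      # the notebook's variable from the block API
from fake_vision import *  # rerunning the import cell...
print(type(data))          # ...rebinds `data` to the submodule
```

Since `from module import *` rebinds every exported name at the top level, any notebook variable sharing a name with an exported submodule is silently replaced.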
Maybe (@sgugger) the variables in the notebooks could be renamed to databunch = xxx so this won’t happen to others?
If it helps I can do that and do a PR…
And explicitly a big thank you to everyone involved in creating the new data blocks API!
I think it is awesome and will make things much easier, flexible, adaptable and explicit!
Released 1.0.22, which fixes learn.predict and also avoids importing submodules directly into the namespace.
That was a bug - the submodules weren’t meant to be imported directly. Fixed now.
data won’t be clobbered. My fix is really ugly, so if any Python experts know how to make our __init__.py files less awful, please let me know.
Hi. Are you accepting PRs from non-core developers? I’ve been looking at the library for the past couple of days to find a way to integrate “observation weights” into the codebase. I think the change would be very minimal and completely confined to the fit() function and its dependency loss_batch(). The gist of the PR would be to allow yb to be a list where the last item is a tensor representing the observation weights.
Forgot to reply to your other post. There’s no tweak needed, a target can already be a list of tensors. You have to properly address it with your loss function, that’s all.
For a new feature, it’s best to prepare a notebook showing how it works so we can help refactor your code; otherwise, we’re happy to accept PRs from anyone!
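As a concrete illustration of “addressing it with your loss function”, here is a minimal sketch with plain Python lists standing in for tensors; weighted_mse is a hypothetical example function, not fastai code:

```python
def weighted_mse(pred, target):
    # target is a [ys, weights] list, with the observation weights as the last item
    ys, ws = target
    sq_errs = [(p - y) ** 2 for p, y in zip(pred, ys)]
    # weighted mean of the squared errors
    return sum(w * e for w, e in zip(ws, sq_errs)) / sum(ws)

# two observations; the second counts three times as much as the first
loss = weighted_mse([2.0, 4.0], [[1.0, 4.0], [1.0, 3.0]])  # -> 0.25
print(loss)
```

The same pattern should carry over to tensors: unpack the list inside the loss, then apply the weights before reducing.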
You’re right! As long as the Dataset returns a list, the loss_batch() function will work just fine. However, I would like to propose a one-line code change.
I’ve created a notebook for this and bundled it alongside a PR.
I was reading through the fastai code and I came across the Stepper class
My question is: why are we using a class when all we need is iteration? Couldn’t we use a generator here? That would mean less code, less memory use, and lazy execution.
I wrote a generator which does the same thing:
```python
linear_anneal = lambda start, end, pct: start + (end - start) * pct

def stepper(start, end, n_iter):
    n = 1
    while n <= n_iter:
        yield linear_anneal(start, end, n / n_iter)  # was `linear`, a NameError
        n += 1

step = stepper(1, 15, 100)  # initialize it like this
next(step)                  # use it like this
```
moved the post to a separated thread as @stas suggested
IPyExperiments: Getting the most out of your GPU RAM in jupyter notebook
Thanks for taking the lead on starting a focused thread based on my earlier posts, @piotr.czapla. I felt that your title was much broader than the very specific intention of my posts (avoiding restarting the kernel all the time), so I renamed it to the more specific: Getting the most out of your GPU RAM in jupyter notebook.
But please don’t let it prevent you from starting a much more important topic on stability and performance of fastai v1.
Just merged: huge refactor of the data block API. If you were only using the databunch factory methods, this shouldn’t impact you.
If you were using the data block API, note that the calls to tokenize don’t exist anymore and that you now have to split your data before labeling it.
If you were using the internal datasets of fastai… learn how to use the datablock API very quickly because those don’t exist anymore.
The basic idea is that, to allow more flexibility, there is no dataset anymore: you explain what your xs and ys are with the data block API, and that’s it. That way, regression (or single classification, or multi-classification) for computer vision has the same underlying class as for text or tabular.
Update to the docs will follow shortly. Lessons should run smoothly.
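To illustrate the new ordering (split first, on unlabeled items, then label), here is a toy fluent chain in plain Python. The class and filenames are made up for illustration; this mimics the style of the data block API but is not fastai code:

```python
class ToyItemList:
    "Tiny stand-in for an ItemList-style fluent chain (illustration only)."
    def __init__(self, items):
        self.items = items

    def split_by_idx(self, valid_idx):
        # splitting happens first, while items are still unlabeled
        self.train = [x for i, x in enumerate(self.items) if i not in valid_idx]
        self.valid = [x for i, x in enumerate(self.items) if i in valid_idx]
        return self

    def label_from_func(self, f):
        # labeling is then applied to both splits
        self.train = [(x, f(x)) for x in self.train]
        self.valid = [(x, f(x)) for x in self.valid]
        return self

data = (ToyItemList(["cat_1.jpg", "dog_2.jpg", "cat_3.jpg"])
        .split_by_idx({2})
        .label_from_func(lambda name: name.split("_")[0]))
print(data.valid)  # [('cat_3.jpg', 'cat')]
```

Each step returns self, so the chain reads in the order the data is actually transformed, which is why split must now come before label.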
Name 'ImageFileList' is not defined in fastai version 1.0.24
Does this mean we now have a way to solve all types of ML problems (classification, multi-classification, regression) for all types of data (vision, text, tabular)?
I’m observing that the suggested use of partial functions for metrics leads to misleading results, e.g. in lesson3-planet nb:
```python
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = create_cnn(data, arch, metrics=[acc_02, f_score])
```
epoch train_loss valid_loss accuracy_thresh fbeta
the metrics column names are misleading, because these are not the metrics functions that were used (the defaults are different).
There must be a better way to have the used metrics match the names displayed in the header of the results.
The relevant code is:
```python
def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:
    "About to start learning."
    self.state_dict = _get_init_state()
    self.state_dict['n_epochs'],self.state_dict['pbar'],self.state_dict['metrics'] = epochs,pbar,metrics
    names = [(met.name if hasattr(met, 'name') else camel2snake(met.__class__.__name__)) for met in self.metrics]
    self('train_begin', metrics_names=names)
```
I see we already have an AverageMetric class, so this could be fixed now with a hack:
```python
acc_02 = AverageMetric(partial(accuracy_thresh, thresh=0.2))
acc_02.name = "acc_02"
learn = create_cnn(data, arch, metrics=[acc_02])
```
Now the metric header is displayed correctly:
epoch train_loss valid_loss acc_02
But perhaps we can add a new wrapper class?
```python
acc_02 = MakeMetric(partial(accuracy_thresh, thresh=0.2), "acc_02")
learn = create_cnn(data, arch, metrics=[acc_02])
```
I also researched partial(), and it’s possible to write a wrapper around partial to inject a name, say under partial_func.__name__, but it won’t behave like a normal function, which also has __class__.__name__ set; that can’t be set on a partial object. So this is probably not a good approach.
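Since the callback above checks met.name before falling back to __class__.__name__, it may be enough for a wrapper to carry a name attribute. Here is one possible sketch; named_partial is hypothetical (not part of fastai or functools), and accuracy_thresh is a simplified stand-in so the example is self-contained:

```python
from functools import partial

def named_partial(func, name, *args, **kwargs):
    "Hypothetical helper: a partial carrying the display name the metrics table reads."
    p = partial(func, *args, **kwargs)
    p.name = name  # partial objects can have attributes set on them
    return p

# simplified stand-in metric, NOT the fastai implementation
def accuracy_thresh(preds, targs, thresh=0.5):
    return sum(int((p > thresh) == bool(t)) for p, t in zip(preds, targs)) / len(targs)

acc_02 = named_partial(accuracy_thresh, "acc_02", thresh=0.2)
print(acc_02.name, acc_02([0.3, 0.1], [1, 0]))  # acc_02 1.0
```

This sidesteps the __class__.__name__ problem entirely: camel2snake is never reached because hasattr(met, 'name') is true.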