Developer chat


(Jeremy Howard (Admin)) #402

We agree. Stand by…


#403

Done. image_opener is now an attribute of every image dataset, and you can pass any function that will take a file name and return an Image object.
I did the same for segmentation masks and removed the kwargs that were stored in SegmentationDataset; people can modify the default value by changing the mask_opener attribute.

To change the attributes on all the datasets at once, I strongly suggest using the data block API. Here is a simple use case:

data = (InputList.from_folder(path)
        .label_from_folder()
        .split_by_folder()
        .datasets(ImageClassificationDataset)
        .set_attr(image_opener=my_custom_opener)
        .transform(tfms, size=224)
        .databunch(bs=64))
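
To make the "swappable opener" idea concrete, here is a minimal sketch in plain Python. `ToyImageDataset`, `default_opener`, `grayscale_opener` and the free function `set_attr` are hypothetical stand-ins, not fastai classes or functions; the only thing taken from the post above is that `image_opener` is an attribute holding a function from a file name to an image object.

```python
from typing import Callable, List

def default_opener(fn: str) -> dict:
    # Stand-in for a real image loader: returns a tiny "image" record.
    return {"file": fn, "mode": "RGB"}

def grayscale_opener(fn: str) -> dict:
    # Hypothetical custom opener that loads the file in a different mode.
    return {"file": fn, "mode": "L"}

class ToyImageDataset:
    """Hypothetical dataset whose opener is a plain, replaceable attribute."""
    def __init__(self, fnames: List[str]):
        self.fnames = fnames
        self.image_opener: Callable[[str], dict] = default_opener

    def __getitem__(self, i: int) -> dict:
        # The dataset never hard-codes how files are opened.
        return self.image_opener(self.fnames[i])

def set_attr(datasets, **kwargs):
    # Apply the same attribute values to every dataset (train, valid, ...).
    for ds in datasets:
        for name, value in kwargs.items():
            setattr(ds, name, value)
    return datasets

train, valid = ToyImageDataset(["a.png"]), ToyImageDataset(["b.png"])
set_attr([train, valid], image_opener=grayscale_opener)
```

After the call, indexing either dataset goes through `grayscale_opener`, which is the behaviour the `.set_attr(image_opener=...)` step in the chain is meant to give across all datasets at once.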

(Kaspar Lund) #404

hurray that makes my life easier - no more patching after git pull:)


(Jeremy Howard (Admin)) #405

In order to make it more convenient to submit a PR with docs and lib changes together (amongst other things) I’ve now merged the fastai_docs and fastai repos. The downside is that the fastai repo is pretty big now - sorry about that, but I think it’ll be worth it.


(Jeremy Howard (Admin)) #406

Just released 1.0.19.

Biggest changes are:

  1. add an argument resize_method that tells apply_tfms how to resize the image to the desired size (crop, pad, squish or no).
  2. jupyter et al no longer forced dependencies

You should find that most of the things that led to errors in fastai.vision.data before, such as not including size= or not including a crop_pad transform will now work - and will instead squish images to be the requested size. You can also use a rectangular size= if you’re squishing.
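
As a rough illustration of how the crop, pad and squish options differ, the sketch below computes the size an image would be scaled to before any final crop or pad. It follows the usual meaning of these terms and is an assumption, not fastai's actual implementation; `resize_plan` is a hypothetical helper, and the "no" option (no resizing) is left out.

```python
def resize_plan(h, w, target_h, target_w, method):
    """Return the (height, width) the image is scaled to before any crop or pad."""
    if method == "squish":
        # Resize straight to the target; the aspect ratio may change.
        return target_h, target_w
    ratios = (target_h / h, target_w / w)
    if method == "crop":
        # Scale until the image covers the target, then crop the overflow.
        scale = max(ratios)
    elif method == "pad":
        # Scale until the image fits inside the target, then pad the rest.
        scale = min(ratios)
    else:
        raise ValueError(f"unknown method: {method}")
    return round(h * scale), round(w * scale)

squished = resize_plan(300, 600, 224, 224, "squish")
cropped = resize_plan(300, 600, 224, 224, "crop")
padded = resize_plan(300, 600, 224, 224, "pad")
```

For a 300x600 image and a 224x224 target, squish goes directly to 224x224, crop scales to 224x448 and trims the width, and pad scales to 112x224 and fills the height.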


#407

Breaking changes in text! To make the API more consistent with the vision side and compatible with the data block API, and to clean up code that was a bit too messy, I’ve just merged big changes in NLP. Examples and docs are updated to reflect that. There’s also a working lesson notebook in course-v3 (I’d love to know if it doesn’t work for you).

Please note:

  • RNNLearner.classifier is now text_classifier_learner
  • RNN_Learner.language_model is now language_model_learner.
  • You need to manually save/load your TextDataBunch. Doing it automatically was inducing too many subtle bugs.
  • TextDataBunch.from_csv now takes one csv and valid_pct instead of two csvs

If you’re using the dataset methods, changes are more numerous so you’d better check the docs. And remember to use the data block API as it’s been designed to make your life easier :wink:
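
For readers wondering what a `valid_pct` argument implies, the sketch below splits the rows of a single csv into training and validation sets at random. It is only an illustration of the idea; `split_by_valid_pct`, its default values and the seeding are assumptions, not the library's code.

```python
import random

def split_by_valid_pct(rows, valid_pct=0.2, seed=42):
    """Shuffle rows and hold out a fraction valid_pct for validation."""
    rows = list(rows)
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    rng.shuffle(rows)
    n_valid = int(len(rows) * valid_pct)
    return rows[n_valid:], rows[:n_valid]

# Toy stand-in for the rows of one csv file: (label, text).
rows = [("pos", f"review {i}") for i in range(10)]
train_rows, valid_rows = split_by_valid_pct(rows, valid_pct=0.2)
```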


(Sudarshan) #408

Docs for the classifier show errors in the page.


#409

Thanks for flagging, will correct!


(Malcolm McLean) #410

Hello developers. I ran into a problem with download_images() in fastai/vision/data.py while downloading training images from Google for making a useless but fun classifier. All images are given extension .jpg, even when they are actually .png. This mismatch causes Image Viewer to reject those images, which in turn makes it an extra hassle to clean up the training data.

I am not yet fluent enough in Python to offer a fix, but here is a gist with notebook and url source file that demonstrates the problem. Just put them both in the same folder to test.

https://gist.github.com/PomoML/1836e9f2b9138ecc9fba1586d2118919

Thanks for looking at this issue!
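
One possible workaround for the extension mismatch, sketched here with the standard library only, is to look at the first bytes of each downloaded file and rename it to match its real format. `sniff_extension` and `fix_extension` are hypothetical helpers, not part of fastai, and only a few common formats are covered.

```python
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}

def sniff_extension(header: bytes) -> Optional[str]:
    """Guess the file extension from the first bytes of the file."""
    for magic, ext in SIGNATURES.items():
        if header.startswith(magic):
            return ext
    return None

def fix_extension(path: Path) -> Path:
    """Rename path so its suffix matches the detected format, if known."""
    with open(path, "rb") as f:
        ext = sniff_extension(f.read(16))
    if ext is None or path.suffix.lower() == ext:
        return path                    # unknown format or already correct
    new_path = path.with_suffix(ext)
    path.rename(new_path)
    return new_path
```

Running `fix_extension` over every file in the download folder would leave real JPEGs untouched and rename PNGs that were saved as `.jpg`.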


#411

Getting an index out of range error when instantiating data_lm. Shouldn’t the value used to index the list of folders be something other than len?

data_lm = (TextFileList.from_folder(path)         
           #grab all the text files in path
           .label_from_func(lambda x:0)           
           #label them all wiht 0s (the targets aren't positive vs negative review here)
           .split_by_folder(valid='test')         
           #split by folder between train and validation set
           .datasets(TextDataset, is_fnames=True) 
           #use `TextDataset`, the flag `is_fnames=True` indicates to read the content of the files passed
           .tokenize()
           #tokenize with defaults from fastai
           .numericalize()
           #numericalize with defaults from fastai
           .databunch(TextLMDataBunch))
           #use a TextLMDataBunch
data_lm.save('tmp_lm')

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-30-6e16f2eee3d4> in <module>
      3            .label_from_func(lambda x:0)
      4            #label them all wiht 0s (the targets aren't positive vs negative review here)
----> 5            .split_by_folder(valid='test')
      6            #split by folder between train and validation set
      7            .datasets(TextDataset, is_fnames=True)

/fastai/fastai/data_block.py in split_by_folder(self, train, valid)
    135         """
    136         n = len(self.path.parts)
--> 137         folder_name = [o[0].parent.parts[n] for o in self.items]
    138         valid = [o for o in self.items if o[0].parent.parts[n] == valid]
    139         train = [o for o in self.items if o[0].parent.parts[n] == train]

/fastai/fastai/data_block.py in <listcomp>(.0)
    135         """
    136         n = len(self.path.parts)
--> 137         folder_name = [o[0].parent.parts[n] for o in self.items]
    138         valid = [o for o in self.items if o[0].parent.parts[n] == valid]
    139         train = [o for o in self.items if o[0].parent.parts[n] == train]

IndexError: tuple index out of range
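
One way this error can arise, sketched below with `pathlib` only, is an item that sits directly in the root folder instead of in a subfolder. The paths are made up for illustration and `folder_under_root` is a hypothetical helper that mirrors the indexing shown in the traceback.

```python
from pathlib import PurePosixPath

root = PurePosixPath("data/imdb")          # hypothetical dataset root
n = len(root.parts)                        # number of parts in the root path

nested = PurePosixPath("data/imdb/train/pos/0_9.txt")
top_level = PurePosixPath("data/imdb/README")

def folder_under_root(item, n):
    # Same indexing as in the traceback: the n-th part of the parent folder.
    return item.parent.parts[n]

first = folder_under_root(nested, n)       # first folder below the root

try:
    folder_under_root(top_level, n)
    failed = False
except IndexError:
    # The parent of a top-level file is the root itself, which has only
    # n parts, so parts[n] does not exist.
    failed = True
```

So indexing with `len(self.path.parts)` is fine for files inside `train/` or `test/`; it only breaks for a file whose parent is the root itself.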


#412

Update: I fixed the dataset on our S3 bucket so it should run smoothly now. Just be sure to remove imdb/ and imdb.tgz from your .fastai/data/ folder to trigger the download of the new dataset.


(Jeremy Howard (Admin)) #413

Just released 1.0.20. Main changes likely to impact folks:

  • DataBunch.dl replaces the various holdout, is_test, and is_train approaches with a single consistent enum. (h/t @zachcaceres)
  • download_url reads the get request with iter_content which is robust to ‘content-length’ errors. (h/t @lesscomfortable)
  • create_cnn should work with models other than resnet now
  • Data blocks API support for fastai.text
  • QRNN seems to actually be working properly again.
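
To show what replacing several boolean flags with one enum can look like, here is a minimal sketch. The names `DatasetType` and `ToyDataBunch`, and the choice of validation as the default, are assumptions made for illustration and should be checked against the released code.

```python
from enum import Enum

class DatasetType(Enum):               # assumed name, for illustration only
    Train = 1
    Valid = 2
    Test = 3

class ToyDataBunch:
    """Hypothetical container holding one loader per split."""
    def __init__(self, train_dl, valid_dl, test_dl=None):
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def dl(self, ds_type=DatasetType.Valid):
        # One argument selects the loader, instead of is_test / is_train flags.
        if ds_type is DatasetType.Train:
            return self.train_dl
        if ds_type is DatasetType.Test:
            return self.test_dl
        return self.valid_dl

bunch = ToyDataBunch(train_dl="train loader", valid_dl="valid loader", test_dl="test loader")
```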

(Ramesh Kumar Singh) #414

Hi fastai devs. While working on a classification problem I found that plot_top_losses is sometimes confusing, and I would personally like it to show the prediction probability as well. This has also come up in discussions at several local meetups. I have made the change locally and it’s working fine, so I would like to ask the devs here whether it could be added to the master repo.
This being my first contribution, I would ask the devs to excuse any breach of protocol and guide me on the correct path.
Regards,
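
A minimal sketch of the proposal, in plain Python and independent of fastai: rank samples by loss and build a title that also carries a probability. Which probability to show is an assumption here (the one assigned to the predicted class), and `top_losses` and `title` are hypothetical helpers.

```python
import math

def top_losses(probs, targets, k=3):
    """probs: per-sample lists of class probabilities; targets: true class indices."""
    rows = []
    for i, (p, y) in enumerate(zip(probs, targets)):
        pred = max(range(len(p)), key=p.__getitem__)   # most likely class
        loss = -math.log(max(p[y], 1e-12))             # cross-entropy for one sample
        rows.append((loss, i, pred, y, p[pred]))
    rows.sort(reverse=True)                            # largest loss first
    return rows[:k]

def title(row, classes):
    loss, _, pred, actual, prob = row
    # prediction / actual / loss / probability of the prediction
    return f"{classes[pred]} / {classes[actual]} / {loss:.2f} / {prob:.2f}"

classes = ["cat", "dog"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
targets = [0, 0, 1]
worst = top_losses(probs, targets, k=2)
worst_title = title(worst[0], classes)
```

Here the worst sample is the second one: the model says "dog" with probability 0.8 while the label is "cat", which is exactly the kind of confident mistake the extra number makes visible.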


(Marc Rostock) #416

First real namespace conflict in fastai notebooks?!

I just had a case where the from fastai import * strategy really caused problems.
When running the lesson3-planets notebook (but I guess this would happen in many other notebooks too):
Suddenly data.classes did not exist anymore and gave me errors, as did many other data.xxx attributes that I tried. After some trial and error, it turned out that rerunning the top cell (with the imports) after adding an additional import was the cause:

data = xxxx is the name of the databunch we create using the new (awesome!!) block api.

after running imports again:
data is now the fastai.vision.data module, and of course there is no data.classes anymore…

So this is a clear naming/namespace conflict.
Maybe (@sgugger) the variables in the notebooks could be renamed to databunch = xxx so this will not happen to others?

If it helps I can do that and do a PR…
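
The clash can be reproduced without fastai. The sketch below builds a throwaway package named `toypkg` (hypothetical) that exports a submodule called `data`, then runs a star import twice around a user assignment, the way a re-run import cell would.

```python
import sys
import types

# Build a throwaway package "toypkg" with a submodule "toypkg.data".
pkg = types.ModuleType("toypkg")
sub = types.ModuleType("toypkg.data")
pkg.data = sub                      # submodule exposed as an attribute
pkg.__all__ = ["data"]              # ... and exported by `import *`
sys.modules["toypkg"] = pkg
sys.modules["toypkg.data"] = sub

ns = {}                                         # stands in for the notebook globals
exec("from toypkg import *", ns)                # first run of the import cell
ns["data"] = {"classes": ["clear", "cloudy"]}   # user creates their own `data`
before = type(ns["data"]).__name__

exec("from toypkg import *", ns)                # the import cell is re-run
after = type(ns["data"]).__name__               # `data` is the module again
```

Renaming the notebook variable avoids the symptom, while not exporting the submodule from the package removes the cause.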


(Marc Rostock) #417

And explicitly a big thank you :pray: to everyone involved in creating the new data blocks API!:+1:
I think it is awesome and will make things much easier, flexible, adaptable and explicit! :clap:


(Ignacio Oguiza) #418

autoLRfinder proposal

Edit: I’m not sure if this is the right forum to post this!

Hi all,

I’ve taken a stab at Jeremy’s idea this morning about autoLRfinder.

The output of LRfinder is a univariate time series, and as such can easily be transformed into an image, in a similar way to what I showed in a ‘Show your work’ post. The outcome of that experiment, using transfer learning on a time series classification problem with only 30 training samples, was really good: close to state of the art, with only standard settings.

What I’ve done is the following:

  1. Created a synthetic dataset roughly mimicking lr_finder outputs (time series)
  2. Estimated a ‘recommended lr’ (this is a very, very bad approximation). I’ve just used it to show the proposed methodology.
  3. Padded time series data as they don’t all have the same length.
  4. Transformed the time series into images (this is based on a transformer that creates a rich representation of a univariate time series - paper).

  5. Created a regression problem where the target is the log of the lr.
  6. Train a model in the standard way (so far I have not been able to finish this, as for some reason I keep getting an error. I’m still working on this)
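
Two of the steps above, padding the curves to a common length and using the log of the lr as the regression target, can be sketched in a few lines. The helper names, right-padding with zeros and the use of the natural log are all assumptions; the gist may do this differently.

```python
import math

def pad_series(series, length, pad_value=0.0):
    # Right-pad (or truncate) a loss curve to a fixed length.
    series = list(series[:length])
    return series + [pad_value] * (length - len(series))

def lr_to_target(lr):
    # Regress on log(lr) so that 1e-4 and 1e-2 are on a comparable scale.
    return math.log(lr)

def target_to_lr(target):
    # Invert the transform to turn a model prediction back into an lr.
    return math.exp(target)

curves = [[2.3, 2.1, 1.7], [2.4, 2.0, 1.5, 1.2, 3.9]]
padded = [pad_series(c, 5) for c in curves]
```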

The way I envision this working would be:

  1. Run the lr_finder with an option to recommend a learning rate
  2. Lr_finder would be run, and the model would predict a recommended lr when finished

I’m not too familiar with fastai library yet, and would need some help to get this working.

So if anybody finds this interesting and would like to collaborate, this is what would be required to complete the project:

  • Create a dataset with lr_finder curve values and ground truths.
  • Train the regression model
  • If the result is good, integrate it into the fastai code

I’ve created a gist with the work I’ve done so far.

I’ll be happy to hear:

  1. If this is of interest to you
  2. Feedback, ideas, recommendations on how to get what’s still missing

Thanks!


(Jeremy Howard (Admin)) #419

This is the key step. Getting people to contribute their LR finder results, along with what they found to be the best learning rate, would make it fairly easy (I’d guess) to automate the LR finding…


(Jeremy Howard (Admin)) #420

Released 1.0.22, which fixes learn.predict and also avoids importing submodules directly into the namespace.


(Jeremy Howard (Admin)) #421

That was a bug - the submodules weren’t meant to be imported directly. Fixed now. data won’t be clobbered. My fix is really ugly, so if any python experts know how to make our __init__.py files less awful, please let me know :slight_smile:
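
One common way to keep submodules out of a star import is to build `__all__` so that it skips module objects. The sketch below shows the idea on a throwaway module named `toylib` (hypothetical); it is not necessarily the fix that went into fastai.

```python
import sys
import types

source = '''
import os, math, types       # modules needed internally
def helper(): return 42      # something worth exporting

# Export public names, but skip anything that is itself a module.
__all__ = [name for name, obj in list(globals().items())
           if not name.startswith("_") and not isinstance(obj, types.ModuleType)]
'''

# Create the module from the source above and register it for import.
mod = types.ModuleType("toylib")
exec(source, mod.__dict__)
sys.modules["toylib"] = mod

ns = {}
exec("from toylib import *", ns)   # what a notebook import cell would do
```

With this filter in place, `helper` reaches the caller's namespace while `os`, `math` and `types` stay behind.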


#422

Hi. Are you accepting PRs from non-core developers? I’ve been looking at the library for the past couple of days to find a way to integrate “observation weights” into the codebase. I think the change in code would be very minimal and would be completely confined to the fit() function and its dependencies, validate() and loss_batch(). The gist of the PR would be to allow yb to be a list where the last item in the list is a tensor representing the observation weights.
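
As a rough picture of the proposal, the sketch below uses plain Python lists in place of tensors. `weighted_loss_batch` is a hypothetical stand-in for a modified loss_batch(), and treating yb as either `[targets]` or `[targets, weights]` is an assumption about how the PR might be shaped.

```python
def squared_error(pred, target):
    return (pred - target) ** 2

def weighted_loss_batch(preds, yb, loss_func=squared_error):
    """yb is a list: [targets] or [targets, weights] (weights come last)."""
    targets = yb[0]
    # Without weights, every observation counts equally.
    weights = yb[-1] if len(yb) > 1 else [1.0] * len(targets)
    losses = [loss_func(p, t) for p, t in zip(preds, targets)]
    # Weighted mean of the per-observation losses.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)

preds = [1.0, 2.0]
targets = [0.0, 2.0]
plain = weighted_loss_batch(preds, [targets])
weighted = weighted_loss_batch(preds, [targets, [3.0, 1.0]])
```

Giving the first observation three times the weight moves the batch loss from 0.5 to 0.75, since that observation is the only one with a non-zero error.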