Nasnet.py changes

Hi all, wonderful resources here, thanks! Just 2 weeks into DL, and python, and 3.5 lessons into fastai. It’s been a fun learning experience.

For NASNet and Dog Breed the significant change I made was raising dropout from 1/2 to 2/3; I think that made the difference, getting below 0.152 log loss on Kaggle and 0.959 accuracy without even unfreezing layers, and 0.142 when ensembled. Happy to stop there. Now onto Plant Seedlings.
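(In fastai terms, raising the head dropout can be done with the ps argument when building the learner; a rough sketch of one way to do it, not the exact change in my nasnet.py PR. arch and data stand in for whatever architecture function and ImageClassifierData you already use:)

    from fastai.conv_learner import *

    # ps sets the dropout on the custom classifier head in fastai 0.7.
    learn = ConvLearner.pretrained(arch, data, precompute=True,
                                   ps=2/3)  # dropout raised from 1/2 to 2/3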

2 Likes

That’s awesome, can you give me an idea of how long it took to train? That was another issue for me, the training was sloooooow!

Hello,

Thank you for your helpful post. I tried with the dropout rate you suggested.

I realised the main difference between my notebook and Jeremy’s nasnet.ipynb was the allocation of a validation set. It looks like nasnet.ipynb did not allocate any validation set, and the default is

val_idxs=None

So I tried nasnet on the dog breed images with no validation set (I think the default is a single image when no specific val_idxs is given) and the accuracy went up to 95.6% with a training log loss around 0.16.

3 Likes

I’m very new to this so forgive me if these are naive speculations rather than true insight.

My guess is a higher dropout might work because, from my reading, NASNet looks like a very wide architecture at each layer.

Also, I wouldn’t say one should drop the val_idxs to zero at the outset. I don’t. My guess is that the reason this works for you at the outset is that NASNet already works well ‘out of the box’, so it doesn’t need much tuning against a validation set. Folding what would be the validation set into training, which means not having one, therefore doesn’t have a detrimental effect. This isn’t surprising because the dog images are very, very similar to ImageNet. EDIT: it looks like if you don’t include a validation set on a CSV load, fastai just calculates one for you.

For this reason, I don’t think the model needs long to train at all. It’s about getting those hyper-parameters right; once they are set, it doesn’t improve much from further training. NASNet is slow, but we don’t need it to do much. I did run 45 min of training after precompute = False, but it didn’t do very much, and 10 min of training might have done just as well.

Final guess: I think Jeremy’s nasnet notebook doesn’t contain any validation set because it’s meant as a demonstration of a new architecture, not a solution to a particular problem.

One thing I wasn’t able to get to work correctly is unfreezing. unfreeze() wasn’t successful, nor was the freeze_to() already in fastai, nor the new freeze_to() in Jeremy’s nasnet.ipynb. I see he uses ‘17’ in his notebook; from my reading it looks like the model has 17 layers plus a final fc layer, so I’m not sure what effect, if any, freezing to layer 17 would have, since that seems to be the state it was already in. Any info about how to unfreeze, and to what layer, would be useful for further use cases. Maybe this is covered in future lessons I’ve not watched.
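(For reference, the calls I was attempting look roughly like this, fastai-0.7 style; learn is the ConvLearner from earlier, and the layer-group index 17 is just the value from Jeremy’s notebook:)

    learn.precompute = False  # precomputed activations can't be used once conv layers are retrained
    learn.unfreeze()          # make all layer groups trainable
    # or keep the earlier groups frozen and retrain from group 17 onward:
    learn.freeze_to(17)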

Glad the hint helped!

Re no val_idx: actually, looking at dataset.py (I’ve only just started with Python), it looks like for a CSV load, if you don’t pass a value, it calculates a default validation set for you, which the early lessons did outside the function; and for a PATH load, like Jeremy’s nasnet.ipynb, the validation images are in their own folder anyway. So in your case with Dog Breed it just did it for you. Try passing in [0] for val_idxs and see if it improves on Kaggle (I don’t think a local log loss calculation is meaningful any longer with an essentially empty validation set).
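Something like this; a sketch assuming the usual Dog Breed layout from the lesson, where PATH, arch and sz are placeholders for whatever you already use:

    from fastai.conv_learner import *

    # val_idxs=[0] keeps only image 0 as "validation", so nearly all data is used for training.
    tfms = tfms_from_model(arch, sz)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv',
                                        tfms=tfms, val_idxs=[0], suffix='.jpg',
                                        test_name='test', bs=64)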

2 Likes

Hello,

You are correct and apologies for the incorrect comment about the default in dataset.py. Thank you for looking into this.

Looking at ‘from_csv’ from dataset.py (below), it pulls through 20% of the images if val_idxs has not been specified. So the default validation set will be 20% of the images, not 1 image.

    def from_csv(cls, path, folder, csv_fname, bs=64, tfms=(None,None),
               val_idxs=None, suffix='', test_name=None, continuous=False, skip_header=True, num_workers=8):
        """ Read in images and their labels given as a CSV file.

        This method should be used when training image labels are given in an CSV file as opposed to
        sub-directories with label names.

        Arguments:
            path: a root path of the data (used for storing trained models, precomputed values, etc)
            folder: a name of the folder in which training images are contained.
            csv_fname: a name of the CSV file which contains target labels.
            bs: batch size
            tfms: transformations (for data augmentations). e.g. output of `tfms_from_model`
            val_idxs: index of images to be used for validation. e.g. output of `get_cv_idxs`.
                If None, default arguments to get_cv_idxs are used.
            suffix: suffix to add to image names in CSV file (sometimes CSV only contains the file name without file
                    extension e.g. '.jpg' - in which case, you can set suffix as '.jpg')
            test_name: a name of the folder which contains test images.
            continuous: TODO
            skip_header: skip the first row of the CSV file.
            num_workers: number of workers

        Returns:
            ImageClassifierData
        """
        fnames,y,classes = csv_source(folder, csv_fname, skip_header, suffix, continuous=continuous)

        val_idxs = get_cv_idxs(len(fnames)) if val_idxs is None else val_idxs
        ((val_fnames,trn_fnames),(val_y,trn_y)) = split_by_idx(val_idxs, np.array(fnames), y)

        test_fnames = read_dir(path, test_name) if test_name else None
        if continuous:
            f = FilesIndexArrayRegressionDataset
        else:
            f = FilesIndexArrayDataset if len(trn_y.shape)==1 else FilesNhotArrayDataset
        datasets = cls.get_ds(f, (trn_fnames,trn_y), (val_fnames,val_y), tfms,
                               path=path, test=test_fnames)
        return cls(path, datasets, bs, num_workers, classes=classes)
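As a rough check of that 20% default, here is a sketch of roughly what get_cv_idxs does when val_idxs is None (assuming its usual val_pct=0.2 default):

    import numpy as np

    # Roughly what get_cv_idxs(n) does by default: hold out a random 20% of the indices.
    def default_val_idxs(n, val_pct=0.2, seed=42):
        np.random.seed(seed)
        return np.random.permutation(n)[:int(val_pct * n)]

    print(len(default_val_idxs(10000)))  # -> 2000, i.e. 20% of a 10,000-image training set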

In my case, the log loss figure on Kaggle improved when I trained on the whole data set minus one image (val_idxs = [0]).

About unfreezing for the Dog Breed classification data: apparently it is best to use the pre-trained model as-is, without unfreezing, for this particular case.

1 Like

Thanks for the PR to nasnet.py! I’ve merged that now. If there are any improvements suggested to the notebook, I’ll happily take PRs there too :slight_smile:

Hi all,

I just spun up my instance on Paperspace for lesson 1 and I’m getting an import error when running the imports in lesson1.ipynb. The error is a SyntaxError: ‘return’ outside function, and the traceback points to line 620 in nasnet.py. My guess is that it was broken by the recent merge, but I’m not sure. I’m attaching the error below.

1 Like

@jeremy

Hi Jeremy, ‘return model’ ended up outside of def nasnetalarge. Sorry, but could you please change this? Thank you in advance.
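For anyone curious, the problem is just Python indentation: a return at module level is a syntax error the moment the file is imported. A minimal illustration (not the actual nasnet.py code):

    # Correct: the return is indented inside the function body.
    def build_model():
        model = {"name": "nasnetalarge"}  # stand-in for the real model construction
        return model

    # The broken version had the return dedented to module level, e.g.
    #     def build_model():
    #         model = ...
    #     return model
    # which raises: SyntaxError: 'return' outside function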

Hi all, I just started coding up lesson 1 on AWS and when I was importing the fastai libraries, it gave the same error ‘return outside function’.

Please let me know if you figure out a way to fix this.

Hi Sumo, is this your error?

Yes. That is the same error I get. Thanks for following up @teidenzero

@sumo I wish I could help, but I’m getting the same error, together with @lukeharries. I haven’t found any trace of the issue anywhere else, and it looks like it’s a consequence of some recent change, so hopefully it will be addressed soon.

1 Like

Hey Guys,

YJP posted a fix to this issue yesterday. Go to your fastai folder and do “git pull” from the terminal.

You may run into a merge conflict with some notebook files that have also been updated. This will abort the pull.

If this happens, you may want to commit your changes locally and then merge. If you are not that familiar with Git, the most pain-free way of keeping up with code changes is to make a duplicate of any notebooks you are working on so that you don’t have to deal with merge issues.

2 Likes

trying immediately, thanks

git pull fixes it now! @teidenzero It must have been a recent merge, as it didn’t fix it previously.

Is it worth setting up some CI/CD with CircleCI to make sure there are no breaking changes merged in future?

1 Like

@teidenzero I’ve created a feature request for CI/CD with testing to be setup: https://github.com/fastai/fastai/issues/152

Give it a thumbs up to try and prevent this happening again

1 Like

Hello,

First of all, I am truly sorry about the error as this stems from the pull request I previously made.

To fix this, Jeremy has updated the fastai library, so you will have to update your copy again.
To update it, as davecazz suggested, run the following command in your ~/fastai directory:

git pull

If you get a merge conflict message like the following:

[screenshot of the merge conflict message]

then you can commit your changes locally by following the instructions below and run “git pull” again:

https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/#platform-linux

Thank you for your patience.

1 Like

No worries at all @YJP. Happy to help set up the CI/CD if needed.

This is based on the seedling data. Unfreezing the layers leads to a substantial loss of accuracy, and I saw the post earlier about how not unfreezing may be beneficial on data that is similar to ImageNet. However, for the seedling challenge, are there any thoughts on why the numbers are skewed the other way?

Prior to unfreeze:

epoch      trn_loss   val_loss   accuracy                  
    0      0.543078   0.54388    0.913213  
    1      0.484357   0.424838   0.936952                  

[0.4248376, 0.9369517594575882]

After unfreeze

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 6.91 µs
Epoch
100% 2/2 [10:52:34<00:00, 19577.06s/it]
epoch      trn_loss   val_loss   accuracy                       
    0      6.47271    480.077972 0.117191  
    1      6.442844   178.671402 0.10587                        

[178.6714, 0.10587002224517318]
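(For reference, “unfreezing” here means the standard fastai-0.7 sequence, roughly as below; the learning rates are illustrative, not necessarily the ones used for the run above:)

    lr = 1e-2                         # illustrative learning rate for the head
    lrs = np.array([lr/9, lr/3, lr])  # smaller rates for earlier layer groups
    learn.unfreeze()
    learn.fit(lrs, 2, cycle_len=1)    # two cycles with differential learning rates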