Yes @MadeUpMasters that would be very helpful - we actually have a TODO in the code for just this!
Hey guys,
quick question: what is the correct way in the new fastai to put a model on multiple GPUs?
when I use
learn.model = nn.DataParallel(learn.model)
I get the following error
I don't know if this helps, but the code for distributed GPUs in fastai v1 is over here.
For distributed training, you can wrap the model with DistributedDataParallel, and you have to change the sampler of the DataLoader to a DistributedSampler.
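As a rough sketch of those two steps in plain PyTorch (this runs as a single process with world_size=1 and the gloo backend purely for illustration; real multi-GPU training launches one process per GPU, usually via a launcher):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Single-process initialisation just for illustration (world_size=1).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Step 1: wrap the model with DistributedDataParallel.
model = torch.nn.Linear(4, 2)
ddp_model = DistributedDataParallel(model)

# Step 2: give the DataLoader a DistributedSampler so each process
# sees its own shard of the data.
ds = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
sampler = DistributedSampler(ds, num_replicas=1, rank=0)
dl = DataLoader(ds, batch_size=4, sampler=sampler)

for xb, yb in dl:
    out = ddp_model(xb)

dist.destroy_process_group()
```

With more than one process you would also call `sampler.set_epoch(epoch)` each epoch so the shuffling differs between epochs.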
So I tried to PR test_warns
but I'm not sure how to do it cleanly, as my fork is full of audio stuff.
My best guess was to create a new branch, force-pull the latest version of fastai_dev master, remove my audio notebooks, commit, write the new test_warns
function and a few tests for it, and commit (I'm at this point now). Do I just PR that then? Is there a better way? Thanks.
Edit: Never mind, @baz helped me through it. I was making the mistake of merging my audio feature branches into master instead of into a secondary audio branch. I deleted master, got a clean version, and set up my audio work to be based off a new audio_development branch. In case anyone else is facing a similar issue, my workflow is:
- branch off of audio_development for new features, merge them to audio_development when they are complete. Never merge to master.
- if I need to PR to core, update master, branch off, implement feature and PR from there.
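The two rules above might look like this as git commands (a sketch in a throwaway repo; the branch names come from the post, the feature branch name and commits are just placeholders):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master            # -b needs git >= 2.28
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial"
git branch audio_development

# New audio feature: branch off audio_development, merge back when done.
# Never merge the feature into master.
git checkout -q audio_development
git checkout -q -b new_audio_feature
git commit -q --allow-empty -m "feature work"
git checkout -q audio_development
git merge -q new_audio_feature

# PR to core: update master, branch off, implement, PR from that branch.
git checkout -q master
git checkout -q -b core_fix
git commit -q --allow-empty -m "core fix for PR"
```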
Edit 2: PR submitted! Let me know if there are any issues with it. I'm actually forked off a fork, so hopefully that doesn't cause problems.
My PR failed checks, but the error message isn't clear. Any idea where I went wrong?
It's just that you hadn't run the export cell at the bottom of the nb, so the notebooks and scripts didn't match up. It's not a big deal. I've pulled it and will run the cell and push back.
@tank13 and anyone else having issues with Google Colab and the Image import issue: you need the newest version of Pillow (PIL) to get it working.
!pip install Pillow --upgrade
It works, thank you very much!
Hi,
As far as I can tell as of this weekend, suggestion=True is not in the plot anymore…
And there is no plot function; in v2 you need to call learn.recorder.plot_lr_find() or something similar.
In addition, the callbacks have changed: SaveModelCallback and EarlyStoppingCallback have a few minor changes. You no longer have to pass the learn object in, and be aware that some other parameters changed too. I highly suggest peeking at the source code before calling them.
I've been able to get it to plot with just learn.lr_find().
That is intentional.
It does by default, unless you set show_plot=False in lr_find; then you have to call learn.recorder.plot_lr_find to show the graph, I believe.
I was trying to find the suggestion option from v1, which tells you the optimal lr.
Ah yes, it seems that is not in there yet. You are right.
Has anyone else had fit print the raw L instead of the pretty style we're used to? E.g.:
(#5) [0,0.3795085847377777,0.3632674515247345,0.8341862559318542,00:40]
(#5) [1,0.40276384353637695,0.3697092533111572,0.8348686695098877,00:40]
This is happening on Colab, unsure what to do about this.
When can we expect support for distributed training? I was hoping to experiment with @jeremy's RSNA kernels on multiple GPUs in GCP, but it seems distributed training is not yet implemented.
I assume the answer is not soon?
Don't do that, please.
I was just trying to say that I assume distributed training is not something that will be developed soon (within the next couple of weeks) for fastai v2, as your team has other things to focus on. Therefore, I will plan my projects accordingly. I didn't mean to offend anyone.
The main issue is that you asked your question twice. Within a 24-hour period. If your question isn't answered, please don't ask it again. Especially so soon. And there's no point making assumptions about our priorities, because if we have priorities to share we'll share them, and if we don't and haven't then there's no point making up guesses.
As it happens, distributed was next on our list, and Sylvain finished it today (although it's not well tested yet).
I noticed that the transformer and transformer-XL archs are commented out at https://github.com/fastai/fastai_dev/blob/master/dev/local/text/models/core.py#L13.
Is this because the code for the transformers hasnāt been ported yet? Would you like someone to pick up the work, or is someone already working on it (as far as you know)?
EDIT: I also noticed there's no way to use pretrained word embeddings. Should that be a concern?
Sorry about that. I thought it had been more than 24 hours.
Thanks for letting us know that distributed training is almost ready, and thanks @sgugger for porting over the code. Please let us know when it's ready for use!