Developer chat

No, I’m not doing that. I’m running this in a notebook, inside a virtual environment created for fastai; that’s it.

edit: solved, it was due to a proxy server :slight_smile:

I was looking at the code for transformers (fastai/fastai/text/models/transformer.py) and noticed that the code for _line_shift1 is commented out in this script, while it is still used on line 130:

BD = _line_shift1(torch.einsum('bind,jnd->bijn', (wq+v, wkr)))

I found no other occurrences of _line_shift1 in the repo. Is this intentional or is it a bug?

Hi @jeremy ,

Really appreciate the work that you and your team have put into building this amazing community!

As a designer, I would like to suggest a few new features that might make for a better UX for people who want to clean their dataset using the ImageCleaner.

  1. It would be much better if there were a progress bar that shows the user what percentage of the work they have already completed. Motivating the user and giving them dopamine feedback will really help them keep making progress.

  2. A nice UI design that makes the user want to engage further with the tool.

  3. I think having a Done button which immediately runs the ImageList to save the ImageCleaner’s state to the cleaned.csv file would reduce the risk of the user accidentally overwriting their progress.

I attached a screenshot of a design I created really quickly to demonstrate my point.


Here is an idea which I have already implemented for myself inside a notebook, but I wonder if it might have broader appeal:

A split_with_grouping(group_from_filepath_re, pct, seed) function that would allow you to split a dataset randomly by percentage, like split_by_rand_pct, but without splitting up groups as defined by a regex.

For example, I’m using the 50States10k dataset of US Google StreetView images (https://arxiv.org/pdf/1810.03077.pdf, smaller dataset here). This has a folder for each state, with files for each cardinal direction at each of many randomly selected points, labeled by some kind of hash, so for instance:

  • 50States10k/Alabama/2007_-NPWPMrYipeYcLsiZqKRyw_0.jpg
  • 50States10k/Alabama/2007_-NPWPMrYipeYcLsiZqKRyw_180.jpg
  • 50States10k/Alabama/2009_3BS7oprV5tjwg-M4dA1nLA_270.jpg
  • 50States10k/South Dakota/2011_iloPUAZx7Vw59X-qJB2OQw_90.jpg

Now, if you simply use split_by_rand_pct, you will wind up with an unfair validation set: for most validation images, you will have trained on images from the other cardinal directions of the same point. Instead, you want to validate with street views from locations the model has never seen at all.

You could make a csv file and split the images manually but that sounds like a major pain.

So instead, why not have a function that takes a regex which can identify that, for instance, the first two examples above (and two others) are all part of the same group and need to be assigned collectively to either the training or the validation set?

(In this case, what I used was re.match(r'\d{4}_([\w-]+)_\d+', Path(x).stem).group(1), which for the example above spits out -NPWPMrYipeYcLsiZqKRyw.)

I could share more of my code, but it’s not particularly optimized and I first wanted to float the general idea.
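To make the proposal concrete, here is a minimal sketch of the core idea (the function name and exact behaviour are just illustrative, not anything that exists in fastai): group the file paths by the regex key, shuffle the groups, and move whole groups into the validation set until roughly pct of the items are there.

import random, re
from collections import defaultdict
from pathlib import Path

def split_with_grouping(items, group_key, pct=0.2, seed=None):
    "Return (train_idxs, valid_idxs); items sharing a group key never straddle the split."
    groups = defaultdict(list)
    for i, item in enumerate(items):
        groups[group_key(item)].append(i)
    keys = list(groups)
    random.Random(seed).shuffle(keys)
    valid_idxs = []
    while keys and len(valid_idxs) < int(len(items) * pct):
        valid_idxs += groups[keys.pop()]
    valid_set = set(valid_idxs)
    train_idxs = [i for i in range(len(items)) if i not in valid_set]
    return train_idxs, valid_idxs

# Group the StreetView files by the hash in the filename, as above
group_key = lambda x: re.match(r'\d{4}_([\w-]+)_\d+', Path(x).stem).group(1)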

I have a weird error which I believe is a problem with the new library (mainly because there is very little input from my side).

I have just enrolled in the Kaggle competition as practice for course 1 (from 2020). Although I eventually want to try a more complicated model, I wanted to start simple, so I used the simplest fastai NLP classifier. For that I have written:

dls = TextDataLoaders.from_csv(path=path,
                               csv_fname='train.csv',
                               text_col='text',
                               label_col='target',
                               valid_pct=0.2)

This loads just fine. Then, replicating the NLP model:

learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=F1Score)
learn.model_dir = '/kaggle/working/'
learn.fine_tune(2, 1e-2)

However, when it tries to calculate the F1Score, it raises an error:

TypeError: unsupported operand type(s) for *: 'AccumMetric' and 'int'

Does anyone have any intuition about why this happens and how one would go about solving it?
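For anyone else hitting this: if I’m reading the traceback right, the likely cause is that F1Score in fastai v2 is a factory that returns an AccumMetric, so it needs to be called; passing the bare name makes fastai wrap the function as an averaging metric and then multiply its return value (an AccumMetric) by the batch size, which produces exactly this TypeError. A probable fix, assuming the rest of the code above stays the same:

# Instantiate the metric: F1Score() returns an AccumMetric that fastai can accumulate
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=F1Score())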

Responded here

I am trying to run a script that has the line

md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)

On running it, I get an error:
“NameError: name ‘LanguageModelData’ is not defined”

Looking at previous issues, I discovered that this class was available only up to v0.7 and has been removed from v1.0.x onwards. Is there a workaround to run code that uses it, or an equivalent function that does the job of LanguageModelData?

I am using it in an implementation of ULMFiT, in the following code snippet:
trn_dl = LanguageModelLoader(np.concatenate(trn_lm), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)

where trn_lm and val_lm are numpy arrays.
I am new to this forum and would appreciate any help or insights. Thank you!

Hi @shreyagupta08, welcome to the fastai community

This is deprecated and no longer available. What you are looking for is language_model_learner.

Tutorial
https://docs.fast.ai/tutorial.text#Fine-tuning-a-language-model-on-IMDb
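A rough sketch of the v2 workflow that replaces LanguageModelLoader/LanguageModelData (the path, batch size and hyperparameters below are placeholders; the tutorial above has the full IMDb example):

from fastai.text.all import *

# Language-model DataLoaders built straight from a folder of texts
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1, bs=64)

# Fine-tune the AWD_LSTM language model, then save its encoder for the classifier stage
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()])
learn.fit_one_cycle(1, 2e-2)
learn.save_encoder('finetuned_encoder')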


Hello. I created dataloaders that correctly return images with their classes, but when training the network, an error appears after the first epoch:

line 240, in encodes
def encodes(self, o): return TensorCategory(self.vocab.o2i[o])
KeyError:

Can anyone help solve this?

Update: problem solved. You should explicitly pass a list of classes to CategoryBlock.
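For reference, a minimal sketch of what that looks like (the class list and the rest of the DataBlock here are placeholders, not the original code):

from fastai.vision.all import *

dblock = DataBlock(
    # Passing vocab explicitly means labels that only appear in the validation
    # split still map to a known index, avoiding the KeyError in Categorize
    blocks=(ImageBlock, CategoryBlock(vocab=['cat', 'dog', 'horse'])),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSplitter(valid_pct=0.2, seed=42))
dls = dblock.dataloaders(path)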

Hi, just wanted to follow up about this idea. I suspect it got lost amidst the upgrade to v2.

To generalize a bit (and update for v2), you could have a splitter function in the new DataBlock API (GroupPreservingSplitter? SegregatedSplitter?) which takes a function (item -> group identifier) and a percentage, and splits into training and validation sets without splitting up groups (as identified by the function).

Edit: I went looking for an implementation of the underlying algorithm (to avoid reinventing the wheel), and this is the only one I could find. It’s definitely more polished than what I had written for myself but not substantially different.

Should I just write up the code and submit a pull request?
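To make that concrete, a thin sketch of the splitter (GroupPreservingSplitter is just my placeholder name; it reuses the split_with_grouping helper sketched in my earlier post and returns the two lists of indices the DataBlock API expects from a splitter):

def GroupPreservingSplitter(group_key, valid_pct=0.2, seed=None):
    "Like RandomSplitter, but items sharing a group key always end up in the same split."
    def _inner(items):
        return split_with_grouping(items, group_key, pct=valid_pct, seed=seed)
    return _inner

# dblock = DataBlock(..., splitter=GroupPreservingSplitter(group_key, valid_pct=0.2, seed=42))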

Is there any way to use the callback functionality to save a model after x epochs and resume training from x + 1 epochs in fastai version 0.7?

Essentially I want to be able to run the following piece of code in v0.7:

Any help/insights greatly appreciated! Thank you.

Hey @jeremy, how can I create a post in the forum? I don’t see a button for it.

@monajalal you need to read at least 4 topics over at least 15 minutes before you can post.


Thanks a lot for your response.

I have been wanting to use stochastic weight averaging (SWA) for some fastai projects, but I had not seen an implementation for v2. So I implemented a callback using what is now part of PyTorch 1.6 (i.e. from torch.optim.swa_utils import AveragedModel), which actually makes it very quick to write (see the code below). I wanted to check what you think is the most appropriate way to make it available to others.

That is, does it make sense to submit a PR to add it as a callback to the fastai package? (If so, I’d probably want some feedback on whether I messed anything up, the best stylistic way to do it, whether there is a preference for importing from PyTorch or not, whether the Learner’s model should be replaced at the end of training, etc.) Or is it a little too esoteric to add, and you’d prefer that I just do a blog post on how to do SWA?

from torch.optim.swa_utils import AveragedModel

class SWA(Callback):
    def __init__(self, model, swa_start=0, device=torch.device('cpu')):
        self.swa_start = swa_start
        self.swa_model = AveragedModel(model)
        self.device = device

    def after_epoch(self):
        # Update the running average of the weights from swa_start onwards
        if self.epoch >= self.swa_start:
            self.swa_model.update_parameters(self.model)
            if self.epoch == self.n_epoch - 1:
                # On the last epoch, recompute batchnorm statistics for the
                # averaged model and swap it in as the Learner's model
                torch.optim.swa_utils.update_bn(loader=self.dl,
                                                model=self.swa_model,
                                                device=self.device)
                self.learn.model = self.swa_model

@deepgander I’d be happy to take a PR. It would need to be in a notebook with an explanation of what it does and how to use it, and some tests that it’s working correctly. You might want to add to the top of your class:

run_valid=False

Include checks in the notebook that inference works correctly. Also think about whether you could use after_fit for the bn update.

Have a look at NativeMixedPrecision to see how to_native_fp16 and to_native_fp32 make it easy to turn the behavior on and off. You might want something similar.

I’m not sure why you’re passing device. Is that not something you can get from the model or learner? If you do need to pass it, should you be passing the default device instead of cpu?

(If you want to discuss these issues more in real time, feel free to join the Discord chat.)
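A hedged sketch of what such a convenience pair might look like, mirroring the to_native_fp16/to_native_fp32 pattern (to_swa/from_swa are hypothetical names, and SWA is the callback drafted above):

from fastcore.all import patch

@patch
def to_swa(self: Learner, swa_start=0, **kwargs):
    "Attach the SWA callback sketched above to this Learner."
    self.add_cb(SWA(self.model, swa_start=swa_start, **kwargs))
    return self

@patch
def from_swa(self: Learner):
    "Detach any SWA callbacks again."
    self.remove_cbs([cb for cb in self.cbs if isinstance(cb, SWA)])
    return self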


I have been asking the same questions about how to correctly implement SWA over the last few days. If you have any updates, please contact me and I will be happy to help!

Would this be a better alternative?

from torch.optim.swa_utils import AveragedModel
class SWA(Callback):
    def __init__(self, model, swa_start=0, cycle_len=1):    
        store_attr()
        self.swa_model = AveragedModel(model)

    def after_epoch(self):        
        if self.training and (self.epoch >= self.swa_start) and (self.epoch%self.cycle_len==0):        
            self.swa_model.update_parameters(self.model)
            
    def after_fit(self):
        torch.optim.swa_utils.update_bn(loader=self.dl, model=self.swa_model, device=self.dls.device)
        self.learn.model = self.swa_model
    
    def before_validate(self):
        self.old_model = self.learn.model
        self.learn.model = self.swa_model
    
    def after_validate(self):
        self.learn.model = self.old_model         

I didn’t set run_valid=False because I am using swa_model to validate. I thought it would be better to record metrics based on predictions from swa_model, since it will be the final model. Also, if I understand the paper correctly, using cyclical learning rates leads to wider optima, which is better. So I added a cycle_len parameter in case people want to use cyclical LR schedules as opposed to a constant LR. I will run some tests shortly to see how they compare.


@kcturgutlu it looks like you’re not using SWA during training? If so, is that intentional? I was under the impression that SWA is meant to be part of the training, although I do see that doesn’t seem to be shown in the snippet you posted from the paper.

That’s quite elegant (and gets the validation scores shown!). One issue with it, though, I think: it looks like the initial (random) model is part of the average, which to me does not make sense. Or is that not what happens? I’m also thinking that it makes sense to show the validation scores of the standard model until the first SWA update happens.

I also think it makes sense to have the if (self.epoch >= self.swa_start): check in after_fit, because we may wish to do fine-tuning (see rationale in the comment below).

Combining with my latest draft:

from torch.optim.swa_utils import AveragedModel
class SWA(Callback):
    "Implementation of Stochastic Weight Averaging based on https://arxiv.org/abs/1803.05407"
    def __init__(self, swa_start=1, cycle_len=1):
        self.swa_start, self.cycle_len = swa_start, cycle_len 

    def after_epoch(self):
        if self.training and (self.epoch == self.swa_start): self.swa_model = AveragedModel(self.learn.model)
        elif self.training and (self.epoch > self.swa_start) and (self.epoch%self.cycle_len==0): self.swa_model.update_parameters(self.model)
            
    def after_fit(self):
        # The if statement below is for when we use learn.fine_tune, which also leads to after_fit (assumption: SWA is only intended in 2nd stage of fine-tuning)
        if (self.epoch >= self.swa_start):
            # Need to update bn statistics for the swa_model at the end of fitting
            torch.optim.swa_utils.update_bn(loader=self.dl, model=self.swa_model, device=self.dls.device)
            self.learn.model = self.swa_model

    def before_validate(self):
        if self.epoch >= self.swa_start:
            self.old_model = self.learn.model
            self.learn.model = self.swa_model
    
    def after_validate(self):
        if self.epoch >= self.swa_start:
            self.learn.model = self.old_model

Regarding using SWA during training: I believe standard SWA really does not involve the averaged SWA model in training at all; it just keeps averaging the models from the normal training process at the end of each epoch.
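For completeness, a hedged usage sketch of the callback above (the learner setup below is just a placeholder; any fastai v2 Learner should work the same way):

from fastai.vision.all import *

learn = cnn_learner(dls, resnet34, metrics=accuracy)

# Train at a constant LR; weights are averaged from epoch 5 onwards and the
# averaged model replaces learn.model in after_fit
learn.fit(10, 1e-3, cbs=SWA(swa_start=5))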
