No, I’m not doing that. I’m running this in a notebook inside a virtual environment created for fastai; that’s it.
Edit: solved; it was due to a proxy server.
I was looking at the code for transformers (fastai/fastai/text/models/transformer.py) and noticed that the code for _line_shift1 is commented out in this script, while it is still used in line 130:
BD = _line_shift1(torch.einsum('bind,jnd->bijn', (wq+v, wkr)))
I found no other occurrences of _line_shift1 in the repo. Is this intentional, or is it a bug?
Hi @jeremy ,
Really appreciate the work that you and your team have put into building this amazing community!
As a designer, I would like to suggest a few new features that might make for a better UX for people who want to clean their dataset using the ImageCleaner.
It would be much better if there were a progress bar that tells the user what percentage of the work they have already completed. Motivating the user and giving them dopamine feedback will be very helpful for keeping them making progress.
A nice UI design that makes the user want to engage further with the tool.
I think having a Done button that immediately runs the ImageList to save the ImageCleaner’s state to the cleaned.csv file would alleviate the risk of the user accidentally overwriting their progress.
I attached a screenshot of a design I created really quickly to demonstrate my point.
Idea, which I already implemented for myself inside a notebook, but I wonder if it might have broader appeal:
A split_with_grouping(group_from_filepath_re, pct, seed) function that would allow you to split a dataset randomly by percentage, like split_by_rand_pct, but without splitting up groups as defined by a regex.
For example, I’m using the 50States10k dataset of US Google StreetView images (https://arxiv.org/pdf/1810.03077.pdf, smaller dataset here). This has folders for each state, and then files for each cardinal direction at each of many randomly selected points, labeled by some kind of hash, so for instance:
Now if you simply use split_by_rand_pct, you will wind up with an unfair validation set: for each validation image, in most cases you were training with images from other cardinal directions of the same point. You want instead to validate with street views from locations the model has never seen at all.
You could make a csv file and split the images manually but that sounds like a major pain.
So instead, why not have a function that takes a regex which can identify that, for instance, the top two examples (and two others) are all part of the same group and need to be collectively assigned to either the training or the validation set?
(in this case, what I used was:
re.match(r'\d{4}_([\w-]+)_\d+',Path(x).stem).group(1)
which for the example above spits out -NPWPMrYipeYcLsiZqKRyw
)
I could share more of my code, but it’s not particularly optimized and I first wanted to float the general idea.
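To make the idea concrete, here is a minimal plain-Python sketch (group_split and its exact signature are illustrative, not part of fastai), using a regex key like the one above:

```python
import random
import re
from pathlib import Path

def group_split(items, group_key, valid_pct=0.2, seed=42):
    """Randomly split item indices into train/valid without ever separating a group."""
    # Bucket item indices by group identifier
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(group_key(item), []).append(i)
    # Shuffle the groups (not the items), then take a percentage of groups as validation
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_valid = max(1, int(round(valid_pct * len(keys))))
    valid_keys = set(keys[:n_valid])
    train_idx = [i for k in keys if k not in valid_keys for i in groups[k]]
    valid_idx = [i for k in keys if k in valid_keys for i in groups[k]]
    return train_idx, valid_idx

# Hypothetical filenames in the '<digits>_<hash>_<direction>.jpg' shape described above
files = ['0123_-NPWPMrYipeYcLsiZqKRyw_0.jpg', '0123_-NPWPMrYipeYcLsiZqKRyw_90.jpg',
         '0456_otherHash123_0.jpg', '0456_otherHash123_180.jpg']
key = lambda f: re.match(r'\d{4}_([\w-]+)_\d+', Path(f).stem).group(1)
train, valid = group_split(files, key)
```

All four cardinal-direction shots of one point share a hash, so they land together in either the training or the validation indices.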
I have a weird error which I believe is a problem with the new library (basically because there is extremely little input from my side).
I have just enrolled in the Kaggle competition as practice for course 1 (the 2020 version). Although I eventually want to build a more complicated model, I wanted to start simple, so I tried using fast.ai’s simplest NLP classifier. For that I have written:
dls = TextDataLoaders.from_csv(path=path,
csv_fname='train.csv',
text_col='text',
label_col='target',
valid_pct=0.2)
This loads just fine. Then, replicating the NLP model:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=F1Score)
learn.model_dir = '/kaggle/working/'
learn.fine_tune(2, 1e-2)
However, when trying to calculate the F1Score, it hits an error:
TypeError: unsupported operand type(s) for *: 'AccumMetric' and 'int'
Does anyone have any intuition about why this happens and how one would go about solving it?
I am trying a script that has a line
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
On running it, I get an error:
NameError: name 'LanguageModelData' is not defined
On looking at previous issues, I discovered that this function was available only up to v0.7 and has been removed from v1.0.x onwards. Is there a workaround to run code that uses this function, or an equivalent function that does the job of LanguageModelData?
I am using this function for the implementation of ULMFit, in the following code snippet:
trn_dl = LanguageModelLoader(np.concatenate(trn_lm), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
where trn_lm and val_lm are numpy arrays.
I am new to this forum and would appreciate any help/insights. Thank you!
Hi @shreyagupta08, welcome to the fastai community
This is deprecated and no longer available. What you are looking for is language_model_learner.
Tutorial
https://docs.fast.ai/tutorial.text#Fine-tuning-a-language-model-on-IMDb
Hello. I created dataloaders that correctly return pictures with classes, but when training a neural network, after the first epoch an error appears:
line 240, in encodes
def encodes(self, o): return TensorCategory(self.vocab.o2i[o])
KeyError:
Can anyone help solve this?
Update: problem solved. You should explicitly pass a list of classes to CategoryBlock.
Hi, just wanted to follow up about this idea. I suspect it got lost amidst the upgrade to v2.
To generalize a bit (and update for v2), you could have a splitter function in the new DataBlock API (GroupPreservingSplitter? SegregatedSplitter?) which takes a function (item → group identifier) and a percentage, and splits into training and validation sets without splitting up groups (as identified by the function).
Edit: I went looking for an implementation of the underlying algorithm (to avoid reinventing the wheel), and this is the only one I could find. It’s definitely more polished than what I had written for myself but not substantially different.
Should I just write up the code and submit a pull request?
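Such a splitter might look like the following sketch, following the fastai v2 convention that a splitter (e.g. RandomSplitter) is a function returning a pair of index lists. GroupPreservingSplitter here is the name suggested above, not an existing fastai function:

```python
import random

def GroupPreservingSplitter(group_key, valid_pct=0.2, seed=42):
    """Like RandomSplitter, but items sharing a group identifier always land in the same split."""
    def _inner(items):
        # Bucket item indices by group, shuffle the groups, reserve a fraction for validation
        groups = {}
        for i, item in enumerate(items):
            groups.setdefault(group_key(item), []).append(i)
        keys = sorted(groups)
        random.Random(seed).shuffle(keys)
        cut = int(round(valid_pct * len(keys)))
        valid_keys = set(keys[:cut])
        train = [i for k in keys if k not in valid_keys for i in groups[k]]
        valid = [i for k in keys if k in valid_keys for i in groups[k]]
        return train, valid
    return _inner
```

It could then be passed as the splitter= argument of a DataBlock, just like RandomSplitter.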
Is there any way to use the callback functionality to save a model after x epochs and resume training from x + 1 epochs in fastai version 0.7?
Essentially I want to be able to run the following piece of code in v0.7:
Any help/insights greatly appreciated! Thank you.
Thanks a lot for your response.
I have been wanting to use stochastic weight averaging (SWA) for some projects with fastai, and I had not seen an implementation for v2. Thus, I implemented a callback importing what is now part of PyTorch version 1.6 (i.e., from torch.optim.swa_utils import AveragedModel). That actually makes it very quick to write (see the code below). I wanted to check what you think is the most appropriate way to make it available to others.
I.e., does it make sense to submit a PR to add it as a callback to the fastai package? (If so, I’d probably want some feedback: did I mess anything up, what is the best stylistic way to do it, is there a preference about importing from PyTorch or not, should the model of the Learner be replaced at the end of training, etc.) Or is it a little too esoteric to add, and you’d prefer if I just posted / did a blog post on how to do SWA?
class SWA(Callback):
    def __init__(self, model, swa_start=0, device=torch.device('cpu')):
        self.swa_start = swa_start
        self.swa_model = AveragedModel(model)
        self.device = device
    def after_epoch(self):
        if self.epoch >= self.swa_start:
            self.swa_model.update_parameters(self.model)
        if self.epoch == self.n_epoch - 1:
            torch.optim.swa_utils.update_bn(loader=self.dl,
                                            model=self.swa_model,
                                            device=self.device)
            self.learn.model = self.swa_model
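For context on what update_parameters does: AveragedModel with its default avg_fn keeps an equal-weight running mean of the weights, updating avg ← avg + (x − avg) / (n + 1) after the n-th model has been averaged. A tiny pure-Python sketch of that update rule (illustrative only, not fastai/PyTorch code):

```python
def running_mean(values):
    # Same update as AveragedModel's default avg_fn:
    # avg <- avg + (x - avg) / (num_averaged + 1)
    avg, n = None, 0
    for x in values:
        avg = x if avg is None else avg + (x - avg) / (n + 1)
        n += 1
    return avg
```

So after k epochs past swa_start, swa_model holds the plain mean of the k end-of-epoch weight snapshots.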
@deepgander I’d be happy to take a PR. It would need to be in a notebook with an explanation of what it does and how to use it, and some tests that it’s working correctly. You might want to add to the top of your class:
run_valid=False
Include in the notebook checks that inference works correctly. Also, think about whether you could use after_fit for the bn update.
Have a look at NativeMixedPrecision to see how to_native_fp16 and to_native_fp32 make it easy to turn the behavior on and off. You might want something similar.
I’m not sure why you’re passing device. Is that not something you can get from the model or learner? If you do need to pass it, should you be passing the default device instead of cpu?
(If you want to discuss these issues more in real time, feel free to join the Discord chat.)
I have been asking the same question about how to correctly implement SWA over the last few days.
If you have any updates, please contact me and I will be happy to help!
Would this be a better alternative?
from torch.optim.swa_utils import AveragedModel

class SWA(Callback):
    def __init__(self, model, swa_start=0, cycle_len=1):
        store_attr()
        self.swa_model = AveragedModel(model)
    def after_epoch(self):
        if self.training and (self.epoch >= self.swa_start) and (self.epoch % self.cycle_len == 0):
            self.swa_model.update_parameters(self.model)
    def after_fit(self):
        torch.optim.swa_utils.update_bn(loader=self.dl, model=self.swa_model, device=self.dls.device)
        self.learn.model = self.swa_model
    def before_validate(self):
        self.old_model = self.learn.model
        self.learn.model = self.swa_model
    def after_validate(self):
        self.learn.model = self.old_model
I didn’t set run_valid=False because I am using swa_model to validate. I thought it would be better to record metrics based on predictions from swa_model, since it will be the final model. Also, if I understand correctly from the paper, using cyclical learning rates leads to a wider optimum, hence better generalization. So I added a cycle_len parameter in case people want to use cyclical lr schedulers as opposed to a constant lr. Will run some tests shortly to see how they compare.
@kcturgutlu looks like you’re not using SWA during training? If so, is that intentional? I was under the impression that SWA is meant to be part of the training - although I do see that doesn’t seem to be shown in the snippet you showed from the paper.
That’s quite elegant (and gets the validation scores shown!). One issue with it though, I think: it looks like the initial (random) model is part of the average, which to me does not make sense. Or is that not what happens? I’m also thinking it makes sense to show the updating validation scores of the standard model until the first SWA update happens.
I also think it makes sense to have the if (self.epoch >= self.swa_start): check in after_fit, because we may wish to do fine-tuning (see the rationale in the comment below).
Combining with my latest draft:
from torch.optim.swa_utils import AveragedModel

class SWA(Callback):
    "Implementation of Stochastic Weight Averaging based on https://arxiv.org/abs/1803.05407"
    def __init__(self, swa_start=1, cycle_len=1):
        self.swa_start, self.cycle_len = swa_start, cycle_len
    def after_epoch(self):
        if self.training and (self.epoch == self.swa_start):
            self.swa_model = AveragedModel(self.learn.model)
        elif self.training and (self.epoch > self.swa_start) and (self.epoch % self.cycle_len == 0):
            self.swa_model.update_parameters(self.model)
    def after_fit(self):
        # The if statement below is for when we use learn.fine_tune, which also triggers after_fit
        # (assumption: SWA is only intended in the 2nd stage of fine-tuning)
        if self.epoch >= self.swa_start:
            # Need to update bn statistics for the swa_model at the end of fitting
            torch.optim.swa_utils.update_bn(loader=self.dl, model=self.swa_model, device=self.dls.device)
            self.learn.model = self.swa_model
    def before_validate(self):
        if self.epoch >= self.swa_start:
            self.old_model = self.learn.model
            self.learn.model = self.swa_model
    def after_validate(self):
        if self.epoch >= self.swa_start:
            self.learn.model = self.old_model
Regarding using SWA during training: I believe standard SWA really does not involve the averaged model in training at all; it just keeps averaging the models at the end of epochs of the normal training process.
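That decoupling can be illustrated with a toy scalar "training loop" (purely illustrative, not fastai code): the weight w follows ordinary optimizer updates, while the SWA average merely records end-of-epoch values and never feeds back into training:

```python
def train_with_swa(n_epochs, step, w0=0.0, swa_start=0):
    """Toy loop: w follows normal training; swa_w is just a running
    mean of end-of-epoch weights and never influences the updates."""
    w, swa_w, n = w0, None, 0
    for epoch in range(n_epochs):
        w = step(w, epoch)          # ordinary optimizer update, unaffected by SWA
        if epoch >= swa_start:      # SWA only observes the end-of-epoch weight
            swa_w = w if swa_w is None else swa_w + (w - swa_w) / (n + 1)
            n += 1
    return w, swa_w
```

With a step that adds 1 each epoch, three epochs give end-of-epoch weights 1, 2, 3: the trained weight finishes at 3 while the SWA average is their mean, 2.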