Lesson 4 official topic

I made one small change to Jeremy’s Kaggle NLP notebook from this:
model_nm = 'microsoft/deberta-v3-small'

To this:
model_nm = 'distilroberta-base'

The tokenizer for this model is different, but it still worked.
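For anyone curious, here is a quick way to see how the two tokenizers split the same text differently (a minimal sketch, assuming internet access to download both and the needed tokenizer backends installed):

    from transformers import AutoTokenizer

    text = "A platypus is an ornithorhynchus anatinus."
    for name in ('microsoft/deberta-v3-small', 'distilroberta-base'):
        tokz = AutoTokenizer.from_pretrained(name)
        # DeBERTa v3 uses SentencePiece while DistilRoBERTa uses byte-level
        # BPE, so the sub-word pieces and prefix characters differ.
        print(name, tokz.tokenize(text))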

But then I got an error during the first epoch after running trainer.train();

This reply to a week 2 question helped correct the problem.

What other issues might occur when using different models?

Continuing the sharing of Transformers resources, I have a whole thread of awesome resources online:

15 Likes

Yes! That’s what it was! I had the wrong train.csv file.

Thank you very much to everyone who helped! Love you all!

I might come back for more help though, stay tuned :joy:

5 Likes

Also, an interesting post describing the transformer architecture, starting from the very basics:
https://e2eml.school/transformers.html

2 Likes

Love this classic paper review by Yannic: it explains two of the fundamental concepts in the transformer architecture:

  • positional encoding
  • query/key/value attention

That’s my thought: at the end of the day, positional encoding is just feature engineering on token position, the same idea as breaking a date field into multiple columns (day, day_of_week, month, …).
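To make the analogy concrete, here is a minimal numpy sketch of the sinusoidal encoding from the original transformer paper: each position gets expanded into a vector of sin/cos “columns” at different frequencies, just as a date gets expanded into day/month/year columns.

    import numpy as np

    def positional_encoding(seq_len, d_model):
        # Position index down the rows, feature index across the columns.
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model)[None, :]
        angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
        enc = np.zeros((seq_len, d_model))
        enc[:, 0::2] = np.sin(angles[:, 0::2])  # even columns: sine
        enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd columns: cosine
        return enc

    print(positional_encoding(4, 8).round(2))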

9 Likes

I am trying to submit the notebook on Kaggle, but this error message comes up: “Cannot submit. Your Notebook cannot use internet access in this competition. Please disable internet in the Notebook editor and save a new version.”

I have disconnected from the internet, but some fastai functions require it.

I tried this solution (fastai_offline User Guide ✔ | Kaggle) but am still getting this error.

Help please!

It’s telling you the problem: you need to disable the ‘internet’ option for this notebook. It’s in the options on the top right of the main Kaggle window.

How do you decide whether you can get rid of outliers? When you removed those values, it made your score go up, but those values will probably exist in the test set as well. So do you remove the row entirely, or do you use a different method for handling outliers?

8 Likes

I did that, but some fastai function requires internet… I think…

You should try out this guide instead:

Basically, you need to add the model as a dataset so you can use it offline.
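Once the model files are attached as a dataset, from_pretrained can point at the local folder instead of the Hugging Face Hub. A minimal sketch (the path below is hypothetical; use whatever your attached dataset is named):

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Hypothetical path: /kaggle/input/<your-model-dataset-name>
    model_path = '/kaggle/input/deberta-v3-small'

    # A local directory works as long as it holds the config, tokenizer
    # files, and weights, so no internet connection is needed.
    tokz = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)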

10 Likes

Trying to run “Getting started with NLP” on Kaggle, I ran into an error at the line tokz = AutoTokenizer.from_pretrained(model_nm) with the message: ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on. I am a bit surprised this was not already reported.

I am mostly focusing on the live stream rather than running the notebook, so this can wait, but I am mentioning it in case others hit the same issue.

How do you go about determining an effective batch size for the GPU on whatever service we are using?

2 Likes

You don’t have an internet connection in your notebook. You need to enable it, and you may need to verify your identity to do so.

If you are submitting your notebook, it technically needs to be offline and not accessing the internet, so you can instead add the model as a dataset to your notebook, following something like this guide:

2 Likes

all right, just saw this reply from @ilovescience

1 Like

Oh, and just pointing out that this guide is Jeremy’s, edited to work offline by @miwojc

1 Like

For me it’s just trial and error until I get a feel for what works on the machine. Maybe someone else has a better approach?

EDIT: I should clarify - my default position is to maximise batch size (for speed and loss normalization), so the trial and error is to find the largest batch size I can use. But I’m not sure if this is an entirely correct assumption.

2 Likes

I normally try batch sizes in multiples of 2: start small, then go higher and higher. Eventually you’ll get an out-of-memory error, and that’s when you start reducing.
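Something like this rough sketch of the search, assuming a CUDA GPU and a toy model standing in for your real one:

    import torch
    from torch import nn

    # Toy stand-ins; swap in your real model and batch shapes.
    model = nn.Linear(512, 2).cuda()
    loss_fn = nn.CrossEntropyLoss()

    bs = 8
    while True:
        try:
            xb = torch.randn(bs, 512, device='cuda')
            yb = torch.randint(0, 2, (bs,), device='cuda')
            loss_fn(model(xb), yb).backward()
            print(f'batch size {bs} fits')
            bs *= 2
        except RuntimeError:  # CUDA raises RuntimeError on out-of-memory
            print(f'out of memory at {bs}; use {bs // 2}')
            break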

2 Likes

Will do! Thanks yet again Nick :slight_smile:

1 Like

A lot of this is trial and error (for me personally at least).

With that said, you can explore techniques such as mixed precision and gradient accumulation to train with bigger effective batch sizes regardless of your compute capabilities. In the end, you generally want to train with batches as big as you can.
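For example, with the Hugging Face Trainer used in the lesson notebook, both are just arguments (the specific numbers here are made up for illustration):

    from transformers import TrainingArguments

    args = TrainingArguments(
        'outputs',
        per_device_train_batch_size=32,  # what actually fits in GPU memory
        gradient_accumulation_steps=4,   # 32 * 4 = effective batch size of 128
        fp16=True,                       # mixed precision: less memory, faster
        learning_rate=8e-5,
        num_train_epochs=4,
    )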

4 Likes

I need to add some description in the notebook of what exactly was changed, which is not much; it’s all Jeremy’s great work. I uploaded the datasets package and the DeBERTa model as Kaggle datasets so they can be accessed offline.
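For anyone reproducing this, an offline install from an attached dataset looks something like the line below (the dataset path is hypothetical; point --find-links at wherever your wheel files live):

    # In a Kaggle notebook cell, with internet disabled:
    !pip install --no-index --find-links=/kaggle/input/datasets-wheels datasets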

3 Likes