Love this classic paper review by Yannic: it explains two of the fundamental concepts in the transformer architecture:
positional encoding
queries, keys, and values
My thought: at the end of the day, positional encoding is just feature engineering on token position, the same idea as breaking a date field into multiple columns (day, day_of_week, month, …).
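To make the analogy concrete, here is a minimal sketch of the sinusoidal scheme from the original Transformer paper. Each sin/cos pair cycles at a different period, much like day, day_of_week, and month each cycle at different rates; the function name is illustrative, not from any library.

```python
import math

def positional_encoding(pos, d_model=8):
    """Sinusoidal positional encoding for a single token position,
    following the 'Attention Is All You Need' scheme: each pair of
    dimensions is a sin/cos at a different frequency."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))  # "fast" columns for low i
        pe.append(math.cos(angle))  # shifted copy, so phase is recoverable
    return pe

# Position 0 encodes to alternating sin(0)=0 and cos(0)=1
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Just as with the date columns, no single dimension identifies the position on its own, but together they pin it down, and nearby positions get similar vectors.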
I am trying to submit the notebook on Kaggle, but this error message comes up: "Cannot submit. Your Notebook cannot use internet access in this competition. Please disable internet in the Notebook editor and save a new version."
I have disconnected from the internet, but some fastai functions require internet access.
It’s telling you the problem - you need to disable the ‘internet’ option for this notebook. It’s in the options on the top right of the main Kaggle window.
How do you decide whether you can get rid of outliers? Removing those values made your score go up, but those values will probably exist in the test set as well. Do you remove the rows entirely, or do you use a different method for handling outliers?
Trying to run “Getting started with NLP” via Kaggle, I hit an error at the line `tokz = AutoTokenizer.from_pretrained(model_nm)` with the message: "ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on." I am a bit surprised this was not already reported.
I’m mostly focused on the live stream rather than running the notebook, so this can wait, but I’m mentioning it in case others have the same issue.
You don’t have an internet connection in your notebook. You need to enable it, and you might need to verify your identity to do so.
If you are submitting your notebook, it technically needs to be offline and not accessing the internet, so you can instead add the model as a dataset to your notebook, something like this guide:
For me it’s just trial and error until I get a feel for what works on the machine. Maybe someone else has a better approach?
EDIT: I should clarify - my default position is to maximise batch size (for speed and loss normalization), so the trial and error is to see what is the largest batch size I can use. But I’m not sure if this is an entirely correct assumption.
I normally start with a small batch size and increase in powers of 2. Try small, then go higher and higher. Eventually you’ll get an out-of-memory error, and that’s when you start reducing.
A lot of this is trial and error (for me personally at least).
With that said, you can explore techniques such as mixed precision and gradient accumulation to train with bigger effective batch sizes regardless of the compute you are running on. In the end, you generally want to train with batches as big as you can.
I need to add some description in the notebook of what exactly was changed, which is not much; it’s all the great work by Jeremy. I uploaded the datasets package and the DeBERTa model as Kaggle datasets so they can be accessed when offline.