Lesson 4 official topic

If you pip install before you try to import, it’ll work without a restart, FYI.

Jeremy, what will next week’s lecture focus on? Since that’s the last lecture for now, there’s then a 4-week break, right?
Just curious about what’s next.

I have this running locally, and it was not trivial to set up. To install the protobuf library, use pip install protobuf. I also had to build the HF transformers package following their instructions to get it to work with my setup.

Thanks for posting this. I attempted to submit from Jeremy’s notebook 10 days ago and couldn’t figure it out.

Did you also follow the steps in the link Jeremy recommended? Severstal: Steel Defect Detection | Kaggle.

I tried copying Jeremy’s notebook and making the changes I could see in yours, but then received an error. When I copied your notebook, however, it ran offline and I was able to submit it and get on the leaderboard, finally! Is there something I’m missing? Jeremy’s notebook has nothing in the input folder, but yours has debertav3small and housing. Did you follow the post above to get those files into your input folder?

Can you provide the exact steps you followed for a kaggle newbie please? :slight_smile:

There are three places where Jeremy’s notebook uses the internet:

  • install the datasets package
  • download the deberta-v3-small model
  • download the housing dataset

So to make it work offline, I downloaded the datasets package and its dependencies using pip, the deberta-v3-small model from the Hugging Face Hub, and the housing dataset from the internet, which I pre-processed as per Jeremy’s notebook. Then I uploaded all of the above as Kaggle datasets and included those datasets in the notebook.
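
A minimal sketch of that offline setup, assuming you have already run something like pip download datasets -d wheels and saved the model and tokenizer with save_pretrained on a machine with internet access, then uploaded both folders as Kaggle datasets. The folder names below are hypothetical placeholders, and num_labels=1 assumes a single regression-style score; adjust both to your own datasets and task.

```python
import subprocess
import sys

# Hypothetical Kaggle input folders; replace with the names of your own uploaded datasets.
WHEELS_DIR = "/kaggle/input/datasets-wheels"  # wheel files saved earlier with `pip download`
MODEL_DIR = "/kaggle/input/debertav3small"    # model + tokenizer saved earlier with save_pretrained

def offline_pip_command(package, wheels_dir=WHEELS_DIR):
    """Build a pip command that installs `package` using only local wheel files."""
    # --no-index: never contact PyPI; --find-links: look for wheels in this folder instead
    return [sys.executable, "-m", "pip", "install", "--no-index",
            f"--find-links={wheels_dir}", package]

def install_offline(package, wheels_dir=WHEELS_DIR):
    # check=True raises CalledProcessError if pip fails (e.g. a dependency wheel is missing)
    subprocess.run(offline_pip_command(package, wheels_dir), check=True)

def load_local_model(model_dir=MODEL_DIR, num_labels=1):
    """Load tokenizer and model from a local folder instead of the Hugging Face Hub."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokz = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir, num_labels=num_labels)
    return tokz, model
```

In the notebook, with internet switched off, you would call install_offline("datasets") first and then load_local_model(); if pip reports a missing dependency, that wheel still needs to be added to the uploaded folder.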

Changing it from preds to preds.flatten() should fix it…
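
The original error isn’t shown here, but a common cause is a shape mismatch: a single-output model returns predictions of shape (n, 1), while labels and submission columns expect a 1-D array of shape (n,). A small NumPy sketch with made-up numbers:

```python
import numpy as np

# Made-up predictions shaped like a single-output regression head: (n, 1)
preds = np.array([[0.1], [0.5], [0.9], [0.3]])
labels = np.array([0.0, 0.5, 1.0, 0.25])

print(preds.shape)        # (4, 1): 2-D, one column
flat = preds.flatten()    # returns a 1-D copy
print(flat.shape)         # (4,)

# With matching 1-D shapes, the Pearson correlation is well defined
r = np.corrcoef(flat, labels)[0, 1]
print(round(r, 3))        # 1.0 (these made-up labels are a linear function of the predictions)
```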

You should be able to use mamba install -c fastchan transformers or pip install transformers if you’re on Linux or WSL.

FYI I added a comment to your kaggle notebook last week asking if you can provide more info about how you got this set up – I’m sure people would find it helpful to understand!

Initial outline here:

Thanks. Yeah, it looks like sentencepiece 0.1.86 was already installed: running pip install for sentencepiece 0.1.96, restarting the kernel, and then rerunning the import worked. I tried proceeding without restarting, per Jeremy’s suggestion, but it seemed to need a restart.
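
One possible explanation for why a restart was still needed: Python keeps every imported module in sys.modules, and later import statements reuse that cached copy. If the older version was already imported earlier in the session (possibly indirectly by another library), the newly installed one isn’t picked up until the kernel restarts. A small helper to check this, as a sketch; it assumes the pip distribution name matches the import name (true for sentencepiece, not for every package) and that the module exposes __version__.

```python
import sys
from importlib import metadata

def version_status(module_name, dist_name=None):
    """Compare the version loaded in this process with the version installed on disk."""
    dist_name = dist_name or module_name  # assumption: pip name == import name
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    mod = sys.modules.get(module_name)  # None if not imported yet in this session
    loaded = getattr(mod, "__version__", None) if mod is not None else None
    return {
        "imported": mod is not None,
        "loaded_version": loaded,
        "installed_version": installed,
        # Not imported yet: a plain import loads whatever is on disk, so no restart needed.
        # Already imported and versions differ (or can't be compared): restart the kernel.
        "restart_recommended": mod is not None and loaded != installed,
    }

print(version_status("sentencepiece"))
```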

Finally caught up with the lesson today. As Jeremy mentioned, it’s all new content, not quite 1-to-1 mappable to the book. Some thoughts on the lecture:

  • Get introduced to a different library (e.g. Hugging Face) and play around with it
  • Notice how the API might feel a bit different, but the core concepts stay the same
  • The concept of pre-training a language model on unlabelled training data (via next word prediction, masked word prediction etc.)
  • Taking that language model and fine-tuning it on specific labelled tasks (e.g. classification)
  • Thinking through how to reshape a non-classification problem into a classification problem: unfamiliar problem category → familiar problem category
  • Revisit training, validation & test sets, especially validation set design & test set separation
  • Importance of visual exploration of data, metrics etc.
  • The “Text Preprocessing” part of Chapter 10 explains more about tokenization & numericalization, which should still be conceptually relevant to today’s lecture
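
To make the tokenization → numericalization → language-model-target idea from the list above concrete, here is a toy, word-level sketch in plain Python. It is only an illustration: real models such as deberta-v3-small use subword tokenizers from their own library, and the helper names here are made up.

```python
from collections import Counter

def tokenize(text):
    # Naive word-level tokenizer (lowercase + whitespace split), for illustration only
    return text.lower().split()

def build_vocab(token_lists, min_freq=1):
    counts = Counter(tok for toks in token_lists for tok in toks)
    # id 0 is reserved for unknown tokens; the rest are ordered by frequency, then alphabetically
    itos = ["<unk>"] + sorted(
        (t for t, c in counts.items() if c >= min_freq),
        key=lambda t: (-counts[t], t),
    )
    stoi = {t: i for i, t in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    # Map each token to its id, falling back to <unk> (0) for unseen tokens
    return [stoi.get(t, 0) for t in tokens]

def next_word_pairs(ids):
    # Self-supervised targets: predict token i+1 from tokens 0..i (no labels needed)
    return [(ids[: i + 1], ids[i + 1]) for i in range(len(ids) - 1)]

corpus = ["the cat sat on the mat", "the dog sat"]
toks = [tokenize(s) for s in corpus]
itos, stoi = build_vocab(toks)
ids = numericalize(toks[0], stoi)
print(ids)                      # [1, 3, 2, 6, 1, 5]
print(next_word_pairs(ids)[:2]) # [([1], 3), ([1, 3], 2)]
```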

Also, the transformer architecture itself is fun to study, but maybe avoid it at this stage. There’ll be plenty of time later.

I’d suggest reading ch10 of the book (and the chapters before that), and running the “clean” version of the notebook as discussed in the previous lesson (and in “lesson 0”).

(NB: Info about what book chapters are covered is in the first post of each lesson thread.)

For my setup that did not work. I tried it last week and it complained about not having the correct SSL 1.0 library. I tried many things to solve the problem and unfortunately didn’t save the exact error message, but it was one of the libssl 1.0.0 .so files that could not be found. When I rebuilt the transformers library (using the latest version), it worked. I’m using CUDA 11 and an RTX 3090 GPU.
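
Since the exact error wasn’t saved, one way to gather the relevant facts next time is to compare the OpenSSL version Python itself was built against with the libssl the system loader finds. A stdlib-only sketch; it won’t tell you which version a compiled dependency expects, only what is present in the current environment.

```python
import ctypes.util
import ssl

def ssl_diagnostics():
    """Report which OpenSSL this Python uses and which libssl the loader can find."""
    return {
        # OpenSSL version the running interpreter's ssl module is linked against
        "python_openssl": ssl.OPENSSL_VERSION,
        "python_openssl_info": ssl.OPENSSL_VERSION_INFO,
        # Name of the libssl shared library the dynamic loader would pick (None if not found)
        "loader_libssl": ctypes.util.find_library("ssl"),
    }

for key, value in ssl_diagnostics().items():
    print(f"{key}: {value}")
```

Running this inside the conda environment and again outside it makes it easy to spot when two different OpenSSL builds are in play.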

Are you using conda? Stuff like SSL libs should all be handled automatically by package dependencies in conda. (Although once you start building your own libs or using pip installers this can break – so best to just use conda/mamba as much as possible.)

True and yes. PyTorch complains a lot about the RTX 3090 unless you install it according to their CUDA 11 instructions. When you do that, there are some incompatibilities that must be dealt with, so some of the libraries in my conda install may be different. The SSL version it installs is 1.0.0.X, which is different from the one the conda transformers package uses. fastai, though, seems to work fine. Probably best not to rebuild anything unless you’re in a similar situation.
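
For the RTX 3090 side, a sketch for checking which CUDA version the installed PyTorch was built with and whether that build ships kernels for the card (the RTX 3090 reports compute capability 8.6, i.e. sm_86). It assumes a reasonably recent PyTorch where torch.cuda.get_arch_list() is available, and it returns a minimal report if PyTorch isn’t installed.

```python
def torch_gpu_report():
    """Summarize the installed PyTorch build and the first visible GPU, if any."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was *built* against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        report["device_name"] = torch.cuda.get_device_name(0)
        report["device_arch"] = f"sm_{major}{minor}"
        # Architectures this binary has compiled kernels for. Rough check only:
        # a build without a matching entry may still work via PTX JIT compilation.
        report["compiled_archs"] = torch.cuda.get_arch_list()
        report["arch_supported"] = report["device_arch"] in report["compiled_archs"]
    return report

print(torch_gpu_report())
```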

I wonder if anybody else is finding the official topic a bit unwieldy. When I’m listening to the lesson and paying attention, it becomes hard for me to find questions that others have asked because they get buried in discussion and commentary. The discussions are worthwhile, but perhaps there’s a better way to organise things so people can easily find the questions being asked without losing track of the lesson, and upvote them so Jeremy can address them during the session.

Might be a start-of-word indicator issue, possibly a keyboard language mismatch. You can run %debug in a new cell right after the error (or turn on %pdb before running the cell) to open the debugger at the point of failure and inspect the variables; once it opens, type help to list the available commands. It could also be that one transformer handles encoded/unencoded text (UTF-8 etc.) better than the other.
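
To test the encoding/keyboard-layout hypothesis, one option is to list the non-ASCII characters in the input and apply Unicode NFKC normalization before tokenizing. This is a stdlib-only sketch: the sample string is made up, and whether normalization fixes the original error depends on what the inputs actually contain. SentencePiece-based tokenizers mark the start of a word with ▁ (U+2581), which is easy to confuse with an ordinary underscore when reading token lists.

```python
import unicodedata

SP_WORD_START = "\u2581"  # "▁": the word-start marker used in SentencePiece tokens

def non_ascii_chars(s):
    """Return (index, char, code point, Unicode name) for each non-ASCII character."""
    return [(i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(s) if ord(ch) > 127]

def normalize(s):
    # NFKC folds compatibility characters, e.g. full-width "ｄ" -> "d", no-break space -> " "
    return unicodedata.normalize("NFKC", s)

# Made-up input: a no-break space plus full-width letters, as an IME or other keyboard layout might produce
text = "abc\u00a0\uff44\uff45\uff46"
for row in non_ascii_chars(text):
    print(row)
print(repr(normalize(text)))                  # 'abc def'
print(non_ascii_chars(SP_WORD_START + "_"))   # only ▁ is listed; "_" is plain ASCII
```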

A dedicated tool for live Q&A like Slido would indeed be nice, but on the other hand, the advantage of the forum is that other participants can also answer questions, even after the lesson has ended. So not sure how one would best tackle that problem.

@jeremy wouldn’t it be beneficial to include an environment file for easy local setup (I don’t know much about conda/mamba setup, but requirements.txt works well for Python in general), with instructions to install it, e.g.:

conda env create -f environment.yml
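
As a rough illustration of the requirements.txt route, a snippet that pins whatever versions are installed in a working environment, so someone else can reproduce them with pip install -r requirements.txt. The package list is only an example, not an official course requirements file, and pinning exact versions is one design choice among several; an environment.yml has the advantage that conda can also pin non-Python libraries such as OpenSSL.

```python
from importlib import metadata

def pin_requirements(packages):
    """Return requirements.txt lines ("name==version") for installed packages.

    Packages that aren't installed are skipped rather than guessed.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pass  # not installed here; leave it out
    return lines

# Example package list (an assumption for illustration)
pins = pin_requirements(["fastai", "transformers", "datasets", "sentencepiece", "protobuf"])
print("\n".join(pins))
```

Redirecting that output to a file (or writing it with open("requirements.txt", "w")) gives a file others can install from.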

Was about to say the same thing.