Lesson 4 official topic

Hi Jeremy, I just finished lecture 4. After finishing the video I thought about applying the fast.ai library to the US Patent question instead of Hugging Face, but I’m facing issues, like how to load a pre-trained model such as DeBERTa v3 through fast.ai. Can you give me some tips on how to proceed with using the fast.ai library for NLP?

Also, can you recommend a source where all of fast.ai’s function names and functionality are documented?
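For context, the fastai text API I have seen so far looks like the IMDB example from chapter 1 of the book (a classification sketch; it uses fastai’s bundled AWD_LSTM rather than DeBERTa, which is exactly the part I am unsure how to swap in):

from fastai.text.all import *

# The chapter 1 IMDB classifier: fastai’s pretrained text model here is the
# AWD_LSTM from ULMFiT; it does not load Hugging Face checkpoints directly.
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)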

Hi everyone! I am just curious about how I can get a list of pre-trained models optimized for NLP tasks.
In Chapter 3, we learned how to use the timm library to get a list of state-of-the-art pre-trained computer vision models (notebook: Which image models are best?).

Is there a timm equivalent library for NLP models?
What is the best way to get the list of state-of-the-art pre-trained NLP models?

I would like to play around with different NLP models on top of the microsoft/deberta-v3-small model in our Chapter 4 notebook (Getting started with NLP for absolute beginners).

I see a similar question was asked before on this forum, and it sounds like manually checking the evaluation results of the models on Hugging Face is one way…?

If anyone has their own way of getting a list of NLP models, I would greatly appreciate it if you could share your tactics so I can learn from them and apply them to my own learning :smiley:

Thanks for asking the question a year ago @marix1120 :smiley: I hit the same issue and was curious how to unblock myself. I am glad I came to the forum pretty quickly!

The Hugging Face Models page gives a selection of pretrained models to use, covering everything from CV and NLP to multimodal tasks.

Look for the NLP section and select the use case you want, and the models will appear on the right.

I guess the number of downloads and hearts tells you how popular the models are. :ok_hand:t5:
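If you prefer doing it from code, the huggingface_hub client can list models for a task sorted by downloads - a rough sketch (parameter names can differ a little between huggingface_hub versions):

from huggingface_hub import HfApi

api = HfApi()

# The ten most-downloaded models tagged "text-classification" on the Hub.
models = api.list_models(filter="text-classification",
                         sort="downloads", direction=-1, limit=10)
for m in models:
    print(m.modelId)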


Thank you for investing your time and sharing this awesome tip, @PatTheAtak!
I’m on my way to playing with different models from Hugging Face :blush:

Hey team, I am experimenting with fine-tuning a different NLP model.
I am following the exact same code from Jeremy’s notebook (Getting started with NLP for absolute beginners | Kaggle), except using a different model name.

Jeremy’s notebook: using microsoft/deberta-v3-small.

My experiment: using distilbert-base-uncased-finetuned-sst-2-english by following instructions from here.

But whenever I call trainer.train(), I get an error message indicating that the shapes of the input arrays aren’t the same.
Full Error Message:
“ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 9119”

I am not quite sure which arrays have different dimensions in this case. I suspect some model or argument configs might need to be updated for the different model, since the same configs worked for microsoft/deberta-v3-small.

I saw a similar question on the forum (Lesson 4, multi-label classification). It did not have a solution, but it pointed to a good starting point for understanding the issue: the Hugging Face tutorials about Transformers.

While I am learning how to use Transformers to fine-tune different models correctly, has anyone encountered a similar issue before, or does anyone know other angles from which I can unblock myself? Any insights/ideas would be greatly appreciated :pray:
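For what it is worth, this is the check I am running while I debug (a sketch; I am assuming the mismatch comes from the classification head, since this checkpoint ships with two labels while the notebook builds its model with num_labels=1):

from transformers import AutoModelForSequenceClassification

# The SST-2 checkpoint has a 2-way classification head...
sst2 = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")
print(sst2.config.num_labels)   # 2

# ...whereas the lesson notebook builds its model with a single output
# (num_labels=1) so the Pearson-correlation metric gets one prediction per
# row. An (n, 2)-shaped prediction array is one way to hit a
# "dimensions must match" error inside the metric.
deberta = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=1)
print(deberta.config.num_labels)  # 1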

This is the link to my colab that contains all e2e code:

Hi all. I have a question on Lesson 4 - maybe a dumb question (kindly bear with me!). When @jeremy uses the Hugging Face tokeniser, the result is the tokenised output, and in it there is also an attention mask.

On examining the mask, it has 0s and 1s.

My question: what is this attention mask? How is it derived? How does it help with the classification task?

Any resources/help pointing me in the right direction would be appreciated. Thanks all !!

Hi,

I don’t know why it does not work either, but I think reading the Hugging Face NLP tutorial or the Transformers tutorial might be helpful.


Hi,

attention_mask has 1s and 0s to indicate whether a token should be attended to (1) or not (0). This is used because sentences have different lengths: if a sentence is shorter than the others in the batch, it is padded, and the trailing 0s in its mask tell the model to ignore those padding tokens. If you want to learn more about it, you can read this entry from the Hugging Face NLP course.
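A quick way to see it (a small sketch; any Hugging Face tokenizer works, I just picked the one from the lesson notebook):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")

batch = tok(["A short sentence.",
             "A much longer sentence that forces the short one to be padded."],
            padding=True)

# The shorter sentence's mask ends in 0s: those positions are padding
# tokens that the model should ignore.
for mask in batch["attention_mask"]:
    print(mask)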

If it doesn’t make sense, you can skip this detail and come back to it later. The rest of the course does not assume you know these details.

Thanks for the pointers, @galopy!


Thank you for this!


I think the fast.ai people should do this, but I understand where you are coming from.

I was playing with some numbers to see which tokens they were in Jeremy’s kaggle notebook.

I found that the same token has multiple numbers for it. Why is this? I would expect each token to have a unique number.

E.g.:

[screenshot: the same token shown with more than one ID]

Did this NLP Disaster Tweet Classification project after Lesson 4 of Part 1.

Got a pretty decent score on it - how would you improve it? Throwing a bunch of models together in an ensemble seems pretty crude. Are there any ways I could have improved the pre-processing? Some further encoding I could have done on the transformer side?
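For anyone curious, the crude version of such an ensemble is just averaging each model’s predicted probabilities, roughly like this (the arrays below are toy stand-ins for the real softmax outputs):

import numpy as np

# Toy stand-ins: each array plays the role of one fine-tuned model's
# predicted class probabilities, shape (n_samples, n_classes).
rng = np.random.default_rng(0)
probs_a = rng.dirichlet([1, 1], size=5)
probs_b = rng.dirichlet([1, 1], size=5)
probs_c = rng.dirichlet([1, 1], size=5)

# Average the probabilities across models, then take the argmax per sample.
ensemble = np.mean([probs_a, probs_b, probs_c], axis=0)
preds = ensemble.argmax(axis=1)
print(preds)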

I’ve been trying to fine-tune a language model per Chapter 10 of the textbook, but have hit the following issue

learn = language_model_learner(
    dls, AWD_LSTM, drop_mult=0.3, 
    metrics=[accuracy, Perplexity()]).to_fp16()

is giving me

File /usr/local/lib/python3.9/dist-packages/fastcore/basics.py:496, in GetAttr.__getattr__(self, k)
    494 if self._component_attr_filter(k):
    495     attr = getattr(self,self._default,None)
--> 496     if attr is not None: return getattr(attr,k)
    497 raise AttributeError(k)

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1207, in Module.__getattr__(self, name)
   1205     if name in modules:
   1206         return modules[name]
-> 1207 raise AttributeError("'{}' object has no attribute '{}'".format(
   1208     type(self).__name__, name))

AttributeError: 'SequentialRNN' object has no attribute 'to_fp16'
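For reference, the fuller setup I am running is roughly this (a sketch of the chapter 10 pieces; I am assuming the wildcard import matters, since to_fp16 is patched onto Learner by fastai’s mixed-precision callback module, which from fastai.text.all import * pulls in - otherwise attribute lookup can fall through to the underlying SequentialRNN model as above):

from fastai.text.all import *   # includes fastai.callback.fp16, which patches Learner.to_fp16

# Language-model DataLoaders roughly as in chapter 10 (downloads IMDB).
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, valid='test', is_lm=True)

learn = language_model_learner(
    dls, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()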

Hello @jeremy, @ilovescience
I got this error under the End-to-End SGD topic while executing the code. How can I fix it?

TypeError: unsupported operand type(s) for ** or pow(): 'module' and 'int'.

More Details:
def f(t, params):
    a,b,c = params
    return a*(t**2) + (b*t) + c

def mse(preds, targets): return ((preds-targets)**2).mean()

params = torch.randn(3).requires_grad_()
preds = f(time, params)

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
----> 1 preds = f(time, params)

<ipython-input> in f(t, params)
      1 def f(t, params):
      2     a,b,c = params
----> 3     return a*(t**2) + (b*t) + c
      4
      5 def mse(preds, targets): return ((preds-targets)**2).mean()

TypeError: unsupported operand type(s) for ** or pow(): 'module' and 'int'

Hi ksanad,

Unless the site is burning down, please don’t cold-call @notify the site principals. Everyone would love their direct response, but if everyone did this they would have no time for their own work.

I’m sorry I don’t know the answer, but ChatGPT tells me…


Hi bencoman,

Thanks for letting me know regarding @notify. I was able to fix the above error - the problem was applying **2 to `time`, which was referring to the Python time module rather than a tensor.
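A minimal sketch of the working version (assuming time is meant to be the tensor of timestamps from the book’s end-to-end SGD example, not Python’s built-in time module):

import torch

# Define `time` as a tensor, as in the book. The TypeError appears when the
# name `time` still refers to the imported time module instead of this tensor.
time = torch.arange(0, 20).float()
speed = torch.randn(20)*3 + 0.75*(time - 9.5)**2 + 1

def f(t, params):
    a, b, c = params
    return a*(t**2) + (b*t) + c

def mse(preds, targets): return ((preds - targets)**2).mean()

params = torch.randn(3).requires_grad_()
preds = f(time, params)
print(mse(preds, speed))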

Regards
Anand


For sure

I noticed this too. Did you ever conclude what causes the same words to have different numbers, despite the only difference being ‘_’, which in my thinking should not matter?

Reading further in the forum, in this particular thread, I have discovered that the model does not look at a token in isolation but also considers its context in relation to other tokens/words. I am thinking this is probably the cause of the above behavior.
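One related thing that is easy to check directly (a small sketch, assuming the notebook’s deberta-v3 tokenizer): the ‘▁’ prefix is SentencePiece’s word-start marker, so the same letters with and without it are stored as two separate vocabulary entries with two different IDs.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")

# '▁of' (starting a word, as in "of course") and 'of' (continuing a word)
# are looked up as separate entries, so they map to different IDs.
# (If a piece is missing from the vocab it maps to the unknown-token ID.)
print(tok.convert_tokens_to_ids(["▁of", "of"]))

# The same text tokenised normally, for comparison.
print(tok.tokenize("of course"))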