Part 2 Lesson 10 wiki

(Arvind Nagaraj) #321

I did the same: I took a small subset to tune the IMDB LM, just to see:
(1) how well the classifier scores with a slightly weaker backbone
(2) how quickly I can get this going end to end

I’ll share results soon.

(rachana) #323

How should we work with text that is unsupervised? The language model will learn about the language. But suppose we have general text with no sentiment labels given to us. How should we first find the sentiment of the text for the neural network to learn from? Do we have to use KNN or something?


Things have changed a lot with newer versions of Jupyter :slight_smile: I run the server in a tmux session and it doesn’t stop working if I disconnect. I can run it overnight, reconnect in the morning, and everything works just fine.

What is really neat is that it buffers the messages when I am disconnected. When I reconnect it replays them and I see the updates stream to my notebook. Works with tqdm etc. One might need to increase the io msg limits though for it to work well (this can be done in jupyter config file) but due to me having an older laptop this is not an option. Hence for cells that require a lot of updates I sometimes go for %%capture.
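For reference, a sketch of what raising those limits might look like in `~/.jupyter/jupyter_notebook_config.py`. I am assuming "io msg limits" refers to the notebook server's IOPub rate limits; the values below are illustrative, not recommendations.

```python
# Fragment of ~/.jupyter/jupyter_notebook_config.py.
# The object `c` is provided by Jupyter when it loads this file,
# so this is not meant to be run as a standalone script.

# Maximum number of IOPub messages per second sent to the browser
# (assumed to be the "io msg limit" mentioned above; value is illustrative).
c.NotebookApp.iopub_msg_rate_limit = 10000

# Maximum IOPub data rate in bytes per second (illustrative value).
c.NotebookApp.iopub_data_rate_limit = 10000000
```

If the config file does not exist yet, `jupyter notebook --generate-config` creates one.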

I use standard fastai environment. Here are the package versions I have installed:

(fastai) radek@server:~$ conda list | grep jupyter
jupyter                   1.0.0                    py36_4  
jupyter-contrib-core      0.3.3                     <pip>
jupyter-contrib-nbextensions 0.4.0                     <pip>
jupyter-highlight-selected-word 0.1.0                     <pip>
jupyter-latex-envs        1.4.4                     <pip>
jupyter-nbextensions-configurator 0.4.0                     <pip>
jupyter_client            5.2.3                    py36_0  
jupyter_console           5.2.0            py36he59e554_1  
jupyter_core              4.4.0            py36h7c827e3_0  

(Bart Fish) #325

@vibhorsood and @keratin discussed CUDA out-of-memory issues. On my 8GB GTX-1080 I was able to run the LM without problems with a BS of 48, but the classification model ran out of memory with BS 48 and BS 32. It is currently running for me with BS 16, but each epoch takes 18-20 mins, so 14 epochs will take 4-5 hours. I’ll post my results when it completes. nvidia-smi reports it’s using 7598 MiB out of 8114 MiB.

(Nikhil B ) #326

I ran the LM model with default bs but only 5 iterations for now.
What was the error signature for the mem error in the classification model? I am debugging the classification model with default bs, and I see this: CuDNNError: 8: b'CUDNN_STATUS_EXECUTION_FAILED'

(Arvind Nagaraj) #327

If you successfully get 95% accuracy in the imdb classification task, you can trust it to predict labels for the unsupervised set as well, right?
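One way to read this suggestion is as pseudo-labeling: run the trained classifier over the unlabeled reviews and keep only the predictions it is confident about. A minimal sketch, assuming a hypothetical `predict_pos_prob` function that stands in for the trained classifier and an arbitrary 0.9 confidence threshold (neither comes from the notebook):

```python
def pseudo_label(texts, predict_pos_prob, threshold=0.9):
    """Return (text, label) pairs for confident predictions only.

    predict_pos_prob is a hypothetical stand-in for the trained classifier:
    it maps a text to the predicted probability that it is positive.
    """
    labeled = []
    for text in texts:
        p = predict_pos_prob(text)
        if p >= threshold:
            labeled.append((text, "pos"))
        elif p <= 1 - threshold:
            labeled.append((text, "neg"))
        # Anything in between stays unlabeled.
    return labeled


# Toy stand-in for a real model, with made-up probabilities.
def toy_model(text):
    scores = {"loved it": 0.97, "terrible": 0.03, "it was fine": 0.55}
    return scores[text]


result = pseudo_label(["loved it", "terrible", "it was fine"], toy_model)
```

The threshold trades coverage for label quality: a higher value labels fewer reviews but with fewer mistakes.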
The word “unsupervised” might be confusing - they should have called it “unclassified”. But then again, ‘unclassified’ also means a whole other thing :grinning:

(Bart Fish) #328

@beecoder, It was a Cuda out of memory error, which I don’t have the text for because the notebook has been reset since then.

(Nikhil B ) #329

sure, thanks anyway.

(rachana) #330

Thanks so much for your response. But in IMDB we know the classes (pos/neg, etc.). What if we do not know them? What if we need to identify the sentiments from the documents themselves? Then can we call it unsupervised :-)? The question is how to go about solving that: how should we figure out the different classes first, before applying a neural network or any other ML technique?

(Arvind Nagaraj) #331

There is an awesome technology called Amazon Mechanical Turk where you can get extremely intelligent agents to label data for you.
But, in all seriousness, unsupervised learning is not as mature as the supervised learning we do, and some methods, like fast approximate nearest-neighbor search over nearby ‘movie reviews’ in latent space, can be used to make this less cumbersome than manually labeling data.
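A rough sketch of the nearest-neighbor idea: label a handful of examples by hand, then give each unlabeled example the label of its closest labeled neighbor in latent space. For simplicity this uses exact brute-force cosine similarity rather than an approximate search, and the 2-d vectors are made up; real ones would come from an encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(query, labeled):
    """Return the label of the labeled vector most similar to `query`.

    labeled is a list of (vector, label) pairs.
    """
    _, best_label = max(labeled, key=lambda item: cosine(query, item[0]))
    return best_label


# Made-up latent vectors for two hand-labeled reviews.
labeled = [([0.9, 0.1], "pos"), ([0.1, 0.9], "neg")]
guess = nearest_label([0.8, 0.3], labeled)
```

An approximate-neighbor library would replace the `max(...)` scan once the labeled set is too large to compare against exhaustively.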

(rachana) #332

Agreed, thanks for your response. But in real life we do not work with ‘movie review’ data :-)… real-life data is unsupervised to begin with…

(Arvind Nagaraj) #333

I encourage engineers on my team to find creative ways to label data while it is collected. Of course, it’s not always possible, but sometimes some amount of label assignment can be done.

Jeremy has mentioned this in the past - it is usually better to observe what people do than to ask them later in a survey. Use caution and comply with data security and privacy laws and regulations.

(Christine) #335

I was able to run Focal Loss (with a few different alpha/gamma settings) on the IMDB notebook, but the results were never very good. I’m wondering now - is this because I’m effectively throwing away the pre-trained weights? (The wiki weights were trained for a different loss function, so it’s not obvious to me that they’d be close to good for a new loss function.) Or possibly I’ve just done something daft with the code; I’m definitely on the math=easy, code=hard side of the spectrum :slight_smile:
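For readers unfamiliar with the alpha/gamma settings: focal loss scales cross-entropy by `alpha_t * (1 - p_t) ** gamma`, where `p_t` is the probability the model assigns to the true class. A minimal per-example sketch of the binary case in plain Python; this is not the code used in the post, and the default values are only illustrative.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one example; p is the predicted P(class 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one example.

    alpha weights the positive class (1 - alpha the negative class);
    gamma controls how strongly well-classified examples are down-weighted.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a handy sanity check when debugging an implementation.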

(rachana) #336

Thanks Arvind, so it looks like it is just a manual process. I talked to @binga and he also mentioned that this labelling pre-processing is pretty much manual… Strange, but it looks like that is the state of things.

(Kevin Bird) #337

I’m reading through the Universal Sentence Encoder Paper and in section 3.1 they talk about the encoder input and say

The encoder takes as input a lowercased PTB tokenized string and outputs a 512 dimensional vector as the sentence

My question is: is there anything special about PTB, or is it just a tool that builds tokenized strings? When I looked it up, it looks like it is from Stanford and is a Java library with a lot of options available. So would using PTB matter in this case, or would using lowercased strings and an output of a 512-dimensional vector be the important part of this sentence?

(vibhor sood) #338

@rachel is it possible for you to add English subtitles for the lesson 10 video? Thanks

(Even Oldridge) #339

I’ve got a PR for a fix for language models that will hopefully help it run on lower memory devices.

Basically, PyTorch allocates tensors dynamically as needed and doesn’t reuse the memory. When it’s not sure if it still needs the memory (i.e. before garbage collection), two copies exist. This happens multiple times with a random bptt.

I’m not sure if it’s the same issue in the case of classification. I’ll need to see how the dataloader is creating the batches, but if it contains the same randomness then a similar fix should be possible.
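To illustrate the general idea (this is not the code from the PR): if sequence lengths are drawn at random, forcing the very first batch to be the longest one means the largest buffers are allocated up front instead of growing several times during the epoch. The function name, the base length of 70, the standard deviation of 5, and the cap are all assumptions made for this sketch.

```python
import random


def sample_seq_lens(n_batches, bptt=70, std=5, seed=0):
    """Sample one sequence length per batch, longest first.

    Hypothetical sketch: lengths vary randomly around `bptt`, are capped at
    `bptt + 5 * std`, and the first batch always uses that cap.
    """
    rng = random.Random(seed)
    cap = bptt + 5 * std
    lens = []
    for i in range(n_batches):
        if i == 0:
            # Largest possible batch first, so peak memory is reached once.
            lens.append(cap)
        else:
            # Occasionally use a shorter base length for variety.
            base = bptt if rng.random() < 0.95 else bptt / 2
            lens.append(max(5, min(int(rng.gauss(base, std)), cap)))
    return lens


lens = sample_seq_lens(10)
```

Because every later length is capped at the first one, no later batch should need a bigger allocation than the first.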

(Even Oldridge) #340

I took a quick look and the classifier is using SortishSampler and SortSampler, which, if I recall correctly, Jeremy mentioned in class start with the smallest(ish) samples and get bigger. This is going to have the same issue as before, and a similar fix should be possible.

In fact it should be quite easy; just reverse the list so that it starts with the largest samples. I’ll try for another PR in the morning, or someone else can fix it in the meantime and see if that solves the issue.
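A minimal sketch of that reversal, assuming a hypothetical helper that only sees the sample lengths. It is fully sorted and ignores the shuffling a "sortish" sampler would add, so treat it as an illustration of the ordering rather than a drop-in replacement.

```python
def largest_first_batches(lengths, batch_size):
    """Return batches of sample indices, with the longest samples first."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Made-up token counts for six documents.
lengths = [12, 300, 45, 7, 150, 88]
batches = largest_first_batches(lengths, batch_size=2)
```

The first batch then holds the two longest documents (indices 1 and 4), so the biggest padded tensor is created at the start of the epoch.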

(Jeremy Howard) #341

Thanks! That’s merged now. What impact on memory did you see, in terms of RAM use before vs after?