Ah, you may be correct, sorry for misusing the terminology! What I meant was to use two networks (one for tabular, one for text) and then concatenate their outputs at the head to produce a single output.
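To make the idea concrete, here's a tiny framework-free sketch of that pattern: two "encoders" each produce a feature vector, and a single head sees their concatenation. The encoders, feature sizes, and weights here are all made up for illustration; in practice these would be e.g. a tabular net and a text encoder in PyTorch, with `torch.cat` doing the concatenation.

```python
# Illustrative sketch only (plain Python, no framework). Real encoders
# would be neural nets; these just return fixed-size feature vectors.

def tabular_encoder(row):
    # pretend tabular encoder: 2 features
    return [row["age"] / 100.0, row["income"] / 1e5]

def text_encoder(tokens):
    # pretend text encoder: 2 crude text features
    return [len(tokens) / 10.0, sum(len(t) for t in tokens) / 50.0]

def head(features, weights, bias):
    # a single linear unit over the concatenated features
    return sum(f * w for f, w in zip(features, weights)) + bias

row = {"age": 42, "income": 55_000}
tokens = ["great", "product", "fast", "shipping"]

features = tabular_encoder(row) + text_encoder(tokens)  # the concatenation
output = head(features, weights=[0.5, -0.2, 0.1, 0.3], bias=0.05)
print(len(features), round(output, 4))
```

The key point is just that the head's input size is the sum of the two encoders' output sizes, and a single loss trains both branches.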
Thank you so much for the awesome series of lessons! Going through the concepts and code examples of fastbook and attempting to answer the questions at the end based on my understanding has been a really constructive exercise: a good chance to find out what I thought I understood but didn't.
I plan on going through the chapters again, starting with Ch1 after a week off. Timing-wise, I'm currently thinking Mondays or Tuesdays 6-9pm PST, or Sunday afternoons. If this is of interest to you, heart this post and I'll set up a Google Form to manually organize people into post-class support groups. The format will be silently re-reading the chapters or implementing the notebooks, followed by 30 mins of discussion.
Chapter 10 and the ULMFiT paper indicate that training a bidirectional model reduces the error rate on IMDB by almost 1%.
Does this mean that the base LM trained on wikitext is trained backward, and then we further fine-tune this LM on the IMDB dataset in the same backward direction?
By backward, does it mean that every sequence of words in text and text_ is just flipped around, and the LM's task is then to predict the previous word in the sentence rather than the next?
The reason techniques like stemming or lemmatization are not recommended when training neural networks is that they essentially throw away useful information about the vocabulary and about the language.
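A toy example of that information loss, using a deliberately naive suffix-stripping "stemmer" (illustrative only; real systems would use something like the Porter stemmer): distinct word forms collapse to one stem, tense and part-of-speech distinctions disappear, and the stem may not even be a real word.

```python
# Naive, made-up stemmer for illustration: strip a few common suffixes.
def naive_stem(word):
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

words = ["flies", "fly", "meeting", "meets", "universal"]
stems = {w: naive_stem(w) for w in words}
print(stems)
# "flies" becomes the non-word "fli", while "meeting" (a noun) and
# "meets" (a verb) both collapse to "meet" - information a neural LM
# could have used is gone before training even starts.
```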
I have seen people still use these techniques in the Information Retrieval domain to improve recall. So it depends on the context, and on knowing when to use them and when not to.
More or less, actually! You can see my example notebook where I experimented with this (and with SentencePiece too) back in v1; it's still the same thing conceptually (it only shows SentencePiece in show_batch, but you can see the backwards sentences).
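The data transform the posts above are describing can be sketched in a few lines: the token stream is simply reversed, so predicting the "next" token in the reversed stream amounts to predicting the previous word of the original text. (This is only the data side; the model itself is unchanged.)

```python
# Sketch of the "backwards" language-model setup on a toy sentence.
tokens = ["the", "movie", "was", "surprisingly", "good"]

# forward LM: each input token's target is the NEXT word
forward_pairs = list(zip(tokens, tokens[1:]))

# backward LM: reverse the stream, then train exactly the same way,
# which means each target is the PREVIOUS word of the original text
backward_toks = tokens[::-1]
backward_pairs = list(zip(backward_toks, backward_toks[1:]))

print(forward_pairs[0])   # ('the', 'movie')
print(backward_pairs[0])  # ('good', 'surprisingly')
```

The "bidirectional" result in the paper then comes from training a forward and a backward classifier and averaging their predictions.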
Rachel also discusses this in her NLP course.
Hi everyone! I have been trying to get started with NLP, but I struggle to get a very simple example to work and no longer know what to try.
I have a dataframe with several columns, most of which I do not need. Among them, the useful ones are my x ('Answered Questions') and my y ('Classification'). I managed to successfully build a language model with it, and I am only missing the classifier.
I am struggling a lot to pass the y as a label…what am I missing here?
Thanks a lot @muellerzr! I’ll try re-implementing that!
Have you by any chance worked on visualising the trained embeddings using PCA? I have been trying to do this, but without much luck.
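One way to get a basic PCA projection without any extra tooling is SVD on the centered embedding matrix. The sketch below uses a random matrix as a stand-in; with a real fastai language model the embedding weights live somewhere like `learn.model[0].encoder.weight` (check your own model's structure first, as this varies between the LM and the classifier).

```python
import numpy as np

# PCA via SVD on a toy embedding matrix (rows = vocab items, cols = dims).
# With a real model you'd first extract the weights, e.g. something like:
#   emb = learn.model[0].encoder.weight.detach().cpu().numpy()
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 64))       # 100 "words", 64-dim embeddings

centered = emb - emb.mean(axis=0)      # PCA requires centered data
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ Vt[:2].T           # project onto the top 2 components

print(coords.shape)                    # each word is now a 2-D point
```

From there each row of `coords` can be scattered with matplotlib and annotated with the corresponding vocab token.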
Also, have you looked into the slanted triangular learning rates introduced in ULMFiT, or do you have any resources on that? I'm trying to work on the IMDB_SAMPLE dataset to run the various quick experiments mentioned in the paper, while not overfitting the model, as it's a tiny dataset!
Thanks!! I finally found the mistake (which I would suggest clarifying in the documentation). Basically, the dataframe that is passed (data in my case) must have only two columns. The x can be called whatever you like, but the y must be called label!
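In case it helps anyone hitting the same thing, here's a small pandas sketch of that fix, using the column names from the earlier post ('Answered Questions' and 'Classification'; swap in your own): keep just the two columns and rename the target to `label`.

```python
import pandas as pd

# Toy stand-in for the original dataframe with extra, unneeded columns.
df = pd.DataFrame({
    "Answered Questions": ["yes it works", "no idea", "try rebooting"],
    "Classification": ["pos", "neg", "pos"],
    "extra_col": [1, 2, 3],  # stand-in for the columns you don't need
})

# Keep only x and y, and rename y to 'label' as the loader expects.
data = df[["Answered Questions", "Classification"]].rename(
    columns={"Classification": "label"}
)
print(list(data.columns))  # ['Answered Questions', 'label']
```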
How can I get the indices of my data that ended up in the train and valid sets from RandomSplitter? I tried dls_clas.val2idx but found no way to extract the indices from it.
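As I understand it, the splitter itself returns exactly those indices, so the simplest reliable approach is to call it yourself (with a fixed seed) and keep the result, rather than digging them out of the DataLoaders afterwards. Here's a plain-Python sketch of what RandomSplitter does (shuffle once, take the first `valid_pct` as validation); the real fastai implementation uses a torch permutation but the behaviour is the same shape:

```python
import random

# Plain-Python sketch of a random train/valid splitter.
def random_splitter(n_items, valid_pct=0.2, seed=42):
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)      # fixed seed -> reproducible
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]          # (train_idxs, valid_idxs)

train_idx, valid_idx = random_splitter(10)
print(train_idx, valid_idx)
```

If you keep `train_idx`/`valid_idx` around, you can also rebuild the exact same split later (fastai has an `IndexSplitter` for feeding fixed indices back in).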
In the Ch10 text classifier fine-tuning, the discriminative learning rate slice has a specific 2.6**4 constant. Is there a blog post or some experiments showing where this came from? I searched the forums and didn't find an answer. Here's the code snippet.
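For what it's worth, the 2.6 divisor comes from the ULMFiT paper itself, which reports empirically that setting each lower layer's learning rate to the layer above's divided by 2.6 worked well; the `**4` is because the classifier has five layer groups, so four steps span from lowest to highest. The sketch below shows what `slice(lr/2.6**4, lr)` expands to (fastai spaces the rates geometrically across groups; the group count here is assumed):

```python
# Per-layer-group learning rates implied by slice(lr / 2.6**4, lr),
# assuming 5 layer groups: geometric spacing, each group 2.6x its
# neighbour, with the 2.6 factor taken from the ULMFiT paper.
lr, n_groups = 1e-2, 5
lrs = [lr / 2.6 ** (n_groups - 1 - i) for i in range(n_groups)]
print([f"{x:.2e}" for x in lrs])  # lowest group gets lr/2.6**4
```

So the earliest (most general) layers get the smallest updates and the head gets the full `lr`, which is the whole point of discriminative learning rates.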