Lesson 4 Advanced Discussion ✅

TextLMDataBunch does not let us set bs or max_vocab anymore. How do we set those?

2 Likes

Language models are typically trained with some form of cross-entropy loss. That loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?

Thinking back to the DeViSE lesson in last year’s part 2, could you train a language model by having it output a word vector, and evaluate the loss by comparing cosine similarity to the correct word vector?

11 Likes
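One way to prototype that idea: replace cross entropy with 1 minus the cosine similarity between the predicted vector and the target word’s vector, so near-synonyms are penalized less than unrelated words. A minimal plain-Python sketch (the function name and the toy vectors are made up for illustration):

```python
import math

def cosine_loss(pred, target):
    """Loss = 1 - cosine similarity between a predicted and a target word vector.
    Identical directions give 0 loss; orthogonal vectors give 1."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

# A near-synonym's vector incurs a small loss; an unrelated vector a larger one.
cat     = [0.9, 0.1, 0.0]
kitten  = [0.8, 0.2, 0.1]   # close in embedding space
economy = [0.0, 0.2, 0.9]   # far away
print(cosine_loss(cat, kitten) < cosine_loss(cat, economy))  # True
```

In a real model you would apply this between the decoder’s output vector and the pretrained embedding of the target token, per position in the sequence.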

I guess we should use the data block API… how do you set bs and max_vocab with this new API?

1 Like
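If I remember the current API correctly, bs goes into the `.databunch(bs=...)` call and max_vocab into the numericalize processor, but check the docs. The toy sketch below (plain Python, my own function names, not the fastai API) just shows what the two knobs actually control:

```python
from collections import Counter

def build_vocab(tokens, max_vocab):
    """Keep only the max_vocab most frequent tokens; everything else maps to xxunk."""
    counts = Counter(tokens)
    itos = ["xxunk"] + [tok for tok, _ in counts.most_common(max_vocab - 1)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def batches(ids, bs):
    """Split a stream of token ids into bs-sized mini-batches (last partial batch dropped)."""
    return [ids[i:i + bs] for i in range(0, len(ids) - bs + 1, bs)]

tokens = "the cat sat on the mat the cat ran".split()
itos, stoi = build_vocab(tokens, max_vocab=4)   # rare words fall out of the vocab
ids = [stoi.get(t, 0) for t in tokens]          # 0 = xxunk for out-of-vocab words
print(len(itos), batches(ids, bs=3))
```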

Do people use CNNs for NLP? Or are people using (Bi)-LSTMs?

1 Like

What is the moms parameter in learn.fit_one_cycle? I didn’t get it… can you please explain it again?

3 Likes
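For what it’s worth, my understanding is that moms in fit_one_cycle is the (max, min) momentum pair for the one-cycle policy: momentum is annealed down while the learning rate warms up, and back up while the learning rate cools, because a high learning rate combined with high momentum tends to be unstable. A rough sketch of the shape (fastai’s real schedule splits the cycle at pct_start, I believe 0.3 by default; this toy version uses a 50/50 split):

```python
import math

def one_cycle_momentum(pct, mom_max=0.95, mom_min=0.85):
    """Momentum at training progress pct in [0, 1]: cosine-anneal from mom_max
    down to mom_min over the first half (while the LR warms up), then back up."""
    if pct < 0.5:
        phase = pct / 0.5          # descending half
        return mom_min + (mom_max - mom_min) * (1 + math.cos(math.pi * phase)) / 2
    phase = (pct - 0.5) / 0.5      # ascending half
    return mom_max + (mom_min - mom_max) * (1 + math.cos(math.pi * phase)) / 2

# High momentum at the ends, low in the middle where the learning rate peaks.
for pct in (0.0, 0.5, 1.0):
    print(round(one_cycle_momentum(pct), 2))  # prints 0.95, then 0.85, then 0.95
```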

How do we expand the vocab to medical records when transfer learning from WikiText, assuming the vocab only contains high-frequency English words from Wikipedia?

1 Like

I think Jeremy will explain them later.

2 Likes

How do we expand the vocab once the embeddings are learned?

1 Like

fastai will do that automatically for you: when you load your pretrained weights, it adapts them to your model. For words that were present in the pretrained vocab, it uses the learned embeddings, and for new words, it uses the mean of all learned embeddings.
When you train your model, the first stage, with the body frozen, is there to make those new embeddings learn something better.
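For intuition, the embedding-adaptation step might be sketched like this in plain Python (the function name is mine; fastai’s real implementation works on the weight tensors):

```python
def adapt_embeddings(pretrained, old_stoi, new_itos):
    """Build one embedding row per word in the new vocab: copy the pretrained row
    when the word was in the old vocab, otherwise use the mean pretrained row."""
    dim = len(pretrained[0])
    mean_row = [sum(row[j] for row in pretrained) / len(pretrained) for j in range(dim)]
    return [pretrained[old_stoi[w]] if w in old_stoi else list(mean_row)
            for w in new_itos]

old_stoi   = {"the": 0, "cat": 1}
pretrained = [[1.0, 2.0], [3.0, 4.0]]        # embeddings learned on WikiText
new_itos   = ["the", "cat", "tachycardia"]   # medical term unseen in pretraining
emb = adapt_embeddings(pretrained, old_stoi, new_itos)
print(emb[2])  # mean of all pretrained rows: [2.0, 3.0]
```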

8 Likes

How can we use language models for language-to-language translation? So far, I am only aware of the approach with word vectors (last version of the fast.ai course, part 2).

1 Like

If columnar data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?

3 Likes

Are fully connected networks good for very ‘wide’ tabular datasets with real values? Can we train an attention/autoencoder component at the same time as the network, or would a fully connected network plus regularization generally work better?

2 Likes

Is there any good resource that explains how to interpret a neural net, i.e. the feature interactions, multicollinearity?

1 Like

Regarding interpretation: except for the compute time, is there any other concern with calculating variable importance for a neural net?

2 Likes

I think yes. When we build ML models, we use interpretation to say which features are actually driving the result, and in business scenarios we need to deal with those features more closely than others.
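One concrete technique that carries over from classical ML interpretation is permutation importance, and it works on a neural net just as on a random forest: shuffle one column, measure how much the error grows. A toy sketch (the “model” is a hand-built function standing in for a trained network, and the shuffle is a deterministic reversal to keep it reproducible):

```python
def permutation_importance(model, X, y, col):
    """Increase in mean squared error when column col is shuffled
    (here: deterministically reversed) across rows."""
    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    base = mse(X)
    shuffled_col = [row[col] for row in reversed(X)]
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled_col)]
    return mse(X_perm) - base

# Toy "network" that depends heavily on feature 0 and barely on feature 1.
model = lambda r: 5.0 * r[0] + 0.1 * r[1]
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
y = [model(r) for r in X]
print(permutation_importance(model, X, y, 0) > permutation_importance(model, X, y, 1))  # True
```

The only extra concern beyond compute time is that, like for any model, permuting one column ignores correlations between features, so strongly collinear features can share (and hide) importance.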

At my place of work we have the following approach to a cold-start problem:

For context, I am at a financial institution and we have trained a collaborative filtering model on data purchased from a third party which consists of ~40k businesses detailing their cash management needs. Specifically, it might look like business Y has reported it uses lock-box, fraud-management, equity-management, etc. (from a list of about 40 products).

How we use this to build a recommender for our own customers (who were not part of the model training process): the model works by embedding each unique user and product into a space of some fixed dimension and modeling the purchase probability as a dot product (or perhaps a shallow neural network). So given a new user, if we knew where they stood in the embedding space, we would be able to apply our model and say how likely it thinks this user is to want product X.

We build a second model (as Jeremy has suggested) using auxiliary data, things like sales volume, number of employees, and SIC codes (what ‘kind’ of business it is), and train it on the same 40k datapoints: we know these features about the businesses in our purchased dataset, so we build a regressor that maps from these “identifying features” to a position in embedding space, trained with an RMSE loss. Once we are satisfied that we can situate a new client reasonably well from these identifying features, we can take a new user, apply model 2 to place them in embedding space, and then apply the original collaborative filtering model.

Happy to discuss if someone is curious!
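For anyone curious what the two-stage setup looks like mechanically, here is a toy sketch with numpy. Everything is synthetic: the user embeddings are generated as an exact linear function of the auxiliary features so that a least-squares regressor can stand in for model 2 (in practice it would be a gradient-boosted or neural regressor trained on the real data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a trained collaborative-filtering model (embeddings assumed given here).
n_users, n_products, d = 100, 8, 4
product_emb = rng.normal(size=(n_products, d))

# Pretend the user embeddings happen to be a linear function of the auxiliary
# features (sales volume, employees, ...) so the sketch has a clean ground truth.
n_feat = 6
W_true = rng.normal(size=(n_feat, d))
features = rng.normal(size=(n_users, n_feat))
user_emb = features @ W_true

# Stage 2: regress from identifying features to embedding space.
W_fit, *_ = np.linalg.lstsq(features, user_emb, rcond=None)

# Cold start: situate a brand-new user, then score all products by dot product.
new_features = rng.normal(size=(1, n_feat))
new_emb = new_features @ W_fit
scores = new_emb @ product_emb.T
print(scores.shape)  # one score per product: (1, 8)
```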

26 Likes

CNNs for text classification are … quite useful

1 Like

A friend sent me a dataset of NBA team offensive and defensive ratings, consisting of the rating for the previous year and the rating for the current year to date after each game. The goal is to predict what the rating will be at the end of the current year. I was going to try an RNN, but based on what Jeremy just said, it might work better to extract columns? Am I understanding that right? And what does extracting columns mean, exactly?

@alenas
I assume you are discussing time-series data such as sales over time. My understanding is that the fast.ai tabular API separates temporal information into date-parts like day of week, time of day, etc., treats them as categoricals, and feeds them into a linear network; attentional models require an RNN framework. My experience has been that treating time-series data from an RNN point of view can be very successful, but only where there is sufficient data, which varies from problem to problem. Definitely worth experimenting with; it has brought me success in the past! See this thread https://forums.fast.ai/t/time-series-sequential-data-study-group/29686/19 for more discussion on DL approaches to time-series classification/forecasting.

2 Likes
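To make the date-parts idea concrete, here is a tiny sketch of the kind of expansion meant (similar in spirit to fastai’s add_datepart, but hand-rolled, with my own function name):

```python
from datetime import date, timedelta

def date_parts(d):
    """Expand a date into categorical/boolean columns a tabular model can use."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "dayofweek": d.weekday(),                          # 0 = Monday
        "is_month_end": (d + timedelta(days=1)).day == 1,  # next day starts a month
    }

# Each game date in the NBA dataset could be expanded into columns like these.
print(date_parts(date(2018, 12, 31)))
```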