Lesson 4 Advanced Discussion ✅

TextLMDataBunch does not let us set bs or max_vocab anymore. How do we set those?

2 Likes

Language models are typically trained with some form of cross-entropy loss. That loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?

Thinking back to the DeViSE lesson in last year’s part 2, could you train a language model by having it output a word vector, and evaluate the loss by comparing cosine similarity to the correct word vector?

11 Likes
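One way to prototype that idea: replace cross entropy with 1 minus the cosine similarity between the predicted vector and the target word’s vector, so near-synonyms are penalized less than unrelated words. A minimal plain-Python sketch (the function name and the toy vectors are made up for illustration):

```python
import math

def cosine_loss(pred, target):
    """Loss = 1 - cosine similarity between a predicted and a target word vector.
    Identical directions give 0 loss; orthogonal vectors give 1."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

# A near-synonym's vector incurs a small loss; an unrelated vector a larger one.
cat     = [0.9, 0.1, 0.0]
kitten  = [0.8, 0.2, 0.1]   # close in embedding space
economy = [0.0, 0.2, 0.9]   # far away
print(cosine_loss(cat, kitten) < cosine_loss(cat, economy))  # True
```

In a real model you would apply this between the decoder’s output vector and the pretrained embedding of the target token, per position in the sequence.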

I guess we should use the data block API… how do you set bs and max_vocab with this new API?

1 Like
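If I remember the current API correctly, bs goes into the `.databunch(bs=...)` call and max_vocab into the numericalize processor, but check the docs. The toy sketch below (plain Python, my own function names, not the fastai API) just shows what the two knobs actually control:

```python
from collections import Counter

def build_vocab(tokens, max_vocab):
    """Keep only the max_vocab most frequent tokens; everything else maps to xxunk."""
    counts = Counter(tokens)
    itos = ["xxunk"] + [tok for tok, _ in counts.most_common(max_vocab - 1)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def batches(ids, bs):
    """Split a stream of token ids into bs-sized mini-batches (last partial batch dropped)."""
    return [ids[i:i + bs] for i in range(0, len(ids) - bs + 1, bs)]

tokens = "the cat sat on the mat the cat ran".split()
itos, stoi = build_vocab(tokens, max_vocab=4)   # rare words fall out of the vocab
ids = [stoi.get(t, 0) for t in tokens]          # 0 = xxunk for out-of-vocab words
print(len(itos), batches(ids, bs=3))
```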

Do people use CNNs for NLP? Or are people using (Bi)-LSTMs?

1 Like

What is the moms parameter in learn.fit_one_cycle? I didn’t get it… can you please explain it again?

3 Likes
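For what it’s worth, my understanding is that moms in fit_one_cycle is the (max, min) momentum pair for the one-cycle policy: momentum is annealed down while the learning rate warms up, and back up while the learning rate cools, because a high learning rate combined with high momentum tends to be unstable. A rough sketch of the shape (fastai’s real schedule splits the cycle at pct_start, I believe 0.3 by default; this toy version uses a 50/50 split):

```python
import math

def one_cycle_momentum(pct, mom_max=0.95, mom_min=0.85):
    """Momentum at training progress pct in [0, 1]: cosine-anneal from mom_max
    down to mom_min over the first half (while the LR warms up), then back up."""
    if pct < 0.5:
        phase = pct / 0.5          # descending half
        return mom_min + (mom_max - mom_min) * (1 + math.cos(math.pi * phase)) / 2
    phase = (pct - 0.5) / 0.5      # ascending half
    return mom_max + (mom_min - mom_max) * (1 + math.cos(math.pi * phase)) / 2

# High momentum at the ends, low in the middle where the learning rate peaks.
for pct in (0.0, 0.5, 1.0):
    print(round(one_cycle_momentum(pct), 2))  # prints 0.95, then 0.85, then 0.95
```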

How do we expand the vocab to medical records when transfer learning from WikiText, assuming the vocab only contains high-frequency English words from Wikipedia?

1 Like

I think Jeremy will explain them later.

2 Likes

How do we expand the vocab once the embeddings are learned?

1 Like

fastai will do that automatically for you: when you load your pretrained weights, it adapts them to your model. For words that were present in the pretrained vocab, it uses the learned embeddings, and for new words, it uses the mean of all learned embeddings.
When you train your model, the first stage, with the body frozen, is there to make those new embeddings learn something better.
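For intuition, the embedding-adaptation step might be sketched like this in plain Python (the function name is mine; fastai’s real implementation works on the weight tensors):

```python
def adapt_embeddings(pretrained, old_stoi, new_itos):
    """Build one embedding row per word in the new vocab: copy the pretrained row
    when the word was in the old vocab, otherwise use the mean pretrained row."""
    dim = len(pretrained[0])
    mean_row = [sum(row[j] for row in pretrained) / len(pretrained) for j in range(dim)]
    return [pretrained[old_stoi[w]] if w in old_stoi else list(mean_row)
            for w in new_itos]

old_stoi   = {"the": 0, "cat": 1}
pretrained = [[1.0, 2.0], [3.0, 4.0]]        # embeddings learned on WikiText
new_itos   = ["the", "cat", "tachycardia"]   # medical term unseen in pretraining
emb = adapt_embeddings(pretrained, old_stoi, new_itos)
print(emb[2])  # mean of all pretrained rows: [2.0, 3.0]
```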

8 Likes

How can we use language models for language-to-language translation? So far, I am only aware of the approach with word vectors (last version of the fast.ai course, part 2).

1 Like

If columnar data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?

3 Likes

Are fully connected networks good for very ‘wide’ tabular datasets with real values? Can we train an attention/autoencoder component at the same time as the network, or would a fully connected network plus regularization generally work better?

2 Likes

Is there any good resource that explains how to interpret a neural net, i.e. the feature interactions, multicollinearity?

1 Like

Regarding interpretation: except for the compute time, is there any other concern with calculating variable importance for a neural net?

2 Likes

I think yes. When we build ML models, we use interpretation to say which features are actually driving the result, and in business scenarios we need to deal with those features more closely than others.
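One concrete technique that carries over from classical ML interpretation is permutation importance, and it works on a neural net just as on a random forest: shuffle one column, measure how much the error grows. A toy sketch (the “model” is a hand-built function standing in for a trained network, and the shuffle is a deterministic reversal to keep it reproducible):

```python
def permutation_importance(model, X, y, col):
    """Increase in mean squared error when column col is shuffled
    (here: deterministically reversed) across rows."""
    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    base = mse(X)
    shuffled_col = [row[col] for row in reversed(X)]
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled_col)]
    return mse(X_perm) - base

# Toy "network" that depends heavily on feature 0 and barely on feature 1.
model = lambda r: 5.0 * r[0] + 0.1 * r[1]
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
y = [model(r) for r in X]
print(permutation_importance(model, X, y, 0) > permutation_importance(model, X, y, 1))  # True
```

The only extra concern beyond compute time is that, like for any model, permuting one column ignores correlations between features, so strongly collinear features can share (and hide) importance.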

At my place of work we have the following approach to a cold-start problem:

For context, I am at a financial institution and we have trained a collaborative filtering model on data purchased from a third party which consists of ~40k businesses detailing their cash management needs. Specifically, it might look like business Y has reported it uses lock-box, fraud-management, equity-management, etc. (from a list of about 40 products).

How we use this to build a recommender for our own customers (who were not part of the model training process): the model works by embedding each unique user and product into a space of some fixed dimension and modeling the purchase probability as a dot product (or perhaps a shallow neural network). So given a new user, if we knew where they stood in the embedding space, we would be able to apply our model and say how likely it thinks this user is to want product X.

We build a second model (as Jeremy has suggested) using auxiliary data, things like sales volume, number of employees, and SIC codes (what ‘kind’ of business it is), and train it on the same 40k datapoints: we know these features about the businesses in our purchased dataset, so we build a regressor that maps from these “identifying features” to a position in embedding space, trained with an RMSE loss. Once we are satisfied that we can situate a new client reasonably well from these identifying features, we can take a new user, apply model 2 to place them in embedding space, and then apply the original collaborative filtering model.

Happy to discuss if someone is curious!
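For anyone curious what the two-stage setup looks like mechanically, here is a toy sketch with numpy. Everything is synthetic: the user embeddings are generated as an exact linear function of the auxiliary features so that a least-squares regressor can stand in for model 2 (in practice it would be a gradient-boosted or neural regressor trained on the real data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a trained collaborative-filtering model (embeddings assumed given here).
n_users, n_products, d = 100, 8, 4
product_emb = rng.normal(size=(n_products, d))

# Pretend the user embeddings happen to be a linear function of the auxiliary
# features (sales volume, employees, ...) so the sketch has a clean ground truth.
n_feat = 6
W_true = rng.normal(size=(n_feat, d))
features = rng.normal(size=(n_users, n_feat))
user_emb = features @ W_true

# Stage 2: regress from identifying features to embedding space.
W_fit, *_ = np.linalg.lstsq(features, user_emb, rcond=None)

# Cold start: situate a brand-new user, then score all products by dot product.
new_features = rng.normal(size=(1, n_feat))
new_emb = new_features @ W_fit
scores = new_emb @ product_emb.T
print(scores.shape)  # one score per product: (1, 8)
```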

26 Likes

CNNs for text classification are … quite useful

1 Like

A friend sent me a dataset of NBA team offensive and defensive ratings, consisting of the rating for the previous year and the rating for the current year to date after each game. The goal is to predict what the rating will be at the end of the current year. I was going to try an RNN, but based on what Jeremy just said, it might work better to extract columns? Am I understanding that right? And what does extracting columns mean, exactly?

@alenas
I assume you are discussing time-series data such as sales over time. My understanding is that the fast.ai tabular API separates temporal information into date-parts like day of week, time of day, etc., treats them as categoricals, and feeds them into a linear network; attentional models require an RNN framework. My experience has been that treating time-series data from an RNN point of view can be very successful, but only where there is sufficient data, which varies from problem to problem. Definitely worth experimenting with; it has brought me success in the past! See this thread https://forums.fast.ai/t/time-series-sequential-data-study-group/29686/19 for more discussion on DL approaches to time-series classification/forecasting.

2 Likes
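To make the date-parts idea concrete, here is a tiny sketch of the kind of expansion meant (similar in spirit to fastai’s add_datepart, but hand-rolled, with my own function name):

```python
from datetime import date, timedelta

def date_parts(d):
    """Expand a date into categorical/boolean columns a tabular model can use."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "dayofweek": d.weekday(),                          # 0 = Monday
        "is_month_end": (d + timedelta(days=1)).day == 1,  # next day starts a month
    }

# Each game date in the NBA dataset could be expanded into columns like these.
print(date_parts(date(2018, 12, 31)))
```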