Lesson 4 Advanced Discussion ✅


(Jeremy Howard (Admin)) #122

FYI, when I tried concatenating the embeddings in a collab model I got slightly worse results. Perhaps you could do the dot product, then concat the result with the metadata. I'm not sure what best practices are.
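Roughly what I mean, as a quick PyTorch sketch (not anything in the library; the head sizes and `n_meta` are just placeholders):

```python
import torch
import torch.nn as nn

class DotPlusMeta(nn.Module):
    "Toy collab model: dot the user/item embeddings, then concat the metadata."
    def __init__(self, n_users, n_items, n_factors, n_meta):
        super().__init__()
        self.u = nn.Embedding(n_users, n_factors)
        self.i = nn.Embedding(n_items, n_factors)
        self.head = nn.Sequential(nn.Linear(1 + n_meta, 50), nn.ReLU(), nn.Linear(50, 1))

    def forward(self, users, items, meta):
        dot = (self.u(users) * self.i(items)).sum(dim=1, keepdim=True)  # (bs, 1)
        return self.head(torch.cat([dot, meta], dim=1)).squeeze(1)      # (bs,)
```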


(nazim) #123

How has your experience been running the tabular learner on GPU vs CPU? I don't see the GPU being used much.


(Jonathan Miller) #124

Is there any information anywhere about collaborative filtering where your training data is a time series and you want to predict the optimal decision, given a set of choices at each timestep, for a new instance?

For a toy example: a while back I had an idea for creating a model which makes the decisions for you in a certain video game, where every playthrough of the game is made up of the same small number of discrete decisions.

However, there are certain things that you want 1-2 of in general at some point later in the game, but don’t want in the first 5-10 steps of the game, and vice versa. The game data files I have reflect this idea, and I assume that some form of collaborative filtering model could work for this, but how would you implement that idea of the ‘shopping cart’ being a temporal series of decisions?

Has this been studied at all, and if so can anyone link me to some research or search terms?


(Thomas Paul) #125

Hi there,

I want to find similarity between two sentences. I’m thinking of converting the sentence into a vector and then doing cosine similarity for it.

I saw that a similar thing is done for images in DatasetFormatter.from_similars: activations of the last layer of the model are converted into a vector, and cosine similarity between those vectors is used to find similar images. [Medium post] [Code]

Can I use ULMFiT to get the activations of the last layer for a sentence and then compare them to find similarity? Your opinion and guidance are much appreciated.
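Something like this is what I have in mind (a rough sketch assuming a trained fastai v1 language_model_learner called `learn`; the exact shape of the encoder outputs may differ between versions):

```python
import torch
import torch.nn.functional as F

def sentence_vector(learn, text):
    "Encode a sentence with the ULMFiT encoder and mean-pool the last layer."
    xb, _ = learn.data.one_item(text)       # tokenize + numericalize the raw text
    encoder = learn.model[0]                # the AWD-LSTM encoder
    encoder.reset()                         # clear the hidden state
    with torch.no_grad():
        raw_outs, outs = encoder(xb)        # activations from each LSTM layer
    return outs[-1].mean(dim=1).squeeze(0)  # average over sequence positions

v1 = sentence_vector(learn, "The movie was great.")
v2 = sentence_vector(learn, "I really loved this film.")
print(F.cosine_similarity(v1, v2, dim=0))
```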

PS: This is my first post on the forum and I'm a newbie. If this is not the place to post this question, please direct me to the correct topic.

Many Thanks.


(Zarif) #126

Hi everyone, I'm trying to apply collaborative filtering to a business problem where I have all the user ratings for the movies (a fully filled-in matrix), and my main goal is to obtain an accurate embedding matrix that describes each user well. Has anyone done something similar? Let me know and we can share ideas 🙂
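For context, here is roughly how I'm planning to pull the matrix out of a trained fastai collab_learner (attribute names taken from fastai v1's EmbeddingDotBias model; worth double-checking against your version):

```python
# after: learn = collab_learner(data, n_factors=50) and training
user_emb = learn.model.u_weight.weight.detach().cpu().numpy()  # (n_users + 1, 50)
item_emb = learn.model.i_weight.weight.detach().cpu().numpy()
# row 0 is the padding/unknown index; row i lines up with the learner's user classes
```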


(Kabir Khan) #127

@Andreas_Daiminger @seb0 @jeremy
There are 3 main types of recommendation systems:
1. Content-based filtering
This is where you would employ search techniques and unsupervised learning to come up with a representation of each item you want to recommend.
For example, if you have the title, description, and genres for each movie in MovieLens, you could use TF-IDF (or a sum of word embeddings) on the title and description, and something like a simple set intersection on the genres, to discover that Batman Begins is close to The Dark Knight.
This allows you to say “You watched your first movie, here are some similar ones”
This is the simplest approach
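A minimal sketch of that idea with scikit-learn (toy data, just to show the mechanics):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy "title + description" documents, one per movie
docs = [
    "Batman Begins Bruce Wayne becomes Gotham's masked vigilante",
    "The Dark Knight Batman defends Gotham against the Joker",
    "Finding Nemo a clownfish crosses the ocean to find his son",
]
vecs = TfidfVectorizer(stop_words="english").fit_transform(docs)
print(cosine_similarity(vecs)[0])  # row 0: Batman Begins vs. everything else
```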

2. Collaborative Filtering
This is everything @jeremy teaches, so I won't really go over it.
However, it should be noted that explicit collaborative filtering has generally fallen out of favor, and more recent recommendation systems rely on implicit data.

  • Explicit means a rating: the user actually decided to give the movie a star rating, or clicked the like button on a YouTube video, for instance
  • Implicit represents the user's inherent behavior: for example, the user watched the movie but did not rate it.

You generally have significantly more implicit data, and it turns out that people are actually quite bad at rating things, since "5 stars" means something different to every person.

In practice, that means you'd have a User × Movie matrix where the value is 1 if the user has watched the movie and 0 otherwise. A 0 does not mean the user disliked the movie; it just means they haven't interacted with it. You need a new loss function to handle this kind of data, so we generally use something like Bayesian Personalized Ranking https://medium.com/@andresespinosapc/learning-to-rank-bpr-5fe5561d48e0 (or WARP).
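For reference, the BPR objective itself is tiny; a PyTorch sketch, assuming you already have the user/positive/negative embedding lookups:

```python
import torch.nn.functional as F

def bpr_loss(user_emb, pos_emb, neg_emb):
    "BPR: push the watched item's score above a sampled unwatched item's score."
    pos_score = (user_emb * pos_emb).sum(dim=1)   # dot product per example
    neg_score = (user_emb * neg_emb).sum(dim=1)
    return -F.logsigmoid(pos_score - neg_score).mean()
```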

3. Hybrid recommendation engines
This IS NOT the combination of 2 separate models (one for content-based filtering and one for collaborative filtering; that would be an ensemble).

Hybrid recommendation engines run the traditional collaborative filtering process, trying to learn User and Item embeddings. However, they incorporate user-specific and item-specific metadata during the training process.

The user and item embeddings are represented as a sum of the embeddings of all their features. So if the movie John Wick were described by the genres action and thriller, then the item embedding for John Wick would be the sum of the embeddings for action, thriller, and the movie id, instead of just the movie id as in traditional collaborative filtering.

In practice, this is similar to how the TabularLearner works: it learns an embedding for each of the categorical features. Each row in the table would be a different movie and each column a genre. The genre embeddings are learned through SGD, but instead of trying to predict a classification target, we sum the embeddings for each row: E(action) + E(thriller) + E(movie_id) = E(John Wick).
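A sketch of that feature-sum idea in PyTorch (this is roughly what a library like LightFM does under the hood; the class and names here are made up for illustration):

```python
import torch.nn as nn

class FeatureSumEmbedding(nn.Module):
    "Item vector = the sum of the embeddings of all its features."
    def __init__(self, n_features, n_factors):
        super().__init__()
        self.emb = nn.Embedding(n_features, n_factors)

    def forward(self, feature_ids):
        # feature_ids: (bs, n_feats), e.g. [movie_id, action, thriller] per row
        return self.emb(feature_ids).sum(dim=1)   # E(action)+E(thriller)+E(movie_id)
```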

Hope this all makes sense. Feel free to ask questions.
@jeremy are you interested in adding any of this into the fastai library?
I've really only scratched the surface here. There is a whole class of recommendation systems that treats user interactions as a sequence and uses RNNs to predict the next user action, and it's showing a lot of promise right now.
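To give a flavour, a GRU4Rec-style model is basically this (a hypothetical minimal sketch, not a tuned implementation; search terms: "session-based recommendation", "GRU4Rec", which is also relevant to the earlier question about sequences of decisions):

```python
import torch.nn as nn

class SessionGRU(nn.Module):
    "GRU4Rec-flavoured sketch: predict the next item from the interaction history."
    def __init__(self, n_items, n_factors=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_items, n_factors)
        self.gru = nn.GRU(n_factors, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_items)

    def forward(self, item_seq):          # (bs, seq_len) of item ids
        h, _ = self.gru(self.emb(item_seq))
        return self.out(h[:, -1])         # logits over the next item
```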


(Jeremy Howard (Admin)) #128

If you find you can get better results, and the code is clean, then absolutely! 🙂 Although I suspect things like factorization machines may be better still…


(Andreas Daiminger) #131

Hey @seb0

Not sure if you are still interested in this, but I came across Deep Factorization Machines (DeepFM). It's a very interesting approach that combines factorization machines with a neural network. I will be experimenting with this over the next couple of weeks. Here's the paper
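For anyone curious before reading the paper, the core idea fits in a few lines; a rough PyTorch sketch (my own simplification, so take the details with a grain of salt):

```python
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    "Minimal DeepFM sketch: shared embeddings feed both an FM term and an MLP."
    def __init__(self, field_dims, k=16):
        super().__init__()
        n = sum(field_dims)                # total number of feature values
        self.emb = nn.Embedding(n, k)      # shared second-order embeddings
        self.lin = nn.Embedding(n, 1)      # first-order (linear) weights
        self.mlp = nn.Sequential(
            nn.Linear(len(field_dims) * k, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                  # x: (bs, n_fields) of global feature ids
        e = self.emb(x)                    # (bs, n_fields, k)
        # FM pairwise term: 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), summed over k
        fm = 0.5 * (e.sum(1).pow(2) - e.pow(2).sum(1)).sum(1, keepdim=True)
        deep = self.mlp(e.flatten(1))      # the "Deep" half, on the same embeddings
        return torch.sigmoid(self.lin(x).sum(1) + fm + deep).squeeze(1)
```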


(Sebastian Fleck) #132

Yup, I am! I will check it out. Thanks @Andreas_Daiminger


(carson dahlberg) #133

Dumb question (I may have missed this)…

When running out of memory during training, is all lost (e.g. 97% done, then a failed GPU allocation)? In other words, unless you called learn.save(), is it time to start over?

Is there a way to incrementally save progress and, if need be, continue training after restarting the kernel? Am I thinking about this the wrong way? Thanks!


(Andreas Daiminger) #134

Hi @carsondahlberg

In Fastai 0.7 you could use the best_save_name parameter during training to save the best model.
I am sure there is something similar in Fastai 1.0.

You can see a bunch of callbacks for saving your training progress here.
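For example, SaveModelCallback checkpoints whenever the monitored metric improves (fastai 1.0 API, from memory, so double-check the signature):

```python
from fastai.callbacks import SaveModelCallback

# checkpoint whenever validation loss improves, so a crash costs at most one epoch
learn.fit_one_cycle(10, callbacks=[
    SaveModelCallback(learn, every='improvement', monitor='valid_loss', name='best')])

learn.load('best')   # after a kernel restart, resume from the checkpoint
```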