I am about to apply collaborative filtering for matching target audience of different advertisement campaigns and sites/applications. Before some real examples lets take a look at lesson 5 notebook:

- 671 Users, 9066 movies, 1.6% non empty cells in matrix
- Accuracy on train / test : 0.61688 0.76318 (mine real results)

So, we slightly overfit and accuracy on train set should be slightly better.

** Validation set**

```
fact = learn.data.val_y.reshape(-1)
preds = predict(learn.model, learn.data.val_dl)
```

Box plot of predictions:

Correlation is 0.57 (which is very high). Predictions are very wide, but, as Jeremy told, moving up when real ratings are moving up as well. So we learnt something.

** Train set **

```
fact = learn.data.trn_y.reshape(-1)
preds = predict(learn.model, learn.data.trn_dl)
```

Box plot of predictions:

Correlation is 0.00058. Predictions for all ratings looks like have exactly the same distribution. **Anybody has an idea why? **

**Real world case **

Company X is running advertisement campaigns by buying traffic (user visits) from different sites (placements). The idea is quite simple: ad campaigns = userId, placements = movieId, go and build recommendation system. Instead of rankings I use some other number, called conversion rate = user bought product / users viewed an ad. Evident differences from movies rating example:

- % of non empty cells in matrix is 5 times lower - 0.3%
- target values is continuous (instead of 10 possible rating values)
- range of target values might be huge: from 0.0001 to 7
- 60% of values are zero (placement was useless for specific ad campaign)

What I get:

- Some learning is happening

`[ 0. 0.10338 0.09928] [ 1. 0.06246 0.05808] [ 2. 0.05271 0.05509] [ 3. 0.0477 0.04698] [ 4. 0.0454 0.04563] [ 5. 0.0439 0.04541] [ 6. 0.0423 0.04542]`

Hist of predictions and facts looks terrible, CF significantly overestimates real values (x 20-30 times) :

Correlation is 0.1. For train set - same case as with movie ratings: 0 correlation, no visible accuracy. Any hints are highly welcome.

UPDT:

- I substituted continuous values with binary (0 - placement was useless, 1 - placement was useful, I lost information how good was a placement for a campaign but thats ok in my case) - and this improved correlation to 0.42, AUC 0.74 (after some tuning got AUC 0.8), and make prediction accuracy visible and similar to movies rating:

- Logistic regression gives the same accuracy for this dataset (AUC 0.8)
- TSNE of userId embeddings - no visible clusters, no structure.

- For those who are interested in getting deeppppper into CF - A Comparative Study of Collaborative Filtering Algorithms
- After some paper reading I realize
**embeddings start meaning something only if you have really dense matrix. In this case embeddings need to solve complex problem - to fit to multiple varying cross - ratings. If you have highly sparse matrix, say 1 rating for movie-user than CF is not better than any simple algorithm**.