Lesson 6 - Official topic

The matrix is not full; it is sparse. (I mean the matrix mapping users/movies to ratings.)

1 Like
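To make the sparsity point concrete, here is a minimal sketch (with made-up toy numbers, not the course dataset) of storing only the observed (user, movie) → rating entries instead of a full dense grid:

```python
# Hypothetical sketch: the user/movie ratings matrix is mostly empty,
# so we keep only the observed (user, movie) -> rating entries.
n_users, n_movies = 1000, 1700

# made-up sample of observed ratings
ratings = {
    (0, 12): 4.0,
    (0, 431): 5.0,
    (7, 12): 3.0,
}

density = len(ratings) / (n_users * n_movies)
print(f"{len(ratings)} stored entries, density {density:.6f}")

# missing cells are simply absent, not zero
print(ratings.get((5, 99)))  # -> None
```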

This is a general problem in AI that doesn’t have easy answers. Recognizing out-of-distribution samples is difficult and usually requires a multi-pronged approach (i.e., some sort of unsupervised learning). See, for example, this paper on discovering new categories, or this one on the issue of “domain adaptation” (when your training data set is shifted from the test set in some way).

5 Likes

But SVD is still possible on a non-full-rank matrix.

2 Likes
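Right, and the rank deficiency simply shows up as (near-)zero singular values. A small NumPy sketch with a deliberately rank-1 matrix:

```python
# Sketch: SVD runs fine on a rank-deficient matrix; the deficiency
# appears as (near-)zero singular values.
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],   # rows 2 and 3 are multiples of row 1 -> rank 1
              [3., 6.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)  # one large singular value, one ~0

rank = int(np.sum(s > 1e-10))
print(rank)  # -> 1
```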

I think Jeremy might talk about this, but practically speaking these latent factors are much more important and result in a much simpler model. Video-based architectures tend to be very heavy.

Do we always need to build the path for the image and pass it to the ImageBlock? Earlier versions of fastai did this automatically, building path + item + suffix.

Aha! Got him! He ran out of batteries
 so he IS an AI :wink:

10 Likes

It depends on which API you feel like using. For example, in the mid-level API, you can do something like

block = DataBlock(
    [...]
    get_x=ColReader(col_name, pref=f'{PATH}/to/images/', suff='.jpg'),
    get_y=[...]
)
1 Like

Is “@” for matmul overloaded in PyTorch, fastai.core, or base Python?

Dumb question: would collaborative filtering be similar to using SVD for finding similar documents, or similarity in a corpus, in NLP?

The `@` operator itself is base Python (PEP 465, added in Python 3.5); PyTorch overloads it for tensors via `Tensor.__matmul__`.

1 Like
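To illustrate the dispatch without needing PyTorch installed, here is a toy class (purely illustrative, not PyTorch's implementation) that opts into `@` by defining `__matmul__`:

```python
# Sketch: `@` is Python's matrix-multiplication operator (PEP 465);
# any class can support it by defining __matmul__, which is how
# PyTorch wires it up for tensors.

class TinyMat:
    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        # naive O(n^3) matrix multiply, just to show the dispatch
        cols = list(zip(*other.rows))
        return TinyMat([[sum(a * b for a, b in zip(row, col))
                         for col in cols]
                        for row in self.rows])

a = TinyMat([[1, 2], [3, 4]])
b = TinyMat([[5, 6], [7, 8]])
print((a @ b).rows)  # -> [[19, 22], [43, 50]]
```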

thanks

Do the docs show usage for the various ways of building DataBlocks (from_df, from a function, etc.) that we had in the earlier version?

What might help here is converting from e.g. RGB to CMYK – both standard color models in common use. CMYK is more typical for print, whereas RGB is more typical for computer displays – but they can often be used interchangeably, and conversion is pretty easy.

1 Like
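For reference, a minimal sketch of the common naive RGB → CMYK formula (no color-profile handling, just the textbook conversion):

```python
# Naive RGB -> CMYK conversion sketch (no ICC profiles or gamma handling).

def rgb_to_cmyk(r, g, b):
    """r, g, b in 0..255; returns c, m, y, k in 0..1."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0          # pure black
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))   # red -> (0.0, 1.0, 1.0, 0.0)
```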

Isn’t that crazy expensive? An O(n²) operation versus an O(1) operation?

2 Likes

ah, ok, answered already in the class :wink:

dunder = double under[score]

3 Likes
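A quick illustrative example of what dunder methods buy you (toy class, just to show the naming convention in action):

```python
# Sketch: "dunder" (double-underscore) methods are the hooks Python
# uses to wire objects into built-in syntax.

class Deck:
    def __init__(self, cards):   # dunder: called by Deck(...)
        self.cards = cards

    def __len__(self):           # dunder: called by len(deck)
        return len(self.cards)

    def __getitem__(self, i):    # dunder: called by deck[i]
        return self.cards[i]

deck = Deck(["A", "K", "Q"])
print(len(deck))   # -> 3
print(deck[0])     # -> 'A'
```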

Do DNN-based models for collaborative filtering work better than more traditional approaches like SVD or other matrix decompositions?

3 Likes

Yep, you should definitely look these up in the documentation! For vision the DataBlock/DataLoaders page is very straightforward, e.g., http://dev.fast.ai/vision.data#ImageDataLoaders.from_df

1 Like

Thanks for the detailed explanation @jwuphysics! That was my point - AFAIK it’s not easy/trivial to acknowledge an unknown class, and I wouldn’t expect that simply using multi-label would solve this problem. Unless the BCEWithLogitsLoss loss function mentioned by @imrandude is robust enough to handle this type of situation. If so, that would be great news! :slight_smile: But it seems I will have to test it myself to see what happens.

2 Likes

Can any matrix factorization be modeled as a (deep) neural network? Are there papers that explain this?

In theory a NN can approximate any function, so it should be feasible.
 I don’t know of any papers specifically on matrix factorization, though.
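One way to see the connection: the dot-product collaborative filtering model is itself a tiny neural network with two embedding layers and no hidden activations. A minimal NumPy sketch (toy made-up ratings, plain per-sample SGD on squared error):

```python
# Sketch: matrix factorization as a shallow "NN" -- user and item
# embedding vectors whose dot product predicts the rating.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 3

# toy observed ratings: (user, item, rating)
data = [(0, 1, 4.0), (0, 3, 5.0), (2, 0, 3.0), (4, 2, 2.0)]

U = rng.normal(scale=0.1, size=(n_users, k))  # user embeddings
V = rng.normal(scale=0.1, size=(n_items, k))  # item embeddings

lr = 0.05
for epoch in range(1000):
    for u, i, r in data:
        err = U[u] @ V[i] - r     # prediction error
        gU = err * V[i]           # gradients of 0.5 * err**2
        gV = err * U[u]
        U[u] -= lr * gU
        V[i] -= lr * gV

# training error on the observed entries should be tiny after fitting
mse = float(np.mean([(U[u] @ V[i] - r) ** 2 for u, i, r in data]))
print(mse)
```

Swapping the dot product for an MLP over the concatenated embeddings gives the "deep" variants of this idea.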