Lesson 7 official topic

Is it much harder to add a second head for a different task? Like one for regression and one for classification?

2 Likes

Would a multi-target model make sense for NLP, compared to having a string with fields separated by xxfld and mark_fields=True?

Not at all.
As long as you change the loss function, you are good.
Let's assume, for the sake of argument, that we want to predict the height of the rice plant in cm together with the disease.
Then you’ll need 3 things:

  1. 11 network outputs = 10 for classification (diseases) + 1 for regression (plant height)
  2. Adapt the loss function so that the 11th output is evaluated with RMSE instead of cross-entropy (see the sketch after this list)
  3. Adapt the dataloader to include the height of the plant

I think that’s it
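
For illustration, here is a minimal sketch of step 2 in plain PyTorch. The function name, the separate target arguments, and the `alpha` weighting are my own assumptions, not something from the lesson:

```python
import torch
import torch.nn.functional as F

# Hypothetical combined loss: cross-entropy over the first 10 outputs
# (disease classes), RMSE over the 11th (plant height in cm).
def disease_height_loss(preds, disease_targ, height_targ, alpha=1.0):
    clf_loss = F.cross_entropy(preds[:, :10], disease_targ)
    reg_loss = torch.sqrt(F.mse_loss(preds[:, 10], height_targ))
    return clf_loss + alpha * reg_loss  # alpha balances the two scales
```

In practice you would likely also normalize the height target, so the two loss terms end up on comparable scales.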

4 Likes

Can the number of latent factors (the size of the user or movie embeddings) be smaller or larger?
How would this affect the predictions?

1 Like

Can recommendation systems be built on the average ratings of users' experiences with, say, a product, instead of collaborative filtering? Just thinking…

1 Like

So when should collaborative filtering be used for recommendation systems, versus the average ratings (if possible) that I mentioned above? Any best practices?

It looks like collaborative filtering depends on being able to fit everything in one big dataframe. Is there a good way to scale this when there is more data than fits in a single dataframe at once, given the machine's memory?

2 Likes

Has nobody done a hyperparameter optimisation/exploration for embeddings for collaborative filtering?

1 Like

You can try that. I think you can use non-negative matrix factorization as a baseline. However, you'll need to know the number of movies in advance, and the empty cells would be zero. Also note that your matrix must not contain negative numbers.
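
A minimal sketch of that baseline with scikit-learn's `NMF`; the toy ratings matrix and the number of components are made up:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical user x movie ratings matrix: rows are users, columns are
# movies (count known in advance), unrated cells are zero, no negatives.
ratings = np.array([[5., 3., 0., 1.],
                    [4., 0., 0., 1.],
                    [1., 1., 0., 5.]])

nmf = NMF(n_components=2, init="random", random_state=0)
W = nmf.fit_transform(ratings)  # user factors, shape (3, 2)
H = nmf.components_             # movie factors, shape (2, 4)
preds = W @ H                   # filled-in ratings for every cell
```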

You can use the method that was introduced called gradient accumulation. Other options are leaning more on PyTorch or NumPy and sidestepping pandas. In addition, Spark and Dask could help too.
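
As a rough sketch of what gradient accumulation does (assuming `model`, `optimizer`, `loss_func`, and `dataloader` already exist; `accum_steps` is an arbitrary choice):

```python
accum_steps = 4  # hypothetical: step the optimizer every 4 mini-batches

optimizer.zero_grad()
for i, (xb, yb) in enumerate(dataloader):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so grads average
    loss.backward()                                # grads accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```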

2 Likes

What if we have a kind of network among the users, and/or among the items? For example a hierarchy of products, or a prior grouping of users. What would be a good way to make use of that information to improve the results?

So if the combined shape doesn’t suit the two different tasks you just need to waste a few indexes in a larger array that fits them both?

My understanding is that gradient accumulation has to do with loading batches and training the model. I'm referring to simply creating the dataframe in the first place, e.g. opening a 100 GB CSV file.

Is there any equivalent to opening text files in a folder like with NLP? That way it doesn’t all need to be loaded into memory at once.

Oh, thanks for the clarification. You can either load the dataset in batches (see the chunksize argument in the pandas docs) https://towardsdatascience.com/loading-large-datasets-in-pandas-11bdddd36f7b or use the libraries I mentioned, Spark and Dask, which mostly break the work down and pass it to coordinated workers.
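
A sketch of the chunked approach (the file name and column names are hypothetical):

```python
import pandas as pd

# Stream a large CSV in 1M-row chunks instead of loading it all at once.
sums, counts = {}, {}
for chunk in pd.read_csv("ratings.csv", chunksize=1_000_000):
    grouped = chunk.groupby("movie")["rating"]
    for movie, s in grouped.sum().items():
        sums[movie] = sums.get(movie, 0.0) + s
    for movie, n in grouped.size().items():
        counts[movie] = counts.get(movie, 0) + n

avg_rating = {m: sums[m] / counts[m] for m in sums}  # per-movie mean rating
```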

4 Likes

As far as I understand, pandas DataFrames can already handle more data than the hardware can process. The limit is not the dataframe but the hardware.

1 Like

I understood the OP's question to be about how to deal with datasets larger than can fit in memory for a given architecture (be it 32-, 64-, or 128-bit, etc.), as that is the upper limit of what any software can do (aside from OS-level limitations, e.g. 32-bit systems only being able to handle addresses/files up to 2^32).

I always find loading everything up into memory to be problematic, but I'm old skool that way. I got my wrist slapped in a systems class for writing a vi clone which tried to load more than 1 KB of the underlying file at a time ("because you cannot expect the system to have enough RAM to load all your stuff at once") … 🙂

2 Likes

Not sure I understand. Can you please elaborate?

So if I had an existing task like segmentation, where the output is a tensor, how would I add an additional set of differently shaped outputs? Surely it would make sense to go further up the network and add a second head?

1 Like

Segmentation is a bit of a special case, since the best approach will be something like a U-Net, which does indeed need a very different head.

But in general you could just flatten the outputs into a vector, and do what @FraPochetti described.
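
For instance, a hypothetical shared-body model whose single flattened output vector gets sliced per task in the loss; all names and sizes here are illustrative:

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, body, n_features, n_classes=10, n_reg=1):
        super().__init__()
        self.body = body  # shared feature extractor, e.g. a CNN backbone
        self.head = nn.Linear(n_features, n_classes + n_reg)

    def forward(self, x):
        feats = self.body(x).flatten(1)  # flatten to (batch, n_features)
        return self.head(feats)          # one combined output vector
```

The loss would then slice `out[:, :n_classes]` for classification and `out[:, n_classes:]` for regression, as in the earlier sketch.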

3 Likes

Okay thanks. I’ll ask more detailed questions when I get closer to trying it.

2 Likes