Lesson 7 official topic

Is it much harder to add a second head for a different task? Like one for regression and one for classification?

2 Likes

Would a multi-target model make sense for NLP, compared to having a string with fields separated by xxfld and mark_fields=True?

Not at all.
As long as you change the loss function, you are good.
Let's assume, for the sake of argument, that we want to predict the height of the rice plant in cm together with the disease.
Then you’ll need 3 things:

  1. 11 network outputs = 10 for classification (diseases) + 1 for regression (plant height)
  2. Adapt the loss function so that the 11th output is evaluated with RMSE instead of cross-entropy (see the sketch after this list)
  3. Adapt the dataloader to include the height of the plant

I think that’s it
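
For illustration, here is a minimal sketch of step 2 in plain PyTorch. The function name, the separate target arguments, and the `alpha` weighting are my own assumptions, not something from the lesson:

```python
import torch
import torch.nn.functional as F

# Hypothetical combined loss: cross-entropy over the first 10 outputs
# (disease classes), RMSE over the 11th (plant height in cm).
def disease_height_loss(preds, disease_targ, height_targ, alpha=1.0):
    clf_loss = F.cross_entropy(preds[:, :10], disease_targ)
    reg_loss = torch.sqrt(F.mse_loss(preds[:, 10], height_targ))
    return clf_loss + alpha * reg_loss  # alpha balances the two scales
```

In practice you would likely also normalize the height target, so the two loss terms end up on comparable scales.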

4 Likes

Can the number of latent factors (the size of the user or movie embeddings) be smaller or larger?
How would this affect the predictions?

1 Like

Can recommendation systems be built on the average ratings of users' experiences with, say, a product, instead of collaborative filtering? Just thinking…

1 Like

So when should collaborative filtering be used for recommendation systems, versus the average ratings (if possible) that I mentioned above? Any best practices?

It looks like collaborative filtering depends on being able to fit everything in one big dataframe. Is there a good way to scale this when there is more data than fits in a single dataframe at once, given the machine's memory?

2 Likes

Has nobody done a hyperparameter optimisation/exploration for embeddings for collaborative filtering?

1 Like

You can try that. I think you can use non-negative matrix factorization as a baseline. However, you'll need to know the number of movies in advance, and the empty cells would be zero. Also note that your matrix must not contain negative numbers.
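
A minimal sketch of that baseline with scikit-learn's `NMF`; the toy ratings matrix and the number of components are made up:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical user x movie ratings matrix: rows are users, columns are
# movies (count known in advance), unrated cells are zero, no negatives.
ratings = np.array([[5., 3., 0., 1.],
                    [4., 0., 0., 1.],
                    [1., 1., 0., 5.]])

nmf = NMF(n_components=2, init="random", random_state=0)
W = nmf.fit_transform(ratings)  # user factors, shape (3, 2)
H = nmf.components_             # movie factors, shape (2, 4)
preds = W @ H                   # filled-in ratings for every cell
```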

You can use the method that was introduced called gradient accumulation. Other options are leaning more on PyTorch or NumPy and sidestepping pandas. In addition, Spark and Dask could help too.
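
As a rough sketch of what gradient accumulation does (assuming `model`, `optimizer`, `loss_func`, and `dataloader` already exist; `accum_steps` is an arbitrary choice):

```python
accum_steps = 4  # hypothetical: step the optimizer every 4 mini-batches

optimizer.zero_grad()
for i, (xb, yb) in enumerate(dataloader):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so grads average
    loss.backward()                                # grads accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```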

2 Likes

What if we have a kind of network among the users, and/or among the items? For example a hierarchy of products, or a prior grouping of users. What would be a good way to make use of that information to improve the results?

So if the combined shape doesn’t suit the two different tasks you just need to waste a few indexes in a larger array that fits them both?

My understanding is that gradient accumulation has to do with loading batches and training the model. I'm referring to simply creating the dataframe in the first place, e.g. opening a 100 GB CSV file.

Is there any equivalent to opening text files in a folder like with NLP? That way it doesn’t all need to be loaded into memory at once.

Oh, thanks for the clarification. You can either load the dataset in batches (see the chunksize argument in the pandas docs) https://towardsdatascience.com/loading-large-datasets-in-pandas-11bdddd36f7b or use the libraries I mentioned, Spark and Dask, which mostly break the work down and pass it to coordinated workers.
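
A sketch of the chunked approach (the file name and column names are hypothetical):

```python
import pandas as pd

# Stream a large CSV in 1M-row chunks instead of loading it all at once.
sums, counts = {}, {}
for chunk in pd.read_csv("ratings.csv", chunksize=1_000_000):
    grouped = chunk.groupby("movie")["rating"]
    for movie, s in grouped.sum().items():
        sums[movie] = sums.get(movie, 0.0) + s
    for movie, n in grouped.size().items():
        counts[movie] = counts.get(movie, 0) + n

avg_rating = {m: sums[m] / counts[m] for m in sums}  # per-movie mean rating
```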

4 Likes

As far as I understand, pandas DataFrames can already handle more data than the hardware can process. The limit is not the dataframe but the hardware.

1 Like

I understood the OP's question to be about how to deal with datasets larger than can fit in memory for a given architecture (be it 32-, 64-, or 128-bit, etc.), as that is the upper limit of what any software can do (aside from OS-level limitations, e.g. 32-bit systems only being able to handle addresses/files up to 2^32).

I always find loading everything up into memory to be problematic, but I'm old skool that way. I got my wrist slapped in a systems class for writing a vi clone which tried to load more than 1 KB of the underlying file at a time ("because you cannot expect the system to have enough RAM to load all your stuff at once") … 🙂

2 Likes

Not sure I understand. Can you please elaborate?

So if I had an existing task like segmentation, where the output is a tensor, how would I add an additional set of differently shaped outputs? Surely it would make sense to go further up the network and add a second head?

1 Like

Segmentation is a bit of a special case, since the best approach will be something like a U-Net, which does indeed need a very different head.

But in general you could just flatten the outputs into a vector, and do what @FraPochetti described.
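
For instance, a hypothetical shared-body model whose single flattened output vector gets sliced per task in the loss; all names and sizes here are illustrative:

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, body, n_features, n_classes=10, n_reg=1):
        super().__init__()
        self.body = body  # shared feature extractor, e.g. a CNN backbone
        self.head = nn.Linear(n_features, n_classes + n_reg)

    def forward(self, x):
        feats = self.body(x).flatten(1)  # flatten to (batch, n_features)
        return self.head(feats)          # one combined output vector
```

The loss would then slice `out[:, :n_classes]` for classification and `out[:, n_classes:]` for regression, as in the earlier sketch.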

3 Likes

Okay thanks. I’ll ask more detailed questions when I get closer to trying it.

2 Likes