"None of [Index(['SalePrice'], dtype='object')] are in the [columns]"

I keep getting the error above when creating the Tabular object. Does anyone know why? Thanks.

Thanks. It worked when I changed to ‘.write_text()’.

I got a similar error, although the same code worked before. This is 09_tabular.ipynb, run on Google Colab. Has a fastai update changed something in the fastbook?

AttributeError Traceback (most recent call last)
in () ----> 1 (path/'to.pkl').save(to)
AttributeError: 'PosixPath' object has no attribute 'save'

Same problem with Colab and fastai == 2.0.16.
I’m not sure this is the right way to do it, but I found the save_pickle and load_pickle functions in fastcore/utils.py. I first searched GitHub for the save method without success, then searched for “pickle” directly.

save_pickle(path/'to.pkl', to)
to = load_pickle(path/'to.pkl')
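If those helpers are not importable in your environment, the same round trip can probably be done with the standard library. This is only a minimal sketch: save_obj and load_obj are hypothetical names, the dictionary stands in for the real `to` object, and a temporary directory stands in for `path`.

```python
import pickle
import tempfile
from pathlib import Path

def save_obj(fn, obj):
    # Write any picklable object to the file at fn
    with open(fn, "wb") as f:
        pickle.dump(obj, f)

def load_obj(fn):
    # Read back an object written by save_obj
    with open(fn, "rb") as f:
        return pickle.load(f)

# Stand-ins for the notebook's `path` and `to` (assumptions for illustration)
path = Path(tempfile.mkdtemp())
to_stand_in = {"cat_names": ["ProductSize"], "cont_names": ["saleElapsed"]}

save_obj(path / "to.pkl", to_stand_in)
restored = load_obj(path / "to.pkl")
print(restored == to_stand_in)
```

In the notebook you would pass the real `to` and your own `path` instead of the stand-ins.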


Thanks for the information. That solved the problem.


Lesson 7 - chapter 9 in the book - Tabular data
Is there any value or advantage in running the random forest on a GPU compared to a CPU? I can understand the value of a GPU when we do mathematical manipulation on many small objects (e.g. images) simultaneously, but for a large tabular dataset maybe a CPU could be better…
I’d appreciate your clarification.

I don’t think sklearn supports GPUs. The new fast.ai tabular object is a way to prepare data for further processing, but the classical ML library sklearn runs on the CPU. Still, it is nice to have your data prepared for any approach in the tabular object, so you can try a deep learning model after the random forest.

Problems downloading Kaggle data through the API: I signed in at Kaggle, accepted the competition rules, and installed kaggle in the terminal. I can download the zip file manually, but the API download fetches no files. So eventually I just uploaded the zip, unzipped it in Paperspace, and carried on…lol

This makes very little sense to me…
“However, in this case Kaggle tells us what metric to use: root mean squared log error (RMSLE) between the actual and predicted auction prices. We need do only a small amount of processing to use this: we take the log of the prices, so that rmse of that value will give us what we ultimately need:

dep_var = 'SalePrice'

df[dep_var] = np.log(df[dep_var])”

The sale price (dep_var) is what is being predicted, so where is the difference from the original value computed? Where is SalesPredicted - Sales? I feel so lost again…

This code from the Notebook finds the most similar movie:

movie_factors = learn.model.i_weight.weight
idx = dls.classes['title'].o2i['Silence of the Lambs, The (1991)']
distances = nn.CosineSimilarity(dim=1)(movie_factors, movie_factors[idx][None])
idx = distances.argsort(descending=True)[1]

Any idea what to change to find, say, the 5 most similar movies instead of one?

Sure, you can use this code to get the 5 most similar movies:

idx = distances.argsort(descending=True)[1:6]

The argsort method returns the movie indices sorted by descending similarity. The most similar movie is the movie itself, at position 0, so the other movies start at position 1 and the slice [1:6] picks the next five.
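To make the slicing concrete, here is a small pure-Python sketch of the same idea. The titles and two-dimensional factors are made up for illustration, and sorted() stands in for the tensor argsort used in the notebook.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical titles and latent factors (not real model weights)
titles = ["A", "B", "C", "D", "E", "F", "G"]
factors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.5, 0.5],
           [0.2, 0.9], [0.0, 1.0], [-1.0, 0.0]]

query = 0  # index of the movie we want neighbours for
sims = [cosine(f, factors[query]) for f in factors]

# Indices sorted by descending similarity, like argsort(descending=True)
order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)

# Position 0 is the query movie itself, so skip it and take the next five
top5 = order[1:6]
print([titles[i] for i in top5])
```

Changing the upper bound of the slice changes how many neighbours you get, for example [1:11] for ten.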


Thanks!! @johannesstutz :smiley:

Hey guys. So I am doing some experimentation on the Collab Notebook.

learn.fit_one_cycle(5, 5e-3)

Here Jeremy used 5e-3 as the max learning rate. I wanted to find out why he used that exact number, so I ran lr_find and tried a different learning rate. The suggested one was 4e-6, but when I used it the loss was far worse (13.5 instead of 0.87 with Jeremy’s learning rate).

Does anyone know why this happens? Or how to find an optimal learning rate for the DotProduct model?

I am also having the same confusion. I mean how do you determine which matrices to use?

The step you cited replaces the values in the SalePrice column (which are in absolute US dollars I think) with the logarithm of the sale price. The reason for this is that the metric that the competition used is on a log scale (root mean squared log error). So if we just convert the dependent variable to a log scale, we can use the (standard) RMSE error and we’re good.

SalesPredicted - Sales: I’m not sure what you mean by that. The loss for every row is determined by the RMSE function, which takes the predicted value and the true value from the SalePrice column as arguments.
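A tiny sketch may make this concrete. The prices below are invented, and RMSLE is written with a plain log to match the np.log call quoted from the book. The point is that the subtraction still happens, just between log values.

```python
import math

def rmse(pred, targ):
    # Root mean squared error: the prediction minus target step lives here
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, targ)) / len(targ))

def rmsle(pred, targ):
    # Root mean squared log error on raw (dollar) values
    return math.sqrt(
        sum((math.log(p) - math.log(t)) ** 2 for p, t in zip(pred, targ)) / len(targ)
    )

# Made-up sale prices in dollars (illustrative only)
actual = [66000.0, 57000.0, 10000.0]
predicted = [60000.0, 59000.0, 12000.0]

# Take logs once up front, as the notebook does with the SalePrice column
log_actual = [math.log(x) for x in actual]
log_predicted = [math.log(x) for x in predicted]

print(rmse(log_predicted, log_actual))
print(rmsle(predicted, actual))
```

Both lines print the same number, which is why converting the column to logs once lets you use ordinary RMSE afterwards.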

Let me know if that helped a little :slight_smile:


Hi everyone, I’m working on using the entity embeddings of the neural net to improve random forest results. This is all in the chapter 09_tabular notebook with the bulldozer bluebook dataset.

The first stumbling block: I don’t quite get the dimensions of the embeddings. Every categorical variable should get its own embedding layer. This seems right:

embeds = list(learn.model.embeds.parameters())

len(embeds) as well as len(cat_nn) is 13.

Now my understanding was that the first dimension of the embedding layer is equal to the number of levels for the variable. The other dimension is determined by a heuristic that works well in practice.

However, these numbers don’t match.

for i in range(len(cat_nn)):
    print(embeds[i].shape, df_nn_final[cat_nn[i]].nunique())

This gives the following result:

torch.Size([73, 18]) 73
torch.Size([7, 5]) 6
torch.Size([3, 3]) 2
torch.Size([75, 18]) 74
torch.Size([4, 3]) 3
torch.Size([5242, 194]) 5281
torch.Size([178, 29]) 177
torch.Size([5060, 190]) 5059
torch.Size([7, 5]) 6
torch.Size([13, 7]) 12
torch.Size([7, 5]) 6
torch.Size([5, 4]) 4
torch.Size([18, 8]) 17

Where does the mismatch come from? Am I maybe using the wrong dataframes or do I have a wrong conception about embeddings?

Thank you!
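On the second dimension: as far as I can tell, the widths in that output are consistent with the rule min(600, round(1.6 * n ** 0.56)) applied to the first dimension. I believe this is what fastai's emb_sz_rule does, but treat the exact formula as an assumption and check your installed version. A quick sketch against the shapes printed above:

```python
def emb_sz_rule(n_cat):
    # Assumed heuristic: width grows slowly with cardinality, capped at 600
    return min(600, round(1.6 * n_cat ** 0.56))

# First and second dimensions copied from the printed embedding shapes
first_dims = [73, 7, 3, 75, 4, 5242, 178, 5060, 7, 13, 7, 5, 18]
second_dims = [18, 5, 3, 18, 3, 194, 29, 190, 5, 7, 5, 4, 8]

predicted = [emb_sz_rule(n) for n in first_dims]
print(predicted == second_dims)
```

As for the first dimension, most rows are nunique() + 1, which would fit one extra row being reserved for missing or unknown values. That is my reading of the output, not something confirmed in this thread, and it does not explain the 73 and 5242 rows.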

Thanks johannesstutz

Yes, that helped a lot. I will continue fumbling my way through the code.

Though I have hit my next error already…

Everything, even the Kaggle download, worked up until the line

Which throws the traceback
AttributeError Traceback (most recent call last)
----> 1 (path/'to.pkl').save(to)

AttributeError: 'PosixPath' object has no attribute 'save'

I did some googling and found

which seems to say that this error is raised when path is used on a Linux system, where it defaults to a PosixPath object that has no ‘save’ attribute or method.

Still researching. Any help appreciated.

There was a breaking change in the source code:

