Lesson 7 - Official topic

Thanks @johannesstutz.

I am summarizing my understanding:

Model prediction for a user = sigmoid_range(dot product of the embedding vectors (the single row of user_weight for that user multiplied with all the rows of item_weight) + user bias + item bias, *self.y_range)

sigmoid_range(u_weight * i_weight + u_bias + i_bias, *self.y_range)

Referring to the output of learn.model in 08_collab.ipynb, I am putting it in matrix multiplication form:
sigmoid_range( matrix(1,50) * matrix(1635, 50) + matrix(944,1) + matrix(1635,1), *self.y_range)

Please do confirm my understanding.

Regards
Ganesh Bhat

This looks good, however for the user bias you'll only want to use the bias for your specific user, so it's just a single value you are adding.
For the multiplication of the weight vectors you could either use elementwise multiplication and take the sum:
(matrix(1,50) * matrix(1635, 50)).sum(dim=1)
or just matrix multiply them, making sure the dimensions match:
matrix(1,50) @ matrix(1635, 50).t()
which makes the second matrix of shape (50, 1635).
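To convince yourself the two routes agree, here is a minimal PyTorch sketch (the shapes just mirror the example above):

import torch

user = torch.randn(1, 50)      # one user's embedding vector
items = torch.randn(1635, 50)  # embeddings for all items

# elementwise multiply (broadcast across items), then sum over the 50 factors
scores_mul = (user * items).sum(dim=1)       # shape (1635,)

# matrix multiply against the transposed item matrix
scores_mm = (user @ items.t()).squeeze(0)    # (1, 1635) -> (1635,)

print(torch.allclose(scores_mul, scores_mm))  # True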

I hope this helped. Just play around with it; it took me a while to get a feel for the vector and matrix stuff :slight_smile:

When I fit a decision tree on one categorical feature and run scikit-learn's plot_tree, I get a tree diagram that shows splitting using <= rather than equality, which seems to contradict this bit of 09_tabular.ipynb:

Try splitting the data into two groups, based on whether they are greater than or less than that value (or if it is a categorical variable, based on whether they are equal to or not equal to that level of that categorical variable).

Is the passage wrong, or am I misunderstanding something?

Here's my code:

import matplotlib.pyplot as plt
import pandas as pd
import sklearn.datasets
from sklearn.tree import DecisionTreeRegressor, plot_tree

boston = sklearn.datasets.load_boston()

X = pd.DataFrame(data=boston['data'], columns=boston['feature_names'])
X.loc[:10, "CHAS"] = 2  # adding a third level for generality
X = pd.DataFrame(pd.Categorical(X.loc[:, "CHAS"]))  # keep only CHAS, stored as a pandas Categorical
y = boston['target']

dtr = DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)

plot_tree(dtr, feature_names=["CHAS"], filled=True)

And here's the output:

[Screenshot: plot_tree output showing <= splits on CHAS]

Hey there, I've got an error when importing fastbook:
name 'log_args' is not defined

Note: I'm running the notebook on Paperspace.

Can anyone explain the meaning of setting max_card equal to 1 in this code:

cont,cat = cont_cat_split(df, 1, dep_var=dep_var)

Does that mean all variables are treated as continuous?

Looking at the source code, it defines every column of type "float" as continuous. Integer columns depend on the cardinality: if max_card is set to 1, then every integer column is treated as continuous as well. Every other column is categorical.
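Roughly, the logic looks like this (a simplified sketch of what the fastai source does, not a verbatim copy):

import pandas as pd

def cont_cat_split_sketch(df, max_card=20, dep_var=None):
    # classify columns as continuous or categorical by dtype and cardinality
    cont_names, cat_names = [], []
    for col in df.columns:
        if col == dep_var:
            continue
        # floats are always continuous; integers only when they have
        # more than max_card distinct values
        if pd.api.types.is_float_dtype(df[col]) or (
            pd.api.types.is_integer_dtype(df[col]) and df[col].nunique() > max_card
        ):
            cont_names.append(col)
        else:
            cat_names.append(col)
    return cont_names, cat_names

So with max_card=1, any integer column with at least two distinct values ends up continuous.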

Hello everyone, please can someone help me with this. I don't know what I am doing wrong.

I'm running into an error I can't seem to fix; any help will be appreciated.

[Errno 2] No such file or directory: '/root/.fastai/archive/bluebook'

Even though I am following the exact steps in the notebook, I keep getting this error when I run this code:

if not path.exists():
    path.mkdir()
    api.competition_download_cli('bluebook-for-bulldozers', path=path)
    file_extract(path/'bluebook-for-bulldozers.zip')

path.ls(file_type='text')


EDIT: I SOLVED THIS.

Anyone running into a similar problem, change the path.mkdir() line so the block reads:

if not path.exists():
    path.mkdir(parents=True)

The problem seems to come from the fact that one of the parent folders doesn't exist.
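For illustration (the path is just the one from the error message):

from pathlib import Path

p = Path('/root/.fastai/archive/bluebook')
# p.mkdir() raises FileNotFoundError when /root/.fastai/archive is missing;
# parents=True creates any missing parent directories first
p.mkdir(parents=True, exist_ok=True)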

@Chikwado This might solve your problem too


Hi Chikwado and jimmiemunyi, hope all is well!

I was looking at the code and noticed there is a logical error: if the path already exists before you run the code (for example, because an earlier attempt created it), the test

if not path.exists():

fails, so the three instructions inside it, including the download and the extract, will never run.

if not path.exists():
    path.mkdir()

api.competition_download_cli('bluebook-for-bulldozers', path=path)
file_extract(path/'bluebook-for-bulldozers.zip')

The code should probably be as above.

hope this helps.
Cheers mrfabulous1 :grinning: :grinning:


It's a late reply, but if you have not figured this out, and for others:

If you go into the hierarchy.py file and change:
if labels and Z.shape[0] + 1 != len(labels):
to:
if (labels is not None) and (Z.shape[0] + 1 != len(labels)):

Then restart the kernel for this change to take effect.
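For anyone curious why the original line fails: a NumPy array has no single truth value, so the bare if labels check raises an error when labels is an array. A tiny demonstration:

import numpy as np

labels = np.array([1, 2, 3])
try:
    if labels:  # what the original line effectively does
        pass
except ValueError as e:
    print(e)  # "The truth value of an array with more than one element is ambiguous..."

# the explicit None check avoids the ambiguity entirely
if labels is not None:
    print(len(labels))  # 3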

Thanks. As you guessed, I have already done what you mentioned.

Hello, I have a question about chapter 9 (lesson 7): what exactly does FillMissing do? I ask because my understanding is that it fills missing values with the median of the column, but in the picture I attached we still handle missing values later, despite using FillMissing earlier.
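(For context, here is a minimal sketch of what FillMissing does, assuming fastai v2's tabular API; the DataFrame is made up:)

from fastai.tabular.all import *
import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': [0, 1, 0]})
to = TabularPandas(df, procs=[FillMissing], cont_names=['a'], cat_names=['b'])

# 'a' has its missing value replaced by the column median, and a new
# boolean column 'a_na' records which rows were originally missing
print(to.items)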

You're right. For a few minutes I was completely confused, but then I realized that the authors have made a logical mistake.


I think you're right. This appears to be a typo in the book. max_card=1 doesn't make any sense to me.

cred_path.write_text(creds)

You need to change write to write_text.

path.mkdir(parents=True)

You did not add parents=True.

Very helpful reply. I had been trying different things for the last two days. I understood there was a problem with the labels, because it plotted fine without them, although the labels are just numbers. However, I never imagined one would need to change the actual scipy function. Thank you very much.

I agree that max_card of 1 is weird, and it took me a while to figure out what was going on: setting max_card to 1 I got 51 categorical variables; setting it to 9000 I got 60 categorical variables. I then started investigating some of the 51 from the first case and found out that they all had more than one category.

If you look at the source code for the cont_cat_split(...) function (e.g. here), you see where the trick is: a variable is considered continuous if it has integer values with more than max_card distinct levels, or if it has float values. In the case of the 51 categorical variables, they are all string-valued!
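A quick way to verify this (df and dep_var as defined in the notebook):

cont, cat = cont_cat_split(df, 1, dep_var=dep_var)

# the categorical columns should all be string-valued (object dtype) ...
print(df[cat].dtypes.value_counts())

# ... and each should still have more than one level
print(df[cat].nunique().min())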

I'm having trouble understanding the answer to this question in the notebook 08 questionnaire:

  1. Why do we need Embedding if we could use one-hot encoded vectors for the same thing?

Embedding is computationally more efficient. The multiplication with one-hot encoded vectors is equivalent to indexing into the embedding matrix, and the Embedding layer does this. However, the gradient is calculated such that it is equivalent to the multiplication with the one-hot encoded vectors.

I understand the first two sentences, though I really don't understand the "gradient is equivalent" part. Can someone share a concrete example of calculating the gradient of the multiplication with the one-hot encoded vectors? How can multiplication have any sort of gradient?
Lastly, how does the Embedding class know about the gradient when all it's doing is basically indexing in?
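A minimal sketch that may help, comparing the two routes in plain PyTorch (sizes made up):

import torch
import torch.nn.functional as F

n_items, n_factors = 5, 3
E = torch.randn(n_items, n_factors, requires_grad=True)
idx = torch.tensor(2)

# route 1: multiply by a one-hot vector, which selects row idx of E
one_hot = F.one_hot(idx, n_items).float()
(one_hot @ E).sum().backward()
grad_onehot = E.grad.clone()
E.grad = None

# route 2: plain indexing, which is what nn.Embedding effectively does
E[idx].sum().backward()
grad_index = E.grad.clone()

print(torch.allclose(grad_onehot, grad_index))  # True
print(grad_index)  # nonzero only in row idx

The multiplication is linear in E, so it has a perfectly well-defined gradient with respect to E: only the row selected by the one-hot vector receives a nonzero gradient. Indexing produces exactly the same gradient, which is why Embedding can skip the multiplication entirely and still backpropagate correctly.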