Quick question about the learning rate number:
I was under the impression that the learning rate had to be a relatively small number, usually 0.1 or smaller and certainly less than 1, since of course you want to descend the gradient gradually and not jump to the other side of any minima. However, towards the end of the linear network setup the learning rate jumps up to 100. Is there any reason for this?
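For concreteness, here is a minimal sketch (not from the notebook) of a single gradient-descent step on the toy loss f(w) = w**2, whose minimum is at w = 0, showing how a large learning rate can overshoot:

import torch

w = torch.tensor(1.0, requires_grad=True)
loss = w ** 2
loss.backward()                    # dloss/dw = 2.0 at w = 1.0
with torch.no_grad():
    w_small = w - 0.1 * w.grad     # 0.8: steps gently toward the minimum
    w_large = w - 100 * w.grad     # -199.0: leaps far past the minimum
print(w_small.item(), w_large.item())

Whether a given rate overshoots depends on the magnitude of the gradients as well as the rate itself, so 100 is not necessarily too big when the gradients are tiny.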
I am working on the Titanic dataset and wanted to calculate the coefficients instead of generating random ones. I noticed inconsistencies in the results, so I created a notebook on Kaggle that I could share that demonstrates my problem. If I run the notebook as is, the numbers seem to be consistent, but if I restart the kernel and then run it, I get different numbers than before. I also demonstrate in this notebook that pandas, NumPy, and PyTorch each perform the same action differently. Which one can I trust, and why is there such diversity in the answers?
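If the run-to-run differences come from unseeded random number generators (my guess at the cause), a minimal sketch of pinning every RNG in the stack:

import random
import numpy as np
import pandas as pd
import torch

seed = 42
random.seed(seed)        # Python's built-in RNG
np.random.seed(seed)     # NumPy's global RNG (also used under the hood by pandas)
torch.manual_seed(seed)  # PyTorch's RNG, used when generating random coefficients

As for the three libraries "performing the same action differently", one common culprit (an assumption about what you are seeing) is the standard deviation, where the default conventions differ:

vals = [1.0, 2.0, 3.0, 4.0]
print(np.std(vals))              # 1.1180: population std, ddof=0 by default
print(pd.Series(vals).std())     # 1.2910: sample std, ddof=1 by default
print(torch.tensor(vals).std())  # 1.2910: unbiased (ddof=1) by default

None of them is wrong; they just pick different conventions, so you need to pass the ddof (or equivalent) argument explicitly when comparing them.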
notebook - Linear model and neural net from scratch
section - deep learning
code:
import torch
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    layers, consts = coeffs              # one weight matrix and one constant per layer
    n = len(layers)
    res = indeps
    for i, l in enumerate(layers):
        res = res @ l + consts[i]        # linear layer: matrix multiply plus constant
        if i != n-1: res = F.relu(res)   # ReLU between layers, but not after the last
    return torch.sigmoid(res)            # squash the final output to a 0-1 probability
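For context, calc_preds expects coeffs to be a tuple (layers, consts): a list of weight matrices and a list of constants, one per layer. A sketch of an initializer in the spirit of the notebook; the hidden sizes and scaling constants here are illustrative assumptions, not necessarily the notebook's exact values:

def init_coeffs(n_coeff, hiddens=[10, 10]):
    sizes = [n_coeff] + hiddens + [1]            # layer widths, ending in 1 output
    layers = [(torch.rand(sizes[i], sizes[i+1]) - 0.5) / sizes[i+1] * 4
              for i in range(len(sizes) - 1)]    # small random weight matrices
    consts = [(torch.rand(1)[0] - 0.5) * 0.1 for _ in range(len(sizes) - 1)]
    for t in layers + consts: t.requires_grad_() # track gradients for training
    return layers, consts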
Observation - it seems that in calculating the predictions we are adding a constant to each layer, whereas in the earlier parts of the notebook we skipped adding a constant to the first layer. As per Jeremy, it is not needed in the first layer (or the linear model) because we have dummy variables for each feature value.
Question - is this a mistake, or is there a reason why we are adding a constant to each layer, including the first, in deep learning? Or, in the bigger scheme of things, does it just not matter if we add a constant to each layer for simplicity, and we simply don't have to add it to the first layer if we don't want to?
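One way to see why the constant can be dropped when every feature value has its own dummy variable: the dummy columns in a one-hot group sum to 1 in every row, so any constant can be folded into that group's weights and changes nothing. A tiny sketch with hypothetical dummy columns:

import torch

X = torch.tensor([[1., 0.], [0., 1.]])  # one-hot dummies; each row sums to 1
w = torch.tensor([0.3, -0.2])
b = 0.5
print(X @ w + b)    # tensor([0.8000, 0.3000]): explicit constant
print(X @ (w + b))  # tensor([0.8000, 0.3000]): constant folded into the weights

So adding the constant to the first layer is harmless but redundant, which would support the "for simplicity" reading.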
Here's my PyTorch Titanic workbook for anyone else trying to reproduce it.
I learnt quite a lot going through this, although I couldn't quite get the final deep learning section working, which dropped to <60% accuracy.
Procs in the TabularPandas function:
Can someone tell me what functions can go in procs other than [Categorify, FillMissing, Normalize]? Also, by default FillMissing uses the median to replace NA values; how can I change it to something else, like the mean?
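For reference, FillMissing accepts a fill_strategy argument. A sketch assuming the fastai tabular API (df, cat_names, cont_names, and splits come from the notebook):

from fastai.tabular.all import *

# fastai ships FillStrategy.median (the default), FillStrategy.constant, and
# FillStrategy.mode. As far as I know there is no built-in mean strategy, so
# filling with the mean would need a pandas step first, e.g.
# df[c] = df[c].fillna(df[c].mean())
procs = [Categorify, FillMissing(fill_strategy=FillStrategy.mode), Normalize]
to = TabularPandas(df, procs=procs,
                   cat_names=cat_names, cont_names=cont_names,
                   y_names='Survived', splits=splits)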
Exactly the same question from me as maritanap.
Interested in any insights regarding the change from the single hidden layer in the Neural Network section not having the const added versus the two hidden layers in the Deep Learning section having the consts added.
I copied verbatim the code from the Titanic "frameworks" notebook and tried to apply it to the "Spaceship Titanic" Kaggle competition, but my neural net won't train. Am I missing something?
If you try to run the lesson 5 notebook locally on your computer (like I did), note that the latest version of the pandas API has changed the default dtype of the get_dummies output from integers to bools, which causes later steps to fail when creating a tensor:
pd.get_dummies(df, columns=["Sex","Pclass","Embarked"])
Change this to:
pd.get_dummies(df, columns=["Sex","Pclass","Embarked"], dtype=np.uint8)
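A quick self-contained check of the difference, using a toy dataframe rather than the Titanic data:

import numpy as np
import pandas as pd
import torch

df = pd.DataFrame({"Sex": ["male", "female"], "Pclass": [1, 3], "Embarked": ["S", "C"]})
bools = pd.get_dummies(df, columns=["Sex", "Pclass", "Embarked"])                  # bool columns in recent pandas
ints = pd.get_dummies(df, columns=["Sex", "Pclass", "Embarked"], dtype=np.uint8)  # integer columns
print(bools.dtypes.unique(), ints.dtypes.unique())
t = torch.tensor(ints.values, dtype=torch.float)  # the numeric version converts cleanly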
I'm trying to understand Jeremy's approach to scoring the binary split outputs:
def _side_score(side, y):
    tot = side.sum()              # number of rows on this side of the split
    if tot <= 1: return 0         # a side with 0 or 1 rows carries no information
    return y[side].std() * tot    # spread of the targets, weighted by group size
He mentioned that a lower score is better and that the score is not really valuable if the "side" is small, which is why we multiply by tot.
But, if I understand correctly, a small group with a small standard deviation will actually get a really small score, and hence will be considered "better" than a larger group with the same standard deviation.
Am I missing something? Shouldn't we actually divide by tot instead of multiplying by it?
I think I was confused on the same point you have mentioned in this lesson, but what resolved my confusion was that he defines score as the average of both sides:
def score(col, y, split):
    lhs = col <= split            # boolean mask for the left-hand side
    return (_side_score(lhs, y) + _side_score(~lhs, y)) / len(y)
And so if a particular split causes one side (let's say the left-hand side, lhs) to be very small with a small standard deviation, you are correct that it will have a small _side_score, but then the right-hand side, being much larger, will have a much larger _side_score. Taking the average of the two _side_scores takes such imbalances into account (from what I understood).
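A self-contained toy example of that offsetting effect (made-up numbers): a split that peels off two pure rows still scores worse overall than a balanced split that separates the classes perfectly, because the large mixed right-hand side dominates the average:

import torch

def _side_score(side, y):
    tot = side.sum()
    if tot <= 1: return 0
    return y[side].std() * tot

def score(col, y, split):
    lhs = col <= split
    return (_side_score(lhs, y) + _side_score(~lhs, y)) / len(y)

y   = torch.tensor([0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
col = torch.arange(1., 11.)
print(score(col, y, 2))  # ~0.414: tiny pure lhs, but the large mixed rhs dominates
print(score(col, y, 5))  # 0.0: balanced split separating the classes perfectly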
In the "why you should use a framework" notebook, in the ensembling section:
ens_preds = torch.stack(learns).mean(0)
The first five elements of ens_preds give:
tensor([[0.8843, 0.1157],
[0.5729, 0.4271],
[0.9334, 0.0666],
[0.8949, 0.1051],
[0.2978, 0.7022]])
I thought the predictions were for the classes [survived, not_survived], but from the code:
tst_df['Survived'] = (ens_preds[:,1]>0.5).int()
it seems it's rather [not_survived, survived]. How can I confirm which it actually is?
I tried looking at the documentation for get_preds but didn't understand it.
You can see the dependent (y) variable categories using dls.vocab, like so:
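dls.vocab

The categories appear in index order, so the second column of ens_preds (index 1) corresponds to the second entry of dls.vocab; for this dataset that should be the "survived" class.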
Thank you
I know that others have had this problem, but I think it's evolved over the years. I'm having trouble downloading the "Blue Book for Bulldozers" data. When I go to accept the rules of the competition at Blue Book for Bulldozers | Kaggle, nothing shows up. I shared the error I get in my console below (it occurs across browsers).
Is there any other way to access this data? Does someone have it saved somewhere?
I verified my phone number and identity on Kaggle, so that's not the issue.
Hi, I wanted to let you know I have had the same issue. Were you able to solve your issue?
My best guess is that the "rules" of the competition are no longer able to be accepted. Kaggle requires accepting the rules of a competition to download its data, but there does not seem to be an "accept rules" button on the rules page currently.
I am hoping someone can reply here with a solution, because it feels like chapter 9 of the book is bricked if Kaggle's data is inaccessible, which is a little inconvenient. Other issues I have come across on the forum seem to stem from API/key issues, and I have verified that mine works on some other Kaggle pages (like the notebook from Lesson 4). I likewise confirmed the issue occurs across Kaggle, Colab, and a local Jupyter notebook.
I may be missing something, but my best guess is that by not being able to accept the rules, we don't have permission to download the data.
For anyone (and @leeps) also experiencing this issue with Kaggle for downloading the bulldozer data in chapter 9, this post contains a solution to the issue:
These are the instructions:
Really simple solution but really easy to overlook, especially when running the book digitally. I don't know if anyone on the editorial team for the book follows up on these issues, but this may be one small thing to improve regarding the book experience.
My best guess as to why the code works in the video is that the default for the get_dummies() dtype parameter was int at the time, and that the default is now bool.
Your guess seems to be right! Source