Lesson 3 - A Nnet from scratch on Kaggle

Daijobu · January 8, 2024, 11:06am

Hi !

This is my first post, and it has to start with thanks for the amazing resources that FastAI and @jeremy offer to the world with this course. I started the course a few weeks ago and I’m trying to reproduce the Excel Titanic exercise in Python, as Jeremy suggested in the video.

I don’t think I’m far off, but I’m struggling with a few questions that I wanted to put to the community.

Here is where I am now:

Parameters definitions

import numpy as np
import pandas as pd
import torch

layers = 2
seed = 42
learning_rate = 0.15

# Params: one column of 9 weights per hidden unit
np.random.seed(seed)
params = np.random.uniform(-1, 1, size=(9,layers))
params = torch.tensor(params)

Model

from IPython.display import display


def relu(x):
    x = torch.tensor(x, requires_grad=True)
    y = torch.matmul(x,params)
    y = torch.clip(y,0.)
    y = torch.sum(y,dim=0)
    y.backward()
    return [y,x]

def model(df):
    result_matrix = []
    gradient_matrix = []
    for index, passenger in df.iterrows():  
        pid = int(passenger["PassengerId"])
        survived = passenger["Survived"]
        passenger_matrix = passenger.drop(["PassengerId","Survived"])
        result,passenger_matrix = relu(passenger_matrix)
        loss = (result - survived).pow(2)
        tensor_result = result
        result_matrix.append([pid,tensor_result,loss])
        gradient_matrix.append(passenger_matrix.grad)
    result_matrix = torch.tensor(result_matrix)
    mean_loss = torch.mean(result_matrix[:, 1])
    mean_gradients = torch.mean(torch.stack(gradient_matrix), dim=0)
    return [mean_loss,mean_gradients]

def check_error_rate(validation_df,show_top_losses = False):
    success = 0
    accuracy = []
    prediction = []
    validation_df["Prediction"] = 0
    validation_df["Loss"] = 0
    for index, passenger in validation_df.iterrows():
        passenger_matrix = passenger.drop(["PassengerId","Survived","Loss","Prediction"])
        result,passenger_matrix = relu(passenger_matrix)
        validation_df.at[index, 'Loss'] = (result.item() - passenger["Survived"])**(2)
        validation_df.at[index, 'Prediction'] = result.item()
        if(torch.round(result) == passenger["Survived"]): 
            success += 1
    print(f"Success :{success}/{len(validation_df)} - {round((success/len(validation_df))*100)}%")
    if(show_top_losses):
        print("Top Losses")
        losses = validation_df.sort_values(by='Loss',ascending=False)
        display(losses.head(10))

def fine_tune_params(mean_gradients):
    i = 0
    for param in params:
        params[i] -= mean_gradients[i]*learning_rate
        i += 1

def train(df,cycles):
    i = 0
    while i < cycles:
        i += 1
        loss,gradients = model(df)
        print(f"Round {i} Loss: {loss}")
        fine_tune_params(gradients)
        check_error_rate(validation_df)
    params_df = pd.DataFrame(params.T)
    display(params_df)

The full notebook is on Kaggle.

And now the questions:

Looking at the top losses on my validation dataset, I noticed that they are almost all passengers who survived. In fact, after a few rounds, the model no longer predicts any survivors. I guess this is due to the way I fine-tune my params:

def fine_tune_params(mean_gradients):
    i = 0
    for param in params:
        params[i] -= mean_gradients[i]*learning_rate
        i += 1

That part is not explained in the video, which relies on Excel’s built-in functions to do the gradient descent. So I tried something based on what is explained earlier, but I’m not sure it is right: I take the mean gradient over all results and subtract it (times the learning rate) from the params.
I took the loss function from the Excel sheet. My intuition is that it should somehow be involved in the fine-tuning of the parameters: the change in the params should depend on the size of the loss. Yet I’m only using the gradient of the ReLU output y:

def relu(x):
    x = torch.tensor(x, requires_grad=True)
    y = torch.matmul(x,params)
    y = torch.clip(y,0.)
    y = torch.sum(y,dim=0)
    y.backward()
    return [y,x]

Also, @Jeremy hinted in the video that “It would be cool if we had some way” of constraining the predictions between 0 and 1. That is already partly the case here thanks to the floor: no result can go below 0, but it can go above 1. My intuition is that this pushes the results towards 0 over time. Should I add a line to cap the results at 1?

Thanks for your help, and again, for the amazing work of FastAI and its community.

Kind regards,

Daij

Daijobu · January 9, 2024, 3:37pm

Hi everyone,

I kept working and found out where my problem was, so I’m posting it here for future reference:

Mostly, my intuition was right: the parameters are supposed to be fitted using the derivative of the loss.
I edited the model function (and simplified it) so that .backward() is called on the loss and not on the result of the ReLU (as was the case before, see my previous post).

It is now working better and the results after a few runs are more convincing.
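
For readers following along, here is a minimal sketch of that change, not the notebook's actual code. The toy data, the initial weights, and the helper names `predict` and `mse` are hypothetical stand-ins. The point it tries to show is that `requires_grad` goes on the parameters, and `.backward()` is called on the loss.

```python
import torch

# Hypothetical stand-in data: 4 passengers, 3 features each
# (not the real Titanic columns).
x = torch.tensor([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.2],
                  [1.0, 1.0, 0.9],
                  [0.0, 0.0, 0.1]])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# The parameters, not the inputs, are the tensors that track gradients.
params = torch.tensor([[0.1, 0.2],
                       [0.2, 0.1],
                       [0.1, 0.1]], requires_grad=True)

def predict(x, params):
    # Two ReLU units, summed into one prediction per passenger.
    return torch.clip(x @ params, 0.0).sum(dim=1)

def mse(preds, targets):
    # Mean squared error over all passengers.
    return ((preds - targets) ** 2).mean()

learning_rate = 0.1
losses = []
for _ in range(20):
    loss = mse(predict(x, params), y)
    loss.backward()                          # d(loss)/d(params)
    with torch.no_grad():                    # update without tracking
        params -= learning_rate * params.grad
        params.grad.zero_()                  # reset for the next round
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Because the loss already averages over passengers, `params.grad` has the same shape as `params` and can be subtracted directly, with no per-row averaging of gradients.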

The notebook is updated accordingly : https://www.kaggle.com/daijobuai/fastai-course-lesson-3-exercise

I am still wondering two things:

Should I clamp() my results to constrain them to at most 1? I tried, and there is no clear indication that it leads to a better result, but I’m still in the dark on the “theory” behind it.
My results still over-predict death, so the model tends towards 0 and I don’t know why. I added a kind of “homemade” confusion matrix at the bottom of the notebook that shows this. Any clue on what I can do to fix this?
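
As background on the clamp question, here is a small sketch that is not taken from the notebook. It compares the slope of a hard clamp with the slope of a sigmoid, using plain Python and finite differences. The helper names `clamp01`, `sigmoid`, and `slope` are made up for this illustration. A hard clamp is flat outside [0, 1], so a prediction stuck there passes no gradient back, while a sigmoid keeps a small non-zero slope everywhere.

```python
import math

def clamp01(z):
    # Hard cap of a raw output to the range [0, 1].
    return min(max(z, 0.0), 1.0)

def sigmoid(z):
    # Smooth squashing of a raw output into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def slope(f, z, eps=1e-6):
    # Central finite-difference estimate of df/dz.
    return (f(z + eps) - f(z - eps)) / (2 * eps)

for z in (-3.0, 0.5, 3.0):
    print(f"z={z:+.1f}  "
          f"clamp={clamp01(z):.3f} (slope {slope(clamp01, z):.3f})  "
          f"sigmoid={sigmoid(z):.3f} (slope {slope(sigmoid, z):.3f})")
```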

Daij

Daijobu · January 10, 2024, 4:21pm

Third and probably last update on this, as I discovered that this exercise is exactly the topic of the 5th lesson.

I got the answers to my questions. The answer to the first was the sigmoid function; the answer to the second was that my gradient was computed on the results and not on the params.
The whole thing is also a bit overkill, as the main loop in the model function is not necessary.

Hope this helps future students.
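
A rough sketch of what those points can look like together follows. It is neither the notebook nor the course code. The toy data, `coeffs`, `predict`, and `train_step` are hypothetical, and it uses a single linear layer to keep things short. All passengers go through one matrix product instead of a Python loop, the output goes through a sigmoid, and the gradient is taken with respect to the coefficients.

```python
import torch

# Hypothetical toy data: 4 passengers, 3 features (not the Titanic columns).
x = torch.tensor([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.2],
                  [1.0, 1.0, 0.9],
                  [0.0, 0.0, 0.1]])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# One coefficient per feature, tracked by autograd.
coeffs = torch.zeros(3, requires_grad=True)

def predict(coeffs, x):
    # One matrix-vector product handles every row at once;
    # the sigmoid keeps each prediction strictly between 0 and 1.
    return torch.sigmoid(x @ coeffs)

def train_step(coeffs, x, y, learning_rate=0.5):
    loss = ((predict(coeffs, x) - y) ** 2).mean()
    loss.backward()                          # gradient w.r.t. coeffs
    with torch.no_grad():
        coeffs -= learning_rate * coeffs.grad
        coeffs.grad.zero_()
    return loss.item()

losses = [train_step(coeffs, x, y) for _ in range(50)]
preds = predict(coeffs, x).detach()
print(losses[0], losses[-1])
print(preds)
```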

Best,

Daij

Deco354 · January 23, 2024, 7:44pm

I did exactly the same thing and I’ve only just realized there’s an actual lesson on this

I’m looking forward to moving on to chapter 5 and seeing a more efficient way to do this.