Curious general question: how do you all keep track of changes to your models? For instance, when you change hyperparameters or switch ResNet variants, do you save the graphs in a Word document, use software to compare different runs, or something else?
Personally, I've been trying to use Wandb to compare different models, but I'm still learning how to use the different features. For instance, how to grab a graph from one run and compare it against the others to see how it's changing.
Hi hi, just a reminder: the meetup is today at 4PM GMT (8AM PST = 9:30PM IST = 11AM EST). It is dedicated to NLP and based on Lesson 4. Click the Zoom link when it's time.
This worked when I tried it a few months back; not sure if it still works, but you can try this hack to automatically reconnect a Colab notebook: https://stackoverflow.com/a/58275370/5951165
@Tendo also shared Colab for Amazon Review Sentiment Analysis
Questions
What does np.random.seed(2) do?
This post & this one address it.
The ImageDataBunch creates a validation set randomly each time the code block is run. To maintain a degree of reproducibility, the fastai library calls np.random.seed() internally before making the split.
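As a toy illustration of what seeding buys you (a sketch, not the actual fastai internals; `pick_valid_idx` is a made-up helper), seeding NumPy's global RNG makes the "random" validation indices identical on every run:

```python
import numpy as np

def pick_valid_idx(n_rows, valid_frac=0.2, seed=2):
    """Hypothetical helper: pick a reproducible random validation set."""
    np.random.seed(seed)                 # fix the RNG state
    idx = np.random.permutation(n_rows)  # shuffled row indices
    n_valid = int(n_rows * valid_frac)
    return idx[:n_valid]                 # first chunk becomes validation

a = pick_valid_idx(100)
b = pick_valid_idx(100)
print((a == b).all())  # True: same seed, same split, run after run
```

Without the seed call, each run would shuffle differently and your validation metrics would not be comparable across runs.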
How to stop Google Colab from disconnecting? @vijayabhaskar shared this in the previous comment
Challenges
Many participants shared common challenges that they face. This deserves a separate post (we'll summarize these and possible options soon).
Hi people!!! As part of this study group, we are starting an algorithms meetup to hone our expertise in using data structures and algorithms, which can be useful for interviews as well.
Preparing for LeetCode-style coding interviews can be very challenging because the material is scattered and finding a good explanation for a problem can take time. A friend and I prepared for these interviews, and I intend to cover some of the patterns we learned (related to data structures and algorithms) that were useful to us. We both got new jobs after weeks of preparation and iteratively figuring out how not to fail. Please note that I will just be sharing my experience and am by no means an expert (yet). I hope my experience helps others solve such coding problems and nail that interview!
People who are interested can join the Slack for our study group using the link in the first post of this thread. (We will be using the #coding_interview_prep channel for this specific purpose.)
Forum thread for reference and possible further discussion linked below in Resources
In Tendo’s notebook, the total size of the training set was 3256, so if we choose rows 800-1000 to be our validation set, we have 200 samples, i.e. a validation set that is around 6% of the training set. Is that enough?
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)
I didn’t quite gather whether we fully resolved this in the discussion.
Also, why rows 800-1000? Couldn't we get a more random split by using a ratio/percentage, as in sklearn?
One reason could be that we want a contiguous set for our validation: much like video frames, if adjacent (correlated) rows end up with one in training and its neighbour in validation, then our model is not learning anything, it is cheating.
Any other explanations? Is 6% enough?
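To make the two splitting strategies concrete, here is a small sketch (the dataset size 3256 is from Tendo's notebook; everything else is illustrative, not the notebook's actual code):

```python
import numpy as np

n = 3256  # training-set size from Tendo's notebook

# Option 1: contiguous slice, as in the notebook. Rows 800-1000 give
# 200 samples, roughly 6% of the data. A contiguous block avoids
# train/validation leakage when neighbouring rows are correlated.
contig_valid = np.arange(800, 1000)
print(round(len(contig_valid) / n, 3))

# Option 2: random percentage split (sklearn-style), e.g. 20% validation.
# Fine when rows are independent, but it can leak information when
# adjacent rows are correlated (time series, video frames, ...).
rng = np.random.default_rng(0)
idx = rng.permutation(n)
n_valid = int(0.2 * n)
valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
print(len(valid_idx), len(train_idx))
```

Whether 6% is "enough" depends on how noisy your metric is: with only 200 samples, a handful of flipped predictions moves accuracy by whole percentage points.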
Collaborative Filtering:
How do I differentiate between when to use collaborative filtering vs tabular?
A thought experiment: taking the ‘US Salary’ example from the tabular lesson, could I instead run collaborative filtering on it and come up with a salary recommendation?
Basic intuition for this is to look at it as:
Tabular :: Supervised
Collaborative Filtering :: Unsupervised
What are n_factors?
They are the hidden (latent) features that the model learns during training.
For example, deciding that some movies are family-friendly vs others not. Family-friendliness is one of the n_factors.
So, when we set up the learner, is n_factors one of the hyperparameters we choose?
Yes. It could affect speed and accuracy, but more experiments are needed to determine how.
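To make the latent-factor idea concrete, here is a toy matrix-factorization sketch in plain NumPy (not fastai's implementation; the ratings matrix is made up). Each user and each movie gets a vector of `n_factors` learned numbers, and a predicted rating is their dot product; "family-friendliness" would be one such learned dimension:

```python
import numpy as np

rng = np.random.default_rng(42)
n_factors = 3  # size of each latent vector: a hyperparameter we pick

# Made-up observed user x movie ratings (0 = unrated)
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 0, 5, 4, 0]], dtype=float)
mask = R > 0

U = rng.normal(scale=0.1, size=(R.shape[0], n_factors))  # user factors
M = rng.normal(scale=0.1, size=(R.shape[1], n_factors))  # movie factors

lr = 0.01
for _ in range(10_000):              # plain gradient descent on MSE
    err = (U @ M.T - R) * mask       # error on the observed cells only
    gU = err @ M                     # gradient w.r.t. user factors
    gM = err.T @ U                   # gradient w.r.t. movie factors
    U -= lr * gU
    M -= lr * gM

# After training, U @ M.T reconstructs the observed ratings closely,
# and the empty cells hold the model's predictions.
mse = ((U @ M.T - R)[mask] ** 2).mean()
print(mse < 0.05)
```

Picking a larger `n_factors` gives the model more capacity (and more compute per step); too large and it can memorize the ratings instead of generalizing, which is why it is worth tuning experimentally.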
Just a reminder: we are having a meetup tomorrow (Sunday) at 4PM GMT. We will focus on a projects showcase. This is the time for you to show off all your cool projects and get inspiration from others. To join, just use the same Zoom link when the time comes.