Part 1, online study group

Highly interested, expecting an invitation. Thanks!

You can join our open Slack group; the link is in the first post of this thread. The timing of the next meetup will be posted soon. :slight_smile:

Hi hi, just a reminder: the meetup is today at 4PM GMT (8AM PST = 9:30PM IST = 11AM EST) :slight_smile: It is dedicated to NLP and based on Lesson 4. Click the Zoom link when it is time.

1 Like

The meetup is starting in ~11 mins, the Zoom link is live!

This worked when I tried it a few months back. Not sure if it still works, but you can try this hack to automatically reconnect your Colab notebook: https://stackoverflow.com/a/58275370/5951165

2 Likes

Meeting Minutes for 01/26/2020

Presentation on Lesson 4

Questions

  • What does np.random.seed(2) do?
    This post & this one address it

The ImageDataBunch creates a validation set randomly each time the code block is run. To maintain a certain degree of reproducibility, np.random.seed() is set before building the DataBunch, since the fastai library uses NumPy's random number generator for the split.
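A minimal sketch of that pattern in the fastai v1 style of the lesson notebooks (the folder path and the 20% split here are placeholder choices, not from this thread):

    import numpy as np
    from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

    np.random.seed(2)  # fix NumPy's RNG so the random validation split below
                       # comes out the same every time this cell is re-run
    data = ImageDataBunch.from_folder(
        'data/my_images',        # hypothetical dataset folder
        valid_pct=0.2,           # validation set drawn at random from the data
        ds_tfms=get_transforms(),
        size=224,
    ).normalize(imagenet_stats)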

  • How to stop Google Colab from disconnecting?
    @vijayabhaskar shared this in the previous comment

Challenges

  • Many participants shared common challenges that they face. This deserves a separate post (I will summarize them & possible options soon)

Misc

  • Organizational stuff

Resources

2 Likes

I wrote my first post using the fast_template shared by Jeremy, which uses GitHub Pages and Jekyll. I hope it can be helpful to people.

5 Likes

Hi gagan, hope you’re having a wonderful day!

I found your first post informative and a real joy to read.

Cheers mrfabulous1 :smiley: :smiley:

1 Like

Thanks @mrfabulous1!! I’m glad you found it useful!

Hi people!!! As part of this study group, we are starting an algorithms meetup to hone our expertise in data structures and algorithms, which can be useful for interviews as well.

Preparing for LeetCode-style coding interviews can be very challenging because the material is scattered and finding the perfect explanation for a problem can take time. A friend and I prepared for these interviews, and I intend to cover some of the patterns we learnt (related to data structures and algorithms) that were useful to us. We both got new jobs after weeks of preparation and iteratively figuring out how not to fail. Please note that I will just be sharing my experience and by no means am I an expert (yet). I hope my experience will help others in solving such coding problems and nailing that interview!!!

People who are interested can join the Slack for our study group using the link in the first post of this thread. (We will be using the #coding_interview_prep channel for this specific purpose.)

3 Likes

Just a reminder, there is a meetup today at 4PM GMT :wink: We will focus on Lesson 4!

1 Like

Hello all! First time in this meetup… just started Lesson 4 today :slight_smile:

4 Likes

Right on time @oscarb :slightly_smiling_face:

Meeting Minutes for 02/02/2020

Presentation on Lesson 4 (Tabular and Collaborative Filtering)

Presenter: @Tendo

Thanks to @Tendo for the wonderful Colab notebooks!

Questions

Tabular Data:
  • What are the heuristics or the formula for determining the size of the hidden layers for the tabular learner?

    learn = tabular_learner(data, layers=[200,100], metrics=accuracy)  # layers=[200,100]: two hidden layers with 200 and 100 activations

    • Forum thread for reference and possible further discussion linked below in Resources
  • In Tendo’s notebook, the total size of the training set was 3256, so if we choose rows 800-1000 to be our validation set, then with 200 samples we have a validation set that is around 6% of the training set. Is that enough?

    test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)

    • I didn’t quite gather if we fully resolved this in the discussion
    • Also, why rows 800-1000? Can we not achieve a more random split by using a ratio/percentage, as in sklearn? (See the split sketch after this list.)
      • One reason could be that we want a contiguous set for validation: much like video frames, if adjacent frames end up with one in training and one in validation, then our model is not learning anything - it is cheating
      • Any other explanations? Is 6% enough?
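
A minimal sketch of the two splitting styles in the fastai v1 data-block API, assuming the variable names from the lesson notebook (df, path, cat_names, cont_names, procs, dep_var): split_by_idx holds out an explicit index range (as in Tendo's notebook), while split_by_rand_pct draws a random percentage, similar to sklearn's train_test_split.

    # Contiguous split: hold out rows 800-999 as the validation set
    data_idx = (TabularList.from_df(df, path=path, cat_names=cat_names,
                                    cont_names=cont_names, procs=procs)
                .split_by_idx(list(range(800, 1000)))
                .label_from_df(cols=dep_var)
                .databunch())

    # Random split: hold out a random 20% of the rows instead
    data_pct = (TabularList.from_df(df, path=path, cat_names=cat_names,
                                    cont_names=cont_names, procs=procs)
                .split_by_rand_pct(valid_pct=0.2, seed=42)
                .label_from_df(cols=dep_var)
                .databunch())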

Collaborative Filtering:

  • How do I decide when to use collaborative filtering vs. tabular?
    • A thought experiment. Taking the ‘US Salary’ example of Tabular, could I instead run Collaborative Filtering on that and come up with a recommendation for a salary?
    • Basic intuition for this is to look at it as:
      • Tabular :: Supervised
      • Collaborative Filtering :: Unsupervised
  • What are n_factors?
    • They correspond to the hidden (latent) features that the model learns during training
      • For example, deciding that some movies are family-friendly vs others not. Family-friendliness is one of the n_factors.
    • So, when we set up the learner, is the number of factors we choose one of the hyperparameters? (See the sketch below.)
      • It could affect speed and accuracy, but more experiments are needed to tell
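
A minimal sketch in the fastai v1 style of the lesson notebook, assuming `ratings` is a DataFrame of user/item/rating rows (a placeholder name here): n_factors sets the width of the user and item embeddings, so it is a hyperparameter chosen up front, and the learned values inside those embeddings are the hidden factors discussed above.

    from fastai.collab import CollabDataBunch, collab_learner

    data = CollabDataBunch.from_df(ratings, seed=42, valid_pct=0.1)
    learn = collab_learner(data, n_factors=40,    # 40 latent factors per user/item
                           y_range=[0, 5.5],      # squash predictions to the rating scale
                           wd=1e-1)
    learn.fit_one_cycle(5, 5e-3)  # n_factors is one of the knobs worth experimenting with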

Resources

Jeremy’s tweet on Tabular:

6 Likes

Awesome work @shimsan

1 Like

Thank you @shimsan!

1 Like

Just a reminder, we are having a meetup tomorrow (Sunday) at 4PM GMT. We will focus on a projects showcase. This is the time for you to show off all your cool projects and get inspiration from others :slightly_smiling_face: To join, just use the same Zoom link when the time comes.

1 Like

The meetup will start in ~15 mins :partying_face: Join the Zoom!

Overview of Gradient Descent

What is Gradient Descent (GD)?

  • It is an optimization algorithm used to find the minimum of a function (the loss function, in the case of a NN).

Nice analogy for understanding GD:

  • A person stuck on a mountain, trying to get down with minimal visibility due to fog (Source: Wikipedia).

Algorithm

Source: [1]
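In its standard form, each step moves the weights a small amount against the gradient of the loss, w ← w − lr · ∂L/∂w, and repeats until the loss stops improving.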

Variants of Gradient Descent

Source: [2]

  • Stochastic Gradient Descent: the weights are updated using one sample at a time (batch size 1); for 100 samples, the weights are updated 100 times per epoch
  • Batch Gradient Descent: the weights are updated using the whole dataset; for 100 samples, the weights are updated only once per epoch
  • Mini-Batch Gradient Descent: a middle ground combining the two; the dataset is split into batches of a size we choose, with samples drawn at random (see the NumPy sketch after this list)
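
The three variants differ only in how many samples contribute to each weight update. A minimal NumPy sketch on a toy linear-regression loss (names such as X, y and train are made up here, not code from this thread):

    import numpy as np

    np.random.seed(0)
    X = np.random.randn(100, 3)                  # 100 samples, 3 features
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * np.random.randn(100)

    def grad(w, Xb, yb):
        # gradient of the mean-squared-error loss w.r.t. the weights, for one batch
        return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

    def train(batch_size, lr=0.1, n_epochs=50):
        w = np.zeros(3)
        for _ in range(n_epochs):
            idx = np.random.permutation(len(X))            # shuffle each epoch
            for start in range(0, len(X), batch_size):
                batch = idx[start:start + batch_size]
                w -= lr * grad(w, X[batch], y[batch])      # the gradient-descent update
        return w

    w_batch = train(batch_size=100)   # batch GD: whole dataset, 1 update per epoch
    w_sgd   = train(batch_size=1)     # stochastic GD: 100 updates per epoch
    w_mini  = train(batch_size=16)    # mini-batch GD: the middle ground
    print(w_batch, w_sgd, w_mini)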

[1] https://medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1
[2] https://suniljangirblog.wordpress.com/2018/12/13/variants-of-gradient-descent/

I hope this clarifies the different variants of gradient descent.

Lesson 5 - Questions

Audience: Beginner-Intermediate

If you have watched Lesson 5 only once or twice, try testing your understanding with the questions below. If you can answer them in two or three sentences, you have a good understanding of the Lesson 5 concepts. Otherwise, consider reviewing the lecture/notes once again before moving on.

  • Why are ReLUs needed in neural networks (NNs)?
  • Is an affine function a linear function?
  • Does the bias-variance trade-off happen in deep learning as well?
  • What is variance?
  • Do too many parameters in a NN mean higher variance?
  • Why is freezing needed for fine-tuning? What happens when we freeze?
  • Why do we unfreeze and train the entire model?
  • Can you explain how learning rates are applied to the layers in each of the cases below?
    • 1e-3
    • slice(1e-3)
    • slice(1e-5, 1e-3)
  • Can you identify the three different variants of GD? How many training samples are used, and when are the weights updated, in each variant? Does stochastic gradient descent mean using mini-batches & updating the loss after each mini-batch?
  • How and when do you update the weights? Describe the sequence of operations.
  • What is learning rate (LR) annealing? Why do we apply it?
  • Why do we apply the exponential in the softmax?
  • What is the difference between a loss function & a cost function?
  • What is the difference between an epoch and an iteration?
  • Why do we need a cyclical learning rate? And what happens to momentum during one cycle?
  • What are entropy and softmax?
  • When should we use cross-entropy instead of, say, RMSE?

2 Likes