Thread for Blogs (Just created one for ResNet)

I think it’s perfect :slight_smile: What’s your twitter handle?

The explanation using cat whiskers/ears etc. was good; it does take a while to ponder over. The fact that not all features are scale-invariant is easier to understand with images. I guess the sequential resizing kind of adds scale invariance into the CNN system…
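
To make that concrete for myself, here is a tiny toy sketch (not the course code, just a made-up example): a CNN whose head uses global pooling accepts the same images at several sizes, which is what makes the sequential-resizing trick possible in the first place.

```python
# Toy sketch: a size-agnostic CNN that can be trained on progressively
# larger images (the "sequential resizing" idea). Sizes below are made up.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),           # global pooling makes the head size-independent
    nn.Flatten(),
    nn.Linear(32, 10),
)

for size in (64, 128, 224):            # hypothetical resizing schedule
    x = torch.randn(8, 3, size, size)  # stand-in for a batch resized to this size
    print(size, net(x).shape)          # same head works at all three scales
```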

1 Like

@jeremy Thanks! My handle is @nikhilbalaji. I @-mentioned your handle, assuming that I could make any corrections later :slight_smile:

Yes, agreed it will take some time to benchmark + objectively evaluate when and how all this cool new stuff works.

I will have my mind in “attention” mode with all this and will report back in the future if I manage to benchmark some of these new tools in an objective way. (I also find differential learning rates another big “mystery trick” still to fully understand.) Well, time is on our side to solve these mysteries, one by one…! :grinning:

I think we need to answer the question “why not just use data augmentation” better however…

3 Likes

Hello Everyone,

I’ve written my very first blog post. Before making it public, I would like to share it with you guys so that I can get feedback from you :slight_smile: I appreciate your help, thanks in advance!

Best

2 Likes

I think this is great! I’ll jot down my thoughts here as I read:

  • The definition of structured data isn’t really standardized, so you should be clear what you mean here (you do describe it a bit and show an example, but I think you could be more explicit)
  • An embedding is mathematically identical to a 1-hot encoding followed by multiplication by a weight matrix. Since you mention 1-hot encodings, it might be worth mentioning this; also, your description of the assumptions made in 1-hot encoding isn’t really correct (since it can be the same as an embedding)
  • It might be a good idea to more directly show the relationship between word embedding and entity embeddings
  • fast.ai’ should be capitalized
  • It’s nice to show the outputs of the code snippets you show - otherwise it’s hard to know what’s going on
  • You may also want to cite the Bengio team’s work in this area: https://arxiv.org/abs/1508.00021

Great job! Let me know when you want us to share it :slight_smile:

Thanks for the feedback, I am going to work on it and make it prettier, then let you know :slight_smile:

1 Like

I am having trouble understanding this, since I thought an embedding was equal to the dot product of a one-hot encoding and a learned weight matrix (dot(n x m, m x D)). As for the one-hot encoding assumptions, I mentioned the memory issue and, more importantly, the equal pairwise distances among levels. Which assumption is the one that I should change?

My understanding is like this:
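
Roughly, as a toy numeric check (sizes here are made up, PyTorch just for illustration):

```python
# Toy check: an embedding lookup equals one-hot encoding followed by a
# matmul with the learned weight matrix, i.e. dot(n x m, m x D).
import torch
import torch.nn as nn

n, m, D = 4, 5, 3                      # rows, categorical levels, embedding dim
emb = nn.Embedding(m, D)               # learned m x D weight matrix
idx = torch.tensor([0, 2, 2, 4])       # label-encoded column of length n

one_hot = torch.eye(m)[idx]            # n x m one-hot matrix
via_matmul = one_hot @ emb.weight      # dot(n x m, m x D) -> n x D
via_lookup = emb(idx)                  # direct embedding lookup -> n x D

print(torch.allclose(via_matmul, via_lookup))  # True
```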

Edit: Apart from the one-hot encoding, the other parts are fixed.

Thanks

I’ve made some changes based on your suggestions, but there may be some mistakes or wrong interpretations. Please let me know when you have time to take a look at it. Thank you so much!

Hello everyone,

Here is my contribution in spreading the knowledge.

You can review the 3 most recent posts. Please share your feedback so that I can update the posts.

My idea is to create a series of 7 posts covering all possible architectures mentioned in post 1. After completing these 7 posts, I’ll revisit each of them in the same order, but this time with code (PyTorch, TensorFlow) and implementations.

In the meantime, I’m trying to understand PyTorch better so that I can contribute to fastai library development. I would like to request @jeremy to create a dev branch for feature development. As of now, I am confused by the branches, and most of the activity occurs on the master branch.

Last but not least, I request you all to suggest changes that can improve the outcome of this series of posts, which is “A deeper understanding of Neural Networks”.

3 Likes

Ah OK well your understanding is exactly right. I may have read something into your text that wasn’t there, but I kinda thought you were saying 1-hot encoding had some fundamental deficiency in terms of what it could represent.

Yeah, I made some changes, maybe that was it; thank you so much for your help. OK, the post is now public: https://medium.com/@keremturgutlu/structured-deep-learning-b8ca4138b848 and my twitter handle is @KeremTurgutlu.

Thanks!

I’ve created a list of blogs that Jeremy has gone over in the lessons.

If any of these blogs have been written by women, can you let me know? I would like to tweet it out from my Women in Machine Learning & Data Science @wimlds twitter handle. Thanks.

If I’ve forgotten any blogs, or you notice any typos, I would be happy to update.

17 Likes

Thanks @reshama for compiling this!

1 Like

Hello All,
I just wrote my first blog post on embeddings.


I would love to have your feedback.

1 Like

@krishnakalyan3 thanks for sharing :slight_smile: A couple of things that could improve this:

  • Use something like Office Lens to redo your photos. It will clean up the contrast and make them much easier to see. Alternatively, since you’re mainly showing tables, you could create the tables in a spreadsheet, format them nicely, and then take a screenshot of that part of the screen. That can look great!
  • Your description makes it sound a bit like embeddings handle ordinal variables directly. Perhaps it would be helpful to show how embeddings are actually basically doing one-hot encodings “behind the scenes”
  • It would be nice to show examples of how well they work, or what results they create. Check out some pictures from kaggle winner posts or papers, for instance

Thanks for the feedback :slight_smile:

@everyone
I have created a weekly-ish newsletter where I’ll be sending out the cool DL/CV articles and resources that I stumble across every week. I wanted to ask you guys if it’s okay to share your articles there?

2 Likes

Very instructive, nice to read and interesting!

About the one-hot encoding assumptions, I also read it twice, and I think I found the (possible) issue. From the post, the sentence that begins it all is:

If we one-hot encode or arbitrarily label encode this variable (…)

So two encodings are mixed in one sentence. Label encoding does assume equal distances between ordered levels, or arbitrary distances for nominals. So I would understand that this assumption is made by label encoding, not one-hot encoding. The way I understand it, OHE is just an embedding of dimension one.
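
As a quick toy check of the distance point (my own made-up example in numpy): label encoding gives the levels unequal pairwise distances, while one-hot encoded levels are all equidistant.

```python
# Toy check: pairwise distances between levels under label encoding vs
# one-hot encoding. The level names here are made up.
import numpy as np

levels = ["low", "medium", "high", "unknown"]

label = np.arange(len(levels)).reshape(-1, 1)   # label encoding: 0, 1, 2, 3
one_hot = np.eye(len(levels))                   # one-hot encoding

def pairwise(X):
    # Euclidean distance between every pair of encoded levels
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

print(pairwise(label))    # distances of 1, 2, 3: an ordering is imposed
print(pairwise(one_hot))  # every distinct pair is sqrt(2) apart
```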

Anyway, thanks for the great post, really worth reading! :slightly_smiling_face:

2 Likes