[Project] Machine Learning Cheatsheet

Hi all,

I’m wondering if people would be interested in starting an ML Cheatsheet website together? I put together a basic prototype based on some of the content I wrote for the fast.ai wiki:

ML Cheatsheet

(Please keep in mind this is a very rough, incomplete first draft and the content has not been reviewed).

The goal is to provide a highly visual, quick reference guide for those with machine learning experience who need a quick refresher on a specific topic.

Here are some examples of the various types of content I’m envisioning:

Anyhow, let me know!

18 Likes

Interesting idea. I’d be willing to contribute.

1 Like

Great idea! I’ll see what I can add to it. I’ve been studying recommender systems a lot lately and need to formalize my thoughts on the topic so this would be a good way to start.

1 Like

It would be great to talk about validation and test sets, and the concept of over-fitting.
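As a starting point for that section, here is a minimal sketch of a train/validation/test split in numpy. The function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are all assumptions for illustration, not something already in the cheatsheet.

```python
import numpy as np


def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return three disjoint index arrays covering range(n_samples)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # shuffle before splitting
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = train_val_test_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The idea is to fit on the training indices, tune on the validation indices, and touch the test indices only once at the end. A validation score that is much worse than the training score is the usual sign of over-fitting.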

2 Likes

Awesome! I just reorganized the table of contents to include sections on Recommender Systems and an intro to model training (overfitting, validation/test sets).

We just had our first contribution from @iNLyze who corrected the pip install instructions on the README. :slight_smile:

1 Like

@brendan How about a section for Tensor Arithmetic Operations?

1 Like

That sounds great. What did you have in mind?

Ideally we can include short code snippets in vanilla python and numpy.

Tensor operations along different axes:

  1. Binary operations:
    Addition
    Subtraction
    Multiplication
    Concatenation
  2. Unary operations:
    Square
    Add a scalar

etc etc …
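A rough sketch of what those snippets could look like, first in numpy and then in vanilla Python. The arrays and variable names are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[10, 20, 30],
              [40, 50, 60]])   # shape (2, 3)

# Binary operations (element-wise; shapes must match or broadcast)
added = a + b
subtracted = b - a
multiplied = a * b             # element-wise product, not matrix multiplication

# Concatenation along different axes
concat_rows = np.concatenate([a, b], axis=0)   # shape (4, 3)
concat_cols = np.concatenate([a, b], axis=1)   # shape (2, 6)

# Unary operations
squared = a ** 2
plus_scalar = a + 10           # the scalar is broadcast to every element

# Reductions along an axis
col_sums = a.sum(axis=0)       # [5, 7, 9]
row_sums = a.sum(axis=1)       # [6, 15]

# The same ideas in vanilla Python, using nested lists
a_list = [[1, 2, 3], [4, 5, 6]]
b_list = [[10, 20, 30], [40, 50, 60]]
added_py = [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a_list, b_list)]
concat_rows_py = a_list + b_list
concat_cols_py = [row_a + row_b for row_a, row_b in zip(a_list, b_list)]
squared_py = [[x ** 2 for x in row] for row in a_list]
```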

1 Like

Oh, this is wonderful. I’d be willing to contribute to this section. This could become a one-stop resource for exploring different techniques and the toolsets suited to different dataset sizes (sometimes the data size dictates which tools you can use). It could also have a dedicated tips and tricks section for each algorithm, plus a dedicated section on intuitive hyperparameter tuning for each.

For example, if accuracy is the only criterion that matters in your problem, you could use a very low learning rate for XGBoost and run it for 10000 iterations. If the same model has to predict in real time in production, you might prefer a higher learning rate and just 50-100 iterations, depending on latency requirements. This is just an example. A lot of open source tools are coming out, and gathering them in one place and documenting their pros and cons could be a wonderful resource. Take Faiss, for example: a month ago I wasn’t aware of any GPU-runnable nearest neighbors algorithm, and now Facebook AI has released one. Maybe we could write up best practices, a la a reference guide, for using such tools.
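To make the XGBoost trade-off concrete, it could be written down as two configurations. This is only a sketch: `eta` is XGBoost's learning-rate parameter and `num_boost_round` is the number of boosting iterations, but the values 0.01 and 0.3 and the helper `choose_config` are my own assumptions, not tuned recommendations.

```python
# Two hypothetical configurations for the accuracy vs. latency trade-off.
# The learning rates 0.01 and 0.3 are illustrative assumptions.
ACCURACY_FIRST = {
    "params": {"eta": 0.01},      # very low learning rate
    "num_boost_round": 10000,     # many iterations
}
LOW_LATENCY = {
    "params": {"eta": 0.3},       # higher learning rate
    "num_boost_round": 100,       # few iterations, so fewer trees to evaluate
}


def choose_config(realtime: bool) -> dict:
    """Pick a configuration based on whether predictions are served in real time."""
    return LOW_LATENCY if realtime else ACCURACY_FIRST


config = choose_config(realtime=True)
print(config["params"]["eta"], config["num_boost_round"])

# With xgboost installed and a DMatrix `dtrain` prepared, training would look like:
#   import xgboost as xgb
#   booster = xgb.train(config["params"], dtrain,
#                       num_boost_round=config["num_boost_round"])
```

Fewer boosting rounds means fewer trees to evaluate per prediction, which is where the latency saving comes from.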

Also, I’d like to propose a separate section for online learning algorithms. There have been some wonderful implementations of online learning algorithms in Kaggle competitions, for example the masterpiece FTRL code on Kaggle, and the Kaggler library that grew out of the FTRL idea and contains several online learning algorithms.
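For anyone who hasn’t seen FTRL, here is a minimal sketch of online logistic regression with per-coordinate FTRL-Proximal updates, as the algorithm is commonly described. It is not the Kaggle code mentioned above. The class name, the hyperparameter defaults, and the toy data stream are assumptions for illustration.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Online logistic regression with per-coordinate FTRL-Proximal updates.

    Each example is a list of active (binary) feature indices.
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        """Compute the weight for feature i lazily from z and n."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps small coordinates at zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        )

    def predict(self, features):
        logit = sum(self._weight(i) for i in features)
        logit = max(min(logit, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, features, label):
        """Predict on one example, then learn from its label."""
        p = self.predict(features)
        g = p - label  # log-loss gradient for each active binary feature
        for i in features:
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p


# Toy stream: feature 0 acts as a bias, feature 1 marks positives, feature 2 negatives.
model = FTRLProximal()
stream = [([0, 1], 1), ([0, 2], 0)] * 200
for features, label in stream:
    model.update(features, label)
print(model.predict([0, 1]), model.predict([0, 2]))
```

The point of the online setting is that the model sees one example at a time and never needs the full dataset in memory.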

Also, how about building a gitbook that contains all this information, so that it’s easily editable, maintainable, readable online just like a book, and also downloadable as a PDF? For example, Mastering Apache Spark. I think this is a great way to build up a knowledge resource about a particular topic.

@brendan and others, what do you think? I’d be interested to know your thoughts.

2 Likes

Sorry for the delay, I’ve been completely absorbed by this Kaggle competition. Super fun and I highly recommend it.

I think your ideas are great. We can take this in any number of directions. One thing I like about readthedocs is the navigation, search, and CSS. The idea for this cheatsheet came from navigating through the PyTorch docs and thinking, wouldn’t it be nice if basic theory and diagrams were displayed alongside the code? I also love reading Distill and like their emphasis on interactive visuals. The more visual the better!

But it’s early days and we can take this idea in any direction we like. One thing that’s nice about the current setup is that it requires markdown/rst, which can be compiled into any format, including EPUB for ebooks!

1 Like

I like the idea of 1. Concise Concept - 2. Code - 3. Visual

1 Like