Any brilliant approaches to wrapping one's head around foundations of statistics?

[note: this is now a Wiki post, so please add resources at the end of it. Thank you!]

All that talk in the first lessons of part 2 on the importance of stats is painfully frustrating for me.

Any brilliant suggestions on how to really get all those VAR/Co-VAR/etc. concepts?

Math has always been easy for me, including pretty advanced university-level math, but no matter what I try, my brain just doesn’t get stats :frowning: I couldn’t get it 25 years ago when I studied it at uni, and I still don’t get it when I try hard now. It’s so frustrating. I can do problems and follow formulas, but I just can’t wrap my head around anything stochastic. It’s so weird. I can do very complex integrals and derivations, and whatnot, and I’m very good at spatial things, but not stats. It’s like I’m lacking some science genes (I also hated electromagnetic fields and relativity theory, but loved the rest of my courses).

A year ago I took 3 months to follow math lectures from Stanford, Harvard and MIT to refresh what I had studied 25 years earlier but never used since. I absolutely loved the MIT calculus courses. I cringed through stats. Tried Khan Academy’s stats - no luck either. I somewhat get it when I study it, but it doesn’t sink in, and a few days later it’s as if I had never studied it.

I have just gone to my local library and picked up a bunch of books on stats, thinking to just re-read the parts I need again and again until I make a break-through. I really really need to find a way to understand the basics of that field, since the foundation of ML is all about stats.

Have you been in the same situation and found a way to make a break-through?

If you get inspired to follow up, please don’t just list books or courses. I have a ton of them, and have watched and read a ton of them to no avail. I just can’t find any that help me to really make a break-through and to really start grokking stats. It’s like I’m looking for that brilliant teacher who has a way to teach stats to someone who just doesn’t get it. And none of the famous professors at the big unis I tried did it for me (unlike with other branches of math - some amazing profs out there).

My academic background is B.Sc in EE (but from many years ago) if it helps to set the stage.

And thank you!

Here is a summary of what resonated for me in the kind suggestions below (meaning there are more suggestions in the answers that aren’t listed here):

  1. You have to be sure to understand everything you’re doing. Go back to the basics every time you’re not sure about something.

  2. Stats is not about pattern matching and requires a different skill/mindset.

  3. But perhaps by doing enough experiments with data it could be reduced back to pattern matching by honing the intuition. Practice with actual data a lot, using a hands-on approach.

Resources Summary [Wiki]

There were multiple courses and books mentioned, so I turned this post into a Wiki and you can now edit the resource section yourself.


May be more basic than you’re looking for, but I go back to Maths Is Fun when I can’t remember my basic stats. It’s not a full course, and I know you’re not after a list of books/courses, but it could be useful?


Saw this in another thread and thought I would link it here. Please check it out.


Thank you, @maya. I did both courses.

Indeed, the MIT one was harder to follow and it wasn’t great for me.

Blitzstein, on the other hand, is a great professor, probably the best of everything I have tried so far, but it didn’t work for me. I did have a few aha moments though, so that was some progress.


I have a feeling you’re onto something important. Perhaps I’m lacking some very very basic understanding (skipped some important classes in high school?) and that prevents me from getting much more complex things. It feels right, I will go through what you suggested, thank you @adrian!

I usually find it easier when I can code it out. Run a simple regression in R or Python and see if you can reproduce the values in the ANOVA table - I learned basic stats that way.
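For anyone who wants to try the "reproduce the ANOVA table" exercise, here is a minimal sketch in plain Python (standard library only). The toy data, the seed and all variable names are made up for illustration; the point is just that the sums of squares, the F statistic and R² of a simple-regression ANOVA table come from a few lines of arithmetic that you can compare against what R or a stats package prints.

```python
import random

random.seed(0)

# Hypothetical toy data: y depends linearly on x plus Gaussian noise.
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for y = intercept + slope * x.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]

# ANOVA decomposition: total variation = explained + unexplained.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

df_regression = 1      # one predictor
df_residual = n - 2    # n observations minus two fitted parameters
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"SS total={ss_total:.2f} regression={ss_regression:.2f} residual={ss_residual:.2f}")
print(f"F={f_statistic:.2f} R^2={r_squared:.3f}")
```

If the numbers are right, SS total equals SS regression plus SS residual up to rounding error, which is a handy sanity check before comparing against a package's output.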

In the direction of probability theory, I can recommend CS109. They start very low-level and build up nicely from there. They have the slides and some nice summary handouts on their website. I am currently at lesson 9.

I combine going through the material with making a lot of Anki cards of the concepts, similar to It takes some time, but with this it is much easier to remember and apply the material.

This also helped me while going through the Glorot and He papers from lesson 8, as the Var(x)… parts already looked familiar to me, which was a nice thing. :slight_smile:
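A quick way to get a feel for that kind of Var(x) term is to simulate it. The sketch below is not taken from the papers; it only checks the standard identity that, for independent zero-mean weights and inputs, Var(sum of w_i * x_i) = n * Var(w) * Var(x). The layer size, the standard deviations, the number of trials and the seed are arbitrary choices for illustration.

```python
import random
from statistics import variance

random.seed(0)

n_inputs = 100   # hypothetical number of inputs feeding one unit
sigma_w = 0.1    # hypothetical standard deviation of the weights
sigma_x = 1.0    # hypothetical standard deviation of the inputs
trials = 4000    # number of simulated units

outputs = []
for _ in range(trials):
    # One unit's pre-activation: a sum of independent zero-mean products.
    total = sum(
        random.gauss(0, sigma_w) * random.gauss(0, sigma_x)
        for _ in range(n_inputs)
    )
    outputs.append(total)

empirical_var = variance(outputs)
predicted_var = n_inputs * sigma_w ** 2 * sigma_x ** 2

print(f"empirical variance: {empirical_var:.3f}")
print(f"predicted variance: {predicted_var:.3f}")
```

Changing `sigma_w` or `n_inputs` and re-running shows how quickly the output variance grows or shrinks, which is the kind of thing that is hard to feel from the formula alone.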


Probably not a “brilliant” approach, but at least a good one:
The O’Reilly book Practical Statistics for Data Scientists - 50 Essential Concepts.
I really liked it: lots of examples, lots of graphs, nice boxes with “key concepts” and summaries as well as “further reading” sections, often with links to useful online content. I did not read it cover to cover but have kept looking at specific chapters when something comes up. Code examples are in R, but short and understandable for Python people, I think. It’s certainly not for the advanced statistician but rather introductory; maybe it helps…


Ah, and I also did this introductory course on udemy (again, more sampling by chapter rather than start to finish):
Probability and Statistics for Business and Data Science
I also liked it (don’t buy unless there is a “sale” for 9.99-11.99, which is almost always!). I think overall what helps me with subjects I have a hard time “remembering” is to have a diverse source of “teachers” to get different explanations of the same concepts. I then remember either because of the implicit repetition or because one explanation of a concept finally really “clicks”. And I really think a lot of the statistics stuff is more about “remembering” rather than “understanding”, because most concepts are very easy to grasp. Maybe it is just too easy for your brain to deem it necessary to remember :wink:


You might experience a mix of fun and shock feelings by reading the book by ET Jaynes called “Probability theory: the logic of science”. If you tolerate the occasionally aggressive style (Jaynes thought he was an under-appreciated genius and felt bitter about it) and think about the ideas in this book, you might get that mind-bending experience that coders get when they first reach deeply into pure functional programming concepts. I do not think that the Bayesian ideas in this book are a catch-all solution for science, especially since the dramatic advance of deep learning, but given your question and background, I thought you might actually like it and learn from it. This book was reasonably popular, so it should be easy to find a copy to borrow.


If you could understand calculus, statistics should be easy. It is much easier… I suspect you need hands on practice with real data…

Whatever approach you choose, please ensure that it comes with a hands-on approach. A professor once said he never could understand statistics until he had to apply it to his graduate research.

Pick a statistics package like Minitab, which I recommend, and play with real data as you read through the book/course that you choose. Statistics software like Minitab makes life easier. With a single click you can see a nice visual report of whatever statistics-related results you want, like these:

The beauty of statistics as a science is that it is an important tool for getting closer to the truth about just about anything in life…

Consider this example:
If somebody tells you that French adults are taller than British adults because the average height of French adults is 178 cm while the average of British adults is 172 cm… If you know the basics of statistics, you will know that this is meaningless… Because the premise is based on insufficient information… Most people tend to make comparisons in everyday life using single numbers… That is okay for a single item… However, once you are describing more than 10 items, you should use a different tool for such a description or comparison… Statistics is that tool…

Now why is the above comparison meaningless?

Because when we take the average of many items, we lose important information about the height of each individual. Of course, it is difficult to skim millions of numbers if I want to describe a huge population like that, but there is the standard deviation, which describes how widely individuals are spread around the mean, and so roughly what share of the population is above the mean by 1 cm or 2 cm, or below it by 3 cm, etc. So the correct way to describe the British population is by the mean plus the s.d., which also describes the distribution of the population around the mean…

Considering the sd in our comparisons, we will find that sometimes our conclusion will be non-intuitive for most people. Take this:

Height of adults:
Germany: 178 cm
Denmark: 174 cm

France: 178 cm
UK: 176 cm

Suppose somebody who knows the standard deviations asks which difference is more significant: the one between France and the UK, or the one between Germany and Denmark?

Without statistics, most people will of course say the Germany/Denmark difference is more significant.
But consider this with S.D. for the following graphs:
Germany vs Denmark:

France vs UK:

You can see that in the 1st graph, although the difference between the averages is twice as big, the dispersion (i.e., the s.d.) of those 2 populations is much larger than in the 2nd graph. That’s why the difference is not as significant as in the 2nd graph. In other words, we have less confidence in the difference between the 2 populations in the 1st graph, and if you mix both populations in the first graph, you will not notice that anything weird happened to their heights. But if you mix the France and UK populations, you will see a significant change in heights after the mix-up: almost all French people are very close to 178 cm and almost all UK people are very close to 176 cm, so the change after the mix-up is glaring, i.e., statistically significant.
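To put rough numbers on this idea, here is a small Python sketch. The means are the ones listed above, but the standard deviations (7 cm and 0.5 cm) are invented for illustration, since the graphs are not reproduced here, and heights are assumed to be normally distributed. It computes the gap between the means in units of standard deviation, plus how much the two height distributions overlap.

```python
from statistics import NormalDist


def standardized_difference(mean_a, mean_b, sd):
    """Gap between two means, measured in standard deviations."""
    return (mean_a - mean_b) / sd


# Means from the example; the sigma values are hypothetical.
germany = NormalDist(mu=178, sigma=7.0)
denmark = NormalDist(mu=174, sigma=7.0)
france = NormalDist(mu=178, sigma=0.5)
uk = NormalDist(mu=176, sigma=0.5)

d_de_dk = standardized_difference(germany.mean, denmark.mean, germany.stdev)
d_fr_uk = standardized_difference(france.mean, uk.mean, france.stdev)

# Overlap is the shared area under the two density curves (0 = disjoint, 1 = identical).
overlap_de_dk = germany.overlap(denmark)
overlap_fr_uk = france.overlap(uk)

print(f"Germany vs Denmark: gap = {d_de_dk:.2f} sd, overlap = {overlap_de_dk:.2f}")
print(f"France vs UK:       gap = {d_fr_uk:.2f} sd, overlap = {overlap_fr_uk:.2f}")
```

With these made-up spreads, the 4 cm gap is only about half a standard deviation and the two curves mostly overlap, while the 2 cm gap is several standard deviations wide and the curves barely touch.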

You did a brilliant job of making Linear Algebra easy for everybody. What do you think about doing a similar course for statistics? Like Intro to statistics for coders?

I bet there are a lot of people here who need that… I think statistics is not as essential for DL as Linear Algebra, but it is still an important tool for understanding the basic concepts, especially for ML in general… I remember studying ML as a statistical pattern recognition course 10-15 years ago, where statistics was an important prerequisite for any ML or pattern recognition course…


I don’t have any brilliant resource to share with you (I had the luck of having an exceptional math teacher who loved stats). However, the year before that, I had a good math teacher and I sucked at stats, so I know the feeling (not that I’m very good now, but at least I feel comfortable).

I wanted to tell you that, from my experience with many classmates, you’re the rule and not the exception. For some reason, apart from some people, stats is hard for us humans.
I think that’s also due to the fact that for a very long time statistics wasn’t considered to be “real” math, and wasn’t taught properly, whereas the teaching of the other math areas was refined. Stats teaching lacks the compounding effect of the cycle of good teachers developing good teaching methods that get refined later on.

Having said those abstract things that aren’t of much use to you, my practical advice is quite common and boring: practice. But I find it’s especially true in stats, where we can often have the wrong intuitions, and intuition can be quite important in understanding a subject and being comfortable with it. Practice is the only way to get rid of the bad intuition and replace it with a good one. Even though practice is important in everything, I find that in stats, because it’s harder and there are fewer good resources, we tend to practice less when we should work more.
One last (generic) piece of math advice that’s also particularly important in stats: you have to be sure to understand everything you’re doing. Go back to the basics every time you’re not sure about something.

Good luck!


How about a ‘ approach’? I mean, go for a top-down view of the field. In this case you’d probably like the MOOC from Tibshirani and Hastie (they are top scientists and very fun to watch). Later on, you can go deeper into each topic individually (probabilities, distributions, etc.)

Anyway, I highly recommend this course for anyone interested in DL.


This could also be quite interesting to practice statistics with Python:
“Think Stats is an introduction to Probability and Statistics for Python programmers.” (see link to pdf or online version)


Hmm, I wonder if you might like Richard McElreath’s course, Statistical Rethinking: It’s very popular as an intro to Bayesian statistics (which I guess isn’t quite what you’re asking about, but I at least initially found the Bayesian approach to be much more “fun” than frequentism, for whatever reason—that fun helped keep me motivated and excited about the other material). He’s an entertaining lecturer and an all-around kind/charming person, plus his course shows a glimpse of a different world from deep learning. Anyways, not sure that would help with your specific question, but maybe it could provide some motivation for why statistics can be fun. [Oh, and it’s also largely programming/simulation-based, rather than math-based.]


I’ve never seen any university courses for stats that I like. Personally, I don’t find a mathematical approach to stats at all helpful in understanding it. Symbolic manipulation isn’t really what stats is about, so I don’t find it offers much insight into it.

Instead, do lots of experiments and look at what’s going on in your data. Then look for readings that focus on that angle to statistics. Such an approach almost certainly needs to involve lots of code! Here’s one book that I’ve briefly looked thru before and seems to do a good job of this approach:


If math has always been easy for you, then perhaps you may not have developed the mental skills that are needed to grok difficult concepts. Math isn’t easy for me, and so all types of math are hard for me. :wink: But I don’t get bothered by math that is difficult – doing math for me is a process of struggling a lot but I’m OK with struggling.

(Actually this is not 100% true. I was pretty good at math in school but that’s because school math is mostly a matter of pattern matching. “If a problem looks like X, use solutions Y and Z to solve it.” So even though I got good grades because I’m naturally good at pattern matching, I never really understood what was going on.)


The more I’m thinking about it, the more I think that this must be a big part of my puzzle. I’m a pro at pattern matching, and “applied” calculus is mainly pattern matching. So any science that has to do with pattern matching is easy for me. But once it diverges from that path, it becomes hard - I guess that’s why relativity and some other domains of physics were hard for me. Any proofs have been hard for me too, which again is a different way of thinking from pattern matching.

So in a way, the advice several posters gave in this thread to practice with data is going back to pattern matching: the more data you see, the better you understand it intuitively, because your brain NN has gone through enough “epochs” to be able to handle new data/problems easily. So the immediate outcome is like the issue we have with DL: we don’t know exactly how things work, but they do.

Thank you!


Thank you to everybody who has contributed so far.

Just reading through your replies has already been fruitful to me - in gaining a better understanding of why I have been stumbling in this domain of math.

  • stats != pattern matching
  • practice with data == intuition (comes close to pattern matching)
  • need 100% understanding of basics

I have started making a summary of ideas and new-to-me resources in the first post.

I’m only summing up things that I believe are of use to me (and hopefully to others in the same boat), so let’s not turn this thread into yet another list of all possible stats resources, but keep it focused primarily on the stats mindset, unusual teaching and learning methods, and practical suggestions that worked for you to make a breakthrough.

Thank you.

One more realization occurred to me - I studied school-level math in Russian, uni-level math in Hebrew, and this now is in English. So I realized I probably indeed need to go through the very basics to solidify my mind map of the concept names in English, since a lot of those words are kind of meaningless and have no foundation in the childhood/teen years roots when one builds conceptual foundations.


Something I find really useful when learning a new topic (or to deepen your understanding) is trying to explain this topic to other people.

This is like writing code: you can think through some algorithm in your head and believe it works, but you don’t know until you turn it into code and try to run it (and usually you’ll find that there was a bug in your thinking).

Same thing with anything else: as soon as you attempt to explain it to someone else (for example by writing a blog post), you’ll find that there are important gaps in your understanding and that you don’t really understand the material as well as you thought.

The trick is that this also works when you’ve just started learning something new. Try to explain it to yourself or some imaginary audience. It will quickly become clear which parts you understand and which parts you don’t (or what you previously glossed over).

I’m sure this is why Jeremy always encourages people to blog about what they’re learning. It works.

I keep detailed notes about pretty much everything I learn, partly because I can’t possibly remember everything, but also because writing down the notes helps me to learn. However, I don’t just copy things verbatim (like what kids do in school). I always try to explain things in my own words. If I can’t, it means I need to study the topic more.

So my suggestion is to start writing a course on statistics from the ground up. Not just how it works but also why you’d want to use it, etc. Initially, this course is just for you. You are the teacher but you are also the student. :smiley: