Of course, there’s also a category-theoretic way of understanding this: differentiation is a functor that takes you from the category of functions on \mathbb{R} to a “weird” version of this category in which functions are “composed” by multiplying them together. (Exercise for the reader: What are the identity arrows in this latter category?) The chain rule is simply the statement that this is indeed a functor.
This isn’t particularly helpful for understanding the chain rule; but it comes in handy if you want to understand something called “automatic differentiation” (which is the context in which I came across this idea).
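To make the automatic-differentiation connection concrete, here's a minimal sketch of forward-mode AD using dual numbers (the `Dual` class is purely illustrative, not from any library): each value carries its derivative along with it, and the multiplication rule is exactly the chain/product rule in disguise.

```python
class Dual:
    """A dual number: a value plus its derivative, propagated together."""
    def __init__(self, val, dot):
        self.val = val   # function value
        self.dot = dot   # derivative w.r.t. the input

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

# derivative of f(x) = x*x + x at x = 3 is 2*3 + 1 = 7
x = Dual(3.0, 1.0)   # dot = 1.0: we're differentiating w.r.t. x
y = x * x + x
# y.val == 12.0, y.dot == 7.0
```

Composing operations multiplies the derivative parts through, which is the functoriality described above playing out in code.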
Here is a website, Explain Paper, which has been making the rounds on social media. It explains a research paper by answering questions about the paper using GPT-3.
I’ve reviewed the meanshift notebook and am adding an alternative torch implementation, plus some plots that should help with understanding how the data, the batching, and the various steps contribute to the final result.
By the way, I thought of a more intuitive way of explaining why composing two linear functions multiplies their slopes: You have to think of a linear function as an affine transformation of the real line. E.g., the function f(x) = mx + c scales the real line by a factor of m, and then shifts it by c.
Now, what happens if we compose f(x) = mx + c and g(x) = nx + d? Well, first we scale by m, then shift by c, then scale by n, then shift by d. To make this easier to imagine, think about what happens to a unit interval. It should be obvious that after applying f, we go from an interval of size 1 to an interval of size m. Then, after applying g, we go from an interval of size m to an interval of size nm! Hence, the slopes (i.e. the scaling factors) are multiplied under composition.
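Here's a quick numeric check of that claim (the specific slopes 3 and 5 are just made up for the example): compose two affine functions and confirm that the composite's slope is the product of the individual slopes.

```python
def compose(f, g):
    """Return x -> g(f(x)): apply f first, then g."""
    return lambda x: g(f(x))

f = lambda x: 3.0 * x + 2.0   # slope m = 3, shift c = 2
g = lambda x: 5.0 * x + 1.0   # slope n = 5, shift d = 1
h = compose(f, g)

# The unit interval [0, 1] maps to an interval of length n*m = 15:
slope = h(1.0) - h(0.0)
# slope == 15.0
```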
Playing with @HuggingFace Spaces/Gradio. I had wanted to wire this up to Stable Diffusion, but it’s a bit too taxing for the “free tier” CPU instances…
Give my interactive 3D depth viewer (three.js) a try by just dragging and dropping an image onto the Spaces app and dragging around with a mouse (or your finger). There are some other Spaces that attempt to do this… But I think mine works a lot better… Correcting for the “camera view pyramid” etc…
The idea is that when we are talking about derivatives, we can think of any differentiable function as being approximately linear around each point. So, for the purpose of computing the derivative of g(f(x)) at x=x_0, we can behave as if f(x) is simply the tangent of f at x_0, and g(u) is simply the tangent of g at f(x_0).
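This tangent-line view can be verified numerically: the chain-rule derivative of g(f(x)) at x_0 (the product of the two tangent slopes) should match a finite-difference estimate. The choice of f = sin and g = exp here is arbitrary, just for illustration.

```python
import math

x0 = 1.0
f, g = math.sin, math.exp

# Chain rule: slope of the tangent of g at f(x0), times slope of the
# tangent of f at x0. Here f' = cos and g' = exp.
exact = math.exp(math.sin(x0)) * math.cos(x0)

# Central finite-difference estimate of d/dx g(f(x)) at x0
h = 1e-6
numeric = (g(f(x0 + h)) - g(f(x0 - h))) / (2 * h)

# exact and numeric agree to several decimal places
```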
Totally agree, The spelled-out intro to neural nets is a masterpiece!
It really helps you to connect the dots and have a good grasp of what is happening under the hood.
A good resource to watch. Looking forward to the next lesson on backprop.
Is Jax/Flax just Google’s answer to Torch? Pros/cons vs Torch? I’ve just been looking at random colabs etc and seeing some in Jax. Are there any particular advantages (i.e. things you can do in Jax that you can’t do in torch)? Anyone have experience with both torch and jax?
Jax is perhaps slightly lower-level than PyTorch, in that it provides what is basically numpy with autograd magic and XLA compilation for FAST execution on GPUs/TPUs. I haven’t used it much, but here are my impressions, roughly from pros to cons:
Very fast when you use all the JIT compilation magic, and great for making use of TPUs
Low-level, which can be fun for learning and makes you feel less reliant on a bunch of high-level libraries and APIs
vmap() means you can write a function that works for a single example and it’ll turn it into one that works with batches of data
Some interesting libraries emerging like equinox which I enjoyed dabbling with
Jax sort of forces a specific kind of coding on you, which can feel weird at first but does seem somehow… elegant? Does take a bit of getting used to though.
You will be forced to think about random numbers and state, which is good but also extra work
Definitely harder to debug (although they’re working on that)
Smaller ecosystem, so you’ll have to do more things yourself vs PyTorch land where you can usually find a library or some code that does what you want.
Ditto for documentation, small but growing.
At the moment the usual joke is that people using Jax are mostly Google folks who don’t want to use TensorFlow, but I think it’s cool that it exists and suspect it’ll keep growing into a more and more useful part of the overall ecosystem.
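The vmap() point above is worth seeing in action. Here's a minimal sketch (the `predict` function and its parameters are made up for illustration): write a function for a single example, then let `jax.vmap` lift it to a batch, with `in_axes=(None, 0)` saying "don't batch the weights, batch the inputs along axis 0".

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    """Score a SINGLE example x with weights w."""
    return jnp.dot(w, x)

w = jnp.ones(3)                         # shared weights
xs = jnp.arange(12.0).reshape(4, 3)     # batch of 4 examples

# vmap turns the single-example function into a batched one:
batched = jax.vmap(predict, in_axes=(None, 0))(w, xs)
# batched has shape (4,): one score per example in the batch
```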
That’s how you create a uniform random variable between -35 and 35. No particular reason I picked those params, I just wanted something that was generally not too big and not too small for talking through.
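For reference, the standard scale-and-shift trick: a Uniform(0, 1) sample times the range width, plus the lower bound, gives Uniform(lo, hi). Sketched with the stdlib here; the same pattern works with `torch.rand` or `np.random.rand`.

```python
import random

lo, hi = -35.0, 35.0
# random.random() is Uniform(0, 1); scale by (hi - lo) and shift by lo
samples = [random.random() * (hi - lo) + lo for _ in range(10_000)]
# every sample lands in [-35, 35]
```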