Coloring with Random Forests

tylervw · November 22, 2017, 2:46am

I wrote a blog post on a visual representation of random forests. This is an alternative perspective on how random forests work.

jeremy · November 22, 2017, 3:13am

This is so cool! Since I think it’s great, I hope you don’t mind if I provide some feedback which I think might make it even better…

Specifically: it would be worth spending some time cleaning up the text, since the content is too amazing to have text with little problems in it… For instance, looking at the first couple of paragraphs:

Random forest are typically described using trees.

‘forest’ should be plural. Also, I don’t think this sentence is a strong introduction to the post. If there are some folks in class you know to be strong writers, perhaps ask them to help you create a compelling first paragraph?

To explore this representation, lets collect some data

“lets” should be “let’s”

Below we record the location and color of many points. These points are plotted below.

I found this confusing. What are these points? Are they generated synthetically? Is there some structure they represent? Why are you showing them to us? (I figured out after reading further and looking more closely that they are probably from synthetically generated data designed to have a particular structure)

For the data we have been given, the random forest draws partitions as shown below.

This actually shows a tree partitioning, not a random forest partitioning.
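
To make the tree-versus-forest distinction concrete, here is a minimal pure-Python sketch. It is not the code from the blog post: the synthetic data (points shaded 1.0 inside a disc, 0.0 outside) and all function names are assumptions made up for illustration. A single depth-limited tree can only paint a handful of axis-aligned rectangles, while a forest averages many trees fit on bootstrap resamples, so its shading is a blend of their overlapping partitions.

```python
import random

def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(points, depth):
    """Fit a small regression tree to a list of ((x, y), value) pairs."""
    values = [v for _, v in points]
    mean = sum(values) / len(values)
    if depth == 0 or len(points) < 2:
        return {"leaf": mean}
    best = None  # (squared error, axis, threshold)
    for axis in (0, 1):
        for coords, _ in points:
            t = coords[axis]
            left = [v for c, v in points if c[axis] <= t]
            right = [v for c, v in points if c[axis] > t]
            if not left or not right:
                continue  # a split must put points on both sides
            err = sum_sq(left) + sum_sq(right)
            if best is None or err < best[0]:
                best = (err, axis, t)
    if best is None:
        return {"leaf": mean}
    _, axis, t = best
    return {
        "axis": axis,
        "t": t,
        "left": build_tree([p for p in points if p[0][axis] <= t], depth - 1),
        "right": build_tree([p for p in points if p[0][axis] > t], depth - 1),
    }

def predict_tree(tree, coords):
    """Walk down to the leaf (rectangle) containing coords."""
    while "leaf" not in tree:
        go_left = coords[tree["axis"]] <= tree["t"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]

def build_forest(points, n_trees, depth, rng):
    """Fit each tree on a bootstrap resample (drawn with replacement)."""
    return [
        build_tree([rng.choice(points) for _ in points], depth)
        for _ in range(n_trees)
    ]

def predict_forest(forest, coords):
    """Average the predictions of all trees in the forest."""
    return sum(predict_tree(t, coords) for t in forest) / len(forest)

# Hypothetical synthetic data: shade 1.0 inside a disc, 0.0 outside.
rng = random.Random(0)
points = []
for _ in range(200):
    x, y = rng.random(), rng.random()
    inside = (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.1
    points.append(((x, y), 1.0 if inside else 0.0))

DEPTH = 3
tree = build_tree(points, DEPTH)
forest = build_forest(points, n_trees=25, depth=DEPTH, rng=rng)

# Count how many distinct shades each model paints on a 20 x 20 grid.
grid = [(i / 19, j / 19) for i in range(20) for j in range(20)]
tree_shades = {predict_tree(tree, g) for g in grid}
forest_shades = {predict_forest(forest, g) for g in grid}
print(len(tree_shades), "distinct shades from one tree")
print(len(forest_shades), "distinct shades from the forest")
```

A depth-3 tree has at most 8 leaves, so it can never paint more than 8 distinct shades; the forest is not limited in that way because each of its trees draws its rectangles in slightly different places.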

Anyway, as I mentioned, this is fantastic content, and I’d love to share it regardless of how much time you can find to polish the prose. So just let me know when you consider it “done”. And thanks a lot for sharing!

fryanpan · November 22, 2017, 3:39am

Very cool! The plots made me think of the different channels of an image. Maybe it’s been done before, but do you think there is potential for a (very simple) novel alternative to CNNs using this perspective? If you partition each of the RGB channels of, say, a 20 x 20 pixel image separately using random forests, it feels like there is a way to develop a random forest model that can take the spatial information of each layer and classify the image. Just thinking out loud.

tylervw · November 22, 2017, 4:23am

Thanks for the feedback. I fixed the issues that you noted. I tried to note that the data was generated with a specific structure, and that the random forests would be able to identify it. I would call it “done” at this point.

jeremy · November 22, 2017, 3:35pm

What’s your twitter handle, so I can give you due credit and people can find you? (If you don’t have one, you should probably create one - and note that people will look up your past tweets, so if you have a twitter handle that hasn’t been posting data science or coding topics, you may want to create a separate one for that purpose and make it the one that potential employers/clients can find).

tylervw · November 22, 2017, 4:19pm

My twitter handle is @Tyler_V_White. Here’s a link to the tweet.

parrt · November 22, 2017, 4:54pm

Fantastic!!!

jeremy · November 22, 2017, 5:40pm

And here’s my tweet! https://twitter.com/jeremyphoward/status/933383689496440832

cpcsiszar · November 22, 2017, 5:44pm

Super cool visualization, Tyler!

parrt · November 22, 2017, 6:58pm

i retweeted