Another treat! Early access to Intro To Machine Learning videos

I’ve just added lesson 3 to the top post.

9 Likes

Thank you. I have been refreshing this thread a few times since last night. Can’t wait for the gift packet (lesson) to show up :pray:

1 Like

Mind blowing pace with fastai! I like it!

1 Like

As long as I don’t need to sleep, we’ll be fine :wink:

6 Likes

Machine Learning has 2 classes per week to go with DL for Coders? Intensive indeed!

Great lesson Jeremy, and thanks a lot for sharing it!

Maybe you’re already aware of it, but if not, check this out; it’s big recent news for model-based EDA lovers: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211 (Yup, as good as it sounds: it explains the feature-importance mix behind individual predictions. It’s XGBoost-based, so it covers both boosting and random forests.)

About the question of when not to use RF: I am especially careful with “trend-ish” data, because of what is, in my opinion, the model’s biggest and maybe only weakness: its inability to extrapolate (unless we tweak features so it can predict outside the training ranges).
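That weakness is easy to demonstrate on a toy trend: a forest can only predict averages of target values it has seen, so far outside the training range every tree lands in an edge leaf. A minimal sketch (the data and settings here are my own, just for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy trend: y grows linearly with x over the training range [0, 99]
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

m = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Inside the training range the fit tracks the true line closely...
in_range = m.predict([[50.0]])[0]

# ...but far outside it, every tree falls into its right-most leaf, so the
# prediction is capped near the training maximum rather than extrapolating
out_of_range = m.predict([[500.0]])[0]
print(in_range, out_of_range)
```

The true value at `x = 500` would be 1000, but the forest stays near the largest target it saw during training.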

One last thing: when using the sd of predictions as a measure of how confident the predictions are, wouldn’t it be useful to normalize by the mean? (Otherwise, categories with bigger targets/prices can show a bigger sd that has nothing to do with prediction confidence.)

I’ll check it out. Sounds great.

Great point. It’s actually in the next notebook, but I forgot to mention it regarding this question.

That’s why I take the ratio at the end of that section.

2 Likes

@miguel_perez did you try to use it? how is it?

@sermakarevich, it’s brand-new stuff (not much has been written or published testing it yet).

As for myself, I spent last month trying to grasp some bash/git/python/tmux… (shamefully enough, I had to learn everything that isn’t Windows + R) :flushed:

In the Kagglenoobs Slack it has been the object of great interest from members far wiser than I am… but as I said, it’s too new to have anything published apart from the post I shared. So, no first-hand experience, but I plan to add it to my toolbox as soon as possible.

(btw, the example in that post unfortunately uses the Caret R package, which I am not a fan of because of the overhead it introduces; hopefully cleaner code using just the explainer will be published in the future)

Got it. I was just wondering whether it was news or a personal recommendation. I haven’t used it either; I’d just heard it still doesn’t work properly. So I checked :wink:

Yes, it’s still under development for sure, and it’s only in R, not Python, but the approach has so much potential that I’d be surprised if it doesn’t become fully usable soon. Consider it a good thing to have on the radar :grinning:

I took a look - it’s nice, but note that the same thing has been available for Python random forests for a couple of years now :slight_smile: http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/
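The idea behind that linked approach (the `treeinterpreter` package) is to decompose a prediction into a bias term plus per-feature contributions by walking from root to leaf and attributing each change in node mean to the feature that was split on. A minimal sketch of the same idea for a single scikit-learn tree, written from scratch (the helper and its names are my own, not the package’s API):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

def contributions(tree, x):
    """Decompose one prediction into bias + per-feature contributions."""
    t = tree.tree_
    node = 0
    contrib = np.zeros(x.shape[0])
    bias = t.value[0][0][0]  # mean of the training targets at the root
    while t.children_left[node] != -1:  # -1 marks a leaf node
        feat = t.feature[node]
        if x[feat] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        # attribute the change in node mean to the feature split on
        contrib[feat] += t.value[child][0][0] - t.value[node][0][0]
        node = child
    return bias, contrib

bias, contrib = contributions(tree, X[0])
# the terms telescope, so bias + contributions == the tree's prediction
print(bias + contrib.sum(), tree.predict(X[:1])[0])
```

For a forest you would average the bias and contributions over all trees; the package does exactly that bookkeeping for you.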

2 Likes

Is there a recommended way to work with unix timestamps in Pandas and the FastAI framework?

I’ve always gone through the process of loading the dataframe, converting the timestamps to Python datetime objects, and going from there. I don’t know if there is a better way, or if the fastai framework can work directly with a unix timestamp to do its datetime feature engineering (e.g., something like `add_datepart_from_unix_timestamp()` or `add_datepart(..., incoming_format="unix-timestamp")`).

You should convert them to datetimes, as shown here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#epoch-timestamps
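Concretely, `pd.to_datetime` with `unit='s'` does the conversion in one step (the column name and values below are made up for illustration):

```python
import pandas as pd

# Hypothetical column of unix epoch seconds
df = pd.DataFrame({'saledate': [1490659200, 1490745600]})

# unit='s' tells pandas the integers are seconds since the epoch
df['saledate'] = pd.to_datetime(df['saledate'], unit='s')

# The column is now datetime64, so .dt accessors (and add_datepart) work on it
print(df['saledate'].dt.year.tolist())
```

Use `unit='ms'`, `'us'`, or `'ns'` instead if your timestamps have sub-second resolution.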

1 Like

Ouch… had no idea (and I know quite a few Python-ers that aren’t aware either), sorry for that…! :sweat:

Question after watching lesson 2 on Random Forests:

So is ExtraTreesRegressor (which samples rows and cols from the full dataset for each tree) equivalent to RandomForestRegressor + max_features attribute set to something + set_rf_samples?

If so, why can’t one just use ExtraTreesRegressor instead of manually coding the set_rf_samples function?

Hey no need to be sorry @miguel_perez! Always appreciate input on this forum :slight_smile:

1 Like

No, not the same. Extra trees randomly samples a few split thresholds, rather than searching for the optimal split for a feature.
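The two estimators are drop-in replacements for each other in scikit-learn, which makes the difference easy to miss; the comments below summarize the defaults that actually differ (the dataset here is synthetic, just to show both fit the same way):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5, random_state=0)

# RandomForest: bootstraps rows by default, and for each candidate feature
# searches exhaustively for the best split threshold
rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# ExtraTrees: uses the full dataset by default (bootstrap=False), and draws
# split thresholds at random, keeping only the best of those random draws
et = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(X, y)

print(rf.score(X, y), et.score(X, y))
```

So neither one reproduces `set_rf_samples`, which changes how many rows each tree of a RandomForest sees, not how splits are chosen.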

Ah! subtle but important difference. Thank you Jeremy.

1 Like

What about BaggingRegressor, which has a max_samples parameter… could that be an alternative to set_rf_samples?
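For what it’s worth, that combination can be sketched like this: bag decision trees and cap `max_samples` so each tree sees only a subset of rows, which is similar in spirit to `set_rf_samples` (the dataset and parameter values are my own, chosen for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=6, random_state=0)

# Each of the 30 trees is trained on a random draw of 200 of the 1000 rows;
# max_features='sqrt' on the tree mimics a forest's per-split column sampling
m = BaggingRegressor(
    DecisionTreeRegressor(max_features='sqrt'),
    n_estimators=30,
    max_samples=200,
    random_state=0,
).fit(X, y)

print(len(m.estimators_))
```

One caveat: bagging trees this way loses RandomForest conveniences like `oob_score_` being computed against the full bootstrap machinery, so it’s a workaround rather than an exact equivalent.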

2 Likes