Personally, I found working with data frames in Pandas pretty unfamiliar at first, and the online cheat sheets weren't very easy to read. It was really helpful for me just to collect all the examples Jeremy had covered in one place and connect them together (adding some other small useful bits), so I thought I’d share my code for others to dig through:
Pandas for Machine Learning Cheat Sheet
Let me know if there’s anything big I’ve missed!
Great job! BTW one very minor suggestion: you can always remove a trailing , : from any numpy or pandas indexing. E.g. instead of arr[0, :] just say arr[0] - the trailing colon is assumed. (Very few people seem to be aware of this, so most code I see on the internet has the trailing colon - but I think it’s clearer without it, personally).
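For anyone who wants to check the trailing-colon point themselves, here is a minimal sketch. The array shapes and variable names are made up for illustration and are not from the cheat sheet.

```python
import numpy as np

# Hypothetical 3x4 array, just for illustration.
arr = np.arange(12).reshape(3, 4)

first_row_explicit = arr[0, :]  # trailing colon written out
first_row_implicit = arr[0]     # trailing colon assumed

# Both forms select the same row.
same_2d = np.array_equal(first_row_explicit, first_row_implicit)

# The same holds with more dimensions: all trailing colons can be dropped.
cube = np.arange(24).reshape(2, 3, 4)
same_3d = np.array_equal(cube[1, :, :], cube[1])

print(same_2d, same_3d)
```

Note this only applies to trailing colons. A leading colon such as arr[:, 0] (first column) still has to be written out.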
I also wonder if df_raw.loc[range(5)] should be df_raw.iloc[range(5)]. We happen to have an index of 0, 1, 2, … for that dataset, but iloc will always give you the more numpy-like, position-based indexing.
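A small sketch of why this matters once the index is no longer 0, 1, 2, …. The data frame below is a made-up stand-in (not df_raw), with its integer labels deliberately stored in reverse order.

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match row positions.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50, 60]},
    index=[5, 4, 3, 2, 1, 0],
)

# loc looks up the *labels* 0..4, wherever those rows happen to sit.
by_label = df.loc[range(5)]

# iloc takes the first five rows by *position*, like numpy would.
by_position = df.iloc[range(5)]

print(by_label["value"].tolist())
print(by_position["value"].tolist())
```

With a default 0, 1, 2, … index the two calls return the same rows, which is why the difference is easy to miss.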
Thanks Jeremy – useful to know
Hadn’t actually picked up on the difference between .loc and .iloc, Terence, but it seems like an important distinction. Will edit now.
This is what I’ve learned:
loc works on labels in the index (row/column names)
iloc works on the positions in the index (so it only takes integers) - @soorajviraat says it’s an abbreviation for integer location
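To make the two points above concrete, here is a rough sketch with string labels, where the two accessors cannot be confused. The column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"price": [100, 200, 300], "qty": [1, 2, 3]},
    index=["a", "b", "c"],
)

# Same cell, addressed by label and by position.
cell_by_label = df.loc["b", "price"]
cell_by_position = df.iloc[1, 0]

# Slices differ: loc includes the end label, iloc excludes the end position.
label_slice = df.loc["a":"b"]   # rows "a" and "b"
position_slice = df.iloc[0:1]   # row at position 0 only

print(cell_by_label, cell_by_position)
print(len(label_slice), len(position_slice))
```

The inclusive end on loc slices is the part that tends to surprise people coming from numpy.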
This article explains the difference between loc, iloc and ix in detail; do give it a read.
I’ve already updated the document to include this