Learning data wrangling / preparation

Has anyone found useful material on approaching data wrangling, analysis and visualization BEFORE trying to implement models? Would this be of interest to you (to teach or learn)? It's probably the first place people lose time when trying to do DL and ML on their own problems / data (not prepared by someone else) - and also where errors get overlooked / improvements missed (e.g. the German sales 3rd place “error”: remove 0 / closed days, in lesson 14 (DL part 2, 2017)).

For example, let's say a data source for a problem uses IoT devices, or say car controller data coming from different sensors. You might find that data expected to be float also contains strings like “I/O Timeout”, “Bad Data”, “Err3004”. Before I can work with this kind of data I need better skills for 1) understanding these values (visualising them on the data set) and 2) handling them in pandas. I don’t want to blindly zero/mean/remove them.
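To make the sensor example concrete, here is a minimal sketch of one way to look at such values in pandas before deciding what to do with them. The readings and the column name `temperature` are made up for illustration; only the three error strings come from the example above. It keeps the raw strings, converts what it can to float, and counts each error code rather than dropping or imputing anything.

```python
import pandas as pd

# Hypothetical sensor readings: numeric strings mixed with error strings.
raw = pd.Series(
    ["21.5", "I/O Timeout", "22.1", "Bad Data", "Err3004", "23.0", "I/O Timeout"],
    name="temperature",
)

# Convert to float; values that cannot be parsed become NaN instead of raising.
values = pd.to_numeric(raw, errors="coerce")

# Flag entries that failed to parse (and were not missing in the raw data).
is_error = values.isna() & raw.notna()

# Count how often each distinct error string occurs.
error_counts = raw[is_error].value_counts()

# Keep raw text, parsed value and error flag side by side for inspection.
clean = pd.DataFrame({"raw": raw, "value": values, "is_error": is_error})

print(error_counts)
print(clean)
```

With the flag kept as its own column, the error rows can later be plotted against time or against other sensors to see whether they cluster, which seems a safer starting point than filling them in straight away.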

Thanks in advance

(Really not sure of the best place to put this query, so hopefully it's seen and OK here)

Found a lot of tips for (2): themed data wrangling code implementation steps, if you scroll down to the next heading below: https://chrisalbon.com/#python


A very solid run-through of pandas from one of its maintainers

I took the “Data Scientist with Python” course on DataCamp a while back. It provides a good overview with exercises.