How does one master data preprocessing for image-, text-, audio-, and video-based problems in ML? I have been struggling with this step across all the different architectures, especially in NLP and embeddings. I struggle both with understanding what existing preprocessing code does and with writing it on my own. Could someone share their experience, their learning path, and any tips or resources that were useful? For context, I come from a non-computer-science background with 3 years of experience in R/SAS/SQL, and I am relatively new to the Python/C++ style of writing code.

Mastering NumPy and pandas is the obvious starting point. I understand that this is a very broad question. Please let me know if there are any other ideas.

