I see that many data scientist job descriptions list Big Data skills (Spark, Hadoop, …). I tried to learn them on my own on Dataquest, but I haven't fully understood how to integrate them into a workflow. Most resources on the internet teach how to retrieve and clean data, but don't show what to do afterwards. For example, after retrieving data with Spark, do we store it as a pandas DataFrame (I think we would run out of memory if the data is massive), or do we split it into chunks and lazy-evaluate it?
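To illustrate the two options I mean: as far as I understand, Spark's `toPandas()` collects the whole result onto the driver's memory, while transformations stay lazy until an action runs. Here is a plain-Python sketch of the same trade-off (the data source and numbers are made up just for illustration, not real Spark code):

```python
def read_rows():
    """Pretend data source yielding rows lazily (stand-in for a Spark read)."""
    for i in range(1_000_000):
        yield i

# Option 1 (eager, like df.toPandas()): the whole result is
# materialised in memory at once -- fine for small results,
# out-of-memory for massive data.
eager = [x * 2 for x in read_rows()]

# Option 2 (lazy, like Spark transformations): nothing is computed
# until we consume the generator, and only a running aggregate is
# kept in memory.
lazy = (x * 2 for x in read_rows())
total = sum(lazy)  # the "action" that actually triggers the work
```

Is the second style the intended way to work with data that doesn't fit in memory?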
I found that the 2017 course had some material about Spark; why don't we see it in v2 of the fast.ai course?
I would really appreciate it if someone could clarify these things for me.