Spark for deep learning

Hi all,

I see that many data scientist job descriptions ask for Big Data skills (Spark, Hadoop, …). I tried to learn Spark by myself on Dataquest, but I haven’t fully understood how to integrate it into a workflow. Most of the resources on the internet teach us how to retrieve and clean data but don’t show what to do afterwards. For example, after retrieving the data with Spark, do we store it as a pandas data frame (I think we will run out of memory if the data is massive), or do we cut it into parts and evaluate it lazily?
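To make the "cut it into parts" option concrete, here is a minimal sketch of what I have in mind. It is plain Python and does not start Spark at all: `iter_batches` and `fake_rows` are names I made up for illustration, and the generator stands in for rows coming out of Spark. My assumption is that with PySpark the iterable could be something like `df.toLocalIterator()`, which streams rows to the driver instead of collecting everything at once, but I have not verified that this is the recommended way to feed a training loop.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily yield lists of at most `batch_size` rows.

    Only one batch is held in memory at a time, so the full dataset
    never has to fit in a single pandas data frame.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for rows streamed from Spark
# (assumption: with PySpark this could be df.toLocalIterator()).
fake_rows = ({"id": i, "value": i * i} for i in range(10))

batch_sizes = [len(batch) for batch in iter_batches(fake_rows, batch_size=4)]
print(batch_sizes)
```

Each batch could then be converted to a small pandas data frame or tensor before being passed on, which is the lazy alternative to loading everything at once.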

I found that the 2017 course had some material about Spark. Why don’t we see it in v2 of the fast.ai course?

I would really appreciate it if someone could clarify these things for me :smiley:

I’m now learning pyspark and I would like to run a simple fastai training script with spark-submit. (How) does fastai interface with Spark?

Were you able to run a fastai training script with spark-submit? Can you share your experience, please?

I think the reason we don’t have much Spark content here is that Spark follows a “big data” philosophy, whereas fastai has more of a “small(ish) but clever data” mantra :slight_smile: Still, it would be quite interesting to hear from anyone who is using fastai with Spark.