Hi @jeremy, @rachel, and Fast.ai community! I just completed the 4th lecture and had some questions about different deep learning methods. It’s been fascinating for me to compare the Keras approach to deep learning (you can read more in Deep Learning with Python by François Chollet!) with Andrew Ng’s Coursera approach (the Deep Learning specialization).
The topic of algorithmic bias seems to come up a lot in various contexts, but if the deep learning models are using the same data (such as Kaggle data), is there a way to better understand why the models are performing differently? I have a few questions about that, so that when training my models I have a better grasp of why I would use one approach over another and what one model is doing that another may not be. Thank you!!

For example, in the Embeddings lecture the model using the fast.ai library beat the previous most accurate model. If Virtual got 94.1% and Fast.ai got 94.5% accuracy for its text classification method, what are the reasons for these differences? Is there a way to really understand why these models perform differently?

Like Jeremy mentions in the lecture, it’s crazy how many applications of DL there are (and the number keeps growing). I have a dataset that I am particularly interested in applying deep learning to. My question, though, is how we should approach thinking about what the independent and dependent variables are. If I know what I want to classify, is that always the dependent variable? By doing this, however, are we not making some assumptions about causality or the relationship between specific factors?

Are the ‘most’ accurate classification models using greater numbers of categorical variables? Jeremy mentioned that in the Rossmann sales example it was advantageous to keep as many variables categorical as possible, because this allows the model to learn a distributed representation; if a variable is continuous, the only thing the model can do is find a single functional form that fits it well. When we are using structured data, would you recommend that we train our models by identifying as many categorical variables as possible?
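To check that I am picturing the categorical vs. continuous distinction correctly, here is a toy sketch in plain Python. The column (day of week), the embedding size, and the function names are all my own assumptions for illustration, not anything from the lecture or the fast.ai library, and the embedding rows are just random numbers rather than trained weights.

```python
import random

random.seed(0)

# Hypothetical categorical column: day of week as integer codes 0..6.
day_codes = [0, 3, 6, 3]

n_levels = 7  # number of distinct categories
emb_dim = 4   # length of each vector (an arbitrary choice for this sketch)

# An embedding is a lookup table with one row of numbers per category.
# Rows are randomly initialised here; training would adjust them.
embedding = [[random.gauss(0.0, 0.1) for _ in range(emb_dim)]
             for _ in range(n_levels)]


def embed(codes):
    """Return one emb_dim-length vector per category code."""
    return [embedding[c] for c in codes]


def as_continuous(codes):
    """Treat the same column as a single scaled number per row."""
    return [[c / (n_levels - 1)] for c in codes]


cat_features = embed(day_codes)           # 4 rows, 4 numbers each
cont_features = as_continuous(day_codes)  # 4 rows, 1 number each
```

As I understand it, the categorical version gives the model several numbers per category that it can move around independently, while the continuous version only gives it one number per row to fit a function to.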
Could we use a pretrained model’s output (such as the results of a CNN) as a categorical variable for a structured data problem?
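To make that last question concrete, this is roughly what I have in mind, as a minimal sketch. The class names, the probabilities, and the table rows are all made up; in practice the probabilities would come from running a pretrained CNN over the image attached to each row.

```python
# Hypothetical softmax outputs from a pretrained CNN, one list per image,
# over three made-up classes. The numbers are illustrative only.
class_names = ["cat", "dog", "bird"]
cnn_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
]


def predicted_class(probs):
    """Index of the largest probability (argmax)."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical structured rows, each with an associated image.
rows = [
    {"price": 10.0, "store": "A"},
    {"price": 12.5, "store": "B"},
    {"price": 9.0, "store": "A"},
]

# Attach the CNN's predicted label as a new categorical column.
for row, probs in zip(rows, cnn_probs):
    row["image_class"] = class_names[predicted_class(probs)]
```

The new `image_class` column could then be treated like any other categorical variable in the structured model. Whether that works better than feeding in the CNN's activations directly is part of what I am asking.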