Lesson 12 AMA

This week we’ll try (time permitting) some ‘Ask Me Anything’ questions. Your question doesn’t have to be related to this lesson, although it should be something that you think Jeremy or Rachel might be reasonably well qualified to answer! :slight_smile: e.g. Ask about deep learning, building data-oriented startups, data-driven medicine, Kaggle, MOOC development, etc…

We’ll primarily use ‘likes’ to prioritize questions, so please vote for questions you’re interested in.


I would like to know what the plan is for the next 3 lessons :-). Things I would personally like to know more about:
Time series
Neural Architecture Search using Reinforcement Learning

Is there a part 3? This is one of the best classes that I have attended and we have a great community here. It would be sad if everything ends after 3 weeks…


I’d really love to have an opportunity for regular local (or maybe even webex-based) practical DL workshops where I could come with an idea and code that doesn’t work - and have a chance to make practical progress toward a working implementation under an instructor’s guidance.

The reason is that the web is full of isolated snippets, but it proves quite challenging to turn something less standard into a working implementation close to the state of the art.



Any advice on imbalanced datasets? Seems to be a recurring issue with real world data.


I would like to learn more about implementations and practical tips in reinforcement learning, just as we did with deep learning. While there are plenty of RL-related tutorials online, most of them are geared towards theory, and it is not easy to turn them into running code that actually produces state-of-the-art results.


Any tips on ensembling deep learning models? I’m interested in methods like stacking, blending, bagging, etc in addition to simple averaging.

When is it a good idea to use a non-deep learning classifier (e.g. XGBoost) on top of one or more deep learning models? What makes it better, compared to training another neural net using features from neural nets as the inputs?


How do you go from code in an IPython notebook to a data product?

In my mind, an IPython notebook shows a possible solution to a problem or use case. What steps need to be taken to ensure that it works as intended with real-world data, changing product requirements or use cases, and scaling?


Time series is high on my list as well. I have tried out a bunch of ideas on my own, but I’m eager to hear from Jeremy.


@kpatnaik best off clicking ‘like’ for the existing thread then! :slight_smile:

@kpatnaik have you read https://www.oreilly.com/ideas/drivetrain-approach-data-products as yet? Would be interested to know if that was helpful, if you have any follow up q’s, anything that seems unclear or out of date, etc…


For supervised learning, is there a way to prove an upper bound to performance for a given dataset and task? That is, can you say that the noise is at least X so that puts a bound on the loss function? Or will we always wonder how much more the model can be improved?

Some tasks can be described as “easy for a human but hard for a computer” (e.g., dog versus cat) while others are “hard for a human and hard for a computer” (e.g., given gameplay history, will the player still be active in 30 days). For this second case, are there any learnings from Information Theory to help?
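
To make the question concrete, here is a minimal sketch of the simplest case I can think of. It assumes binary labels that are a deterministic function of the features, then flipped independently with a known probability; the function names and the flip probability are hypothetical, chosen only for illustration. Under those assumptions no model can beat an expected log loss equal to the binary entropy of the flip probability, or an accuracy above the larger of the flip and no-flip probabilities. Real datasets rarely tell you the noise rate, which is really what the question is about.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def noise_floor(flip_prob):
    """Return (lowest expected log loss, highest expected accuracy).

    Assumes the clean binary label is fully determined by the features
    and is then flipped independently with probability flip_prob.
    """
    loss_floor = binary_entropy(flip_prob)
    accuracy_ceiling = max(flip_prob, 1 - flip_prob)
    return loss_floor, accuracy_ceiling


# Hypothetical example: 10% of labels are flipped at random.
loss_floor, accuracy_ceiling = noise_floor(0.1)
print(f"log loss >= {loss_floor:.4f} nats, accuracy <= {accuracy_ceiling:.2f}")
```

With a fair-coin flip probability of 0.5 the floor becomes log 2, i.e. the labels carry no information at all.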


If you were to start a company today, what would be your idea?


It strikes me that many major evolutions in deep learning have been driven by the creation of a labelled dataset or competition, with ImageNet being the classic example here.

What datasets do you see missing from the deep learning landscape that you’d like to play with for transfer learning, etc?


To piggyback off @even, any advice on constructing our own datasets to avoid common pitfalls, data leakage, etc?


What is the most interesting DL paper you’ve read recently that does not focus on image data?


A paper on leakage that was mentioned in part 1: http://www.cs.umb.edu/~ding/classes/470_670/papers/cs670_Tran_PreferredPaper_LeakingInDataMining.pdf


@jeremy I noticed you use Windows. Do you recommend this OS for data science? It seems like it’s harder to install things vs. Ubuntu… just curious.


I think this is an important question, but I suggest clarifying what exactly the issues are.

There’s a lot of information on this topic readily available.


is a good starting point.

Google recently released their Video Intelligence API, which surprisingly is not built on the LSTM work published by Fei-Fei Li… I am curious whether you have any suggestions on the best approach for video classification and processing.