This week we’ll try (time permitting) some ‘Ask Me Anything’ questions. Your question doesn’t have to be related to this lesson, although it should be something that you think Jeremy or Rachel might be reasonably well qualified to answer! e.g. Ask about deep learning, building data-oriented startups, data-driven medicine, Kaggle, MOOC development, etc…
We’ll primarily use ‘likes’ to prioritize questions, so please vote for questions you’re interested in.
I would like to know what the plan is for the next 3 lessons :-). Things I would personally like to know more about:
Neural Architecture Search using Reinforcement Learning
Is there a part 3? This is one of the best classes I have attended, and we have a great community here. It would be sad if everything ended after 3 weeks…
I’d really love to have an opportunity for regular local (or maybe even WebEx-based) practical DL workshops where I could come with an idea and code that doesn’t work, and have a chance to make practical progress toward a working implementation under an instructor’s guidance.
The reason being that the web is full of isolated snippets, but it proves quite challenging to turn something less standard into a working implementation close to the state of the art.
Any advice on imbalanced datasets? Seems to be a recurring issue with real world data.
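One commonly suggested starting point for imbalanced data is to reweight the training loss by inverse class frequency, so mistakes on the rare class cost more. A minimal numpy sketch of computing such weights (the labels here are made up for illustration; the exact weighting scheme is one common convention, not the only option):

```python
import numpy as np

# Illustrative labels for a heavily imbalanced binary problem
# (assumption: class 1 is the rare class of interest).
y = np.array([0] * 95 + [1] * 5)

# Inverse-frequency class weights: each class's weight is
# n_samples / (n_classes * class_count), so rarer classes
# get proportionally larger weights.
classes, counts = np.unique(y, return_counts=True)
weights = len(y) / (len(classes) * counts)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

print(class_weight)  # the rare class gets a much larger weight
```

A dictionary like this can typically be passed to a framework's loss-weighting hook (e.g. a `class_weight`-style argument); oversampling the minority class or adjusting the decision threshold are common alternatives.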
I would like to learn more about implementations and practical tips in reinforcement learning, just as we did with deep learning. While there are plenty of RL-related tutorials online, most of them are geared towards theory, and it is not easy to turn them into running code that actually produces state-of-the-art results.
Any tips on ensembling deep learning models? I’m interested in methods like stacking, blending, bagging, etc in addition to simple averaging.
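For the simple-averaging and blending cases at least, the mechanics are straightforward. A hedged numpy sketch of averaging and weighted blending of per-class probabilities (the model outputs and blend weights here are invented for illustration; in practice blend weights would come from validation performance):

```python
import numpy as np

# Made-up softmax outputs from three models on four samples, two classes.
preds = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]],  # model A
    [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]],  # model B
    [[0.7, 0.3], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7]],  # model C
])

# Simple averaging: mean over the model axis.
avg = preds.mean(axis=0)

# Weighted blending: weight each model's predictions (weights chosen
# to sum to 1 so the blend remains a probability distribution).
w = np.array([0.5, 0.3, 0.2])
blend = np.tensordot(w, preds, axes=1)  # shape: (n_samples, n_classes)

print(avg.argmax(axis=1), blend.argmax(axis=1))
```

Stacking goes one step further: instead of fixed weights, a second-level model is trained on out-of-fold predictions from the base models.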
When is it a good idea to use a non-deep-learning classifier (e.g. XGBoost) on top of one or more deep learning models? What makes it better, compared to training another neural net using features from neural nets as the inputs?
How do you go from code in an IPython notebook to a data product?
In my mind, an IPython notebook shows a possible solution to a problem or use case. What steps need to be taken to ensure that it works as intended with real-world data, changing product requirements or use cases, and scaling?
Time series is high on my list as well. I have tried out a bunch of ideas on my own, but I’m eager to hear Jeremy’s take.
@kpatnaik best off clicking ‘like’ for the existing thread then!
@kpatnaik have you read https://www.oreilly.com/ideas/drivetrain-approach-data-products as yet? Would be interested to know if that was helpful, if you have any follow up q’s, anything that seems unclear or out of date, etc…
For supervised learning, is there a way to prove an upper bound on performance for a given dataset and task? That is, can you say that the noise is at least X, which puts a bound on the loss function? Or will we always wonder how much more the model can be improved?
Some tasks can be described as “easy for a human but hard for a computer” (e.g., dog versus cat) while others are “hard for a human and hard for a computer” (e.g., given gameplay history, will the player still be active in 30 days). For this second case, are there any learnings from Information Theory to help?
If you were to start a company today, what would be your idea?
It strikes me that many major evolutions in deep learning have been driven by the creation of a labelled dataset or competition, with ImageNet being the classic example here.
What datasets do you see missing from the deep learning landscape that you’d like to play with for transfer learning, etc?
To piggyback off @even, any advice on constructing our own datasets to avoid common pitfalls, data leakage, etc?
What is the most interesting DL paper you’ve read recently that does not focus on image data?
@jeremy I noticed you use Windows. Do you recommend this OS for data science? It seems like it’s harder to install things vs. Ubuntu… just curious.
I think this is an important question, but I suggest clarifying what exactly the issues are.
Google recently released their Video Intelligence API, which surprisingly is not built on the LSTM work published by Fei-Fei Li… I am curious whether you have any suggestions on the best approach for video classification and processing?