I believe it’s a mean of 0 with a standard deviation of 1
I wonder how TL works for tabular data.
Ensemble, huh?.. I wonder if I can concat the data together into a longer text document and have the model learn from that, like: “Genre: Action, Actor: Taylor Swift, Jane Dawn, Year: 1980, Here is the review text”. By doing this, I hope I don’t need multiple models.
For NLP there is no need to perform the normalization step because all the data is in plain text format?
Is the data created in fastai.tabular a pandas DataFrame, which we can then feed into other models like xgboost and rf?
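A minimal sketch of the idea in that question, with a hypothetical DataFrame and scikit-learn’s random forest standing in for “other models” (this is plain pandas/sklearn, not fastai’s own API — tree models need the categorical column encoded as numbers first):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical tabular data with one categorical and one continuous column
df = pd.DataFrame({
    "genre": ["action", "drama", "action", "comedy"],
    "year":  [1980, 1995, 2001, 1988],
    "hit":   [1, 0, 1, 0],
})

# Encode the categorical column as integer codes, since sklearn trees
# expect purely numeric input
df["genre"] = df["genre"].astype("category").cat.codes

# The same DataFrame can now be fed straight into a random forest
X, y = df[["genre", "year"]], df["hit"]
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
preds = rf.predict(X)
```

The same `X`/`y` split would work for xgboost as well, which accepts pandas DataFrames directly.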
Is there any possibility of bringing back Datasets and the ImageDataBunch.create method to coexist with the new datablocks API? I published this code https://github.com/wdhorton/protein-atlas-fastai just last week and it depended on ImageMultiDataset, going to be somewhat tricky to migrate it.
More generally, I think there are use cases where you want to make custom Datasets and it seems like our ability to use data that’s organized in a non-standard way is more limited now.
Yes. Normalization is for continuous variables.
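A quick sketch of what that means in practice, using hypothetical data and plain pandas rather than fastai’s built-in transform: the continuous column is standardized to mean 0 and standard deviation 1, while the categorical column is left alone.

```python
import pandas as pd

# Hypothetical frame with one continuous and one categorical column
df = pd.DataFrame({"age": [22, 35, 58, 41], "job": ["a", "b", "a", "c"]})

# Standardize only the continuous column: subtract the mean, divide by
# the standard deviation, giving mean 0 and std 1
mean, std = df["age"].mean(), df["age"].std()
df["age_norm"] = (df["age"] - mean) / std
```

After this, `df["age_norm"].mean()` is (numerically) 0 and `df["age_norm"].std()` is 1; the categorical `job` column is untouched.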
Does fastai.tabular work with csv data that is too big to fit in memory as a dataframe?
That’s not for this chat. Happy to explain why it is way more flexible now than before in another topic.
Sorry to hear we broke your code though
lost frame with Jeremy.
never mind
I had a problem with high cardinality columns and wrote a basic article on it https://medium.com/@Nithanaroy/encoding-fixed-length-high-cardinality-non-numeric-columns-for-a-ml-algorithm-b1c910cb4e6d?source=linkShare-dd5a0af7ea9a-1542166579
Thanks for the response! I can see how datablocks is going to work well moving forward, maybe I’m just regretting how much I dug into ImageMultiDataset trying to get this to work.
There have been a number of questions about large datasets and memory. Is it possible to load the pandas dataframe as an iterator (reading with a chunksize) and have the dataloader/model treat it as such?
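For reference, the pandas side of what the question describes looks like this — `read_csv` with a `chunksize` returns an iterator of DataFrames instead of loading everything at once. (Whether fastai’s dataloader can consume such an iterator directly is the open part of the question; the file below is simulated in-memory just to make the sketch self-contained.)

```python
import io
import pandas as pd

# Simulated CSV; in practice this would be a path to a file too big for RAM
csv = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

# chunksize makes read_csv yield DataFrames of at most 4 rows each,
# so only one chunk is in memory at a time
total = 0
for chunk in pd.read_csv(csv, chunksize=4):
    total += len(chunk)  # process each chunk, e.g. feed batches to a model
```

Here `total` ends up as 10, the full row count, even though no single chunk held more than 4 rows.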
How do we decide the number of layers, and the number of units in each layer, for structured deep learning?
untar_data() automatically adds .tgz to the url for downloading.
It actually fetches: http://files.fast.ai/data/examples/adult_sample.tgz
@rachel Thank you so much for keeping track of the discussion! Is it possible to mention the post number of the question, so we can jump to it and read? It really helps people who are listening in a noisy environment.
Thanks, I was thinking more in terms of the format of the text, like font, style, etc.