Lesson 12 (2019) discussion and wiki

paul · April 18, 2019, 4:08am

What do you foresee will be the first state-of-the-art applications using Swift for Tensorflow? It would be nice if the new framework first focuses on a subset of deep learning that’s up and coming so it can gain traction with communities who are not as strongly committed to their old development tools. I guess if fastai builds the libraries in the right order, it could have a big influence in how people will adjust their research workflow.

stas · April 18, 2019, 4:09am

I tried to educate myself on torch.jit today, found a few entry level overviews that gave me good understanding of how/when to use jit.script and jit.trace:

Adam Paszke’s talk: https://youtu.be/WMITGlZCFfw?t=598 (approximate start, you can skip a bit forward)
@t-v’s talk slides + nb (source http://lernapparat.de/pytorch-jit-android/)

Some advanced materials that I haven’t read yet:

rsrivastava · April 18, 2019, 4:12am

We do not want to use Python… Only reason I think Python will last longer… is the echo system of Python… Pandas, Numpy, Matplotlib etc… Not sure how much time will Swift take to reach that level of echo system… To me Swift is as good as Java or Scala I know Swift is heavily used on iOS and OSX development but not sure it has robust ecosystem for datascience.

Question is can we do Swift programming in Jupyter notebook? Are all data science packages like Pandas, Numpy, Matplotlib etc… available in Swift…

harikrishnanrajeev · April 18, 2019, 4:18am

Julia had been around since 2009, why was it not adopted for DL/ML instead of Python earlier ?.

Dee · April 18, 2019, 4:28am

This is where @Moody question about context came from. We are interested in contexts >20,000 tokens, still v.small for genomics. would be interested in hering people thoughts about how we engineer for such large contexts?

KarlH · April 18, 2019, 4:46am

What sorts of things are you trying to model? Depending on the structure of the problem it might not be necessary to deal with 20k+ contigs simultaneously.

jcatanza · April 18, 2019, 4:49am

Please write out acronyms that are not so common that everyone knows them. What is NER?

jcatanza · April 18, 2019, 4:53am

What is TDD?

gietema · April 18, 2019, 4:57am

Test driven development - https://en.m.wikipedia.org/wiki/Test-driven_development

cedric · April 18, 2019, 5:26am

In this context, NER == Named-entity recognition.

devforfu · April 18, 2019, 5:40am

BTW, added TDD to the glossary as soon as it was mentioned a few times in the Part 2 discussions.

sgugger · April 18, 2019, 12:53pm

Yes and yes: there is support for swift in jupyter and an interopability with python libraries.

rsrivastava · April 18, 2019, 2:39pm

Thanks Sylvain. So you are saying we will do data analysis using the Python library we know of… Then for model training we will use Swift?

Lankinen · April 18, 2019, 3:04pm

I always forget. Is parameters the things we learn and activation the things we calculate?
parameters = weights
actiovations = outputs of layer

stas · April 18, 2019, 8:34pm

See: notes/Lesson4.md at master · hiromis/notes · GitHub

Activations and parameters, both refer to numbers. They are numbers. But Parameters are numbers that are stored, they are used to make a calculation. Activations are the result of a calculation﹣the numbers that are calculated. So they’re the two key things you need to remember.

So use these terms, and use them correctly and accurately. And if you read these terms, they mean these very specific things. So don’t mix them up in your head. And remember, they’re nothing weird and magical﹣they are very simple things.

An activation is the result of either a matrix multiply or an activation function.

Parameters are the numbers inside the matrices that we multiply by.

jamesrequa · April 18, 2019, 10:44pm

mixup is so awesome!! It’s like having the benefits of label smoothing + data augmentation all in one, should be really helpful in particular for fine-grained classification problems

pcuenq · April 18, 2019, 11:01pm

I’m testing the mixed precision notebook against my 2080 Ti GPU and I’m not observing any performance improvements. I injected a simple callback to verify that the weights of the conv layers are indeed of type torch.float16, whereas those of the batch-norm layers are torch.float32, as expected.

The 2080 Ti has a “compute capability” of 7.5, so according to this table, 16-bit performance should be good.

I’m not really planning to use mixed precision right now, but I thought this was odd and I’m curious about the limitations. Is this because our network is small? Could it be because a 32-bit layer always follows a 16-bit conv layer? Is there anything I need to configure in my hardware to take advantage of half-float performance? Am I mistaken about the compute capabilities of the RTX cards?

Any suggestions on how to further test this feature are welcome. Perhaps I could use a Volta card from GCP/AWS and see what happens when the same notebook is run.

jeremy · April 18, 2019, 11:55pm

@pcuenq which notebook are you using? Try Imagenette with 224px images.

Also, check that your transformations aren’t cpu bound. Make sure you install pillow-simd with libjpeg-turbo.

https://docs.fast.ai/performance.html#faster-image-processing

Dee · April 19, 2019, 1:27am

The large context is to capture long range dependencies think about GPT 2 (large) or OpenAi 5.

chho6822 · April 19, 2019, 2:19am

As I know, Tensorflow + Pytorch are used in most researches. TF have Python APIs. Pytorch is pretty much Python, it feels just like a Python module. The only framework with Julia APIs was MXNet. But MXNet was not a popular framework for research.

So, I think the reason for Python is because of lack of DL APIs for Julia.