Lesson 12 (2019) discussion and wiki

What do you foresee will be the first state-of-the-art applications using Swift for TensorFlow? It would be nice if the new framework first focused on a subset of deep learning that's up and coming, so it can gain traction with communities that are not as strongly committed to their old development tools. I guess if fastai builds the libraries in the right order, it could have a big influence on how people adjust their research workflows.


I tried to educate myself on torch.jit today and found a few entry-level overviews that gave me a good understanding of how and when to use jit.script and jit.trace:

Some advanced materials that I haven’t read yet:
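To make the jit.script / jit.trace distinction concrete, here is a minimal sketch (assuming PyTorch is installed): tracing records the ops run for one example input, while scripting compiles the Python source and keeps data-dependent control flow.

```python
import torch

def f(x):
    # Data-dependent control flow: which branch runs depends on the input values
    if x.sum() > 0:
        return x * 2
    return x + 1

# jit.trace records the operations executed for ONE example input,
# so the branch taken here (the positive one) is frozen into the graph
traced = torch.jit.trace(f, torch.ones(3))

# jit.script compiles the Python source itself, preserving the `if`
scripted = torch.jit.script(f)

neg = -torch.ones(3)
print(traced(neg))   # tensor([-2., -2., -2.]) -- wrong branch, baked in at trace time
print(scripted(neg)) # tensor([0., 0., 0.])    -- correct else branch
```

(PyTorch even emits a TracerWarning on the trace above, flagging that the traced function has control flow it can't capture.)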


We do not want to use Python… The only reason I think Python will last longer is its ecosystem: Pandas, NumPy, Matplotlib, etc. I'm not sure how much time Swift will take to reach that level of ecosystem. To me, Swift is as good as Java or Scala :wink: I know Swift is heavily used for iOS and macOS development, but I'm not sure it has a robust ecosystem for data science.

The question is: can we do Swift programming in a Jupyter notebook? Are all the data science packages like Pandas, NumPy, Matplotlib, etc. available in Swift?

Julia has been around since 2009, so why wasn't it adopted for DL/ML instead of Python earlier?

This is where @Moody's question about context came from. We are interested in contexts >20,000 tokens, which is still very small for genomics. I would be interested in hearing people's thoughts on how we engineer for such large contexts.

What sorts of things are you trying to model? Depending on the structure of the problem it might not be necessary to deal with 20k+ contigs simultaneously.

Please write out acronyms that are not so common that everyone knows them. What is NER?

What is TDD?

Test driven development - https://en.m.wikipedia.org/wiki/Test-driven_development


In this context, NER == Named-entity recognition. :slightly_smiling_face:


BTW, I added TDD to the glossary as soon as it was mentioned a few times in the Part 2 discussions.


Yes and yes: there is support for Swift in Jupyter, and interoperability with Python libraries.


Thanks Sylvain. So you are saying we will do data analysis using the Python libraries we know, and then use Swift for model training?

I always forget: are parameters the things we learn, and activations the things we calculate?
parameters = weights
activations = outputs of a layer


See: https://github.com/hiromis/notes/blob/master/Lesson4.md#overview-of-important-terminology-13124

Activations and parameters both refer to numbers. But parameters are numbers that are stored; they are used to make a calculation. Activations are the result of a calculation: the numbers that are calculated. Those are the two key things you need to remember.

So use these terms, and use them correctly and accurately. When you read these terms, they mean these very specific things, so don't mix them up in your head. And remember, they're nothing weird or magical; they are very simple things.

  • An activation is the result of either a matrix multiply or an activation function.
  • Parameters are the numbers inside the matrices that we multiply by.
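That distinction can be shown in a few lines (an illustrative sketch using NumPy, with made-up layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters: numbers that are *stored* and learned (here, one linear layer)
W = rng.normal(size=(4, 3))  # weight matrix
b = np.zeros(3)              # bias vector

# Activations: numbers that are *calculated* from inputs and parameters
x = rng.normal(size=(2, 4))  # a batch of 2 inputs
z = x @ W + b                # activation: the result of the matrix multiply
a = np.maximum(z, 0)         # activation: the result of the activation function (ReLU)

print(z.shape, a.shape)  # (2, 3) (2, 3)
```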

mixup is so awesome!! It's like having the benefits of label smoothing and data augmentation all in one. It should be really helpful in particular for fine-grained classification problems :slight_smile:
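The core idea is tiny: blend two examples and their one-hot labels with the same coefficient, which augments the inputs and softens the targets at once. A minimal sketch with a hypothetical helper (NumPy; the real fastai callback differs in details):

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two examples and their one-hot labels with the same lambda."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)    # mixing coefficient from Beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2   # blended input (the data-augmentation part)
    y = lam * y1 + (1 - lam) * y2   # soft target (the label-smoothing-like part)
    return x, y

# Mix a class-0 example with a class-1 example
x, y = mixup_pair(np.zeros(4), np.array([1.0, 0.0]),
                  np.ones(4),  np.array([0.0, 1.0]),
                  rng=np.random.default_rng(42))
print(y)  # a soft label [lam, 1 - lam] that still sums to 1
```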


I’m testing the mixed precision notebook on my 2080 Ti GPU and I’m not observing any performance improvement. I injected a simple callback to verify that the weights of the conv layers are indeed of type torch.float16, whereas those of the batch-norm layers are torch.float32, as expected.

The 2080 Ti has a “compute capability” of 7.5, so according to this table, 16-bit performance should be good.

I’m not really planning to use mixed precision right now, but I thought this was odd and I’m curious about the limitations. Is this because our network is small? Could it be because a 32-bit layer always follows a 16-bit conv layer? Is there anything I need to configure in my hardware to take advantage of half-float performance? Am I mistaken about the compute capabilities of the RTX cards?

Any suggestions on how to further test this feature are welcome. Perhaps I could use a Volta card from GCP/AWS and see what happens when the same notebook is run.
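For reference, the dtype pattern described above can be reproduced without a callback in plain PyTorch (a minimal sketch, assuming the fastai convention of casting the model to half precision but keeping batch norm in float32):

```python
import torch
import torch.nn as nn

# A toy model with made-up sizes: one conv layer followed by batch norm
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

# Cast everything to half precision, then restore batch norm to float32
# for numerical stability (what fastai's mixed-precision setup does)
model.half()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.float()

for name, p in model.named_parameters():
    print(name, p.dtype)
# 0.weight / 0.bias -> torch.float16 (conv)
# 1.weight / 1.bias -> torch.float32 (batch norm)
```

Note that correct dtypes alone don't guarantee a speedup: a small network can be bottlenecked by data loading, and fp16 throughput on tensor cores also depends on layer sizes.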

@pcuenq which notebook are you using? Try Imagenette with 224px images.

Also, check that your transformations aren’t CPU-bound. Make sure you install pillow-simd with libjpeg-turbo.



The large context is to capture long-range dependencies; think of GPT-2 (large) or OpenAI Five.

As far as I know, TensorFlow and PyTorch are used in most research. TF has Python APIs, and PyTorch is pretty much Python; it feels just like a Python module. The only framework with Julia APIs was MXNet, but MXNet was not a popular framework for research.

So I think the reason for Python is the lack of DL APIs for Julia.
