Share your work here (Part 2)

(James Dietle) #34

Hey, tired of reading how everything went right? Want to see a bunch of AC/DC references shoved into an article?

Then look no further. I did the Kaggle VSB competition about power lines that went horribly wrong for me, and wrote it up here anyway.

That heatmap can’t be right at all!


(Stefano Giomo) #35

TfmsManager: visually tune your transforms

I’ve published a tool to quickly visualize and tune a complex chain of data transforms (Audio & Image).


(Stefano Giomo) #36

How to have a swift kernel in colab:


(William Horton) #37

Ever since getting into deep learning and making my first PR to PyTorch last year, I’ve been interested in digging into what’s behind the scenes of the Python wrappers we use, and understanding more about what’s going on at the GPU level.

The result was my talk "CUDA in your Python: Effective Parallel Programming on the GPU", which I had the chance to present at the PyTexas conference this past weekend.

I would love any feedback on the talk, as I’m giving it again at PyCon in ~3 weeks.


(Ivan) #38

Google Colab template (link): the exported modules are taken care of, so you can work as if you had a local Jupyter with all the previous lessons available.


(Stephen Johnson) #39

When development began last fall on 1.0, I decided to try to write my own version in Swift so that I could learn more about how it is put together, demystify the things that it does, and learn PyTorch better. It would also allow me to continue practicing my Swift skills.

Since a couple of the Part 2 lessons are going to be using Swift, I thought I’d share what I’ve created so far for anyone here who is interested. It might be useful for those who want to see how things like callbacks, training loops, closures, etc. can be done in Swift, as well as how to run Python and PyTorch code from within Swift. I’ve created a Docker setup for ease of installation, along with some runnable examples: MNIST, CIFAR-10, Dogs Vs. Cats Redux (Kaggle), the Kaggle Planet competition, and Pascal VOC 2007. You can also submit the output to Kaggle for the two Kaggle competition ones.

I’m going to try to add to the readme on how things are architected, but for now it just has installation instructions and how to run the examples. Also, unfortunately it only supports CPU for now. Here’s a link to the repo


Help: The data block API in a swifty way
(Jeremy Howard (Admin)) #40

Thanks @stephenjohnson! Would love to hear if you see anything in our dev_swift notebooks that you think could be improved based on your experiences.


(Brad) #41

I think the Medium link might be broken. It won’t let me click it.


(Stephen Johnson) #42

I’ll take a look and let you know.



I did a small experiment that suggests that, as networks get deeper, we should train them multiple times using different initialisation parameters and use a voting scheme for inference. Below is my rationale. Interested in people’s thoughts.

In the previous lessons we learnt that parameter initialisation is very important. However, Kaiming initialisation is still derived from random numbers, so we should not assume we get a good starting position when we train a network. If we make just one attempt, we could get unlucky; multiple attempts reduce the chance of starting off on the wrong foot. It also means we explore different regions of the network’s state space, since training minimises loss from wherever initialisation leaves us. So if we train from different starting positions and save those models for inference, we increase our chances of success (because we have explored a broader space that allows the models to collectively calibrate against the data).
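To make the voting scheme concrete, here is a minimal sketch (not from the post) of what inference might look like: several models trained from different random initialisations each predict a class label, and the most common label per example wins. `model_preds` is a hypothetical stand-in for the per-model predicted labels.

```python
import numpy as np

def majority_vote(model_preds):
    """model_preds: array of shape (n_models, n_examples) of predicted labels.
    Returns the most common label per example (ties go to the lowest label)."""
    n_models, n_examples = model_preds.shape
    voted = np.empty(n_examples, dtype=model_preds.dtype)
    for i in range(n_examples):
        labels, counts = np.unique(model_preds[:, i], return_counts=True)
        voted[i] = labels[np.argmax(counts)]
    return voted

# Three models trained from different seeds, five examples:
preds = np.array([[0, 1, 2, 1, 0],
                  [0, 1, 1, 1, 0],
                  [1, 1, 2, 0, 0]])
print(majority_vote(preds))  # [0 1 2 1 0]
```

For probabilistic outputs, averaging the predicted probabilities before taking the argmax is a common alternative to hard voting.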

I did a small experiment to show how this might play out with Kaiming initialisation. The left and right charts (and the green and red histograms) represent the means and standard deviations of the activations after each consecutive matrix multiplication. I simulated 1000 initialisations and performed 20 (L) consecutive matrix multiplications. What is interesting is the range: as L increases, the range in mean and standard deviation increases, which suggests we are more likely to randomly choose an unlucky initialisation as L increases. FYI, I used some of @jamesd’s code from his great blog

import math
import numpy as np
import matplotlib.pyplot as plt

def kaiming(m, h):
    return np.random.normal(size=m*h).reshape(m, h) * math.sqrt(2./m)

data = []
inputs = np.random.normal(size=512)

for i in range(1000):
    x = inputs.copy()
    layer_stats = []
    for j in range(20):
        a = kaiming(512, 512)
        x = np.maximum(a @ x, 0)
        layer_stats.append((x.mean(), x.std()))
    data.append(layer_stats)

data = np.array(data)  # shape: (1000 inits, 20 layers, 2 stats)

fig, ax = plt.subplots(1, 2, figsize=(20, 10))
ax[0].plot(data[:, :, 0].T, '.', color='gray', alpha=0.1)
ax[1].plot(data[:, :, 1].T, '.', color='gray', alpha=0.1)

And here’s a histogram plot:

import pandas as pd
import seaborn as sns

layers = []
means = []
stds = []

for layer in range(20):
    mean = data[:, layer, 0]
    std = data[:, layer, 1]
    l_values = len(mean)
    layers.extend([layer + 1] * l_values)
    means.extend(mean)
    stds.extend(std)

df = pd.DataFrame({'layers': layers, 'means': means, 'stds': stds})

g = sns.FacetGrid(df, row="layers", hue="layers", aspect=15, height=4)
g.map(sns.distplot, 'means', kde=False, bins=100, color='green')
g.map(sns.distplot, 'stds', kde=False, bins=100, color='red')
g.map(plt.axhline, y=0, lw=1, clip_on=False);


(Thomas Chambon) #45

Interesting work.

I have a question regarding your suggestion of training multiple NN with different init and ensembling their predictions.

After getting a good init (mean close to 0 and std close to 1 through all the layers) with Kaiming or LSUV, what is the point of training the same model multiple times and ensembling the predictions?
If I want to do ensembling, wouldn’t it be much better to train NNs with different architectures or hyperparameters to get more diversity (as the goal of ensembling, if I understand correctly, is to get uncorrelated errors)?
I’m not sure, but I think that would be a better use of computation.



Hi, thanks for pointing it out. Just fixed it. :slightly_smiling_face:



I purposefully did not mention ensembles, because that term comes with its own connotations. I don’t see this replacing ensembles; I see it as another option to try to improve a model’s performance. I guess the proof is in the results. When I get a chance, I will try it and report back.


Having thought about it a little more, it reminds me of how random forests work. A random forest produces many models (trees) and each model gets a vote. There, the randomness is in the features given to each model; in the approach I’ve suggested, the randomness is in each model’s initialisation parameters.


(Joseph Catanzarite) #48

This is an intriguing idea, @maral. It would be interesting to see a pilot experiment in which you implement your idea of training models with multiple initializations and demonstrate improved accuracy or reduced training time or both!


(Martin Boyanov) #49

I had an interesting idea where you split the pretrained embedding matrix into two groups: trainable and frozen. During training, you update only the embeddings for vocab indices that were missing from the pretrained matrix and leave the others frozen. This lets you learn the domain-specific word embeddings while keeping the more general language model components frozen.
Blog post
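The split-embedding idea above can be sketched in plain numpy (the blog post presumably uses a deep learning framework; this is just the core mechanism, with the row indices and gradient as hypothetical stand-ins): mask the update so only rows missing from the pretrained matrix change.

```python
import numpy as np

vocab_size, emb_dim = 6, 4
rng = np.random.default_rng(0)
emb = rng.normal(size=(vocab_size, emb_dim))

# Hypothetical: rows 4 and 5 were missing from the pretrained matrix,
# so only they are trainable; rows 0-3 stay frozen.
trainable = np.zeros(vocab_size, dtype=bool)
trainable[[4, 5]] = True

def sgd_step(emb, grad, lr=0.1):
    """Apply an SGD update only to the trainable rows."""
    emb = emb.copy()
    emb[trainable] -= lr * grad[trainable]
    return emb

grad = np.ones_like(emb)   # stand-in gradient from some loss
new_emb = sgd_step(emb, grad)

# Frozen rows are unchanged; trainable rows have moved.
assert np.allclose(new_emb[:4], emb[:4])
assert np.allclose(new_emb[4:], emb[4:] - 0.1)
```

In a framework like PyTorch, the same effect is usually achieved with two separate embedding layers (one with gradients disabled) or by zeroing the gradient on the frozen rows before the optimizer step.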



(Amit Kayal) #50

One thing in deep learning that always confuses me is deciding on the proper use of weight initialization. I’ve studied this a lot but couldn’t figure out any guidelines…


(Kushajveer Singh) #51

Created a new blog post on the Semantic Image Synthesis with Spatially-Adaptive Normalization paper by Nvidia. It introduces SPADE, a new normalization block for semantic image synthesis, and achieves state of the art on various datasets for image generation.
In the blog post, I cover:

  1. What is semantic image synthesis: a brief overview of the field.
  2. New things in the paper.
  3. How to train my model: how semantic image synthesis models work.
  4. A dive into the different models that make up the SPADE project, namely SPADE and SPADEResblk. Then I introduce the Generator and Discriminator models and the Encoder model for style transfer.
  5. The loss function, discussed in some detail; perceptual loss is also introduced with code. The original Nvidia code for the loss function can be found here.
  6. How to resize segmentation maps and how to initialize my model using He initialization.
  7. What is spectral normalization: when to use it, and a discussion of instance normalization.
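The spectral normalization mentioned in point 7 has a simple core idea that can be sketched in numpy (my sketch, not the blog’s or Nvidia’s code): estimate a weight matrix’s largest singular value with power iteration, then divide the matrix by it so its spectral norm is about 1.

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Estimate W's largest singular value via power iteration and
    return W divided by it, so its spectral norm is ~1."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v   # estimated top singular value
    return W / sigma

W = np.random.default_rng(1).normal(size=(8, 5))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # close to 1
```

In practice, implementations such as PyTorch’s `torch.nn.utils.spectral_norm` run only one power-iteration step per forward pass, reusing `u` and `v` across steps for efficiency.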

(Jason Patnick) #52

I’ve been wanting to do something like the debug callback Jeremy did for a little while now. Yesterday I came across this library ppretty and I’ve found it super helpful with printing out what each class has inside of it. This is what I did:
pip install ppretty
put this in

from ppretty import ppretty

def show_obj(obj, depth=1, indent='  ', width=100, seq_length=1000, show_protected=False,
             show_private=True, show_static=True, show_properties=True, show_address=False,
             str_length=1000):
    print(ppretty(obj, depth=depth, indent=indent, width=width, seq_length=seq_length,
                  show_protected=show_protected, show_private=show_private,
                  show_static=show_static, show_properties=show_properties,
                  show_address=show_address, str_length=str_length))

and put from exp.nb_000 import * in

Now you’ll be able to see what each class has inside it and everything you can get to from inside the class. Here are some examples:


(Jeremy Howard (Admin)) #53

Nice! FYI **kwargs would make that def much cleaner.


(Jason Patnick) #54

I had that, but it felt like the function was contradicting itself, since it wasn’t showing what it could take. So I went the messy route.