[Invitation to open collaboration] Practice what you learn in the course and help animal researchers! šŸµ

I changed things up a little bit so that now everything is runnable on Colab.

If youā€™d like to run this on Colab, go to the repository on GitHub and click on one of the notebooks:

then click the link/badge at the top of the notebook to run it on Colab.


2 Likes

Thanks Radek,

What program did you use to resample all the wav files to 24 kHz?
For future projects it would be handy to have a snippet of code that can resample really fast.

I read the data in, resampled it with librosa, and wrote it back out to wav files with librosa as well.

Here is the code:

import librosa

rate = 24414  # target sample rate in Hz

for fp in wav_files:
    # `wav_files` is the list of source paths; load + resample in one step
    x, _ = librosa.load(fp, sr=rate)
    librosa.output.write_wav(f'macaques/{fp.parent.stem}/{fp.name}', x, rate)
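(If you are on a newer librosa, note that `librosa.output.write_wav` was removed in librosa 0.8; a rough equivalent, assuming the same `wav_files` list as above, writes with the soundfile package instead:)

```python
import librosa
import soundfile as sf

rate = 24414

for fp in wav_files:
    x, _ = librosa.load(fp, sr=rate)  # load + resample in one step
    sf.write(f'macaques/{fp.parent.stem}/{fp.name}', x, rate)
```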
2 Likes

@radek Thank you for doing an amazing job of presenting the problem and setting us up with a notebook! Your explanations and examples of the audio preprocessing are very clear and make this deep learning problem much more accessible to non-domain experts. Also, I appreciate the shoutout to the ā€œTwo Heartbeats a Minuteā€ Invisibilia episode; I was thinking about that podcast when I first saw your post :slight_smile:

It would be sweet to have some sort of a ā€œleaderboardā€ so we know roughly how well we can do! Iā€™m finding that we can get to an error_rate of just 2.5% by using pretrained xresnet models. Somewhat surprisingly, fine_tune is perfectly happy to work with a pretrained network that has Mish activations and self_attention layers shoved in (even though the weights were presumably trained without these modifications).
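In case anyone wants to poke at this, a minimal sketch of one way to do the activation part (assuming fastai2 ships pretrained xresnet50 weights and a Mish module; bolting self_attention onto an already-pretrained net is more involved and not shown):

```python
from fastai2.vision.all import *

# start from a pretrained xresnet via cnn_learner, then swap every
# ReLU for Mish in place before fine-tuning
learn = cnn_learner(dls, xresnet50, metrics=error_rate)

def relu_to_mish(module):
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, Mish())
        else:
            relu_to_mish(child)

relu_to_mish(learn.model)
learn.fine_tune(5)
```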

3 Likes

Is there any information about xresnet? Iā€™d like to know what self-attention and Mish activations are ;).

I also played around with the dataset. When I saw the images of the converted audio files, my first thought was that there is not much information in them: a lot of black space. So I tried to use a different visualization for the images. (I donā€™t know anything about converting audio to images, so I just googled the code.)

With resnet18 and without any transformations (no item_tfms or batch_tfms; even resizing reduced the accuracy) I got 99% accuracy in 4 epochs.

(whatā€™s the best way to paste code here?)

import librosa
import numpy as np

# helper not shown in the original post: plain min-max scaling to [lo, hi]
def scale_minmax(x, lo=0.0, hi=1.0):
    x_std = (x - x.min()) / (x.max() - x.min())
    return x_std * (hi - lo) + lo

def get_x(path):
    rate = 24414
    num_samples = 18310
    n_fft = 1024
    hop_length = 30

    # load (resampling to `rate`) and pad/trim to a fixed length
    clip, _ = librosa.load(path, sr=rate)
    clip = librosa.util.fix_length(clip, num_samples)
    # Short-time Fourier Transform -> magnitude -> dB
    stft = librosa.stft(clip, n_fft=n_fft, hop_length=hop_length)
    stft_magnitude, stft_phase = librosa.magphase(stft)
    stft_magnitude_db = librosa.amplitude_to_db(stft_magnitude)
    # scale to 0..255 and flip so low frequencies end up at the bottom
    img = scale_minmax(stft_magnitude_db, 0, 255).astype(np.uint8)
    return np.flip(img, axis=0)
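For anyone wondering how this might plug into fastai2, a hypothetical sketch (the `PILImageBW` / `parent_label` choices and the `path` variable are my assumptions, not necessarily what was done here):

```python
from fastai2.vision.all import *

# build grayscale spectrogram "images" on the fly from the wav files
dblock = DataBlock(
    blocks=(ImageBlock(PILImageBW), CategoryBlock),
    get_items=get_files,                          # grab all files under `path`
    get_x=lambda p: PILImageBW.create(get_x(p)),  # wav -> spectrogram array -> image
    get_y=parent_label,                           # folder name = macaque id
)
dls = dblock.dataloaders(path, bs=64)
dls.show_batch()
```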

The next thing I would like to try is clustering the audio clips; maybe we can find similar sounds from different macaques?

Florian

6 Likes

When I saw the images of the converted audio files, my first thought was that there is not much information in them: a lot of black space.

Nice work! It makes sense that the Short-time Fourier Transform and the rescaling work wonders. Iā€™ll definitely try your preprocessing steps in my next experiments.

Is there any information about xresnet? Iā€™d like to know what self-attention and Mish activations are ;).

The xresnet architecture is mostly based on the modifications in the Bag of Tricks paper. Self-attention is also really useful across many tasks in modern deep learning (e.g., paper). Mish is an activation function written by @Diganta (see paper). A lot of this is encapsulated in a post by @LessW2020 about another fastai competition.
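If it helps, the Mish formula itself is tiny; a self-contained PyTorch version just for illustration (fastai2 ships its own implementation):

```python
import torch
import torch.nn.functional as F

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))
```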

(whatā€™s the best way to paste code here?)

Your code blocks are perfectly fine! Note that you can put ```python in the first row to get syntax highlighting/coloring.

def get_x(path):
    rate = 24414
    [...]
2 Likes

This is seriously amazing! Outstanding job! :blush:

I will put together a ā€˜leaderboardā€™ of sorts, linking to the notebooks in the repository; that is the plan :slightly_smiling_face:. It seems that you are starting to get some very interesting results. This is in no way mandatory, but if you can, could you please include a few words on the preprocessing you are doing and whatever else you feel is worth highlighting in your approach? It doesnā€™t have to be super technical. What we are building here together could be very useful to others, and any hints we leave them along the way can be very helpful.

If you donā€™t think you would have much to write, that is also totally fine. I would love it if we could start putting the work online, adding it to the repo, and getting our feet a little wet with the whole process :blush: Happy to help in whatever way I can.

Anyhow, this is looking amazing. Huge congrats @florianl and @jwuphysics! And great info there on xresnets, thanks for collating all this, John. One thing I would add to the list is that a lot of this was also covered in part 2 v3 (part 2 of the previous iteration of the course); here is a link to the relevant part of the lecture.

3 Likes

I just found a fastai2 audio library that some forum members wrote:

I think we should check it out :slight_smile:

Hi @radek. Now Iā€™m the one dead in the water and in need of some installation help. Everything was working; then I had to re-clone the entire repository (2 hours ago) because I messed up the notebooks.

  1. Opening Introduction shows:
Notebook validation failed: {'model_id': 'fa2397bc326c46209d9310a6b91f7425', 'version_major': 2, 'version_minor': 0} is not valid under any of the given schemas:
    {
     "model_id": "fa2397bc326c46209d9310a6b91f7425",
     "version_major": 2,
     "version_minor": 0
    }
  2. I see you are now using torchaudio instead of librosa to process the audio files. I installed torchaudio with:
conda install -c pytorch torchaudio
Downloading and Extracting Packages
torchaudio-0.2.0     | 1.9 MB    | ##################################### | 100% 
sox-14.4.2           | 743 KB    | ##################################### | 100% 
mad-0.15.1b          | 111 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

However, import torchaudio gives:

ImportError                               Traceback (most recent call last)
in <module>
----> 1 import torchaudio

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/torchaudio/__init__.py in <module>
      3
      4 import torch
----> 5 import _torch_sox
      6
      7 from torchaudio import transforms, datasets, sox_effects, legacy

ImportError: /home/malcolm/anaconda3/envs/fastai2/lib/python3.7/site-packages/_torch_sox.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E

There are many forum and Google hits on this _torch_sox issue, but no solutions that I can comprehend. Any ideas?

At the least, maybe you could restore the librosa cell as an alternative, and I could move forward. Thanks for any help.

By all means, yes! They are doing an amazing job and have awesome tutorials in the repo on working with audio in general. That said, sometimes it is quite helpful to do everything yourself, so that you can touch everything and see how it all fits together. I am also not really sure where they are with the port to fastai v2, but I 100% agree, it is without a doubt worth checking out :slight_smile:

Apologies Malcolm, I am not really sure what is going on there. If you are in the repo and would like to go back to the earlier version, you can run git checkout 4a0bb4152d566c1d79b97a8e3960777353ba0c68; this is the hash of the commit before I introduced torchaudio.

1 Like

Yesterday I tried to get the audio module to work but ran into problems because of the different sample rates of the files. Now that Radek has made us a new set of files, it works like a breeze :-).

I used the tutorial notebook in the fastai2 audio repository and made some minor changes to the settings.

After just 4 epochs on resnet18, it gets to an error_rate of 0.004118.

(I do not know how to paste code in here)

This is without any transforms, in 24 sec per epoch in Colab.

I will try to get the code available tomorrow as it is quite late now in my timezone.

2 Likes

pip install fastai2 fastcore --upgrade followed by
conda install -c conda-forge librosa worked for me! That is, I can now run the notebook (both the URLs.MACAQUES and the import librosa problems are solved). Thanks so much @radek and @Pomo!

Now, I just need to take it easy on the Shift+Enter and figure out what the code means :slight_smile: .

1 Like

FYI, I have the same error. I think I will go back to the version without torchaudio for now.

One important thing to note is to use black and white images by doing
ImageBlock(PILImageBW). Iā€™m still experimenting, but this has produced better results so far :smile:

3 Likes

I just tried the audio module and had worse results. Could you please share your code? You can use preformatted text.

Just thought Iā€™d throw in a quick little example of xresnet etc. Note this is not a pretrained model, and what I found is that, epoch for epoch, itā€™s fairly similar minus a few training bits that youā€™ll notice. My current setup is:

  • Mish activation
  • Self-Attention
  • Label Smoothing Cross Entropy
  • Ranger optimizer
  • Cosine Annealing fit function

I also normalized our data using the stats of the first batch.

For the architecture, it was an xresnet18 where I modified the first input layer like so (we donā€™t have pretrained weights, so itā€™s just converting the conv2d):

l = nn.Conv2d(1,32, kernel_size=(3,3), stride=(2,2),
              padding=(1,1), bias=False)
l.weight = nn.Parameter(l.weight.sum(dim=1, keepdim=True))
net[0][0] = l

In the first epoch alone I was able to get to 7% error, with a finish of 2.8%. However, if you look closely, I wasnā€™t quite training properly or something, because epoch 3 spiked to 18% error. Running another test now :slight_smile:
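Roughly, the whole recipe looks something like this in fastai2; keyword names like c_out/sa/act_cls and the availability of ranger/fit_flat_cos depend on the library version, so treat this as a sketch rather than my exact code:

```python
from fastai2.vision.all import *

# xresnet18 with Mish + self-attention (not pretrained), label smoothing,
# the ranger optimizer, and fastai2's flat-cos schedule; `dls` is assumed
# to have Normalize() in its batch_tfms so stats come from the first batch
net = xresnet18(c_out=dls.c, sa=True, act_cls=Mish)
learn = Learner(dls, net,
                loss_func=LabelSmoothingCrossEntropy(),
                opt_func=ranger,
                metrics=error_rate)
learn.fit_flat_cos(5, 3e-3)
```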


1 Like

Why would this step be required if it is not a pretrained model? Arenā€™t the weights being initialized randomly?

We need it because the swapped-in layer doesnā€™t get fastaiā€™s init here. Weights are already initialized on the call to xresnet18, so if you run this replacement youā€™d need to rerun init_cnn(net) for the new layer to get initialized too. I checked on this myself; see this under the __init__ of XResNet:

        super().__init__(
            *stem, nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            *blocks,
            nn.AdaptiveAvgPool2d(1), Flatten(), nn.Dropout(p),
            nn.Linear(block_szs[-1]*expansion, c_out),
        )
        init_cnn(self)

Thanks, will check it out and get back. Iā€™m still not sure I understand this. :slight_smile:

Iā€™m learning it too. Weā€™d actually want the weights from the original, already-initialized model here. So it would be:

w = net[0][0].weight
net[0][0].weight = nn.Parameter(w.sum(dim=1, keepdim=True))  # collapse the 3 input channels into 1

@barnacl another thing we can do is simply:

net[0][0] = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)  # fresh 1-channel stem conv
init_cnn(net)  # rerun fastai's init so the new layer is initialized too
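And a quick sanity check that the swap worked (the input shape here is hypothetical; `net` is the modified xresnet18 from above):

```python
import torch

x = torch.randn(8, 1, 128, 128)  # fake batch of single-channel spectrograms
print(net(x).shape)              # expect torch.Size([8, n_classes])
```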
1 Like