Share your work here ✅

After some effort, I got it partially working. It reached 92.1% accuracy (0.3% below my previous model). Some remarks:

  • I think there is a bug in my code: when I run fit_one_cycle after unfreezing and running lr_find, I get a “can’t optimize a non-leaf Tensor” error (it only happens when I unfreeze and run lr_find before training). I wasn’t able to find out what is causing it.
  • I only created two layer groups: the first containing the tabular and nlp models, and the second containing the last linear layers. So it only partially takes advantage of discriminative learning rates, using the same LR for all the layers inside the tabular and nlp models. I think this can be especially harmful for the nlp model, since updating the early layers aggressively destroys part of the pre-trained weights.
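
As a rough illustration of what discriminative learning rates look like once there are more than two layer groups, here is a minimal pure-Python sketch that spreads learning rates geometrically from the earliest group to the last. The function name `spread_lrs` and the example values are assumptions made for illustration, not code from this post; fastai appears to do something similar when given `slice(lo, hi)`, but check the library source before relying on that.

```python
def spread_lrs(lo, hi, n_groups):
    """Return n_groups learning rates spaced geometrically from lo to hi.

    Hypothetical helper: earlier layer groups get smaller rates so that
    pre-trained weights are changed less than the freshly added head.
    """
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1.0 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]


# Example: three layer groups, e.g. early layers, late layers, linear head.
lrs = spread_lrs(1e-5, 1e-3, 3)
print(lrs)  # roughly [1e-05, 1e-04, 1e-03]
```

With only two groups, every layer inside the first group shares the smallest rate, which is the limitation described in the second bullet.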

Edited: I managed to set the layer_groups properly. Together with minor tweaks (increased wd and dropout), I reached 92.3%.

I’d love to get some feedback and possible improvements.

4 Likes

Hi All,
Yet another update on my satellite project. Yesterday’s lecture about PCA got me thinking about latent space representations for urban characteristics of cities as seen from space.

I tried doing PCA on the later layer’s vector representations of cities, but the results were a little disappointing, so in the end I used UMAP for the dimensionality reduction. It’s much faster than t-SNE and I thought the results were pretty cool.
I also flattened the UMAP representation to a grid using the lapjv python package, which finds the grid representation of distance maps using a linear assignment solver (the Jonker-Volgenant algorithm).

Here is the result:

The actual image is 80MB so here are some interesting higher res areas:

Ochre roofs and twisty roads:
image

Big backyards:
image

Dense and arid:
image

Grids:
image

You can check the notebook out here
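
For anyone curious about the grid-flattening step mentioned above, here is a toy sketch of the underlying idea: assign each embedded point to its own grid cell so that the total squared distance is as small as possible. This is a brute-force version over four made-up points, written only for illustration; it is not the notebook's code, and lapjv solves the same kind of assignment problem efficiently for thousands of points.

```python
from itertools import permutations


def assign_to_grid(points, cells):
    """Brute-force assignment of points to grid cells (tiny inputs only).

    Returns a list where entry i is the cell assigned to points[i], chosen
    to minimise the total squared Euclidean distance.
    """
    def cost(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    best = min(
        permutations(cells),
        key=lambda perm: sum(cost(p, c) for p, c in zip(points, perm)),
    )
    return list(best)


# Made-up 2D embedding of four items and a 2x2 target grid.
points = [(0.9, 0.2), (0.1, 0.1), (0.8, 0.9), (0.2, 0.8)]
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
layout = assign_to_grid(points, cells)
print(layout)  # [(1, 0), (0, 0), (1, 1), (0, 1)]
```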

30 Likes

Very interesting!
Thanks for sharing the notebook and the data, plus the scraper as well
PS 37MB Jupyter Notebook!
Biggest of all I have seen till date

Nbviewer Link Of the Same Notebook https://nbviewer.jupyter.org/github/henripal/maps/blob/master/nbs/big_resnet50-pca.ipynb

Using this we can now detect Roof Top Swimming Pools maybe…
Just saying!

1 Like

Sure Amit, let me know if you face any issue during deployment.

@flavioavila particularly interested in how you did the preprocessing/padding etc

Hi everyone,

I’m excited to share new developments on segmentation & classification of buildings from drone/aerial imagery in Zanzibar, Tanzania:

Updated building classifier to work with fastai v1.0.28 and trained the model to >94% accuracy. Looking at train/val loss, could keep training to get even better accuracy. Almost every top-loss prediction error is because of mislabeled ground truth data so some ImageCleaner action would definitely help:


image

Added Grad-CAM code from @henripal (much thanks for the discussion & notebooks!) to aid in model diagnostics and interpretability. Exploring under the hood to see what insights and ideas for improvement we can get about the model’s behavior.

Here’s looking at individual feature map channels ranked in importance as the sum of gradients for that channel calculated by back-propagating from the one-hot-encoded class prediction (a “Complete” building in this case). In other words, what specific feature maps (out of 2048 channels in my case) are most important to classifying this image as “Complete” and what are they looking at (activating on):

image
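
The channel ranking described above boils down to summing each channel's gradient map and sorting. A minimal sketch with nested lists standing in for tensors (the tiny gradient values are made up; in a real model they would come from a backward pass, for example via a PyTorch hook, which is not shown here):

```python
def rank_channels(grads):
    """Rank feature-map channels by the sum of their gradients.

    grads: list of channels, each an H x W nested list of gradient values.
    Returns channel indices sorted from most to least important.
    """
    importance = [sum(sum(row) for row in channel) for channel in grads]
    return sorted(range(len(grads)), key=lambda c: importance[c], reverse=True)


toy_grads = [
    [[0.1, 0.0], [0.0, 0.1]],   # channel 0, sum 0.2
    [[0.5, 0.4], [0.3, 0.2]],   # channel 1, sum 1.4
    [[-0.2, 0.1], [0.0, 0.0]],  # channel 2, sum -0.1
]
ranking = rank_channels(toy_grads)
print(ranking)  # [1, 0, 2]
```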

I’ve also created a new segmentation model based on lesson-3 camvid which handles the first part of the overall task (segment every pixel as building or not building). As mentioned in my 1st post, I originally used the old fastai v0.7 library so I’m bringing everything up to v1 now. Performance is not up to that of my old model yet, but I haven’t done any optimizations. Off-the-shelf fastai v1.0.28 for binary segmentation using a pretrained resnet34 encoder already gives pretty satisfying results (pixel-wise accuracy > 0.98, dice score > 0.85):

image
image
image
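
For reference, the two metrics quoted above can be written down for binary masks in a few lines. This is a plain-Python sketch of the standard definitions on a made-up 2x3 mask, not the fastai implementation, which may differ in details such as batching or smoothing:

```python
def _pairs(pred, target):
    """Flatten two equally shaped 0/1 masks into (pred, target) pixel pairs."""
    return [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]


def pixel_accuracy(pred, target):
    """Fraction of pixels where prediction and ground truth agree."""
    pairs = _pairs(pred, target)
    return sum(p == t for p, t in pairs) / len(pairs)


def dice_score(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for 0/1 masks."""
    pairs = _pairs(pred, target)
    intersection = sum(p * t for p, t in pairs)
    total = sum(p for p, _ in pairs) + sum(t for _, t in pairs)
    return 2 * intersection / total if total else 1.0


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
acc = pixel_accuracy(pred, target)
dice = dice_score(pred, target)
print(acc, dice)
```

When most pixels are background, accuracy can look high even if the predicted building masks overlap poorly with the labels, which is why the Dice score is worth reporting alongside it.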

Here are both notebooks:

I’ve also open-sourced my pre-processed training data for both models so anyone who wants to work with these notebooks can have ready-to-train data without going through the somewhat-niche data prep to convert geoJSON polygons into binary masks or crop individual buildings into images. For anyone interested in those steps, I will post them as well in future notebooks. Data download links and more info at the repo here:

Much thanks to the Commission for Lands, Govt. of Zanzibar and WeRobotics via the Open AI Tanzania competition for the original drone imagery and training labels. They are made available under Creative Commons Attribution 4.0 International license.

My overall project goal is to keep improving these models and publish notebooks showing the DL model development as well as all the intermediary geospatial data processing steps to make this:

Interactive demo link: http://alpha.anthropo.co/znz-119

Dave

55 Likes

Awesome @flavioavila! glad to hear that I was somehow helpful :slight_smile:

I tried the collaborative filtering from class 5 on this BookCrossing dataset. I pretty much copy-pasted Jeremy’s notebook; it’s insane how well copy-paste works in DL lol
My problem, though, is that I’m not sure what learning rate to pick when looking at the following plot:
learning_rate
I chose learn.fit_one_cycle(5, 1e-1), but 0.1 seems too high to me, no?

Now the first results are not so good:

Total time: 19:53
epoch  train_loss  valid_loss
1      15.048554   14.940901   (03:58)
2      15.467329   15.421408   (03:58)
3      14.969577   14.875525   (03:59)
4      14.105888   13.993391   (03:58)
5      13.375766   13.295375   (03:57)

I need to wrap my head around this; I am not yet sure how to interpret the following diagram.

What are the x and y axes? In the class they were not explained in enough detail! Here is the notebook https://github.com/dzlab/deepprojects/blob/master/collabfiltering/Collaborative_Filtering_Book_Recommendation.ipynb

1 Like

Is there a way to get the coordinates of any of those identified segments?

1 Like

Remember MSE is mean squared error. So take sqrt to get something more interpretable. Although you may want to use L1 loss as the loss_func or a metric since that’s easier to understand.
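
For example, taking the last validation loss reported above (13.295375, assuming the loss there is plain MSE on the raw ratings) and converting it to RMSE:

```python
import math

valid_mse = 13.295375        # epoch 5 valid_loss from the table above
valid_rmse = math.sqrt(valid_mse)
print(round(valid_rmse, 2))  # 3.65
```

So the predictions are off by roughly 3.6 rating units in the root-mean-square sense.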

1 Like

I wrote a post summarizing Jeremy’s interview with deep learning researcher Leslie Smith last week:

I discuss:

  • What propelled Leslie down the path toward creating the one-cycle policy.
  • Leslie’s current research direction.
  • Why Leslie’s previous research is useful to me as a practitioner.
  • Why Leslie’s journey is personally meaningful to me.

Thanks for checking it out,
-James

10 Likes

I wrote a blog post about my Thanksgiving Cousin Classifier if you want to check it out!

7 Likes

Nice!

Yes we can. Thresholding, polygonizing, & geo-referencing predicted segments will be the focus of a future notebook.

The short version is that I’m following a Slippy Map tile name convention used by OpenStreetMap and other web maps which enables easy conversion from tiles to lat/lon coordinates and back.

For example:

Each image & corresponding mask in my segmentation dataset has a filename that starts like “grid_001_19_319363_270514_…”:

After “grid_001_”, the numbers correspond to {zoom}_{xtile}_{ytile} information. So in this case, we can use the provided num2deg() conversion function to get the NW corner longitude & latitude for this particular tile:

# https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Python

import math
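def num2deg(xtile, ytile, zoom):
    # Convert slippy-map tile numbers to the (lat, lon) in degrees of the tile's NW corner.
    n = 2.0 ** zoom  # number of tiles along each axis at this zoom level
    lon_deg = xtile / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1 - 2 * ytile / n)))  # inverse Web Mercator
    lat_deg = math.degrees(lat_rad)
    return (lat_deg, lon_deg)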

zoom = 19
xtile = 319363
ytile = 270514

lat, lon = num2deg(xtile, ytile, zoom)
lat, lon
(-5.737609279566592, 39.28916931152344)

Putting this into Google Maps or OpenStreetMap will bring you to that tile’s NW corner location at that zoom level: https://www.openstreetmap.org/#map=19/-5.73761/39.28917

Doing this for all 4 corners of the tile gives you the geo bounds of that tile.
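
A minimal sketch of that idea: pull {zoom}, {xtile} and {ytile} out of the filename prefix, then call num2deg() on the tile's NW corner and on the NW corner of the diagonal neighbour (xtile + 1, ytile + 1), which is this tile's SE corner. The regex and the helper names `parse_tile` and `tile_bounds` are illustrative assumptions; only the prefix shown above is used, since the rest of the filename is not spelled out here.

```python
import math
import re


def num2deg(xtile, ytile, zoom):
    """(lat, lon) in degrees of the NW corner of a slippy-map tile."""
    n = 2.0 ** zoom
    lon_deg = xtile / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1 - 2 * ytile / n)))
    return (math.degrees(lat_rad), lon_deg)


def parse_tile(filename):
    """Extract (zoom, xtile, ytile) from a name starting 'grid_NNN_z_x_y_'."""
    m = re.match(r"grid_\d+_(\d+)_(\d+)_(\d+)_", filename)
    if m is None:
        raise ValueError("unexpected filename: " + filename)
    zoom, xtile, ytile = (int(g) for g in m.groups())
    return zoom, xtile, ytile


def tile_bounds(zoom, xtile, ytile):
    """Return (north, west, south, east) in degrees for one tile."""
    north, west = num2deg(xtile, ytile, zoom)
    south, east = num2deg(xtile + 1, ytile + 1, zoom)
    return north, west, south, east


# Only the known prefix of the filename is needed for the match.
zoom, xtile, ytile = parse_tile("grid_001_19_319363_270514_")
north, west, south, east = tile_bounds(zoom, xtile, ytile)
print(north, west, south, east)
```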

To assign lon/lat coordinates to detected buildings, we have to match up the display (x, y) coordinates of predicted pixels relative to the lon/lat bounds of that tile. This would happen during the polygonizing step. Too much detail to go into that here (and it’s not really specific to deep learning) so stay tuned for that notebook.
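
One way to sketch that matching step, assuming square tiles of a known pixel size (256 here is an assumption, not something stated above): since num2deg() also accepts fractional tile numbers, a pixel offset inside the tile can be turned into a fractional tile coordinate and converted directly. The helper name `pixel2deg` is hypothetical, and this is not necessarily how the upcoming notebook does it.

```python
import math


def num2deg(xtile, ytile, zoom):
    """(lat, lon) in degrees for a (possibly fractional) slippy-map tile position."""
    n = 2.0 ** zoom
    lon_deg = xtile / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1 - 2 * ytile / n)))
    return (math.degrees(lat_rad), lon_deg)


def pixel2deg(px, py, xtile, ytile, zoom, tile_size=256):
    """Lat/lon of pixel (px, py), measured from the tile's top-left corner.

    tile_size is an assumed value; use the real pixel size of your tiles.
    """
    return num2deg(xtile + px / tile_size, ytile + py / tile_size, zoom)


corner = pixel2deg(0, 0, 319363, 270514, 19)      # same as the tile's NW corner
center = pixel2deg(128, 128, 319363, 270514, 19)  # middle of the tile
print(corner, center)
```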

If there’s interest, I can also start a new wiki thread for us to share know-how, applications, and new ideas on geospatial deep learning. I’m learning too (aka fumbling my way into making things work) and I’m sure there are many sources of experience & wisdom among us to share and build on!

10 Likes

That would be great!

4 Likes

I created a notebook to explore visual representations of embeddings.

https://www.kaggle.com/tamlyn/animating-embedding-space

Using a toy problem I created an embedding layer with three dimensions then animated them on a 3D scatter plot. It’s interesting to see how they gradually move from random positions into some kind of order. I just wish I could keep adding dimensions :hypercube:

Please share if you fork the notebook and do something cool, I think there’s lots more that can be done with this.

5 Likes

Thanks, feel free to grab the updated HTML templates from the repo for your project :wink:

3 Likes

Hi, etown

I uploaded the notebook to github: https://github.com/flaviorainhoavila/IEMOCAPspeechEmotionRecognition

I borrowed the audio-spectrogram conversion from https://dzlab.github.io/jekyll/update/2018/11/13/audio-classification/

In order to run the notebook you’d need to request the data from https://sail.usc.edu/iemocap/

Cheers

3 Likes

Done! https://forums.fast.ai/t/geospatial-deep-learning-resources-study-group/31044

Could you please wikify that post when you have the chance?

2 Likes

I’ve belatedly added the training notebook behind my car classifier app http://whatcar.xyz.

The only real trick I used was to sample an equal number of images from each class and to resample between fits.
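
A minimal sketch of that trick, assuming the images are available as a dict from class name to a list of file paths (the function name `balanced_sample`, the class names, and the file names below are made up for illustration; the actual notebook may do this differently):

```python
import random
from collections import Counter


def balanced_sample(files_by_class, n_per_class, rng=random):
    """Draw the same number of files from every class, without replacement."""
    sample = []
    for label, files in files_by_class.items():
        k = min(n_per_class, len(files))  # a class smaller than n_per_class gives all it has
        sample.extend((path, label) for path in rng.sample(files, k))
    return sample


files_by_class = {
    "hatchback": ["h%d.jpg" % i for i in range(10)],
    "sedan": ["s%d.jpg" % i for i in range(4)],
    "suv": ["u%d.jpg" % i for i in range(7)],
}

# Call this again before each fit to get a fresh balanced subset.
subset = balanced_sample(files_by_class, n_per_class=4, rng=random.Random(0))
counts = Counter(label for _, label in subset)
print(counts)
```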

Even though I got 80% accuracy in validation, I only got around 50% in the real world. One reason for this is my photos are different to professionally taken photos from the internet (lighting, obstructions, etc), another is that not all cars are equally likely to be seen. Something to keep in mind if your production data is going to be different to your training data.