Great topic! I’m also working with remote sensing data, focusing now on the wildfires problem. There are many datasets available on Google Earth Engine (like Sentinel 1, 2, 3, 5, Landsat, VIIRS, MODIS, and even weather forecasts from GFS), updated regularly. It can be useful for preprocessing large datasets on their side and downloading only the final images.
Cool, what aspects of the wildfires problem are you focused on?
I came across this a few days ago from the NASA Jet Propulsion Lab about the California fires:
Using Sentinel-1 SAR before/during comparisons to estimate structural damage. More maps & explanations here: https://maps.disasters.nasa.gov/arcgis/apps/MapSeries/index.html?appid=8014e6c744a945baa8700797ccffccf6
Here’s some more work mostly with SAR by the ARIA group at NASA/JPL: https://aria.jpl.nasa.gov/case_studies
I’m very interested to see how we could combine Sentinel-1 and 2 data via DL approaches to create this kind of change analysis/damage proxy map at higher accuracy & resolution.
This is a great idea. I’ve spent about 16 months (part-time) of research efforts in Deep Learning enabled Urban Performance Assessment and Design. I’d be glad to brainstorm and discuss with people with similar interests!
My goal is to combine multiple sensors at multiple resolutions to find burn scars and eventually to create products such as fire damage ratings. For now I’m focusing on moderate resolution (500 m/1000 m) before moving to higher resolutions with Sentinel-2 and Landsat. I think that combining all available sources of information will be very important in all problems mentioned in this topic.
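As an illustration only (this is not necessarily the method described above): a widely used baseline for burn-scar mapping is the differenced Normalized Burn Ratio (dNBR), computed from near-infrared and shortwave-infrared reflectance before and after a fire. The sketch below assumes the bands are already co-registered NumPy arrays of reflectance; the function names and the 0.1 threshold are hypothetical choices for the example and would need tuning per sensor and region.

```python
import numpy as np


def nbr(nir, swir, eps=1e-6):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    nir = np.asarray(nir, dtype=np.float64)
    swir = np.asarray(swir, dtype=np.float64)
    # eps avoids division by zero on no-data pixels where both bands are 0
    return (nir - swir) / (nir + swir + eps)


def dnbr(pre_nir, pre_swir, post_nir, post_swir):
    """Pre-fire NBR minus post-fire NBR; larger values suggest more burn."""
    return nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)


def burned_mask(dnbr_img, threshold=0.1):
    """Boolean mask of pixels above an illustrative (assumed) dNBR threshold."""
    return np.asarray(dnbr_img) > threshold


# Toy 1x2 scene: the first pixel burns, the second is unchanged.
pre_nir = np.array([[0.5, 0.5]])
pre_swir = np.array([[0.1, 0.1]])
post_nir = np.array([[0.2, 0.5]])
post_swir = np.array([[0.3, 0.1]])

change = dnbr(pre_nir, pre_swir, post_nir, post_swir)
mask = burned_mask(change)
```

A per-pixel index like this could serve as a sanity check or as an extra input channel next to a learned multi-sensor model.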
I want to detect changes over time in a given area to assess flood damage. Multispectral data would be perfect.
That’s great! Welcome!
I don’t know anything about Urban Performance Assessment and Design so forgive me for a basic question: do you do work related to urban heat island & other microclimate effects and the design of urban environments to ameliorate them (e.g. new green spaces)? If so, any thoughts on what deep learning applications you see there outside of the usual building/greenspace/land cover automated feature extraction from imagery?
Saw your mention of domain knowledge & expertise in “development of SOTA thermal comfort assessment workflows” from your Introduce Yourself post which was what prompted my curiosity and question.
Is this problem feasible for my undergraduate thesis?
Yes, the initial studies will assess Heat Island Effects, generating urban-adjusted climatic data which can then be used to quantify thermal comfort (roughly and at a coarser scale, e.g. 500 m x 500 m). This is, let’s say, the least computationally expensive study. The thermal comfort assessment method I mention is much (MUCH) more computationally expensive and really can’t scale to the level of the city; however, it does provide assessment at scales that make sense (e.g. a 2 m or 4 m grid).
This is sort of the end game. Generate environmental data at the city level and at a fine enough scale (this does not exist at the moment), and use that to find correlations and causations with other, more readily available data at the urban scale. Hopefully, this leads to us training models that can say something about why an urban area performs better, can compare performance between areas, and can help inform urban design.
Thermal comfort assessment, especially under Climate Change scenarios, will probably become the single most important environmental study on the urban level. So I see a lot of potential in bringing together various different data streams in an effort to find insightful representations that can lead to DL models that try to assess or predict comfort and by extension thermal stress potential.
Thanks for the very clear & thoughtful overview! Definitely see the potential and need.
I’m coming from the healthcare side of things so to translate what I understood into my mental model: I see acute on chronic human physiologic stress in response to elevated ambient temperatures (i.e. heatwaves on top of overall average increase in temps, especially at night-time over consecutive days) as a major public health concern that’s amplified by climate change. This will debilitate people disproportionately because of their different physio baselines (i.e. the elderly and already sick having already low capacity for recuperation being more severely affected) and of the built environment they inhabit on a hyper local scale (differences in heat island effect street by street, building by building).
If I understood correctly, this is similar to what you mean by thermal comfort & thermal stress potential assessment under Climate Change scenarios? Please feel free to correct any misstatements I may have made. Any good references you could share on the topic?
It seems like multi-input models (handling multi-spectral band, multi-scale, multi-modal data) are a common thread running through many of the use cases people have expressed interest in on this thread.
@daveluo That is exactly what I meant yes. Thermal comfort assessment would focus on enabling urban design that provides a pleasant, comfortable and resilient outdoor environment. Thermal stress assessment is more focused on health risks to the population due to CC and how these are affected and distributed by varying urban environments. You put it in a much more eloquent way of course. I think the further ahead we move into CC impacts it becomes increasingly important, or rather necessary, to conduct these studies in much closer collaboration with healthcare and medical professionals.
As for your last comment, the working title of the overarching research I’m briefly describing here has at its focus exactly that term, multi-modal, which refers both to the different types of data available but also to the fact that all, or most, are crucial in themselves in figuring out this whole thing.
The great difficulty of all this stems not only from the scale and computational demands but also from really a lack of fine grained data. The bright side of things is that, from what I can gather from this discussion, the efforts of people working on the geospatial level can be used across quite different aspects/problems.
Glad to see Geo applications discussed here. @Gabriel_Syme I am really interested in your line of research and it would be great to discuss further. My main focus is thermal comfort in urban areas and how trees can have an impact. Again, the challenge, as many mentioned here, is how to use all the available data and synthesize it into a deep learning model. I am following these discussions closely and hope some collaboration efforts will come out of them.
Hi @shakur I’d love to have a chat, pm your details if you want and we can find a time and place.
This repo https://github.com/dbuscombe-usgs/dl_landscapes_paper on landscape classification using NN looks interesting. A very doable project would be to port this from TensorFlow to PyTorch.
Guys, I am trying to reproduce the amazing building segmentation work @daveluo did in the Zanzibar project.
The deep learning part is straightforward. However, since I have little to no previous experience in geographic data or GIS, I am having a really difficult time understanding the part of the code that uses rasterio to deal with the geotiff data. The idea of having an interactive map that showcases the result also seems daunting to me.
Is there any resource you would recommend to get started on geographic data and related tools/libraries?
Dave, @daveluo , if you don’t mind, would you share with us a little bit about your experience creating the inference notebook? Did you know all the details of how to use rasterio and handle geographic data when you started with the project? If so, is there anything you would recommend to get started?
Thanks George for the kind words and following the notebooks! Sorry for slow reply, been on the road and backlogged on all kinds of stuff.
Re: your Q on rasterio and other geospatial data-specific tools, no I didn’t have prior comprehensive knowledge of rasterio or geopandas. Quite the opposite, I’m figuring stuff out on the fly as the need to accomplish something (like a windowed read of a geotiff) pops up.
So my approach to each geo-processing step is probably suboptimal and definitely inelegant - what you see is usually the result of the 1st or 2nd thing that came to mind that worked well enough that I could move on. Whenever I come up with or see a better way, I’ll refactor. Please let me know of better ways to do things!
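For anyone unfamiliar with the "windowed read" mentioned above: the idea is to read a large GeoTIFF in small tiles instead of loading it whole. Below is a minimal sketch (not the code from the notebooks) of a helper that generates tile offsets using only the standard library. The helper name `tile_windows` and the tile size are hypothetical; the tuple order is chosen to match the arguments of `rasterio.windows.Window(col_off, row_off, width, height)`.

```python
from typing import Iterator, Tuple


def tile_windows(width: int, height: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, w, h) tiles covering a width x height raster.

    Tiles on the right and bottom edges are clipped to the raster extent.
    """
    if tile <= 0:
        raise ValueError("tile must be a positive integer")
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            w = min(tile, width - col_off)
            h = min(tile, height - row_off)
            yield col_off, row_off, w, h


# Toy example: a 5 x 3 pixel raster split into 2 x 2 tiles.
windows = list(tile_windows(width=5, height=3, tile=2))

# With rasterio installed, each tuple could drive a windowed read, e.g.:
#   import rasterio
#   from rasterio.windows import Window
#   with rasterio.open("scene.tif") as src:   # hypothetical file path
#       for win in tile_windows(src.width, src.height, 512):
#           chip = src.read(window=Window(*win))  # array of shape (bands, h, w)
```

Overlapping tiles (useful for stitching predictions without edge artifacts) would need a stride smaller than the tile size, which this sketch leaves out.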
I found the geohackweek series of tutorials very helpful: https://geohackweek.github.io/schedule.html
Happy to answer any very specific questions on PM or publicly here if you think the discussion would be generally useful for others on the thread.
Re: the interactive web demo page, I’m overdue on pushing that to the repo (and nb for creating the training images & masks) but in short, I’m using mapbox GL JS to display raster baselayers (as geotiffs) and polygons (as geojson) as new layers on top. Their documentation and code snippets are very good so I pretty much cobble together and slightly adapt a bunch of their boilerplate code: https://docs.mapbox.com/mapbox-gl-js/examples/
Hey Dave, thank you so much for your detailed reply. You must be really good at learning stuff on the fly! I am still far from that level.
I actually did run into a very specific question when training the segmentation model. It is about the choice of the loss function. Here you went for a combo loss function that is the sum of BCE and Dice. How did you come up with that ingenious idea? What is the reason that the choice is important here?
I didn’t come up with the idea of combining BCE and dice loss. It was used by many top-placing Kaggle participants in recent segmentation competitions (e.g. the 1st place solution in the 2018 Data Science Bowl) so I thought I’d try it out.
Confirmed with some experimental runs that it worked better than BCE or dice loss alone in my case so I kept it.
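For readers who haven't seen the combination written out, here is a minimal NumPy sketch of a BCE + Dice sum for binary segmentation. It is not the exact code from the notebook: the equal weighting of the two terms, the `smooth` constant, and the function names are assumptions for illustration, and for actual training the same arithmetic would need to be expressed with differentiable tensor ops (e.g. in PyTorch).

```python
import numpy as np


def bce_loss(logits, targets):
    """Mean binary cross-entropy computed from raw logits (numerically stable)."""
    x = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets, dtype=np.float64)
    # Stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
    return float(np.mean(np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))))


def dice_loss(logits, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.T| + smooth) / (|P| + |T| + smooth)."""
    x = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets, dtype=np.float64)
    p = 0.5 * (1.0 + np.tanh(0.5 * x))  # sigmoid, written to avoid overflow
    intersection = np.sum(p * t)
    return float(1.0 - (2.0 * intersection + smooth) / (np.sum(p) + np.sum(t) + smooth))


def combo_loss(logits, targets, smooth=1.0):
    """Unweighted sum of BCE and soft Dice (weighting is an assumption)."""
    return bce_loss(logits, targets) + dice_loss(logits, targets, smooth)


# Toy check: confident correct predictions vs. confident wrong ones.
targets = np.array([1.0, 0.0])
good = combo_loss(np.array([10.0, -10.0]), targets)
bad = combo_loss(np.array([-10.0, 10.0]), targets)
```

One intuition for pairing them: BCE scores every pixel independently, while Dice scores the overlap of the whole predicted mask, so Dice is less dominated by the background when buildings cover only a small share of each tile.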
Even w/o participating (I mostly haven’t), I try to review the top placing solutions write-ups and published code from relevant Kaggle competitions and other challenges related to geospatial data and computer vision. Great way to keep up with new techniques!
Here is a new satellite image competition being run as part of the Women in Data Science Datathon 2019, hosted by Kaggle. Great to see the inclusivity! The task is to predict the presence of socially questionable oil palm plantations in satellite imagery. It’s fairly ‘easy’ with a ‘short’ 4-week window. And there is swag to win, including easy-to-qualify-for cloud credit. A nice intro to Kaggle for freshly minted fastai alumni.
It is open to individuals of any gender, and teams with 50%+ female members.
It is pretty interesting. I used flat Cross Entropy, the default option of fast.ai, and still got a better training result than the combo loss in fewer iterations. If you are interested, I could send you my notebook.