Interesting challenge project: xView2

(Jeremy Howard (Admin)) #1

This looks like a really cool project for deepening and testing out your DL skills:

It’s an amazing dataset, with 700,000 building assessment annotations. The goal of the challenge is to improve disaster recovery. If anyone has a go, let us know what you think here!


(Wayne Polatkan) #2

I’m in this. We’ll see how it goes.


(Ritwik Gupta) #3

Hey! I’m the first author on the xBD dataset. If you have any questions about the dataset or the baseline model, please feel free to reach out to me here, join the Discord at, and/or tweet at me @Ritwik_G.

There are so many use cases for xBD beyond building damage classification (which is the purpose of xView 2). If you have any interesting ideas, please reach out to me. I’d love to discuss the viability/provide help.

The preliminary paper that introduced the idea of the dataset, basically a design doc, is available here:

Many things have changed from that version. The final paper is currently under review, and I will put it up on arXiv soon. I’ll post the link here when it’s up.


(Nathan Yee) #4

I’m also one of the people behind the scenes for this project. I’ve mostly been working on the automatic metrics evaluation and verification parts of the challenge. Please feel free to reach out to me here or Nathan-xView on the discord if you have any questions!

Also, a somewhat relevant fun fact: the metrics code that scores everyone’s predictions was written by exporting a .ipynb using my version of that I made when doing this course!



I have a question about using this data for non-competition purposes - let's say for a commercial detection project. The agreement specifies: “You may use the data and content made available on the Challenge Website to prepare for and compete in the Challenge. The xView2 xBD dataset is released to competitors under a Creative Commons Attribution-Noncommercial-Sharealike 4.0 International (CC BY-NC-SA 4.0) license.”

What if I am not a competitor?

Also given fast’s recent focus on what it means to be a responsible researcher - lesson 7 of part one and the most recent posts on ethics:

I would be interested to hear overall thoughts on submitting to a competition sponsored by the “Defense Innovation Unit” - a group that has these companies in its portfolio:


(Ritwik Gupta) #6

The pre-print for the final paper of the xBD dataset is now available on arXiv at

(Nathan Yee) #7

Thank you for your questions! I’m glad that people here are thoughtful about ethics as that is something I also care about.

  1. In order to download the data, you have to register as a competitor and agree to the terms. Feel free to reach out to DIU or Maxar/Digital Globe if you have further questions.

  2. The organization I work for has told me that we will only work with the Defense Innovation Unit on projects that are good for the world, such as humanitarian assistance and disaster recovery. If DIU (or anyone) wants us to work on a project that doesn’t align with our values, we won’t work on it. And if the company chooses to do so anyway, I’ll leave the company.

(Nirav Nikunj Patel) #8

Hello - my name is Nirav and I am running the competition at DIU.

Really appreciate your questions - to tag along to my colleague Nathan’s comments:

  1. We need to update the language on the website - but you are more than welcome to use the dataset for non-competition purposes as long as they do not violate that Creative Commons license. We will be releasing the dataset to the general public in its entirety (training, test and holdout) once the competition is over. So feel free to register and utilize the training and the test data that is available if you like - there is no requirement for you to make a submission to our leaderboard.

  2. Understood on your point on ethics - the aim of this competition is explicitly to generate accurate building damage assessments before and after natural disasters, and not for any other purpose. We developed the damage scale with the partners you see on our website, such as California Air National Guard and FEMA.

Additionally, I think it’s important to re-emphasize that we will be releasing the dataset fully as open-source to the public with a Non-Commercial restriction after the competition. We are also prioritizing the only additional merit-based prize for the Top Open Source algorithm. I think these decisions show intent from our side to improve the state of the art in computer vision as it applies to humanitarian assistance and disaster relief.

The federal partners we have for the competition have an operational interest in applying the award-winning algorithms to natural and man-made disasters and not to other purposes; otherwise we would not have their support for this initiative.

The Department of Defense often takes a leadership role in humanitarian assistance and disaster relief within the United States and around the world. You can read more here:

You can also read more detail on our problem statement and intent with winning algorithms here:

Happy to answer any more questions!

Thank you,



Thank you for that extensive response.

On part 2 I will leave that for others to decide for themselves.

On the matter of copyright I think it is a rather interesting question - there is an entire industry that facilitates the activities of first responders beyond the DOD, and these could and perhaps should expand our definition of what it means to be social entrepreneurs.

These are often for-profit businesses - and this dataset could go a long way toward helping them improve services that often save lives. While training models is “cheap”, productionizing and maintaining them (concept drift) is not an inexpensive endeavour.

For those interested in thinking about the CC non-commercial license, below is the CC opinion on the matter of databases and model training.

Frequently asked questions about data, generally

Which components of databases are protected by copyright?

With databases, there are likely four components to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data or other content.

The database model refers to how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the database is subject to copyright if it is sufficiently original. The originality threshold is fairly low in many jurisdictions. For example, while courts in the United States have held that an alphabetical telephone directory was insufficiently original to merit copyright protection, an organized directory of Chinese-American businesses in a particular area did. These determinations are very fact-specific (no pun intended) and vary by jurisdiction.

The data entry and output sheets contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to determine if the database model is copyrightable.

Field names describe the contents or data. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often lack sufficient originality.

The data or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.

How do I know whether a particular use of a database is restricted by copyright?

When the database structure or its contents is subject to copyright, reproducing, distributing, or modifying the database will often be restricted by copyright law. However, it is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, while (as noted above) a court in the United States held that a directory of Chinese-American businesses was restricted by copyright, the same court went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.

Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas embedded within the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of all CC licenses is highlighted on the license deeds in the Notice section, where we emphasize that compliance with the license is not required for elements of the material in the public domain.

If my use of a database is restricted by copyright, how do I comply with the license?

All CC licenses require that you attribute the licensor when your use involves public sharing. Your other obligations depend on the particular CC license applied to the database. If it is a NonCommercial (NC) license, any regulated use must be limited to noncommercial purposes only. If a NoDerivatives (ND) license is applied, you may produce an adapted database but cannot share it publicly. If it is a ShareAlike (SA) license, you must apply the same or a compatible license to any adaptation of the database you share publicly.

Which components of a database are protected by sui generis database rights?

In contrast to copyright, sui generis database rights are designed to protect a maker’s substantial investment in a database. In particular, the right prevents the unauthorized extraction and reuse of a substantial portion of the contents.

How do I know whether a particular use of a database is restricted by sui generis database rights?

When a database is subject to sui generis database rights, extracting and reusing a substantial portion of the database contents is prohibited absent some express exception.

It is important to remember that sui generis database rights exist in only a few countries outside the European Union, such as Korea and Mexico. Generally, if you are using a CC-licensed database in a location where those rights do not exist, you do not have to comply with license restrictions or conditions unless copyright (or some other licensed right) is implicated.

Note that if you are using a database in a jurisdiction where you must respect database rights, and you receive a CC-licensed work from someone located in a jurisdiction without database rights, you should determine whether database rights exist and have been licensed. If so, you need to properly mark and attribute as the license requires, since the person from whom you received the database may not have been required to keep that information. If you are using a licensed database and you do not have to comply with the license terms because such rights do not exist in your jurisdiction, we recommend that you retain this information where possible. Doing so assists downstream reusers who are required to provide it when they share further.

What constitutes a “substantial portion” of a database?

There is no bright line test for what constitutes a “substantial portion”. The answer will depend on the law in the relevant jurisdiction. Note that what constitutes a substantial portion is determined both quantitatively and qualitatively. Also, using several insubstantial portions can add up to a substantial portion.

If my use of a database is restricted by sui generis database rights, how do I comply with the license?

If the database is released under the current version (4.0) of CC licenses, you must attribute the licensor if you share a substantial portion of the database contents. The other requirements depend on the particular license applied to the database. Under the NC licenses, you may not extract and reuse a substantial portion of the database contents for commercial purposes. The ND licenses prohibit you from including a substantial portion of the database contents in another publicly shared database in which you have sui generis database rights of your own. And finally, the SA licenses require you to apply the same or a compatible license to any database you share publicly and in which you include a substantial portion of the licensed database contents. Note that this does not require you to ShareAlike any copyright or other rights you have in the individual contents of the database.

Artificial intelligence and CC licenses

What are the limits on how CC-licensed works can be used in the development of new technologies, such as training of artificial intelligence software?

The licenses grant permission for reuse in any situation that requires permission under copyright. There are many ways in which CC-licensed works and even all-rights-reserved works can be reused without permission. This includes uses that are fair uses, for example.

If someone uses a CC-licensed work with any new or developing technology, and if copyright permission is required, then the CC license allows that use without the need to seek permission from the copyright owner so long as the license conditions are respected. This is one of the enduring qualities of our licenses — they have been carefully designed to work with all new technologies where copyright comes into play. No special or explicit permission regarding new technologies from a copyright perspective is required.

(Christian Olivo) #10

I am interested in giving it a shot. However, I’m a beginner in AI and am only on lesson 1. Would you advise me to do xView2 as a side project, or a Kaggle competition instead? I would think a Kaggle competition would have more guidelines to help you along the way, and at the end you can see your competitors’ code to learn from. What do you think? Would the xView2 project be good for me as a starting project? If it is, then I would be interested in forming a small team so we could help each other out.

(Nirav Nikunj Patel) #11

Hi Christian,

I don’t think it’s too late for you to join and to make a submission!

You can work with our baseline here:

And take a look at how we create metrics here:

Thank you,


(Wayne Polatkan) #12

I have a question on the baseline – asking here b/c it’s a lot easier to search answers than discord chat.

How did the CMU team pretrain their baseline model, and to what standard/performance? Checking the xView2-baseline readme, you used a fork of Motoki Kimura’s SpaceNet UNet repo. This looks like a custom UNet (tho interesting aside: he deletes the 2nd to last activations at each stage of the forward pass – I haven’t seen that before, don’t know if it’s necessary).

Do you load any pretrained weights? For example, ImageNet weights before pretraining on SpaceNet, or any SpaceNet pretrained weights? Are you referring to any specific baselines besides your final baseline weights in the Training the SpaceNet Model section?


(Nirav Nikunj Patel) #13

Hi Wayne!

Did you see this paper preprint:

Might be able to answer some of your questions?

Let me know if there are more questions!

Thank you,