Thank you for that extensive response.
On part 2, I will leave it to others to decide for themselves.
On the matter of copyright, I think it is a rather interesting question. There is an entire industry that facilitates the activities of first responders beyond the DOD, and these businesses could, and perhaps should, expand our definition of what it means to be social entrepreneurs.
These are often for-profit businesses, and this data set could go a long way toward helping them improve services that often save lives. While training models is “cheap”, productionizing and maintaining them (for example, handling concept drift) is not an inexpensive endeavour.
For those interested in thinking about the CC non-commercial license, below is the Creative Commons (CC) opinion on the matter of databases and model training.
Frequently asked questions about data, generally
Which components of databases are protected by copyright?
With databases, there are likely four components to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data or other content.
The database model refers to how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the database is subject to copyright if it is sufficiently original. The originality threshold is fairly low in many jurisdictions. For example, while courts in the United States have held that an alphabetical telephone directory was insufficiently original to merit copyright protection, an organized directory of Chinese-American businesses in a particular area did. These determinations are very fact-specific (no pun intended) and vary by jurisdiction.
The data entry and output sheets contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to determine if the database model is copyrightable.
Field names describe the contents or data. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often lack sufficient originality.
The data or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.
How do I know whether a particular use of a database is restricted by copyright?
When the database structure or its contents is subject to copyright, reproducing, distributing, or modifying the database will often be restricted by copyright law. However, it is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, while (as noted above) a court in the United States held that a directory of Chinese-American businesses was restricted by copyright, the same court went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.
Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas embedded within the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of all CC licenses is highlighted on the license deeds in the Notice section, where we emphasize that compliance with the license is not required for elements of the material in the public domain.
If my use of a database is restricted by copyright, how do I comply with the license?
All CC licenses require that you attribute the licensor when your use involves public sharing. Your other obligations depend on the particular CC license applied to the database. If it is a NonCommercial (NC) license, any regulated use must be limited to noncommercial purposes only. If a NoDerivatives (ND) license is applied, you may produce an adapted database but cannot share it publicly. If it is a ShareAlike (SA) license, you must apply the same or a compatible license to any adaptation of the database you share publicly.
Which components of a database are protected by sui generis database rights?
In contrast to copyright, sui generis database rights are designed to protect a maker’s substantial investment in a database. In particular, the right prevents the unauthorized extraction and reuse of a substantial portion of the contents.
How do I know whether a particular use of a database is restricted by sui generis database rights?
When a database is subject to sui generis database rights, extracting and reusing a substantial portion of the database contents is prohibited absent some express exception.
It is important to remember that sui generis database rights exist in only a few countries outside the European Union, such as Korea and Mexico. Generally, if you are using a CC-licensed database in a location where those rights do not exist, you do not have to comply with license restrictions or conditions unless copyright (or some other licensed right) is implicated.
Note that if you are using a database in a jurisdiction where you must respect database rights, and you receive a CC-licensed work from someone located in a jurisdiction without database rights, you should determine whether database rights exist and have been licensed. If so, you need to properly mark and attribute as the license requires, since the person from whom you received the database may not have been required to keep that information. If you are using a licensed database and you do not have to comply with the license terms because such rights do not exist in your jurisdiction, we recommend that you retain this information where possible. Doing so assists downstream reusers who are required to provide it when they share further.
What constitutes a “substantial portion” of a database?
There is no bright line test for what constitutes a “substantial portion”. The answer will depend on the law in the relevant jurisdiction. Note that what constitutes a substantial portion is determined both quantitatively and qualitatively. Also, using several insubstantial portions can add up to a substantial portion.
If my use of a database is restricted by sui generis database rights, how do I comply with the license?
If the database is released under the current version (4.0) of CC licenses, you must attribute the licensor if you share a substantial portion of the database contents. The other requirements depend on the particular license applied to the database. Under the NC licenses, you may not extract and reuse a substantial portion of the database contents for commercial purposes. The ND licenses prohibit you from including a substantial portion of the database contents in another publicly shared database in which you have sui generis database rights of your own. And finally, the SA licenses require you to apply the same or a compatible license to any database you share publicly and in which you include a substantial portion of the licensed database contents. Note that this does not require you to ShareAlike any copyright or other rights you have in the individual contents of the database.
Artificial intelligence and CC licenses
What are the limits on how CC-licensed works can be used in the development of new technologies, such as training of artificial intelligence software?
The licenses grant permission for reuse in any situation that requires permission under copyright. There are many ways in which CC-licensed works, and even all-rights-reserved works, can be reused without permission. This includes uses that are fair uses, for example.
If someone uses a CC-licensed work with any new or developing technology, and if copyright permission is required, then the CC license allows that use without the need to seek permission from the copyright owner so long as the license conditions are respected. This is one of the enduring qualities of our licenses — they have been carefully designed to work with all new technologies where copyright comes into play. No special or explicit permission regarding new technologies from a copyright perspective is required.