I’m working through the FastAI book and learning lots.
In Chapter 5, the book mentions ‘localisation’:
“We can see that this dataset provides us with images and annotations directories. The website for the dataset tells us that the annotations directory contains information about where the pets are rather than what they are. In this chapter, we will be doing classification, not localization, which is to say that we care about what the pets are, not where they are. Therefore, we will ignore the annotations directory for now.”
Does anybody have more information on this, please?
For example, suppose I would like to attach further information to an image; how would I go about doing this?
There’s another reference in Chapter 5:
Now if we are going to understand how to extract the breed of each pet from each image we’re going to need to understand how this data is laid out. Such details of data layout are a vital piece of the deep learning puzzle. Data is usually provided in one of these two ways:
- Individual files representing items of data, such as text documents or images, possibly organized into folders or with filenames representing information about those items
- A table of data, such as in CSV format, where each row is an item which may include filenames providing a connection between the data in the table and data in other formats, such as text documents and images
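As an illustration of the first layout: in the pets dataset the breed is encoded in the filename itself, so a small regular expression is enough to extract it. A minimal sketch (the filenames and the breed-underscore-index convention are assumed examples, and `breed_from_filename` is a hypothetical helper, not a fastai function):

```python
import re

def breed_from_filename(name):
    """Extract the breed from a pet image filename such as
    'great_pyrenees_173.jpg' -> 'great_pyrenees'.
    Assumes the convention: breed name, underscore, index, extension."""
    match = re.match(r"(.+)_\d+\.jpg$", name)
    if match is None:
        raise ValueError(f"Unexpected filename: {name}")
    return match.group(1)

print(breed_from_filename("great_pyrenees_173.jpg"))  # great_pyrenees
print(breed_from_filename("Abyssinian_1.jpg"))        # Abyssinian
```

The greedy `(.+)` backtracks just far enough to leave the trailing `_<digits>.jpg` for the rest of the pattern, so breeds containing underscores are handled correctly.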
Hi John, great to hear you’re finding fastai useful and learning a lot! I’m not really sure what your question here is, or what you’re trying to do. Are you trying to learn about localisation (object detection) models? There will be plenty of references on this forum if you search for object detection, most recently the IceVision library.
Cheers and good luck in your learning journey!
Thank you for your reply. It is much appreciated.
I wasn’t really sure what to ask, but then I came across this paper.
I did of course mean location context!
Any idea whether the FastAI API allows for additional data to be added to the model as an input?
@wildman Definitely! You’ll need to use the mid-level API though, and potentially write some code to make it work. For example, you could combine the image encoding with a location encoding by concatenating them, then feeding the result into a linear layer. Here is an example of the mid-level API, but you’ll need to make a few modifications: https://docs.fast.ai/tutorial.siamese.html
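To make the concatenation idea concrete, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the encoder is a stand-in for a real pretrained body (in fastai you would typically use `create_body` on a resnet), the four location features stand for a bounding box, and the sizes are arbitrary:

```python
import torch
import torch.nn as nn

class ImagePlusLocation(nn.Module):
    """Sketch of fusing image features with extra per-image features
    (e.g. 4 bounding-box numbers) by concatenation before the head."""
    def __init__(self, img_features=512, loc_features=4, n_classes=37):
        super().__init__()
        # Stand-in for a pretrained CNN body; just flattens 3x64x64
        # images and projects them to img_features dimensions.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, img_features),
        )
        # Classification head over the concatenated representation.
        self.head = nn.Linear(img_features + loc_features, n_classes)

    def forward(self, image, location):
        img_enc = self.encoder(image)                 # (batch, img_features)
        combined = torch.cat([img_enc, location], 1)  # (batch, img + loc)
        return self.head(combined)                    # (batch, n_classes)

model = ImagePlusLocation()
imgs = torch.randn(2, 3, 64, 64)  # dummy batch of 2 RGB images
locs = torch.randn(2, 4)          # dummy bounding-box features
out = model(imgs, locs)
print(out.shape)                  # torch.Size([2, 37])
```

The design choice here is "late fusion": the location features skip the image encoder entirely and only meet the image representation at the final linear layer, which keeps the pretrained image pathway untouched.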
Thanks again! I’ll take a look
Just realised that’s under the ‘advanced’ tab