GLAMs (Galleries, Libraries, Archives and Museums) fastai study group

Uh oh, I think I did the time zone conversion wrong (today I learned GMT is NOT the same as UK time) and missed today’s call. Were people able to meet and talk about Week 1?

Really sorry about the confusion Mike :frowning: I should have specified the time more clearly

A few of us did meet and discuss week 1. We’ll pick up our discussion of week 2’s lesson in ~2 weeks’ time. I’ll post a quick summary in the main thread tomorrow.

I have set up a public google calendar to share the call times. Hopefully, this will remove ambiguity about the call times. Sorry again!

Definitely not your fault at all, Daniel! I totally outsmarted myself by Googling 17:00 GMT, instead of UK time, which is exactly what you quoted multiple times.

I’ll try to tidy up and post my notebook from Week 1 to the GitHub repo. It will probably be of interest to some of the Kew folks on here, since I attempted to re-construct the herbarium sheet dataset used in this publication: https://bdj.pensoft.net/article/21139/element/5/3782317/ using some new Smithsonian image APIs. I’ll post a dataset link to the GitHub repo too.

I’ll see you all in 2 weeks, and I’ll try to recruit some Smithsonian folks to join too. See you then!

A quick reminder that we’ll have another catch-up call to discuss lesson 2 next Tuesday. You can find this meeting and the following ones in this google calendar.

See some of you next week :slight_smile:

The link for the call tonight:

https://turing-uk.zoom.us/j/91250832467?pwd=cW91aStWWFVXODdPVmo2Q0NBVDNVUT09

Meeting ID: 912 5083 2467
Passcode: 154566

We’ll use Zoom this week, since some people seemed to struggle to get sound etc. working with Jitsi.

I meant to mention on the call this evening that there will be an intro to IceVision on the fastai Discord forums this evening (now basically!), which might be of interest if you want to do object detection. IceVision: Presentation @ the Fastai Discord Forum

A reminder that we’ll have the call for lesson 3 next Tuesday (google calendar link to future calls).

I’ll post the link to the call here on Tuesday :slight_smile:

Link for today’s call:

Topic: fastai4glams call
Time: Oct 6, 2020 05:00 PM London

https://turing-uk.zoom.us/j/98336659035?pwd=ODlrWlgxYVd5cFpIZm0rVU05ajZNQT09

Meeting ID: 983 3665 9035
Passcode: 459389

Hi all, I have to make my apologies as I can’t make the call tonight.

Also, I’ve put together a notebook inspired by lesson 1, doing a simple image segmentation example. What would be the easiest way to share it? Add it to the GitHub repo Daniel shared before?

No problem, hopefully see you at the next one.

Very happy for you to make a pull request to upload it to that repository :slight_smile:

The discussion of resizing images in this week’s video, combined with some recent work with IIIF, sparked my interest in using IIIF to resize images to prepare them for a training loop. One thing I noted is that some of the resize methods available in fastai are mirrored in the requests you can make via the IIIF Image API 3.0. For example, using the IIIF request playground: https://www.learniiif.org/image-api/playground

Original image https://stacks.stanford.edu/image/iiif/hg676jb4964%2F0380_796-44/full/max/0/default.jpg

This can be resized to 250x250 in various ways:

https://stacks.stanford.edu/image/iiif/hg676jb4964%2F0380_796-44/full/250,250/0/default.jpg (squished aspect ratio)

https://stacks.stanford.edu/image/iiif/hg676jb4964%2F0380_796-44/full/!250,250/0/default.jpg (preserved aspect ratio)
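The two URLs above differ only in the size segment, so they can be built with a small helper. This is just a rough sketch: the function name iiif_size_url and its parameters are made up for illustration, and it assumes the usual Image API path order of region/size/rotation/quality.format.

```python
def iiif_size_url(image_base, width, height, keep_aspect=True,
                  region="full", rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL for a resized image.

    image_base is the image URL up to and including the identifier,
    e.g. everything before '/full/max/0/default.jpg'.
    """
    size = f"{width},{height}"
    if keep_aspect:
        # '!w,h' asks the server to fit the image inside w x h
        # without changing its aspect ratio
        size = "!" + size
    return f"{image_base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


base = "https://stacks.stanford.edu/image/iiif/hg676jb4964%2F0380_796-44"
squished = iiif_size_url(base, 250, 250, keep_aspect=False)
fitted = iiif_size_url(base, 250, 250)
print(squished)
print(fitted)
```

The first call reproduces the squished URL and the second the aspect-preserving one, so the same helper could be mapped over a whole column of image base URLs.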

This means you could offload some of the resizing operations to an IIIF server rather than doing them during the training loop. As an example, loading images from a DataFrame containing a column of IIIF URLs and a column of labels can be done in a DataBlock:

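import io

import requests
from fastai.vision.all import *

def get_im(x):
    # Fetch the image at IIIF URL x and load it as a fastai PILImage
    with requests.get(x, timeout=30) as r:
        return PILImage.create(io.BytesIO(r.content))

iiif = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    splitter=RandomSplitter(valid_pct=0.2),
    # 'iif' and 'label' are the DataFrame column names
    get_x=Pipeline([ColReader('iif'), get_im]),
    get_y=ColReader('label'))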

This is probably not usually a good idea for the training part, since grabbing images via the web has more latency than reading from a local filesystem, but it might be an option if you are already running an IIIF server which has a fast connection to the machine you are training on?

I created a crude notebook to test the differences in training speed: https://github.com/davanstrien/fastai4GLAMS/blob/master/lessons/03_lesson/iiif_resize_experiment.ipynb

One other issue with this simple approach to using IIIF images is that if any of the URLs are invalid, don’t return an image, or you have a network issue, the training loop will break. Unless you are working with a huge volume of images, it’s probably going to be better to use IIIF to download the images locally before training. This could still take advantage of requesting images at a reasonable size to start with, reducing the amount of resizing that happens in the training loop.
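A minimal sketch of that download-first approach, assuming a plain list of IIIF URLs. The names fetch_bytes and download_iiif_images are hypothetical, and it uses the standard library rather than requests. It only guards against failed requests; it does not check that the returned bytes actually decode as an image.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=30):
    # Default fetcher: return the raw response body for a URL
    with urllib.request.urlopen(url, timeout=timeout) as r:
        return r.read()


def download_iiif_images(urls, dest, fetch=fetch_bytes):
    """Save each URL to dest, collecting failures instead of stopping.

    Returns (saved, failed): saved maps url -> local Path, and failed
    is a list of (url, error message) pairs.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    saved, failed = {}, []
    for i, url in enumerate(urls):
        try:
            data = fetch(url)
        except Exception as err:
            # Record the bad URL and carry on with the rest
            failed.append((url, repr(err)))
            continue
        path = dest / f"{i:06d}.jpg"
        path.write_bytes(data)
        saved[url] = path
    return saved, failed
```

Any rows whose URL ended up in failed could then be dropped from the DataFrame before building the DataBlock from local file paths, so a single bad URL no longer breaks training.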

@Danielvs, would you have any interest in moving the GitHub repo to the AI4LAM organization? I volunteered to set it up a while back and act as administrator, and it’s been sitting empty since then. I think this would be a great example use case for the types of content that belong there.

We can discuss more on the call in a little bit.

Happy to move the repo under ai4lam. It would be great if I could still have write permission to that repository if possible, though. Are you okay just to send me an email to confirm that I have the correct GitHub org etc.?

Whilst I remember, on the topic of production/deployment, this twitter thread mentions some useful books on the topic of production. I suspect some of the books are too focused on ‘enterprise’ to be readily translated into (most) GLAM settings but some of the ideas if not the specific approaches may translate. I have skim-read Building Machine Learning Powered Applications and it seemed pretty approachable/practical in focus.

I actually sent you an invitation through the GitHub interface towards the end of our call today. That should give you the ability to create new repositories in the AI4LAM organization (as well as move existing repositories).

Let me know if you run into any difficulty with that.

I also sent an invite to @barnabywalker, since he’s a contributor to the current repo.

I also enjoyed Building Machine Learning Powered Applications as a practical walk-through of the machine learning data product lifecycle. I thought it did a good job of using the mindset they mentioned in the most recent FastAI lesson of first creating a baseline model for comparison.

Another great related resource is https://course.fullstackdeeplearning.com/. Lots of really practical advice throughout.

Thanks :slight_smile: I decided to fork the repository into the ai4lam org and archive the current version. I’ve had issues with URLs breaking when moving GitHub orgs before so I thought this might be the best solution. I will add something to the README to make it clearer that anyone can make a pull request to add something to the repository. The new repository is here: https://github.com/AI4LAM/fastai4GLAMS

Hello All,

Daniel thank you for organizing this study group. I am a bit late to the course, but I am going to get caught up by October 20th’s meeting.

  • Who you are?
    My name is William Mattingly. I am a medieval historian by training. I have just joined at the Smithsonian Data Science Lab and USHMM as a postdoc fellow.

  • Why you are interested in machine learning/deep learning?
    I have used primarily TensorFlow and Keras for identifying sources in medieval texts.

  • Do you already have some potential problems you are (or would like to) use machine learning for?
    I am interested in exploring FastAi and PyTorch at my new position for various tasks, including NLP and text and image classification for the purposes of cataloging uncatalogued collections.

  • Datasets you are keen to work with? (either labelled or unlabelled)
    I am interested in working with both.

  • Is there anything that you think would help you get prepared to follow the fastai course (i.e. you are a bit rusty with Python)?

That’s great to hear, give me a shout if there is anything we can do to help you catch-up.

Nice to meet you :slight_smile: I usually try and post reminders of upcoming calls etc. here but if you want to get email reminders add your name here https://forms.gle/EGChZsYuEt9sJmFBA

Thanks! I just added my email to the list.