GLAMs (Galleries, Libraries, Archives and Museums) fastai study group

:rotating_light: A reminder that we’ll have the call for lesson 6 this Tuesday (google calendar link to future calls). As usual, I’ll post the Zoom link here on Tuesday.

Apologies - I can’t attend this one. (But I look forward to meeting in a fortnight’s time, when I’ll be in hotel quarantine in Sydney and looking for relief from boredom and a view of the harbour!)
Susan

Zoom details for this evening:

Topic: fastai4glams call lesson 6
Time: Nov 17, 2020 05:00 PM London

Join Zoom Meeting

Meeting ID: 925 3405 2670
Passcode: 160613

No worries, see you next time and hope the quarantine is not too dull!

A note for the next session (lesson 7) thanks to @AdamF

There appear to be several issues in fastbook/clean/09_Tabular.ipynb that have been reported in the forum, but not in GitHub issues. This issue aggregates them together (a rough sketch of the fixes follows the list below).

I confirmed that all of them occur in the version included in the fastdotai/fastai-course Docker image of 19-Nov-2020. I presume the official Docker image version of the course is supposed to run through cleanly without any errors.

  • Some supporting modules are not installed: pip install kaggle waterfallcharts treeinterpreter dtreeviz (forum article)
  • Downloading the Kaggle file bluebook-for-bulldozers does not appear to work from Python. There are multiple reports of this (including here), with the workaround being to download manually via the browser or via the command line (kaggle competitions download -c bluebook-for-bulldozers).
  • The download code also fails because it tries to create a directory but needs parents=True added to path.mkdir (see).
  • The load/save pickle methods should be changed to load_pickle/save_pickle (reported a couple of times on the forum, including here).
  • m_rmse(m, xs_filt2, y_filt), m_rmse(m2, valid_xs_time2, valid_y) raises an error. xs_filt2 should be xs_filt, as in m_rmse(m, xs_filt, y_filt), m_rmse(m2, valid_xs_time2, valid_y) (see here).
  • procs_nn = [Categorify, FillMissing, Normalize] causes an error in the following line. The suggested workaround is to remove Normalize from the list (see here).
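
For convenience, here’s a very rough sketch of those fixes in one place. Treat it as a sketch rather than a tested patch - the path handling is only illustrative, and I’m assuming load_pickle/save_pickle come from fastcore as they do in the book:

```python
# Missing supporting packages (run in a notebook cell or a shell):
# !pip install kaggle waterfallcharts treeinterpreter dtreeviz

from pathlib import Path
from fastcore.xtras import load_pickle, save_pickle   # use these instead of the plain load/save calls
from fastai.tabular.all import Categorify, FillMissing

# Directory creation: the path here is a placeholder -- the point is the parents=True fix.
path = Path('bluebook-for-bulldozers')
path.mkdir(parents=True, exist_ok=True)

# If downloading from Python fails, fall back to the browser or the Kaggle CLI:
#   kaggle competitions download -c bluebook-for-bulldozers

# The corrected variable name (xs_filt, not xs_filt2):
#   m_rmse(m, xs_filt, y_filt), m_rmse(m2, valid_xs_time2, valid_y)

# Workaround for the Normalize error: drop it from the neural-net procs.
procs_nn = [Categorify, FillMissing]
```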

I haven’t tried running the notebooks yet, but hopefully the above will help with debugging any issues you come across.

Hey @Danielvs. Nice summary here.
However, for the Normalize issue workaround, this solution seems to work better.

Thanks for sharing that :slight_smile:

:rotating_light: A reminder that we’ll have the call for lesson 7 this Tuesday (google calendar link to future calls). As usual, I’ll post the Zoom link here on Tuesday.

Apologies @Danielvs - I won’t make it to the meeting today.
Susan

Link for the call this evening:

Join Zoom Meeting

Meeting ID: 984 9293 3442
Passcode: 474931

No worries, see you next time

Sorry - looks like I’m not going to be able to make it this time. See you next time!

Hi all,

I suggest we take a break from the calls until the new year and start back up in the week of the 12th of January. I personally have some course homework to catch up on, so I’ll try to use some of my upcoming holidays to do that!

I’ll make sure to post a reminder in Jan for the first session of the new year.

I think that’s a great idea. I actually just got caught up on watching all of the lectures on YouTube over the weekend, but it would be nice to go back and try some of the code assignments.

Just to clarify (and maybe for the benefit of anyone new who stumbles upon this thread), which lesson will we be covering on January 12?

I hope everyone here has a great holiday break!

I suggest we pick up again from lesson 7 :slight_smile:

Happy new year everyone! Hope everyone managed to have a good break. Just a reminder that we’ll meet again on January 12th to discuss lesson 7. As usual, I’ll post the Zoom link on the day.

There is currently a call from Europeana for the assembly of Artificial Intelligence/Machine Learning (AI/ML) datasets drawn from the extensive collections on the Europeana website.

I hope some of you will submit an application :slight_smile:

Zoom details for later

https://turing-uk.zoom.us/j/94966641522?pwd=TXR1VW4zWXRhNlhuemdmOGprR29GUT09

Meeting ID: 949 6664 1522
Passcode: 428917

Hi,

I’m afraid I can’t make the call today due to a conflicting call. I’ve just watched lesson 7, which I found really useful and more approachable than some of the other sessions. One thing I’d like to hear about, if you’re able to discuss it on the call, is examples of Cultural Heritage datasets that might work well with these tabular prediction methods.

Thanks, hope the call goes well.

Glen

I have worked with one very nice tabular dataset at work, which should be made public at some point. I will ask for an update on how that is going.

I think one potential use case in Cultural Heritage is predicting additional metadata based on already available metadata. Often the tabular part of the data might also be useful as an additional signal for another type of model, e.g. for a text classification model it might be useful for the model to also know the date of publication, publisher, etc. A nice example of this (which is much simpler than concatenating different types of models) is outlined in https://www.novetta.com/2019/03/introducing_me_ulmfit/.
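
To make the idea a bit more concrete, here’s a minimal sketch of that trick: folding a couple of metadata columns into the text field so a standard fastai text classifier can use them as an extra signal. The dataframe, column names, and metadata tokens are all made up for illustration, so don’t read this as the exact approach from the linked post:

```python
import pandas as pd
from fastai.text.all import *

# Hypothetical dataframe: a text column plus some catalogue metadata and a label.
df = pd.read_csv('catalogue_sample.csv')  # assumed columns: text, publisher, year, label

# Fold the tabular metadata into the text as extra "words" so a plain
# text classifier can pick it up as an additional signal.
df['text_plus_meta'] = (
    'xxpublisher ' + df['publisher'].astype(str) + ' ' +
    'xxyear ' + df['year'].astype(str) + ' ' +
    df['text']
)

# Standard fastai text classification pipeline on the enriched text column.
dls = TextDataLoaders.from_df(df, text_col='text_plus_meta', label_col='label', valid_pct=0.2)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(2)
```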

There are a few datasets derived from BL’s 19th Century books collection that contain tabular data + some genre predictions: https://bl.iro.bl.uk/work/ns/ff82a4ff-12a3-4abe-8108-2c9b1172ccc4. The labels are definitely noisy, but it’s an interesting experiment to see whether some of the tabular fields give enough information to predict the genre of a book.
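
If anyone wants to try that experiment, a starting point might look roughly like the sketch below. The file name and column names are placeholders (I haven’t checked the actual schema), so it’s only meant to show the shape of a fastai tabular baseline rather than working code for this dataset:

```python
import pandas as pd
from fastai.tabular.all import *

# Placeholder load of the tabular portion of the books metadata;
# the file name and columns are assumptions, not the real schema.
df = pd.read_csv('bl_books_metadata.csv')

dls = TabularDataLoaders.from_df(
    df,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=['publisher', 'place_of_publication'],  # assumed categorical fields
    cont_names=['date_of_publication', 'page_count'], # assumed continuous fields
    y_names='genre',                                  # the (noisy) genre label
    y_block=CategoryBlock(),                          # treat it as classification
)

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)
```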

The web archive classification dataset is also semi-tabular. It’s probably tricky to get a well-performing model on that, but it might be fun to try!

I’ll keep an eye out for more tabular data. It might be nice to pick one or two for us to work on and compare results :slight_smile: