Hi all,
it is good to be here and many thanks for the initiative!
- Who you are?
I am Péter Király, a researcher and software developer at the Göttingen campus computer facility. Previously I worked at several LAM institutions (including Europeana) as a software developer. I am also an editor of the Code4Lib Journal, so if you plan to publish your results, please consider it as a venue as well.
- Why you are interested in machine learning/deep learning?
My research question is how we can decide whether a given metadata record is good or bad. For this I am working with data science techniques, including some (unsupervised) machine learning algorithms. For me, working with Big Data makes the problems even harder to solve, since many data science/ML techniques require special hardware resources to which I do not have access.
- Do you already have some potential problems you are (or would like to) use machine learning for?
Just to name a few: pattern recognition, such as distinguishing between metadata values created for machines and those created for people; finding similar records; and checking whether all the important entities in a record are under authority control.
- Datasets you are keen to work with? (either labelled or unlabelled)
I have worked with the Europeana dataset (I made it downloadable at rnd-2.eanadev.org/europeana-qa/download.php?version=v2020-06), and with several libraries’ full MARC catalogues (github.com/pkiraly/metadata-qa-marc#datasources). These are unlabelled datasets in special metadata formats.