Just wondering if anyone has any interest in joining up for the reproducibility challenge this year?
The primary goal of this event is to encourage the publishing and sharing of scientific results that are reliable and reproducible. In support of this, the objective of this challenge is to investigate the reproducibility of papers accepted for publication at top conferences, by inviting members of the community at large to select a paper and verify the empirical results and claims in the paper by reproducing the computational experiments, either via a new implementation or using code/data or other information provided by the authors.
There's plenty of time until submissions (Dec 4th is the early submission deadline, Jan 8th is the late submission deadline). Could be fun to tackle a paper that has shown promise and would be a useful addition to fastai.
Comment below if you think you'll have a bit of time to spare and what paper you think could be worth reproducing.
I would love to do that… but I don't have a dataset or paper to replicate.
But I'm interested in ASR (automatic speech recognition… is STT even a term?) and TTS…
If you know of something that would be nice to reproduce, let me know (if possible, though not restricted to, languages with little corpora in audio or text… for example, ones with no Wikipedia source)… so that I can choose.
I was thinking about the Reformer, which was presented at ICLR 2020 and could be an interesting and challenging paper to replicate. But I'm also open to other ideas.
@DanielLam Probably keeping much of the boilerplate in fastai, I think. For example, if it's a new model architecture being proposed, then that can probably be dropped into fastai pretty easily without too much concern that we don't use the exact same PyTorch DataLoader, Dataset, etc. (as long as the preprocessing/augmentation etc. are the same, of course).
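A minimal sketch of that kind of drop-in (purely illustrative; `PaperModel` and the toy tensors are placeholders, not from any specific paper):

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset
from fastai.data.core import DataLoaders
from fastai.learner import Learner
from fastai.metrics import accuracy

# Toy tensors standing in for whatever data the paper actually uses.
x_trn, y_trn = torch.randn(512, 20), torch.randint(0, 2, (512,))
x_val, y_val = torch.randn(128, 20), torch.randint(0, 2, (128,))
dls = DataLoaders.from_dsets(TensorDataset(x_trn, y_trn),
                             TensorDataset(x_val, y_val), bs=64)

# Placeholder for the paper's proposed architecture: any nn.Module can go here.
class PaperModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    def forward(self, x): return self.net(x)

# fastai provides the training loop, callbacks and metrics around the new model.
learn = Learner(dls, PaperModel(), loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit(1, lr=1e-3)
```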
@tyoc213 ASR/TTS are super interesting alright. I've been focused on NLP recently but would enjoy dipping into another area too.
@stefan-ai Yes, Reformer is very cool; it would be a lot of fun to implement alright. Just wondering, when it comes to experiment replication, would the GPU compute needed be too much? But happy to give it a shot if you think it's manageable! HuggingFace also has a nice blog post explaining it: The Reformer - Pushing the limits of language modeling
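To get a rough feel for the compute side, a small sketch like the one below (using the Hugging Face transformers Reformer classes with their default config values, not the paper's exact enwik8 setup) could help gauge model size and memory before committing to a full run:

```python
import torch
from transformers import ReformerConfig, ReformerModelWithLMHead

# Library defaults: alternating local/LSH attention layers and a 4096-token
# context with axial position embeddings. These are not the paper's full
# enwik8 hyperparameters, just a quick sizing check.
config = ReformerConfig(is_decoder=True)
model = ReformerModelWithLMHead(config).eval()

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")

# Dummy forward pass at the full 4096-token context to sanity-check memory.
input_ids = torch.randint(0, config.vocab_size, (1, config.max_position_embeddings))
with torch.no_grad():
    out = model(input_ids)
print(out.logits.shape)  # (1, 4096, vocab_size)
```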
(@Richard-Wang you should definitely enter your ELECTRA work in the reproducibility challenge too, you've done phenomenal work on it so far!)
I'll have a look around at a few NLP/ASR/TTS papers and try to find some interesting, useful & low-resource ones to replicate. I'll share here when I do. The Good Readings threads probably have a few contenders:
Right, that's a good point. I'll have a closer look at the paper over the next few days keeping this in mind. Btw, course 4 of the new deeplearning.ai NLP specialization includes an implementation of Reformer in Trax. It will certainly be very challenging from an implementation point of view too.
Do you have any other ideas for recent NLP papers we could consider?
I'm down for it. I'm currently working on COVID detection classification using deep learning implemented in fastai, trying to improve on the results of a previous paper published on 26 April, which had a baseline F1 score of 97.31%. So I'm either going to get close to that… or try to beat it.
This has been in the back of my mind since ICLR: they improved transformer performance by using a new type of positional embedding. There were a couple of varieties if I recall; the second also modified the entire Transformer to be able to deal with complex numbers.
But the results seemed pretty impressive given that it was just a change to the positional embedding. This, or another positional-embedding paper, could be worth looking into.
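For reference, this is roughly the standard sinusoidal positional encoding from "Attention Is All You Need", i.e. the component such papers swap out (a generic sketch only, not the complex-valued variant from the paper):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard fixed sin/cos positional encoding, shape (seq_len, d_model)."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Added to the token embeddings before the first transformer layer.
pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # torch.Size([128, 512])
```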
This was another interesting one, with a funky layer structure. Again, training duration might be an issue, but at least one of their results was trained on a single 1080 GPU! Then again, others were trained on 16 V100s, so… worth a look at least!
This sounds interesting! Most of the recent transformer papers are prohibitively compute-intensive, I guess. Lately I've been looking into transformers for CV, such as this one for example. But not only do they use a ton of compute, they also use a private dataset…
From the little testing we did, it didn't perform that well; from the paper it seems to only shine when given huge volumes of data. However, the architecture is super simple, so I feel there is a lot that could be improved.
enwik8: 100 MB (but I'm not entirely sure about this one)
I guess compute requirements and dataset sizes are not too large by the standards of the transformer world, but they might still be prohibitively large for us here. Not sure how far 300 USD of credits on GCP would get us.
I recently came across this program, which gives free TPUs to certain research projects (no idea if this would qualify though): TPU Research Cloud
That paper sounds very interesting too. It would be especially nice to replicate since we would need to work with several different models, i.e. fastText, LSTM, CNN and Transformer. Any idea what the resource requirements would be here?
Can I join? I'm trying out some self-supervised learning papers… like SimCLR, and other image representation models like CGD. I'm also making a fastai implementation of it as an open-source repo. But my models are not working yet though. I'll try my best to make them work.
@stefan-ai I'm actually doing a little work with Reformer at the moment (well, getting a notebook to work with it), so maybe it's worth giving it a shot. I have a 2080 Ti, and there are Kaggle GPUs and TPUs and some GCP credits… With some careful checkpointing and a little patience we might be able to get it done.
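On the checkpointing side, a sketch along these lines should do it, assuming a fastai Learner called `learn` already exists (e.g. built as in the earlier sketch; the filename and epoch counts are just placeholders):

```python
from fastai.callback.tracker import SaveModelCallback

# Save weights after every epoch so a preempted or timed-out session can resume.
# `reformer_ckpt` is a placeholder filename, not from any repo.
learn.fit(10, lr=1e-3, cbs=SaveModelCallback(fname='reformer_ckpt', every_epoch=True))

# In a fresh session, reload the last saved epoch and continue training.
learn = learn.load('reformer_ckpt_9')
learn.fit(5, lr=1e-4)
```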
Re the cost of the complex embeddings paper: the experiments with the smaller/older models should be fine, but they also have some experiments with Transformer-XL, which might be prohibitive to try…
I'm happy to give Reformer a shot; if nothing else, we'll learn a huge amount from all of the new ideas they introduced!
@Dean-DAGs the more the merrier! Do you have a particular area of research or paper you are interested in?
@morgan I have not seen the repo, but I finished implementing some unsupervised and semi-supervised models like SimCLR and CGD for image representation learning and unsupervised learning. This is my repo: https://github.com/Samjoel3101/Self-Supervised-Learning-fastai2. Check it out and let me know your views on it.
Umm, actually I don't really mind; I'd just be happy to join people working on something and pitch in as I can. To be completely transparent, I'm working on a platform for data science collaboration (https://dagshub.com), so this would be a great learning experience for me.
Then maybe we can organise a quick call on the Discord server to draw up a plan and divide responsibilities? Maybe on Thursday if that suits? That'll also give us a bit of time to review the paper again in a bit more depth.
Resources
For anyone interested in joining the Reformer reproducibility effort, be sure to get familiar with it: