I think I’ll tag along if that’s fine!
@stefan-ai and I are talking today at 7pm GMT in the “General” voice channel on Discord about where to get started on Reformer, if anyone would like to join.
Hey, guys! I joined during today’s call. Looking forward to it!
I also joined the call today; this will be a great learning experience for me.
Thanks for having me,
looking forward to it!!!
Thanks so much for the discussion yesterday on getting started!
I have documented what I think we agreed on and started an initial list of experiments here: https://docs.google.com/document/d/1wF83E3B3yXIGZixEgOUJI2T2XXhT1DVCrPXS5Dbsyh8/edit?usp=sharing
Feel free to comment and add to this! If you know of a better place than Google Docs to document our work and share knowledge, feel free to suggest it.
I also created a repo for all our code, which is where we’ll host our final challenge submission. DM me here so I can add you as a contributor: https://github.com/morganmcg1/reformer-fastai
For our next chat, drop your preference in the poll below:
- Wednesday Nov 4th
- Thursday Nov 5th
- Friday Nov 6th
- Saturday Nov 7th
Lmk if you have room for one more. This sounds like fun.
If not, no worries.
While searching for related work I stumbled on this report from a Stanford CS224n project. The authors failed to achieve good results with Reformer on SQuAD 2.0 and complain about slow training:
As we proceeded to adapt the Reformer (a sequence-to-sequence model) for the question-answering task, we found some persistent problems that the paper failed to address. First, the sheer amount of time that the Reformer takes to train is quite long: with default parameters on a standard NV6 GPU, each epoch would take 11 hours! Although the Reformer does save memory, it trains slower than a standard Transformer, mostly because of the time taken hashing, which greatly affected how much training we could do for this project. We tried two models: one that used only one embedding, one Reformer layer, and a masked softmax, and another one where we started with a full BiDAF model that replaced its RNN modeling layer with a Reformer layer. The first model failed more or less immediately. The second model’s result was promising, but training was slow, with smaller gradients per-epoch compared to BiDAF.
I wonder whether the problem was in the Reformer itself or in their approach, though.
I missed the first conversation but would be happy to join the next and pitch in. What time will the call be? The poll only has dates.
Sorry for the radio silence, I was prepping for an interview.
@wgpubs yes, please join; it would be fantastic to have you on board!
@Dean-DAGs Thursday looks like the most popular day, and last week’s call was at 7pm GMT, so maybe we stick with that time for now…
@arampacha That’s super interesting. Verifying their experience could be an extension of our work; we could look at performance and maybe also timing…
Next Meeting - Thursday, 7pm GMT
Based on the poll results, below are the details for the next meeting:
I still have to deliver some clean baseline Transformer code that we can start tweaking piece by piece to explore the ablation experiments; I’ll be working on it shortly.
Feel free to send me your github username (here or as a DM) and I can add you to the project
Meeting ID: 871 0366 0541
Apologies, but I won’t be able to make it today: I’m on a little trip and the internet is really poor.
If someone could share their own Zoom link, that would be really appreciated!
Re the baseline Transformer: @lucidrains has a great, simple implementation of the ViT transformer that could serve as a good baseline; you just have to ignore the image embeddings at the start and create your own.
In my notebook below, which uses PyTorch’s nn.Transformer, I create the positional embedding, so some of that code could be copied.
I’ll be back with better internet tomorrow and will be able to spend time on this then. Hopefully you can make some progress today. If anyone can get a baseline Transformer up and running tomorrow with one of the datasets (e.g. enwik8-64K), that would be great!
Apologies again for the late notice!
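For the baseline, the positional embedding mentioned above could be as simple as the fixed sinusoidal encoding from the original Transformer paper. The sketch below is one common choice, not necessarily what the notebook does (it may use learned or axial embeddings instead); `sinusoidal_positions` is a hypothetical name, and plain Python lists stand in for tensors.

```python
import math
from typing import List

def sinusoidal_positions(seq_len: int, d_model: int) -> List[List[float]]:
    """Fixed sinusoidal encoding (assumed scheme, not taken from the notebook):
    PE[pos][2k]   = sin(pos / 10000^(2k / d_model))
    PE[pos][2k+1] = cos(pos / 10000^(2k / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        # i runs over even dimensions, so i == 2k in the formula above
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:  # guard for odd d_model
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Each row would be added to the token embedding at that position before the first layer (e.g. before an `nn.TransformerEncoder`). The encoding has no trainable parameters, which keeps the baseline simple for ablations.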
Hi everyone! If nobody is up to setting up a Zoom call, shall we use the General voice channel again? Another option might be to wait for Morgan to organize us when he has time.
@morgan, I also have a question: I found this post of yours on the performance of Hugging Face datasets. I tried training lucidrains’ Reformer and observed a drop in speed when using larger datasets. Did you resolve that issue, and if so, can you please point me to some useful info on it?
No problem and thanks for sharing that code.
I haven’t made much progress myself since the last call. I’ve been trying to fine-tune a pre-trained Reformer from the Hugging Face model hub on a downstream task, but haven’t succeeded so far.
We can also move the meeting to some other day or next week. What do other folks think? If there’s interest in a meeting today, I can share a zoom link.
Hi @stefan-ai! I’m eager to join a meeting today, but I’m also fine with moving it to another date. Or doing both.
And I also have more questions than results at the moment.
I’m starting a call at https://us04web.zoom.us/j/72420542182?pwd=UnZmODdZRVFVZkpiRHd4VkRYVmwxdz09
Please feel free to join!
Related to what we discussed today:
Link to lucidrains’ Reformer repo: https://github.com/lucidrains/reformer-pytorch
Trying it on a causal LM with a 64K-token seq_len: https://colab.research.google.com/gist/arampacha/9cc2fd7b5818c91ce64013b83bcfa567/reformer_wikitext_clm.ipynb
Hi, I was also unable to attend yesterday. Any new ideas from the meeting?
I have started on an LSH exploration. This is perhaps a bit tangential to the main goal of the project, but I have an interest in clustering. I’ll add the notebook to the repo in an /exploration subfolder; have a look if you’re interested!
Also, my thoughts for the next steps are:
- test if the authors’ repo code works out of the box
- consider lucidrains’ Reformer implementation
- getting the datasets (from hf/datasets I guess?)
- setting up a wandb project (wandb is easy to use with the fastai callback)
- einops transformer implementation (have to look into einops a bit…)
- implement RevNet, LSH attention etc. separately
- run ablations and experiments
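For the “implement RevNet” item, here is a minimal sketch of the reversible residual block the Reformer borrows from RevNets: y1 = x1 + f(x2), y2 = x2 + g(y1), where f plays the role of attention and g of the feed-forward layer. The point is that the inputs can be recomputed exactly from the outputs, so activations need not be stored for the backward pass. Plain Python lists stand in for tensors, and `rev_forward`/`rev_inverse` are hypothetical names, not from any library.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Layer = Callable[[Vec], Vec]

def rev_forward(x1: Vec, x2: Vec, f: Layer, g: Layer) -> Tuple[Vec, Vec]:
    """Reversible residual forward pass: y1 = x1 + f(x2); y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def rev_inverse(y1: Vec, y2: Vec, f: Layer, g: Layer) -> Tuple[Vec, Vec]:
    """Recover the inputs from the outputs: x2 = y2 - g(y1); x1 = y1 - f(x2)."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Note that f and g can be arbitrary (non-invertible) functions; only the additive coupling has to be inverted. In a real implementation the inverse would be called layer by layer during backprop to rebuild the activations.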
Also, should we continue discussion in the forum or set up slack/discord etc.?
D’oh! I managed to delete the notebook before pushing it to git - so the file is lost… Anyway, I was surprised how easy it was to get a basic version of LSH working, basically just following the steps described in the paper with a few lines of code:
To get b hashes, we first fix a random matrix R of size [d_k, b/2]. We then define h(x) = arg max([xR; −xR]) where [u; v] denotes the concatenation of two vectors.
In the trax library this is implemented in hash_vecs(). It has a few tweaks compared to the original LSH algorithm, but works out of the box.
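Since the notebook is gone, here is a minimal pure-Python re-sketch of the hashing step as quoted from the paper (not the trax `hash_vecs()` code, which adds its own tweaks). Assumptions: R has i.i.d. standard Gaussian entries, b is even, there is a single hash round, and the helper names `random_rotation_matrix` and `lsh_hash` are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]

def random_rotation_matrix(d_k: int, n_buckets: int, seed: int = 0) -> Matrix:
    """Fixed random matrix R of shape [d_k, n_buckets // 2] (assumed Gaussian entries)."""
    if n_buckets % 2 != 0:
        raise ValueError("n_buckets must be even")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(d_k)]

def lsh_hash(x: List[float], R: Matrix) -> int:
    """h(x) = argmax([xR; -xR]); returns a bucket id in [0, n_buckets)."""
    # xR: project x (length d_k) onto the n_buckets/2 random directions
    xR = [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(len(R[0]))]
    # concatenate [xR; -xR] and take the index of the largest entry
    scores = xR + [-v for v in xR]
    return max(range(len(scores)), key=scores.__getitem__)
```

Two properties follow directly from the definition: scaling x by a positive constant never changes its bucket, and -x lands exactly b/2 buckets away from x. Vectors pointing in similar directions tend to share a bucket, which is what lets attention be restricted to within-bucket neighbours.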
@arampacha made a lot of progress, getting a Reformer language model to train successfully on a subset of Wikitext 103. See his post and notebook above.
A couple of other points we discussed:
Training speed could become an issue when training Reformer (could you please share the training stats that you mentioned yesterday, @arampacha?)
Relatively soon - maybe in the next meeting - we should create separate tasks so that we don’t end up all working on the same issues.
One of the first tasks should be to re-create and share the enwik8 dataset, so that everyone is working with the same data and we save pre-processing time.
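A minimal sketch of how the shared enwik8 data could be prepared, assuming the raw `enwik8` file (the first 100M bytes of an English Wikipedia XML dump), the commonly used 90M/5M/5M byte split for train/valid/test, byte-level tokens, and 65,536-byte sequences to match “enwik8-64K”. These are assumptions for us to agree on, and the function names are hypothetical.

```python
from typing import Dict, List

def split_enwik8(data: bytes, n_train: int = 90_000_000,
                 n_valid: int = 5_000_000) -> Dict[str, bytes]:
    """Split the raw enwik8 bytes into train/valid/test slices (assumed 90M/5M/5M)."""
    return {
        "train": data[:n_train],
        "valid": data[n_train:n_train + n_valid],
        "test": data[n_train + n_valid:],
    }

def chunk_bytes(data: bytes, seq_len: int = 65_536) -> List[bytes]:
    """Cut one split into non-overlapping seq_len-byte sequences, dropping the tail.
    Byte-level tokens: indexing a bytes object already yields ids in [0, 255]."""
    n_full = len(data) // seq_len
    return [data[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]
```

Usage would be to read the unzipped file in binary mode (e.g. `open(path, "rb").read()`), pass it to `split_enwik8`, then call `chunk_bytes(splits["train"])`. Keeping sequences as `bytes` slices avoids materialising ~90M Python ints; conversion to tensors can happen per batch.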
@arampacha reported an issue when trying to load Hugging Face’s google/reformer-enwik8, and so decided to train from scratch.
The other pre-trained model on the Hugging Face model hub, google/reformer-crime-and-punishment, uses a different tokenization approach than the enwik8 model. Due to conflicting sequence lengths, I didn’t manage to successfully fine-tune the model on downstream tasks.
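We haven’t pinned down the cause, but one likely culprit, as far as I understand the Hugging Face Reformer implementation, is that in training mode the input length must equal the product of `config.axial_pos_shape` (when axial position embeddings are used) and be a multiple of the attention chunk lengths. The sketch below computes such a length and pads inputs to it. The helper names are hypothetical, `pad_id=0` is a placeholder (use the tokenizer’s real pad id), and the values in any usage are made up rather than read from either checkpoint’s config.

```python
from functools import reduce
from math import gcd, prod
from typing import List, Sequence, Tuple

def lcm_all(values: Sequence[int]) -> int:
    """Least common multiple of a sequence of positive ints."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def train_seq_len(axial_pos_shape: Sequence[int], chunk_lengths: Sequence[int]) -> int:
    """Length satisfying both assumed training constraints:
    equal to prod(axial_pos_shape) and a multiple of every attention chunk length."""
    target = prod(axial_pos_shape)
    if target % lcm_all(chunk_lengths) != 0:
        raise ValueError(f"prod(axial_pos_shape)={target} is not a multiple of the chunk lengths")
    return target

def pad_or_truncate(ids: List[int], length: int,
                    pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Right-pad (or truncate) ids to exactly `length`; also return an attention
    mask with 1 for real tokens and 0 for padding (pad_id=0 is a placeholder)."""
    kept = ids[:length]
    n_pad = length - len(kept)
    return kept + [pad_id] * n_pad, [1] * len(kept) + [0] * n_pad
```

If this is the issue, a downstream dataset would need every example padded or truncated to that single fixed length, which can be very long for a model pre-trained on long sequences.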
Since the Reformer paper is very brief and leaves out some important details, we might have to reach out to the authors for clarification. However, let’s first collect our issues before doing so.
@Dean-DAGs potentially has a contact at huggingface and kindly offered to reach out if needed.
Additionally, we could try to replicate these results from training Reformer on SQuAD 2.0.
If I missed anything, please add it, guys!
@hallvagi: Thanks for sharing your ideas. We agreed on some of these points in our meetings already. This list is a great starting point for formulating specific tasks that team members or smaller groups can start working on. Nice to hear that you had a good experience implementing basic LSH.
I think we’re on a good track. Let’s keep the momentum going and meet again soon to define concrete next steps. Have a nice weekend, everybody!
PS: I agree that a separate slack/discord channel would be helpful. Could someone set it up?
Having a chat for effective communication is a good idea. I’ve set up a Discord server; here is an invite: https://discord.gg/mG5GVq3n. I have no experience with this stuff, though, so if anyone is willing to take over, that’s cool. But I think a simple server will do for a start.