I would like to ask for a volunteer to share course-related files and datasets, such as those hosted at files.fast.ai, especially ones larger than a couple of hundred megabytes, via BitTorrent. You would be providing a great deal of help to students who have slow / unstable / censored internet.
This course is taken by many students from all over the world. In some countries and places, the internet connection can be very slow, unstable, or even censored. I have personally tried to download datasets from Kaggle multiple times, only for the connection to be interrupted and force me to start over. Some dataset links are blocked by the government (I live in China). I don’t think China is an exception; there are many other places where internet quality is very poor.
I think this would help not only students with bad internet but also those with good internet: by downloading a single torrent you get all the data at once and don’t need to fetch each file by hand.
You should try out the Kaggle kernels for fast.ai. It will be possible to run the notebooks without having to download the data locally. That said, I understand that it is better to have the data locally.
This is the same answer I got when I brought this up for some Kaggle competitions. I understand that it is possible to run code on remote servers with access to the data, but that answer is not a solution.
I’m also not sure that fastai v1 will work on Kaggle kernels - it would at least require PyTorch v1 to be installed there, which would involve additional steps.
Hosting bittorrent files sounds like a great idea. Hopefully some students are able to assist there once the course starts. Perhaps the easiest way would be to upload here:
Agreed, torrents are a good solution even if we just want to speed up download times. Downloading a huge dataset or pretrained weights can take quite a lot of time.
I’m happy to help with it, thank you for proposing the idea.
Looks like uploading to academictorrents.com is restricted to users with academic email addresses.
But anyone with a BitTorrent client can share files. Here are instructions for uTorrent.
I’m happy to seed from my torrent box 24/7 for the duration of the course, but I’m not confident in creating or hosting the torrent files. TL;DR: +1 seed
@maxim.pechyonkin Could you start seeding it?
I’m not sure how to create the torrent.
I’ve never created a torrent, but I’m in for seeding or any other support. Torrents are fast and easy.
This is actually what they are there for.
I can start seeding while I’m back in Moldova with fast internet, before going back to China where I won’t be able to create a torrent due to bad internet. The only problem is I’m not sure which pretrained models and datasets will be used in the course.
@jeremy could you let us know which data and models we are planning to use? Is it a good idea to seed everything from files.fast.ai?
Probably easiest if I just upload them to AcademicTorrents before class.
How is AWS in China? Most of the datasets will be available on S3.
If it is possible to upload them before October 22, then I can either start seeding or create the torrents myself. Internet in Moldova is fast, at 100 Mbps. After that date I go back to China, where the internet is unpredictable.
I can seed or upload torrents from 22nd Oct. If that helps, let me know. I have roughly 5 to 10 MBps of speed.
Just to be sure: will files.fast.ai keep updating throughout the course, or is it fixed as it is now?
If it won’t change throughout the course, maybe I will try to create a torrent. I’ve never created one, but I’ll try.
Here are the datasets for the course (there may also be some from Kaggle): http://course.fast.ai/datasets
Thank you @jeremy, @maxim.pechyonkin.
I’ll try to help seed the torrents 24x7.