The Wikiart dataset

I want to go for a rather challenging dataset and have stumbled upon the wikiart.org dataset. It is the biggest dataset of paintings and has been in use in some papers. However, I cannot for the life of me find a way to download it fast.

What I have tried:

  • Download the version of the Wikiart dataset used by Chan et al. in their ArtGAN paper (ICIP 2017), linked from their GitHub repository. Problem: extremely slow download speeds (300 KB/s).
  • Use the Wikiart Retriever by Lucas David. This also seems to take forever (it ran for several hours), and I am not sure it had downloaded any images by that point.

I cannot find a way to download the dataset from wikiart.org directly. Any help would be much appreciated.

For me, right now, the download speed of http://www.cs-chan.com/source/ICIP2017/wikiart.zip is around 900 kB/s, and the download will take about 7 h.
It may depend on the time of the day and the load of the server.

So that is the best / only way to download the dataset, then?

Try

wget -c <your download url>

It will always resume the download from the point where it left off.

M
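
For anyone curious what resuming means at the HTTP level, here is a minimal Python sketch of the same idea: ask the server only for the bytes that are not on disk yet, using a Range header. The function names and the chunk size are illustrative assumptions, not part of wget, and the sketch assumes the server supports range requests.

```python
import os
import urllib.request


def build_resume_request(url, dest_path):
    """Build a request that asks only for the bytes we do not have yet.

    Hypothetical helper for illustration; returns (request, bytes_on_disk).
    """
    have = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    req = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" means: send everything from offset N to the end.
        req.add_header("Range", f"bytes={have}-")
    return req, have


def resume_download(url, dest_path, chunk_size=1 << 20):
    """Download url to dest_path, continuing a partial file if one exists."""
    req, _ = build_resume_request(url, dest_path)
    # Note: if the file is already complete, many servers answer 416,
    # which urllib raises as an HTTPError.
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the Range header, so append.
        # Any other success status means a full body, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling resume_download(url, "wikiart.zip") again after an interruption should pick up where the partial file ends, provided the server answers with status 206.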

I don’t know the best way. I usually start with cURL, and if the download is slow (on the order of KiB/s), I switch to aria2, a very fast download manager. I download the file directly to my AWS VM, which benefits from AWS's fast network link, and the download speed is ~20 MiB/s. The ETA for the file is 19 minutes.

aria2c --file-allocation=none -c -x 5 -s 5 http://www.cs-chan.com/source/ICIP2017/wikiart.zip

  • Multi-connection download
  • Multi-threaded
  • Lightweight

I am not affiliated with aria2. Try it out and good luck!
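
In the command above, -s 5 asks aria2 to split the download into 5 pieces and -x 5 allows up to 5 connections to the server. The Python sketch below only illustrates the general idea of dividing a file into byte ranges that can be fetched in parallel. It is not how aria2 is implemented, and the function name is an assumption made for this example.

```python
def split_ranges(total_size, parts):
    """Split a file of total_size bytes into contiguous, inclusive byte ranges.

    Hypothetical helper: returns a list of (start, end) tuples that could each
    be requested with an HTTP header of the form "Range: bytes=start-end".
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_headers(total_size, parts):
    """Format the ranges as HTTP Range header values."""
    return [f"bytes={s}-{e}" for s, e in split_ranges(total_size, parts)]
```

Each range can then be downloaded over its own connection and written at the matching offset of the output file, which helps when the server limits throughput per connection rather than per client.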

Thanks! I will give it a try. But I think the hosting server just doesn’t go faster.

Following up on the Wikiart dataset, I finally downloaded it and ran the standard fastai lesson 1 process on it. And guess what, I got 60% accuracy predicting the exact right style. I find that very impressive. My attention was drawn to this dataset after reading the paper The Shape of Art History in the Eyes of the Machine and as far as I can see, their best accuracy was 60% as well. (Of course the paper goes much further than just doing classification, but I still find this very impressive).

One lesson I took away: Sometimes you might need to filter out corrupt images, as my learner seemed to crash on .fit_one_cycle when it encountered strange, truncated images.
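
A rough way to find truncated files before training is sketched below. It is only a heuristic using the Python standard library, assuming the images are JPEGs with a .jpg extension: it checks that each file starts and ends with the JPEG start and end markers. It is not the check used in the post above. A full decode with an imaging library is more thorough, and a valid file with trailing bytes after the end marker would be flagged here by mistake.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # SOI marker, the first two bytes of a JPEG
JPEG_END = b"\xff\xd9"    # EOI marker, normally the last two bytes


def looks_truncated(path):
    """Heuristic: True if the file lacks the JPEG start or end marker."""
    path = Path(path)
    if path.stat().st_size < 4:
        return True
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(-2, 2)  # jump to two bytes before the end of the file
        tail = f.read(2)
    return head != JPEG_START or tail != JPEG_END


def find_suspect_jpegs(root):
    """Return a sorted list of .jpg files under root that look truncated."""
    return sorted(p for p in Path(root).rglob("*.jpg") if looks_truncated(p))
```

The returned paths can be inspected by hand, or moved out of the training folder before the learner is created.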

Hello, may I ask where you finally downloaded the dataset from? I have been searching for a link on wikiart.org, but I didn’t find anything. I was thinking about downloading the Chan et al. version of the dataset. Thanks!

Hello, can I ask what techniques you use to train your model on such a large dataset? I want to use this dataset, but I am really struggling to upload it to Colab.

Update: After I contacted Professor Chan, he was kind enough to update the links here: https://github.com/cs-chan/ArtGAN/tree/master/WikiArt%20Dataset

After I download the dataset from this URL, I always get an error when I unzip the zip file. I’m a little confused about how to solve this problem.

I remember having the same problem, but then it somehow went away. I know that isn’t very helpful, but I can share the code I found in my notebook I was working in at the time:

base_dir = '.'
zipfile = base_dir + "/wikiart.zip"
#!wget http://web.fsktm.um.edu.my/~cschan/source/ICIP2017/wikiart.zip -O "{zipfile}" -c

#!unzip -n "{zipfile}" -d "{base_dir}"

csvzipfile = base_dir + "/wikiart_csv.zip"
#!wget http://web.fsktm.um.edu.my/~cschan/source/ICIP2017/wikiart_csv.zip -O "{csvzipfile}" -c
#!unzip "{csvzipfile}" -d "{base_dir}"

The commented parts are the lines I only run once, so you need to uncomment them. You might also need to update the URLs (I’m not sure if they are correct).
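
One thing worth ruling out when unzip fails is an incomplete or corrupted download. The sketch below uses Python's standard zipfile module for that check. The function name and the returned messages are made up for this example, and it is only a guess that a truncated download is the cause of the error reported above.

```python
import zipfile


def check_zip(path):
    """Return a short status string describing whether the archive looks intact."""
    # A truncated download usually lacks the end-of-archive record,
    # in which case is_zipfile() returns False.
    if not zipfile.is_zipfile(path):
        return "not a complete zip (download may be truncated)"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the name of the first
        # one with a bad checksum, or None if all members are fine.
        bad = zf.testzip()
    if bad is not None:
        return f"corrupt member: {bad}"
    return "ok"
```

Note that testzip() reads the whole archive, so on a file of this size it can take a while. If the check reports a truncated file, resuming the download with wget -c and checking again is a reasonable next step.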

Interesting dataset, @davidpfahler. Thanks for sharing. One thing I wonder about: the dataset is big (25.4 GB), so it does not fit into memory. Did you run the lesson 1 process on a machine with more than 32 GB of RAM?

I ran this on an AWS p2.xlarge machine, but I don’t remember how I handled the data loading process.

Hello friends,
I would like to know whether there is any way to download only the public domain artworks of the Wikiart dataset, or to filter them out of the full download.

Hey @davidpfahler ,
The download link for the WikiArt dataset was working fine, but as of today it no longer works.

Do you by any chance have another link that would allow me to download the dataset?
Thanks in advance!