I want to tackle a rather challenging dataset and have stumbled upon the wikiart.org dataset. It is the biggest dataset of paintings I am aware of and has been used in several papers. However, I cannot for the life of me find a way to download it fast.
What I have tried:
- Downloading the version of the WikiArt dataset used by Chan et al. in their ArtGAN paper (ICIP 2016), linked from their GitHub repository. Problem: extremely slow download speeds (~300 KB/s).
- Using the Wikiart Retriever by Lucas David. This also seems to take forever (it ran for several hours), and I am not sure it had downloaded any images by that point.
I cannot find a way to download the dataset from wikiart.org directly. Any help would be much appreciated.
I don’t know the best way. For me, I usually start with cURL. If the download is slow (on the order of KiB/s), I switch to aria2, an ultra-fast download utility that can open multiple connections per file. I download the file directly to my AWS VM, so the transfer runs over AWS’s fast network; the download speed is ~20 MiB/s and the ETA for the file is 19 minutes.
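If it helps, this is the kind of aria2 invocation I mean (just a sketch: the URL is a placeholder, and the connection counts are a sensible starting point, not tuned values):

```python
# Sketch: call aria2 with multiple connections from Python; the URL is
# a placeholder. Equivalent shell command:
#   aria2c -x 16 -s 16 -o wikiart.zip <URL>
import subprocess

url = "https://example.com/wikiart.zip"  # placeholder -- use the real link

subprocess.run(
    [
        "aria2c",
        "-x", "16",           # up to 16 connections per server
        "-s", "16",           # download the file in 16 segments
        "-o", "wikiart.zip",  # output filename
        url,
    ],
    check=True,  # raise if aria2c exits with an error
)
```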
Following up on the WikiArt dataset: I finally downloaded it and ran the standard fastai lesson 1 process on it. And guess what: I got 60% accuracy predicting the exact right style, which I find very impressive. My attention was drawn to this dataset after reading the paper *The Shape of Art History in the Eyes of the Machine*, and as far as I can see, their best accuracy was 60% as well. (Of course the paper goes much further than just classification, but I still find this very impressive.)
One lesson I took away: you may need to filter out corrupt images first, as my learner crashed in `fit_one_cycle` when it encountered strange, truncated images.
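For anyone hitting the same crash, here is a minimal sketch of one way to drop unreadable files before training (the folder name is an assumption, and this uses plain PIL rather than whatever cleanup you end up preferring):

```python
# Sketch: delete images that PIL cannot fully decode, so training
# doesn't crash on truncated files. The data_dir path is an assumption.
from pathlib import Path
from PIL import Image

data_dir = Path("wikiart")  # assumed root of the extracted dataset

for img_path in data_dir.rglob("*.jpg"):  # adjust the glob for other formats
    try:
        with Image.open(img_path) as img:
            img.verify()  # raises if the file structure is corrupt
        # verify() can miss truncation, so force a full decode as well
        with Image.open(img_path) as img:
            img.load()  # raises OSError on truncated image data
    except Exception:
        print(f"removing corrupt image: {img_path}")
        img_path.unlink()
```

If I remember right, fastai also ships a `verify_images` helper that can do this kind of cleanup (including deletion) for you.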
Hello, may I ask where you finally downloaded the dataset from? I have been searching for a link or anything on wikiart.org, but I didn’t find anything. I was thinking about downloading the Chan et al. version of the dataset. Thanks!
Hello, can I ask what techniques you use to train your model on such a large dataset? I want to use this dataset, but I am really struggling to upload it to Colab.
I remember having the same problem, but then it somehow went away. I know that isn’t very helpful, but I can share the code I found in the notebook I was working in at the time:
The commented-out parts are the lines I only ran once, so you will need to uncomment them on the first run. You might also need to update the URLs (I’m not sure they are still correct).
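For illustration, a minimal sketch of the kind of cell being described, assuming fastai v1; the URL is a placeholder rather than the actual link, and the one-off download line is commented out as described above:

```python
# Illustrative sketch only (not the original code); fastai v1 assumed,
# and the URL below is a placeholder rather than the real link.
from fastai.vision import (
    ImageDataBunch, get_transforms, imagenet_stats, untar_data,
)

# One-off download + extraction -- uncomment on the first run.
# untar_data fetches "<url>.tgz" and extracts it:
# path = untar_data("https://example.com/wikiart")  # placeholder URL
path = "data/wikiart"  # wherever the extracted dataset ended up

data = ImageDataBunch.from_folder(
    path,
    train=".",                 # no train/valid split on disk
    valid_pct=0.2,             # hold out 20% for validation instead
    ds_tfms=get_transforms(),  # default fastai augmentations
    size=224,                  # train at 224x224, as in lesson 1
).normalize(imagenet_stats)
```

The key point for Colab is the same as above: download straight to the VM inside the notebook instead of uploading from your own machine.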
Interesting dataset, @davidpfahler. Thanks for sharing. I just wonder: since the dataset is big (25.4 GB), it does not fit into memory. So did you run the lesson 1 process on a machine that has more than 32 GB of RAM?