Building image datasets from Instagram: a how-to

tigerfinch · May 20, 2019, 2:31pm

Hey everyone!

Just started working through the course, and for lesson 1 I decided to use Instagram images for a custom dataset. I used a Python library called “instaloader” and thought I’d share how I did it, in case it’s helpful for anyone else.

First, you’ll need to install the library (it’s also a CLI tool). It’s not available via the default conda repos, so I just used pip:

pip install instaloader

For my purposes, I just wanted to grab all posts by a list of users, and put them into separate folders, one per user. This makes it easy to load a databunch using the “from_folder” method.

So, first some setup:

import instaloader
from instaloader import Profile
import os
import pathlib

Make a Path object pointing to the folder you want to download into. In my case:

data_path = pathlib.Path(os.getcwd())/"data"/"ex1"

Then the meat of the task:

profile_names = ["martythedog08", "marty_the_cockerdale", "monchipaw", "super_whoodle", "essi_dog", "kellyonleash"]
L = instaloader.Instaloader()

for p_name in profile_names:
    profile = Profile.from_username(L.context, p_name)

    for post in profile.get_posts():
        L.download_post(post, target=data_path/p_name)
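
If you don't want every post from every account, here is a rough sketch of the same loop wrapped in a function that caps the number of posts per profile with itertools.islice. The names download_profiles, max_posts and profile_factory are made up for this example and are not part of instaloader; profile_factory is only there so the function can be tried out without hitting Instagram.

```python
from itertools import islice
from pathlib import Path


def download_profiles(loader, profile_names, data_path, max_posts=None,
                      profile_factory=None):
    """Download up to max_posts posts per profile, one subfolder per user.

    max_posts=None means no cap. profile_factory is a hypothetical hook
    (not an instaloader feature); it defaults to Profile.from_username.
    """
    if profile_factory is None:
        # Imported lazily so the function can be defined without instaloader.
        from instaloader import Profile
        profile_factory = Profile.from_username

    counts = {}
    for p_name in profile_names:
        profile = profile_factory(loader.context, p_name)
        downloaded = 0
        # islice stops pulling from the post iterator once the cap is reached.
        for post in islice(profile.get_posts(), max_posts):
            loader.download_post(post, target=Path(data_path) / p_name)
            downloaded += 1
        counts[p_name] = downloaded
    return counts
```

With the setup from this post you would call it as download_profiles(L, profile_names, data_path, max_posts=100). The plain loop above has no such cap.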

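This will download all their posts and put them into a subfolder named after each user. Once you have them in this layout, you can simply use ImageDataBunch.from_folder (assuming the usual from fastai.vision import * and a batch size bs are already set up, as in the lesson 1 notebook) like so: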

data = ImageDataBunch.from_folder(data_path, valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)

And this will use the subfolder names as classes, and split the dataset into a training and validation set.
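
Since the classes come straight from the subfolder names, it can be worth checking how many images ended up in each folder before training. This is just a sketch using pathlib: count_images_per_class and IMAGE_SUFFIXES are names I made up, and the suffix list is an assumption about which files you want counted (the downloader may also leave non-image files such as captions or metadata next to the pictures).

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your downloads contain.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_path):
    """Return {subfolder name: number of image files directly inside it}."""
    counts = {}
    for class_dir in sorted(Path(data_path).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Running count_images_per_class(data_path) gives a quick view of how balanced the classes are, which is handy if one account has posted far more than the others.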

Hope this helps someone!

ste · May 20, 2019, 2:50pm

Perhaps @zachcaceres’s lambdagram could also be useful

tigerfinch · May 20, 2019, 2:51pm

Aha - very nice!