I played Pokemon Blue when I was a kid, so I strive to be the very best and catch them all. Because of this childhood value, I’m trying to train the very best stool image classifier. I’m still trying to gather enough data for my Bristol stool chart classifier, and to save time I wrote a quick little Python script to grab photos from the blog of a gentleman who posts high-quality photos of his bowel movements each day. (I contacted him by email and explained what I’m doing. He said it’s fine to use his photos for this classifier, and just to remember him when I’m famous.) I have a time.sleep(1) in there so Blogspot doesn’t think I’m a bot, and so far it’s working. I’ll post my script in here, but it got me thinking: this is something you have to do a lot in data science, so does fastai have an image/web-scraping utility? I also wonder whether GitHub Copilot would know of more stool blogs if I described what I want…
I got 1047 images from running the for loop 50 times, so it worked.
import requests
import re
import time

# Non-greedy patterns so each match stops at the end of one URL instead of spanning a whole line
pic_re = r"(https://lh3\.googleusercontent\.com[^\s\"']*?\.png)"
older_re = r"blog-pager-older-link' href='(https://dailyscat\.blogspot\.com/search\?[^']*?by-date)"

url = "https://dailyscat.blogspot.com/search?updated-max=2022-12-26T06:33:00-08:00&updated-min=2011&max-results=700"
r = requests.get(url)
pics = re.findall(pic_re, r.text)
for pic in pics:
    print('outside more first pic', pic)
    res = requests.get(pic)
    # Save each image under the last path segment of its URL
    with open(pic.split('/')[-1], 'wb') as wf:
        wf.write(res.content)
    time.sleep(1)  # pause between downloads
more = re.findall(older_re, r.text)
# Follow the "older posts" link page by page
for i in range(10):
    if not more:  # no older page left
        break
    r = requests.get(more[0] + '=true')
    pics = re.findall(pic_re, r.text)
    for pic in pics:
        print('inside more first pic', pic)
        res = requests.get(pic)
        with open(pic.split('/')[-1], 'wb') as wf:
            wf.write(res.content)
        time.sleep(1)
    more = re.findall(older_re, r.text)
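The two download loops in the script do the same work, so they could probably be folded into a couple of small helpers. This is only a rough sketch, assuming the blog's markup keeps matching the same kind of regexes the script relies on. The names `find_pics`, `find_older_link` and `scrape`, the `pics` output folder, and the 30-second timeout are all made up for illustration and are not part of requests or fastai.

```python
import re
import time
from pathlib import Path

# Hypothetical patterns, assuming the page markup the script above relies on.
PIC_RE = re.compile(r"https://lh3\.googleusercontent\.com[^\s\"']*?\.png")
OLDER_RE = re.compile(
    r"blog-pager-older-link' href='(https://dailyscat\.blogspot\.com/search\?[^']*?by-date)"
)


def find_pics(html):
    """Return the .png image URLs in a page, in page order, without duplicates."""
    return list(dict.fromkeys(PIC_RE.findall(html)))


def find_older_link(html):
    """Return the URL of the next (older) page, or None if there is no pager link."""
    m = OLDER_RE.search(html)
    return m.group(1) + "=true" if m else None


def scrape(start_url, dest="pics", max_pages=10, delay=1.0):
    """Download matching images from start_url and up to max_pages older pages."""
    import requests  # third-party; imported here so the parsing helpers work without it

    out = Path(dest)
    out.mkdir(exist_ok=True)
    url, saved = start_url, 0
    for _ in range(max_pages + 1):
        if url is None:  # ran out of older pages
            break
        html = requests.get(url, timeout=30).text
        for pic in find_pics(html):
            res = requests.get(pic, timeout=30)
            # Same naming as the script: last path segment of the image URL
            (out / pic.split("/")[-1]).write_bytes(res.content)
            saved += 1
            time.sleep(delay)  # pause between downloads
        url = find_older_link(html)
    return saved
```

Calling `scrape(url)` with the search URL from the script would then stand in for both loops. Files are still named after the last URL segment, so two images with the same name would overwrite each other, just as in the original script.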