https://forums.fast.ai/t/how-to-run-a-trained-model-against-a-custom-image/27617/5?u=devforfu
Also you can try the rsync CLI util if you’re on macOS or Linux.
Thanks for sharing. Duplicate Photo Finder is a great tool for finding duplicate or similar images.
cc: @beecoder
I bet you could build a better one using a CNN, @Moody. Let me know if you need help getting started.
Hi, I have used your script and it downloads the images, but it doesn’t split them into the train, valid, and test folders, though it does create them.
Oh, is there anything in the train/valid/test directories? I have a sneaking suspicion there’s a bug where the folders end up named after the search phrases. I’ll check later today.
No, they are empty; only downloaded_from_google has the images, distributed into folders by class.
I used to use a library for crawling, icrawler. It supports crawling the Google, Bing and Baidu search engines, and it can be extended to download from custom webpages. A sample notebook on using it: https://github.com/nareshr8/Image-Localisation/blob/master/crawler.ipynb
I found the reason it didn’t work: the part that sanity-checks and organises the images uses a glob pattern to find the files, which assumes the file names start with the class name. Since the search term you used didn’t have the class as the first term, it didn’t match anything. I’ve changed it to match anything containing the search term for now. That’s slightly brittle: if a file name contains a search term, the file might be assigned to several classes. I’ll change it to use a sanitized version of the search terms later on.
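To illustrate the failure mode, here is a hypothetical sketch (the folder name matches the one mentioned above, but the classes and patterns are made up, not duckgoose’s actual code):

from glob import glob

classes = ['duck', 'goose']

# Old behaviour: assume file names begin with the class name. A file saved as
# 'white goose in a pond_001.jpg' never matches 'goose*', so nothing is moved.
for c in classes:
    files = glob(f'downloaded_from_google/{c}*')

# New behaviour: match any file name containing the search term. Brittle: a
# file whose name happens to contain another class's term matches both.
for c in classes:
    files = glob(f'downloaded_from_google/*{c}*')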
So the new version of duckgoose (0.1.7) will work, or you can rearrange the search terms to put the class name first.
Thanks for letting me know it didn’t work for you.
Does anyone know how to open images in jupyter notebook while waiting for input? I’m writing a data checking function so you can go through your images by class after downloading and delete the ones that don’t belong.
No luck with
show_image(open_image(img_path))
img = open_image(img_path); img.show()
plt.imshow(np.rollaxis((np.array(open_image(img_path).data) * 255).astype(np.int32), 0, 3))
All three only display after input is received; same behavior in the terminal. So far only PIL.Image works:
import PIL.Image
...
img = PIL.Image.open(class_folder_path/f)
...
img.show()
Unfortunately this opens the image in your system’s default viewer, and running img.close() will not close the window; you have to close it manually, which is an issue for datasets with hundreds of images.
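One pattern that may avoid the external viewer entirely is IPython.display, assuming display() output renders before input() blocks (it usually does in a notebook). A sketch; review_class and the *.jpg pattern are placeholders:

from pathlib import Path
import PIL.Image
from IPython.display import display, clear_output

def review_class(class_folder_path):
    # Show each image inline, then wait for a verdict before moving on.
    for f in sorted(Path(class_folder_path).glob('*.jpg')):
        clear_output(wait=True)           # replace the previous image
        display(PIL.Image.open(f))        # renders inline, no external window
        if input('delete? [y/N] ').strip().lower() == 'y':
            f.unlink()                    # remove the file from disk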
There is a way to do this, at least from the terminal, using OpenCV, but I’m hesitant to go that route since fastai doesn’t use OpenCV. It’s similar to something I did in an old project a while back (tuning bounding boxes in that case; I may blog about it).
edit: I put together an OpenCV version of the data cleaner as a script; here’s a video of how it works. Not sure whether it works on a cloud instance with no GUI.
On a separate note: you can also get image data from video. Using OpenCV and MSS, you can build a dataset by playing a video and taking screenshots of that part of the screen, with labels mapped to the keys you press. Here’s how I did that in the same project.
You can build pretty big datasets quickly that way; your bigger problem will be making sure the data itself is varied enough, since 20 shots of Matt Damon smiling in a 5-second cut are all going to contain basically the same information.
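Roughly, that capture loop looks like this (a sketch, not the project’s actual code; the capture region, key bindings and folder names are all made up, and the label folders are assumed to exist):

import cv2
import mss
import numpy as np

region = {'top': 100, 'left': 100, 'width': 640, 'height': 360}  # screen area showing the video
labels = {ord('s'): 'smiling', ord('n'): 'neutral'}              # key -> class folder

with mss.mss() as sct:
    i = 0
    while True:
        shot = sct.grab(region)                            # raw BGRA screenshot
        frame = cv2.cvtColor(np.array(shot), cv2.COLOR_BGRA2BGR)
        cv2.imshow('capture', frame)
        key = cv2.waitKey(30) & 0xFF
        if key == ord('q'):                                # quit
            break
        if key in labels:                                  # save a labelled frame
            cv2.imwrite(f'{labels[key]}/{i:05d}.jpg', frame)
            i += 1
cv2.destroyAllWindows()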
I built a dataset curator to help find and remove both duplicate images and images from outside the data distribution. It uses the intermediate representations from a pretrained VGG network (similar to content loss when doing style transfer).
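The gist of that approach, as a minimal sketch with torchvision (not the curator’s actual code; the layer cutoff and similarity threshold are arbitrary):

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Truncate a pretrained VGG at an intermediate conv layer, as for content loss.
vgg = models.vgg16(pretrained=True).features[:16].eval()
tfm = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def embed(path):
    with torch.no_grad():
        return vgg(tfm(Image.open(path).convert('RGB'))[None]).flatten()

# Near-duplicates have cosine similarity close to 1; images far from every
# other embedding are candidates for being outside the data distribution.
a, b = embed('img1.jpg'), embed('img2.jpg')
if torch.nn.functional.cosine_similarity(a[None], b[None]).item() > 0.95:
    print('likely duplicates')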
Any recommendations around image resizing? I’ve built my own dataset, but the images I grabbed are on the larger side. What’s everyone doing in this context? I know the library can resize on the fly, but I’m guessing that’s a costly operation that would be better done once?
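For what it’s worth, a one-off resize pass with PIL along these lines avoids paying the cost every epoch (a sketch; the paths and target size are placeholders):

from pathlib import Path
from PIL import Image

src, dst = Path('data/raw'), Path('data/resized')
for p in src.rglob('*.jpg'):
    out = dst / p.relative_to(src)
    out.parent.mkdir(parents=True, exist_ok=True)
    img = Image.open(p)
    img.thumbnail((500, 500))       # shrink in place, preserving aspect ratio
    img.save(out, quality=90)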
Updated my thing using the new stuff I learned tonight. Now the interface doesn’t look like it was made by a child.
Kind of a double post (see also Small tool to build image dataset: fastclass).
I wrote a small python package fastclass that tackles two problems I had when building a dataset:
For my example I defined 25 search terms (guitars, it’s also in the GitHub repo under examples)…
The first script, fcd, pulls images from Google, Bing or Baidu (or all three) and resizes them, too (it uses icrawler). Simply write a csv file where each row contains the search terms you want to send to the search engines.
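For instance, a csv along these lines (made-up search terms; see the repo’s examples folder for the exact format):

gibson les paul
fender stratocaster
fender telecaster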
Then the second script, fcc, launches a Tkinter GUI so you can quickly flick through the resulting folders and mark any file for deletion, or sort it into various “grades” (grading is optional).
In my case I used 4 grades (and deleted a bunch):
Grade 1: good
Grade 2: only the body of a guitar (still super useful to distinguish between models)
Grade 3: headstock only (not used in first model)
Grade 4: really hard (back of guitar, not used in first model)
I ended up with roughly 9000 images for 11 classes. Quality check takes some time - but it’s worth it!
You simply press a number to assign a grade, d to mark for deletion, and you can always flick back and forth using the arrow keys. Once you are done, press x to terminate and write the report file…
I wrote about it here:
Repo is here:
Notebook with the classifier (97% accuracy on 11 Gibson and Fender models) is here. For the moment I only used grade 1+2 images for the classifier; I will experiment with the others later:
Let me know via an issue or these forums if you find any problems with it. Hope it’s useful to you…
@jeremy This thread offers better methods than the javascript code in the lesson2-download notebook. The javascript approach is problematic because it doesn’t work in all browsers, fails with blockers, and isn’t a solution for those who have no way to access a browser UI (Colab et al.).
AFAICT none of the methods presented here are allowed under Google’s Terms of Service. I’m fine with them being discussed here, but I don’t think we should be teaching them in the course.
I’m experimenting with Crestle for uploading and syncing. Since they offer a terminal session, I’ve been able to use GoodSync effectively. The benefits of GoodSync are support for all platforms, powerful features, a good UI (Windows, Mac), and compatibility with cloud-backed storage services (Dropbox, Google Cloud, Microsoft OneDrive, etc.); it works well. I’m only getting about 700 Kbps upload/sync speeds. I haven’t identified the bottleneck or benchmarked against similar services, so it might be as fast as it gets.
FWIW, I’ve cobbled together a python program that copies the contents of a Google Drive. I’m experimenting with it on Crestle. It may be useful for people who prefer working with Google Drive. I’d prefer to be able to mount a Google Drive, as can be done on Colab, instead of just a file copy method. GoodSync is a better choice for most.
import os

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive


def main():
    gauth = GoogleAuth()
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("mycreds.txt")
    if gauth.credentials is None:
        # Authenticate if they're not there
        #gauth.LocalWebserverAuth()
        gauth.CommandLineAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("mycreds.txt")

    drive = GoogleDrive(gauth)
    local_expanded_path = os.path.expanduser('~/data')
    copy_directory(drive, 'root', local_expanded_path)


def copy_directory(drive, source_id, local_path):
    print(f'source_id:{source_id} local_path:{local_path}')
    try:
        os.makedirs(local_path, exist_ok=True)
    except OSError as e:
        print(f'makedirs failed: {local_path} errno:{e.errno}')
    file_list = drive.ListFile({'q': f"'{source_id}' in parents"}).GetList()
    for f in file_list:
        print(f["title"], f["id"], f["mimeType"])
        if f["title"].startswith("."):
            continue
        fname = os.path.join(local_path, f['title'])
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            copy_directory(drive, f['id'], fname)
        else:
            gfile = drive.CreateFile({'id': f['id']})
            gfile.GetContentFile(fname)


if __name__ == "__main__":
    main()
Curious about your Olsen twin project… We tried a “Chrisifier” (Chris Pine/Evans/Pratt/Hemsworth) and were able to get the error down to around 25% using the standard pipeline from the bears notebook from class. So decent accuracy, but far from perfect. How accurate were you able to get your Olsen twins model? Any tricks you’d be willing to share? Apart from the obvious, gathering more data (we only have about 200 images of each Chris), we were thinking it might be possible to pretrain on a large facial recognition dataset.
Hey thanks for putting this together.
I’m currently having an issue getting the images for 2 of my classes to download. The output suggests the script is running correctly, and the dirs are created, but the output_path dir is empty on inspection after running.
My problem is with the ‘laver’ and ‘badderlocks’ classes; all others have downloaded successfully. Can you point me in the right direction?
NB here
Even,
Not sure if you found one, but I use https://www.bricelam.net/ImageResizer/.
Easy to use and works well.