I’m trying to see what the limits are to progressive resizing, and would like to plot out the number of images of a particular size, like in the screenshotted example from last year’s course below. Given a DataBunch, what’s the easiest way to get the filenames used so I can plot their sizes out?
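For the counting step itself, here is a minimal sketch. It assumes you already have the image paths in a plain list, however you end up pulling them out of the DataBunch. The helper names `pil_size` and `size_counts` are made up for illustration and are not part of fastai; reading the size relies on Pillow's `Image.open`.

```python
from collections import Counter

def pil_size(path):
    """Return (width, height) of the image at `path` using Pillow."""
    # Imported lazily so the counting logic below works without Pillow installed.
    from PIL import Image
    with Image.open(path) as img:
        return img.size  # Pillow reports size as (width, height)

def size_counts(paths, get_size=pil_size):
    """Count how many images share each (width, height).

    `paths` is any iterable of image paths (assumed to be collected already).
    `get_size` is swappable, which is handy for testing or for other loaders.
    """
    return Counter(get_size(p) for p in paths)
```

`size_counts(paths).most_common()` then gives (size, count) pairs sorted by frequency, which should be enough to feed a bar chart or a scatter plot of widths against heights.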
Just to add to this, I enjoy working with dataframes (can easily filter on specific img requirements and pull the remaining filenames), so an alternative (for viewing both sides) could be this
Always appreciate the more pythonic way of doing things. This led to some great albeit obvious insights (and correct me if I’m wrong):
dataset items are tuples! i.e. (image (size), breed index), which led me to…
data.train_ds[0][0] (the image) is the transformed version of the original image data.train_ds.ds[0][0]
when viewing the cropped version in data.train_ds[0][0], it changes every time. I’m putting all the pieces together now and presume that the “DatasetTfm” contains the original dataset and a transform “tfm”, which is applied to the original image (data.train_ds.ds) and stored in data.train_ds.
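To make that mental model concrete, here is a toy sketch in plain Python. It is not fastai's actual DatasetTfm code: `LazyTfmDataset` and `random_crop` are hypothetical names, and the "image" is just a 4x4 nested list. It assumes one reading that fits the crop changing on every access, namely that the transform runs each time an item is requested rather than being computed once and cached.

```python
import random

class LazyTfmDataset:
    """Toy wrapper: keeps the original dataset and applies `tfm` on access."""
    def __init__(self, ds, tfm):
        self.ds = ds    # original, untransformed (image, label) tuples
        self.tfm = tfm  # transform applied every time an item is read

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, i):
        x, y = self.ds[i]
        return self.tfm(x), y  # only the image is transformed, not the label

def random_crop(img, size=2):
    """Cut a random size x size window out of a 2D list of pixels."""
    top = random.randint(0, len(img) - size)
    left = random.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

# One 4x4 "image" with label 0, standing in for (image, breed index).
original = [([[r * 4 + c for c in range(4)] for r in range(4)], 0)]
train_ds = LazyTfmDataset(original, random_crop)

cropped, label = train_ds[0]      # a 2x2 crop; may differ on each access
untouched, _ = train_ds.ds[0]     # the original 4x4 image, always the same
```

Indexing `train_ds[0]` repeatedly will usually return different crops, while `train_ds.ds[0]` stays fixed, which mirrors what was observed above.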
Apologies if I’ve just stated the obvious, but might be insightful for anyone else like me starting out.
You are extremely not wrong, and have summarized this beautifully! I’m not sure we’ve done a great job of explaining these lower-level details in the docs, so if you want to really test your understanding, feel free to try adding more info to the docs to help others too! (And do let me know if you decide to do this, and want help understanding how to contribute).