Thanks, @prairieguy, for the constructive feedback! I appreciate you testing it so thoroughly.
As I see it, you brought up two separate issues:
- The Bing Search API does not always return the count that we give it, and
- Duplicate content_url values can be returned.
I think these should be handled separately.
Bing Search API returns fewer images
I did some additional testing and I believe this occurs because we’ve reached the maximum Bing will return.
results = L(client.images.search(query="grizzly bear", min_height=128, min_width=128, count=150, offset=1500).value)
len(results)
Output:
54
Here I called the client.images.search() function directly, without using search_images_bing(). I used a count of 150 and a high offset of 1500. Only 54 items were returned.
Let’s try again with a higher offset.
results2 = L(client.images.search(query="grizzly bear", min_height=128, min_width=128, count=150, offset=1800).value)
len(results2)
Output:
54
After increasing the offset, I got the same number of results as before. I did some random sampling, and these two lists seem to have the same content_url values.
print(results.attrgot("content_url")[42])
print(results2.attrgot("content_url")[42])
Output:
https://www.freedomskateshop.at/media/images/product/1800x1200/1grizzly_woodland_camo_cut_out_skateboard_griptape.jpg
https://www.freedomskateshop.at/media/images/product/1800x1200/1grizzly_woodland_camo_cut_out_skateboard_griptape.jpg
I think what is going on here is that once Bing has reached the maximum number of images it has for that search term, it will continue to return the same set of images after a certain offset.
One solution for this is to perhaps just show a warning message to the user, but return the images anyways. Something like:
import logging
from itertools import chain

found = len(L(chain(*imgs)))  # total images collected across all pages
if found < total_count:
    logging.warning(f"Bing only found {found} images for '{term}'. Total requested was {total_count}.")
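To show the same warning in a form that runs on its own, here is a rough sketch of a paging loop that stops as soon as a page comes back identical to the previous one, which is the behaviour I saw at offsets 1500 and 1800. The names collect_urls and fetch_page are hypothetical: fetch_page stands in for whatever wraps client.images.search() and returns a plain list of content_url strings. This is not the actual search_images_bing() code, and duplicates are deliberately left in.

```python
import logging
from typing import Callable, List


def collect_urls(fetch_page: Callable[[int, int], List[str]],
                 term: str, total_count: int, page_size: int = 150) -> List[str]:
    """Page through results until total_count URLs are collected or a page repeats.

    fetch_page(offset, count) is a hypothetical wrapper around the real search
    call; it must return a list of URL strings for that offset.
    """
    urls: List[str] = []
    previous = None
    offset = 0
    while len(urls) < total_count:
        page = fetch_page(offset, page_size)
        # An empty page, or the same page twice in a row, suggests the
        # service has no further results for this term.
        if not page or page == previous:
            break
        urls.extend(page)
        previous = page
        offset += page_size
    urls = urls[:total_count]
    if len(urls) < total_count:
        logging.warning(
            f"Bing only found {len(urls)} images for '{term}'. "
            f"Total requested was {total_count}."
        )
    return urls
```

The images found so far are still returned, so the caller gets a warning rather than an error.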
Duplicate image URLs
For the second issue of duplicate image URLs, I think it should be the user’s responsibility to remove the duplicates. The search_images_bing() function should just return the images that Bing provides. It would be analogous to browsing Bing Image Search manually and seeing duplicate images. The user can decide to keep or remove the duplicate image URLs.
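For anyone who does want to drop the duplicates on their side, a minimal sketch could look like the following. The helper name unique_urls is my own and is not part of fastai or the Bing SDK. It works on a plain list of URL strings, for example the result of results.attrgot("content_url"), and keeps the first occurrence of each URL in its original order.

```python
def unique_urls(urls):
    """Return the URLs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(urls))
```

Note that this only removes exact URL matches. The same image hosted at two different URLs would still appear twice.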
Again, I appreciate you taking the time to read and test my code, which is why I wanted to in turn give my detailed response. Would love to get your thoughts on this and I’ll modify the snippet in my original posting accordingly so that others can benefit!