Getting more than 150 images using search_images_bing

Oh, if you’re making local changes then it’ll depend… If you have

%load_ext autoreload
%autoreload 2

at the top of your notebook, it should automatically reload any files you edit; if not, you’ll have to restart the kernel. That said, if you want to play around with your own version of the image search code, there’s really no need to edit the fastai files. Just copy the function code into your own notebook and experiment there.

I added a new function to utils.py, then restarted the kernel, the whole GPU, etc., and it still isn’t recognized :frowning: I think it might have to do with !pip install -Uqq fastbook and where the package is installed from… @joedockrill

I can’t find the article anymore, but the Stack post said that things behave differently depending on whether pip install or setup.py is used. Do you know anything about that?
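
One way to check which copy of a module Python is actually importing (and hence whether an edited utils.py is the one in use) is to look up its file path. This is only a minimal sketch using the standard library; module_location is a hypothetical helper name, not something fastbook provides.

```python
import importlib.util

def module_location(name):
    """Return the file path Python would import `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# In the notebook you would probably call module_location("fastbook");
# "json" is used here only because it ships with Python.
print(module_location("json"))
```

If the printed path points into site-packages rather than at your working copy, edits to the local file will not be picked up no matter how often the kernel is restarted.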

I am getting this error. Does it not work anymore?

This should help anyone who’s looking for an answer:

from itertools import chain

import requests           # used for the HTTP call below
from fastcore.all import L  # fastai's list type (provides attrgot/unique)


def search_images_bing_many(key, term, total_count=150, min_sz=224):
    """Search for images using the Bing API
    
    :param key: Your Bing API key
    :type key: str
    :param term: The search term to search for
    :type term: str
    :param total_count: The total number of images you want to return (default is 150)
    :type total_count: int
    :param min_sz: the minimum height and width of the images to search for (default is 224)
    :type min_sz: int
    :returns: An L-collection of ImageObject
    :rtype: L
    """
    headers = {"Ocp-Apim-Subscription-Key":key}
    search_url = "https://api.bing.microsoft.com/v7.0/images/search"

    max_count = 150

    imgs = []
    for offset in range(0, total_count, max_count):
        # Request at most max_count images per call; the last page may be smaller
        count = min(max_count, total_count - offset)

        params = {'q':term, 'count':count, 'min_height':min_sz, 'min_width':min_sz, 'offset': offset}
        response = requests.get(search_url, headers=headers, params=params)
        response.raise_for_status()  # fail loudly on HTTP errors instead of a confusing KeyError
        search_results = response.json()
        imgs.append(L(search_results['value']))

    return L(chain(*imgs)).attrgot('contentUrl').unique()

Now you can do the following (an example for plants that assumes a list called PLANT_NAMES exists):

# Lower-case and de-duplicate the names
plants = {plant.lower() for plant in PLANT_NAMES}

for i, o in enumerate(plants, start=1):
    print(f"Downloading images for {o} ({i}/{len(plants)})")
    dest = (data_path/o)
    dest.mkdir(exist_ok=True)
    img_urls = search_images_bing_many(key, o, total_count=449)
    download_images(dest, urls=img_urls)

Hope that helps! Please note that I changed the return type (the function now returns an L of unique image URLs, ready to pass to download_images) without updating the docstring.
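
The paging arithmetic inside search_images_bing_many can be checked without calling the API. The sketch below reproduces only the offset/count calculation; page_requests is a hypothetical helper name, and the limit of 150 per request is taken from the max_count value in the function above.

```python
def page_requests(total_count, max_count=150):
    """Return (offset, count) pairs covering total_count results in pages of at most max_count."""
    return [(offset, min(max_count, total_count - offset))
            for offset in range(0, total_count, max_count)]

print(page_requests(449))  # [(0, 150), (150, 150), (300, 149)]
```

So a request for 449 images turns into three API calls, the last one asking for the remaining 149.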

An alternative to this approach is to run slightly different searches using synonyms or sub-categories, e.g. "plane" and "aircraft", or "black bear in woods" and "black bear in snow".

You can then perform the search several times (with different search terms) and use L's concat to combine the results before downloading the images.

Something like this:

destas =['boeing 737 max', '747','Boeing 777','Airbus A330','Boeing 787']
destas2=[d+' aircraft' for d in destas]
destas1=[d+' plane' for d in destas]

for i,oo in enumerate(destas):

  dest = (path/oo)
  dest.mkdir(exist_ok=True)

  o = destas1[i]
  results1 = search_images_bing(key, f'{o}')

  o = destas2[i]
  results2 = search_images_bing(key, f'{o}')
  results = L(results1,results2).concat()

  download_images(dest, urls=results.attrgot('contentUrl'))
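
Searches on synonyms tend to return some of the same images, so it may be worth removing duplicate URLs before downloading. Here is a minimal plain-Python sketch, assuming each result is a dict with a 'contentUrl' key (as in the JSON returned by the REST endpoint); merge_unique_urls is a hypothetical helper, not part of fastai.

```python
def merge_unique_urls(*result_lists):
    """Combine several lists of result dicts into one list of unique contentUrl values.

    The first-seen order is preserved; results without a contentUrl are skipped.
    """
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for results in result_lists:
        for item in results:
            url = item.get('contentUrl')
            if url is not None:
                seen.setdefault(url, None)
    return list(seen)

# Made-up example data standing in for two search responses
results1 = [{'contentUrl': 'http://example.com/a.jpg'}, {'contentUrl': 'http://example.com/b.jpg'}]
results2 = [{'contentUrl': 'http://example.com/b.jpg'}, {'contentUrl': 'http://example.com/c.jpg'}]
print(merge_unique_urls(results1, results2))
```

With fastai's L, calling .unique() on the URL list (as search_images_bing_many does) should have a similar effect.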

Now I get ErrorResponseException: Operation returned an invalid status code ‘PermissionDenied’

I just edited the original code to support offset:

def search_images_bing(key, term, min_sz=128, max_images=150, offset=0):
    params = {'q':term, 'count':max_images, 'min_height':min_sz, 'min_width':min_sz, 'offset':offset}
    headers = {"Ocp-Apim-Subscription-Key":key}
    search_url = "https://api.bing.microsoft.com/v7.0/images/search"
    response = requests.get(search_url, headers=headers, params=params)
    response.raise_for_status()
    search_results = response.json()
    return L(search_results['value'])

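Then I just loop over offsets:

n_offsets = 10
page_size = 150  # same as the max_images default
for o in bear_types:
    dest = (path/o)
    dest.mkdir(exist_ok=True)
    results = L()  # an L rather than a plain list, so attrgot works below
    for page in range(n_offsets):
        # offset is the number of results to skip, so advance a full page each time
        results += search_images_bing(key, f'{o} bear', offset=page*page_size)
    print(len(results))
    download_images(dest, urls=results.attrgot('contentUrl'))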
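
As far as I understand the Bing Image Search v7 response, it also carries a nextOffset field intended to be used as the offset of the following request, which may reduce duplicates between pages. The sketch below is written under that assumption and falls back to simple arithmetic when the field is missing. collect_results and fetch_page are hypothetical names; fetch_page(offset, count) stands for any callable that returns the decoded JSON dict of one request.

```python
def collect_results(fetch_page, total_count, page_size=150):
    """Gather up to total_count results, following nextOffset when the response provides it."""
    results, offset = [], 0
    while len(results) < total_count:
        count = min(page_size, total_count - len(results))
        page = fetch_page(offset, count)
        items = page.get('value', [])
        if not items:
            break  # the service has no more results for this query
        results.extend(items)
        # Prefer the offset suggested by the service; otherwise skip what we just received
        offset = page.get('nextOffset', offset + len(items))
    return results[:total_count]

# Stand-in for the real API so the loop can be tried offline
fake_data = [{'contentUrl': f'http://example.com/{i}.jpg'} for i in range(400)]

def fake_fetch(offset, count):
    return {'value': fake_data[offset:offset + count], 'nextOffset': offset + count}

print(len(collect_results(fake_fetch, 320)))
```

In real use, fetch_page could be a thin wrapper around the requests.get call shown above that returns response.json() instead of the L of values.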

Same here! :frowning: But I think your suggested code/solution is working like a charm!! Thanks!

This is the only way I have found to use Bing query operators, for example excluding certain search terms with the minus sign. If you do not write a custom method like the one above, it will not work. It would be amazing if this replaced the existing method.
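
To illustrate the minus-sign exclusion mentioned here, the query string can be assembled before it is passed as the term argument of the custom search_images_bing above. This is only a sketch: build_query is a hypothetical helper, and how strictly Bing honours the operators is not something this code can guarantee.

```python
def build_query(term, exclude=()):
    """Append -word (or -"a phrase") exclusions to a search term."""
    parts = [term]
    for word in exclude:
        # Multi-word exclusions are quoted so the minus applies to the whole phrase
        parts.append(f'-"{word}"' if ' ' in word else f'-{word}')
    return ' '.join(parts)

print(build_query('black bear', exclude=['toy', 'teddy bear']))
# black bear -toy -"teddy bear"
```

The resulting string would then be used as, for example, search_images_bing(key, build_query('black bear', exclude=['toy'])).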