Application of Chrome driver and Selenium to build an image data set

Hi, everyone. I recently learned a new technique for building an image data set on the web, and I wanted to share it with you all. In brief, my problem required me to take screenshots of certain parts of the DOM. My training data were essentially screenshots of certain parts of different websites (e.g. Facebook, YouTube).

First, I installed ChromeDriver and the Selenium package. ChromeDriver is a standalone server that lets automated tests control Chrome, and Selenium is a Python package that sends commands to that server - for example, taking a screenshot.

The syntax for starting an automated browser is as follows:

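from selenium import webdriver

# Selenium 3 style; newer Selenium 4 releases take a Service object instead of a path string
driver = webdriver.Chrome('path/to/chromedriver')
driver.get('https://somewebsite.com/')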

Next, use a CSS selector to find the DOM element you want, then read its location and size.

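# Selenium 4.3+ removed this helper; use driver.find_element(By.CSS_SELECTOR, '#id') there
dom_element = driver.find_element_by_css_selector('#id')

# Top-left corner of the element on the page
location = dom_element.location
left = location['x']
top = location['y']

# Bottom-right corner = top-left corner + element size
size = dom_element.size
right = location['x'] + size['width']
bottom = location['y'] + size['height']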
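One thing to watch for: on high-DPI displays the screenshot can have more pixels than the page's CSS dimensions, so the crop box may need scaling. Here is a minimal sketch of that idea. The helper name crop_box and its scale parameter are my own, not part of Selenium; you would have to read the ratio yourself, for example with driver.execute_script('return window.devicePixelRatio').

```python
def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    `location` and `size` are the dicts Selenium returns from
    element.location and element.size. `scale` is an assumed device
    pixel ratio (1.0 on a standard display, often 2.0 on high-DPI ones).
    """
    left = location['x'] * scale
    top = location['y'] * scale
    right = (location['x'] + size['width']) * scale
    bottom = (location['y'] + size['height']) * scale
    # Round to whole pixels so the box can be passed straight to a crop call.
    return tuple(int(round(v)) for v in (left, top, right, bottom))


print(crop_box({'x': 10, 'y': 20}, {'width': 100, 'height': 50}))
print(crop_box({'x': 10, 'y': 20}, {'width': 100, 'height': 50}, scale=2.0))
```

With scale left at 1.0 this gives the same four numbers as the manual arithmetic, so it only matters if your crops come out offset or too small.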

Now you have the left, top, right, and bottom coordinates of your DOM element of interest. Next, take a screenshot of the rendered page by calling the get_screenshot_as_png() method on your driver. Chrome captures the visible viewport, so make sure the element is in view.

png = driver.get_screenshot_as_png()

Then use those coordinates to crop the screenshot down to exactly the DOM area you want to capture.

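from io import BytesIO
from PIL import Image  # provided by the Pillow package

im = Image.open(BytesIO(png))
im = im.crop((left, top, right, bottom))
im.save('path/to/image.png')  # Pillow infers the format from the file extension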
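For a whole data set you will probably repeat this over many pages. Below is a rough sketch of such a loop. Note that it swaps in a different technique from the manual crop above: it assumes a Selenium 4 driver and uses WebElement.screenshot(), which saves a PNG of just that element. The function name capture_elements, the targets list, and the numbered file names are placeholders of mine.

```python
import os


def capture_elements(driver, targets, out_dir):
    """Save one PNG per (url, css_selector) pair and return the file paths.

    `driver` is assumed to be a Selenium 4 WebDriver. `targets` and the
    numbered file naming are illustrative choices, not a fixed convention.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, (url, selector) in enumerate(targets):
        driver.get(url)
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        element = driver.find_element("css selector", selector)
        path = os.path.join(out_dir, f"{index:04d}.png")
        # WebElement.screenshot returns True when the file was written.
        if element.screenshot(path):
            saved.append(path)
    return saved
```

You would call it with something like capture_elements(driver, [('https://somewebsite.com/', '#id')], 'path/to/images'). It does not wait for the page to finish rendering, so you would still want a readiness check before each capture.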

And voila! You now have an image of a very specific area on the web page. This method has worked very well for me. Just remember to add logic in your actual implementation that checks whether the website has fully rendered before you capture it. Some websites take a while to load everything, and you don’t want to take a screenshot of a loading page.
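Here is one way that rendering check could look, as a sketch rather than the only approach. The helpers wait_until and page_is_loaded are names I made up. The only driver call used is execute_script, and the timeout and poll values are arbitrary defaults.

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.25):
    """Call predicate() repeatedly until it returns True or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def page_is_loaded(driver):
    # document.readyState becomes "complete" once the page and its
    # sub-resources have loaded. Content injected later by JavaScript
    # may still be missing at that point.
    return driver.execute_script('return document.readyState') == 'complete'
```

You could then guard the capture with: if wait_until(lambda: page_is_loaded(driver)): png = driver.get_screenshot_as_png(). Selenium also ships its own WebDriverWait and expected_conditions helpers, which are worth a look if you need to wait for a specific element rather than the whole page.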

Happy coding!
