How to extract data from JSON?

Hi!
I want to extract URLs from a JSON file. The URLs have to be sorted by their labels, ‘with helmet’ or ‘without helmet’, and written into different files. I am finding it difficult to do this with Python; can someone please help me?

This is how the JSON looks:

{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb06477f4cb01648e3c0da400d1/c01d5d27-5c77-422c-a5f0-07b8169736b6___stock-photo-attractive-young-girl-on-scooter-stopped-on-the-road-232916281.jpg","annotation":[{"label":["With Helmet"],"notes":"","points":[{"x":0.5967894239848914,"y":0.1641086186540732},{"x":0.704438149197356,"y":0.3116883116883117}],"imageWidth":450,"imageHeight":360}],"extras":null,"metadata":{"first_done_at":1533223539000,"last_updated_at":1533223539000,"sec_taken":11,"last_updated_by":"F8QVa4yeXLS7pjDSSRcsQJAapC43","status":"done","evaluation":"NONE"}}

Can someone please show me how I can extract the URLs?
PS: I am currently in my sophomore year and just getting started, so sorry if this is a stupid question.

Use deserialization to turn the JSON string into a native Python object. Once you have the information in object form, it is much easier to work with.

import json

# json_string holds the text of one JSON record, e.g. the sample above
info = json.loads(json_string)
print(info["content"], info["annotation"][0]["label"])
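
A self-contained version of the snippet above. The record is a trimmed copy of the sample from the question: the URL is replaced by a hypothetical example.com address and the points and metadata fields are left out. Note that "label" holds a list, so the label text sits one level deeper than the URL.

```python
import json

# Trimmed version of the record from the question; the URL is a
# hypothetical placeholder and the points/metadata fields are omitted.
json_string = (
    '{"content": "http://example.com/scooter.jpg",'
    ' "annotation": [{"label": ["With Helmet"], "notes": "",'
    ' "imageWidth": 450, "imageHeight": 360}],'
    ' "extras": null}'
)

info = json.loads(json_string)             # str -> dict
url = info["content"]                      # the image URL
labels = info["annotation"][0]["label"]    # a list, e.g. ["With Helmet"]
print(url, labels[0])
```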

You can also make the result a bit cleaner to work with by passing a simple object_hook to json.loads when loading:

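# object_hook is called with every decoded JSON object (a dict);
# the returned instance replaces it, so the keys become attributes.
class AsAttributes:
    def __init__(self, dict_):
        self.__dict__.update(dict_)

info = json.loads(json_string, object_hook=AsAttributes)
print(info.content, info.annotation[0].label)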
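
A variant of the same idea using the standard library's types.SimpleNamespace instead of a custom class (a swapped-in alternative, not what the snippet above uses). The record is again a hypothetical, trimmed stand-in for the question's data:

```python
import json
from types import SimpleNamespace

# Hypothetical trimmed record in the same shape as the question's sample.
json_string = (
    '{"content": "http://example.com/a.jpg",'
    ' "annotation": [{"label": ["Without Helmet"]}]}'
)

# object_hook runs once per decoded JSON object, innermost first;
# SimpleNamespace(**d) exposes each dict's keys as attributes.
info = json.loads(json_string, object_hook=lambda d: SimpleNamespace(**d))
print(info.content, info.annotation[0].label)
```

SimpleNamespace also prints with a readable repr, e.g. namespace(content=...), which helps when exploring an unfamiliar file.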

Hi Marcus!
Thank you for your reply. However, when I implement your suggestion, I get the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any suggestions on how to fix it?
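
That message means the parser found no JSON value at the very first character. The cause in this thread is not confirmed, but a common one is handing json.loads a file name or an empty string instead of the file's contents; json.loads parses a string, while json.load reads from an open file object. A minimal sketch that reproduces the message and then reads a file correctly (the file name helmet.json is hypothetical, and a temporary file stands in for it):

```python
import json
import os
import tempfile

def try_parse(text):
    """Return the parsed value, or the JSONDecodeError message."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)

# Passing a path (or an empty string) to json.loads reproduces the error.
print(try_parse("helmet.json"))
print(try_parse(""))

# Correct approach: open the file and let json.load read its contents.
# A temporary file stands in for the real (hypothetical) helmet.json.
path = os.path.join(tempfile.mkdtemp(), "helmet.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"content": "http://example.com/a.jpg"}')

with open(path, encoding="utf-8") as f:
    record = json.load(f)   # json.load takes a file object, not a path
print(record["content"])
```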

Hey there, have you found a solution to the JSON problem? I’m experiencing something similar: I want to work with this Kaggle Twitter sarcasm dataset, which has only 3 features, but I just don’t know how to get the data out of the JSON file.
I searched the forum and also had a look at the pandas docs, but haven’t found a working solution so far.
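
If the file is stored as JSON Lines, one object per line (an assumption worth checking by opening it in a text editor), parsing the whole file at once fails with an "Extra data" error; each line has to be parsed on its own. A stdlib sketch under that assumption, with hypothetical field names standing in for the dataset's three features:

```python
import io
import json

def read_json_lines(fp):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]

def to_columns(rows):
    """Turn a list of dicts into {field: [values...]}; missing keys become None."""
    fields = sorted({key for row in rows for key in row})
    return {field: [row.get(field) for row in rows] for field in fields}

# Hypothetical two-record sample; with a real file, pass
# open("dataset.json", encoding="utf-8") instead of io.StringIO.
sample = io.StringIO(
    '{"text": "great, another Monday", "is_sarcastic": 1, "id": 1}\n'
    '{"text": "nice weather today", "is_sarcastic": 0, "id": 2}\n'
)
rows = read_json_lines(sample)
columns = to_columns(rows)
print(columns["is_sarcastic"])
```

If pandas is installed, pandas.read_json(path, lines=True) reads the same layout straight into a DataFrame.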

Any help would be highly appreciated <3

During the data extraction process, my Lenovo screen suddenly started flickering, which caused a problem. Please help me fix it so that I can run the extraction again.

Hey!
Sorry for the super late reply :) Yes, I did solve it. The problem was that my JSON data was in a weird format: it was nested as dictionary > list > dictionary. It also had inconsistent editing and missing values. Once I accounted for all of these exceptions, it worked well!
Do let me know how I can help!
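
The record in the question has exactly that dict > list > dict shape, with "label" being a list inside each annotation. A sketch of the kind of defensive access described above; which fields can actually be missing or malformed in the real file is an assumption, so adapt the checks to what you find:

```python
def labels_for(record):
    """Return the normalised labels of one record (dict > list > dict shape).

    Tolerates a missing or null "annotation", annotations that are not
    dicts, a missing "label", and a label given as a bare string.
    """
    found = []
    for ann in record.get("annotation") or []:   # list of dicts, may be null
        if not isinstance(ann, dict):
            continue
        label = ann.get("label") or []           # normally a list of strings
        if isinstance(label, str):
            label = [label]
        # Normalise case and whitespace: "With Helmet" -> "with helmet"
        found.extend(x.strip().lower() for x in label if isinstance(x, str))
    return found

# Hypothetical records showing the irregular cases handled above.
records = [
    {"content": "http://example.com/1.jpg", "annotation": [{"label": ["With Helmet"]}]},
    {"content": "http://example.com/2.jpg", "annotation": None},
    {"content": "http://example.com/3.jpg", "annotation": [{"label": " without helmet "}]},
]
for rec in records:
    print(rec.get("content"), labels_for(rec))
```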

Hey Abhijith! Can you tell me how to load the Dataturks helmet-detection JSON file in Python?
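
Putting the pieces together for the original question: a sketch that loads the file, groups the "content" URLs by their "With Helmet" / "Without Helmet" labels and writes each group to its own text file. The file layout (a single JSON value or JSON Lines), the output file names and the label spellings are assumptions based on the sample record above:

```python
import json
from pathlib import Path

def load_records(path):
    """Load a Dataturks-style export into a list of dicts.

    Assumption: the file is either one JSON value (an object or an array)
    or JSON Lines (one object per line); both layouts are accepted.
    """
    # utf-8-sig also strips a byte-order mark if one is present
    text = Path(path).read_text(encoding="utf-8-sig")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Several objects on separate lines -> "Extra data"; parse per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]

def split_urls(records):
    """Group image URLs by label, matching labels case-insensitively."""
    groups = {"with helmet": [], "without helmet": []}
    for rec in records:
        url = rec.get("content")
        for ann in rec.get("annotation") or []:
            for label in ann.get("label") or []:
                key = label.strip().lower()
                if url and key in groups:
                    groups[key].append(url)
    # An image can hold several boxes; drop repeats but keep the order.
    return {key: list(dict.fromkeys(urls)) for key, urls in groups.items()}

def write_groups(groups, out_dir):
    """Write one URL per line to with_helmet.txt and without_helmet.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for key, urls in groups.items():
        lines = "".join(url + "\n" for url in urls)
        (out / (key.replace(" ", "_") + ".txt")).write_text(lines, encoding="utf-8")
```

Usage, with placeholder names: write_groups(split_urls(load_records("helmet.json")), "output"). An image containing both kinds of rider ends up in both files; drop it from one group if that is not what you want.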