Skip to content Skip to sidebar Skip to footer

How Do I Start Pulling Apart This Block Of Json Data?

I'd like to make a program that makes offline copies of math questions from Khan Academy. I have a huge 21.6MB text file that contains data on all of their exercises, but I have no

Solution 1:

Ok, the repeated pattern appears to be this: http://pastebin.com/4nSnLEFZ

To get an idea of the structure of the response, you can use JSONlint to copy/paste portions of your string and 'validate'. Even if the portion you copied is not valid, it will still format it into something you can actually read.

First I have used requests library to pull the JSON for you. It's a super-simple library when you're dealing with things like this. The API is slow to respond because it seems you're pulling everything, but it should work fine.

Once you get a response from the API, you can convert that directly to python objects using .json(). What you have is essentially a mixture of nested lists and dictionaries that you can iterate through and pull specific details. In my example below, my_list2 has to use a try/except structure because it would seem that some of the entries do not have two items in the list under translated_problem_types. In that case, it will just put 'None' instead. You might have to use trial and error for such things.

Finally, since you haven't used JSON before, it's also worth noting that it can behave like a dictionary itself; you are not guaranteed the order in which you receive details. However, in this case, it seems the outermost structure is a list, so in theory it's possible that there is a consistent order but don't rely on it - we don't know how the list is constructed.

import requests

api_call = requests.get('https://www.khanacademy.org/api/v1/exercises')
json_response = api_call.json()

# Assume we first want to list "author name" with "author key"# This should loop through the repeated pattern in the pastebin # access items as a dictionary
my_list1 = []

for item in json_response:
    my_list1.append([item['author_name'], item['author_key']])

print my_list1[0:5]

# Now let's assume we want the 'sha' of the SECOND entry in translated_problem_types# to also be listed with author name

my_list2 = []

for item in json_response:
    try:
        the_second_entry = item['translated_problem_types'][0]['items'][1]['sha']
    except IndexError:
        the_second_entry = 'None'

    my_list2.append([item['author_name'], item['author_key'], the_second_entry])
print my_list2[0:5]

Post a Comment for "How Do I Start Pulling Apart This Block Of Json Data?"