Parse Xml To Pandas Data Frame In Python

Question

I am trying to read an XML file and convert it to a pandas DataFrame. However, it returns empty data. This is a sample of the XML structure:

            .format(instance.tag, ikey, ivalue))
        # Loop inside every instance
        instance_dict = get_children_info(list(instance), instance_dict)
        #consolidator_dict.update({ivalue: instance_dict.copy()})
        consolidator_dict[ivalue] = instance_dict.copy()
    df = pd.DataFrame(consolidator_dict).T
    df = df[df_cols]
    return df

Run the following to generate the desired output.

xml_source = r'grade_data.xml'
df_cols = ["ID", "TaskID", "DataSource", "ProblemDescription", "Question", "Answer",
           "ContextRequired", "ExtraInfoInAnswer", "Comments", "Watch", 'ReferenceAnswers']

df = xml2df(xml_source, df_cols, source_is_file = True)
df
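
Since only the tail of xml2df survives above, here is a rough stand-in for the same idea using only the standard library's xml.etree.ElementTree. It is a minimal sketch, not the original xml2df: the sample XML, its values, and the helper names flatten_element and xml_to_rows are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real grade_data.xml is not shown in the post.
SAMPLE_XML = """
<Instances>
  <Instance ID="1">
    <MetaInfo TaskID="T1" DataSource="demo"/>
    <Question>What is 2 + 2?</Question>
    <Answer>4</Answer>
  </Instance>
  <Instance ID="2">
    <MetaInfo TaskID="T2" DataSource="demo"/>
    <Question>Name a prime number.</Question>
    <Answer>7</Answer>
  </Instance>
</Instances>
"""


def flatten_element(element, row):
    """Copy an element's attributes and text into row, then recurse into children."""
    row.update(element.attrib)
    text = (element.text or "").strip()
    if text:
        # Non-empty element text is stored under the element's tag name.
        row[element.tag] = text
    for child in element:
        flatten_element(child, row)
    return row


def xml_to_rows(xml_text):
    """Return {instance ID: flat dict of fields} for every <Instance> element."""
    root = ET.fromstring(xml_text)
    return {inst.get("ID"): flatten_element(inst, {}) for inst in root.iter("Instance")}


rows = xml_to_rows(SAMPLE_XML)
print(rows["1"])
```

With pandas installed, pd.DataFrame.from_dict(rows, orient="index") should then give one row per instance. Selecting df[df_cols] afterwards only works if every name in df_cols actually appears in the data.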

Method: 2

Given the xml_string, you can convert XML >> dict >> DataFrame. Run the following to get the desired output.

Note: You will need to install xmltodict to use Method 2. This method is inspired by the solution suggested by @martin-blech at "How to convert XML to JSON in Python?". Kudos to @martin-blech for making it.

pip install -U xmltodict
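
For orientation, xmltodict marks attributes with a leading "@" and stores the text of an element that also has attributes under "#text". That is why the function in the Solution strips "@" and looks for "#text". The sketch below uses a hand-written dictionary in that shape (so it runs without xmltodict) and a simplified helper, flatten, which is a hypothetical name and not part of the original answer. It ignores repeated elements, which xmltodict represents as lists.

```python
# Hand-written stand-in for what xmltodict.parse() would return for a
# hypothetical document such as:
#   <Instance ID="1"><MetaInfo TaskID="T1"/>
#     <Comments Watch="1">ok</Comments><Answer>4</Answer></Instance>
parsed = {
    "Instance": {
        "@ID": "1",                                   # attribute: key prefixed with "@"
        "MetaInfo": {"@TaskID": "T1"},                # attributes only: nested dict
        "Comments": {"@Watch": "1", "#text": "ok"},   # attributes plus text: "#text"
        "Answer": "4",                                # text only: plain string
    }
}


def flatten(node, out):
    """Flatten a nested xmltodict-style dict into one level of column -> value."""
    for key, value in node.items():
        name = key.lstrip("@")
        if isinstance(value, dict):
            text = value.get("#text")
            if text is not None:
                # Element text is stored under the element's own name.
                out[name] = text
            # Recurse into the remaining attributes and child elements.
            flatten({k: v for k, v in value.items() if k != "#text"}, out)
        else:
            out[name] = value
    return out


flat = flatten(parsed["Instance"], {})
print(flat)
```

Unlike the function in the Solution, this helper keeps every key it finds instead of consulting df_cols, which keeps the example self-contained.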

Solution

def read_recursively(x, instance_dict):
    #print(x)
    txt = ''
    for key in x.keys():
        k = key.replace("@","")
        if k in df_cols:
            if isinstance(x.get(key), dict):
                instance_dict, txt = read_recursively(x.get(key), instance_dict)
            #else:
            instance_dict.update({k: x.get(key)})
            #print('{}: {}'.format(k, x.get(key)))
        else:
            #print('else: {}: {}'.format(k, x.get(key)))
            # dig deeper if value is another dict
            if isinstance(x.get(key), dict):
                instance_dict, txt = read_recursively(x.get(key), instance_dict)
            # add simple text associated with element
            if k=='#text':
                txt = x.get(key)
        # update text to corresponding parent element
        if (k!='#text') and (txt!=''):
            instance_dict.update({k: txt})
    return (instance_dict, txt)

You will need the function read_recursively() given above. Now run the following.

import json

import pandas as pd
import xmltodict

o = xmltodict.parse(xml_string)  # INPUT: XML_STRING
# print(json.dumps(o))  # uncomment to see the XML converted to a JSON string

consolidated_dict = dict()
oi = o['Instances']['Instance']

for x in oi:
    instance_dict = dict()
    instance_dict, _ = read_recursively(x, instance_dict)
    consolidated_dict.update({x.get("@ID"): instance_dict.copy()})
df = pd.DataFrame(consolidated_dict).T
df = df[df_cols]
df
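
One caveat about the loop above: when the document contains a single Instance element, xmltodict returns one dict for o['Instances']['Instance'] rather than a list. Iterating over it would then yield key strings, and read_recursively would fail on x.keys(). A small normalizing helper avoids this. The sketch below uses hand-written dictionaries in xmltodict's shape so it runs without the library, and as_list is a hypothetical name.

```python
def as_list(node):
    """Wrap a lone dict (or None) so callers can always iterate over a list."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


# Hand-written stand-ins for parsed documents with one and with two instances.
one = {"Instances": {"Instance": {"@ID": "1"}}}
many = {"Instances": {"Instance": [{"@ID": "1"}, {"@ID": "2"}]}}

ids_one = [x.get("@ID") for x in as_list(one["Instances"]["Instance"])]
ids_many = [x.get("@ID") for x in as_list(many["Instances"]["Instance"])]
print(ids_one, ids_many)
```

In the code above this would amount to writing oi = as_list(o['Instances']['Instance']). xmltodict.parse also accepts a force_list argument, for example force_list=('Instance',), which should give the same result at parse time.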
