Create a Pandas DataFrame From a Nested XML File
Here is a small section of an XML file. I would like to create a database from this, with each tag as a unique column name and no duplicated data. Tried using lxml and the best I hav
Solution 1:
Consider nested XPath loops: first loop through every <SRCSGT> node, then collect the text of each node's descendants in an inner dictionary, and append that dictionary to a list that is passed to the DataFrame call:
from lxml import etree as et
import pandas as pd

trees = et.parse('test.xml')
d = []
for srcsgt in trees.xpath('//SRCSGT'):        # ITERATE THROUGH EVERY SRCSGT NODE
    inner = {}
    # LEADING DOT KEEPS THE SEARCH RELATIVE TO THE CURRENT SRCSGT NODE;
    # A BARE '//*' WOULD MATCH EVERY ELEMENT IN THE WHOLE DOCUMENT
    for elem in srcsgt.xpath('.//*'):         # ITERATE THROUGH DESCENDANTS PER NODE
        if elem.text and elem.text.strip():   # KEEP ONLY NODES WITH NON-ZERO LENGTH TEXT
            inner[elem.tag] = elem.text
    d.append(inner)

df = pd.DataFrame(d)
Output
print(df)
#             ADDRESS                          AGENCY  ARCHDATE CLASSCOD  \
# 0  Jigjhgjas@va.gov  Department of Veterans Affairs  12172017        H
#
#                                              CONTACT      DATE  \
# 0  COiyiyS, JUhhiuN<a href="mailto:Juggyui@va.gov...  11112017
#
#                   DESC                                               LINK  \
# 0  CONTRACT SPECIALIST  https://www.fbo.gov/spg/VA/CaVAMC532/CaVAMC532...
#
#                                         LOCATION   NAICS  \
# 0  Department of Veterans Affairs Medical Center  238210
#
#                                               OFFADD            OFFICE  \
# 0  Department of Veterans Affairs;400 Fort Hill A...  Canandaigua VAMC
#
#       PACKAGE RECOVERY_ACT  RESPDATE SETASIDE SOLNBR  \
# 0  Attachment            N  11172017      N/A   9069
#
#                                              SUBJECT    ZIP
# 0  H--3 YEAR TESTING/MAINTENANCE OF ELECTRICAL EQ...  14424
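Since test.xml itself is not shown, here is a minimal self-contained sketch of the same nested-loop idea run against an inline string. The sample XML (including the <NOTICES> root and all values) and the helper name records_to_frame are assumptions made up for illustration; only the SRCSGT tag and the child tag names come from the output above. It uses a relative XPath ('.//*') so each row only picks up text from its own SRCSGT node, and it stores the stripped text rather than the raw text.

```python
from lxml import etree as et
import pandas as pd

# Hypothetical sample data: the <NOTICES> root and all values are invented;
# only the tag names (SRCSGT, DATE, AGENCY, ...) mirror the output above.
SAMPLE_XML = b"""
<NOTICES>
  <SRCSGT>
    <DATE>11112017</DATE>
    <AGENCY>Department of Veterans Affairs</AGENCY>
    <CONTACT>
      <DESC>CONTRACT SPECIALIST</DESC>
    </CONTACT>
  </SRCSGT>
  <SRCSGT>
    <DATE>11122017</DATE>
    <AGENCY>Example Agency</AGENCY>
    <ZIP>14424</ZIP>
  </SRCSGT>
</NOTICES>
"""


def records_to_frame(xml_bytes, record_tag="SRCSGT"):
    """Build one DataFrame row per <record_tag> element (hypothetical helper)."""
    root = et.fromstring(xml_bytes)
    rows = []
    for record in root.xpath(f"//{record_tag}"):
        row = {}
        # './/*' is relative: only descendants of this record are visited.
        for elem in record.xpath(".//*"):
            text = (elem.text or "").strip()  # elem.text is None for empty tags
            if text:                          # skip whitespace-only containers
                row[elem.tag] = text
        rows.append(row)
    return pd.DataFrame(rows)


df = records_to_frame(SAMPLE_XML)
print(df)
```

Tags that appear in one record but not another come out as NaN in the rows that lack them. If the file repeats identical records, calling df.drop_duplicates() on the result is one way to get the non-duplicated rows the question asks for.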