Parse All Xml Files In A Directory Python
Hi I'm trying to parse all XML files in a given directory using python. I am able to parse one file at a time but that would be 'impossible' for me to do due to the large number of
Solution 1:
@Kevin was correct in his comment that this error relates to the ElementTree
object not being able to parse the document correctly. Something is not "true XML
", and it could be something as simple as just an odd, non-unicode character or something.
What you can try to do to help debug is:
import xml.etree.ElementTree as ET
import os
directory = "C:/Users/danie/Desktop/NLP/blogs/"defclean_dir(directory):
path = os.listdir(directory)
print(path)
for filename in path:
try:
tree = ET.parse(filename)
root = tree.getroot()
doc_parser(root)
except:
print("ERROR ON FILE: {}".format(filename))
post_list = []
defdoc_parser(root):
for child in root.findall('post'):
post_list.append(child.text)
clean_dir(directory)
print(post_list[0])
Adding in a try...except
statement will try each of the files, and if there is an error, print out which file is causing the error.
I don't have any data to test, but this should fix the error.
Post a Comment for "Parse All Xml Files In A Directory Python"