
Windows Processes Xml File To Pandas Dataframe?

I would like to convert the results of the following Windows command to a pandas DataFrame. The raw data is generated with this command on a Windows machine: wmic process get Caption, Pr

Solution 1:

Welcome! This was an interesting question. This isn't perfect, but hopefully it helps.

I wanted to try to avoid hard coding any columns of interest.

Assumptions - This file will have a predictable pattern of field names.

I used xml.etree.ElementTree, which I find to be a straightforward library. pandas is needed as well for the DataFrame steps further down.

import xml.etree.ElementTree as ET
import pandas as pd

Reference the XML file

file = '/location/to/file/RunningProcess.xml'

Create a flattened DataFrame. I personally find this easier to work with than pulling everything out while staying entirely within the XML.

First, create a flattened list

tree = ET.parse(file)
root = tree.getroot()

ls_processes = []

for COMMAND in root.iter('COMMAND'):
    for RESULTS in COMMAND.iter('RESULTS'):
        for PROPERTY in RESULTS.iter('PROPERTY'):

            VALUE = PROPERTY.find('VALUE') 

            if VALUE is not None:
                print(PROPERTY.attrib['NAME'],'|',PROPERTY.attrib['TYPE'],'|', VALUE.text )
                ls_processes.append([PROPERTY.attrib['NAME'],PROPERTY.attrib['TYPE'], VALUE.text])
            else:
                print(PROPERTY.attrib['NAME'],'|',PROPERTY.attrib['TYPE'],'|', "NO VALUE")
                ls_processes.append([PROPERTY.attrib['NAME'],PROPERTY.attrib['TYPE'], 'NO VALUE'])

This will print something which looks a bit like this, one line per property (name | type | value):

Caption | string | System Idle Process
CommandLine | string | NO VALUE
CreationDate | datetime | 20191002111400.978894+060
HandleCount | uint32 | 0
KernelModeTime | uint64 | 159488690156250
OtherOperationCount | uint64 | 0 
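
To try the traversal without a real export, here is a minimal sketch that runs the same idea on a small inline XML string. The sample document is an assumption about the file layout (PROPERTY elements with NAME and TYPE attributes and an optional VALUE child, nested somewhere under COMMAND and RESULTS); a real file has many more properties. The names SAMPLE_XML, flatten_properties and ls_sample are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the layout is assumed, not copied from a real export.
SAMPLE_XML = """
<COMMAND>
  <RESULTS>
    <CIM>
      <INSTANCE CLASSNAME="Win32_Process">
        <PROPERTY NAME="Caption" TYPE="string"><VALUE>System Idle Process</VALUE></PROPERTY>
        <PROPERTY NAME="CommandLine" TYPE="string"/>
        <PROPERTY NAME="HandleCount" TYPE="uint32"><VALUE>0</VALUE></PROPERTY>
      </INSTANCE>
    </CIM>
  </RESULTS>
</COMMAND>
""".strip()


def flatten_properties(root):
    """Return one [name, type, value] row per PROPERTY element."""
    rows = []
    # iter() walks the whole tree, so no nested loops are needed here
    for prop in root.iter('PROPERTY'):
        value = prop.find('VALUE')
        # Properties without a VALUE child get the same placeholder as above
        text = value.text if value is not None else 'NO VALUE'
        rows.append([prop.attrib['NAME'], prop.attrib['TYPE'], text])
    return rows


root = ET.fromstring(SAMPLE_XML)
ls_sample = flatten_properties(root)
for name, type_name, text in ls_sample:
    print(name, '|', type_name, '|', text)
```

Using a single root.iter('PROPERTY') instead of three nested loops is a simplification; it gives the same rows as long as PROPERTY elements only occur inside RESULTS.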

Transform the list into a DataFrame

df_processes = pd.DataFrame(ls_processes)

Rename the columns to make the DataFrame easier to work with

df_processes.columns = ['data','type','value']
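
Every value is still a string at this point, and the type column is not used in the later steps. As a rough sketch of how it could be used, the helpers below convert values based on the type names seen in the sample output (string, uint32, uint64, datetime). The function names parse_cim_datetime and convert_value are hypothetical, and the datetime handling assumes the format shown above: yyyymmddHHMMSS.ffffff followed by a sign and a three-digit UTC offset in minutes.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(text):
    """Parse e.g. '20191002111400.978894+060' into an aware datetime."""
    # Assumed layout: 21 characters of timestamp, a sign, offset in minutes
    stamp, sign, offset = text[:21], text[21], text[22:]
    minutes = int(offset) * (1 if sign == '+' else -1)
    naive = datetime.strptime(stamp, '%Y%m%d%H%M%S.%f')
    return naive.replace(tzinfo=timezone(timedelta(minutes=minutes)))


def convert_value(type_name, text):
    """Convert one flattened value using its type label."""
    if text == 'NO VALUE':
        return None
    if type_name.startswith('uint'):
        return int(text)
    if type_name == 'datetime':
        return parse_cim_datetime(text)
    # Anything else (for example 'string') is left unchanged
    return text


# Rows in the same [name, type, value] shape as the flattened list
rows = [
    ['Caption', 'string', 'System Idle Process'],
    ['CommandLine', 'string', 'NO VALUE'],
    ['CreationDate', 'datetime', '20191002111400.978894+060'],
    ['KernelModeTime', 'uint64', '159488690156250'],
]
typed_rows = [[name, t, convert_value(t, value)] for name, t, value in rows]
```

Whether to convert here or after reshaping is a matter of taste; converting row by row keeps the type label right next to the value it describes.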

Create a list of columns of interest

ls_columns = ['Caption', 'ProcessId', 'ParentProcessId', 'CommandLine', 'CreationDate', 'KernelModeTime', 'UserModeTime', 'ThreadCount', 'HandleCount', 'WorkingSetSize', 'PeakWorkingSetSize', 'VirtualSize', 'PeakVirtualSize', 'PageFaults', 'PageFileUsage', 'PeakPageFileUsage', 'ReadOperationCount', 'WriteOperationCount', 'OtherOperationCount']

Create a single-column DataFrame for each column of interest

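# Reuse the name ls_processes, this time for the per-column DataFrames
ls_processes = []
for column in ls_columns:
    print(column)
    # Select every value whose property name matches this column
    values = df_processes.loc[df_processes['data'] == column, 'value']

    # Build from a plain list so each DataFrame gets a fresh 0..n-1 index
    df = pd.DataFrame(values.tolist())
    ls_processes.append(df)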

Concat the Dataframes together by columns

df_processes_flat = pd.concat(ls_processes, axis = 1 ) 

Add the column names using the list previously created

df_processes_flat.columns = ls_columns
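
Since the goal was to avoid hard coding the columns, the reshaping can also be sketched with a pivot instead of a fixed list. This is an alternative to the steps above, not the method used in this answer. It assumes every process emits each property name exactly once and in the same order, so the n-th occurrence of a name belongs to the n-th process. The sample rows and the names df_long and df_wide are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format rows (name, type, value) for two processes
rows = [
    ['Caption', 'string', 'System Idle Process'],
    ['ProcessId', 'uint32', '0'],
    ['Caption', 'string', 'System'],
    ['ProcessId', 'uint32', '4'],
]
df_long = pd.DataFrame(rows, columns=['data', 'type', 'value'])

# Number the occurrences of each property name: 0 for the first process,
# 1 for the second, and so on
df_long['process'] = df_long.groupby('data').cumcount()

# One row per process, one column per property name
df_wide = df_long.pivot(index='process', columns='data', values='value')

# pivot sorts the columns; restore the order of first appearance
df_wide = df_wide[df_long['data'].drop_duplicates().tolist()]
df_wide.columns.name = None

print(df_wide)
```

If only some properties are wanted, selecting them afterwards with df_wide[ls_columns] would give the same shape as the concat approach.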

You'll end up with a DataFrame that has one row per process and one column for each name in ls_columns.

I would say these steps possibly aren't the most elegant, but hopefully it's clear what's going on.

