
Rearrange Data For Pandas Dataframe?

I received a tab-delimited file from a server that outputs answers to questions on a per-respondent basis. I'd like to import the data into a pandas DataFrame where the columns are

Solution 1:

The nontrivial step is delineating each respondent's block. What about rewriting the file so that each line is prefixed with the respondent's ID? For example, for "Anonymous" the ID is "2072".

import re

respondent_id = None
with open('filename') as src, open('new_file', 'w') as f:
    for line in src:
        # line might be like [####] Student_Name or Q-...
        m = re.match(r'\[(\d+)\] .*', line)
        if m:
            # Line is like [####] Student_Name: remember the ID.
            respondent_id = m.group(1)
            continue
        # Line is like Q-...; write a new line like ####<TAB>Q-...
        f.write(respondent_id + '\t' + line)

Then load the rewritten file with pandas read_csv, using the first two columns (respondent ID and question) as the index; together they form a MultiIndex. Finally, call unstack to pivot the question level of the index into columns.
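A minimal end-to-end sketch of both steps, assuming the file looks like the hypothetical sample below: a "[ID] Name" header line per respondent, followed by tab-delimited question/answer lines. The real layout is cut off in the question, so adjust sep and names to match. To stay self-contained it reads an in-memory string through io.StringIO instead of the files above, and the helper name prefix_with_id is made up for illustration.

```python
import io
import re

import pandas as pd

# Hypothetical sample data: the real file layout is an assumption here.
# Each block starts with "[ID] Name", then "question<TAB>answer" lines follow.
RAW = (
    "[2072] Anonymous\n"
    "Q-1\tYes\n"
    "Q-2\tNo\n"
    "[2073] Student_B\n"
    "Q-1\tNo\n"
    "Q-2\tMaybe\n"
)


def prefix_with_id(text):
    """Hypothetical helper: prefix each question line with its respondent's ID."""
    out = []
    respondent_id = None
    for line in text.splitlines(keepends=True):
        m = re.match(r'\[(\d+)\] .*', line)
        if m:
            # Header line: remember the ID, don't write the line itself.
            respondent_id = m.group(1)
            continue
        out.append(respondent_id + '\t' + line)
    return ''.join(out)


prefixed = prefix_with_id(RAW)

# Three columns: respondent ID, question, answer.
# index_col=[0, 1] turns the first two into a MultiIndex.
df = pd.read_csv(
    io.StringIO(prefixed),
    sep='\t',
    header=None,
    names=['respondent', 'question', 'answer'],
    index_col=[0, 1],
)

# unstack() moves the innermost index level (question) into the columns,
# leaving one row per respondent.
wide = df['answer'].unstack()
print(wide)
```

With no arguments, unstack pivots the last index level, which is question here because it was listed second in index_col; pass level='respondent' instead if you want one column per respondent.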

(Full disclosure: I tested the regex, but I haven't tested the whole thing.)

Solution 2:

Here's what worked for me:

import re

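respondent_id = None
with open('filename') as src, open('new_file', 'w') as f:
    for line in src:
        # group() is the whole match, so the ID keeps its brackets, e.g. "[2072]"
        m = re.match(r'\[\d+\]*', line)
        if m:
            respondent_id = m.group()
            continue
        f.write(respondent_id + '\t' + line)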
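The two answers differ mainly in their regular expressions. The short sketch below, using a made-up header line, shows what each one returns: Solution 1 captures only the digits, while Solution 2 has no capture group, so group() returns the bracketed ID. The trailing \]* in Solution 2 means "zero or more ]", so it is looser than it looks.

```python
import re

# Made-up header and question lines for illustration.
header = "[2072] Anonymous"
question = "Q-1\tYes"

# Solution 1: group 1 holds just the digits.
m1 = re.match(r'\[(\d+)\] .*', header)
print(m1.group(1))

# Solution 2: no capture group, so group() is the whole match, brackets included.
# Because of "\]*", a line like "[2072" with no closing bracket would match too.
m2 = re.match(r'\[\d+\]*', header)
print(m2.group())

# Strip the brackets if you want the same bare ID as Solution 1.
print(m2.group().strip('[]'))

# Question lines don't start with "[", so neither pattern matches them.
q = re.match(r'\[\d+\]*', question)
print(q)
```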
