
How Can I Combine Csv Files And Add Header Rows With Python?

I have 50 csv files of price index data from St.Louis Fred, the format of each is like this: And I want to combine multiple csv files and add one more row of header to them to ach

Solution 1:

Repeating the DATE column doesn't make sense unless there is a specific purpose for it. Also, when merging, you want to make sure that the data on a given line all belongs to the same date.

It's better to use pandas: merge with DATE as the key using the OUTER method, so that values from the same date end up on the same line.

from functools import reduce  # reduce() is used for the merge step; on Python 3 it lives in functools

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')

So, basically, load every file as a data frame. Then merge the data frames using the merge function together with reduce.

data_frames = [df1, df2, df3]

You can add as many data frames as you like to the list above.

Then merge them. To keep the values that belong to the same date on the same line, merge on the DATE column.
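With 50 files, one read call per file gets tedious. Here is a minimal sketch of building the list in a loop instead. It assumes every file has a DATE column; the helper name load_frames and the renaming scheme (prefixing each value column with the file's base name) are illustrative choices, not part of the answer above.

```python
import glob
import os

import pandas as pd


def load_frames(pattern):
    """Read every CSV matching `pattern` into a list of data frames.

    Assumes each file has a DATE column. Every other column is prefixed
    with the file's base name so columns stay distinct after merging.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        name = os.path.splitext(os.path.basename(path))[0]
        df = pd.read_csv(path)
        renamed = {c: f"{name}_{c}" for c in df.columns if c != "DATE"}
        frames.append(df.rename(columns=renamed))
    return frames
```

Something like data_frames = load_frames('*.csv') would then replace the hand-written list.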

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)
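To see what the outer merge does when the dates only partly overlap, here is a small sketch on made-up in-memory data (the column names A and B and the values are invented for illustration). Dates missing from one file come out as NaN rather than shifting other rows.

```python
from functools import reduce
from io import StringIO

import pandas as pd

# Two tiny made-up series whose dates only partly overlap.
df_a = pd.read_csv(StringIO("DATE,A\n2020-01-01,1.0\n2020-02-01,2.0\n"))
df_b = pd.read_csv(StringIO("DATE,B\n2020-02-01,20.0\n2020-03-01,30.0\n"))

merged = reduce(
    lambda left, right: pd.merge(left, right, on=["DATE"], how="outer"),
    [df_a, df_b],
)

# Sort explicitly so the row order does not depend on the merge internals.
merged = merged.sort_values("DATE").reset_index(drop=True)
print(merged)
```

The result has one row per distinct date, with values filled in only where the corresponding file had that date.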

Then write the merged data to a CSV file.

df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)

This should give you

DATE VALUE1 VALUE2 VALUE3 ....
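If the extra header row of file names from the question is still wanted, one possible approach (not part of the answer above) is a two-level column header. The sketch below uses made-up in-memory data and assumes each file has a DATE column; the file names a.csv and b.csv are placeholders.

```python
from io import StringIO

import pandas as pd

# Made-up stand-ins for two of the files; real ones would be read from disk.
sources = {
    "a.csv": StringIO("DATE,VALUE\n2020-01-01,1.0\n2020-02-01,2.0\n"),
    "b.csv": StringIO("DATE,VALUE\n2020-02-01,20.0\n2020-03-01,30.0\n"),
}

# Index every frame by DATE so that concat aligns rows by date.
frames = {name: pd.read_csv(src, index_col="DATE") for name, src in sources.items()}

# Passing a dict makes its keys the top level of a two-level column header.
combined = pd.concat(frames, axis=1).sort_index()

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = combined.to_csv(na_rep=".")
print(csv_text)
```

to_csv writes both header levels, so the file-name row sits above the column-name row, and DATE appears once as the index instead of once per file.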

Solution 2:

This will concatenate all the files in the provided directory side by side, each file's columns placed next to the previous file's (so you don't have to specify the files in code). The files can have any number of columns, and spaces in the values are handled. However, the files must all have the same number of rows.

It uses only the csv and os modules.

import os
import csv

dir_base = r'H:\apps\xp\Desktop\localrepo\Temp'
dir_name = '-test2'
output_name = 'output.csv'

path = os.path.join(dir_base, dir_name)
out_path = os.path.join(dir_base, output_name)


def _extend(lines, lineno, line):
    try:
        lines[lineno].extend(line)
    except IndexError:
        lines.append(line)


def main():
    lines = []

    # read and generate new file
    for root, dirs, files in os.walk(path):
        for f in files:
            with open(os.path.join(root, f), 'r') as csvfile:
                f_in = csv.reader(csvfile)
                for lineno, line in enumerate(f_in, start=1):
                    if lineno == 1:
                        header = [''] * len(line)
                        header[0] = f
                        _extend(lines, 0, header)
                    _extend(lines, lineno, line)

    # print new file
    with open(out_path, 'w', newline='\n') as csvfile:
        csv.writer(csvfile).writerows(lines)


if __name__ == '__main__':
    main()

The output has one extra header row with each file's name above that file's first column. Below it, the rows of all the files sit side by side.

If your "csv" files use another delimiter (and so are not technically comma-separated), pass it to the reader: change csv.reader(csvfile) to, e.g., csv.reader(csvfile, delimiter='|').
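As a quick check of the delimiter argument, here is a small sketch that parses a made-up pipe-delimited sample held in memory instead of a real file.

```python
import csv
from io import StringIO

# Made-up pipe-delimited sample standing in for a real file.
sample = StringIO("DATE|VALUE\n2020-01-01|1.5\n")

# delimiter tells the reader which character separates the fields.
rows = list(csv.reader(sample, delimiter="|"))
print(rows)  # [['DATE', 'VALUE'], ['2020-01-01', '1.5']]
```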

Hope it helps!

Solution 3:

Pandas is a great solution, but if you want a Python standard-library solution:

import csv
from itertools import chain

csv_input_filenames = [
    'csvfile1.csv',
    'csvfile2.csv',
    'csvfile3.csv',
]
csv_output_filename = 'csv_out.csv'

# get the csv data
csv_files = [open(file_name) for file_name in csv_input_filenames]
csv_handles = [csv.reader(csv_file) for csv_file in csv_files]
rows = (list(chain(*row)) for row in zip(*csv_handles))

# write combined output (text mode with newline='' for Python 3; 'wb' only works on Python 2)
with open(csv_output_filename, 'w', newline='') as csv_file:
    filenames_header = list(chain(
        *zip(csv_input_filenames, [''] * len(csv_input_filenames))))

    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(filenames_header)

    for row in rows:
        csv_writer.writerow(row)

# close input files
for csv_file in csv_files:
    csv_file.close()
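One caveat with zip is that it stops at the shortest input, so rows beyond the shortest file are silently dropped. If the files might differ in length, a possible variation (an assumption on my part, not part of the answer above) is itertools.zip_longest with blank padding. The sketch uses made-up in-memory data and assumes two columns per file, as the file-name header above does.

```python
import csv
from io import StringIO
from itertools import chain, zip_longest

# Made-up in-memory "files" of different lengths, two columns each.
file_a = StringIO("DATE,VALUE\n2020-01-01,1\n2020-02-01,2\n")
file_b = StringIO("DATE,VALUE\n2020-01-01,10\n")

readers = [csv.reader(f) for f in (file_a, file_b)]
width = 2  # assumed number of columns per file

# Exhausted files contribute a row of empty strings instead of ending the loop.
rows = [
    list(chain.from_iterable(group))
    for group in zip_longest(*readers, fillvalue=[""] * width)
]

for row in rows:
    print(row)
```

Note that this only pads missing rows at the end. It does not align rows by date, so files with gaps in the middle still need the pandas merge.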
