
Modifying Multiple .csv Files From Same Directory In Python

I need to modify multiple .csv files in my directory. Is it possible to do it with a simple script? My .csv columns are in this order: X_center,Y_center,X_Area,Y_Area,Classifica

Solution 1:

First off, I think your problem lies in opening '*.csv' inside the loop instead of opening file, the name of the current file. open() does not expand wildcards, so '*.csv' is treated as a literal file name. Beyond that, I would recommend never overwriting your original input files; it is much safer to write copies to a new directory. Here's a modified version of your script that does that.

import os
import sys
import csv
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True)
ap.add_argument("-o", "--output", required=True)
args = vars(ap.parse_args())

# Refuse to run unless the output directory already exists.
if os.path.isdir(args["output"]):
    print("Writing to {}".format(args["output"]))
else:
    print("Cannot write to directory {}".format(args["output"]))
    sys.exit(1)

# New column order: Classification moves to the front.
fieldnames = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']

for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        print("{} ...".format(file))
        in_path = os.path.join(args["input"], file)
        out_path = os.path.join(args["output"], file)
        # newline='' lets the csv module handle line endings itself.
        with open(in_path, 'r', newline='') as infile, open(out_path, 'w', newline='') as outfile:
            writer = csv.DictWriter(outfile, fieldnames=fieldnames)
            writer.writeheader()
            # DictReader keys each row by header name, so DictWriter
            # can emit the columns in the new order.
            for row in csv.DictReader(infile):
                writer.writerow(row)
        # No explicit close() needed: the with block closes both files.

To use it, create a new directory for your outputs and then run like so:

python this.py -i input_dir -o output_dir

Note: From your question, you seemed to want each file modified in place. This script does essentially that (it outputs a file of the same name, just in a different directory) but leaves your inputs unharmed. If you actually wanted all the files reordered into a single file, as the open('reordered.csv', 'a') in your code implies, you could do that by moving the output initialization code (opening the output file, creating the writer, and writing the header) so that it runs once, before entering the loop.
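As a rough sketch of that single-file variant: the function name merge_reordered and its parameters are made up for illustration and are not part of the original script. It assumes every input file has a header row containing only the five columns from the question.

```python
import csv
import os

FIELDNAMES = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']


def merge_reordered(input_dir, output_path, fieldnames=FIELDNAMES):
    """Write the rows of every .csv in input_dir into one reordered file."""
    # Output initialization happens once, before the loop over input files.
    with open(output_path, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for name in sorted(os.listdir(input_dir)):
            in_path = os.path.join(input_dir, name)
            if not name.endswith('.csv'):
                continue
            # Skip the output file itself if it lives in the input directory.
            if os.path.abspath(in_path) == os.path.abspath(output_path):
                continue
            with open(in_path, 'r', newline='') as infile:
                for row in csv.DictReader(infile):
                    writer.writerow(row)
```

Calling merge_reordered('input_dir', 'reordered.csv') would then produce one file with a single header line. Opening the output in 'w' mode rather than 'a' means a rerun replaces the merged file instead of appending a second copy of every row.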


Solution 2:

Using pandas & pathlib.

from pathlib import Path  # available in Python 3.4+
import pandas as pd

csv_dir = r'c:\path\to\csvs'  # raw string for Windows paths
csv_files = list(Path(csv_dir).glob('*.csv'))  # all CSVs in the folder

cols = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']

for csv_path in csv_files:  # iterate over the list
    df = pd.read_csv(csv_path)  # read the CSV
    # csv_path.name is the bare file name, so this writes to the current
    # working directory, not necessarily back to csv_dir.
    df[cols].to_csv(csv_path.name, index=False)
    print(f'{csv_path.name} saved.')

Naturally, if a CSV lacks any of those columns, this code will fail (df[cols] raises a KeyError); you can add a try/except if that's a possibility.
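One way the try/except might look is sketched below. The function reorder_all, its out_dir parameter, and the choice to skip and report bad files are all assumptions made for illustration; the answer itself does not say how failures should be handled.

```python
from pathlib import Path

import pandas as pd

COLS = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']


def reorder_all(src_dir, out_dir, cols=COLS):
    """Reorder each CSV in src_dir into out_dir; return names of skipped files."""
    out_dir = Path(out_dir)
    skipped = []
    for csv_path in sorted(Path(src_dir).glob('*.csv')):
        df = pd.read_csv(csv_path)
        try:
            reordered = df[cols]  # raises KeyError if a column is missing
        except KeyError as err:
            print(f'{csv_path.name} skipped: {err}')
            skipped.append(csv_path.name)
            continue
        reordered.to_csv(out_dir / csv_path.name, index=False)
        print(f'{csv_path.name} saved.')
    return skipped
```

Returning the skipped names lets the caller decide whether a missing column is fatal or just worth logging. The output directory is assumed to exist already.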

