
For Loop Across Multiple Folders To Resample Datetime In Multiple Csv Files And Export With File Name Automatically Generated

I have many dataframes (csv files) located in various folders within my documents on my computer. All csv files have the same number of columns, where the name of each column is th

Solution 1:

It's hard to tell exactly where your problem is :)

But Python has os.walk, which walks a directory tree recursively. Here is an example:

import os

root_directory = '/home/xyz/some_root_dir/'

def is_csv(fname):
    return fname.endswith('.csv')

csv_files = []

for directory, subdirectories, files_names in os.walk(root_directory):
    for fname in files_names:
        if is_csv(fname):
            csv_files.append(
                {
                    'directory': directory,
                    'fname': fname
                }
            )

print(csv_files)

And this is the output in my test case:

[
    {'directory': '/home/xyz/some_root_dir', 'fname': 'my.csv'},
    {'directory': '/home/xyz/some_root_dir/test2/test31', 'fname': 'myohter3.csv'},
    {'directory': '/home/xyz/some_root_dir/test2/test31', 'fname': 'myohter.csv'}
]

This will collect all the CSV files; you can modify the is_csv function to suit your needs. I'm not sure I can help you much with aggregating the data :)

But once you have read all the data, it shouldn't be much of a problem.
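If you do want to adapt is_csv, here is a minimal sketch, assuming you want case-insensitive matching (so files named like DATA.CSV are picked up too) and full paths instead of directory/name pairs. find_csv_files is a hypothetical helper name, not part of any library:

```python
import os


def is_csv(fname):
    """Match .csv files regardless of case (e.g. 'my.csv' or 'DATA.CSV')."""
    return fname.lower().endswith('.csv')


def find_csv_files(root_directory):
    """Return the full path of every CSV file found under root_directory."""
    paths = []
    for directory, _subdirectories, file_names in os.walk(root_directory):
        for fname in file_names:
            if is_csv(fname):
                # join folder and name so each file gets a unique path
                paths.append(os.path.join(directory, fname))
    return sorted(paths)
```

Working with full paths also gives you a unique key per file, even when two folders contain files with the same name. Note that os.walk silently yields nothing for a directory that does not exist, so an empty result may just mean a wrong root path.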

OK, now the fun begins. I wrote this quickly and it could probably be written better, but it is a good starting point. We have the file list from the previous step, so let's take the next steps:

import csv
import os
from datetime import datetime

data = {}

# gather the data
for fdata in csv_files:
    path = os.path.join(fdata['directory'], fdata['fname'])
    with open(path, 'r', newline='') as f:
        reader = csv.reader(f, delimiter='|', quotechar='"')
        rows = list(reader)
        # key by full path: files with the same name in different folders
        # would otherwise overwrite each other
        data[path] = rows  # we can't store it per datetime here, because we could lose data

# ok, we now have the data in this format:
# {
#     '/home/xyz/some_root_dir/test2/test31/myohter3.csv': [
#         ['Datetime', 'Egen1_NotCum_kWh', 'Egen2_NotCum_kWh', 'Egen3_NotCum_kWh'],
#         ['2016-09-04 13:45:00', '643.23', '649', '654'],
#         ['2016-09-04 14:00:00', '612.21', '672', '666'],
#         ['2016-09-04 14:15:00', '721.3', '719', '719'],
#         ['2016-09-04 14:30:00', '730', '721', '725'],
#         ['2016-09-04 14:45:00', '745', '725', '731']],
#     '/home/xyz/some_root_dir/my.csv': ...
# }

# convert the string data to Python datetime objects

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

for inner_data in data.values():
    for row in inner_data[1:]:  # skip headers
        p_datetime = datetime.strptime(row[0], DATETIME_FORMAT)
        row[0] = p_datetime

# now the aggregates
def get_all_rows_in_dates(start_date, end_date, data):
    # all files share the same columns, so take the header row of any file
    headers = next(iter(data.values()))[0]
    data_rows = []
    for inner_data in data.values():
        for row in inner_data[1:]:  # skip the header
            if start_date <= row[0] < end_date:
                data_rows.append(row)

    return headers, data_rows

def aggregate_col_12(values):
    # columns 1 and 2: sum of the values
    values = [float(v) for v in values]
    return sum(values)

def aggregate_col_3(values):
    # column 3: mean of the values
    values = [float(v) for v in values]
    return sum(values) / len(values)

def count_aggregates(rows_in_dates, start_date):
    col1 = []
    col2 = []
    col3 = []
    for row in rows_in_dates:  # data rows only, the header was already skipped
        col1.append(row[1])
        col2.append(row[2])
        col3.append(row[3])
    return [start_date.strftime(DATETIME_FORMAT),
        aggregate_col_12(col1), aggregate_col_12(col2), aggregate_col_3(col3)]


def write_results(headers, aggregate, fname):
    data = []
    data.append(headers)
    data.append(aggregate)
    with open(fname, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerows(data)


start_date = datetime(2016, 9, 4, 13, 0, 0)
end_date = datetime(2016, 9, 4, 14, 0, 0)

headers, to_aggregate = get_all_rows_in_dates(
    start_date,
    end_date,
    data)

aggregates = count_aggregates(to_aggregate, start_date)
write_results(headers, aggregates, 'from_{}_to_{}.csv'.format(
    start_date.strftime(DATETIME_FORMAT),
    end_date.strftime(DATETIME_FORMAT),
))
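The generated file name above contains a space and colons (it reuses DATETIME_FORMAT). That works on Linux, but Windows does not allow colons in file names. A minimal sketch of a filename-safe variant, assuming a separate, hypothetical FILENAME_FORMAT and a hypothetical output_name helper:

```python
import os
from datetime import datetime

# Hypothetical format for file names: no spaces and no colons
FILENAME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def output_name(start_date, end_date, directory='.'):
    """Build a results path like from_<start>_to_<end>.csv inside directory."""
    fname = 'from_{}_to_{}.csv'.format(
        start_date.strftime(FILENAME_FORMAT),
        end_date.strftime(FILENAME_FORMAT),
    )
    return os.path.join(directory, fname)


example = output_name(datetime(2016, 9, 4, 13, 0, 0), datetime(2016, 9, 4, 14, 0, 0))
print(example)
```

The directory argument also lets you write each result next to its source file, for example by passing os.path.dirname of the input path.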

Take care to use the appropriate delimiter and quotechar for your files. This is only a beginning, but you can use it as a starting point: a daily aggregate should be achievable with this code, but if you want, for example, a CSV with one row per second for an hour, you will need to wrap it in a loop over the time intervals.
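As a rough sketch of that wrapping step, assuming the rows are already parsed (datetime in column 0, header removed) and using hourly windows purely as an example, you can step through consecutive windows with datetime.timedelta and emit one aggregate row per non-empty window. iter_windows and resample_rows are hypothetical helper names; the sum/sum/mean rule mirrors aggregate_col_12 and aggregate_col_3 above:

```python
from datetime import datetime, timedelta

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def iter_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    current = start
    while current < end:
        yield current, min(current + step, end)
        current += step


def resample_rows(rows, start, end, step):
    """Aggregate rows per time window: sum columns 1 and 2, average column 3.

    Each row is [datetime, value1, value2, value3] with the values still as
    strings. Empty windows are skipped, which also avoids dividing by zero.
    """
    result = []
    for window_start, window_end in iter_windows(start, end, step):
        in_window = [r for r in rows if window_start <= r[0] < window_end]
        if not in_window:
            continue
        col1 = [float(r[1]) for r in in_window]
        col2 = [float(r[2]) for r in in_window]
        col3 = [float(r[3]) for r in in_window]
        result.append([window_start, sum(col1), sum(col2), sum(col3) / len(col3)])
    return result


# Sample rows taken from the data format shown in the comments above
sample = [
    ['2016-09-04 13:45:00', '643.23', '649', '654'],
    ['2016-09-04 14:00:00', '612.21', '672', '666'],
    ['2016-09-04 14:15:00', '721.3', '719', '719'],
    ['2016-09-04 14:30:00', '730', '721', '725'],
    ['2016-09-04 14:45:00', '745', '725', '731'],
]
rows = [[datetime.strptime(r[0], DATETIME_FORMAT)] + r[1:] for r in sample]

# One output row per hour between 13:00 and 15:00
hourly = resample_rows(rows, datetime(2016, 9, 4, 13), datetime(2016, 9, 4, 15),
                       timedelta(hours=1))
for row in hourly:
    print(row[0].strftime(DATETIME_FORMAT), row[1:])
```

Changing step to timedelta(days=1) gives daily rows, and timedelta(seconds=1) gives per-second rows. The resulting list can be written out with csv.writer(...).writerows(...) after formatting the first column with strftime.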

If you have any questions, please ask.
