For Loop Across Multiple Folders To Resample Datetime In Multiple Csv Files And Export With File Name Automatically Generated
Solution 1:
It's hard to tell exactly where your problem is :)
But Python has os.walk, which walks a directory tree recursively. Here is an example:
import os

root_directory = '/home/xyz/some_root_dir/'


def is_csv(fname):
    return fname.endswith('.csv')


csv_files = []
for directory, subdirectories, files_names in os.walk(root_directory):
    for fname in files_names:
        if is_csv(fname):
            csv_files.append(
                {
                    'directory': directory,
                    'fname': fname
                }
            )
print(csv_files)
And this is the output in my test case:
[
{'directory': '/home/xyz/some_root_dir', 'fname': 'my.csv'},
{'directory': '/home/xyz/some_root_dir/test2/test31', 'fname': 'myohter3.csv'},
{'directory': '/home/xyz/some_root_dir/test2/test31', 'fname': 'myohter.csv'}
]
This will certainly help you collect all the csv files, and you can modify the is_csv function to suit your needs. I am not able to help you with aggregating the data :)
But once you have read all the data, it shouldn't be much of a problem.
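As an aside, the same file list could probably be built with pathlib instead of os.walk. This is a minimal sketch, assuming Python 3; the helper name find_csv_files is hypothetical and not part of the original answer. It returns the same list of dicts, so the rest of the code would not need to change.

```python
from pathlib import Path


def find_csv_files(root_directory):
    """Return a list of {'directory', 'fname'} dicts, like the os.walk loop."""
    found = []
    # rglob('*.csv') matches files in root_directory and in all subdirectories
    for path in sorted(Path(root_directory).rglob('*.csv')):
        if path.is_file():
            found.append({'directory': str(path.parent), 'fname': path.name})
    return found
```

Note that the pattern is matched case-sensitively on most Linux file systems, so files ending in '.CSV' would be skipped, just as with is_csv.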
OK, now the fun begins. I wrote this very quickly and it can probably be written better, but it is a good starting point. We have the file list from the previous step, so let's take the next steps:
import csv
import os
from datetime import datetime

data = {}

# gather the data
for fdata in csv_files:
    with open(os.path.join(fdata['directory'], fdata['fname']), 'r') as f:
        reader = csv.reader(f, delimiter='|', quotechar='"')
        rows = list(reader)
        # we can't store it per datetime here, because we could lose data
        data[fdata['fname']] = rows

# ok, we now have the data in this format:
# {
#     'other3.csv': [
#         ['Datetime', 'Egen1_NotCum_kWh', 'Egen2_NotCum_kWh', 'Egen3_NotCum_kWh'],
#         ['2016-09-04 13:45:00', '643.23', '649', '654'],
#         ['2016-09-04 14:00:00', '612.21', '672', '666'],
#         ['2016-09-04 14:15:00', '721.3', '719', '719'],
#         ['2016-09-04 14:30:00', '730', '721', '725'],
#         ['2016-09-04 14:45:00', '745', '725', '731']],
#     'my.csv': ...
# }

# convert the string data to Python datetime objects
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"
for fname, inner_data in data.items():  # items() works on Python 2 and 3
    for row in inner_data[1:]:  # skip headers
        p_datetime = datetime.strptime(row[0], DATETIME_FORMAT)
        row[0] = p_datetime
# now the aggregates
def get_all_rows_in_dates(start_date, end_date, data):
    # take the header row from the first file; all files are assumed to share it
    headers = next(iter(data.values()))[0]
    data_rows = []
    for fname, inner_data in data.items():
        for row in inner_data[1:]:  # skip the header
            if start_date <= row[0] < end_date:
                data_rows.append(row)
    return headers, data_rows
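def aggregate_col_12(values):
    # columns 1 and 2 are summed
    values = [float(v) for v in values]
    return sum(values)


def aggregate_col_3(values):
    # column 3 is averaged; a list (not a map object) is needed for len()
    values = [float(v) for v in values]
    return sum(values) / float(len(values))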
def count_aggregates(rows_in_dates, start_date):
    col1 = []
    col2 = []
    col3 = []
    # rows_in_dates holds data rows only (the header is returned separately),
    # so no row is skipped here
    for row in rows_in_dates:
        col1.append(row[1])
        col2.append(row[2])
        col3.append(row[3])
    return [start_date.strftime(DATETIME_FORMAT),
            aggregate_col_12(col1), aggregate_col_12(col2), aggregate_col_3(col3)]
def write_results(headers, aggregate, fname):
    data = []
    data.append(headers)
    data.append(aggregate)
    # newline='' (Python 3) stops the csv module from writing blank lines on Windows
    with open(fname, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerows(data)
start_date = datetime(2016, 9, 4, 13, 0, 0)
end_date = datetime(2016, 9, 4, 14, 0, 0)
headers, to_aggregate = get_all_rows_in_dates(
    start_date,
    end_date,
    data)
aggregates = count_aggregates(to_aggregate, start_date)
write_results(headers, aggregates, 'from_{}_to_{}.csv'.format(
    start_date.strftime(DATETIME_FORMAT),
    end_date.strftime(DATETIME_FORMAT),
))
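One caveat with the generated file name: DATETIME_FORMAT contains spaces and colons, and colons are not allowed in file names on Windows. Below is a minimal sketch of a safer name, assuming a compact timestamp is acceptable for your use case. FILENAME_FORMAT and build_output_name are hypothetical names, not from the code above.

```python
from datetime import datetime

# assumed format for file names: no spaces and no colons
FILENAME_FORMAT = "%Y%m%d_%H%M%S"


def build_output_name(start_date, end_date):
    """Build a csv file name from the start and end of the aggregation window."""
    return 'from_{}_to_{}.csv'.format(
        start_date.strftime(FILENAME_FORMAT),
        end_date.strftime(FILENAME_FORMAT),
    )


print(build_output_name(datetime(2016, 9, 4, 13, 0, 0),
                        datetime(2016, 9, 4, 14, 0, 0)))
# prints: from_20160904_130000_to_20160904_140000.csv
```

The result could be passed to write_results in place of the inline format call.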
Take care to use the delimiter and quotechar that match your files. This is only the beginning, and you can use it as a starting point. A daily aggregate should be achievable with this code, but if you want, for example, a csv with one row per second for an hour, you will need to wrap it in some additional code.
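The wrapping could look roughly like this: instead of one fixed start_date/end_date window, group the rows into buckets and aggregate each bucket. This is a minimal sketch using only the standard library. It assumes row[0] has already been converted to a datetime, it keeps the sum/sum/average choice of the code above, and aggregate_by_hour is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_hour(rows):
    """Group rows by the hour of row[0] and aggregate each group.

    Columns 1 and 2 are summed and column 3 is averaged.
    """
    buckets = defaultdict(list)
    for row in rows:
        # truncate the timestamp to the start of its hour
        bucket_start = row[0].replace(minute=0, second=0, microsecond=0)
        buckets[bucket_start].append(row)

    results = []
    for bucket_start in sorted(buckets):
        group = buckets[bucket_start]
        col1 = sum(float(r[1]) for r in group)
        col2 = sum(float(r[2]) for r in group)
        col3 = sum(float(r[3]) for r in group) / len(group)
        results.append([bucket_start, col1, col2, col3])
    return results


# sample rows taken from the data format shown earlier
sample_rows = [
    [datetime(2016, 9, 4, 13, 45), '643.23', '649', '654'],
    [datetime(2016, 9, 4, 14, 0), '612.21', '672', '666'],
    [datetime(2016, 9, 4, 14, 15), '721.3', '719', '719'],
]
for line in aggregate_by_hour(sample_rows):
    print(line)
```

For a daily aggregate, the same idea applies with the hour truncated as well (hour=0 in the replace call). Each result row can then be written out with csv.writer, as in write_results.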
If you have any questions, please ask.