
How To Scrape .csv Files From A Url, When They Are Saved In A .zip File In Python?

I am trying to scrape some .csv files from a website. I currently have a list of links: master_links = [ 'http://mis.nyiso.com/public/csv/damlbmp/20161201damlbmp_zone_csv.zip'

Solution 1:

You can do this by downloading each archive into memory, opening it with zipfile, and passing each member to pandas.read_csv(), for example:

Code:

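def fetch_multi_csv_zip_from_url(url, filenames=(), *args, **kwargs):
    # The archive is unpacked here, so pandas must not try to decompress it again
    assert kwargs.get('compression') is None
    # Download the whole .zip into memory and open it without touching disk
    req = urlopen(url)
    zip_file = zipfile.ZipFile(BytesIO(req.read()))

    if filenames:
        # Only read the requested members, failing loudly on a bad name
        names = zip_file.namelist()
        for filename in filenames:
            if filename not in names:
                raise ValueError(
                    'filename {} not in {}'.format(filename, names))
    else:
        # No names given: read every member of the archive
        filenames = zip_file.namelist()

    # Map each member name to its DataFrame; extra args go to pandas.read_csv
    return {name: pd.read_csv(zip_file.open(name), *args, **kwargs)
            for name in filenames}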

Relevant docs: zipfile.ZipFile, io.BytesIO, urllib.request.urlopen
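The core mechanism can be checked without any network access: build a small zip archive in memory, wrap its bytes in io.BytesIO, and hand each member to pandas.read_csv(). This is a minimal sketch, assuming the archive holds plain comma-separated files. The helper names make_zip_bytes and read_zip_csvs, the member name, and the sample rows (copied from the results below) are illustrative, not part of the original answer.

```python
import io
import zipfile

import pandas as pd

# Illustrative CSV text in the same layout as the zone files (hypothetical sample).
SAMPLE_CSV = (
    "Time Stamp,Name,PTID,LBMP($/MWHr)\n"
    "12/01/2016 00:00,CAPITL,61757,21.94\n"
    "12/01/2016 00:00,CENTRL,61754,16.85\n"
)


def make_zip_bytes(members):
    """Hypothetical helper: build a zip archive in memory from {name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buffer.getvalue()


def read_zip_csvs(data, **kwargs):
    """Hypothetical helper: parse every member of in-memory zip bytes with read_csv."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # The comprehension runs inside the with block, so every member is read
        # before the archive is closed.
        return {name: pd.read_csv(zf.open(name), **kwargs) for name in zf.namelist()}


archive = make_zip_bytes({"sample_zone.csv": SAMPLE_CSV})
frames = read_zip_csvs(archive)
print(frames["sample_zone.csv"])
```

Because zipfile.ZipFile only needs a seekable file-like object, the downloaded bytes never have to be written to disk.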

Test Code:

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
from io import BytesIO
import zipfile
import pandas as pd

master_links = [
    'http://mis.nyiso.com/public/csv/damlbmp/20161201damlbmp_zone_csv.zip',
    'http://mis.nyiso.com/public/csv/damlbmp/20160301damlbmp_zone_csv.zip',
    'http://mis.nyiso.com/public/csv/damlbmp/20160201damlbmp_zone_csv.zip']

dfs = fetch_multi_csv_zip_from_url(master_links[0])
print(dfs['20161201damlbmp_zone.csv'].head())
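The test code above only reads master_links[0]. To pull every link in the list and stack the results into one table, one option is to loop over the URLs and combine the pieces with pandas.concat. This is a minimal Python 3 sketch, assuming every CSV shares the same columns. load_all_links and fetch_bytes are hypothetical names, and the download function is passed in as a parameter so the parsing can be exercised offline. Extra keyword arguments such as parse_dates=['Time Stamp'] are forwarded to pandas.read_csv, the same way fetch_multi_csv_zip_from_url forwards **kwargs.

```python
import io
import zipfile
from urllib.request import urlopen

import pandas as pd


def fetch_bytes(url):
    """Hypothetical helper: download a URL and return its raw bytes."""
    with urlopen(url) as resp:
        return resp.read()


def load_all_links(urls, fetch=fetch_bytes, **read_csv_kwargs):
    """Hypothetical helper: read every CSV in every zip at `urls` into one DataFrame.

    `fetch` is injectable so the parsing can be tested without network access;
    `read_csv_kwargs` are passed straight through to pandas.read_csv.
    """
    frames = []
    for url in urls:
        with zipfile.ZipFile(io.BytesIO(fetch(url))) as zf:
            for name in zf.namelist():
                df = pd.read_csv(zf.open(name), **read_csv_kwargs)
                df["source_file"] = name  # remember which CSV each row came from
                frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Usage (requires network access):
# combined = load_all_links(master_links, parse_dates=["Time Stamp"])
```

Tagging each row with its source file keeps the days distinguishable after concatenation, even if a timestamp column is not parsed.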

Results:

         Time Stamp    Name   PTID  LBMP($/MWHr)  \
0  12/01/2016 00:00  CAPITL  61757         21.94
1  12/01/2016 00:00  CENTRL  61754         16.85
2  12/01/2016 00:00  DUNWOD  61760         20.85
3  12/01/2016 00:00  GENESE  61753         16.16
4  12/01/2016 00:00     H Q  61844         15.73

   Marginal Cost Losses($/MWHr)  Marginal Cost Congestion($/MWHr)
0                          1.21                             -4.45
1                          0.11                             -0.45
2                          1.58                             -2.99
3                         -0.49                             -0.36
4                         -0.55                              0.00
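When an archive contains exactly one CSV, pandas can also do the decompression itself: with the default compression='infer', read_csv picks zip from a .zip extension and reads the single member. It raises an error if the archive holds more than one file, which is where a custom reader like fetch_multi_csv_zip_from_url is useful. The helper presumably asserts compression is None because it hands pandas members that are already decompressed. Below is a minimal sketch that uses a temporary local file so it runs offline. The file and member names are hypothetical. Passing one of the master_links URLs directly should behave the same way in recent pandas versions, but that is an assumption worth checking against your installed version.

```python
import os
import tempfile
import zipfile

import pandas as pd

CSV_TEXT = "Name,PTID\nCAPITL,61757\nCENTRL,61754\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "single_zone_csv.zip")  # hypothetical file name
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("single_zone.csv", CSV_TEXT)  # exactly one member
    # compression="infer" (the default) detects zip from the ".zip" extension.
    single = pd.read_csv(path)

print(single)
```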
