Create a DataFrame from an HTML Table in Python

I'm trying to extract info from multiple tables, like the one below. I'm trying to extract the address, lot number, guide price, description - should I simply do a regular expressi

Solution 1:

Use Beautiful Soup to parse the HTML.

Using your posted HTML as an example:

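from bs4 import BeautifulSoup

# html holds the posted markup; naming the parser avoids bs4's "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")

# Print the stripped text of every <p> element
s = soup.find_all("p")
for ele in s:
    print(ele.text.strip())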

Description Leasehold 2nd Floor Studio Flat Unmodernised Vacant Guide Price
                               £450,000 Plus Lot Number 2 Auctioneer Savills (London - National)
Vendor Housing Association Auction Date 28 October 2014 Lease Details 125 Yr, commencing 01/01/2013 (GR250.PA)

Solution 2:

If the tables are reasonably well formed, you can use pandas' read_html function. It returns a list of DataFrames, one for each table found.

pandas.read_html(html_string_or_url)
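As a rough sketch of what that call looks like end to end, the snippet below runs read_html on a small table. The markup, column names and second row are invented for illustration and are not the asker's HTML. It also assumes an HTML parser backend that pandas can use (lxml, or Beautiful Soup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table, made up for this example (not the asker's HTML).
html = """
<table>
  <thead>
    <tr><th>Lot Number</th><th>Guide Price</th><th>Description</th></tr>
  </thead>
  <tbody>
    <tr><td>2</td><td>£450,000 Plus</td><td>Leasehold 2nd Floor Studio Flat</td></tr>
    <tr><td>3</td><td>£300,000 Plus</td><td>Freehold House</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found.
# Wrapping the string in StringIO avoids the deprecation warning that
# recent pandas versions emit when given a literal HTML string.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

If the page holds several tables, each one is a separate element of the returned list, so you would pick out the ones you need by index or by inspecting their columns.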

If pandas can't read the tables, you need to parse them manually with an HTML parser library such as Beautiful Soup.
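One possible shape for the manual route is sketched below. It assumes the details sit in two-cell rows of label and value, which is a guess at the layout rather than something the question confirms; the markup is hypothetical, and the selectors would need adjusting to the real page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: each row holds a label cell and a value cell.
html = """
<table>
  <tr><td>Lot Number</td><td>2</td></tr>
  <tr><td>Guide Price</td><td>£450,000 Plus</td></tr>
  <tr><td>Auctioneer</td><td>Savills (London - National)</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect label -> value pairs from every two-cell row.
record = {}
for row in soup.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if len(cells) == 2:
        label, value = cells
        record[label] = value

# One dict becomes one DataFrame row; for multiple tables, build one
# dict per table and pass the whole list to pd.DataFrame.
df = pd.DataFrame([record])
print(df)
```

Building a list of dicts keeps the parsing step independent of pandas, so rows with missing labels simply end up as NaN in the resulting DataFrame.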