Create A Dataframe From Html Table In Python
I'm trying to extract info from multiple tables, like the one below. I'm trying to extract the address, lot number, guide price, description - should I simply do a regular expressi
Solution 1:
Use BeautifulSoup to parse the HTML.
Using the HTML you posted as an example:
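from bs4 import BeautifulSoup

# html holds the HTML string posted in the question
soup = BeautifulSoup(html, "html.parser")
# Print the stripped text of every <p> element
for ele in soup.find_all("p"):
    print(ele.text.strip())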
Output:
Description Leasehold 2nd Floor Studio Flat Unmodernised Vacant Guide Price
£450,000 Plus Lot Number 2 Auctioneer Savills (London - National)
Vendor Housing Association Auction Date 28 October 2014 Lease Details 125 Yr, commencing 01/01/2013 (GR.£250.PA)
Solution 2:
If the tables are reasonably well formed, you can use pandas' read_html
function. It returns a list of DataFrames, one for each table found.
pandas.read_html(html_string_or_url)
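A minimal sketch of that call, assuming a small made-up table (the markup and column names below are hypothetical, not taken from the question). Note that read_html needs a parser backend installed (lxml, or bs4 with html5lib), and recent pandas versions prefer a file-like object over a raw HTML string, hence the StringIO wrapper:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table, used only for illustration.
html = """
<table>
  <thead>
    <tr><th>Lot Number</th><th>Guide Price</th><th>Auctioneer</th></tr>
  </thead>
  <tbody>
    <tr><td>2</td><td>£450,000 Plus</td><td>Savills</td></tr>
  </tbody>
</table>
"""

# Wrap the markup in StringIO so pandas treats it as a file-like object.
tables = pd.read_html(StringIO(html))

# read_html returns one DataFrame per <table> element it finds.
df = tables[0]
print(df)
```

If the page contains several tables, loop over the returned list or pick the ones you need by index.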
If pandas can't read the tables, you will need to parse them manually with an HTML parser library such as Beautiful Soup.
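As a rough sketch of the manual route, assuming each table holds label/value pairs in two-cell rows (the markup below is hypothetical; the real page structure may differ), you could collect one dict per table and build the DataFrame from those:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: each row is a label cell followed by a value cell.
html = """
<table>
  <tr><td>Lot Number</td><td>2</td></tr>
  <tr><td>Guide Price</td><td>£450,000 Plus</td></tr>
  <tr><td>Auctioneer</td><td>Savills (London - National)</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

record = {}
for row in soup.find_all("tr"):
    # Collect the stripped text of every cell in this row.
    cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
    if len(cells) == 2:  # keep only label/value rows
        label, value = cells
        record[label] = value

# A list of dicts becomes one DataFrame row per dict (one per table here).
df = pd.DataFrame([record])
print(df)
```

With multiple tables, append one such dict per table to a list and pass the whole list to pd.DataFrame.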