Beautiful Soup Scraping Table
I have this small piece of code to scrape table data from a website and then display it in CSV format. The issue is that the for loop is printing the records multiple times. I am not s
Solution 1:
from bs4 import BeautifulSoup
import requests
url = requests.get("https://www.top500.org/list/2018/06/")
soup = BeautifulSoup(url.content, 'html.parser')
table = soup.find_all('table', attrs={'class':'table table-condensed table-striped'})
for i in table:
    tr = i.find_all('tr')
    for x in tr:
        print(x.text)
Alternatively, and probably the simplest option, let pandas parse the table:
import pandas as pd
# read_html returns a list of DataFrames, one per matching <table>
table = pd.read_html('https://www.top500.org/list/2018/06/',
                     attrs={'class': 'table table-condensed table-striped'},
                     header=1)
print(table)
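Since the question asks for CSV output, the Beautiful Soup loop in this answer could be extended to emit one comma-separated line per row. The following is only a minimal sketch: SAMPLE_HTML is hypothetical markup standing in for the downloaded page, and table_to_csv is an illustrative helper name, not something from the original code.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<table class="table table-condensed table-striped">
  <tr><th>Rank</th><th>System</th></tr>
  <tr><td>1</td><td>System A</td></tr>
  <tr><td>2</td><td>System B</td></tr>
</table>
"""


def table_to_csv(html):
    """Return the first <table> in html as CSV text, one line per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, trimming surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        writer.writerow(cells)
    return buffer.getvalue()


print(table_to_csv(SAMPLE_HTML), end="")
```

Using csv.writer rather than joining with commas by hand means that cells containing commas or quotes are escaped correctly.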
Solution 2:
It's printing much of the data multiple times because the newtxt variable, which you are printing after getting the text of each <td></td>, just keeps accumulating all the values. The easiest way to get this to work is probably to move the line print(newtxt) outside of both for loops, that is, leave it completely unindented. You should then see all of the text, with each row on its own line and the cells within a row separated by commas.
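The fix described above can be sketched as follows. The asker's original loop is not shown here, so this is an assumption about its shape: rows is a hypothetical stand-in for the cell text extracted from each row, and rows_to_text is an illustrative name. Only newtxt and the placement of the print call come from the answer.

```python
# Hypothetical stand-in for the text of the <td> cells in each <tr>.
rows = [
    ["1", "System A", "Site X"],
    ["2", "System B", "Site Y"],
]


def rows_to_text(rows):
    """Accumulate all rows into one string, one comma-separated line per row."""
    newtxt = ""
    for cells in rows:
        for index, cell in enumerate(cells):
            if index > 0:
                newtxt += ","
            newtxt += cell
            # Printing newtxt here would repeat everything accumulated so far.
        newtxt += "\n"
    return newtxt


# Print once, after both loops have finished.
print(rows_to_text(rows), end="")
```

Because newtxt grows on every iteration, printing it inside either loop shows the earlier rows again each time; printing it once at the end shows each row exactly once.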