Beautifulsoup Identifying Only Few Elements In The Page
I am scraping a site, but BeautifulSoup picks up only the first 20 elements on the page. The remaining elements load only when you scroll down. How can I scrape those elements too? Is there any
Solution 1:
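Use Selenium to scroll down, then scrape the contents once the extra elements have loaded:
import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# Selenium 4 API; expects chromedriver in the current working directory
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
browser.get(link)  # link is the URL of the page to scrape
body = browser.find_element(By.TAG_NAME, "body")
no_of_pagedowns = 2  # Enter the number of pages that you would like to scroll here
while no_of_pagedowns:
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(1)  # short pause so the newly loaded elements can appear
    no_of_pagedowns -= 1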
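A fixed number of page-downs may stop before everything has loaded. As a rough sketch (not part of the original answer), the loop below keeps scrolling until the page height stops growing. The helper name scroll_to_bottom and its pause and max_rounds parameters are made up for illustration, and it assumes the page increases document.body.scrollHeight whenever new items are loaded; execute_script is the standard Selenium WebDriver method.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until document.body.scrollHeight stops growing.

    driver is any Selenium WebDriver (for example the `browser` object above).
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_rounds):  # hard cap so an endless feed cannot loop forever
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the end was reached
        last_height = new_height
    return scrolls
```

After calling scroll_to_bottom(browser), browser.page_source should contain the fully loaded HTML, which can then be parsed with BeautifulSoup(browser.page_source, "html.parser").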
Solution 2:
There are two different approaches to this.
The first: Skip the rendered page and query the data API behind the site directly. To do that, you need to find out which request brings in the new items after a scroll. Open your browser's dev tools (F12 in Chrome), switch to the Network tab, scroll the page, and watch which requests are made.
The second: Use Selenium to open a browser instance and load the page like a normal browser, scroll the page, and retrieve the information.
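The first approach can be sketched roughly as follows. Everything site-specific here is an assumption: the function names, the page and limit query parameters, and the idea that the endpoint returns a JSON list that becomes empty after the last page. The real request seen in the Network tab may use different parameters, headers, or a cursor instead of page numbers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all_items(base_url, page_size=20, max_pages=100, fetch=fetch_json):
    """Request page after page until the API returns an empty batch.

    The `page` and `limit` parameter names are hypothetical; replace them
    with whatever the site's own request uses.
    """
    items = []
    for page in range(1, max_pages + 1):
        query = urlencode({"page": page, "limit": page_size})
        batch = fetch(f"{base_url}?{query}")
        if not batch:
            break  # an empty batch is taken to mean there is no more data
        items.extend(batch)
    return items
```

With a real endpoint, a call would look like fetch_all_items("https://example.com/api/items"), where the URL is a placeholder. The fetch parameter exists so the paging logic can be exercised with a stub instead of live requests.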