Want To Scrape All The Specific Href From The A Tag
I searched for a specific brand, Samsung, which returns a number of products. I just want to scrape the href of each product in the search results, along with the product name.
Solution 1:
A couple of things. You are mixing bs4 syntax with Selenium, which is causing your current error. Additionally, you are targeting potentially dynamic values. Finally, there are anti-scraping measures which may affect your work later on.
Ignoring the last point, a more robust, syntactically appropriate version might be:
div = driver.find_elements_by_css_selector('[data-tracking="product-card"]')
links = [i.find_element_by_css_selector('[age="0"]').get_attribute('href') for i in div]
print(links)
You could reduce this to a single list comprehension with a different CSS selector combination, e.g.:
links = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')]
For that last one, you can return a dict keyed by product name as follows:
{i.find_element_by_tag_name('img').get_attribute('alt'):i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')}
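One caveat with the dict: keys have to be unique, so if two products share the same alt text only the last link would survive. A minimal sketch of one way around that, assuming you first collect (title, href) tuples instead of building the dict directly. The helper name `group_links_by_title` is hypothetical and not part of Selenium.

```python
from collections import defaultdict

def group_links_by_title(pairs):
    """Group hrefs under their product title, keeping every distinct link."""
    grouped = defaultdict(list)
    for title, href in pairs:
        if not href:
            # get_attribute returns None when the attribute is missing
            continue
        if href not in grouped[title]:
            grouped[title].append(href)
    return dict(grouped)
```

Each title then maps to a list of links rather than a single one, so nothing is silently overwritten.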
As a dataframe:
import pandas as pd
pd.DataFrame([(i.find_element_by_tag_name('img').get_attribute('alt'), i.get_attribute('href')) for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')], columns = ['Title', 'Link'])
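Note that the `find_element_by_*` and `find_elements_by_*` helpers used above were deprecated in Selenium 4 and, as far as I know, removed in 4.3. A minimal sketch of the same title/link extraction with the `find_elements(by, value)` form, assuming the selectors above still match the page. The function name `product_rows` is hypothetical. The strings `"css selector"` and `"tag name"` are the values of `By.CSS_SELECTOR` and `By.TAG_NAME`; in real code you would normally import `By` from `selenium.webdriver.common.by` instead.

```python
CSS = "css selector"  # same value as By.CSS_SELECTOR
TAG = "tag name"      # same value as By.TAG_NAME

# Selector taken from the answer above; it may need adjusting if the page changes.
SELECTOR = '[data-tracking="product-card"] div:nth-child(1) > [href*=search]'

def product_rows(driver):
    """Return a list of (title, href) tuples, one per matching product anchor."""
    rows = []
    for anchor in driver.find_elements(CSS, SELECTOR):
        # find_elements (plural) returns an empty list instead of raising,
        # so an anchor without an image gets a title of None.
        images = anchor.find_elements(TAG, "img")
        title = images[0].get_attribute("alt") if images else None
        rows.append((title, anchor.get_attribute("href")))
    return rows
```

With a live driver, `pd.DataFrame(product_rows(driver), columns=['Title', 'Link'])` should give the same table as the one-liner.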