
Want To Scrape All The Specific Href From The A Tag

I searched for a specific brand, Samsung, and the search returned a number of products. I just want to scrape the href of every product in the search results, together with the product name.

Solution 1:

A couple of things. You are mixing bs4 syntax with Selenium, which is what causes your current error. You are also targeting values that may be dynamic. Finally, there are anti-scraping measures that may affect your work later.

Ignoring the last point, a more robust version that uses Selenium syntax throughout might be:

div = driver.find_elements_by_css_selector('[data-tracking="product-card"]')
links = [i.find_element_by_css_selector('[age="0"]').get_attribute('href') for i in div]
print(links)

You could reduce this to a single list comprehension with a different CSS selector combination, e.g.:

links = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')]

With that last selector, you can return a dict mapping product name to link as follows:

{i.find_element_by_tag_name('img').get_attribute('alt'):i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')}
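
One caveat with the dict form: dict keys are unique, so if two product cards share the same alt text, the later link silently replaces the earlier one. A minimal sketch of one way around that, assuming the (name, href) pairs have already been pulled out of the elements; `group_links_by_name` and the example.com URLs are hypothetical, for illustration only:

```python
from collections import defaultdict


def group_links_by_name(pairs):
    """Group hrefs under their product name, keeping every distinct link."""
    grouped = defaultdict(list)
    for name, href in pairs:
        if not href:
            # Skip elements whose href attribute came back empty or None.
            continue
        if href not in grouped[name]:
            grouped[name].append(href)
    return dict(grouped)


# Made-up sample data: two cards share the same product name.
pairs = [
    ("Phone A", "https://example.com/a1"),
    ("Phone A", "https://example.com/a2"),
    ("Phone B", "https://example.com/b"),
]
print(group_links_by_name(pairs))
```

With Selenium, the pairs could come from the same comprehension as above, written as a list of tuples instead of a dict.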

As a dataframe:

import pandas as pd

rows = [(i.find_element_by_tag_name('img').get_attribute('alt'), i.get_attribute('href'))
        for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')]
df = pd.DataFrame(rows, columns=['Title', 'Link'])
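
Note that the find_element_by_* and find_elements_by_* helpers used in the snippets above were removed in Selenium 4.3, so on current versions they raise AttributeError. Below is a minimal sketch of the first snippet in the newer find_elements(by, value) form. `product_links` is a hypothetical helper name, and the selectors are copied unchanged from above, so they carry the same assumptions about the page. The string "css selector" is the value of By.CSS_SELECTOR (from selenium.webdriver.common.by), written out so the function can be tried without importing Selenium; in a real script you would normally pass By.CSS_SELECTOR.

```python
# Locator strategy; identical to selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def product_links(driver):
    """Return the href of the first [age="0"] element inside each product card."""
    cards = driver.find_elements(CSS, '[data-tracking="product-card"]')
    links = []
    for card in cards:
        # find_elements (plural) returns an empty list when nothing matches,
        # so a card without a matching anchor is skipped instead of raising.
        anchors = card.find_elements(CSS, '[age="0"]')
        if anchors:
            links.append(anchors[0].get_attribute("href"))
    return links
```

Usage would be `links = product_links(driver)` once the search results page has loaded in an existing `driver`.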
