
How To Scrape Data Using Next Button With Ellipsis Using Scrapy

I need to continuously get the data through the next button (<1 2 3 ... 5>), but there's no href link provided in the page source, and there's also an ellipsis. Any ideas, please? Here's my code:

Solution 1:

It seems this pagination uses an additional request to an API, so there are two ways:

  1. Use Splash/Selenium to render the pages, following QHarr's pattern;
  2. Make the same calls to the API. Check the developer tools and you will find a POST request to https://www.forever21.com/us/shop/Catalog/GetProducts with all the proper params (they are too long, so I will not post the full list here). A minimal Scrapy sketch follows below.
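
If you go that route in Scrapy itself, a JsonRequest is a convenient way to POST to that endpoint. This is only a sketch: the payload keys below (pageno, pagesize) are hypothetical placeholders, and the real params have to be copied from the request you see in dev tools.

import json
import scrapy
from scrapy.http import JsonRequest

class ProductsSpider(scrapy.Spider):
    name = 'f21_products'
    api_url = 'https://www.forever21.com/us/shop/Catalog/GetProducts'

    def start_requests(self):
        payload = {'pageno': 1, 'pagesize': 120}  # hypothetical params -- copy the real ones from dev tools
        yield JsonRequest(self.api_url, data=payload,
                          callback=self.parse_page, meta={'payload': payload})

    def parse_page(self, response):
        data = json.loads(response.text)
        # ... yield items from `data` here; its shape depends on the real API
        payload = dict(response.meta['payload'])
        payload['pageno'] += 1  # request the next page
        if data:  # stop once the API returns an empty response
            yield JsonRequest(self.api_url, data=payload,
                              callback=self.parse_page, meta={'payload': payload})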

Solution 2:

The URL changes, so you can specify the page number and results per page in the URL, e.g.

https://www.forever21.com/uk/shop/catalog/category/f21/sale/#pageno=2&pageSize=120&filter=price:0,250

As mentioned by @vezunchik and in the OP's feedback, this approach requires Selenium/Splash to allow the JavaScript to run on the page. If you were going down that route you could just click the next button (.p_next) until you reach the last page, as it is easy to grab the last page number (.dot + .pageno) from the document; a sketch of that loop follows.
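
For illustration, a minimal sketch of that click-next loop, assuming the same .p_next and .dot + .pageno selectors (setup details as in the fuller demo further down):

from selenium import webdriver

d = webdriver.Chrome()
d.get('https://www.forever21.com/uk/shop/catalog/category/f21/sale')
last_page = int(d.find_element_by_css_selector('.dot + .pageno').text)  # e.g. 5
for _ in range(last_page - 1):
    # scrape the currently rendered page here, then advance
    d.find_element_by_css_selector('.p_next').click()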


I appreciate that you are trying to do this with Scrapy.

Here is a demo of the idea with Selenium, in case it helps.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url_loop = 'https://www.forever21.com/uk/shop/catalog/category/f21/sale/#pageno={}&pageSize=120&filter=price:0,250'
url = 'https://www.forever21.com/uk/shop/catalog/category/f21/sale'
d = webdriver.Chrome()
d.get(url)

d.find_element_by_css_selector('[onclick="fnAcceptCookieUse()"]').click()  # dismiss the cookie banner
items = WebDriverWait(d, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#products .p_item")))
d.find_element_by_css_selector('.selectedpagesize').click()
d.find_elements_by_css_selector('.pagesize')[-1].click()  # set page result count to 120
last_page = int(d.find_element_by_css_selector('.dot + .pageno').text)  # get last page number

if last_page > 1:
    for page in range(2, last_page + 1):
        url = url_loop.format(page)
        d.get(url)
        try:
            d.find_element_by_css_selector('[type=reset]').click()  # reject the promo offer pop-up
        except:
            pass
        # do something with page
        break  # delete later
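
As a hypothetical fill-in for the "# do something with page" placeholder, you could grab the product tiles on each page with the same #products .p_item selector used above (and drop the break once it works):

        items = WebDriverWait(d, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#products .p_item')))
        print(page, len(items))  # e.g. report how many tiles each page rendered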
