
Get Final Url After Timed Delay Or Redirect

I am trying to scrape a website, but when I open the webpage there is a 5-second redirect delay, i.e. you have to wait 5 seconds before the real page loads. I have tried the followi

Solution 1:

It looks like etherscan.io is protected by Cloudflare, and Cloudflare is causing the delayed redirect that you are seeing. One of the purposes of Cloudflare is to prevent bots from making automated requests to the site (which seems a lot like what you are doing).

Getting around Cloudflare will not be easy. First, you'll need to make your requests look like they are coming from a real browser. That means the tool you use to make these requests needs to present the same request headers that a real browser would, handle cookies like a browser would, run JavaScript like a browser would, etc.
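As a rough illustration of the headers and cookies part only, here is a minimal standard-library sketch. The header values and the helper names (`make_browser_like_opener`, `fetch_final_url`) are my own assumptions, not something from the question, and this does not run JavaScript, so on its own it will most likely not get past the Cloudflare check.

```python
import http.cookiejar
import urllib.request

# Illustrative values only (an assumption): copy the real headers from
# your own browser's developer tools.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_browser_like_opener():
    """Return (opener, cookie_jar); the opener sends browser-style headers
    and stores cookies between requests, like a browser session would."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-Agent with our headers.
    opener.addheaders = list(BROWSER_HEADERS.items())
    return opener, jar


def fetch_final_url(opener, url, timeout=10.0):
    """Follow ordinary HTTP redirects and return the URL we ended up on.
    JavaScript or timed redirects are NOT followed by this."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.geturl()
```

You would call `fetch_final_url(opener, url)` with the opener from `make_browser_like_opener()`. It covers the "same headers, same cookies" requirement but not the "run JavaScript" one, which is why a real browser is usually needed here.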

Even if you succeed in doing all of the above, Cloudflare is likely to block or challenge your requests after a certain number of them have been made over some period of time.
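If you do start getting blocked, slowing down and retrying with increasing delays is the usual polite response. The sketch below is only an assumption about how you might structure that: the `fetch` callable, the status codes treated as "blocked" (403, 429, 503) and the delay values are all hypothetical choices, not documented Cloudflare behaviour.

```python
import time


def backoff_delays(retries, base=1.0, cap=60.0):
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def fetch_with_backoff(fetch, url, retries=4, blocked=(403, 429, 503), sleep=time.sleep):
    """Call fetch(url) -> (status, body). While the status looks like a block
    or challenge, wait progressively longer and try again, up to `retries` times."""
    status, body = fetch(url)
    for delay in backoff_delays(retries):
        if status not in blocked:
            break
        sleep(delay)
        status, body = fetch(url)
    return status, body
```

Here `fetch` is whatever function you already use to make the request; passing it in keeps the retry logic independent of the HTTP library.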

Solution 2:

If you are really set on using something other than Selenium or the API (either of which would make the most sense), you could take a look at this. It's a scraper meant to handle Cloudflare-protected sites, but it requires some other things (most notably Node.js) to run. While this is pretty neat, it seems like a pain when there are easier solutions.
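For comparison, the Selenium route mentioned above could look roughly like this. It is a sketch under assumptions: the helper names, the timeout and the choice of Firefox are mine, and it assumes the delay ends with the browser landing on a different URL. If the challenge page reloads the same URL instead, you would need to wait on something else, such as the page title.

```python
import time


def wait_for_final_url(driver, start_url, timeout=15.0, poll=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll driver.current_url until it differs from start_url or the
    timeout passes; return whatever URL the browser is on at that point."""
    deadline = clock() + timeout
    while clock() < deadline:
        current = driver.current_url
        if current != start_url:
            return current
        sleep(poll)
    return driver.current_url


def open_and_wait(url):
    """Open `url` in a real browser and return the URL after the delay."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Use the URL the browser reports right after loading as the baseline,
        # since it may already be normalised (e.g. a trailing slash added).
        return wait_for_final_url(driver, driver.current_url)
    finally:
        driver.quit()
```

Because a real browser runs the page's JavaScript and keeps its cookies, this avoids most of the work described in Solution 1, at the cost of being slower than plain HTTP requests.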
