How To Handle InvalidSchema Exception
Solution 1:
Inspect what each function returns. In your first script, the call to get_info can never succeed, because get_info expects a URL and nothing else. Calling get_info(elem), where elem is the list of items selected by soup.select(), passes a list to requests.get() instead of a URL, and that is what triggers the error.
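One way to catch this kind of mistake early is to check the argument before it reaches requests.get(). The sketch below is not part of the original scripts: is_http_url is a hypothetical helper, written with the standard library only, that accepts nothing but strings with an http or https scheme and a host.

```python
from urllib.parse import urlparse

def is_http_url(value):
    """Return True only for strings that look like absolute http(s) URLs."""
    if not isinstance(value, str):
        return False  # e.g. the list of tags returned by soup.select()
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(is_http_url("https://www.yellowpages.com/search?page=2"))  # True
print(is_http_url(["<a href='/x'>"]))                            # False
print(is_http_url("/some/relative/path"))                        # False
```

A check like this could sit at the top of get_info and raise a clear error, or skip the item, when it is handed something other than a URL.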
You probably know this already, because you iterate over the results in your second script, which simply returns the list so that you can read each href. So if you want to use get_info in your first script, apply it to each item rather than to the list itself. A list comprehension works well in this case:
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    elem = soup.select(".info h2 a[data-analytics]")
    # Resolve each href against the page URL, then fetch that page's title.
    return [get_info(urljoin(url, e.get("href"))) for e in elem]

def get_info(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    return soup.select_one("#main-header .sales-info h1").get_text(strip=True)

link = 'https://www.yellowpages.com/search?search_terms=%20Injury%20Law%20Attorneys&geo_location_terms=California&page=2'

for review in get_links(link):
    print(review)
The first function still returns a list, but now each element is the result of applying get_info to one link, which is how get_info is meant to be used: it accepts a URL, not a list. Since urljoin and get_info are already applied inside get_links, all that remains is to loop over the returned list and print each result.
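To see what urljoin contributes here, the short sketch below resolves a relative href against the search page URL. The href value is made up for illustration; real listing paths on the site will differ.

```python
from urllib.parse import urljoin

base = "https://www.yellowpages.com/search?search_terms=%20Injury%20Law%20Attorneys&geo_location_terms=California&page=2"
href = "/example-city-ca/mip/example-law-firm-123"  # hypothetical href, for illustration only

# A root-relative href replaces the path and query of the base URL.
full_url = urljoin(base, href)
print(full_url)  # https://www.yellowpages.com/example-city-ca/mip/example-law-firm-123

# An href that is already absolute is returned unchanged.
absolute = urljoin(base, "https://example.com/page")
print(absolute)  # https://example.com/page
```

The result is a full URL, which is the kind of value get_info expects.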