Extract Title With Beautifulsoup

I have this:

from urllib import request
url = 'http://www.bbc.co.uk/news/election-us-2016-35791008'
html = request.urlopen(url).read().decode('utf8')
html[:60]
from bs4 import Beau

Solution 1:

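To navigate the soup, you need a BeautifulSoup object, not a string. So remove the get_text() call on the soup: it returns a plain string, which has no find() or find_all() methods.

Moreover, you can replace raw.find_all('title', limit=1) with find('title'). Both run the same search, but find() returns the first matching tag (or None if there is no match) instead of a one-element list.

Try this: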

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('title')

print(title)         # Prints the tag
print(title.string)  # Prints the tag's string content
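
To make the difference between the two calls concrete, here is a minimal sketch. It parses a small inline document (sample_html is a made-up stand-in, not the BBC page) so that it runs without network access:

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, used here instead of fetching the real page.
sample_html = "<html><head><title>Sample page</title></head><body></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")

as_list = soup.find_all("title", limit=1)  # list-like result with at most one tag
as_tag = soup.find("title")                # the first matching tag, or None

print(len(as_list))           # how many tags the limited search returned
print(as_list[0] == as_tag)   # the single list entry matches what find() returns
print(as_tag.string)          # the text inside the title tag
```

With find_all() you have to index into the result before you can read .string; with find() you can read it directly, as long as the tag exists.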

Solution 2:

You can use soup.title directly instead of soup.find_all('title', limit=1) or soup.find('title'); it gives you the title tag.

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.title
print(title)
print(title.string)

Solution 3:

It can be as simple as this:

soup = BeautifulSoup(htmlString, 'html.parser')
title = soup.title.text

Here, soup.title returns the title element as a Tag object, and .text gives its text content as a string.
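
The answers above use both .string and .text. A small sketch of how the two behave, using two made-up inline documents (filled and empty are illustrative names, not from the question): for a normal title they give the same text, but for an empty title tag .string is None while .text is an empty string.

```python
from bs4 import BeautifulSoup

# Two hypothetical documents: one with a normal title, one with an empty title tag.
filled = BeautifulSoup("<html><head><title>Sample page</title></head></html>", "html.parser")
empty = BeautifulSoup("<html><head><title></title></head></html>", "html.parser")

filled_string = filled.title.string  # the single string child of the tag
filled_text = filled.title.text      # all text inside the tag, joined together
empty_string = empty.title.string    # no single string child, so None
empty_text = empty.title.text        # nothing to join, so an empty string

print(repr(filled_string), repr(filled_text))
print(repr(empty_string), repr(empty_text))
```

So .text is the safer choice if you always want a string back, while .string lets you detect a tag that has no text at all.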

Solution 4:

On some pages I got a NoneType error, because soup.title is None when the page has no title tag. One suggestion is to check for that first:

soup = BeautifulSoup(data, 'html.parser')
if soup.title is not None:
    title = soup.title.string
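
If you need this check in several places, it could be wrapped in a small helper. This is only a sketch: extract_title and its default parameter are hypothetical names, not part of BeautifulSoup, and it also guards against a title tag that exists but has no text.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str, default: Optional[str] = None) -> Optional[str]:
    """Return the page title with surrounding whitespace removed, or `default`."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when there is no title tag at all;
    # soup.title.string is None when the tag is empty.
    if soup.title is None or soup.title.string is None:
        return default
    return soup.title.string.strip()


print(extract_title("<html><head><title> Hello </title></head></html>"))
print(extract_title("<html><body><p>no title here</p></body></html>"))
print(extract_title("<p>no title here</p>", default="untitled"))
```

Returning a default instead of raising keeps the calling code free of repeated None checks, at the cost of hiding which pages were actually missing a title.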
