Python How To Add Exception?
Solution 1:
You need to add something to set data[bold_time]:
if td.find('strong'):
    bold_time = cur_time
    data[bold_time] = ?????  # whatever it should be
cur_time += datetime.timedelta(hours=1)
This should avoid both the NameError and KeyError exceptions as long as the word strong is found. You still might want to code defensively and handle one or both of them gracefully. That's what exceptions were meant to do: handle those exceptional cases that shouldn't happen...
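As a rough sketch of that defensive style: set bold_time to None before the loop so the name always exists, and catch KeyError around the lookup. The helper name lookup_reading and the sample values are made up for illustration and are not from your script. It is written for Python 3.

```python
import datetime

def lookup_reading(data, bold_time):
    """Return data[bold_time], or None when no reading is available."""
    if bold_time is None:
        # No <strong> cell was seen, so bold_time never got a real time.
        return None
    try:
        return data[bold_time]
    except KeyError:
        # bold_time was set, but nothing was stored under that key.
        return None

# Hypothetical sample data standing in for the scraped table.
midnight = datetime.datetime(2011, 1, 1)
data = {midnight: 42}

bold_time = None  # defined up front, so a later use cannot raise NameError
found = lookup_reading(data, midnight)
absent = lookup_reading(data, midnight + datetime.timedelta(hours=1))
unset = lookup_reading(data, bold_time)
print(found, absent, unset)
```

Returning None is only one option. Logging the problem or re-raising a more descriptive exception may suit your script better.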
Solution 2:
I had read your previous post before it disappeared, and then I read this one. I find it a pity to use BeautifulSoup for your goal: from the code I see, its use looks complicated, and the fact is that regexes run roughly 10 times faster than BeautifulSoup.
Here's the code, using only re, that furnishes the data you are interested in.
I know, there will be people who say that HTML text can't be parsed by regexes. I know, I know... but I don't parse the text, I directly find the chunks of text that are interesting. The source code of this site's webpage is apparently very well structured, and there seems to be little risk of bugs. Moreover, tests and verifications can be added to keep watch on the source code and to be informed instantly of any changes the webmaster makes to the webpage.
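One way such a verification could look, as a sketch only: wrap each search in a helper that raises a dedicated exception when the pattern stops matching, so a redesign of the page is noticed at once instead of surfacing later as an AttributeError on None. The names LayoutChanged and search_or_fail, and the sample strings, are my own inventions for the example. It is written for Python 3.

```python
import re

class LayoutChanged(Exception):
    """Raised when an expected chunk of the page can no longer be found."""

def search_or_fail(regex, text, label):
    """Return the match object, or raise LayoutChanged if there is none."""
    m = regex.search(text)
    if m is None:
        raise LayoutChanged('pattern %r no longer matches the page' % label)
    return m

# Hypothetical snippets: one with the expected markup, one "redesigned".
sample = '<td align="center">\r\n   <strong>Time</strong>\r\n  </td>\r\n'
rgx = re.compile(r'<strong>(Time)</strong>')

ok = search_or_fail(rgx, sample, 'time header').group(1)
try:
    search_or_fail(rgx, '<div>redesigned page</div>', 'time header')
    changed = False
except LayoutChanged:
    changed = True
print(ok, changed)
```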
import re
from httplib import HTTPConnection

hypr = HTTPConnection(host='app2.nea.gov.sg',
                      timeout = 300)
rekete = ('/anti-pollution-radiation-protection/'
          'air-pollution/psi/'
          'psi-readings-over-the-last-24-hours')
hypr.request('GET',rekete)
page = hypr.getresponse().read()

patime = ('PSI Readings.+?'
          'width="\d+%" align="center">\r\n'
          ' *<strong>Time</strong>\r\n'
          ' *</td>\r\n'
          '((?: *<td width="\d+%" align="center">'
          '<strong>\d+AM</strong>\r\n'
          ' *</td>\r\n)+.+?)'
          'width="\d+%" align="center">\r\n'
          ' *<strong>Time</strong>\r\n'
          ' *</td>\r\n'
          '((?: *<td width="\d+%" align="center">'
          '<strong>\d+PM</strong>\r\n'
          ' *</td>\r\n)+.+?)'
          'PM2.5 Concentration')
rgxtime = re.compile(patime,re.DOTALL)

patline = ('<td align="center">\r\n'
           ' *<strong>'  # next line = group 1
           '(North|South|East|West|Central|Overall Singapore)'
           '</strong>\r\n'
           ' *</td>\r\n'
           '((?: *<td align="center">\r\n'  # group 2 start
           ' *[.\d-]+\r\n'                  #
           ' *</td>\r\n)*)'                 # group 2 end
           ' *<td align="center">\r\n'
           ' *<strong style[^>]+>'
           '([.\d-]+)'  # group 3
           '</strong>\r\n'
           ' *</td>\r\n')
rgxline = re.compile(patline)

rgxnb = re.compile('<td align="center">\r\n'
                   ' *([.\d-]+)\r\n'
                   ' *</td>\r\n')

m = rgxtime.search(page)
a,b = m.span(1)  # m.group(1) contains the data AM
d = dict((mat.group(1),
          rgxnb.findall(mat.group(2))+[mat.group(3)])
         for mat in rgxline.finditer(page[a:b]))
a,b = m.span(2)  # m.group(2) contains the data PM
for mat in rgxline.finditer(page[a:b]):
    d[mat.group(1)].extend(rgxnb.findall(mat.group(2))+[mat.group(3)])

print 'last 3 values'
for k,v in d.iteritems():
    print '%s : %s' % (k,v[-3:])
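To see what rgxnb extracts without hitting the live site, here is a small offline check on a hand-made snippet. The sample markup is an assumption that only imitates the cell layout the pattern expects; it was not copied from the real page. It is written for Python 3, hence the raw strings and the print function.

```python
import re

# Same pattern as rgxnb above, written with raw strings.
rgxnb = re.compile(r'<td align="center">\r\n'
                   r' *([.\d-]+)\r\n'
                   r' *</td>\r\n')

# Hypothetical table cells: one numeric reading and one dash placeholder,
# which the character class [.\d-] also accepts.
sample = ('<td align="center">\r\n'
          '      55\r\n'
          '    </td>\r\n'
          '<td align="center">\r\n'
          '      -\r\n'
          '    </td>\r\n')

values = rgxnb.findall(sample)
print(values)
```

Because the pattern has a single capturing group, findall returns a flat list of the captured cell contents as strings, which is what gets concatenated with mat.group(3) in the script.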