Python Regex Expression For Extracting Hashtags From Text
I'm processing some tweets I mined during the election and I need a way to extract hashtags from tweet text while accounting for punctuation, non-Unicode characters, etc. while still
Solution 1:
Yeah, how about a solution not involving regex? ;)
# -*- coding: utf-8 -*-
import string

hashtags = []
a = "I'm with HER! #NeverTrump #DumpTrump #imwithher🇺🇸 @ Williamsburg, Brooklyn"
# keep only printable ASCII characters, which drops the flag emoji
a = ''.join(filter(lambda x: x in string.printable, a))
print(a)
# split on spaces and keep the words that start with '#'
for word in a.split(' '):
    if word.startswith('#'):
        # trim a trailing comma left over from the sentence
        hashtags.append(word.strip(','))
print(hashtags)
and tada: ['#NeverTrump', '#DumpTrump', '#imwithher']
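Since the question asked for a regex, here is a rough sketch of that route for comparison with the split-based code above. It assumes Python 3, where `\w` matches Unicode letters, digits and underscore but not emoji. The names `HASHTAG_RE` and `extract_hashtags` are made up for this example, and the pattern is only an approximation that does not try to reproduce Twitter's own hashtag rules.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical pattern for illustration: a '#' that is not preceded by a
# word character, followed by one or more word characters.
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")


def extract_hashtags(text):
    """Return every hashtag-like token found in text, in order."""
    return HASHTAG_RE.findall(text)


a = "I'm with HER! #NeverTrump #DumpTrump #imwithher🇺🇸 @ Williamsburg, Brooklyn"
print(extract_hashtags(a))
```

Unlike `strip(',')`, the pattern stops at any non-word character, so a trailing period, exclamation mark or emoji is left out without first filtering the text down to printable characters. The lookbehind keeps it from picking up a `#` in the middle of a word, such as a URL fragment.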