
Python Regex Expression For Extracting Hashtags From Text

I'm processing some tweets I mined during the election and I need a way to extract hashtags from tweet text while accounting for punctuation, non-Unicode characters, etc. while still...

Solution 1:

Yeah, how about a solution not involving regex? ;)

# -*- coding: utf-8 -*-
import string

tweets = []  # collects the extracted hashtags

a = "I'm with HER! #NeverTrump #DumpTrump #imwithher🇺🇸 @ Williamsburg, Brooklyn"
# filter for printable ASCII characters (this drops the flag emoji)
a = ''.join(filter(lambda x: x in string.printable, a))

print(a)

# split on spaces and keep the tokens that start with '#'
for tweet in a.split(' '):
    if tweet.startswith('#'):
        tweets.append(tweet.strip(','))  # strip surrounding commas

print(tweets)

and tada: ['#NeverTrump', '#DumpTrump', '#imwithher']
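
Since the question asks for a regex, here is a minimal sketch of that route for comparison. It assumes Python 3, where `\w` matches Unicode letters, digits and underscore but not emoji. The helper name `extract_hashtags` and the pattern are my own illustration, not Twitter's official hashtag grammar.

```python
import re

# Assumed pattern: a '#' that does not follow a word character,
# followed by one or more word characters. Hypothetical, not
# Twitter's official definition of a hashtag.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in text, '#' included, in order of appearance."""
    return ["#" + tag for tag in HASHTAG_RE.findall(text)]


tweet = "I'm with HER! #NeverTrump #DumpTrump #imwithher🇺🇸 @ Williamsburg, Brooklyn"
print(extract_hashtags(tweet))
```

The match stops at the first non-word character, so trailing punctuation and the flag emoji are dropped without a separate filtering pass, and the lookbehind skips a '#' in the middle of a word such as "a#b".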
