
Removing Emojis From A String In Python

I found this code in Python for removing emojis, but it is not working. Can you help with other code or a fix for this? I have observed that all my emojis start with \xf but when I try to

Solution 1:

On Python 2, you have to use a u'' literal to create a Unicode string. You should also pass the re.UNICODE flag and convert your input data to Unicode (e.g., text = data.decode('utf-8')):

#!/usr/bin/env python
import re

text = u'This dog \U0001f602'
print(text) # with emoji

emoji_pattern = re.compile("["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+", flags=re.UNICODE)
print(emoji_pattern.sub(r'', text)) # no emoji

Output

This dog 😂
This dog 

Note: emoji_pattern matches only some emoji (not all). See Which Characters are Emoji.
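
For Python 3, where every str is already Unicode, the same idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name strip_emoji and the bytes handling are made up here, and the pattern is the same partial set of ranges as above, so it still misses many emoji.

```python
import re

# Same four ranges as the pattern above. In Python 3 the u'' prefix is
# optional and re.UNICODE is already the default for str patterns.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+"
)


def strip_emoji(data):
    """Hypothetical helper: accept UTF-8 bytes or str, return text without matched emoji."""
    # Raw bytes (where emoji show up as \xf0... sequences) must be decoded first.
    text = data.decode("utf-8") if isinstance(data, bytes) else data
    return EMOJI_PATTERN.sub("", text)


print(repr(strip_emoji("This dog \U0001f602")))
print(repr(strip_emoji(b"This dog \xf0\x9f\x98\x82")))
```

Decoding before matching is the important step: the pattern works on code points, so it cannot match the four-byte UTF-8 sequence of an emoji in undecoded data.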

Solution 2:

I am updating my answer to match the one by @jfs, because my previous answer failed to account for other non-ASCII text such as accented Latin or Greek characters. Stack Overflow doesn't allow me to delete my previous answer, so I am updating it to match the most accepted answer to the question.

#!/usr/bin/env python
import re

text = u'This is a smiley face \U0001f602'
print(text) # with emoji

def deEmojify(text):
    regrex_pattern = re.compile(pattern = "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags = re.UNICODE)
    return regrex_pattern.sub(r'',text)

print(deEmojify(text))

This was my previous answer. Do not use it: it removes every non-ASCII character, not just emoji.

def deEmojify(inputString):
    return inputString.encode('ascii', 'ignore').decode('ascii')
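
To see why the ASCII round trip is discouraged, here is a small Python 3 sketch. The function name de_emojify_ascii and the sample strings are invented for illustration; the body is the same encode/decode technique shown above.

```python
def de_emojify_ascii(input_string):
    # Same technique as the previous answer: silently drop every
    # character that cannot be encoded as ASCII.
    return input_string.encode("ascii", "ignore").decode("ascii")


samples = [
    "This dog \U0001f602",      # emoji is removed, as intended
    "caf\u00e9",                # accented Latin letter is removed too
    "\u03b1\u03b2\u03b3 test",  # Greek letters are removed too
]
for s in samples:
    print(repr(s), "->", repr(de_emojify_ascii(s)))
```

The emoji disappears, but so do the accented and Greek letters, which is the data loss the updated answer avoids.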

Solution 3:

A more complete version for removing emojis ✍ 🌷 📌 👈🏻 🖥

import re

def remove_emojis(data):
    emoj = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002500-\U00002BEF"  # box drawing, shapes, symbols, arrows (not Chinese characters)
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # very wide range: also covers CJK, kana and Hangul
        u"\U0001f926-\U0001f937"
        u"\U00010000-\U0010ffff"  # every code point above U+FFFF
        u"\u2640-\u2642"
        u"\u2600-\u2B55"
        u"\u200d"  # zero width joiner
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\ufe0f"  # variation selector-16
        u"\u3030"
        "]+", re.UNICODE)
    return re.sub(emoj, '', data)
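
Before using the pattern above on multilingual text, it may be worth checking what its widest ranges match. The Python 3 sketch below isolates two of them; the name wide and the sample strings are made up for illustration.

```python
import re

# Two of the widest ranges from the pattern above, isolated for testing.
wide = re.compile("[\U000024C2-\U0001F251\U00010000-\U0010ffff]+")

samples = {
    "emoji": "ok \U0001f602",
    "chinese": "\u4e2d\u6587 text",                # CJK ideographs, inside U+24C2..U+1F251
    "japanese": "\u3053\u3093\u306b\u3061\u306f",  # hiragana, inside the same range
    "greek": "\u03b1\u03b2\u03b3",                 # below U+24C2, so left untouched
}
for name, s in samples.items():
    print(name, repr(wide.sub("", s)))
```

The Chinese and Japanese samples are stripped along with the emoji, so this version trades precision for coverage. If the input can contain such text, narrower ranges are the safer choice.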

Solution 4:

If you are not keen on using regex, the emoji Python package may be the best solution.

Here is a simple function to return emoji free text (thanks to this SO answer):

import emoji

def give_emoji_free_text(text):
    allchars = [c for c in text.decode('utf-8')]
    emoji_list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
    clean_text = ' '.join([word for word in text.decode('utf-8').split() if not any(i in word for i in emoji_list)])
    return clean_text

If you are dealing with (byte) strings containing emojis, this is straightforward:

>> s1 = "Hi 🤔 How is your 🙈 and 😌. Have a nice weekend 💕👭👙"
>> print s1
Hi 🤔 How is your 🙈 and 😌. Have a nice weekend 💕👭👙
>> print give_emoji_free_text(s1)
Hi How is your and Have a nice weekend

If you are dealing with Unicode (as in the example by @jfs), just encode it with UTF-8.

>> s2 = u'This dog \U0001f602'
>> print s2
This dog 😂
>> print give_emoji_free_text(s2.encode('utf8'))
This dog

Edits

Based on the comment, it should be as easy as:

def give_emoji_free_text(text):
    return emoji.get_emoji_regexp().sub(r'', text.decode('utf8'))

Solution 5:

If you're using the example from the accepted answer and still getting "bad character range" errors, then you're probably using a narrow build (see this answer for more details). A reformatted version of the regex that seems to work is:

emoji_pattern = re.compile(
    u"(\ud83d[\ude00-\ude4f])|"  # emoticons
    u"(\ud83c[\udf00-\uffff])|"  # symbols & pictographs (1 of 2)
    u"(\ud83d[\u0000-\uddff])|"  # symbols & pictographs (2 of 2)
    u"(\ud83d[\ude80-\udeff])|"  # transport & map symbols
    u"(\ud83c[\udde0-\uddff])"  # flags (iOS)
    "+", flags=re.UNICODE)
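
The \ud83d and \ud83c values in this pattern are UTF-16 surrogates. A narrow build stores each code point above U+FFFF as a high/low surrogate pair, so a range such as \U0001F600-\U0001F64F has to be rewritten in terms of those pairs. The sketch below shows the standard UTF-16 arithmetic; the helper name to_surrogate_pair is invented for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


# Endpoints of the emoticon range and the start of the flags range.
for cp in (0x1F600, 0x1F64F, 0x1F1E0):
    high, low = to_surrogate_pair(cp)
    print("U+%X -> \\u%04x \\u%04x" % (cp, high, low))
```

The emoticon endpoints map to \ud83d\ude00 and \ud83d\ude4f, and the start of the flags range maps to \ud83c\udde0, which matches the first and last alternatives in the reformatted pattern.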
