How To Remove All The Escape Sequences From A List Of Strings?
Solution 1:
If you want to strip out characters you don't want, you can use the string's translate() method (Python 2 shown here):
>>> s = "\x01\x02\x10\x13\x20\x21hello world"
>>> print(s)
!hello world
>>> s
'\x01\x02\x10\x13 !hello world'
>>> escapes = ''.join([chr(char) for char in range(1, 32)])
>>> t = s.translate(None, escapes)
>>> t
' !hello world'
This will strip out all these control characters:
Oct  Dec  Hex  Char
001  1    01   SOH (start of heading)
002  2    02   STX (start of text)
003  3    03   ETX (end of text)
004  4    04   EOT (end of transmission)
005  5    05   ENQ (enquiry)
006  6    06   ACK (acknowledge)
007  7    07   BEL '\a' (bell)
010  8    08   BS  '\b' (backspace)
011  9    09   HT  '\t' (horizontal tab)
012  10   0A   LF  '\n' (new line)
013  11   0B   VT  '\v' (vertical tab)
014  12   0C   FF  '\f' (form feed)
015  13   0D   CR  '\r' (carriage ret)
016  14   0E   SO  (shift out)
017  15   0F   SI  (shift in)
020  16   10   DLE (data link escape)
021  17   11   DC1 (device control 1)
022  18   12   DC2 (device control 2)
023  19   13   DC3 (device control 3)
024  20   14   DC4 (device control 4)
025  21   15   NAK (negative ack.)
026  22   16   SYN (synchronous idle)
027  23   17   ETB (end of trans. blk)
030  24   18   CAN (cancel)
031  25   19   EM  (end of medium)
032  26   1A   SUB (substitute)
033  27   1B   ESC (escape)
034  28   1C   FS  (file separator)
035  29   1D   GS  (group separator)
036  30   1E   RS  (record separator)
037  31   1F   US  (unit separator)
For Python newer than 3.1, the call is different: str.translate() takes a single translation table, which you build with str.maketrans():
>>> s = "\x01\x02\x10\x13\x20\x21hello world"
>>> print(s)
!hello world
>>> s
'\x01\x02\x10\x13 !hello world'
>>> escapes = ''.join([chr(char) for char in range(1, 32)])
>>> translator = str.maketrans('', '', escapes)
>>> t = s.translate(translator)
>>> t
' !hello world'
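Since the question is about a list of strings, the same table can be built once and applied to every element. The following is only a sketch for Python 3; the names CONTROL_CHARS, STRIP_CONTROL and strip_control_chars are made up for illustration and are not part of the original answer.

```python
# Characters 1..31, the same range the answer above removes.
CONTROL_CHARS = ''.join(chr(code) for code in range(1, 32))

# Build the table once; the third argument lists characters to delete.
STRIP_CONTROL = str.maketrans('', '', CONTROL_CHARS)


def strip_control_chars(strings):
    """Return a new list with control characters 1..31 removed from each string."""
    return [s.translate(STRIP_CONTROL) for s in strings]


words = ['hel\x01lo', 'wor\tld', 'plain']
print(strip_control_chars(words))  # ['hello', 'world', 'plain']
```

Note that this also removes tabs and newlines, because they fall in the 1..31 range; narrow CONTROL_CHARS if you want to keep them.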
Solution 2:
Something like this?
>>> from ast import literal_eval
>>> s = r'Hello,\nworld!'
>>> print(literal_eval("'%s'" % s))
Hello,
world!
Edit: ok, that's not what you want. What you want can't be done in general, because, as @Sven Marnach explained, strings don't actually contain escape sequences. Those are just notation in string literals.
You can filter all strings with non-ASCII characters from your list with
def is_ascii(s):
    # Python 2: s is a byte string, so decode() fails on any byte >= 0x80.
    try:
        s.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False

[s for s in ['william', 'short', '\x80', 'twitter', '\xaa',
             '\xe2', 'video', 'guy', 'ray']
 if is_ascii(s)]
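The snippet above relies on str.decode(), which exists only on Python 2 byte strings. A rough Python 3 equivalent, offered as a sketch rather than as the answerer's own code, tries to encode instead and catches UnicodeEncodeError:

```python
def is_ascii(s):
    """Return True if every character of s fits in 7-bit ASCII."""
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False


words = ['william', 'short', '\x80', 'twitter', '\xaa',
         '\xe2', 'video', 'guy', 'ray']
ascii_words = [s for s in words if is_ascii(s)]
print(ascii_words)  # ['william', 'short', 'twitter', 'video', 'guy', 'ray']
```

On Python 3.7 and later, the built-in str.isascii() method does the same check without the try/except.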
Solution 3:
You could filter out "words" that are not alphanumeric using a list comprehension and str.isalnum():
>>> l = ['william', 'short', '\x80', 'twitter', '\xaa', '\xe2', 'video', 'guy', 'ray']
>>> [word for word in l if word.isalnum()]
['william', 'short', 'twitter', 'video', 'guy', 'ray']
If you wish to filter out numbers, too, use str.isalpha() instead:
>>> l = ['william', 'short', '\x80', 'twitter', '\xaa', '\xe2', 'video', 'guy', 'ray', '456']
>>> [word for word in l if word.isalpha()]
['william', 'short', 'twitter', 'video', 'guy', 'ray']
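One caveat, assuming you are on Python 3: there, str.isalpha() and str.isalnum() are Unicode-aware, and '\xaa' (ª) and '\xe2' (â) count as letters, so the output shown above matches Python 2 byte strings only. The sketch below combines the check with str.isascii(), which needs Python 3.7 or later; the variable names are illustrative.

```python
words = ['william', 'short', '\x80', 'twitter', '\xaa', '\xe2',
         'video', 'guy', 'ray', '456']

# Unicode-aware check: keeps 'ª' and 'â' because they are letters.
unicode_aware = [w for w in words if w.isalpha()]

# Restrict to ASCII letters only (str.isascii() requires Python 3.7+).
ascii_only = [w for w in words if w.isascii() and w.isalpha()]

print(unicode_aware)
print(ascii_only)  # ['william', 'short', 'twitter', 'video', 'guy', 'ray']
```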
Solution 4:
This cannot be done, at least not in the broad scope you are asking about. As others have mentioned, Python at runtime cannot tell a string that was written with escape sequences from one that was written without them.
Example:
print('\x61' == 'a')
prints True. So there's no way to tell these two strings apart, unless you attempt some static analysis of your Python script.
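To make the point concrete, here is a small Python 3 sketch (the variable names are illustrative). The escape sequence exists only in the source literal; the resulting string holds a single character. The '\x80' you see when printing a list is just how repr() displays a non-printable character.

```python
escaped = '\x61'   # escape sequence in the literal; the string holds one character
plain = 'a'
raw = r'\x61'      # raw literal: the backslash is kept, so four characters

print(escaped == plain)         # True
print(len(escaped), len(raw))   # 1 4

# Printing a list shows the repr() of each element, which writes
# non-printable characters back out as escape sequences.
print(repr('\x80'))             # '\x80'
print(len('\x80'))              # 1
```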
Solution 5:
I had similar issues while converting from hexadecimal to string. This is what finally worked in Python 2. Example:
list_l = ['william', 'short', '\x80', 'twitter', '\xaa', '\xe2', 'video', 'guy', 'ray']
decode_data = []
for l in list_l:
    # Decode as ASCII, silently dropping any byte that is not ASCII.
    data = l.decode('ascii', 'ignore')
    if data != "":
        decode_data.append(data)
# output: [u'william', u'short', u'twitter', u'video', u'guy', u'ray']
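In Python 3, str has no decode() method, so the loop above would need a round trip through bytes. A possible adaptation, assuming the input is a list of str objects (the extra 'caf\xe9' item and the name decoded are added here for illustration):

```python
list_l = ['william', 'short', '\x80', 'twitter', '\xaa', '\xe2',
          'video', 'guy', 'ray', 'caf\xe9']

decoded = []
for item in list_l:
    # Encode to ASCII, dropping characters that do not fit, then back to str.
    data = item.encode('ascii', 'ignore').decode('ascii')
    if data != "":
        decoded.append(data)

print(decoded)
# ['william', 'short', 'twitter', 'video', 'guy', 'ray', 'caf']
```

Unlike the filtering approaches, this keeps a string whose non-ASCII characters are only part of it, and removes just those characters.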