Command-line Arguments As Bytes Instead Of Strings In Python3
Solution 1:
From the question: "When I receive a filename argument which is invalid, however, it is handed to me as a unicode string with strange characters like \udce8."
Those are lone surrogate code points. The low 8 bits of each one hold the original undecodable byte, so \udce8 stands for the byte 0xE8.
See PEP 383: Non-decodable Bytes in System Character Interfaces.
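As a rough sketch of what PEP 383 means in practice, the snippet below round-trips an undecodable byte through the surrogateescape error handler. The filename bytes and the helper names are made up for illustration, and UTF-8 is assumed as the encoding; argv_as_bytes relies on os.fsencode, which on POSIX applies the same handler with the filesystem encoding.

```python
import os
import sys


def escape_undecodable(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, mapping each undecodable byte 0xNN to U+DCNN."""
    return raw.decode(encoding, errors="surrogateescape")


def restore_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Reverse escape_undecodable: turn U+DCNN back into the byte 0xNN."""
    return text.encode(encoding, errors="surrogateescape")


def original_byte(surrogate: str) -> int:
    """Return the low 8 bits of a lone surrogate, i.e. the original byte."""
    return ord(surrogate) & 0xFF


def argv_as_bytes() -> list:
    """Recover command-line arguments as bytes (assumes a POSIX system)."""
    return [os.fsencode(arg) for arg in sys.argv]


if __name__ == "__main__":
    raw = b"caf\xe8.txt"  # hypothetical name; 0xE8 is not valid UTF-8 here
    text = escape_undecodable(raw)
    print(ascii(text))                  # 'caf\udce8.txt'
    print(hex(original_byte(text[3])))  # 0xe8
    print(restore_bytes(text) == raw)   # True
```

If the goal is only to get the original bytes back from sys.argv, calling os.fsencode on each argument is usually enough; the explicit encode and decode calls are there to show the mechanism.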
Solution 2:
Don't go against the grain: filenames are strings, not bytes.
You shouldn't use a bytes when you should use a string. A bytes is an immutable sequence of integers, each in the range 0 to 255. A string is an immutable sequence of characters. They are different concepts. What you're doing is like using an integer when you should use a boolean.
(Aside: Python 3 stores every string in memory as a sequence of Unicode code points, regardless of where it came from. An encoding specifies how Python converts bytes on disk to and from this in-memory form.)
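To make the distinction concrete, here is a small sketch that puts the same word side by side as a string and as bytes. The sample word and the function names are arbitrary choices for illustration.

```python
def as_characters(text: str) -> list:
    """Iterating over a str yields one-character strings."""
    return list(text)


def as_integers(data: bytes) -> list:
    """Iterating over a bytes object yields integers from 0 to 255."""
    return list(data)


if __name__ == "__main__":
    text = "na\u00efve"  # "naive" with i-diaeresis (U+00EF) as the third letter
    utf8 = text.encode("utf-8")
    latin1 = text.encode("latin-1")

    print(as_characters(text))  # five characters
    print(as_integers(utf8))    # [110, 97, 195, 175, 118, 101]
    print(as_integers(latin1))  # [110, 97, 239, 118, 101]

    # The in-memory string is identical whichever encoding the bytes used.
    print(utf8.decode("utf-8") == latin1.decode("latin-1"))  # True
```

The same five characters become six bytes under UTF-8 and five under Latin-1, which is why the encoding has to be known before bytes can be read as text.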
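Your operating system hands filenames to Python as strings decoded under a specific encoding. I'm surprised you say that some filenames have different encodings; as far as I know, the filename encoding Python uses is system-wide (sys.getfilesystemencoding() reports it). Functions like open default to that filesystem encoding, for example. That said, POSIX filesystems store names as raw bytes, so a name written under a different encoding can still fail to decode; those are the names that come back with surrogate characters.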
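For reference, a minimal sketch of how to inspect the filesystem encoding and compare the string and bytes views of a directory. The helper names and the file name report.txt are hypothetical; the behaviour shown for os.listdir (a bytes path returns bytes names) is standard, but results for non-ASCII names depend on the platform and locale.

```python
import os
import sys
import tempfile


def filesystem_codec() -> tuple:
    """Return the encoding and error handler Python uses for filenames."""
    return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()


def list_names_both_ways(directory: str) -> tuple:
    """List a directory once as str names and once as bytes names."""
    as_text = sorted(os.listdir(directory))
    as_bytes = sorted(os.listdir(os.fsencode(directory)))
    return as_text, as_bytes


if __name__ == "__main__":
    print(filesystem_codec())
    with tempfile.TemporaryDirectory() as tmp:
        # Create one empty file so there is something to list.
        with open(os.path.join(tmp, "report.txt"), "w"):
            pass
        print(list_names_both_ways(tmp))
```

Passing a bytes path is the supported way to see names exactly as stored, while the str view is what open and most other functions expect by default.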