Skip to content Skip to sidebar Skip to footer

Convert An Int Value To Unicode

I am using pyserial and need to send some values less than 255. If I send the int itself the the ascii value of the int gets sent. So now I am converting the int into a unicode val

Solution 1:

In Python 2 - Turn it into a string first, then into unicode.

str(integer).decode("utf-8")

Best way I think. Works with any integer, plus still works if you put a string in as the input.

Updated edit due to a comment: For Python 2 and 3 - This works on both but a bit messy:

str(integer).encode("utf-8").decode("utf-8") 

Solution 2:

Just use chr(somenumber) to get a 1 byte value of an int as long as it is less than 256. pySerial will then send it fine.

If you are looking at sending things over pySerial it is a very good idea to look at the struct module in the standard library it handles endian issues an packing issues as well as encoding for just about every data type that you are likely to need that is 1 byte or over.

Solution 3:

Use the chr() function instead; you are sending a value of less than 256 but more than 128, but are creating a Unicode character.

The unicode character has to then be encoded first to get a byte character, and that encoding fails because you are using a value outside the ASCII range (0-127):

>>> str(unichr(169))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError:'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)

This is normal Python 2 behaviour; when trying to convert a unicode string to a byte string, an implicit encoding has to take place and the default encoding is ASCII.

If you were to use chr() instead, you create a byte string of one character and that implicit encoding does not have to take place:

>>> str(chr(169))
'\xa9'

Another method you may want to look into is the struct module, especially if you need to send integer values greater than 255:

>>> struct.pack('!H', 1000)
'\x03\xe8'

The above example packs an integer into a unsigned short in network byte order, for example.

Solution 4:

I think that the best solution is to be explicit and say that you want to represent a number as a byte (and not as a character):

>>>import struct>>>struct.pack('B', 128)>>>'\x80'

This makes your code work in both Python 2 and Python 3 (in Python 3, the result is, as it should, a bytes object). An alternative, in Python 3, would be to use the new bytes([128]) to create a single byte of value 128.

I am not a big fan of the chr() solutions: in Python 3, they produce a (character, not byte) string that needs to be encoded before sending it anywhere (file, socket, terminal,…)—chr() in Python 3 is equivalent to the problematic Python 2 unichr() of the question. The struct solution has the advantage of correctly producing a byte whatever the version of Python. If you want to send data over the serial port with chr(), you need to have control over the encoding that must take place subsequently. The code might work when the default encoding used by Python 3 is UTF-8 (which I think is the case), but this is due to the fact that Unicode characters of code point smaller than 256 can be coded as a single byte in UTF-8. This adds an unnecessary layer of subtlety and complexity that I do not recommend (it makes the code harder to understand and, if necessary, debug).

So, I strongly suggest that you use the approach above (which was also hinted at by Steve Barnes and Martijn Pieters): it makes it clear that you want to produce a byte (and not characters). It will not give you any surprise even if you run your code with Python 3, and it makes your intent clearer and more obvious.

Post a Comment for "Convert An Int Value To Unicode"