
Strange Vanishing Of Cr In Strings Coming From A Copy Of A File's Content Passed To Raw_input()

While trying to find the reason for what seemed to be a bug, I finally ran into a weird behaviour of the raw_input() function in Python 2.7: it removes the CR characters of pairs

Solution 1:

sys.stdin is opened in text mode (you can check this by displaying sys.stdin.mode and seeing that it is 'r'). If you open any file in text mode in Python, then the platform native line ending (\r\n for Windows) will be converted to a simple line feed (\n) in the Python string.

You can see this in operation by opening your PASTED.txt file using mode 'r' instead of 'rb'.
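The conversion described in this answer can be modelled in a few lines. The sketch below is only an illustration, assuming that text mode on Windows replaces each \r\n pair with \n and leaves a lone \r alone; the helper name translate_crlf is hypothetical and is not part of Python.

```python
from __future__ import print_function  # lets the sketch run on Python 2.7 and 3


def translate_crlf(raw):
    """Model of Windows text-mode reading (an assumption, not Python's code):
    every CR LF pair becomes a single LF, a lone CR is kept."""
    return raw.replace(b"\r\n", b"\n")


raw = b"PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  "
cooked = translate_crlf(raw)

# Only the CR that was part of a CR LF pair disappears.
print(repr(raw), "->", raw.count(b"\r"), "CR")
print(repr(cooked), "->", cooked.count(b"\r"), "CR")
```

Under this model the lone \r after PACIFIC survives, while the \r in front of the final \n is dropped.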

Solution 2:

After posting, I looked up from my code and noticed that data copied from a file and passed to raw_input() is modified in the same way as newlines are when Python reads data directly from a file, as the following shows:

with open("TestWindows.txt", 'wb') as f:
    f.write("PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  ")

print "\n- Following string have been written in TestWindows.txt in mode 'wb' :\n"+\
      "PACIFIC \\r  ARCTIC \\n  ATLANTIC \\r\\n  "

print "\n- data got by reading the file TestWindows.txt in 'rb' mode :"
with open("TestWindows.txt", 'rb') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'r' mode :"
with open("TestWindows.txt", 'r') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'rU' mode :"
with open("TestWindows.txt", 'rU') as f:
    print "    repr(data)==",repr(f.read())

result:

- Following string have been written in TestWindows.txt in mode 'wb' :
PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  

- data got by reading the file TestWindows.txt in 'rb' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  '

- data got by reading the file TestWindows.txt in 'r' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \n  '

- data got by reading the file TestWindows.txt in 'rU' mode :
    repr(data)== 'PACIFIC \n  ARCTIC \n  ATLANTIC \n  '
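The 'rb' and 'rU' results can be reproduced on any platform with io.open, which exists in Python 2.6+ and in Python 3. This is a sketch, not the code that produced the listing: io.open with newline=None applies universal-newline translation much like 'rU', while newline='' returns the line endings untouched. The plain 'r' result depends on the platform and is not reproduced here. The function name read_three_ways and the temporary file are illustrative choices.

```python
from __future__ import print_function
import io
import os
import tempfile

SAMPLE = b"PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  "


def read_three_ways(raw):
    """Write raw bytes to a temporary file and read them back three ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Binary mode: bytes come back exactly as written.
        with io.open(path, "rb") as f:
            as_bytes = f.read()
        # Text mode, newline='': decoded, but line endings are not translated.
        with io.open(path, "r", encoding="ascii", newline="") as f:
            untranslated = f.read()
        # Text mode, newline=None: '\r', '\n' and '\r\n' all become '\n'.
        with io.open(path, "r", encoding="ascii", newline=None) as f:
            universal = f.read()
    finally:
        os.remove(path)
    return as_bytes, untranslated, universal


as_bytes, untranslated, universal = read_three_ways(SAMPLE)
print("binary      :", repr(as_bytes))
print("newline=''  :", repr(untranslated))
print("newline=None:", repr(universal))
```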

First, the file PASTED.txt has the same content as the file PRIM.txt: it was produced by copying PRIM.txt's content and pasting it into PASTED.txt without passing through a Python string. So when data goes from one file to another through the clipboard only, it isn't modified. This shows that the content of PRIM.txt sits unaltered in the clipboard after the copy.

Secondly, data going from a file to a Python string via the clipboard and raw_input() is modified; hence the modification takes place between the clipboard and the Python string. So I thought that raw_input() might interpret data received from the clipboard in the same way as the Python interpreter does when it reads data from a file.
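One way to picture this hypothesis: if the pasted characters reach raw_input() through a stream that has already been translated as a text-mode file would be, the CR of each CR LF pair is gone before raw_input() returns the line. The sketch below imitates that with an in-memory stream. fake_stdin and raw_input_like are hypothetical helpers, and the CR LF replacement is an assumed model of Windows text mode, not Python's actual implementation.

```python
from __future__ import print_function
import io


def fake_stdin(raw):
    """Build a text stream from raw bytes, modelling Windows text mode
    (assumption): CR LF pairs are turned into LF before decoding."""
    translated = raw.replace(b"\r\n", b"\n")
    # newline="\n": lines end only at LF and nothing else is translated.
    return io.TextIOWrapper(io.BytesIO(translated), encoding="ascii", newline="\n")


def raw_input_like(stream):
    """Read one line and drop its trailing newline, as raw_input() does."""
    line = stream.readline()
    if not line:
        raise EOFError("no more input")
    return line[:-1] if line.endswith("\n") else line


stream = fake_stdin(b"PACIFIC \r  ARCTIC \r\n  ATLANTIC \r\n")
first = raw_input_like(stream)
second = raw_input_like(stream)
print(repr(first))   # the lone CR is still there, the CR LF pair is not
print(repr(second))
```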

Then I elaborated on the idea that the replacement of \r\n with \n happens because data of "Windows nature" becomes data of "Python nature", and that the clipboard doesn't modify data because it is under the control of the Windows operating system.

Alas, data copied from the screen and passed to raw_input() doesn't undergo the transformation of \r\n newlines, even though it also passes through the Windows clipboard, and that breaks my little theory.

Then I thought that Python knows the nature of data not from its source but from information contained in the data itself; such information is a 'format'. I found the following page about the Windows clipboard, and there are indeed several formats for the information a clipboard records:

http://msdn.microsoft.com/en-us/library/ms648709(v=vs.85).aspx

Maybe the explanation of Python's modification of \r\n is linked to these clipboard formats, and maybe not. But I don't understand all of this well enough, and I am far from sure.

Is anybody able to explain all the above observations?


Thank you for your answer, ncoghlan. But I don't think it's the reason:

  • sys.stdin has no attribute mode

  • sys.stdin refers to the keyboard, as far as I understand. However, in my code, the data doesn't come from typing on the keyboard but from pasting via the clipboard. That's different.
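Whether sys.stdin exposes a mode attribute can depend on how the interpreter is started: some environments replace sys.stdin with their own object. That is an assumption about this case, since the post does not say how the script was run. A small check that does not fail when the attribute is missing (describe_stream is a hypothetical helper):

```python
from __future__ import print_function
import sys


def describe_stream(stream):
    """Return (type name, mode) for a file-like object; mode is None
    when the object has no such attribute."""
    return type(stream).__name__, getattr(stream, "mode", None)


# In a plain console this usually reports a mode of 'r'; a replaced
# stdin object may report None instead.
print(describe_stream(sys.stdin))
```

Pasted text normally reaches the program through the same sys.stdin stream as typed text, so the way that stream is opened applies to both.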

The key point is that I don't understand how the Python interpreter could differentiate between data in the clipboard that was copied from a file and data in the clipboard that was copied from the screen.
