getclip and putclip garble unicode characters

Mark Geisert mark@maxrnd.com
Mon Jul 5 10:04:21 GMT 2021


Replying to myself...

Mark Geisert wrote:
> Hi Leonid (?),
> 
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters: non-latin 
>> characters copied to clipboard in windows are replaced with question marks when 
>> retrieved with getclip in cygwin, and non-latin characters copied to clipboard 
>> using putclip are pasted in windows looking like utf-8 displayed in cp1252 
>> but can be retrieved with getclip exactly as pasted, so it looks like the 
>> problem is not in the way the data is copied but in the way cygwin and windows 
>> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI 
>> codepage is set to cp1251 - 1251, not 1252.
> 
> Thanks for the report.  I will investigate.

I believe I have a local testcase similar to your report: if I select a region of 
text in a message displayed from the Cygwin mailing list digest, and that message 
has Cyrillic characters in it, getclip replaces those characters with '?' on output.

Since Thomas suggested an alternative, using 'cat < /dev/clipboard', I tried that 
as well and see that here UTF-8 is output and the Cyrillic characters are intact.

So I've modified getclip to read the clipboard format that MS calls CF_UNICODETEXT 
and convert it to UTF-8 for output.  Thus my new getclip can duplicate what the 
alternative does.  (Previously, getclip understood only CF_TEXT ("normal" ANSI 
characters) and CYGWIN_NATIVE (an internal Cygwin format that makes your 
putclip + getclip example work).)
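
For illustration only, the conversion described above can be sketched in Python. 
This is not the actual getclip change; the function names are hypothetical, and the 
sketch assumes only that CF_UNICODETEXT data is NUL-terminated UTF-16LE.  The second 
function shows one way '?' can appear: a lossy conversion to an ANSI codepage that 
lacks the characters (cp1252 is used here purely as an example).

```python
# Illustrative sketch, not the getclip source. Function names are hypothetical.


def clipboard_utf16_to_utf8(raw: bytes) -> bytes:
    """Convert a CF_UNICODETEXT-style buffer (UTF-16LE, NUL-terminated) to UTF-8."""
    # Drop a dangling odd byte so the buffer holds whole 16-bit code units.
    raw = raw[: len(raw) - (len(raw) % 2)]
    # Stop at the first 16-bit NUL; anything after it is not part of the text.
    for i in range(0, len(raw), 2):
        if raw[i] == 0 and raw[i + 1] == 0:
            raw = raw[:i]
            break
    return raw.decode("utf-16-le").encode("utf-8")


def clipboard_ansi_lossy(text: str, codepage: str = "cp1252") -> bytes:
    """Simulate a lossy ANSI conversion: unrepresentable characters become '?'."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    sample = "Привет"
    raw = sample.encode("utf-16-le") + b"\x00\x00"  # simulated clipboard buffer
    print(clipboard_utf16_to_utf8(raw).decode("utf-8"))  # Cyrillic survives
    print(clipboard_ansi_lossy(sample))  # Cyrillic is lost
```

The 16-bit NUL check steps through the buffer two bytes at a time, so a zero byte 
that is merely half of a non-NUL code unit does not end the text early.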

How about I generate a test version of the cygutils package with this updated 
getclip and you can see if it solves your issue?
Stay tuned,

..mark

