Paste new button, ANSI and Unicode clipboard data

Bug reports will be moved here when the described bug has been fixed

DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Post by *DrShark »

First, I'll describe the issue with examples:
1. On Windows with a Russian locale* and the English keyboard layout active, copy this button from the Chrome browser and paste it into TC's button bar.
It will be pasted like this:
TOTALCMD#BAR#DATA
cmd /c
echo.>>???_?????.??????????
shell32.dll
??????? ???????? ?????

1
-1
However, when pasted into Notepad or AkelPad, the button code is correct. With Ctrl+Shift+V or Ctrl+Shift+Ins, AkelPad lets you insert the ANSI clipboard data instead, and this inserts the above code with the ??????? marks.
2. On the other hand, some (mostly old) programs, when the English keyboard layout is active, copy Cyrillic text in such a way that the target program inserts it incorrectly. For example, the word
тест
will be pasted as
òåñò
and pasting from the ANSI clipboard (AkelPad's Ctrl+Shift+V or Ctrl+Shift+Ins, mentioned above) fixes this.
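
The garbled output in the second example is what one would expect if the source program puts Windows-1251 bytes on the clipboard and they are later converted to Unicode as if they were Windows-1252. A minimal Python sketch of that assumption follows; the 1251/1252 codepage pair is inferred from the garbled output, not confirmed by the program's author.

```python
# Assumption: the old program stores Cyrillic text as Windows-1251 bytes,
# and those bytes are later interpreted as Windows-1252 (Western).
original = "тест"
ansi_bytes = original.encode("cp1251")   # what the old program would put on the clipboard
garbled = ansi_bytes.decode("cp1252")    # wrong codepage applied during conversion
print(garbled)                           # òåñò

# Reading the same bytes with the right codepage restores the text,
# which is roughly what an "ANSI paste" with a Cyrillic codepage achieves.
repaired = garbled.encode("cp1252").decode("cp1251")
print(repaired)                          # тест
```

Because no byte is lost in this direction, the damage is reversible, unlike the question marks in the first example.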

An explanation from the AkelPad author is available here (Google Translate; Russian original)

*May depend on the values of the parameters 1252 and 1250 under the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
On Russian Windows they are often remapped to 1251 to solve Cyrillic display issues.

To solve the issue, I suggest:
1) use the Unicode clipboard data to insert a button via the button bar's Paste context menu item;
2) use the ANSI clipboard data to insert a button via the Paste menu item while holding the Ctrl+Shift keys;
3) in Commander's other editable fields, make Ctrl+V or Ctrl+Ins insert the Unicode clipboard data, and Ctrl+Shift+V or Ctrl+Shift+Ins insert the ANSI data (AkelPad's behaviour).
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

It is strange, BTW, that everything is OK when I copy the button from Firefox and paste it into TC with the English keyboard layout: it means that Firefox copies the data in Unicode format and TC uses it...
So perhaps it is a Chrome problem...

But if Notepad and AkelPad paste text properly...

I don't have Chrome to check.


Well, I've tried with Vivaldi, and here are some details:

Firefox copies text in the following formats (checked with Nirsoft's InsideClipboard tool):
CF_TEXT
CF_UNICODETEXT
HTML Format
text/html
some of its own
Vivaldi (Chromium) copies the following formats:
CF_TEXT
CF_OEMTEXT
CF_UNICODETEXT
CF_LOCALE
HTML Format
I think that problem is caused by CF_OEMTEXT...
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

The problem is that TC doesn't currently support Unicode in the button bar "paste" function. It encodes characters from other codepages as UTF-8 with BOM. When you set the default locale to Western while copying Cyrillic, it will not work.
Author of Total Commander
https://www.ghisler.com

Post by *DrShark »

MVV wrote:I think that problem is caused by CF_OEMTEXT...
You are right. Maybe TC inserts OEM data instead of ANSI. I did some additional checking with a program that has the issue described in the second example of the first post. When a Western keyboard layout is active, it puts OEM data on the clipboard like this (info from InsideClipboard):

Code: Select all

00000000   54 4F 54 41 4C 43 4D 44 23 42 41 52 23 44 41 54    TOTALCMD#BAR#DAT
00000010   41 20 0D 0A 63 6D 64 20 2F 63 20 0D 0A 65 63 68    A ..cmd /c ..ech
00000020   6F 2E 3E 3E A8 AC EF 5F E4 A0 A9 AB A0 2E E0 A0    o.>>Ё¬п_д ©« .а 
00000030   E1 E8 A8 E0 A5 AD A8 A5 20 0D 0A 73 68 65 6C 6C    биЁаҐ­ЁҐ ..shell
00000040   33 32 2E 64 6C 6C 20 0D 0A 81 EB E1 E2 E0 AE A5    32.dll ..Ѓлбв஥
00000050   20 E1 AE A7 A4 A0 AD A8 A5 20 E4 A0 A9 AB A0 20     ᮧ¤ ­ЁҐ д ©«  
00000060   0D 0A 0D 0A 31 20 0D 0A 2D 31 00 00 00             ....1 ..-1...   
Such a button is pasted into TC with correct Russian text, while Notepad/AkelPad shows the data after Ctrl+V like this:

Code: Select all

TOTALCMD#BAR#DATA 
cmd /c 
echo.>>èìÿ_ôàéëà.ðàñøèðåíèå 
shell32.dll 
Áûñòðîå ñîçäàíèå ôàéëà 

1 
-1
So it seems that the correction AkelPad makes with Ctrl+Shift+V is done by TC by default when pasting to the button bar. Or it is related to the clipboard data format index (data from InsideClipboard):
Format ID| Name| Handle type| Memory| Index

Code: Select all

1	CF_TEXT	Memory	109	1	
7	CF_OEMTEXT	Memory	109	3	
13	CF_UNICODETEXT	Memory	218	4	
16	CF_LOCALE	Memory	4	2	
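
The file-name part of the CF_OEMTEXT dump above can be checked by hand. The sketch below assumes the Russian OEM codepage is 866 and that Notepad's view comes from Windows-1251 bytes being read as Windows-1252; both are assumptions inferred from the data, not documented behaviour of the program.

```python
# Bytes 0x24..0x37 of the CF_OEMTEXT dump above (the file name after "echo.>>").
oem_bytes = bytes.fromhex(
    "A8 AC EF 5F E4 A0 A9 AB A0 2E E0 A0 E1 E8 A8 E0 A5 AD A8 A5"
)

# Assumption: the OEM codepage in use is 866 (DOS Cyrillic).
text = oem_bytes.decode("cp866")
print(text)          # имя_файла.расширение

# What Notepad shows is consistent with the same text stored as cp1251
# and then converted to Unicode as if it were cp1252.
notepad_view = text.encode("cp1251").decode("cp1252")
print(notepad_view)  # èìÿ_ôàéëà.ðàñøèðåíèå
```

So the OEM data itself is intact here, and only the synthesized Unicode text looks damaged.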
On the other hand, Chrome-based browsers in this case (when a Western keyboard layout is active) put OEM text data on the clipboard like this (info from InsideClipboard):

Code: Select all

TOTALCMD#BAR#DATA 
cmd /c 
echo.>>???_?????.?????????? 
shell32.dll 
??????? ???????? ????? 

1 
-1
Chrome's clipboard data formats (info from InsideClipboard):
Format ID| Name| Handle type| Memory| Index

Code: Select all

1	CF_TEXT	Memory	107	4	
7	CF_OEMTEXT	Memory	107	5	
13	CF_UNICODETEXT	Memory	214	1	
16	CF_LOCALE	Memory	4	3	
49378	HTML Format	Memory	8 688	2
2ghisler(Author)
In light of the above info, while pasting Unicode data into the button bar is not implemented, maybe you could check the OEM data format and, if it is Chrome-like, use the ANSI data (CF_TEXT) instead?

Post by *MVV »

I don't think that TC should use CF_OEMTEXT at all if CF_TEXT is available.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

DrShark wrote:
MVV wrote:I think that problem is caused by CF_OEMTEXT...
You are right. Maybe TC inserts OEM data instead of ANSI.
CF_OEMTEXT would only be a problem if TC deliberately used that format. But I don't think that this is the case if you have all the other ones available.
For text, the clipboard works more simply than you think.
When copying to clipboard, most applications will only use one format (most likely Unicode/UTF-16 nowadays) and set

Code: Select all

SetClipboardData(CF_UNICODETEXT, handle)
Then Windows will automatically generate

Code: Select all

CF_TEXT
CF_OEMTEXT
CF_LOCALE
The same happens for e.g. a DIB (device-independent bitmap, i.e. a picture copied to clipboard):
You set
CF_DIB
to the clipboard, and Windows automatically generates
CF_BITMAP
CF_DIBV5

For opening/reading the clipboard, you'd use EnumClipboardFormats() to check for what is available.
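
As a rough illustration of the receiving side, the sketch below picks one text format from an already enumerated list. The preference order and the `pick_text_format` helper are hypothetical, not what TC does; on Windows the list itself would come from EnumClipboardFormats(), which is only simulated here with plain lists.

```python
# Standard clipboard format IDs (the same values InsideClipboard reports in this thread).
CF_TEXT = 1
CF_OEMTEXT = 7
CF_UNICODETEXT = 13
CF_LOCALE = 16

# Hypothetical preference order: Unicode first, OEM only as a last resort.
PREFERRED = (CF_UNICODETEXT, CF_TEXT, CF_OEMTEXT)

def pick_text_format(available):
    """Return the most preferred text format present in `available`, or None."""
    for fmt in PREFERRED:
        if fmt in available:
            return fmt
    return None

# Simulated enumeration results; 49378 stands for a registered format such as "HTML Format".
chrome_like = [CF_UNICODETEXT, 49378, CF_LOCALE, CF_TEXT, CF_OEMTEXT]
ansi_only = [CF_TEXT, CF_LOCALE, CF_OEMTEXT]

print(pick_text_format(chrome_like))  # 13
print(pick_text_format(ansi_only))    # 1
```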


So as Ghisler explained: The button bar only supports ANSI. If you voluntarily changed your locale to non-Russian, Windows can't map most of the characters to either ANSI or OEM text and has to use a replacement character instead, which is the question mark by default. It doesn't matter if TC uses the pre-made CF_TEXT or CF_OEMTEXT (both already have the question marks) or recodes CF_UNICODETEXT to ANSI by itself: in all cases you have no correct mapping and see the question marks - simple as that.
So, no TC bug, but a simple limitation.
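
A minimal sketch of that limitation, assuming a Western system with ANSI codepage 1252 and OEM codepage 437. Python's `errors="replace"` stands in for the default-character substitution that Windows performs; it is an approximation, not the actual Windows conversion routine.

```python
button_text = "имя_файла.расширение"

# Western ANSI (cp1252) and OEM (cp437) codepages contain no Cyrillic letters,
# so every such character falls back to the replacement character "?".
as_ansi = button_text.encode("cp1252", errors="replace")
as_oem = button_text.encode("cp437", errors="replace")

print(as_ansi)  # b'???_?????.??????????'
print(as_oem)   # b'???_?????.??????????'

# Once the text is reduced to "?", no choice of codepage can bring it back.
print(as_ansi.decode("cp1251"))
```

Only the ASCII characters (the underscore and the dot) survive, which matches the pasted button shown in the first post.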

DrShark wrote:and pasting from ANSI clipboard (mentioned Akelpad's Ctrl+Shift+V or Ctrl+Shift+Ins) is fixing this.
...
3) in other Commander's editable fields make Ctrl+V or Ctrl+Ins to insert Unicode clipboard data, and Ctrl+Shift+V or Ctrl+Shift+Ins to insert ANSI one (Akelpad's behaviour).
This only works in AkelPad because you told it which codepage to use (Settings -> Codepage recognition/Default codepage). So for TC you'd need to do the same: tell it which codepage to use for such a special paste from the clipboard.
Last edited by milo1012 on 2017-03-20, 21:08 UTC, edited 1 time in total.
TC plugins: PCREsearch and RegXtract

Post by *DrShark »

milo1012 wrote:When copying to clipboard, most applications will only use one format (most likely Unicode/UTF-16 nowadays)
...
Then Windows will automatically generate [others]
...
It doesn't matter if TC uses the pre-made CF_TEXT or CF_OEMTEXT (both already have the question marks)
You're correct, and that's the case for Chrome. Other apps (like the mentioned Firefox, or my IE 9, which also has correct CF_TEXT data) probably generate each clipboard data format manually, so they don't have this problem with question marks. So to solve the issue with apps like Chrome we'll have to wait for Unicode support in the button bar "paste" function.

Post by *milo1012 »

DrShark wrote:Other apps (like mentioned Firefox, or my IE 9 which also has correct CF_TEXT data), probably generate each clipboard data format manually so they don't have this problem with question marks.
It doesn't matter if you generate them manually or automatically:
Every app which does not let you specify the specific codepage (i.e. locale) to generate non-Unicode text for Clipboard has this problem, and uses the system-set codepage instead.

But we're talking about the "sender" side, while the point of this thread is the "receiving" TC side. Like I said earlier, TC would need a (global) setting of which codepage to use for such alternative paste from clipboard, otherwise it would be useless.

Post by *MVV »

milo1012,
I have Firefox and Vivaldi (Chromium), I copy same text from mentioned link, and only the one copied from Chromium has CF_OEMTEXT - why Windows doesn't generate this one for Firefox?

And, I have Russian locale, but button from Firefox is pasted correctly while the one from Chromium is pasted with a bunch of ?s instead of Cyrillic.

BTW CF_TEXT data copied from Vivaldi already contains ?s (so TC has no chance if it uses only this format), while the same one copied with Russian keyboard layout contains correct letters. At the same time CF_TEXT data copied from Firefox is always in proper encoding regardless of keyboard layout... Does that mean that Firefox sets both CF_TEXT and CF_UNICODETEXT while Chromium - only CF_UNICODETEXT one?

I've also noticed that data from Chromium has CF_LOCALE format (Firefox doesn't set it), and it is 0x409 for English layout and 0x419 for Russian one...

Post by *ghisler(Author) »

To my knowledge, only CF_TEXT is generated from CF_UNICODETEXT and vice versa, not CF_OEMTEXT.

Post by *milo1012 »

MVV wrote:I have Firefox and Vivaldi (Chromium), I copy same text from mentioned link, and only the one copied from Chromium has CF_OEMTEXT - why Windows doesn't generate this one for Firefox?
Either because they removed it afterwards, or because they don't use CF_UNICODETEXT in the first place.
There are a lot of combinations and conversion formats
https://msdn.microsoft.com/en-us/library/windows/desktop/ms649013.aspx
MVV wrote:And, I have Russian locale, but button from Firefox is pasted correctly while the one from Chromium is pasted with a bunch of ?s instead of Cyrillic.
First of all I think this combination is special. You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s) but on the other hand the correct cp is still available for lookup somehow.
Concerning Firefox, I guess it's because Firefox identifies the text encoding for the website at hand (View -> Text Encoding) and applies the correct conversion codepage (NLS) (if available on the system).
You could look at the Firefox/Gecko source for this, but this is actually not the topic, since the suggestion was to implement the special ANSI paste for TC, but for this TC needs to know the cp to use for this.
MVV wrote:At the same time CF_TEXT data copied from Firefox is always in proper encoding regardless of keyboard layout...
But wouldn't TC, even if it used the correct CF_TEXT data, still apply the wrong codepage and make it wrong again (the 1251 -> 1252 / тест -> òåñò example)?
It's really hard to emulate your system setting on a non-Russian Windows.
On my system (English, cp 1252) the 1251 page is available, but Firefox still does not use it for Cyrillic websites when copying text to clipboard.
MVV wrote:Does that mean that Firefox sets both CF_TEXT and CF_UNICODETEXT while Chromium - only CF_UNICODETEXT one?
The latter is quite possible, for the first one I'm not sure (we'd need to look into the source).


ghisler(Author) wrote:To my knowledge, only CF_TEXT is generated from CF_UNICODETEXT and vice versa, not CF_OEMTEXT.
CF_OEMTEXT and CF_LOCALE are always created on Windows 7 and above when using only CF_UNICODETEXT for a clipboard fill (preceded by EmptyClipboard(), of course); I just checked it with my own code. Not sure about older OS versions and non-English Windows, though.

Post by *DrShark »

milo1012 wrote:the suggestion was to implement the special ANSI paste for TC, but for this TC needs to know the cp to use for this
Actually this topic covers two different issues with different suggestions to solve them (I wasn't sure about it when I decided to create a bug report). After your and Christian's explanations the issues/suggestions are as follows:
1) for apps like Chrome that don't have their own option for the codepage to use for non-Unicode text (or it isn't properly set) and that use CF_UNICODETEXT, the issue is only with TC's button bar, which uses CF_TEXT (ANSI); CF_OEMTEXT has nothing to do with it. The solution is to wait until the button bar is able to paste CF_UNICODETEXT, if Christian adds that.
2) for old apps that don't use CF_UNICODETEXT to copy text data: for TC to paste such text correctly in the places where it accepts CF_UNICODETEXT (the button bar currently isn't one of them), add an option to paste as CF_TEXT (ANSI), like AkelPad's Ctrl+Shift+V (Ins). As you noted, this would also require TC to have its own option for the locale to use for non-Unicode text (edit: as I noted above, the button bar accepts ANSI data from such apps correctly, so I'm not sure whether TC needs any additional locale option to solve this second issue). To test the second issue I used Plaj On-Line 1.0 from 1999 (an offline ru-to-ua and ua-to-ru translator); it is not freeware but, AFAIR, had a trial period.

Post by *MVV »

milo1012 wrote:You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s)
It is definitely not my case. Usually I see ?s only when I copy text from old non-Unicode programs when English keyboard layout is active (which is the default one). But system default codepage is 1251.

Post by *milo1012 »

2MVV
I think that we need to clarify the terms:
1st, the system codepage (both for ANSI and OEM), which is either set due to the specific Windows version and installation/user creation (and probably the basic regional settings in control panel (1st tab) too), or by overriding it by the "Language for non-Unicode Programs" setting (last tab), which will either install or set to use a different NLS (national language support) file, regardless of the installation or the setting on the first tab.
I think this is called the "System" locale (or "User" locale, as each user can have its own language pack setting on NT 6.0+ systems IIRC).
2nd, the so-called "Input" locale (which results in its own codepage for conversion), which is basically the keyboard layout.

Update: I just found the most fitting MSDN article, which explains it in even more detail:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd319088.aspx

MVV wrote:
milo1012 wrote:You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s)
It is definitely not my case.
Well, maybe not the system cp, but the locale is generated automatically in clipboard:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms649013.aspx wrote:The data is a handle to the locale identifier associated with text in the clipboard. When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language. You can use the CF_LOCALE format to associate a different locale with the clipboard text.
So I guess the following happens: for the automatic conversion from CF_UNICODETEXT, Windows seems to use the input locale which was set due to your kb layout at that time, and therefore some non-Cyrillic codepage. Result: question mark replacement.
Firefox seems to use the User locale or some automatic detection, regardless of the Input locale, for setting CF_TEXT manually (no automatic conversion). Result: correct text in CF_TEXT.
MVV wrote:Usually I see ?s only when I copy text from old non-Unicode programs when English keyboard layout is active (which is the default one). But system default codepage is 1251.
Obviously, since the same automatic conversion happens (CF_LOCALE), but this time from non-Unicode to Unicode, i.e. from CF_TEXT to CF_UNICODETEXT.
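
A small model of this guess, in both directions. It assumes the automatic conversion uses the ANSI codepage belonging to the CF_LOCALE value; the mapping table and the two helper functions are illustrative only and are not a Windows API.

```python
# Assumed mapping from the CF_LOCALE values seen in this thread to ANSI codepages.
LCID_TO_ANSI_CODEPAGE = {
    0x0409: "cp1252",  # English (United States)
    0x0419: "cp1251",  # Russian
}

def synthesize_cf_text(unicode_text, lcid):
    """Model the CF_UNICODETEXT -> CF_TEXT conversion for a given CF_LOCALE."""
    return unicode_text.encode(LCID_TO_ANSI_CODEPAGE[lcid], errors="replace")

def synthesize_cf_unicodetext(ansi_bytes, lcid):
    """Model the CF_TEXT -> CF_UNICODETEXT conversion for a given CF_LOCALE."""
    return ansi_bytes.decode(LCID_TO_ANSI_CODEPAGE[lcid], errors="replace")

# Unicode-only source (Chromium-like), English layout active: question marks.
print(synthesize_cf_text("тест", 0x0409))                          # b'????'
# Same source, Russian layout active: valid Windows-1251 bytes.
print(synthesize_cf_text("тест", 0x0419).decode("cp1251"))         # тест
# ANSI-only source (old program), English layout active: garbled text.
print(synthesize_cf_unicodetext("тест".encode("cp1251"), 0x0409))  # òåñò
```

Under this model both symptoms from the first post come from the same cause: the keyboard layout active at copy time decides the codepage of the synthesized format.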

Post by *MVV »

milo1012 wrote:I think that we need to clarify the terms
That's correct, and I'm talking about two terms: the system default codepage and the keyboard layout.
Well, maybe not the system cp, but the locale is generated automatically in clipboard:
Yes, Windows converts ANSI text to Unicode according to the current keyboard layout... But that doesn't explain why Firefox copies ANSI text properly with the English keyboard layout while Chromium doesn't (perhaps Firefox uses codepage information from the web page; it is the only reasonable explanation that comes to my mind).