Paste new button, ANSI and Unicode clipboard data
Total Commander Forum Index -> TC9.0x bug reports (English)

DrShark
Posted: Sun Mar 19, 2017 5:06 pm

First, I'll describe the issue with examples:
1. On Windows with a Russian locale* and the English keyboard layout active, copy this button from the Chrome browser and paste it into TC's button bar.
It will be pasted like this:
Quote:
TOTALCMD#BAR#DATA
cmd /c
echo.>>???_?????.??????????
shell32.dll
??????? ???????? ?????

1
-1

However, when pasted into Notepad or AkelPad, the button code is correct. With Ctrl+Shift+V or Ctrl+Shift+Ins, AkelPad can insert the ANSI clipboard data instead, and this inserts the above code with the ??????? marks.
2. On the other hand, some (mostly old) programs, when the English keyboard layout is active, copy Cyrillic text in such a way that the target program inserts it wrongly. For example, the word
Quote:
тест

will be pasted like
Quote:

and pasting from the ANSI clipboard (the AkelPad Ctrl+Shift+V or Ctrl+Shift+Ins mentioned above) fixes this.
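
The second example can be reproduced in a few lines of Python. This is only a sketch: it assumes the old program puts Windows-1251 bytes on the clipboard and that the receiving side reads them as Windows-1252. The garbled sample itself is missing above, so the exact output on the reporter's machine is an assumption.

```python
# Assumption: the source program stores the word as Windows-1251 bytes.
ansi_bytes = "тест".encode("cp1251")

# A receiver that assumes a Western code page sees accented Latin letters.
pasted_wrong = ansi_bytes.decode("cp1252")

# Reading the same ANSI bytes with the Cyrillic code page restores the word,
# which is roughly what an "ANSI paste" with the right code page does.
pasted_right = ansi_bytes.decode("cp1251")

print(pasted_wrong)
print(pasted_right)
```

The bytes never change here; only the code page used to interpret them does, which is why an ANSI paste with the correct code page can repair the text.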

An explanation from the AkelPad author is available here (Google Translate; Russian original)

*May depend on the values of parameters 1252 and 1250 in the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
On Russian Windows they are often remapped to 1251 to solve Cyrillic display issues.

To solve the issue, I suggest:
1) use the Unicode clipboard data to insert a button via the button bar's Paste context menu item;
2) use the ANSI clipboard data to insert a button via the Paste menu item while holding Ctrl+Shift;
3) in TC's other editable fields, make Ctrl+V or Ctrl+Ins insert the Unicode clipboard data, and Ctrl+Shift+V or Ctrl+Shift+Ins insert the ANSI data (AkelPad's behaviour).
_________________
XP Pro SP3 rus 32 bit, Vista Home Premium SP2 rus 32 bit
TC #149847 Personal licence

Cuz we're all in this together, We're here to make it right

MVV
Posted: Mon Mar 20, 2017 1:18 am

BTW, it is strange that everything is OK when I copy the button from Firefox and paste it into TC with the English keyboard layout; it means that Firefox copies data in Unicode format and TC uses it...
So perhaps it is a Chrome problem...

But if Notepad and AkelPad paste text properly...

I don't have Chrome to check.


Well, I've tried with Vivaldi, and here are some details:

Firefox copies text in the following formats (checked with NirSoft's InsideClipboard tool):
Quote:
CF_TEXT
CF_UNICODETEXT
HTML Format
text/html
some of its own

Vivaldi (Chromium) copies the following formats:
Quote:
CF_TEXT
CF_OEMTEXT
CF_UNICODETEXT
CF_LOCALE
HTML Format

I think that problem is caused by CF_OEMTEXT...
_________________
TCFS2 + TCFS2Tools: Full-screen mode for TC etc (forum)
TOTALCMD.NET: AskParam, CopyTree, NTLinks, Sudo, VirtualPanel

ghisler(Author)
Posted: Mon Mar 20, 2017 8:35 am

The problem is that TC doesn't currently support Unicode in the button bar "paste" function. It encodes characters from other codepages as UTF-8 with BOM. When you set the default locale to Western while copying Cyrillic, it will not work.
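
To illustrate what "UTF-8 with BOM" means for such a field, here is a rough Python sketch. It is not TC's actual code: the function names, the fallback rule and the default code page are all assumptions made for the example.

```python
import codecs

def encode_field(text: str, ansi_codepage: str = "cp1252") -> bytes:
    """Keep plain ANSI when possible, otherwise fall back to UTF-8 with BOM."""
    try:
        return text.encode(ansi_codepage)
    except UnicodeEncodeError:
        return codecs.BOM_UTF8 + text.encode("utf-8")

def decode_field(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Reverse of encode_field: a leading BOM marks UTF-8 data."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(ansi_codepage)

print(encode_field("cmd /c"))
print(encode_field("файл"))
```

Such a scheme only helps if the Cyrillic characters are still present when the text arrives. If the clipboard data already contains question marks, there is nothing left to encode.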
_________________
Author of Total Commander
http://www.ghisler.com

DrShark
Posted: Mon Mar 20, 2017 10:24 am

MVV wrote:
I think that problem is caused by CF_OEMTEXT...
You are right. Maybe TC inserts OEM data instead of ANSI. I did some additional checking with a program that has the issue described in the second example of the first post; when the Western keyboard layout is active, it puts OEM data on the clipboard like this (info from InsideClipboard):
Code:
00000000   54 4F 54 41 4C 43 4D 44 23 42 41 52 23 44 41 54    TOTALCMD#BAR#DAT
00000010   41 20 0D 0A 63 6D 64 20 2F 63 20 0D 0A 65 63 68    A ..cmd /c ..ech
00000020   6F 2E 3E 3E A8 AC EF 5F E4 A0 A9 AB A0 2E E0 A0    o.>>Ёп_д .а
00000030   E1 E8 A8 E0 A5 AD A8 A5 20 0D 0A 73 68 65 6C 6C    биЁаҐЁҐ ..shell
00000040   33 32 2E 64 6C 6C 20 0D 0A 81 EB E1 E2 E0 AE A5    32.dll ..ЃлбваҐ
00000050   20 E1 AE A7 A4 A0 AD A8 A5 20 E4 A0 A9 AB A0 20     б ЁҐ д  
00000060   0D 0A 0D 0A 31 20 0D 0A 2D 31 00 00 00             ....1 ..-1...   
Such a button is pasted into TC with correct Russian text, while Notepad/AkelPad show the data after Ctrl+V like this:
Code:
TOTALCMD#BAR#DATA
cmd /c
echo.>>_.
shell32.dll


1
-1:
So it seems that the correction AkelPad makes with Ctrl+Shift+V is done by TC by default when pasting to the button bar. Or it is related to the clipboard data format index (data from InsideClipboard):
Format ID| Name| Handle type| Memory| Index
Code:
1   CF_TEXT   Memory   109   1   
7   CF_OEMTEXT   Memory   109   3   
13   CF_UNICODETEXT   Memory   218   4   
16   CF_LOCALE   Memory   4   2   
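
The Cyrillic bytes in the CF_OEMTEXT hex dump above can be checked by hand. The Python sketch below assumes the OEM code page is 866 (the usual one on Russian Windows; the post does not state it) and copies the bytes at offsets 0x24-0x37 and 0x49-0x5E from the dump.

```python
# Bytes taken from the hex dump: the file name and the tooltip line.
file_name = bytes.fromhex(
    "A8 AC EF 5F E4 A0 A9 AB A0 2E E0 A0 E1 E8 A8 E0 A5 AD A8 A5"
)
tooltip = bytes.fromhex(
    "81 EB E1 E2 E0 AE A5 20 E1 AE A7 A4 A0 AD A8 A5 20 E4 A0 A9 AB A0"
)

# Assumption: OEM code page 866 (DOS Cyrillic).
print(file_name.decode("cp866"))
print(tooltip.decode("cp866"))

# The same bytes read as ANSI 1251 give mojibake that starts with "Ё",
# which resembles the text column of the dump.
as_ansi = file_name.decode("cp1251")
print(as_ansi)
```

Read as code page 866, the bytes are valid Russian text, so this program's clipboard data is intact and only the choice of code page on the receiving side decides whether it looks right.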


On the other hand, Chrome-based browsers in this case (when the Western keyboard layout is active) put OEM text data on the clipboard like this (info from InsideClipboard):
Code:
TOTALCMD#BAR#DATA
cmd /c
echo.>>???_?????.??????????
shell32.dll
??????? ???????? ?????

1
-1


Chrome's clipboard data formats (info from InsideClipboard):
Format ID| Name| Handle type| Memory| Index
Code:
1   CF_TEXT   Memory   107   4   
7   CF_OEMTEXT   Memory   107   5   
13   CF_UNICODETEXT   Memory   214   1   
16   CF_LOCALE   Memory   4   3   
49378   HTML Format   Memory   8688   2


2ghisler(Author)
In light of the above, while pasting Unicode data into the button bar is not implemented, could you check the OEM data format and, if it is Chrome-like, use the ANSI data (CF_TEXT) instead?

MVV
Posted: Mon Mar 20, 2017 12:02 pm

I don't think that TC should use CF_OEMTEXT at all if CF_TEXT is available.

milo1012
Posted: Mon Mar 20, 2017 2:17 pm

DrShark wrote:
MVV wrote:
I think that problem is caused by CF_OEMTEXT...
You are right. Maybe TC inserts OEM data instead of ANSI.

CF_OEMTEXT would only be a problem if TC deliberately used that format. But I don't think that is the case when all the other ones are available.
For text, the clipboard works more simply than you think.
When copying to clipboard, most applications will only use one format (most likely Unicode/UTF-16 nowadays) and set
Code:
SetClipboardData(CF_UNICODETEXT, handle)

Then Windows will automatically generate
Code:
CF_TEXT
CF_OEMTEXT
CF_LOCALE

The same happens for e.g. a DIB (device-independent bitmap, i.e. a picture copied to clipboard):
You set
CF_DIB
to the clipboard, and Windows automatically generates
CF_BITMAP
CF_DIBV5

For opening/reading the clipboard, you'd use EnumClipboardFormats() to check for what is available.
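
The automatic conversions described above can be modelled as a lookup table. The Python sketch below is a simplified model, not a call into the Windows API: the table follows the synthesized-format list in Microsoft's clipboard documentation (which also lists CF_PALETTE for DIBs), the CF_LOCALE rule is a simplification, and `visible_formats` is a made-up helper name.

```python
# Formats Windows can synthesize from a format the application has set.
SYNTHESIZED = {
    "CF_BITMAP": ["CF_DIB", "CF_DIBV5"],
    "CF_DIB": ["CF_BITMAP", "CF_PALETTE", "CF_DIBV5"],
    "CF_DIBV5": ["CF_BITMAP", "CF_DIB", "CF_PALETTE"],
    "CF_ENHMETAFILE": ["CF_METAFILEPICT"],
    "CF_METAFILEPICT": ["CF_ENHMETAFILE"],
    "CF_OEMTEXT": ["CF_TEXT", "CF_UNICODETEXT"],
    "CF_TEXT": ["CF_OEMTEXT", "CF_UNICODETEXT"],
    "CF_UNICODETEXT": ["CF_OEMTEXT", "CF_TEXT"],
}
TEXT_FORMATS = {"CF_TEXT", "CF_OEMTEXT", "CF_UNICODETEXT"}

def visible_formats(set_by_app):
    """Return the formats a reader could request (simplified model)."""
    result = list(set_by_app)
    for fmt in set_by_app:
        for extra in SYNTHESIZED.get(fmt, []):
            if extra not in result:
                result.append(extra)
    # Simplification: CF_LOCALE is added whenever text is present without it.
    if TEXT_FORMATS & set(result) and "CF_LOCALE" not in result:
        result.append("CF_LOCALE")
    return result

print(visible_formats(["CF_UNICODETEXT"]))
print(visible_formats(["CF_DIB"]))
```

On a real system the order and presence of formats can differ, so a reader should still enumerate what is actually on the clipboard.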


So as Ghisler explained: The button bar only supports ANSI. If you voluntarily changed your locale to non-Russian, Windows can't map most of the characters to either an ANSI or OEM text and has to use replacement characters instead, which is the question mark by default. It doesn't matter if TC uses the pre-made CF_TEXT or CF_OEMTEXT (both already have the question marks) or recodes CF_UNICODETEXT to ANSI by itself: in all cases you have no correct mapping and see the question marks - simple as that.
So, no TC bug, but a simple limitation.
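
The question-mark replacement is easy to reproduce outside Windows. The Python sketch below assumes the Western code pages 1252 (ANSI) and 437 (OEM); Python's `errors="replace"` stands in for the default-character substitution Windows performs.

```python
text = "echo.>>имя_файла.расширение"

# Converted to a Western ANSI code page (assumed: Windows-1252).
cf_text = text.encode("cp1252", errors="replace")

# Converted to a Western OEM code page (assumed: 437).
cf_oemtext = text.encode("cp437", errors="replace")

# The same text under the Cyrillic ANSI code page survives intact.
cf_text_ru = text.encode("cp1251")

print(cf_text)
print(cf_oemtext)
print(cf_text_ru.decode("cp1251"))
```

Both Western conversions give the same result as the pasted button in the first post, and the original letters cannot be recovered from them afterwards.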


DrShark wrote:
and pasting from the ANSI clipboard (the AkelPad Ctrl+Shift+V or Ctrl+Shift+Ins mentioned above) fixes this.
...
3) in TC's other editable fields, make Ctrl+V or Ctrl+Ins insert the Unicode clipboard data, and Ctrl+Shift+V or Ctrl+Shift+Ins insert the ANSI data (AkelPad's behaviour).

This can only work in AkelPad because you told it which codepage to use for that (Settings -> Codepage recognition/Default codepage). So for TC you'd need to do the same: tell it which codepage to use for such a special paste from the clipboard.
_________________
TC plugins: PCREsearch and RegXtract


Last edited by milo1012 on Mon Mar 20, 2017 3:08 pm; edited 1 time in total

DrShark
Posted: Mon Mar 20, 2017 3:06 pm

milo1012 wrote:
When copying to clipboard, most applications will only use one format (most likely Unicode/UTF-16 nowadays)
...
Then Windows will automatically generate [others]
...
It doesn't matter if TC uses the pre-made CF_TEXT or CF_OEMTEXT (both already have the question marks)
You're correct and that's the case for Chrome. Other apps (like the mentioned Firefox, or my IE 9, which also has correct CF_TEXT data) probably generate each clipboard data format manually, so they don't have this problem with question marks. So to solve the issue with apps like Chrome, we'll have to wait for Unicode support in the button bar "paste" function.

milo1012
Posted: Mon Mar 20, 2017 3:13 pm

DrShark wrote:
Other apps (like the mentioned Firefox, or my IE 9, which also has correct CF_TEXT data) probably generate each clipboard data format manually, so they don't have this problem with question marks.

It doesn't matter if you generate them manually or automatically:
Every app which does not let you specify the specific codepage (i.e. locale) to generate non-Unicode text for Clipboard has this problem, and uses the system-set codepage instead.

But we're talking about the "sender" side, while the point of this thread is the "receiving" TC side. Like I said earlier, TC would need a (global) setting of which codepage to use for such alternative paste from clipboard, otherwise it would be useless.

MVV
Posted: Tue Mar 21, 2017 1:17 am

milo1012,
I have Firefox and Vivaldi (Chromium), I copy same text from mentioned link, and only the one copied from Chromium has CF_OEMTEXT - why Windows doesn't generate this one for Firefox?

And, I have Russian locale, but button from Firefox is pasted correctly while the one from Chromium is pasted with a bunch of ?s instead of Cyrillic.

BTW CF_TEXT data copied from Vivaldi already contains ?s (so TC has no chances if it uses only this format), while the same one copied with Russian keyboard layout contains correct letters. At the same time CF_TEXT data copied from Firefox is always in proper encoding regardless of keyboard layout... Does that mean that Firefox sets both CF_TEXT and CF_UNICODETEXT while Chromium - only CF_UNICODETEXT one?

I've also noticed that data from Chromium has CF_LOCALE format (Firefox doesn't set it), and it is 0x409 for English layout and 0x419 for Russian one...

ghisler(Author)
Posted: Tue Mar 21, 2017 2:20 am

To my knowledge, only CF_TEXT is generated from CF_UNICODETEXT and vice versa, not CF_OEMTEXT.

milo1012
Posted: Tue Mar 21, 2017 10:39 am

MVV wrote:
I have Firefox and Vivaldi (Chromium), I copy same text from mentioned link, and only the one copied from Chromium has CF_OEMTEXT - why Windows doesn't generate this one for Firefox?

Either because they removed it afterwards, or because they don't use CF_UNICODETEXT in the first place.
There are a lot of combinations and conversion formats
https://msdn.microsoft.com/en-us/library/windows/desktop/ms649013.aspx
MVV wrote:
And, I have Russian locale, but button from Firefox is pasted correctly while the one from Chromium is pasted with a bunch of ?s instead of Cyrillic.

First of all I think this combination is special. You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s) but on the other hand the correct cp is still available for lookup somehow.
Concerning Firefox, I guess it's because Firefox identifies the text encoding for the website at hand (View -> Text Encoding) and applies the correct conversion codepage (NLS) (if available on the system).
You could look at the Firefox/Gecko source for this, but this is actually not the topic, since the suggestion was to implement the special ANSI paste for TC, but for this TC needs to know the cp to use for this.
MVV wrote:
At the same time CF_TEXT data copied from Firefox is always in proper encoding regardless of keyboard layout...

But wouldn't TC, even if it would use the correct CF_TEXT data, still apply the wrong codepage and make it wrong again (the 1251 -> 1252 / тест -> example)?
It's really hard to emulate your system setting on a non-Russian Windows.
On my system (English, cp 1252) the 1251 page is available, but Firefox still does not use it for Cyrillic websites when copying text to clipboard.
MVV wrote:
Does that mean that Firefox sets both CF_TEXT and CF_UNICODETEXT while Chromium - only CF_UNICODETEXT one?

The latter is quite possible, for the first one I'm not sure (we'd need to look into the source).



ghisler(Author) wrote:
To my knowledge, only CF_TEXT is generated from CF_UNICODETEXT and vice versa, not CF_OEMTEXT.

CF_OEMTEXT and CF_LOCALE are always created on Windows 7 and above when only CF_UNICODETEXT is used to fill the clipboard (preceded by EmptyClipboard(), of course); I just checked it with my own code. Not sure about older OSes and non-English Windows, though.

DrShark
Posted: Tue Mar 21, 2017 11:10 am

milo1012 wrote:
the suggestion was to implement the special ANSI paste for TC, but for this TC needs to know the cp to use for this
Actually, this topic covers two different issues with different suggestions for solving them (I wasn't sure about that when I decided to create a bug report). After your and Christian's explanations, the issues and suggestions are as follows:
1) For apps like Chrome that don't have their own option for the codepage used for non-Unicode text (or where it isn't set properly) and that use CF_UNICODETEXT, the issue is only with TC's button bar, which uses the ANSI data (CF_TEXT); CF_OEMTEXT has nothing to do with it. The solution is to wait until the button bar is able to paste CF_UNICODETEXT, if Christian adds that.
2) For old apps that don't use CF_UNICODETEXT to copy text, so that TC pastes their data correctly in the places where it accepts CF_UNICODETEXT (the button bar currently isn't one of them), add an option to paste the ANSI data (CF_TEXT), like AkelPad's Ctrl+Shift+V (Ctrl+Shift+Ins). As you noted, this would also require TC to have its own option for the locale used for non-Unicode text (edit: as I noted above, the button bar accepts ANSI data from such apps correctly, so I'm not sure whether TC needs any additional option to solve this second issue). To test the second issue I used Plaj On-Line 1.0 from 1999 (a ru-to-ua and ua-to-ru offline translator); it is not freeware but, AFAIR, had a trial period.

MVV
Posted: Tue Mar 21, 2017 1:42 pm

milo1012 wrote:
You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s)

It is definitely not my case. Usually I see ?s only when I copy text from old non-Unicode programs when English keyboard layout is active (which is the default one). But system default codepage is 1251.

milo1012
Posted: Tue Mar 21, 2017 5:20 pm

2MVV
I think that we need to clarify the terms:
1st, the system codepage (both for ANSI and OEM), which is either set by the specific Windows version and installation/user creation (and probably the basic regional settings in Control Panel (1st tab) too), or overridden by the "Language for non-Unicode Programs" setting (last tab), which will either install or switch to a different NLS (national language support) file, regardless of the installation or the setting on the first tab.
I think this is called the "System" locale (or "User" locale, as each user can have its own language pack setting on NT 6.0+ systems IIRC).
2nd, the so-called "Input" locale (which results in its own codepage for conversion), which is basically the keyboard layout.

Update: I just found the most fitting MSDN article, which explains it in even more detail:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd319088.aspx


MVV wrote:
milo1012 wrote:
You somehow managed to make the system default codepage non-Cyrillic (that's why most programs produce the ?s)

It is definitely not my case.

Well, maybe not the system cp, but the locale is generated automatically in the clipboard:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms649013.aspx wrote:
The data is a handle to the locale identifier associated with text in the clipboard. When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language. You can use the CF_LOCALE format to associate a different locale with the clipboard text.

So I guess the following happens: for the automatic conversion from CF_UNICODETEXT, Windows seems to use the input locale which was set due to your kb layout at that time, and therefore some non-Cyrillic codepage. Result: question mark replacement.
Firefox seems to use the User locale or some automatic detection, regardless of the Input locale, for setting CF_TEXT manually (no automatic conversion). Result: correct text in CF_TEXT.
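
The guess above can be written down as a small model in Python. It is only a sketch of the hypothesis, not of documented Windows behaviour: the locale-to-codepage table covers just the two locales mentioned in this thread, and `synthesize_cf_text` is a made-up name.

```python
import struct

# Assumed mapping from locale identifier to ANSI code page
# (0x0409 = English/US, 0x0419 = Russian).
ANSI_CODEPAGE = {0x0409: "cp1252", 0x0419: "cp1251"}

def synthesize_cf_text(unicode_text: str, cf_locale: bytes) -> bytes:
    """Model of the automatic CF_UNICODETEXT -> CF_TEXT conversion."""
    (lcid,) = struct.unpack("<I", cf_locale)  # CF_LOCALE holds a 4-byte LCID
    return unicode_text.encode(ANSI_CODEPAGE[lcid], errors="replace")

english_layout = struct.pack("<I", 0x0409)
russian_layout = struct.pack("<I", 0x0419)

print(synthesize_cf_text("файла", english_layout))
print(synthesize_cf_text("файла", russian_layout).decode("cp1251"))
```

Under this model the same Unicode text yields question marks with the English layout and correct Cyrillic with the Russian one, which matches what was observed for Chromium.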

MVV wrote:
Usually I see ?s only when I copy text from old non-Unicode programs when English keyboard layout is active (which is the default one). But system default codepage is 1251.

Obviously, since the same automatic conversion happens (CF_LOCALE), but this time from non-Unicode to Unicode, i.e. from CF_TEXT to CF_UNICODETEXT.

MVV
Posted: Wed Mar 22, 2017 1:09 am

milo1012 wrote:
I think that we need to clarify the terms

That's correct, and I talk about two terms: system default codepage and keyboard layout.

Quote:
Well, maybe not the system cp, but the locale is generated automatically in the clipboard:

Yes, Windows converts ANSI text to Unicode according to the current keyboard layout... But that doesn't explain why Firefox copies ANSI text properly with the English keyboard layout while Chromium doesn't (perhaps Firefox uses codepage information from the web page; it is the only reasonable explanation that comes to my mind).

All times are GMT - 6 Hours
Page 1 of 2