[OT] How to check if a file is ASCII or UNICODE

tbeu
Power Member
Posts: 1354
Joined: 2003-07-04, 07:52 UTC
Location: Germany

Post by *tbeu »

I need to check whether a file is plain ASCII or UNICODE encoded. There is the Win32 API function IsTextUnicode, which I believe only works if _UNICODE is defined. But how can I do the check if _UNICODE is not defined?

Thanks,
tbeu
TC plugins: Autodesk 3ds Max / Inventor / Revit Preview, FileInDir, ImageMetaData (JPG Comment/EXIF/IPTC/XMP), MATLAB MAT-file Viewer, Mover, SetFolderDate, Solid Edge Preview, Zip2Zero and more
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

2tbeu
I think it should work in all cases. Where did you see the requirement for "_UNICODE"?
Lefteous
Power Member
Posts: 9536
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2tbeu
tbeu wrote:I need to check if a file is plain ASCII or UNICODE encoded.
In this case I would just check the BOM:
http://unicode.org/unicode/faq/utf_bom.html#22
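
For illustration, a BOM check along these lines could look like the sketch below. It is portable C and only inspects the first bytes of a buffer that has already been read from the file. The names `bom_kind` and `detect_bom` are made up for this example and do not come from any library. A file without a BOM is simply reported as `BOM_NONE`, which says nothing about its actual encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_LE,
    BOM_UTF16_BE,
    BOM_UTF32_LE,
    BOM_UTF32_BE
} bom_kind;

/* Inspect the first bytes of buf for a Unicode byte order mark. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* UTF-32 is tested first because FF FE 00 00 also starts with the
     * UTF-16 LE mark FF FE. The sequence is ambiguous in principle
     * (UTF-16 LE BOM followed by U+0000); this sketch assumes UTF-32. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return BOM_UTF32_LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return BOM_UTF32_BE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;

    return BOM_NONE;
}
```

Calling `detect_bom` on the first four bytes of a file is enough, since no BOM is longer than that.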

Post by *Alextp »

2Lefteous
The BOM may be absent from the text. See the description of IsTextUnicode: it performs not only a BOM check but also some more advanced tests.

Post by *tbeu »

Alextp wrote:Where did you see requirement of "_UNICODE"?
I got it from here. However, it says there that unicode must be defined, not _UNICODE.

Post by *Alextp »

It must be defined for an MFC app to be a Unicode app. The function IsTextUnicode doesn't depend on this; just give it a stream of bytes and it works.

Post by *tbeu »

Alex, maybe you can help me again. In fact, I want to reproduce TC's check for text and binary files. AFAIK, TC checks the first 8 KiB of a file for characters 0x00 to 0x05. But this way, UNICODE files are detected as text, whereas I need to treat them as binary data.
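
The check described above might be sketched in portable C roughly as follows. This is not TC's actual code: the 8 KiB limit and the 0x00 to 0x05 range are taken from the description in this post, and the names `SCAN_LIMIT`, `has_low_control_bytes` and `file_has_low_control_bytes` are invented for the example. The file is opened in binary mode so that the C runtime does not translate any bytes before the scan sees them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 8 KiB, as described in the post (an assumption about TC's behavior). */
#define SCAN_LIMIT ((size_t)8192)

/* Returns 1 if any of the first SCAN_LIMIT bytes lies in 0x00..0x05,
 * otherwise 0. */
int has_low_control_bytes(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t n = len < SCAN_LIMIT ? len : SCAN_LIMIT;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] <= 0x05)
            return 1;
    }
    return 0;
}

/* Reads up to SCAN_LIMIT bytes from the file and scans them.
 * Returns 1 (control bytes found), 0 (none found) or -1 (cannot open). */
int file_has_low_control_bytes(const char *path)
{
    unsigned char buf[8192];
    FILE *fp = fopen(path, "rb");   /* "rb": binary mode, no translation */
    if (fp == NULL)
        return -1;

    size_t got = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    return has_low_control_bytes(buf, got);
}
```

UTF-16 text in the Latin range contains a 0x00 byte for every character, so a buffer such as `FF FE 48 00 69 00` makes `has_low_control_bytes` return 1 in this sketch.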

Post by *Alextp »

tbeu wrote:But this way, UNICODE files are detected as text whereas I need to consider them as binary data.
You mean they are detected as binary but you need them as text?

You can call IsTextUnicode to detect Unicode. When the result is false, you can then check for bytes 0x00 to 0x05...
Is this not enough?
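
The two-step order suggested here (Unicode test first, control-byte test second) could be sketched as below. To keep the example portable, the Unicode step is replaced by a plain UTF-16 BOM test, which is only a stand-in: IsTextUnicode is a Windows API and, as mentioned earlier in the thread, does more than check the BOM. The names `file_kind`, `has_utf16_bom` and `classify_sample` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-way result for this sketch. */
typedef enum { KIND_TEXT, KIND_UNICODE, KIND_BINARY } file_kind;

/* Stand-in for the Unicode test: recognises only a UTF-16 BOM.
 * On Windows, a call to IsTextUnicode could take this place. */
static int has_utf16_bom(const unsigned char *p, size_t n)
{
    return n >= 2 &&
           ((p[0] == 0xFF && p[1] == 0xFE) ||
            (p[0] == 0xFE && p[1] == 0xFF));
}

/* Step 1: Unicode test. Step 2: scan for bytes 0x00..0x05. */
file_kind classify_sample(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);

    if (has_utf16_bom(p, n))
        return KIND_UNICODE;

    for (size_t i = 0; i < n; i++) {
        if (p[i] <= 0x05)
            return KIND_BINARY;
    }
    return KIND_TEXT;
}
```

A caller that wants Unicode files handled as binary data can simply treat `KIND_UNICODE` and `KIND_BINARY` the same way. With the BOM-only stand-in, UTF-16 text that has no BOM but contains 0x00 bytes ends up as `KIND_BINARY` through the second step.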

Post by *tbeu »

Alextp wrote:You mean they are detected as binary but you need them as text?
No, the other way round. I need them as binary data, not as text. Checking for 0x00 to 0x05 always fails when I open and read the file with fopen/fread/fgetc or CreateFile/ReadFile. As a result, UNICODE files are always detected as text, which is not what I want.

Post by *Alextp »

tbeu wrote:Checking for 0x00 to 0x05 always fails, when I open/read the file by fopen/fread/fgetc or CreateFile/ReadFile.
It should not fail; I also use CreateFile/ReadFile and the check works. You can look at the source of my Viewer component: the functions IsFileUnicode/IsFileText perform this check correctly.