AntonyD wrote: 2023-11-17, 14:50 UTC
> I.e., EITHER the plugin correctly processes all test files, including information on encodings and markers.
Well, it doesn't detect any file encoding, and it doesn't claim to. That's why the field is called "BOM Type" and not "Encoding" or something similar.
> OR, for example, ONLY line breaks. And it doesn't even try to determine anything related to encodings.
In my opinion, it does exactly that. It has to check for a UTF-16 BOM to know how to interpret the bytes that follow. If it didn't, UTF-16 files would be detected as binary even when they do have a BOM. And once the plugin knows that a file has a BOM, why not return that information in a field?
> UTF-16BE-noBOM; UTF-16LE-noBOM
> these 2 TEXT files were detected as binary.
Correct. Without more sophisticated detection algorithms, there is no way to tell such files apart from binary files. That's the thing with "checking whether a file is text or binary": something has to tell you how to interpret the data, and that's exactly what a BOM does. Without a BOM, there's no way of knowing, and I'm not going to implement statistical analysis or anything like that. BTW, UTF-8 doesn't have this problem because it's a byte-oriented encoding: there is only one byte order, so endianness never comes into play.
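To illustrate why the BOM check matters, here is a minimal Python sketch. It is not the plugin's code: it assumes a common text/binary heuristic (any NUL byte means "binary"), which may differ from what the plugin actually does, and the names `bom_type` and `looks_binary` are hypothetical. UTF-16 stores ASCII characters with a zero byte, so a BOM-less UTF-16 file fails the NUL test, while the same content with a BOM is recognized as text.

```python
import codecs
from typing import Optional

# Known BOM byte sequences, taken from Python's codecs module.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),   # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),   # FE FF
)


def bom_type(data: bytes) -> Optional[str]:
    """Return the name of the BOM at the start of data, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: NUL bytes mean binary, unless a UTF-16 BOM
    says the data is UTF-16 text (where ASCII characters contain NUL bytes)."""
    if bom_type(data) in ("UTF-16LE", "UTF-16BE"):
        return False
    return b"\x00" in data


with_bom = codecs.BOM_UTF16_BE + "Hello".encode("utf-16-be")
no_bom = "Hello".encode("utf-16-be")  # same text, no BOM
print(bom_type(with_bom), looks_binary(with_bom))  # UTF-16BE False
print(bom_type(no_bom), looks_binary(no_bom))      # None True
```

Plain ASCII text encoded as UTF-8 contains no NUL bytes, so under this assumed heuristic it passes as text even without a BOM.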
I have an idea how to add UTF-32 support, but I won't promise anything just yet.
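One pitfall any BOM-based UTF-32 detection has to handle: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the longer sequences must be tested first. The Python sketch below only illustrates that ordering. It is not the plugin's approach (the idea mentioned above isn't described), and `bom_type_with_utf32` is a hypothetical name. Note that a UTF-16LE file whose first character after the BOM is U+0000 would still be indistinguishable from UTF-32LE.

```python
import codecs
from typing import Optional

# Ordered longest-first: testing UTF-16LE (FF FE) before UTF-32LE
# (FF FE 00 00) would misreport every UTF-32LE file as UTF-16LE.
BOMS_LONGEST_FIRST = (
    (codecs.BOM_UTF32_LE, "UTF-32LE"),   # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),   # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),   # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),   # FE FF
)


def bom_type_with_utf32(data: bytes) -> Optional[str]:
    """Return the name of the BOM at the start of data, or None if there is none."""
    for bom, name in BOMS_LONGEST_FIRST:
        if data.startswith(bom):
            return name
    return None


print(bom_type_with_utf32(codecs.BOM_UTF32_LE + "Hi".encode("utf-32-le")))  # UTF-32LE
print(bom_type_with_utf32(codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")))  # UTF-16LE
```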
Regards
Dalai