LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *Dalai »

AntonyD wrote: 2023-11-17, 14:50 UTC I.e. EITHER the plugin correctly processes all test files, including information on encodings and markers.
Well, it doesn't detect any file encoding and it doesn't claim to do that. Hence the field is called "BOM Type" and not "Encoding" or similar.
AntonyD wrote: OR, for example, it is ONLY for line breaks and doesn't even try to determine anything related to encodings.
In my opinion it does exactly that. It's required to check for a UTF-16 BOM to know how to correctly interpret the following stream of bytes. If it didn't check for a BOM, UTF-16 files would be detected as binary, even though they have a BOM. When the plugin knows a file has a BOM, why not return that information in a field?
AntonyD wrote: UTF-16BE-noBOM; UTF-16LE-noBOM
these 2 TEXT files were detected as binary.
Correct. There is no way to tell such files apart from binary files without more sophisticated detection algorithms. That's the thing with "checking if a file is text or binary". You have to be told how to interpret the data, which is exactly what a BOM does. If there is no BOM, there's no way of knowing. And I'm not going to implement statistical analysis or anything like that. BTW, UTF-8 doesn't have this problem because it is byte-oriented, so endianness doesn't apply.
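
To illustrate why a UTF-16 file without a BOM ends up classified as binary, here is a minimal Python sketch. It is not the plugin's code: the names `detect_bom` and `looks_binary` and the NUL-byte heuristic are assumptions made purely for illustration.

```python
import codecs

# BOM types discussed at this point in the thread (UTF-8 and UTF-16 only).
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes) -> str:
    """Return the name of a known BOM at the start of data, else "None"."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return "None"


def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: a NUL byte means binary unless a UTF-16 BOM
    says the NUL bytes are part of two-byte code units."""
    if detect_bom(data) in ("UTF-16 LE", "UTF-16 BE"):
        return False
    return b"\x00" in data


with_bom = codecs.BOM_UTF16_LE + "a\r\n".encode("utf-16-le")
without_bom = "a\r\n".encode("utf-16-le")
print(detect_bom(with_bom), looks_binary(with_bom))
print(detect_bom(without_bom), looks_binary(without_bom))
```

Without the BOM, the same bytes contain NUL bytes and nothing marks them as text, so a simple check like this one has to call the file binary.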

I have an idea how to add UTF-32 support, but I won't promise anything just yet.

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror

Post by *Fla$her »

Dalai wrote: 2023-11-16, 22:40 UTC We'll see if I expand the BOM detection in the future. UTF-16 detection was important because of the difference in bytes per line-break character compared to ANSI/UTF-8.
And why bring up line breaks when the quote was about the BOM? Locating the marker is simple and quick. So I ask: why not look for all possible BOMs, regardless of how rare the encodings listed in my link are?
Overquoting is evil! 👎

Post by *Dalai »

Though I'm aware it's not an easy topic, I'm wondering if I'm expressing myself poorly.

Let's assume that my plugin would check for the existence of UTF-7 and UTF-1 BOMs. Now what about the line break count of such files? Don't you think that users are right to assume the line break count to be correct? I can't count any line breaks if I don't know how to interpret a stream of bytes, it's that simple. It's true that detecting a BOM is fast and simple. It's also the prerequisite to counting line breaks, I think you'll agree on that. But detecting a BOM without a correct line break count would be ... silly, or even stupid IMO.
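
A small Python sketch of why the byte interpretation has to be known before line breaks can be counted. This is an illustration only, not the plugin's implementation; `count_line_breaks` and its parameters are hypothetical, and the caller is assumed to have stripped any BOM already.

```python
def count_line_breaks(data: bytes, unit: int = 1, big_endian: bool = False) -> dict:
    """Count CRLF, lone CR and lone LF, reading data as fixed-width code units.

    unit=1 suits ANSI/UTF-8, unit=2 UTF-16, unit=4 UTF-32.
    """
    order = "big" if big_endian else "little"
    # Split the byte stream into code units of the given width.
    units = [
        int.from_bytes(data[i:i + unit], order)
        for i in range(0, len(data) - unit + 1, unit)
    ]
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    i = 0
    while i < len(units):
        if units[i] == 0x0D:
            if i + 1 < len(units) and units[i + 1] == 0x0A:
                counts["CRLF"] += 1
                i += 2  # skip the LF that belongs to this CRLF
                continue
            counts["CR"] += 1
        elif units[i] == 0x0A:
            counts["LF"] += 1
        i += 1
    return counts


sample = "a\r\nb\nc\r".encode("utf-16-le")
print(count_line_breaks(sample, unit=2))  # read as UTF-16 LE
print(count_line_breaks(sample, unit=1))  # same bytes misread as single-byte text
```

Read with the right code unit width, the sample has one CRLF, one CR and one LF. Read byte by byte, the NUL byte between CR and LF hides the CRLF pair and the counts come out wrong.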

And to add to that, I need to be able to verify anything I implement. Currently I can't even do that for UTF-32 because I don't have a freeware (or otherwise non-paid) editor that can deal with such an encoding. I'm going to get one, but that takes time.

Regards
Dalai
Last edited by Dalai on 2023-11-17, 17:37 UTC, edited 1 time in total.

Post by *Horst.Epp »

Dalai wrote: 2023-11-17, 17:26 UTC And to add to that, I need to be able to verify anything I implement. Currently I can't even do that for UTF-32 because I don't have a freeware (or otherwise non-paid) editor that can deal with such an encoding. I'm going to get one, but that takes time.
AkelPad supports UTF32 LE and BE

I like it as an Editor, and there is even a good Lister plugin for it.
Windows 11 Home x64 Version 23H2 (OS Build 22631.3527)
TC 11.03 x64 / x86
Everything 1.5.0.1373a (x64), Everything Toolbar 1.3.3, Listary Pro 6.3.0.73
QAP 11.6.3.2 x64

Post by *Fla$her »

Dalai wrote: 2023-11-17, 17:26 UTC Don't you think that users are right to assume the line break count to be correct?
Where you can't use a counter, you can return 'Undefined'.
BOM is a separate option that has nothing to do with line breaks.
Dalai wrote: 2023-11-17, 17:26 UTC It's also the prerequisite to counting line breaks, I think you'll agree on that.
Didn't quite understand what is a prerequisite?
Dalai wrote: 2023-11-17, 17:26 UTC But detecting a BOM without a correct line break count would be ... silly, or even stupid IMO.
I completely disagree. Field data can coexist within the plugin, without obliging the user to combine them in columns or tooltips.

Post by *Dalai »

Fla$her wrote: 2023-11-17, 22:13 UTC Where you can't use a counter, you can return 'Undefined'.
If I did that, the first question that would pop up here would be "Why does it return 'Undefined' for file X?". That question would be justified, but it can be avoided which I intend to do.
Fla$her wrote: BOM is a separate option that has nothing to do with line breaks.
Well, it does in the code of my plugin. And I don't want to add additional code just to handle an exotic edge case.
Fla$her wrote: Didn't quite understand what is a prerequisite?
Possible synonyms: requirement, precondition.
Fla$her wrote: I completely disagree. Field data can coexist within the plugin, without obliging the user to combine them in columns or tooltips.
How a user uses the data provided by my plugin is not my concern. But I intend to provide complete data for all fields it does return. I either fully support a file type, with both BOM and line break count, or I don't support it at all.

Regards
Dalai

Post by *Fla$her »

Dalai wrote: 2023-11-18, 00:06 UTC That question would be justified, but it can be avoided which I intend to do.
The truth is that you have not avoided my question. No one will get better from rearranging the questions in places.
Dalai wrote: 2023-11-18, 00:06 UTC Well, it does in the code of my plugin.
It's not about the code, it's about how you try to relate one to the other.
Dalai wrote: 2023-11-18, 00:06 UTC Possible synonyms: requirement, precondition.
I didn't ask about the meaning of the word, but about what it indicates in the preceding text.

By the way, gvim also works with UTF-32. It's free.

Post by *Dalai »

Fla$her wrote: 2023-11-18, 01:24 UTC The truth is that you have not avoided my question. No one will get better from rearranging the questions in places.
What are you even talking about?
Fla$her wrote: It's not about the code, it's about how you try to relate one to the other.
And why would it be wrong to relate BOMs to line breaks when both of them are returned for each file? Users can (and probably will) expect the values to be correct - and rightfully so. After all, the plugin isn't called BOMinfo or something but LineBreakInfo.
Fla$her wrote: I didn't ask about the meaning of the word, but about what it indicates in the preceding text.
Well, read the two sentences again:
Dalai wrote: It's true that detecting a BOM is fast and simple. It's also the prerequisite to counting line breaks, I think you'll agree on that.
I don't know how to phrase that any simpler.

Regards
Dalai

Post by *Dalai »

Horst.Epp wrote: 2023-11-17, 17:31 UTC AkelPad supports UTF32 LE and BE
Fla$her wrote: 2023-11-18, 01:24 UTC By the way, gvim also works with UTF-32. It's free.
Thank you both. I'm going with EditPad Lite for now. I used it many years ago before I switched to Notepad++.

I've implemented UTF-32 support. Currently I'm testing it extensively, and it looks good so far.

Regards
Dalai

Post by *Fla$her »

Dalai wrote: 2023-11-18, 11:36 UTC What are you even talking about?
That the argument from the quote is unconvincing.
Dalai wrote: 2023-11-18, 11:36 UTC Users can (and probably will) expect the values to be correct - and rightfully so.
The values for BOM of unsupported encodings are now incorrect, which users do not expect.
Dalai wrote: 2023-11-18, 11:36 UTC After all, the plugin isn't called BOMinfo or something but LineBreakInfo.
A convenient argument; I won't dispute it.
Dalai wrote: 2023-11-18, 11:36 UTC I don't know how to phrase that any simpler.
I'm not asking for a reformulation. I asked you to answer the question: what exactly is a condition? Speed and simplicity? Or what?
Dalai wrote: 2023-11-18, 13:45 UTC I've implemented UTF-32 support. Currently I'm testing it extensively, and it looks good so far.
+1 good news.

Post by *Dalai »

Fla$her wrote: 2023-11-18, 19:48 UTC The values for BOM of unsupported encodings are now incorrect, which users do not expect.
The plugin never claimed to support any BOM types other than the ones documented - right in the plugin (it's a multiple choice field) and in the readme. What gave you any other impression? Just the fact that it shows "None" for e.g. UTF-7?
Fla$her wrote: I'm not asking for a reformulation. I asked you to answer the question: what exactly is a condition? Speed and simplicity? Or what?
I don't see what these questions have to do with the original statement that "detecting BOMs is a prerequisite for counting line breaks".

Regards
Dalai

Post by *Fla$her »

Dalai wrote: 2023-11-18, 21:35 UTC Just the fact that it shows "None" for e.g. UTF-7?
Exactly.
Dalai wrote: 2023-11-18, 21:35 UTC I don't see what these questions have to do with the original statement that "detecting BOMs is a prerequisite for counting line breaks".
That is not the original statement. Here is the original:
Fla$her wrote: 2023-11-17, 22:13 UTC
Dalai wrote: 2023-11-17, 17:26 UTC It's also the prerequisite to counting line breaks, I think you'll agree on that.
Didn't quite understand what is a prerequisite?
You didn't specify that it referred to BOM detection, and that wasn't clear from the context. At last, I have my answer.

Post by *petermad »

No. Here is the original statement:
Dalai wrote: 2023-11-17, 17:26 UTC It's true that detecting a BOM is fast and simple. It's also the prerequisite to counting line breaks
The second "it's" refers to the previous sentence, explaining why the BOM is detected.

So I read it as: It is true that detecting BOM is fast and simple AND it is the prerequisite to counting line breaks.


Under-quoting is deceiving! 👎
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar

Post by *Dalai »

Fla$her wrote: 2023-11-18, 21:51 UTC
Dalai wrote: 2023-11-18, 21:35 UTC Just the fact that it shows "None" for e.g. UTF-7?
Exactly.
I can't support every exotic, obsolete or niche BOM type out there. If a new BOM pops up next month, the field will also show "None", so there always will be cases where the output is wrong. Maybe I should explain somewhere that "None" means "None supported by the plugin" or "None that is known to the plugin". I could change "None" to "None/Unknown" but that would look kind of silly, especially for ANSI and binary files. IMO common sense should tell people that not every piece of software supports every aspect of something. Well, whatever I do, it's wrong for some people...
petermad wrote: 2023-11-19, 11:06 UTC So I read it as: It is true that detecting BOM is fast and simple AND it is the prerequisite to counting line breaks.
Yes, that's what I meant.

Regards
Dalai

Post by *AntonyD »

@Dalai
I promised not to interfere any further, but since you have managed to implement UTF-32 support, I have to ask: will you be able to tell UTF-16 LE and UTF-32 LE apart? Perhaps using that piece of C++ code I suggested you try?
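
For context on why this question arises: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so the order of the checks matters. The Python sketch below is only an illustration of that ordering. It is neither the C++ code mentioned above nor the plugin's implementation, and the name `detect_bom_with_utf32` is hypothetical.

```python
import codecs

# Longer BOMs are listed before shorter ones that share the same prefix.
BOMS_LONGEST_FIRST = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
]


def detect_bom_with_utf32(data: bytes) -> str:
    """Return the first matching BOM name, testing 4-byte BOMs first."""
    for bom, name in BOMS_LONGEST_FIRST:
        if data.startswith(bom):
            return name
    return "None"


utf32_file = codecs.BOM_UTF32_LE + "a".encode("utf-32-le")
utf16_file = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")
# Ambiguous case: a UTF-16 LE file whose first character is U+0000.
ambiguous = codecs.BOM_UTF16_LE + "\x00".encode("utf-16-le")

print(detect_bom_with_utf32(utf32_file))
print(detect_bom_with_utf32(utf16_file))
print(detect_bom_with_utf32(ambiguous))
```

Checking the four-byte BOM first separates the two in ordinary cases. A UTF-16 LE file that starts with a NUL character directly after its BOM produces the same four bytes and would be reported as UTF-32 LE by this sketch, so the BOM alone cannot settle every case.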
#146217 personal license