ghisler.com support email server: email titles and Cyrillic text corruption

English support forum

DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

ghisler.com support email server: email titles and Cyrillic text corruption

Post by DrShark

I'd be interested to know whether anyone else has encountered the following issues when receiving Christian Ghisler's replies from the official support email address at ghisler.com:
1. The ghisler.com email server/client strips spaces from the original subject when it builds the "Re:" subject of the reply.

For example, an email sent to the ghisler.com server has the subject:
Some pretty long subject of email
The reply from ghisler.com may then have the subject:
Re: Some pretty long subjectof email
I mostly use quite long, descriptive subjects, so I'm not sure whether this also happens with short ones. The space that gets removed from the reply's subject seems to be at a random position.

I mentioned this issue in my email of August 6, 2020, but Christian didn't comment on it.

2. Although the official languages for support email (English, French, German and Italian) all use Latin characters,
sometimes it's necessary to use a language with other characters, e.g. Cyrillic, for example to quote error messages in the language used by Windows or Total Commander.

So if an email sent to ghisler.com contains some Cyrillic text like this:

Code: Select all

---------------------------
Да   Нет
---------------------------
in the reply from ghisler.com this Cyrillic text can look like this:

Code: Select all

> ---------------------------
> Да   Нет
> ---------------------------
From this it's at least possible to recover the original Cyrillic text, but I have older emails
where the Cyrillic characters were replaced with just question marks:

Code: Select all

> ---------------------------
> ??   ???
> ---------------------------
I'm sending emails in plain text (not HTML) as UTF-8:

Code: Select all

Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: base64
and the replies from ghisler.com also use plain text, but with the iso-8859-1 charset:

Code: Select all

Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Neither issue occurs with Christian Ghisler's Gmail account (its replies use HTML, and Cyrillic is displayed correctly there).
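The question marks are what you get when text is converted to a charset that has no Cyrillic letters and every unmappable character is replaced. The sketch below only assumes (nothing in this thread confirms the old mail program's internals) that the quoted text is re-encoded to iso-8859-1 with a "replace" fallback; numeric character references are shown merely as an example of a lossless fallback, not as what the program does.

```python
import html

original = "Да   Нет"

# Assumption: the quote is re-encoded to iso-8859-1, replacing what doesn't fit.
# iso-8859-1 has no Cyrillic letters, so each one becomes "?" and is lost for good.
lossy = original.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(lossy)  # ??   ???

# For comparison only: numeric character references are plain ASCII
# and can be turned back into the original text.
refs = original.encode("iso-8859-1", errors="xmlcharrefreplace").decode("ascii")
print(refs)  # &#1044;&#1072;   &#1053;&#1077;&#1090;
print(html.unescape(refs) == original)  # True

# UTF-8, which the sender declared, round-trips without any loss.
print(original.encode("utf-8").decode("utf-8") == original)  # True
```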
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by ghisler(Author)

I'm using an old mail program because I need to handle the many hundreds of thousands of messages I get per year; other programs are just not fast enough. This old program does not support Unicode.
Author of Total Commander
https://www.ghisler.com
DrShark
Power Member

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by DrShark

ghisler(Author) wrote: 2021-09-24, 19:31 UTC
> This old program does not support Unicode.
Could you please take a look at the first issue, with spaces being deleted from email subjects? It doesn't look Unicode-related. If the mail program itself deletes spaces from the original subject when you use its Reply function, maybe some scripting could restore them? This problem isn't that critical, but it can make it harder to find a particular email when searching for keywords in subjects.
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by Usher

2DrShark
This seems to be a well-known problem with multi-line subjects. The standard (RFC 5322) says that long header lines may be split at whitespace and that continuation lines must start with whitespace. Below are examples from the Google Groups web mailer and Microsoft Outlook Express; the subjects are quoted from the message source exactly as they were sent:

Code: Select all

Subject: =?UTF-8?Q?Re=3A_Quiz_75=2F2021_=2D_Kir_Bu=C5=82yczow_=22Trzeba_pom=C3=B3c=22_z_=22?=
 =?UTF-8?Q?Ludzie_jak_ludzie=22?=

Subject: =?utf-8?Q?Re:_Quiz_75/2021_-_Kir_Bu=C5=82yczow_?=
	=?utf-8?Q?=22Trzeba_pom=C3=B3c=22_z_=22Ludzie_jak_lud?=
	=?utf-8?Q?zie=22?=
Both are the same subject, and your mailer should display it as:

Code: Select all

Re: Quiz 75/2021 - Kir Bułyczow "Trzeba pomóc" z "Ludzie jak ludzie"
Note that neither version fully conforms to the standard:
1. In the first version, the quotation mark " (0x22) is separated from the word that follows it.
2. In the first version, the program uses a Linux EOL (LF, 0x0A) to split lines, while the standard allows only CRLF (0x0D 0x0A) as EOL.
3. In the second version, the last word "ludzie" is split into "lud" and "zie", though it shouldn't be.
4. In the second version, the program uses the Tab (0x09) character as whitespace, which the standard deprecates.
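For comparison, RFC 2047 (section 6.2) says that whitespace separating two adjacent encoded-words is ignored when the header is displayed, so a decoder that applies this rule should turn both variants above into the same subject. The sketch below feeds the two raw headers to Python's standard `email` package with `policy.default`, which unfolds the header and applies that rule; the dummy body and the `decoded_subject` name are only there for illustration.

```python
import email
from email import policy

# Raw headers copied from the examples above; a dummy body completes each message.
GOOGLE_GROUPS = (
    "Subject: =?UTF-8?Q?Re=3A_Quiz_75=2F2021_=2D_Kir_Bu=C5=82yczow_=22Trzeba_pom=C3=B3c=22_z_=22?=\n"
    " =?UTF-8?Q?Ludzie_jak_ludzie=22?=\n"
    "\n"
    "body\n"
)
OUTLOOK_EXPRESS = (
    "Subject: =?utf-8?Q?Re:_Quiz_75/2021_-_Kir_Bu=C5=82yczow_?=\r\n"
    "\t=?utf-8?Q?=22Trzeba_pom=C3=B3c=22_z_=22Ludzie_jak_lud?=\r\n"
    "\t=?utf-8?Q?zie=22?=\r\n"
    "\r\n"
    "body\r\n"
)

def decoded_subject(raw):
    """Hypothetical helper: parse a raw message and return its decoded Subject."""
    # policy.default removes the line breaks (keeping the leading whitespace),
    # decodes each encoded-word and drops whitespace that only separates
    # two adjacent encoded-words.
    msg = email.message_from_string(raw, policy=policy.default)
    return str(msg["Subject"])

for raw in (GOOGLE_GROUPS, OUTLOOK_EXPRESS):
    print(decoded_subject(raw))
```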

Other programs may have problems with such ambiguous splitting when decoding the text. Some of them trim all leading and trailing whitespace, which gives these results:

Code: Select all

Re: Quiz 75/2021 - Kir Bułyczow "Trzeba pomóc" z "Ludzie jak ludzie"
Re: Quiz 75/2021 - Kir Bułyczow"Trzeba pomóc" z "Ludzie jak ludzie"
Other programs always insert a space between a line and its continuation, which gives these results:

Code: Select all

Re: Quiz 75/2021 - Kir Bułyczow "Trzeba pomóc" z " Ludzie jak ludzie"
Re: Quiz 75/2021 - Kir Bułyczow "Trzeba pomóc" z "Ludzie jak lud zie"
As you can see, it's not easy to cope with all these implementation bugs. Could you name your mailer and show the subject quoted from the source of your message?
Andrzej P. Wozniak
Polish subforum moderator
DrShark
Power Member

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by DrShark

Usher wrote: 2021-09-25, 22:24 UTC
> It seems to be a well known problem with multiline subjects. Standard declares that long header lines may be splitted on white spaces and lines of continuation should start with white space. [...] Could you name your mailer and show the subject quoted from the source of your message?
I use the web interface of one of the Ukrainian public email providers.
Examples of message subjects:
1. A message sent to ghisler.com,
as shown in the webmail UI:

Code: Select all

Re[2]: Re[3]: Re[2]: Re[2]: Re[2]: TC4A: extsd and file operations issues
and as it appears in the EML source (this webmail provider lets you download the raw email):

Code: Select all

Subject: Re[2]: Re[3]: Re[2]: Re[2]: Re[2]: TC4A: extsd and file operations 
 issues
2. The message received from ghisler.com in reply to the one above,
as shown in the webmail UI:

Code: Select all

Re: Re[2]: Re[3]: Re[2]: Re[2]: Re[2]: TC4A: extsd and file operationsissues
and as it appears in the EML source:

Code: Select all

Subject: Re: Re[2]: Re[3]: Re[2]: Re[2]: Re[2]: TC4A: extsd and file 
 operationsissues
So the line break in the subject comes from my email provider. Is the ghisler.com side correct when it converts this line break by removing both the line break and the spaces around it, joining the neighboring words?
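RFC 5322 (section 2.2.3) defines folding as inserting a CRLF before existing whitespace, and unfolding as simply removing every CRLF that is immediately followed by whitespace. The sketch below contrasts that with a hypothetical "trim every line, then join" approach, which reproduces the joined words seen above. `fold`, `unfold` and `strip_and_join` are illustrative names, not the code of any mail program mentioned in this thread. Note that the provider's raw subject keeps a space at the end of the first line and another at the start of the continuation, so a strict unfold yields two spaces; many readers then collapse runs of whitespace for display.

```python
import re

CRLF = "\r\n"

def fold(value, width=78):
    """Fold at existing spaces only: CRLF goes *before* a space, so that
    space starts the continuation line and nothing is added or lost.
    Simplification: width counts only the field body, not "Subject: "."""
    lines = []
    while len(value) > width:
        cut = value.rfind(" ", 1, width)  # never cut at index 0
        if cut == -1:                     # no space to break at: leave it long
            break
        lines.append(value[:cut])
        value = value[cut:]               # keeps the space at the start
    lines.append(value)
    return CRLF.join(lines)

def unfold(value):
    """RFC 5322, section 2.2.3: remove every CRLF immediately followed by WSP."""
    return re.sub(r"\r\n(?=[ \t])", "", value)

def strip_and_join(value):
    """Hypothetical faulty unfolding: trim each physical line, then glue
    the pieces together with nothing between them."""
    return "".join(line.strip() for line in value.split(CRLF))

subject = "Re[2]: Re[3]: Re[2]: Re[2]: Re[2]: TC4A: extsd and file operations issues"
# As in the provider's raw source: a space both before and after the line break.
as_sent = subject.replace(" issues", " " + CRLF + " issues")

print(repr(fold(subject, width=70)))      # break placed before " issues"
print(unfold(fold(subject, width=70)) == subject)
print(repr(unfold(as_sent)))              # strict unfold keeps both spaces
print(strip_and_join(as_sent))            # the words run together
print(" ".join(unfold(as_sent).split()))  # whitespace collapsed for display
```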
Usher
Power Member

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by Usher

2DrShark
Well, there are more problems here.

1. There should be no "Re[n]: " prefixes in the subject when replying; they are neither allowed by the standard nor supported by most mailers and webmail scripts. The only allowed prefix is just the Latin "Re: ". I know that The Bat! uses such numbered prefixes by default, and I suspect that your provider follows those ancient rules. You should change your webmail configuration, manually edit subjects broken by the webmail, or use a better mailer (email client).

2. It's not a problem with the ghisler.com server; @ghisler has explained that he uses an ancient email client (so migrating to another solution may take a long time). Also, copying and pasting plain text from a viewer isn't enough to find out exactly what is wrong. Web services usually convert EOLs and whitespace on the fly, so you should use an email client to download the message and save it as a separate *.eml text file. Then use a binary/hex view to determine exactly which characters are used as EOL and as whitespace in the line continuation.

As you see, this won't fix the problems; we will just understand the errors better. In general, such errors are unpredictable, and the only way to fix them is to edit the subject manually. However, you shouldn't rely on the subject if you want to keep messages in a thread; there are standard header fields dedicated to this purpose: References and In-Reply-To.
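A minimal sketch of grouping messages by thread with References and In-Reply-To instead of the subject, using Python's standard `email` package. Per RFC 5322 (section 3.6.4), a reply's References field repeats the parent's References followed by the parent's Message-ID, so its first entry points at the oldest known message. The message IDs and subjects below are made up, `thread_key` and `group_by_thread` are hypothetical names, and the In-Reply-To fallback only finds the parent, so it is an approximation when References is missing.

```python
import email
from collections import defaultdict

def thread_key(msg):
    """Return the Message-ID of the oldest known message in msg's thread."""
    # References lists the ancestry, oldest first.
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    # Some clients only set In-Reply-To (this gives the parent, not the root).
    parent = (msg.get("In-Reply-To") or "").split()
    if parent:
        return parent[0]
    # A message that starts a thread is keyed by its own Message-ID.
    return (msg.get("Message-ID") or "").strip()

def group_by_thread(raw_messages):
    """Group subjects by thread key; the subjects themselves are never compared."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        threads[thread_key(msg)].append(msg["Subject"])
    return dict(threads)

SAMPLE = [
    "Message-ID: <a1@example.com>\r\n"
    "Subject: Original subject with many words\r\n\r\nbody\r\n",
    "Message-ID: <a2@example.com>\r\nIn-Reply-To: <a1@example.com>\r\n"
    "References: <a1@example.com>\r\n"
    "Subject: Re: Original subject with many words\r\n\r\nbody\r\n",
    "Message-ID: <a3@example.com>\r\nIn-Reply-To: <a2@example.com>\r\n"
    "References: <a1@example.com> <a2@example.com>\r\n"
    "Subject: Re: Re[2]: Original subject with manywords\r\n\r\nbody\r\n",
    "Message-ID: <b1@example.com>\r\nSubject: Unrelated\r\n\r\nbody\r\n",
]

for key, subjects in group_by_thread(SAMPLE).items():
    print(key, subjects)
```

The damaged third subject ("manywords") still lands in the right thread, because only the Message-ID fields are compared.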
DrShark
Power Member

Re: ghisler.com support email server: email titles and Cyrillic text corruption

Post by DrShark

Usher wrote: 2021-09-27, 14:46 UTC
> it's not enough to just copy/paste plain text from a viewer to know what exactly is wrong. Web services in most cases convert EOLs and white spaces on the fly, so you should use some email client to download and save the message as a separate *.eml text file. Then you should use binary/hex view to determine what exactly characters are used as EOL and white spaces in the line continuation.
I found a program that can download raw *.eml files from the mail server via IMAP, from both the Incoming and Sent folders. It downloads the same *.eml files that my email provider's web client saves.
So the email thread in which the words get joined looks like this:

My original sent email
(here and below, "Original subject with many words" stands for a subject of 66 characters):

client subject view:

Code: Select all

Original subject with many words
raw subject view:

Code: Select all

Original subject with many words
Incoming, ghisler.com reply email 1:

client subject view:

Code: Select all

Re: Original subject with many words
raw subject view:

Code: Select all

Re: Original subject with many 
 words
My sent reply email:

client subject view:

Code: Select all

Re[2]: Original subject with many words
raw subject view:

Code: Select all

Re[2]: Original subject with many 
 words
Incoming, ghisler.com reply email 2:

client subject view:

Code: Select all

Re: Re[2]: Original subject with manywords
raw subject view:

Code: Select all

Re: Re[2]: Original subject with 
 manywords
In the raw view, the bytes between the words on the two lines are the same everywhere (hex):

Code: Select all

20 0D 0A 20
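Instead of a hex viewer, a few lines of Python can print the bytes at every fold of one header field in a saved .eml file. This is a minimal sketch assuming the file uses CRLF line endings; `fold_separators` is a hypothetical helper and "message.eml" a placeholder file name.

```python
import re

def fold_separators(raw, field=b"Subject"):
    """Hypothetical helper: hex dump of the whitespace and line break at
    every fold of one header field in a raw message (CRLF line endings)."""
    header_block = raw.split(b"\r\n\r\n", 1)[0]  # headers end at the first empty line
    collected = None
    for line in header_block.split(b"\r\n"):
        if collected is None:
            if line.lower().startswith(field.lower() + b":"):
                collected = [line]
        elif line[:1] in (b" ", b"\t"):  # continuation of the same field
            collected.append(line)
        else:                            # the next header field starts
            break
    if collected is None:
        return []
    value = b"\r\n".join(collected)
    return [m.group().hex(" ").upper()
            for m in re.finditer(rb"[ \t]*\r\n[ \t]+", value)]

sample = (b"From: someone@example.com\r\n"
          b"Subject: Re: Re[2]: Original subject with \r\n"
          b" manywords\r\n"
          b"\r\n"
          b"body\r\n")
print(fold_separators(sample))  # ['20 0D 0A 20']

# With a real file (placeholder name):
# with open("message.eml", "rb") as f:
#     print(fold_separators(f.read()))
```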
Usher wrote: 2021-09-27, 14:46 UTC
> 2. It's not a problem with ghisler.com server, @ghisler have explained that he uses some ancient email client (so migrating to another solution may take a long time).
In the raw view of an email received from ghisler.com, I see two email clients mentioned (though only one of them handles the subject). Anyway, by "ghisler.com server" I meant, in general, the side from which I receive emails.
Usher wrote: 2021-09-27, 14:46 UTC
> 1. There should be no "Re[n]: " prefixes in the subject when replying - it's neither allowed by the standard nor supported by most mailers and webmail scripts. The only allowed prefix js just Latin "Re: ". I know that The Bat! uses such numbered prefixes by default and I suspect that your provider follows those ancient rules. You should change your webmail configuration, manually edit subjects broken by webmail or use better mailer (email client).
I didn't know that "Re[n]:" in a subject is not allowed. Actually, I add those numbers myself in the emails I send, so an email thread looks like this:
Incoming:

Code: Select all

Re: original subject
Sent:

Code: Select all

Re: original subject
Incoming:

Code: Select all

Re: Re: original subject
Sent:

Code: Select all

Re[2]: original subject
Incoming:

Code: Select all

Re: Re[2]: original subject
Sent:

Code: Select all

Re[3]: original subject
etc., so that the emails can be told apart by their subjects.
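The numbering scheme above can be expressed as a rule: add up the leading prefixes of the incoming subject ("Re:" counts 1, "Re[n]:" counts n) and write the total back as a single prefix. A minimal sketch; `split_prefixes` and `next_sent_subject` are hypothetical names, and since "Re[n]:" is a non-standard convention, the same helper also returns the bare subject, which can be used for keyword searches regardless of prefixes.

```python
import re

# One leading "Re:" or "Re[n]:" prefix (case-insensitive) plus following spaces.
PREFIX = re.compile(r"re(?:\[(\d+)\])?:\s*", re.IGNORECASE)

def split_prefixes(subject):
    """Return (total reply count, subject without leading Re prefixes)."""
    count = 0
    while True:
        m = PREFIX.match(subject)
        if not m:
            return count, subject
        count += int(m.group(1) or 1)  # "Re:" counts as 1, "Re[n]:" as n
        subject = subject[m.end():]

def next_sent_subject(incoming):
    """Collapse all prefixes of an incoming subject into one numbered prefix."""
    count, base = split_prefixes(incoming)
    prefix = "Re:" if count <= 1 else f"Re[{count}]:"
    return f"{prefix} {base}"

for incoming in ("Re: original subject",
                 "Re: Re: original subject",
                 "Re: Re[2]: original subject"):
    print(incoming, "->", next_sent_subject(incoming))
```

Applied to the incoming subjects listed above, this reproduces the sent subjects "Re:", "Re[2]:" and "Re[3]:".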

If I don't change the incoming email's subject in my reply, and let the receiver's or sender's side add "Re:" to each email
in the thread, I'll quickly get dozens of emails with subjects like:

Code: Select all

Re: Re: Re: Re: Re: <... count of Re: depends on number of my replies> original subject
which makes even reading the subjects hard.
Gmail's approach, where all emails in a thread use a single "Re:" without numbers, isn't much better, because it's also hard to work with particular emails in a thread where all of them look like this:
Incoming:

Code: Select all

Re: original subject
Sent:

Code: Select all

Re: original subject
Incoming:

Code: Select all

Re: original subject
Sent:

Code: Select all

Re: original subject
Would it be OK to put the number in a different part of the email subject, e.g.:

Code: Select all

Re:[2] original subject
?