"\s" does not match non-breaking space characters

English support forum

Moderators: Hacker, petermad, Stefan2, white

Supa
Junior Member
Posts: 4
Joined: 2021-08-29, 16:54 UTC

"\s" does not match non-breaking space characters

Post by *Supa »

In Find Files, when the Find text contains a regular expression with "\s", text containing a non-breaking space (A0) is NOT matched.

This makes searching for a string in Word documents more challenging.

Same result in Lister. "\s" does NOT match non-breakable space when specified in the Find dialog.
Dalai
Power Member
Posts: 9945
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: "\s" does not match non-breakable space characters

Post by *Dalai »

You can use this:

\x{00A0}

to find such characters.
TC help, section 3.n, Regular expressions wrote:
Escape sequences:
[...]
\x{nnnn}
Unicode character with hexadecimal code nnnn. Note that Total Commander uses Unicode for file names, so you need to use this notation for characters not in your local codepage.
BTW, 0A is a line feed. A non-breaking space is 00A0.
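The difference between the two code points can be checked outside TC with a minimal Python sketch. This uses Python's `re` module, not the regex library inside Total Commander, so the escape is spelled `\u00A0` rather than `\x{00A0}`. The sample string is made up for illustration.

```python
import re

NBSP = "\u00a0"  # non-breaking space, U+00A0

# Hypothetical sample text: two words joined by a non-breaking space,
# as often found in text copied from Word documents.
sample = "Total" + NBSP + "Commander"

# Search for the explicit code point 00A0 (non-breaking space).
nbsp_pattern = re.compile(r"Total\u00A0Commander")
# Search for the explicit code point 000A (line feed) instead.
lf_pattern = re.compile(r"Total\u000ACommander")

found_nbsp = nbsp_pattern.search(sample) is not None
found_lf = lf_pattern.search(sample) is not None

print(found_nbsp, found_lf)
```

Only the 00A0 pattern finds the sample text; the 0A pattern looks for a line feed and does not.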
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror

Re: "\s" does not match non-breakable space characters

Post by *Supa »

Dalai wrote: 2025-02-19, 14:55 UTC
You can use this:

\x{00A0}

to find such characters.
The point is that "\s", which stands for whitespace, does not match non-breaking spaces. Looks like a bug.
Last edited by Supa on 2025-02-19, 15:28 UTC, edited 1 time in total.

Re: "\s" does not match non-breaking space characters

Post by *Supa »

I have found out that "\h" does match non-breaking space. So I will simply use that one.

Still, it would be nice to make "\s" match it as well.
ghisler(Author)
Site Admin
Posts: 50390
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: "\s" does not match non-breaking space characters

Post by *ghisler(Author) »

Different regex handlers use different definitions of white space. I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000

Microsoft uses a different set for \s:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#WhitespaceCharacter
I can't find A0 mentioned there.

GNU Grep seems to use \s as a synonym for [[:space:]]
https://www.gnu.org/software/grep/manual/html_node/Special-Backslash-Expressions.html
Which is defined here:
https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
‘[:space:]’
Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
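To see how much the definition of "\s" varies between engines, here is a minimal Python sketch. It is an illustration only: the two `TC_*` patterns simply spell out the code points listed above as explicit character classes in Python's `re` module. They are not the library TC uses, and the names are made up for this example.

```python
import re

# The sets listed above, written as explicit character classes.
# \s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
TC_S = re.compile(r"[\u0020\u0009\u000D\u000A\u000C]")
# \h : Space (20), Tab (09), A0, 1680, 2000..200A, 202F, 205F, 3000
TC_H = re.compile(r"[\u0020\u0009\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]")

# Python's own \s for comparison: Unicode-aware by default for str
# patterns, restricted to ASCII whitespace with the re.ASCII flag.
PY_S_UNICODE = re.compile(r"\s")
PY_S_ASCII = re.compile(r"\s", re.ASCII)

NBSP = "\u00a0"  # non-breaking space

results = {
    "TC-style \\s": TC_S.fullmatch(NBSP) is not None,
    "TC-style \\h": TC_H.fullmatch(NBSP) is not None,
    "Python \\s (Unicode)": PY_S_UNICODE.fullmatch(NBSP) is not None,
    "Python \\s (re.ASCII)": PY_S_ASCII.fullmatch(NBSP) is not None,
}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

With these definitions the non-breaking space is matched by the "\h"-style class and by Python's default Unicode "\s", but not by the "\s"-style class or by Python's ASCII-only "\s".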
Author of Total Commander
https://www.ghisler.com

Re: "\s" does not match non-breaking space characters

Post by *Dalai »

ghisler(Author) wrote: 2025-02-20, 09:05 UTC
I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000
Is it worth adding this information to the TC help? Currently \h isn't mentioned at all.

Re: "\s" does not match non-breaking space characters

Post by *ghisler(Author) »

Yes, I think it would be a good idea to add them.
white
Power Member
Posts: 5747
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: "\s" does not match non-breaking space characters

Post by *white »

Dalai wrote: 2025-02-20, 10:50 UTC
ghisler(Author) wrote: 2025-02-20, 09:05 UTC
I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000
Is it worth adding this information to the TC help? Currently \h isn't mentioned at all.
Just to provide additional information: TC's help does mention the website https://regex.sorokin.engineer and it is described there.

Re: "\s" does not match non-breaking space characters

Post by *Dalai »

2white
Ah, thanks for pointing that out! I've missed that.

Re: "\s" does not match non-breaking space characters

Post by *white »

Dalai wrote: 2025-02-21, 14:25 UTC Ah, thanks for pointing that out! I've missed that.
Many people do. Perhaps it is wise to move the text:
Total Commander uses the free Delphi library TRegExpr by Andrey V. Sorokin, which is now available at https://regex.sorokin.engineer.
Some of the above explanations are from the help file for this library.
from the bottom to the top, just below the first paragraph. (and change "above explanations" to "explanations below")