"\s" does not match non-breaking space characters
In Find Files, when the Find text contains a regular expression with "\s", text containing a non-breakable space (A0) is NOT matched.
This makes searching for a string in Word documents more challenging.
Same result in Lister. "\s" does NOT match non-breakable space when specified in the Find dialog.
Re: "\s" does not match non-breakable space characters
You can use this to find such characters:
Code:
\x{00A0}
BTW, 0A is a line feed. A non-breaking space is 00A0.
TC help, section 3.n, Regular expressions wrote:
Escape sequences:
[...]
\x{nnnn}
Unicode character with hexadecimal code nnnn. Note that Total Commander uses Unicode for file names, so you need to use this notation for characters not in your local codepage.
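For readers working outside Total Commander, the same idea (naming the character by its code point) can be sketched in Python. This is only an illustration: Python's `re` module writes the escape as `\xA0` or `\u00A0` rather than TC's `\x{nnnn}`, and the sample string is made up.

```python
import re

# Hypothetical sample: two words separated by a non-breaking space (U+00A0).
text = "Total\u00A0Commander"

# Name the character explicitly by its code point.
nbsp = re.compile(r"\u00A0")

match = nbsp.search(text)
# Index of the non-breaking space, or -1 if there is none.
position = match.start() if match else -1
print(position)
```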
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Re: "\s" does not match non-breakable space characters
The point is that "\s", which stands for whitespace, does not match non-breaking spaces. Looks like a bug.
Last edited by Supa on 2025-02-19, 15:28 UTC, edited 1 time in total.
Re: "\s" does not match non-breaking space characters
I have found out that "\h" does match non-breaking space. So I will simply use that one.
Still, it would be nice to make "\s" match it as well.
- ghisler(Author)
- Site Admin
- Posts: 50390
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
Re: "\s" does not match non-breaking space characters
Different regex handlers use different definitions of white space. I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000
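The two sets listed above can be written out as explicit character classes to see why a non-breaking space falls through \s but not \h. The following Python sketch only mirrors the code points quoted in this post; it does not call TC's regex library, and the names `TC_S`, `TC_H` and `WORKAROUND` are made up for illustration.

```python
import re

# Code points copied from the list above; these classes imitate the two sets in Python.
TC_S = re.compile("[\u0020\u0009\u000D\u000A\u000C]")
TC_H = re.compile("[\u0020\u0009\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]")

# The \s set plus U+00A0. Assumption: this is comparable to combining \s with
# \x{00A0} in one character class in a TC search, which is not verified here.
WORKAROUND = re.compile("[\u0020\u0009\u000D\u000A\u000C\u00A0]")

NBSP = "\u00A0"

s_matches_nbsp = TC_S.fullmatch(NBSP) is not None
h_matches_nbsp = TC_H.fullmatch(NBSP) is not None
workaround_matches_nbsp = WORKAROUND.fullmatch(NBSP) is not None

print(s_matches_nbsp, h_matches_nbsp, workaround_matches_nbsp)
```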
Microsoft uses a different set for \s:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#WhitespaceCharacter
I can't find A0 mentioned there.
GNU Grep seems to use \s as a synonym for [[:space:]]
https://www.gnu.org/software/grep/manual/html_node/Special-Backslash-Expressions.html
Which is defined here:
https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
‘[:space:]’
Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
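As one more data point on how the definitions differ between engines, Python's built-in `re` module treats a non-breaking space as whitespace for text patterns by default, and falls back to a narrower ASCII-only set when the `re.ASCII` flag is given. A small sketch, unrelated to TC's own library:

```python
import re

NBSP = "\u00A0"

# Default for str patterns: \s covers Unicode whitespace, including U+00A0.
unicode_ws = re.compile(r"\s")

# With re.ASCII, \s is limited to space, tab, newline, carriage return,
# form feed and vertical tab.
ascii_ws = re.compile(r"\s", re.ASCII)

unicode_matches = unicode_ws.fullmatch(NBSP) is not None
ascii_matches = ascii_ws.fullmatch(NBSP) is not None

print(unicode_matches, ascii_matches)
```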
Author of Total Commander
https://www.ghisler.com
Re: "\s" does not match non-breaking space characters
ghisler(Author) wrote: 2025-02-20, 09:05 UTC
I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000
Is it worth adding this information to the TC help? Currently \h isn't mentioned at all.
Re: "\s" does not match non-breaking space characters
Yes, I think it would be a good idea to add them.
Author of Total Commander
https://www.ghisler.com
Re: "\s" does not match non-breaking space characters
Dalai wrote: 2025-02-20, 10:50 UTC
ghisler(Author) wrote: 2025-02-20, 09:05 UTC
I have checked the sources of the one I use. It matches:
\s : Space (20), Tab (09), CR (0D), LF (0A), Form Feed (0C)
\h : Space (20), Tab (09), non-breaking space (A0) and if Unicode: 1680, 2000 .. 200A, 202F, 205F, 3000
Is it worth adding this information to the TC help? Currently \h isn't mentioned at all.
Just to provide additional information: TC's help does mention the website https://regex.sorokin.engineer and it is described there.
Re: "\s" does not match non-breaking space characters
2white
Ah, thanks for pointing that out! I've missed that.
Re: "\s" does not match non-breaking space characters
Many people do. Perhaps it is wise to move the text:
"Total Commander uses the free Delphi library TRegExpr by Andrey V. Sorokin, which is now available at https://regex.sorokin.engineer.
Some of the above explanations are from the help file for this library."
from the bottom to the top, just below the first paragraph (and change "above explanations" to "explanations below").