Searching for files containing only 0x00 bytes does not work

Regular-Expression2
Junior Member
Posts: 3
Joined: 2010-06-08, 11:29 UTC

Post by Regular-Expression2

Hi,

For some unknown reason, some files on my system have been overwritten entirely with 0x00 bytes (the file length is unchanged).

I tried to search for those files with the following settings:
- Search text: "[\x01-\xFF]"
- Find files which do NOT contain the text (German UI label: "Finde Dateien, die den Text NICHT enthalten")
- RegEx enabled
- File size > 0 bytes

This didn't work: it found no files at all, not even those known to contain only 0x00 bytes!
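For reference, here is what that search is meant to do when the pattern is applied to the whole file as raw bytes. A minimal Python sketch of the intended logic, using Python's `re` module (`matches_intended_search` is a hypothetical helper; this is not how Total Commander implements its search):

```python
import re

# Any byte other than 0x00: the search text from the post, compiled as a
# bytes pattern so the engine sees the raw file contents, NUL bytes included.
NON_ZERO = re.compile(rb"[\x01-\xff]")

def matches_intended_search(data: bytes) -> bool:
    """True if a file with these contents should be listed:
    size > 0 bytes AND the pattern is NOT found anywhere in the file."""
    return len(data) > 0 and NON_ZERO.search(data) is None

print(matches_intended_search(b"\x00" * 16))      # True
print(matches_intended_search(b"\x00\x00A\x00"))  # False
print(matches_intended_search(b""))               # False
```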

Then I did some more tests:
Searching for files containing "\x00" (RegEx enabled) found all files, even those which don't contain a single 0x00 byte!

As a workaround, I set the search text to "[\x02-\xFF]", which is not fully correct but does almost what I wanted.
Unfortunately, that also found files which contain only "0x0D 0x0A" sequence(s), but this seems to be a consequence of the RegEx search working line by line rather than on the whole file.
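The CRLF effect can be reproduced with a small Python simulation. It assumes, as suspected above, that each line is passed to the regex engine with its line break already removed; the helper names are hypothetical and this is not Total Commander's code:

```python
import re

PATTERN = re.compile(rb"[\x02-\xff]")

def found_by_line_search(data: bytes) -> bool:
    """'Does NOT contain' search applied line by line. Line breaks are
    stripped before matching, so 0x0D/0x0A can never produce a match."""
    lines = data.splitlines()  # splits on \r\n, \n and \r and drops them
    return not any(PATTERN.search(line) for line in lines)

def found_by_whole_file_search(data: bytes) -> bool:
    """The same 'does NOT contain' search applied to the whole file."""
    return PATTERN.search(data) is None

crlf_only = b"\r\n\r\n"
print(found_by_line_search(crlf_only))        # True: every line is empty
print(found_by_whole_file_search(crlf_only))  # False: 0x0D and 0x0A are in [\x02-\xFF]
```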

Conclusion:
There is a bug when searching for the regex "\x00".

PS: Any hints on how to improve my search for files which contain only 0x00 bytes?
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by MVV

The problem with regex is that \x00 marks the end of a line, so I think any regex engine will find no matches for the \x00 character. Only hex search can find zero bytes, but it doesn't let you check whether any other bytes exist.

I think this plugin can help you. All you need is to write a script that returns Yes if a file contains only zero bytes. You can then search using the script.wdx plugin's field.
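If a plugin is not an option, the same check can also be done outside Total Commander. Below is a minimal standalone Python sketch (not a script.wdx script; that plugin's scripting interface is not covered here) that reads each file in binary mode and reports those consisting only of 0x00 bytes. The function names and the chunk size are illustrative choices:

```python
import os

# Read 1 MiB at a time so large files never have to fit in memory.
CHUNK_SIZE = 1 << 20

def is_all_zero(path: str) -> bool:
    """True if the file is non-empty and every byte in it is 0x00."""
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            size += len(chunk)
            # bytes.count(0) counts NUL bytes; any other byte means "not all zero".
            if chunk.count(0) != len(chunk):
                return False
    return size > 0

def find_all_zero_files(root: str):
    """Yield the paths of all files below root that contain only 0x00 bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_all_zero(path):
                    yield path
            except OSError:
                # Locked or unreadable file: skip it instead of aborting the scan.
                pass
```

Usage would be along the lines of `for p in find_all_zero_files(r"C:\Data"): print(p)` (the path is only an example). Because the check stops at the first chunk containing a non-zero byte, ordinary files are rejected after reading at most 1 MiB.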
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by ghisler(Author)

Try to search for
00
with checkbox "hex" instead.
Author of Total Commander
https://www.ghisler.com
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by MVV

ghisler(Author) wrote: Try to search for
00
with checkbox "hex" instead.
The problem is that he needs to find files that contain only 00 bytes (the entire file filled with 00 bytes), not files that contain at least one 00 byte.
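The difference between the two questions fits in a few lines of Python (a sketch with hypothetical helper names, working on the file contents as bytes):

```python
def contains_zero_byte(data: bytes) -> bool:
    """What a hex search for 00 answers: is there at least one 0x00 byte?"""
    return b"\x00" in data

def only_zero_bytes(data: bytes) -> bool:
    """What is actually wanted: a non-empty file made up solely of 0x00 bytes."""
    return len(data) > 0 and data.count(0) == len(data)

mixed = b"MZ\x90\x00\x03\x00"  # arbitrary mixed content with some 0x00 bytes
zeros = bytes(8)               # eight 0x00 bytes
print(contains_zero_byte(mixed), only_zero_bytes(mixed))  # True False
print(contains_zero_byte(zeros), only_zero_bytes(zeros))  # True True
```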
Regular-Expression2
Junior Member
Posts: 3
Joined: 2010-06-08, 11:29 UTC

Post by Regular-Expression2

MVV wrote: The problem is that he needs to find files that contain only 00 bytes (the entire file filled with 00 bytes), not files that contain at least one 00 byte.
Exactly.
MVV wrote: The problem with regex is that \x00 marks the end of a line, so I think any regex engine will find no matches for the \x00 character.
That is not entirely true. I tried two other tools, Notepad++ ("Find in Files") and PowerGREP, and both handle it as one would expect.
As the source code of Notepad++ is available, you might want to look into their implementation.
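For comparison, a regex engine that works on length-delimited buffers has no trouble with NUL bytes. A minimal illustration using Python's `re` module, chosen here only as an example of such an engine:

```python
import re

data = b"abc\x00def\r\nxyz"

# Python's re works on length-delimited bytes objects, so 0x00 is an
# ordinary byte that can be matched like any other.
m = re.search(rb"\x00", data)
print(m.start())                                # 3

# A pattern can also match across the NUL byte.
print(re.search(rb"c\x00d", data) is not None)  # True
```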
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by ghisler(Author)

Unfortunately, the RegEx library used doesn't allow searching for NULL bytes, sorry.
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by MVV

So it seems that the only way is to use plugins, and the script plugin can do it without recompiling, just with a special script.
Regular-Expression2
Junior Member
Posts: 3
Joined: 2010-06-08, 11:29 UTC

Post by Regular-Expression2

OK, so I will try the script plugin.
white
Power Member
Power Member
Posts: 4593
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by white

If the file contents contain null characters, searching with regex is not reliable.
To give some more insight, here's my beta test report from some time ago:
white (posted on the beta forum as "Lister regex search and null-characters") wrote: When searching in Lister using RegEx, it is known that you cannot find null characters (hex value 00) using "\x00". Instead, searching for "\x00" seems to find the end of a line. This is because the RegEx library was implemented using null-terminated strings.

Moreover, if the file contents contain null characters, regex search is not reliable, so searching binaries of any kind with regex is not reliable either. Here are some quirks I found:

1) Searching for "\x01" finds the correct characters in lines without null characters, and it does not match the end of a line.
But in lines with one or more null characters, it also finds null characters and the end of the line. TC finds things that aren't there.

2) When searching in a line containing a null character, searching for "$", "\Z" or "\x00" finds the end of the line one or two characters after the actual end of the line.
Example:
Create the following file.

Code:

some text
some[NULL]text
some text
Repeatedly search for "t$".
Also try to search repeatedly for "..$".

3) When searching backwards, TC does not find matches after a null character.
Example:
Open the file mentioned in the example above in Lister.
Go to the end of the file.
Repeatedly search backwards for "e" (with RegEx checked!).

So searching in binaries can find things that aren't there and it may fail to find things that are there. I think this should be fixed or the user should be made aware of this.
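The report attributes these quirks to the library using null-terminated strings. A minimal Python sketch of that effect, a simulation under that assumption and not the library's code (`as_c_string` is a hypothetical helper), shows what remains of each line of the test file when it is cut at the first NUL byte:

```python
# Contents of the test file from the report, with a real NUL byte in line 2.
# CRLF line endings are an assumption (a typical Windows text file).
TEST_FILE = b"some text\r\nsome\x00text\r\nsome text\r\n"

def as_c_string(line: bytes) -> bytes:
    """What code that treats a line as a NUL-terminated C string sees:
    everything from the first 0x00 byte onward is cut off."""
    return line.split(b"\x00", 1)[0]

for number, line in enumerate(TEST_FILE.splitlines(), start=1):
    print(number, line, "->", as_c_string(line))
```

Line 2 is reduced to b'some', so the "text" after the NUL byte is invisible to such code, which is consistent with quirk 3. The sketch only models this truncation; it does not reproduce the other quirks, such as matches reported past the end of the line. To get the test file itself, TEST_FILE can be written with `open("nulltest.txt", "wb")` (the file name is only an example).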
Unfortunately, these are known limitations of the RegEx library and cannot be avoided. I hope Christian will find a better library to use in the future.

By the way, the website supporting the RegEx library, http://www.regexpstudio.com/, seems to have been dead for quite some time. Does anyone know what happened?
white
Power Member
Posts: 4593
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by white

By the way, the website supporting the RegEx library, http://www.regexpstudio.com/, seems to have been dead for quite some time. Does anyone know what happened?
The website is online again.
wfdhfghff
Junior Member
Posts: 6
Joined: 2010-01-21, 21:54 UTC

Post by wfdhfghff

Does this problem still exist?
I need to do the very same task (find all files consisting only of NUL bytes).
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by ghisler(Author)

You can't do it; it's impossible with the currently used RegEx library because it is line-based.
wfdhfghff
Junior Member
Posts: 6
Joined: 2010-01-21, 21:54 UTC

Post by wfdhfghff

Thank you for this up-to-date information!