Searching for files containing only 0x00 bytes does not work

 
Total Commander Forum Index -> TC7.5x(a) final bug reports (English)
Regular-Expression2
Junior Member
Joined: 08 Jun 2010
Posts: 3

Posted: Tue Jun 08, 2010 5:54 am    Post subject: Searching for files containing only 0x00 bytes does not work

Hi,

For some unknown reason, some files on my system have been overwritten entirely with 0x00 bytes (the file length is unchanged).

I tried to search for those files with the following settings:
- Search text = "[\x01-\xFF]"
- Find files which do NOT contain the search text (called "Finde Dateien, die den Text NICHT enthalten" in the German version)
- RegEx enabled
- File size > 0 bytes

This didn't work for me! It didn't find any files, not even those known to contain only 0x00 bytes!

Then I did some more tests:
Searching for files containing "\x00" (RegEx enabled) found all files, even those that don't contain any 0x00 byte!

I worked around it by setting the search text to "[\x02-\xFF]", which is not fully correct but does almost what I wanted.
Unfortunately, that also found files which contain only "0x0D 0x0A" sequence(s), but this seems to be a result of the RegEx search working line by line rather than on the whole file.

Conclusion:
There is a bug when searching for the regex "\x00".

PS: Any hints on how to improve my search for files which contain only 0x00 bytes?
MVV
Power Member
Joined: 03 Aug 2008
Posts: 7861
Location: Russian Federation

Posted: Tue Jun 08, 2010 8:14 am

The problem with regex is that \x00 is treated as the end of a line, so I think any regex engine will find no matches for the \x00 character. Only the hex search can find zero bytes, but it cannot check whether other bytes exist as well.

I think this plugin can help you. All you need is to write a script that returns Yes if a file contains only zero bytes. You can then search using a field of the script.wdx plugin.
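
A minimal sketch of such a check, written as standalone Python rather than as a script for the plugin. The function names `is_all_zero` and `find_zero_files` and the chunk size are illustrative assumptions, not part of Total Commander or script.wdx. Each file is read in chunks and the check stops at the first non-zero byte; empty files are not reported, matching the "File size > 0 bytes" condition from the first post.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def is_all_zero(path, chunk_size=CHUNK_SIZE):
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # count(0) counts the zero bytes; any shortfall means
            # the chunk holds at least one non-zero byte.
            if chunk.count(0) != len(chunk):
                return False
    return seen_data


def find_zero_files(root):
    """Yield paths below root whose content consists only of 0x00 bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_all_zero(path):
                    yield path
            except OSError:
                # Unreadable files are skipped rather than reported.
                continue
```

Usage would be something like `for p in find_zero_files(r"C:\data"): print(p)`, where the directory is a placeholder.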
_________________
TCFS2 + TCFS2Tools: Full-screen mode for TC etc (forum)
TOTALCMD.NET: AskParam, CopyTree, NTLinks, Sudo, VirtualPanel…
ghisler(Author)
Site Admin
Joined: 04 Feb 2003
Posts: 34301
Location: Switzerland

Posted: Tue Jun 08, 2010 8:44 am

Try searching for
00
with the "hex" checkbox enabled instead.
_________________
Author of Total Commander
http://www.ghisler.com
MVV
Power Member
Joined: 03 Aug 2008
Posts: 7861
Location: Russian Federation

Posted: Tue Jun 08, 2010 9:55 am

ghisler(Author) wrote:
Try searching for
00
with the "hex" checkbox enabled instead.

The problem is that he needs to find files that contain only 00 bytes (the entire file is filled with 00 bytes), not files that contain at least one 00 byte.
Regular-Expression2
Junior Member
Joined: 08 Jun 2010
Posts: 3

Posted: Wed Jun 09, 2010 4:04 am

MVV wrote:
The problem is that he needs to find files that contain only 00 bytes (the entire file is filled with 00 bytes), not files that contain at least one 00 byte.

Exactly.
MVV wrote:
The problem with regex is that \x00 is treated as the end of a line, so I think any regex engine will find no matches for the \x00 character.

That is not entirely true. I tried two other tools, Notepad++ ("Find in Files") and PowerGREP, and both handle it as one would expect.
Since the source code of Notepad++ is available, you might want to look at their implementation.
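
As a further point of comparison: Python is not one of the tools named above, and is used here only because its `re` module works on length-counted byte strings. A regex engine that does not depend on null-terminated strings can express both searches directly. The sample data and the helper name `only_zero_bytes` are made up for illustration.

```python
import re

zeros = b"\x00" * 16             # stands in for an overwritten file
mixed = b"\x00\x00text\r\n\x00"  # zero bytes mixed with other data

# "Content has at least one non-zero byte"
NON_ZERO = re.compile(rb"[\x01-\xff]")
# "Content is one or more zero bytes and nothing else"
ONLY_ZEROS = re.compile(rb"\x00+")


def only_zero_bytes(data):
    """True if data is non-empty and consists solely of 0x00 bytes."""
    return ONLY_ZEROS.fullmatch(data) is not None


print(NON_ZERO.search(zeros) is None)  # no non-zero byte in `zeros`
print(only_zero_bytes(zeros))
print(only_zero_bytes(mixed))
```

Either pattern is enough for the task: a file qualifies if `NON_ZERO` finds nothing in non-empty content, or equivalently if `ONLY_ZEROS` matches the content in full.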
ghisler(Author)
Site Admin
Joined: 04 Feb 2003
Posts: 34301
Location: Switzerland

Posted: Wed Jun 09, 2010 4:27 am

Unfortunately, the RegEx library used doesn't allow searching for NULL bytes, sorry.
MVV
Power Member
Joined: 03 Aug 2008
Posts: 7861
Location: Russian Federation

Posted: Wed Jun 09, 2010 4:29 am

So it seems the only way is to use plugins, and the script plugin can do it without recompiling, just with a special script.
Regular-Expression2
Junior Member
Joined: 08 Jun 2010
Posts: 3

Posted: Wed Jun 09, 2010 6:52 am

OK, so I will try the script plugin.
white
Power Member
Joined: 19 Nov 2003
Posts: 2020
Location: Netherlands

Posted: Tue Jul 13, 2010 3:03 pm

If the file content contains null characters, searching with regex is not reliable.
To give some more insight, here's my beta test report from some time ago:

white (Posted on beta forum: Lister regex search and null-characters) wrote:
When searching in Lister using regex, it is known that you cannot find null characters (hex value 00) using "\x00". Instead, searching for "\x00" seems to find the end of a line. This is because the RegEx library was implemented using null-terminated strings.
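
The effect of null-terminated strings can be reproduced outside TC. The following is only an illustration using Python's `ctypes`, not code from the RegEx library: a C-style string view of a buffer ends at the first 0x00 byte, so everything behind that byte is invisible to code that works on such strings.

```python
import ctypes

line = b"some\x00text"

# create_string_buffer copies the bytes and appends a terminating 0x00.
buf = ctypes.create_string_buffer(line)

# .raw is the complete buffer, including the appended terminator.
whole = buf.raw
# .value is the C-string view: it stops at the first 0x00 byte.
c_view = buf.value

print(len(whole), whole)
print(len(c_view), c_view)
```

The C-string view is shorter than the data actually stored, which is the kind of mismatch that lets a search misjudge where a line ends.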

But if the file content itself contains null characters, searching with regex is also unreliable, so searching binaries of any kind using regex is not reliable. Here are some quirks I found:

1) Searching for "\x01" finds the correct characters in lines without null characters, and it does not match the end of a line.
But in lines with one or more null characters it also finds null characters and the end of the line. TC finds things that aren't there.

2) When searching in a line containing a null character, searching for "$", "\Z" or "\x00" finds the end of the line one or two characters after the actual end of the line.
Example:
Create the following file.
Code:
some text
some[NULL]text
some text

Repeatedly search for "t$".
Also try to search repeatedly for "..$".

3) When searching backwards TC does not find matches after a null-character.
Example:
Open the file mentioned in the example above in Lister.
Go to the end of the file.
Repeatedly search backwards for "e". (regex checked!)

So searching in binaries can find things that aren't there and it may fail to find things that are there. I think this should be fixed or the user should be made aware of this.

Unfortunately these are known limitations of the RegEx library and cannot be helped. I hope in the future Christian will find a better library to use.

By the way. The website supporting the RegEx library: http://www.regexpstudio.com/ seems to be dead for quite some time. Does anyone know what happened?
white
Power Member
Joined: 19 Nov 2003
Posts: 2020
Location: Netherlands

Posted: Sat Dec 25, 2010 6:17 am

Quote:
By the way. The website supporting the RegEx library: http://www.regexpstudio.com/ seems to be dead for quite some time. Does anyone know what happened?

The website is online again.
wfdhfghff
Junior Member
Joined: 21 Jan 2010
Posts: 6

Posted: Wed Jan 27, 2016 6:34 am

Does this problem still exist?
I need to do the very same task (find all files consisting only of NUL bytes).
ghisler(Author)
Site Admin
Joined: 04 Feb 2003
Posts: 34301
Location: Switzerland

Posted: Thu Jan 28, 2016 4:42 am

You can't do it; it's impossible with the RegEx library currently in use because it is line-based.
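
To illustrate what "line-based" means for this task, here is a simplified model with assumed function names, not Total Commander's actual search code. If the content is split at line breaks before matching, the CR and LF bytes never reach the regex, so a file made only of line breaks looks the same as a file without any non-zero byte.

```python
import re

NON_ZERO = re.compile(rb"[\x01-\xff]")


def whole_file_has_non_zero(data):
    """Match against the complete content, line breaks included."""
    return NON_ZERO.search(data) is not None


def line_based_has_non_zero(data):
    """Simplified model: split at CR/LF first, then match each line."""
    lines = re.split(rb"\r\n|\r|\n", data)
    return any(NON_ZERO.search(line) for line in lines)


crlf_only = b"\r\n\r\n"
print(whole_file_has_non_zero(crlf_only))  # the CR/LF bytes are seen
print(line_based_has_non_zero(crlf_only))  # the CR/LF bytes are dropped
```

This is consistent with the observation in the first post that files containing only 0x0D 0x0A sequences showed up in the results.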
wfdhfghff
Junior Member
Joined: 21 Jan 2010
Posts: 6

Posted: Thu Jan 28, 2016 6:36 am

Thank you for this up-to-date information!
All times are GMT - 6 Hours
Impressum: This site is maintained by Ghisler Software GmbH
