Bug in hex search?

Mi.Chal. · Post by *Mi.Chal. » 2012-07-22, 17:49 UTC

I tried to find text in files in hex form. I searched for "B16B00B5" and it found some files. But those files actually contain B14B00B5, which is not what I was searching for.

The same issue occurs when I use F3 to view the file content and use Search.

Sob · Post by *Sob » 2012-07-22, 20:25 UTC

Looking where else Microsoft put naughty constants, are you? :)

But yes, there is something wrong with the way TC searches for it. It seems that the search is influenced by the Case sensitive option, although as hex it should not be. ASCII character 6B is "k" and 4B is "K". Both will be found with Case sensitive unchecked. When you check it, the search works correctly.
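
To illustrate the effect described above, here is a minimal Python sketch of a case-insensitive byte search. It is not TC's actual implementation: the helper name `find_bytes` is hypothetical, and folding only ASCII letters (via `bytes.lower()`) is an assumption, since TC may fold case according to a code page instead.

```python
def find_bytes(haystack: bytes, needle: bytes, case_sensitive: bool) -> int:
    """Return the offset of the first match, or -1 if there is none."""
    if case_sensitive:
        return haystack.find(needle)
    # bytes.lower() only changes ASCII 'A'-'Z' (0x41-0x5A); every other
    # byte value, such as 0xB1 or 0xB5, is left untouched.
    return haystack.lower().find(needle.lower())


needle = bytes.fromhex("B16B00B5")                        # what was typed
data = b"\x00\x01" + bytes.fromhex("B14B00B5") + b"\xff"  # what the file holds

print(find_bytes(data, needle, case_sensitive=False))  # 2: 0x4B 'K' matches 0x6B 'k'
print(find_bytes(data, needle, case_sensitive=True))   # -1: the exact bytes differ
```

So the only byte in "B16B00B5" that happens to be an ASCII letter (6B) is the one that gets matched loosely.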

IMHO when Hex is selected, all other options should be blocked. Unless it was not meant as "search exactly these bytes" but rather "you can input your string as hex, but it will still be treated like text", but that would be strange.

umbra · Post by *umbra » 2012-07-22, 20:42 UTC

TC's help wrote: Note: When Case sensitive isn't checked, characters with different case will also be found. Example: 4B will not only find 'K', but also the lowercase 'k' (Hex. 6B)

So I guess it is intentional.

Sob · Post by *Sob » 2012-07-22, 21:04 UTC

Well, it seems it is then. And I must admit that for mixed (ascii with some hex) searches it's a really good idea (instead of forcing the user to write everything as hex). If I had known that TC supports it, I'd definitely have used it from time to time.

But I still think that without reading the help file, no one will get it. Because if you see "search hex" option, you expect to search exact bytes.

Post by *Hacker » 2012-07-22, 21:06 UTC

Sob,

it was not meant as "search exactly these bytes" but rather "you can input your string as hex, but it will still be treated like text"

That is exactly how it is.

Roman

umbra · Post by *umbra » 2012-07-22, 21:27 UTC

Sob wrote: But I still think that without reading the help file, no one will get it.

It's not the only thing in TC that requires a user to read the help file. Luckily, TC's help is context-sensitive, so it's easy to look things up if you are trying to do something new or something you haven't done in a long while. I really recommend doing that.

Sob · Post by *Sob » 2012-07-22, 22:47 UTC

Here's the problem: some things are obvious or self-explanatory (or they only seem so, that's the danger) and some are not. With those that are not, it's simple. E.g. if you use synchronization for the first time, you might wonder what asymmetric means. So you'll press F1 and get the answer. But you won't read the help file to find out what e.g. the Tree button does in the copy dialog. You'll just try pressing it once, and when it seems to do what you assumed it would, you'll think that your expectations were correct and keep living with that.
Unfortunately, hex search falls into the (seemingly) obvious category. You try it once (very likely with purely non-ascii data, otherwise you would simply use the standard text search) and get the expected result. Lesson learned: it searches the exact bytes as expected. And once you "know" it, you won't go reading about it in the help file again just for fun. You might go there only if it "misbehaves" AND you're not absolutely sure that you got it right. But in this case it's likely that you ARE sure. At least the original poster was and I'm guilty too. :)

What could be done to make it give the RIGHT obvious impression? (If anything, because most people have been able to live with it until now... but then again, most people do not do hex searches.) Perhaps checking Hex could also check Case sensitive, but some people used to the current version might not like that. Or Hex could be split into two separate options, one for "exact bytes and no other options" and the other for "mixed ascii/hex" (the current behaviour). Kind of redundant, but that way the first would fall into the right obvious category and the other into the non-obvious one, i.e. "need to look up in manual", so there would be no room for confusion anywhere.

Mi.Chal. · Post by *Mi.Chal. » 2012-07-23, 20:49 UTC

Sob wrote: But yes, there is something wrong with the way how TC searches it. It seems that the search is influenced by Case sensitive option, although as hex it should not be. ASCII character 6B is "k" and 4B is "K". Both will be found with Case sensitive unchecked. When you check it, then it works correctly.

You are right, case sensitive search works. And as you said, it is really strange behavior - when I type some hex numbers, I would not expect the search to also match something else.

Sob · Post by *Sob » 2012-07-24, 02:52 UTC

It actually makes sense when you read the help file. If you need to search for some text consisting mostly of regular letters, but also containing some strange bytes, then it's a really cool feature, because you can use expressions like:

"some"00"really"1b"strange"0d"long"0a"string"

And you might or might not want case sensitivity with that. But if you're used to how hex searching usually works in other programs, you simply can't expect that it's like this in TC.
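
For anyone wondering what such a mixed expression boils down to, here is a rough Python sketch that turns it into one byte string. The grammar (double-quoted runs are literal text, everything else must be pairs of hex digits) is only inferred from the example above, and the function name `parse_mixed` and the latin-1 encoding of the quoted parts are assumptions, not TC's documented behaviour.

```python
import re

# One token: a double-quoted literal, a run of hex digit pairs, or whitespace.
_TOKEN = re.compile(r'"([^"]*)"|((?:[0-9A-Fa-f]{2})+)|\s+')


def parse_mixed(pattern: str, encoding: str = "latin-1") -> bytes:
    """Convert a mixed text/hex expression into the bytes to search for."""
    out = bytearray()
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        if m.group(1) is not None:        # quoted literal text
            out += m.group(1).encode(encoding)
        elif m.group(2) is not None:      # hex digit pairs
            out += bytes.fromhex(m.group(2))
        pos = m.end()                     # whitespace is simply skipped
    return bytes(out)


expr = '"some"00"really"1b"strange"0d"long"0a"string"'
print(parse_mixed(expr))
# b'some\x00really\x1bstrange\rlong\nstring'
```

The result is still just a text needle, which is why an option like Case sensitive can meaningfully apply to it.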

Post by *ghisler(Author) » 2012-07-26, 12:59 UTC

Yes, it is intentional, exactly because of this mixed search method. The search remains a text search even when giving some of the characters as hex codes.

Sob · Post by *Sob » 2012-07-27, 18:04 UTC

It became clear very fast that it's not a bug, thanks to the fact that some people do read help files. :) But most don't (for such seemingly obvious things), and the question is whether that is a problem or not: whether the user should be more curious and find out about it, or whether TC should do something more to make the user aware of this mixed mode.

IMHO the best solution would be to leave it as it is now and do something with the dialog (specifically with the text search options) in the longer run. I think it deserves to be made a little more user-friendly (and I don't mean dumbed-down, I hate that).

--

Currently there is just a bunch of checkboxes and it's not clear at all how they play together.

There are three for encodings (DOS special chars, Unicode, UTF8), scattered all over the place. But those should in fact be radio buttons (*1), because only one can be active at a time. Plus there should be a fourth for the default ANSI, which is currently selected by unchecking all three.
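
A small Python sketch of why the encoding choice matters: the same search text becomes a different byte sequence under each option. The codec mapping is an assumption for illustration only (cp1252 for ANSI, cp437 for DOS, UTF-16 little-endian for Unicode); the code pages TC really uses depend on the system, and `needles_for` is a hypothetical helper.

```python
# Assumed mapping of the dialog's options to Python codecs (illustrative only).
ENCODINGS = {
    "ANSI (default)": "cp1252",
    "DOS special chars": "cp437",
    "Unicode": "utf-16-le",
    "UTF8": "utf-8",
}


def needles_for(text: str) -> dict:
    """Encode the search text once per encoding option."""
    return {label: text.encode(codec) for label, codec in ENCODINGS.items()}


for label, needle in needles_for("é").items():
    print(f"{label:18} {needle.hex()}")
# ANSI (default)     e9
# DOS special chars  82
# Unicode            e900
# UTF8               c3a9
```

A parallel search over several encodings would then amount to looking for each of these needles in turn.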

Then the simple text search has the same problem as hex/mixed - it's not just a text search, it also supports some special characters. Yes, it's documented, but there is no indication of it in the dialog itself. It's easy to quickly realize what went wrong, but it would be better if the confusion did not occur in the first place. And it would not, if there was a choice of what to search for (radio buttons):

- Text exactly as entered
- Text with special characters
  Not just \t and \n, but all (\r, \0, \0x20, ...). This could replace the current hex/mixed. The only problem is the troublemaker \n, which can mean both LF and CRLF.
- Regular expression
- Hex search for exact bytes
  This one is possibly unnecessary with the second option, but it's easier to type just "0d0a00" than "\0x0d\0x0a\0x00", so it's justified. And on top of that, it can enforce input validation. Now I can enter "test0d0a", check Hex and TC will happily search for something, even though the input is clearly wrong.
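
To show what the input validation for an "exact bytes" mode could look like, here is a minimal Python sketch. The function name `parse_exact_hex` is hypothetical, and allowing whitespace between bytes is my own assumption; this describes the proposed option, not anything TC currently does.

```python
import re

# The whole input must consist of complete pairs of hex digits.
_HEX_PAIRS = re.compile(r"(?:[0-9A-Fa-f]{2})+")


def parse_exact_hex(text: str) -> bytes:
    """Return the bytes for a strict hex input, or raise ValueError."""
    compact = "".join(text.split())  # tolerate spaces between bytes
    if not _HEX_PAIRS.fullmatch(compact):
        raise ValueError(f"not a valid hex byte sequence: {text!r}")
    return bytes.fromhex(compact)


print(parse_exact_hex("0d0a00"))  # b'\r\n\x00'

try:
    parse_exact_hex("test0d0a")
except ValueError as err:
    print("rejected:", err)
```

With such a check the dialog could refuse "test0d0a" up front instead of silently searching for something else.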

--
(*1) for now, but eventually the request for parallel search using more than one encoding should be implemented

umbra · Post by *umbra » 2012-07-27, 19:00 UTC

Sob wrote: There are three for encodings (DOS special chars, Unicode, UTF8), scattered all over the place. But those should in fact be radio buttons ...

That is an old and known problem. You can find that type of behavior in several of TC's dialogs. For example, I reported it a long time ago (see points 1.E and 3.B). I think there were also others who requested something similar, but so far there has been no change.