Jump to content


Buffer overflow?


  • Please log in to reply
9 replies to this topic

#1 x

x
  • Guests

Posted 06 March 2006 - 07:45 AM

in script.cpp
VarSizeType Script::GetFileDir(char *aBuf)
{
    char buf[MAX_PATH + 1]; // Uses +1 to append final backslash for AutoIt2 (.aut) scripts.
    char *target_buf = aBuf ? aBuf : buf;
    strlcpy(target_buf, mFileDir, MAX_PATH);                 //<<<<<<<HERE
    size_t length = strlen(target_buf); // Needed not just for AutoIt2.
    // If it doesn't already have a final backslash, namely due to it being a root directory,
    // provide one so that it is backward compatible with AutoIt v2:
    if (mIsAutoIt2 && length && target_buf[length - 1] != '\\')
    {
        target_buf[length++] = '\\';
        target_buf[length] = '\0';
    }
    return (VarSizeType)length;
}
aBuf is not always allocated enough.
If not,following memory of aBuf will be overwritten with mFileDir or '\0',and causes fatal error.

#2 PhiLho

PhiLho
  • Fellows
  • 6850 posts

Posted 06 March 2006 - 09:52 AM

Can you give an example or a line of code showing that GetFileDir is called with a buffer too small? You give the "faulty" function, but not the context, ie. how it is called with a buffer too small.

PS.: Chris, searching how GetFileDir was called, I found the GetVarType function. Since you seem to be careful about performance, perhaps you can advance the aVarName of two chars, and do the stricmp on the names without the "A_" prefix. It would save a wooping 250 bytes to the (non-compressed) exe too... :-)
Checking for some common subprefixes like IPAddress, LoopFile or LoopReg can save also some time/space.
Another trick to save time (microseconds...) could be to lowercase the searched string and to do case sensitive comparisons.

PPS.: my initial remark may sound dismissive, but I must thank x to provide feedback (ponderated by a question mark in the title...). Now, it is up to Chris to confirm if there is a real problem here.

#3 Chris

Chris
  • Administrators
  • 10727 posts

Posted 06 March 2006 - 05:53 PM

strlcpy(target_buf, mFileDir, MAX_PATH); <<<<<<aBuf is not always allocated enough.
If not,following memory of aBuf will be overwritten with mFileDir or '\0',and causes fatal error.

You could be right but I don't yet see it. Perhaps you are thinking that strlcpy() is similar to strncpy(). It's actually a little different. Also, aBuf is allocated to be as large as the previously returned length (with an extra +1 for the terminator). So as long as the first call's returned length is correct, I don't see any way for the second call to cause overflow.

Incidentally, I know the two-call method for retrieving built-in variables is somewhat inefficient; but coming up with a better alternative might be harder than it seems.

do the stricmp on the names without the "A_" prefix.
Checking for some common subprefixes like IPAddress, LoopFile or LoopReg can save also some time/space.

Nice improvement. I'd applied a similar trick in other sections, and it certainly seems appropriate here too. Thanks.

Another trick to save time (microseconds...) could be to lowercase the searched string and to do case sensitive comparisons.

I've seen a lot of other code do that but never got in the habit myself. I will try to make it a habit by starting with this section.

Since some of the code is nearly three years old, I'm sure there are more minor optimizations like these. If you find more, please let me know.

Thanks.

#4 Guests

  • Guests

Posted 06 March 2006 - 07:45 PM

http://lukewarm.s101...up/file/048.zip
Unzip this scripts and AutoHotkey1.0.42.05 zip file into "F:\test\AutoHotkey\bugtest\script\"(it can be in other drive) , run AutoHotkey.exe. It will crash.

The second call of GetFileDir(for copying A_ScriptDir to HomeDir), aBuf has been allocated enough(64bytes) size for the path(33+1bytes).
But strncpy(used in strlcpy) fills remaining memory(MAX_PATH-(33+1)bytes) with '\0'.
If that memory is used for other purpose,it will be broken,and causes error.

See this document:
http://msdn.microsof...c_._mbsncpy.asp

If count is greater than the length of strSource, the destination string is padded with null characters up to length count.



#5 Chris

Chris
  • Administrators
  • 10727 posts

Posted 06 March 2006 - 10:06 PM

Thanks for all the details, which confirms this is a bug. I didn't know about this behavior of strncpy, but I should have since it's part of the standard.

I'll fix this issue and search the code for any related problems. I'll also replace strncpy with more efficient code in places where its zero filling is wasteful.

Thanks.

Edit: I checked and it looks like A_ScriptDir is the only part of the code affected by this bug. Also, it seems best to keep strncpy() because its 32-bit memory manipulations in assembly benchmark faster than anything I could code.

#6 PhiLho

PhiLho
  • Fellows
  • 6850 posts

Posted 07 March 2006 - 09:12 AM

Aha! It took me a while to understand the problem, but now I get it: the strlcpy is OK with the internal buffer, should be OK for aBuf as it is supposed to have the right size, but Microsoft's strlcpy implementation overdoes it...

Note that if you search for strlcpy on MSDN, you get only a hit on a generic article, no entry in the vclib section.

Googling for it gives, for example, a man page for the BSD version (used in MacOS X). The manual states that the string is NUL-terminated, not that the remainder should be filled with NUL chars (unlike strncpy). That's where MS goofed...

Thanks for pointing that out, I learnt a thing today. :-)

PS.: Chris, I suggested these micro-optimizations because there was already one at the start of the function... I doubt users will ever notice a speedup. But well, you like to take care of these improvements, and I am quite anal to these micro-optimizations too.
But you must be aware that you trade a self-documenting code to one slightly more cryptic - but a few comments could compensate for that, I know you are not greedy for them...
The best optimization would be to use a hash table: fill the hash table once, compute the hash of the string to search, make an instant lookup (table index) and compare the string to be sure (assuming there is no collision).
It may not be worth creating a hash class just for this function, but if it can be reused elsewhere...

#7 Chris

Chris
  • Administrators
  • 10727 posts

Posted 07 March 2006 - 02:15 PM

...but Microsoft's strlcpy implementation overdoes it...

Actually, I don't think the MS C 7.1 library even has a strlcpy in it. I'm using my own that consists simply of:
--aDstSize; // Convert from size to length (caller has ensured that aDstSize > 0).
strncpy(aDst, aSrc, aDstSize);
aDst[aDstSize] = '\0';
The call to strncpy() is where I went wrong: it zero-fills any unused portion of the destination.

It may not be worth creating a hash class just for this function, but if it can be reused elsewhere...

Hashing might get implemented someday for faster lookups of variables and labels. However, it wouldn't be that big of an improvement because:

1) All variables are pre-looked-up when the script launches. No lookups are done at runtime except for dynamic variables (like Array%i%).

2) All lookups use the split binary-search algorithm, which Laszlo suggested when I asked him for an alternative to hashing and trees. This algorithm is quite fast and scales very well (at least up to several million variables).

these micro-optimizations [such as case sensitive comparisons]

I made the change but then realized I had all sorts of new questions for which answers are scarce. For example:

1) Is the strlwr + strcmp strategy exactly equivalent to stricmp regardless of setlocale? I suspect the answer is yes, if only because of the high number of people using it.

2) Is strlwr the same as the OS's CharLower? It appears the answer is "no" because strlwr uses setlocale's locale (which is always the C default if you never call setlocale?) but CharLower uses the OS's region as set by the control panel's language/keyboard layout.

3) Does CharLower() always convert the letters A-Z into a-z, regardless of the user's region? Apparently, the answer is so obvious that isn't documented at MSDN or anywhere else I could find. MSDN does offer this little clue, but it's so specific as to be mysterious: "CharLower always maps uppercase I to lowercase I, even when the current language is Turkish or Azeri."

4) If strlwr() is called in the code, there is a slight amount of bloat so I wanted to avoid it. But from the above, it appears that mixing CharLower with strcmp would be a bad practice, even if it happens to work for certain subsets of ANSI characters. CharLower() could be paired up with the OS's CompareString(), but I seem to remember that it's a very poor performer compared to strcmp().

Answers and comments are welcome.

#8 PhiLho

PhiLho
  • Fellows
  • 6850 posts

Posted 07 March 2006 - 02:42 PM

Frankly, the stricmp is most probably good enough, because for low Ascii chars, it probably convert to the other case with just bit masking, so it is very fast.

But for the sake of discussion, I think that the locale is a non issue. We are speaking of variable names here, where the character set is very delimited, unlike some languages like Visual Basic or Lua that allows characters used in locale words (but Lua is case sensitive...).

So strlwr should behave consistently, at least for the low Ascii characters. The 7bit charset is extremly well defined, whatever the locale. Even UTF-7 leaves it as is...

Note: if you have Visual Studio, you have the sources of the C library where there is strlwr. I don't have it handy, but I know that most of the C library functions maps more or less directly to Win32 API functions. The difference is, obviously, in the wrapper code, which can be very thin, or more complex.

#9 Chris

Chris
  • Administrators
  • 10727 posts

Posted 07 March 2006 - 03:58 PM

That's good info; thanks.

#10 Chris

Chris
  • Administrators
  • 10727 posts

Posted 07 March 2006 - 04:55 PM

The bug reported at the top of this topic has been fixed. Thanks again for reporting it, and for helping to educate me. :)