AutoHotkey Community

 Post subject: substr(buffer,1,X) bugs
Posted: June 26th, 2009, 1:54 pm
Hope this is enough to be helpful.

After a call to sub(x,buffer), where sub(x,byref buffer) {..} fills buffer from a dll call, the following happens (sometimes but not always, not sure why):

zz := substr(buffer,1,10) results in zz longer than 10 bytes!

Note, as an unrelated issue: a := buffer does not seem to assign the entire varcapacity of buffer, nor does it set the varcapacity of a to the zero-terminator position of buffer. I noticed this specifically with sub returns, so it may be related to that. I have not had time to look into it in depth; sorry if this is not enough detail to be helpful. If I can delve deeper later, I will.

_________________
Joyce Jamce


Posted: June 26th, 2009, 5:19 pm
Looking at the source code for SubStr, it seems impossible for a string to be returned longer than the length you specify. The string is specifically null-terminated at that position, or an earlier position if the source string is not long enough.

Perhaps there is something else in your script causing this behaviour. Can you post a short script which reproduces the behaviour?
Quote:
note: unrelated issue, with a:=buffer; does not seem to assign entire varcapacity of buffer,
To do so would be incorrect behaviour. tinku99 and I discussed this recently in the AutoHotkey.dll thread.
Quote:
nor to set varcapacity of a = zero term limit of buffer.
In a direct assignment, the variable's internal length field is used to avoid unnecessarily counting the number of characters in the string. If the variable's internal length field is out-of-date (i.e. due to modification by NumPut, DllCall or other code using the address of the variable), null-bytes may be copied. That is why VarSetCapacity(var,-1) can be used to update var's internal length field.

To state the obvious (for a native English-speaker, at least), a variable's capacity is the maximum amount of data it can contain. AutoHotkey makes certain optimizations to avoid unnecessarily copying and reallocating memory - the apparent result is that a variable's capacity is often greater than the length of its data.


Posted: July 1st, 2009, 2:24 am
Lexikos wrote:
Looking at the source code for SubStr, it seems impossible for a string to be returned longer than the length you specify.


Yes, that is what I thought when I glanced at it. But all the same, that is indeed what happens.

It does not seem to *always* happen (it may relate to the bytes involved?).


Basically:

Code:
; returnval is byref.
; returnval is filled via dll, basically memory moves.
if (bufsize := GETbuf2(source, returnval)) > 1
{
    if (substr(source,-3) = ".*$1") and ((returnval5 := substr(returnval,1,5)) != badVtcID)
    ....
}

returnval5 was sometimes coming back 7 or 10 chars in length.


Quote:

Perhaps there is something else in your script causing this behaviour.



Yes, this was my thought also, but unless AHK is confusing the var with another one, that does not seem to be the issue.

msgbox % "*" returnval5 "*"

clearly shows more than 5 chars sometimes.



Quote:
note: unrelated issue, with a:=buffer; does not seem to assign entire varcapacity of buffer,

the variable's internal length field is out-of-date (i.e. due to modification by NumPut, DllCall or other code using the address of the variable),


Yes, this is exactly what seems to be happening, but a return from a function seems to restore it whereas := does not...

Any insight as to the reason that is so?

_________________
Joyce Jamce


Posted: July 1st, 2009, 6:41 am
Variables have an internal length field; every other string value does not and must therefore be limited by the null-terminator. Script functions can only return strings, not variables or pure numbers.


Posted: July 1st, 2009, 8:53 pm
Lexikos wrote:
Variables have an internal length field; every other string value does not and must therefore be limited by the null-terminator. Script functions can only return strings, not variables or pure numbers.


I'm not following why X := Y

*should* give a different result from X := FUN(Y) // where FUN(Y) { return y }

_________________
Joyce Jamce


Posted: July 1st, 2009, 10:56 pm
So do you understand why it is different? If you're asking why it was designed this way, I'd guess because variables were originally intended only to hold (null-terminated) strings - other types of values were not needed.


Posted: July 2nd, 2009, 7:48 am
Lexikos wrote:
So do you understand why it is different? If you're asking why it was designed this way, I'd guess because variables were originally intended only to hold (null-terminated) strings - other types of values were not needed.

(I have not looked at the code; this is based on what I understand you to be explaining.)

Returning a var via a function resets the varcapacity to the location of the string-end char(0) in the buffer, but assignment via xx := xxx does not.

A := assignment sends over as much of the var as was last (?) registered as being the zero string terminator location.

E.g., if I define a string with 1M of X's, I can use that as a buffer, using := to copy the buffer to other vars, but I cannot do the same within functions via a function return of the var.

Is that about right?

_________________
Joyce Jamce


Posted: July 2nd, 2009, 9:00 am
That's about it, except that "varcapacity" is not directly relevant - it is determined by the length of data which is being copied into the variable.


Posted: July 2nd, 2009, 11:12 am
I am assuming that any op on a var will result in a reset of the var length, e.g.,

x .= ""

will reset it?

And can I depend on this? E.g., varsetcapacity(x,1000000,1) allows a 1 meg buffer to be passed via Z := x, then y := z, etc., so long as there is only binary touching of the &x buffer area?

(Or do I actually need to fill x with a string to reset the string length pointer?)

_________________
Joyce Jamce


Posted: July 2nd, 2009, 11:40 am
haystack_length is being set to the allocated length, i.e., the size of the buffer rather than the string length, right?

ref:
Code:
void BIF_SubStr(ExprTokenType &aResultToken, ExprTokenType *aParam[], int aParamCount) // Added in v1.0.46.
{
   // Set default return value in case of early return.
   aResultToken.symbol = SYM_STRING;
   aResultToken.marker = "";

   // Get the first arg, which is the string used as the source of the extraction. Call it "haystack" for clarity.
   char haystack_buf[MAX_NUMBER_SIZE]; // A separate buf because aResultToken.buf is sometimes used to store the result.
   char *haystack = TokenToString(*aParam[0], haystack_buf); // Remember that aResultToken.buf is part of a union, though in this case there's no danger of overwriting it since our result will always be of STRING type (not int or float).
   int haystack_length = (int)EXPR_TOKEN_LENGTH(aParam[0], haystack);

   // Load-time validation has ensured that at least the first two parameters are present:
   int starting_offset = (int)TokenToInt64(*aParam[1]) - 1; // The one-based starting position in haystack (if any).  Convert it to zero-based.
   if (starting_offset > haystack_length)
      return; // Yield the empty string (a default set higher above).
   if (starting_offset < 0) // Same convention as RegExMatch/Replace(): Treat a StartingPos of 0 (offset -1) as "start at the string's last char".  Similarly, treat negatives as starting further to the left of the end of the string.
   {
      starting_offset += haystack_length;
      if (starting_offset < 0)
         starting_offset = 0;
   }

   int remaining_length_available = haystack_length - starting_offset;
   int extract_length;
   if (aParamCount < 3) // No length specified, so extract all the remaining length.
      extract_length = remaining_length_available;
   else
   {
      if (   !(extract_length = (int)TokenToInt64(*aParam[2]))   )  // It has asked to extract zero characters.
         return; // Yield the empty string (a default set higher above).
      if (extract_length < 0)
      {
         extract_length += remaining_length_available; // Result is the number of characters to be extracted (i.e. after omitting the number of chars specified in extract_length).
         if (extract_length < 1) // It has asked to omit all characters.
            return; // Yield the empty string (a default set higher above).
      }
      else // extract_length > 0
         if (extract_length > remaining_length_available)
            extract_length = remaining_length_available;
   }

   // Above has set extract_length to the exact number of characters that will actually be extracted.
   char *result = haystack + starting_offset; // This is the result except for the possible need to truncate it below.

   if (extract_length == remaining_length_available) // All of haystack is desired (starting at starting_offset).
   {
      aResultToken.marker = result; // No need for any copying or termination, just send back part of haystack.
      return;                       // Caller and Var:Assign() know that overlap is possible, so this seems safe.
   }
   
   // Otherwise, at least one character is being omitted from the end of haystack.  So need a more complex method.
   if (extract_length <= MAX_NUMBER_LENGTH) // v1.0.46.01: Avoid malloc() for small strings.  However, this improves speed by only 10% in a test where random 25-byte strings were extracted from a 700 KB string (probably because VC++'s malloc()/free() are very fast for small allocations).
      aResultToken.marker = aResultToken.buf; // Store the address of the result for the caller.
   else
   {
      // Otherwise, validation higher above has ensured: extract_length < remaining_length_available.
      // Caller has provided a NULL circuit_token as a means of passing back memory we allocate here.
      // So if we change "result" to be non-NULL, the caller will take over responsibility for freeing that memory.
      if (   !(aResultToken.circuit_token = (ExprTokenType *)malloc(extract_length + 1))   ) // Out of memory. Due to rarity, don't display an error dialog (there's currently no way for a built-in function to abort the current thread anyway?)
         return; // Yield the empty string (a default set higher above).
      aResultToken.marker = (char *)aResultToken.circuit_token; // Store the address of the result for the caller.
      aResultToken.buf = (char *)(size_t)extract_length; // MANDATORY FOR USERS OF CIRCUIT_TOKEN: "buf" is being overloaded to store the length for our caller.
   }
   memcpy(aResultToken.marker, result, extract_length);
   aResultToken.marker[extract_length] = '\0'; // Must be done separately from the memcpy() because the memcpy() might just be taking a substring (i.e. long before result's terminator).
}

_________________
Joyce Jamce


Posted: July 2nd, 2009, 11:42 am
Joy2DWorld wrote:
am assuming that any op on a var will result in reset of var length, eg.,

x .= ""
Concat (.=) uses Var::mLength and not the null-terminator, but only if there is enough space in the variable to append the string. In this case (x .= ""), there is always enough space. It can potentially be used to concatenate binary values. For example:
Code:
VarSetCapacity(pt,8)
VarSetCapacity(x,4,1), NumPut(33,x)
VarSetCapacity(y,4,1), NumPut(10,y)
pt .= x   ; This also appears to work:  pt := x . y
pt .= y
MsgBox % "pt: " pt "`nlength of pt: " StrLen(pt) "`npt@0: " NumGet(pt,0) "`npt@4: " NumGet(pt,4)

However, if you rely on this or other undocumented behaviour I have explained in this thread, you may find your scripts will suddenly break when you update AutoHotkey. Relying on an "out-of-date" length may be an interesting way to code, but any benefits it offers over the alternatives (NumPut/NumGet/DllCall) are not worth the risk.
Quote:
(or actually need to fill x with string to reset string length pointer).
Yes. As I said, capacity is not directly relevant; only the perceived length.
Joy2DWorld wrote:
haystack_length is being set to allocated length, eg., size of buffer not string pointer, right. ?
If by "size of buffer" you mean the variable's capacity as set by VarSetCapacity, nope. See the EXPR_TOKEN_LENGTH macro definition in script.h; it uses strlen() for strings or var->Length() for variables. Anyway, so what?

Edit! OK, it looks like only assign-concat (.=) requires there to be space in the variable. This seems to work:
Code:
VarSetCapacity(x,4,1), NumPut(33,x)
VarSetCapacity(y,4,1), NumPut(10,y)
pt := x . y
MsgBox % "pt: '" pt "'`nlength of pt: " StrLen(pt) "`npt@0: " NumGet(pt,0) "`npt@4: " NumGet(pt,4)


Posted: July 2nd, 2009, 11:59 am
interesting.

btw,

Lexikos wrote:
See the EXPR_TOKEN_LENGTH macro definition in script.h; it uses strlen() for strings or var->Length() for variables. Anyway, so what?



I am trying to figure out how substr(x,1,5) can result in a 10-char string, and I notice that substr takes a shortcut when the EXPR_TOKEN_LENGTH result is shorter than the number of chars requested in the substr.

_________________
Joyce Jamce


Posted: July 5th, 2009, 7:09 am
I've just realised that although the return value of a script function is always null-terminated, if you pass a variable reference to a non-ByRef parameter, it acts the same as a simple assignment:
Code:
VarSetCapacity(x,4,1), NumPut(33,x)
VarSetCapacity(y,4,1), NumPut(10,y)
F(x . y)        ; Null-terminated, passes the bytes: 33, 0.
F(_ := x . y)   ; Uses length of _ (length of x + length of y).

F(pt) {
    MsgBox % "pt: '" pt "'`nlength of pt: " StrLen(pt) "`npt@0: " NumGet(pt,0) "`npt@4: " NumGet(pt,4)
}

Of course, ByRef would be more suitable in most cases.

