AutoHotkey Community

It is currently May 26th, 2012, 4:21 pm

All times are UTC [ DST ]




Post new topic Reply to topic  [ 14 posts ] 
Author Message
PostPosted: March 21st, 2009, 12:54 am 
Offline

Joined: October 7th, 2006, 4:50 pm
Posts: 3157
Location: MN, USA
[Moved from Ask for Help forum.]
Could the solution offered here be added to the official release?

Can someone explain why "if Var in MatchList" is so much slower than a parsing loop, when they should be doing the same thing? Actually, I would expect the parsing loop to be a bit slower since it references a third variable, A_LoopField, instead of just a needle and matchlist.
Code:
SetBatchLines, -1

Loop, 100000
 matchlist .= A_Index ","
needle = 100000

Start_Time := A_TickCount
If needle IN %matchlist%
{
 Stop_Time := A_TickCount
 time1 := (Stop_Time - Start_Time)/1000 " seconds."
}

Start_Time := A_TickCount
Loop, Parse, matchlist, `,
 If (A_LoopField = needle)
 {
  Stop_Time := A_TickCount
  time2 := (Stop_Time - Start_Time)/1000 " seconds."
 }

MsgBox, The "IN" operator took:`t%time1%`nThe parsing loop took:`t%time2%

On my laptop, the IN operator is more than 10 times slower (1.53 seconds versus 0.13) :!:


Last edited by jaco0646 on March 25th, 2009, 5:31 am, edited 2 times in total.

Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 3:37 am 
I believe that the number of comaprisons required for the IN command is what is causing the extra time.

Look at it this way. Your needle is 588895 bytes long and contains 100000 ","s.

For the loop, it must do:
588895 searches for ","
and compare the previous 6 bytes (100000 compares - one compare for each found ",")

For the IN it must do:
588895 compares (one 6 byte compare at each character postion in matchlist)


Report this post
Top
  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 3:39 am 
Offline
User avatar

Joined: November 2nd, 2008, 4:23 pm
Posts: 2906
Location: 127.0.0.1
I tested it and your right. A work around:
Code:
Fruits := "Apple,Orange,Banana"

If In("Apple", Fruits)
  Msgbox In("Apple", Fruits) = True
Else
  Msgbox In("Apple", Fruits) = False

If In("Cookie", Fruits)
  Msgbox In("Cookie", Fruits) = True
Else
  Msgbox In("Cookie", Fruits) = False

In(String, MatchList, D=",") {
Loop, Parse, matchlist, `,
 If (A_LoopField = String)
  Return, True
Return, False
}

_________________
aboutscriptappsscripts
Any code ⇈ above ⇈ requires AutoHotkey_L to run


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 4:37 am 
Offline

Joined: October 7th, 2006, 4:50 pm
Posts: 3157
Location: MN, USA
I guess my question was, "Is there any difference between the IN operator and a parsing loop?" Because I don't see one... whatever the IN operator is doing is wasteful: it's accomplishing the same task, but 10x slower. It's like a compact car with the fuel mileage of a Hummer. What's the point?

Shouldn't it be programmed to just perform a parsing loop internally if that's the faster method?


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 5:08 am 
Offline

Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
They work differently, since if-in must support ,, to mean a literal comma. The actual work is done by IsStringInList() in util.cpp. I think the "wasteful" part might be:
Code:
   while (*this_field)  // For each field in aList.
   {
      // To avoid the need to constantly check for buffer overflow (i.e. to keep it simple),
      // just copy up to the limit of the buffer:
      strlcpy(buf, this_field, sizeof(buf));
...
For each field in the list, it copies up to LINE_SIZE (16384+1) characters from the list itself. That is, it potentially copies from the beginning of the current field to the end of the list, including all fields after it. (If each field was exactly 16384 characters, there would be no wastage.)

Parsing loops, on the other hand, create a single copy of the entire input string once. This copy is referred to for A_LoopField, rather than copying each field/at each iteration.

Note that "10x slower" isn't really accurate, as it depends on the length of each field and the overall length of the list. On my system, your test shows it is "20x slower".


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 5:34 am 
Offline

Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
Unless I have made an error,
Code:
   char *this_field = aList, *next_field, *cp;

   while (*this_field)  // For each field in aList.
   {
      // To avoid the need to constantly check for buffer overflow (i.e. to keep it simple),
      // just copy up to the limit of the buffer:
      strlcpy(buf, this_field, sizeof(buf));
      // Find the end of the field inside buf.  In keeping with the tradition set by the Input command,
      // this always uses comma rather than g_delimiter.
      for (cp = buf, next_field = this_field; *cp; ++cp, ++next_field)
      {
         if (*cp == ',')
         {
            if (cp[1] == ',') // Make this pair into a single literal comma.
            {
               memmove(cp, cp + 1, strlen(cp + 1) + 1);  // +1 to include the zero terminator.
               ++next_field;  // An extra increment since the source string still has both commas of the pair.
            }
            else // this comma marks the end of the field.
            {
               *cp = '\0';  // Terminate the buffer to isolate just the current field.
               break;
            }
         }
      }

      if (*next_field)  // The end of the field occurred prior to the end of aList.
         ++next_field; // Point it to the character after the delimiter (otherwise, leave it where it is).
may be replaced with
Code:
   char *this_field = aList, *next_field, *cp, *end_buf = buf + sizeof(buf) - 1;

   while (*this_field)  // For each field in aList.
   {
      for (cp = buf, next_field = this_field; cp < end_buf && *next_field; ++cp, ++next_field)
      {
         if (*next_field == ',') // Check if this is a delimiter or literal comma.
         {
            ++next_field;
            if (*next_field != ',') // Was it a delimiter?
               break;
            // Otherwise next_field now points at the second comma, so continue copying.
         }
         *cp = *next_field;
      }
      // Terminate the string in the buffer.
      *cp = '\0';
to the same effect, but considerably faster in most cases. Actually, with fields of approximately 16300 characters, both if-in methods were roughly the same, and both were faster than Loop,Parse. I think short fields are probably more common.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 1:30 pm 
Offline

Joined: August 24th, 2005, 5:29 pm
Posts: 549
Location: Berlin / Germany
"If Var in" should be marked as obsolet:
Code:
#NoEnv
SetBatchLines, -1
SetFormat, Float, 0.15

Loop, 100
 matchlist .= A_Index . ","
needle = 50

T()
If needle IN %matchlist%
 time1 := T() . " seconds."
 
T()
If InStr("," . Matchlist, "," . Needle . ",")
 time2 := T() . " seconds."

T()
Loop, Parse, matchlist, `,
 If (A_LoopField = needle) {
  time3 := T() . " seconds."
  break
 }

MsgBox, The "IN" operator took:`t%time1%`nInStr() function took:`t%time2%`nThe parsing loop took:`t%time3%
ExitApp

T() {
   Static freq, last_count
   if !freq
      DllCall("QueryPerformanceFrequency","int64*",freq)
   DllCall("QueryPerformanceCounter","int64*",count)
   return (count-last_count)/freq, last_count:=count
}

_________________
nick :wink:


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 21st, 2009, 3:55 pm 
Offline

Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
It is pointless having 15 digits of "precision" when the "benchmark" results are inconsistent even with 5 digits. To be more accurate, repeat the test many times and average the result.
Code:
#NoEnv
SetBatchLines, -1
SetFormat, Float, 0.7
i = 10000

Loop, 100
    matchlist .= A_Index . ","
needle = 50

T()
Loop %i%
    If needle IN %matchlist%
        continue ; Do nothing.
time1 := T()/i . " seconds."
 
T()
Loop %i%
    If InStr("," . Matchlist, "," . Needle . ",")
        continue ; Do nothing.
time2 := T()/i . " seconds."

T()
Loop %i%
    Loop, Parse, matchlist, `,
        If (A_LoopField = needle)
            break ; Found match, just break inner loop and continue benchmark.
time3 := T()/i . " seconds."

MsgBox, The "IN" operator took:`t%time1%`nInStr() function took:`t`t%time2%`nThe parsing loop took:`t%time3%
ExitApp
Anyway, what was your point? I already pointed out an optimization, which brings if-in close to the InStr method. If-in still has readability benefits, which is likely to be more important than performance in many cases.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 22nd, 2009, 5:24 am 
Offline

Joined: October 7th, 2006, 4:50 pm
Posts: 3157
Location: MN, USA
Lexikos wrote:
If-in still has readability benefits, which is likely to be more important than performance in many cases.
Absolutely, I would guess that readability is the main reason IF-IN was added to AHK at all. I just thought the performance gap between the two methods could be shortened, which your optimization proves. Thank you for weighing in on this, Lexikos; I had hoped you would.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 22nd, 2009, 9:44 am 
Offline

Joined: August 24th, 2005, 5:29 pm
Posts: 549
Location: Berlin / Germany
Well, so keep it, I don't need it. :wink:

_________________
nick :wink:


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 23rd, 2009, 1:10 pm 
Offline

Joined: May 27th, 2007, 9:41 am
Posts: 4999
Thanks jaco0646, I noticed IF IN Matchlist was slow for some of the
scripts I made, but never considered to change to InStr, it really
is much much faster now I changed it.

_________________
AHK FAQ
TF : Text files & strings lib, TF Forum


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 26th, 2009, 11:52 am 
Offline

Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
Anyone interested in testing my optimization may download the latest revision (currently 22) of AutoHotkey_L.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: March 27th, 2009, 10:52 am 
Offline

Joined: May 27th, 2007, 9:41 am
Posts: 4999
Lexikos wrote:
Anyone interested in testing my optimization may download the latest revision (currently 22) of AutoHotkey_L.
Yep, If IN Matchlist is now as fast as InStr for the scripts I've tested it with.

_________________
AHK FAQ
TF : Text files & strings lib, TF Forum


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: April 15th, 2009, 6:45 pm 
Offline

Joined: March 2nd, 2004, 3:36 pm
Posts: 10720
This change has been applied to v1.0.48.01. Thanks for the code and the testing.


Report this post
Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 14 posts ] 

All times are UTC [ DST ]


Who is online

Users browsing this forum: gamax92 and 2 guests


You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group