AutoHotkey Homepage AutoHotkey Community
Let's help each other out
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

"if Var in MatchList" is very slooooow

 
Reply to topic    AutoHotkey Community Forum Index -> Wish List
View previous topic :: View next topic  
Author Message
jaco0646



Joined: 07 Oct 2006
Posts: 2300
Location: MN, USA

PostPosted: Sat Mar 21, 2009 12:54 am    Post subject: "if Var in MatchList" is very slooooow Reply with quote

[Moved from Ask for Help forum.]
Could the solution offered here be added to the official release?

Can someone explain why "if Var in MatchList" is so much slower than a parsing loop, when they should be doing the same thing? Actually, I would expect the parsing loop to be a bit slower since it references a third variable, A_LoopField, instead of just a needle and matchlist.
Code:
SetBatchLines, -1

Loop, 100000
 matchlist .= A_Index ","
needle = 100000

Start_Time := A_TickCount
If needle IN %matchlist%
{
 Stop_Time := A_TickCount
 time1 := (Stop_Time - Start_Time)/1000 " seconds."
}

Start_Time := A_TickCount
Loop, Parse, matchlist, `,
 If (A_LoopField = needle)
 {
  Stop_Time := A_TickCount
  time2 := (Stop_Time - Start_Time)/1000 " seconds."
 }

MsgBox, The "IN" operator took:`t%time1%`nThe parsing loop took:`t%time2%

On my laptop, the IN operator is more than 10 times slower (1.53 seconds versus 0.13) Exclamation
_________________
http://autohotkey.net/~jaco0646/


Last edited by jaco0646 on Wed Mar 25, 2009 5:31 am; edited 2 times in total
Back to top
View user's profile Send private message Visit poster's website
Guest






PostPosted: Sat Mar 21, 2009 3:37 am    Post subject: Reply with quote

I believe that the number of comaprisons required for the IN command is what is causing the extra time.

Look at it this way. Your needle is 588895 bytes long and contains 100000 ","s.

For the loop, it must do:
588895 searches for ","
and compare the previous 6 bytes (100000 compares - one compare for each found ",")

For the IN it must do:
588895 compares (one 6 byte compare at each character postion in matchlist)
Back to top
Frankie



Joined: 02 Nov 2008
Posts: 828

PostPosted: Sat Mar 21, 2009 3:39 am    Post subject: Reply with quote

I tested it and your right. A work around:
Code:
Fruits := "Apple,Orange,Banana"

If In("Apple", Fruits)
  Msgbox In("Apple", Fruits) = True
Else
  Msgbox In("Apple", Fruits) = False

If In("Cookie", Fruits)
  Msgbox In("Cookie", Fruits) = True
Else
  Msgbox In("Cookie", Fruits) = False

In(String, MatchList, D=",") {
Loop, Parse, matchlist, `,
 If (A_LoopField = String)
  Return, True
Return, False
}

_________________
Click here to join #AHK channel in IRC for general chat and quick help.
Back to top
View user's profile Send private message AIM Address
jaco0646



Joined: 07 Oct 2006
Posts: 2300
Location: MN, USA

PostPosted: Sat Mar 21, 2009 4:37 am    Post subject: Reply with quote

I guess my question was, "Is there any difference between the IN operator and a parsing loop?" Because I don't see one... whatever the IN operator is doing is wasteful: it's accomplishing the same task, but 10x slower. It's like a compact car with the fuel mileage of a Hummer. What's the point?

Shouldn't it be programmed to just perform a parsing loop internally if that's the faster method?
_________________
http://autohotkey.net/~jaco0646/
Back to top
View user's profile Send private message Visit poster's website
Lexikos



Joined: 17 Oct 2006
Posts: 4726
Location: Australia

PostPosted: Sat Mar 21, 2009 5:08 am    Post subject: Reply with quote

They work differently, since if-in must support ,, to mean a literal comma. The actual work is done by IsStringInList() in util.cpp. I think the "wasteful" part might be:
Code:
   while (*this_field)  // For each field in aList.
   {
      // To avoid the need to constantly check for buffer overflow (i.e. to keep it simple),
      // just copy up to the limit of the buffer:
      strlcpy(buf, this_field, sizeof(buf));
...
For each field in the list, it copies up to LINE_SIZE (16384+1) characters from the list itself. That is, it potentially copies from the beginning of the current field to the end of the list, including all fields after it. (If each field was exactly 16384 characters, there would be no wastage.)

Parsing loops, on the other hand, create a single copy of the entire input string once. This copy is referred to for A_LoopField, rather than copying each field/at each iteration.

Note that "10x slower" isn't really accurate, as it depends on the length of each field and the overall length of the list. On my system, your test shows it is "20x slower".
Back to top
View user's profile Send private message Visit poster's website
Lexikos



Joined: 17 Oct 2006
Posts: 4726
Location: Australia

PostPosted: Sat Mar 21, 2009 5:34 am    Post subject: Reply with quote

Unless I have made an error,
Code:
   char *this_field = aList, *next_field, *cp;

   while (*this_field)  // For each field in aList.
   {
      // To avoid the need to constantly check for buffer overflow (i.e. to keep it simple),
      // just copy up to the limit of the buffer:
      strlcpy(buf, this_field, sizeof(buf));
      // Find the end of the field inside buf.  In keeping with the tradition set by the Input command,
      // this always uses comma rather than g_delimiter.
      for (cp = buf, next_field = this_field; *cp; ++cp, ++next_field)
      {
         if (*cp == ',')
         {
            if (cp[1] == ',') // Make this pair into a single literal comma.
            {
               memmove(cp, cp + 1, strlen(cp + 1) + 1);  // +1 to include the zero terminator.
               ++next_field;  // An extra increment since the source string still has both commas of the pair.
            }
            else // this comma marks the end of the field.
            {
               *cp = '\0';  // Terminate the buffer to isolate just the current field.
               break;
            }
         }
      }

      if (*next_field)  // The end of the field occurred prior to the end of aList.
         ++next_field; // Point it to the character after the delimiter (otherwise, leave it where it is).
may be replaced with
Code:
   char *this_field = aList, *next_field, *cp, *end_buf = buf + sizeof(buf) - 1;

   while (*this_field)  // For each field in aList.
   {
      for (cp = buf, next_field = this_field; cp < end_buf && *next_field; ++cp, ++next_field)
      {
         if (*next_field == ',') // Check if this is a delimiter or literal comma.
         {
            ++next_field;
            if (*next_field != ',') // Was it a delimiter?
               break;
            // Otherwise next_field now points at the second comma, so continue copying.
         }
         *cp = *next_field;
      }
      // Terminate the string in the buffer.
      *cp = '\0';
to the same effect, but considerably faster in most cases. Actually, with fields of approximately 16300 characters, both if-in methods were roughly the same, and both were faster than Loop,Parse. I think short fields are probably more common.
Back to top
View user's profile Send private message Visit poster's website
nick



Joined: 24 Aug 2005
Posts: 550
Location: Berlin / Germany

PostPosted: Sat Mar 21, 2009 1:30 pm    Post subject: Reply with quote

"If Var in" should be marked as obsolet:
Code:
#NoEnv
SetBatchLines, -1
SetFormat, Float, 0.15

Loop, 100
 matchlist .= A_Index . ","
needle = 50

T()
If needle IN %matchlist%
 time1 := T() . " seconds."
 
T()
If InStr("," . Matchlist, "," . Needle . ",")
 time2 := T() . " seconds."

T()
Loop, Parse, matchlist, `,
 If (A_LoopField = needle) {
  time3 := T() . " seconds."
  break
 }

MsgBox, The "IN" operator took:`t%time1%`nInStr() function took:`t%time2%`nThe parsing loop took:`t%time3%
ExitApp

T() {
   Static freq, last_count
   if !freq
      DllCall("QueryPerformanceFrequency","int64*",freq)
   DllCall("QueryPerformanceCounter","int64*",count)
   return (count-last_count)/freq, last_count:=count
}

_________________
nick Wink
Back to top
View user's profile Send private message
Lexikos



Joined: 17 Oct 2006
Posts: 4726
Location: Australia

PostPosted: Sat Mar 21, 2009 3:55 pm    Post subject: Reply with quote

It is pointless having 15 digits of "precision" when the "benchmark" results are inconsistent even with 5 digits. To be more accurate, repeat the test many times and average the result.
Code:
#NoEnv
SetBatchLines, -1
SetFormat, Float, 0.7
i = 10000

Loop, 100
    matchlist .= A_Index . ","
needle = 50

T()
Loop %i%
    If needle IN %matchlist%
        continue ; Do nothing.
time1 := T()/i . " seconds."
 
T()
Loop %i%
    If InStr("," . Matchlist, "," . Needle . ",")
        continue ; Do nothing.
time2 := T()/i . " seconds."

T()
Loop %i%
    Loop, Parse, matchlist, `,
        If (A_LoopField = needle)
            break ; Found match, just break inner loop and continue benchmark.
time3 := T()/i . " seconds."

MsgBox, The "IN" operator took:`t%time1%`nInStr() function took:`t`t%time2%`nThe parsing loop took:`t%time3%
ExitApp
Anyway, what was your point? I already pointed out an optimization, which brings if-in close to the InStr method. If-in still has readability benefits, which is likely to be more important than performance in many cases.
Back to top
View user's profile Send private message Visit poster's website
jaco0646



Joined: 07 Oct 2006
Posts: 2300
Location: MN, USA

PostPosted: Sun Mar 22, 2009 5:24 am    Post subject: Reply with quote

Lexikos wrote:
If-in still has readability benefits, which is likely to be more important than performance in many cases.
Absolutely, I would guess that readability is the main reason IF-IN was added to AHK at all. I just thought the performance gap between the two methods could be shortened, which your optimization proves. Thank you for weighing in on this, Lexikos; I had hoped you would.
_________________
http://autohotkey.net/~jaco0646/
Back to top
View user's profile Send private message Visit poster's website
nick



Joined: 24 Aug 2005
Posts: 550
Location: Berlin / Germany

PostPosted: Sun Mar 22, 2009 9:44 am    Post subject: Reply with quote

Well, so keep it, I don't need it. Wink
_________________
nick Wink
Back to top
View user's profile Send private message
hugov



Joined: 27 May 2007
Posts: 3273

PostPosted: Mon Mar 23, 2009 1:10 pm    Post subject: Reply with quote

Thanks jaco0646, I noticed IF IN Matchlist was slow for some of the
scripts I made, but never considered to change to InStr, it really
is much much faster now I changed it.
_________________
Tut 4 Newbies
TF : Text file & string lib, TF Forum
Back to top
View user's profile Send private message Visit poster's website
Lexikos



Joined: 17 Oct 2006
Posts: 4726
Location: Australia

PostPosted: Thu Mar 26, 2009 11:52 am    Post subject: Reply with quote

Anyone interested in testing my optimization may download the latest revision (currently 22) of AutoHotkey_L.
Back to top
View user's profile Send private message Visit poster's website
hugov



Joined: 27 May 2007
Posts: 3273

PostPosted: Fri Mar 27, 2009 10:52 am    Post subject: Reply with quote

Lexikos wrote:
Anyone interested in testing my optimization may download the latest revision (currently 22) of AutoHotkey_L.
Yep, If IN Matchlist is now as fast as InStr for the scripts I've tested it with.
_________________
Tut 4 Newbies
TF : Text file & string lib, TF Forum
Back to top
View user's profile Send private message Visit poster's website
Chris
Site Admin


Joined: 02 Mar 2004
Posts: 10666

PostPosted: Wed Apr 15, 2009 6:45 pm    Post subject: Reply with quote

This change has been applied to v1.0.48.01. Thanks for the code and the testing.
Back to top
View user's profile Send private message Send e-mail
Display posts from previous:   
Reply to topic    AutoHotkey Community Forum Index -> Wish List All times are GMT
Page 1 of 1

 
Jump to:  
You can post new topics in this forum
You can reply to topics in this forum


Powered by phpBB © 2001, 2005 phpBB Group