Boolean Search as the needle searching a haystack

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

08 Sep 2014, 20:31

ahcahc,

(1) Great! I decided not to use « and » but went with µ for OR (micro sign-181) and ¡ for NOT (inverted exclamation mark-161). It's working fine.

(2 & 3) Thanks for the case insensitive and word boundary code - and for the heads-up about word boundary working only if the search text is alnum.

(4) Interesting idea! Hadn't thought of that angle. I'll have to decide which approach I prefer. My first thought was that I like the idea of allowing mixed case on the Boolean operators (users will probably like being able to enter all lower case), but the idea of being able to search for them is nice, too.

Will continue testing. Thanks again, Joe
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

08 Sep 2014, 21:05

ahcahc,

I just noticed this commented out line:

;Search3 :=RegExReplace(Search3,"(?<=\d)\s+?(?=\d)","*") ;In case regexmatch failed to match the entire sentence, search for space sorrounded by numbers and replace it with *

Should I be doing anything with this line? Should it be un-commented? Thanks, Joe
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

08 Sep 2014, 21:06

guest3456,
Thanks for the suggestion. Regards, Joe
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

08 Sep 2014, 23:12

JoeWinograd wrote:ahcahc,

I just noticed this commented out line:

;Search3 :=RegExReplace(Search3,"(?<=\d)\s+?(?=\d)","*") ;In case regexmatch failed to match the entire sentence, search for space sorrounded by numbers and replace it with *

Should I be doing anything with this line? Should it be un-commented? Thanks, Joe
Previously the final expression would have a space between two numbers because regex only partially matched the sentence and the resulting expression will be like (1 * 1 0), eval (1 * 1 0) would result to 10 but it should be zero so (1 * 1*0). If you get and expression like that, you can use that code fix to it. I think this wont likely happen now.

I've made it easier to replace the operators with different special characters and made the match-whole-word also include non-alphanumeric characters(if this fails you can always revert back to \b).

Code: Select all

string=
(
@utoHo:tkey? is a
better 
language``s's
than 
VBScr|pt
)
search = (@utoHo:tkey? OR Auto Hotkey OR Auto-Hotkey) AND language``s's AND NOT (Microsoft OR VBScr|pt)

op_and=¤    ;AND
op_or=µ     ;OR
op_not=¡     ;NOT
case_sense=1    ;case insensitive
whole_word=1    ;match whole word/sentence


option_start1=
option_start2=
option_end1=
option_end2=
if case_sense=1
   {
   option_start1 .= "i)"
   option_start2 .= "i)"
   }
if whole_word=1
   {
   ;option_start1 .= "\b"                    ;alphanumeric only
   ;option_end1 .= "\b"                     ;alphanumeric only
   ;option_start2 .= "\b"                   ;alphanumeric only
   ;option_end2 .= "\b"                    ;alphanumeric only
   option_start1 .= "(?:(?<=[\s" op_and op_or op_not "()])|^)"  ;alphanumeric + special characters
   option_end1 .= "(?:(?=[\s" op_and op_or op_not "()])|$)"     ;alphanumeric + special characters
   option_start2 .= "(?:(?<=\s)|^)"        ;alphanumeric + special characters
   option_end2 .= "(?:(?=\s)|$)"            ;alphanumeric + special characters
   }
search1 := RegExReplace(RegExReplace(RegExReplace(search,"i)\bAND\b",op_and),"i)\bOR\b",op_or),"i)\bNOT\b",op_not)
while pos := RegExMatch(search1,"((?:(?=\S)[^\s" op_and op_or op_not "()]+\s*)+)",m,A_Index=1?1:pos+StrLen(m))
   {
	if a_index=1
        search2 := RegExReplace(search1,option_start1 "\Q" trim(m) "\E" option_end1,RegExMatch(string,option_start2 "\Q" trim(m) "\E" option_end2)>0?1:0)
	else
        search2 := RegExReplace(search2,option_start1 "\Q" trim(m) "\E" option_end1,RegExMatch(string,option_start2 "\Q" trim(m) "\E" option_end2)>0?1:0)
	}
search3 := RegExReplace(search2,op_and,"*")    ;AND
search3 := RegExReplace(search3,op_or,"+")    ;OR
search3 := RegExReplace(search3,op_not,"!")    ;NOT
;Search3 :=RegExReplace(Search3,"(?<=\d)\s+?(?=\d)","*") ;In case regexmatch failed to match the entire sentence, search for space sorrounded by numbers and replace it with *
MsgBox % "String: " string "`nExpression0: " search "`nExpression1: " search1 "`nExpression2: " search2 "`nExpression3: " search3 "`nResult: " eval(search3)
if (eval(search3)>0)
	MsgBox do something here
Last edited by ahcahc on 09 Sep 2014, 08:16, edited 1 time in total.
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 07:24

Hi ahcahc,

Thank you for the new code - everything I've tested has worked perfectly so far! It's great to have code that makes it so easy to alter the operators with different special characters and to control the case-sense and whole-word options. The ability to include non-alphanumeric characters in a search is a nice bonus, especially valuable for the hyphen, which I'm sure users will need in search terms.

Question: Based on experimentation, it seems that the operator precedence is NOT, AND, OR, but I wanted to confirm that with you. I realize it depends on Eval, which is being fed:

! for NOT
* for AND
+ for OR

At first I thought it was being evaluated left-to-right, but this test proved otherwise:

string = AutoHotkey is a better language than VBScript
search = AutoHotkey OR VBScript AND Microsoft

Left-to-right on that would result in 0, while AND before OR results in 1, which is the actual result. I'm going to recommend that users take advantage of parentheses to control precedence, but I still need to explain what the default precedence is. Thanks, Joe
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 08:25

eval() treats the final expression just like any mathematical expression. The final expression should only contain 1, 0, +, *, ! or ().

Code: Select all

string = AutoHotkey is a better language than VBScript
search = AutoHotkey OR VBScript AND Microsoft
search = 1 OR 1 AND 0
search = 1 + 1 * 0
search = 1 + 0
search = 1
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 08:31

search = 1 + 1 * 0
The thing is, this evaluates to 0 left-to-right, but to 1 if the * operator is evaluated before the + operator.
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 09:02

Code: Select all

algebra: 
x=1
y=1
z=0
a= x+yz    ;1 OR 1 AND 0
a= 1+1*0
a= 1+0
a=1
I have simplified this part of the code, use the code below:

Code: Select all

while pos := RegExMatch(search1,"((?:(?=\S)[^\s" op_and op_or op_not "()]+\s*)+)",m,A_Index=1?1:pos+StrLen(m))
   search2 := RegExReplace(A_Index=1?search1:search2,option_start1 "\Q" trim(m) "\E" option_end1,RegExMatch(string,option_start2 "\Q" trim(m) "\E" option_end2)>0?1:0)
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 09:41

Yes, it's true that standard algebraic precedence is multiplication before addition, but in this case, it depends on how Eval was programmed, and I didn't study it well enough to figure that out. Also, there are different ways that unary operators (such as !) are handled, so that also depends on how Eval was programmed. For example, the calculator built into Windows does left-to-right precedence in Standard mode, so entering 1+1*0 results in 0, not 1; however, in Scientific mode, it does standard math precedence, so entering 1+1*0 results in 1. But it's good to know that Eval was programmed to perform multiplication before addition - thanks for explaining that (and it handles the unary operator ! first - also good).

Regarding the new, simplified code, so far it has worked perfectly! Thank you for your ongoing efforts to improve the solution. Regards, Joe
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 14:25

It seems I missed this part in the eval() ; Logic operators: !, ||, &&; ternary operator: (_?_:_);
You can replace those search3 with:

Code: Select all

search3 := RegExReplace(search2,op_and,"&&")    ;AND
search3 := RegExReplace(search3,op_or,"||")    ;OR
search3 := RegExReplace(search3,op_not,"!")    ;NOT
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

09 Sep 2014, 14:51

Very nice! That eliminates results greater than 1. For example:

string = AutoHotkey is a better language than VBScript
search = AutoHotkey or language or VBScript

The Eval result with the previous code was 3. With the new code, it is 1. Excellent! Regards, Joe
User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 02:06

I know this has pretty much been solved to the satisfaction of the original post but it caught my eye and looked like an interesting parsing exercise.

Here is my more concise approach with everything in one function without the dependency of a library function which seems overkill.

Code: Select all

Haystack := "Hello world, this is a beta test" ; TRUE
Needle := "(hello AND world) AND (is a beta AND (test AND (NOT production)))"

;~ Haystack := "Hello world, this is a beta test" ; TRUE
;~ Needle := "(hello AND world) OR (is a alpha AND (test AND (NOT production)))"

;~ Haystack := "Hello world, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; TRUE
;~ Haystack := "Helloworlds, this i% a beta test" ; FALSE
;~ Haystack := "Hello world , this i% a beta test" ; FALSE
;~ Haystack := "Hello world, this i% a beta test end" ; FALSE
;~ Haystack := " World, Hello, this i% a beta test" ; FALSE
;~ Needle := "(%%hello(?=.*world)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"

;~ Haystack := "(Hello) worlds, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; FALSE
;~ Needle := "(%%\(hello\)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"


if SearchBoolean(Haystack, Needle, false)
	MsgBox TRUE
else
	MsgBox FALSE

SearchBoolean(H, N, C:=false) ; Haystack, Needle, Case-sensitive
{
	NT := RegExReplace(N, "AND|OR|NOT", "¥")
	Loop, Parse, NT, ¥
		if T := Trim(A_LoopField, " ()")
			if RegExMatch(T,"^%%(.*)%%$", M)
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)") M1) ? 1 : 0,, 1)
			else
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)")"\Q" T "\E") ? 1 : 0,, 1)
	Loop
	{
		NN := N, NNP := 1
		Loop
			if (P := RegExMatch(NN, "U)\(\s*([^\(]*)\s*\)", M))
				NN := M1, NNP := P
			else
				break
		NNN :=  NN, X := 1
		Loop
		{
			if RegExMatch(NNN, "(NOT)\s*(\d)", M)
				NNN := RegExReplace(NNN, "NOT\s*\d", X := ! M2,, 1)
			if !RegExMatch(NNN, "(\d)\s*(AND|OR)\s*(\d)", M)
				break
			else if (M2 = "AND")
				X := M1 & M3 & X
			else if (M2 = "OR")
				X := M1 | M3 & X
			NNN := RegExReplace(NNN, "\d\s*(AND|OR)\s*\d", X,, 1)
		}
		N := RegExReplace(N, "\(?\s*" NN "\s*\)?", X,, 1, NNP)
		if (N = 1 or N = 0)
			return N
	}
}
It should accept any characters other than () AND OR NOT ¥ as those have special meaning. I toyed with allowing wildcards and such. And also with allowing searches that include () AND OR NOT as words but then you have to force the user to put things in quotes or some other escaping to differentiate between things to actually search for and conditional commands. Then I figured it is probably more important to keep things simple for the user.

I have tested this practically none but it seems to work ok. If you want to take this approach and find a Needle that it does not handle properly let me know.

EDIT: Added a way for power users to have some control without burdening the normal user. Any search item that begin and ends with %% are treated as RegEx expressions.

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 04:08

FanaticGuru wrote:I know this has pretty much been solved to the satisfaction of the original post but it caught my eye and looked like an interesting parsing exercise.

Here is my more concise approach with everything in one function without the dependency of a library function which seems overkill.

Code: Select all

Haystack := "Hello world, this is a beta test" ; TRUE
Needle := "(hello AND world) AND (is a beta AND (test AND (NOT production)))"

;~ Haystack := "Hello world, this is a beta test" ; TRUE
;~ Needle := "(hello AND world) OR (is a alpha AND (test AND (NOT production)))"

;~ Haystack := "Hello world, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; TRUE
;~ Haystack := "Helloworlds, this i% a beta test" ; FALSE
;~ Haystack := "Hello world , this i% a beta test" ; FALSE
;~ Haystack := "Hello world, this i% a beta test end" ; FALSE
;~ Haystack := " World, Hello, this i% a beta test" ; FALSE
;~ Needle := "(%%hello(?=.*world)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"

;~ Haystack := "(Hello) worlds, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; FALSE
;~ Needle := "(%%\(hello\)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"


if SearchBoolean(Haystack, Needle, false)
	MsgBox TRUE
else
	MsgBox FALSE

SearchBoolean(H, N, C:=false) ; Haystack, Needle, Case-sensitive
{
	NT := RegExReplace(N, "AND|OR|NOT", "¥")
	Loop, Parse, NT, ¥
		if T := Trim(A_LoopField, " ()")
			if RegExMatch(T,"^%%(.*)%%$", M)
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)") M1) ? 1 : 0,, 1)
			else
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)")"\Q" T "\E") ? 1 : 0,, 1)
	Loop
	{
		NN := N, NNP := 1
		Loop
			if (P := RegExMatch(NN, "U)\(\s*([^\(]*)\s*\)", M))
				NN := M1, NNP := P
			else
				break
		NNN :=  NN, X := 1
		Loop
		{
			if RegExMatch(NNN, "(NOT)\s*(\d)", M)
				NNN := RegExReplace(NNN, "NOT\s*\d", X := ! M2,, 1)
			if !RegExMatch(NNN, "(\d)\s*(AND|OR)\s*(\d)", M)
				break
			else if (M2 = "AND")
				X := M1 & M3 & X
			else if (M2 = "OR")
				X := M1 | M3 & X
			NNN := RegExReplace(NNN, "\d\s*(AND|OR)\s*\d", X,, 1)
		}
		N := RegExReplace(N, "\(?\s*" NN "\s*\)?", X,, 1, NNP)
		if (N = 1 or N = 0)
			return N
	}
}
It should accept any characters other than () AND OR NOT ¥ as those have special meaning. I toyed with allowing wildcards and such. And also with allowing searches that include () AND OR NOT as words but then you have to force the user to put things in quotes or some other escaping to differentiate between things to actually search for and conditional commands. Then I figured it is probably more important to keep things simple for the user.

I have tested this practically none but it seems to work ok. If you want to take this approach and find a Needle that it does not handle properly let me know.

EDIT: Added a way for power users to have some control without burdening the normal user. Any search item that begin and ends with %% are treated as RegEx expressions.

FG

Code: Select all

Haystack:="hello word Microsoft"
Needle := "NOT (Microsoft word)"
Should be true. Need more tweaking.
User avatar
LinearSpoon
Posts: 156
Joined: 29 Sep 2013, 22:55

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 07:22

I had wrote this a few days ago, but I didn't post it because it seemed you had a solution. For those versed in CS, this is a recursive descent parser for the grammar:

Code: Select all

O -> A (OR A)* | A
A -> N (AND N)* | N
N -> NOT* T | T
T -> {text} | (O)
I'm leaving for classes in a few minutes, so it's not as well tested as I would like. I'll be back in a few hours if you have questions.

Code: Select all

searchText := "AutoHotkey is a better language than VBScript"

testcases =
(
(((((not hello)))))
(AutoHotkey or xyz) and not zzz
VBScript and AutoHotkey
x OR VBScript AND x
(x OR VBScript) AND VBScript
AutoHotkey is a better language AND VBscript
NOT a big long string of text AND VBscript
VBScript and not (language and xxx)

VBscript and AutoHotkey and VBscript and AutoHotkey and VBscript
Not Not not x
AutoHotkey and (VBscript and AutoHotkey and VBscript) and AutoHotkey
VBScript OR x OR language OR is OR z
)

Loop, Parse, testcases, `n
  output .= A_LoopField " = " parse(A_LoopField, searchText) "`n"
Msgbox % searchText "`n" output
ExitApp

parse(search, byref haystack)
{
  if (!search)
    return 1
  x := new symbols
  search := RegexReplace(search, "i)(^|[\(\s])\K(and|or|not)(?=\s)", "|$2|")
  search := RegexReplace(search, "(\)|\()", "|$1|")
  Loop, Parse, search, |, %A_Space%%A_Tab%
  {
    if (A_LoopField = "")
      continue
    if (A_LoopField = "(")
      x.Insert({token:"LPAREN", value:"("})
    else if (A_LoopField = ")")
      x.Insert({token:"RPAREN", value:")"})
    else if RegexMatch(A_LoopField, "i)^(and|or|not)$")
      x.Insert({token:A_LoopField, value:A_LoopField})
    else
      x.Insert({token:"TEXT", value:A_LoopField})
  }

  return O(x, haystack)
}

T(tokens, byref haystack)
{
  if (tokens.current().token = "LPAREN")
  {
    tokens.advance()
    r := O(tokens, haystack)
    if (tokens.current().token != "RPAREN")
      throw ("Expected "")"" but got " tokens.current().value "`n" tokens.pos)
    tokens.advance()
    return r
  }

  if (tokens.current().token != "TEXT")
    throw ("Expected text but got " tokens.current().token "`n" tokens.pos)
  r := Instr(haystack, tokens.current().value)
  tokens.advance()
  return r
}

N(tokens, byref haystack)
{
  n := 0
  Loop
  {
    if (tokens.current().token != "NOT")
    {
      v := T(tokens, haystack)
      return n ? !v : v
    }
    n := !n
    tokens.advance()
  } until (tokens.end() || tokens.current().token = "RPAREN")
}

A(tokens, byref haystack)
{
  r := N(tokens, haystack)
  Loop
  {
    if (tokens.current().token != "AND")
      return r
    tokens.advance()

    r := N(tokens, haystack) && r
  } until (tokens.end() || tokens.current().token = "RPAREN")
  return r
}

O(tokens, byref haystack)
{
  r := A(tokens, haystack)
  Loop
  {
    if (tokens.current().token != "OR")
      return r
    tokens.advance()

    r := A(tokens, haystack) || r
  } until (tokens.end() || tokens.current().token = "RPAREN")
  return r
}


class symbols
{
  __New()
  {
    this.pos := 1
  }

  advance()
  {
    return this[++this.pos]
  }

  current()
  {
    return this[this.pos]
  }

  end()
  {
    return this[this.pos] = ""
  }
}
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 08:13

Yes, MONSTER's Evaluate Arithmetic Expression function is overkill. So I made a simple boolean logic evaluator. So far it has not failed (yet).
Edit: Updated the function

Code: Select all

expression = 1 && 0 || !(0||   0)  ;1
MsgBox % eval_logic(expression)

expression = (1 || 0|| 0  ) && 1 && !(0||   0)  ;1
MsgBox % eval_logic(expression)

expression = 1 OR 1 AND 0   ; 1
MsgBox % eval_logic(expression,"AND","OR","NOT")

expression = (1 + 1  ) * 1 * 0  ;0
MsgBox % eval_logic(expression,"*","+","!")

eval_logic(expr, op_and:="&&", op_or:="||", op_not:="!") {
tmp:=expr
expr:=RegExReplace(expr,"\s+")
expr:=RegExReplace(expr,"11+","1")
expr:=RegExReplace(expr,"00+","0")
if RegExMatch(expr,"[^10()" op_and op_or op_not "]",m)
   return expr "`n" "Invalid Character: " m "`nAllowed Characters: 1 0 ( ) " op_and " " op_or " " op_not
Loop
   {
	;AND
	x:=expr
	while pos := RegExMatch(expr,"\Q" op_and "\E",m,A_Index=1?1:pos+StrLen(m))
		{
		StringReplace, x, x, 1%op_and%1, 1, All
		StringReplace, x, x, 0%op_and%0, 0, All
		StringReplace, x, x, 0%op_and%1, 0, All
		StringReplace, x, x, 1%op_and%0, 0, All
		StringReplace, x, x, 01, 0, All    ; 0 AND 1 = 0
		StringReplace, x, x, 10, 0, All    ;1 AND 0 = 0
		}
	expr:=x

	;OR
	StringReplace, expr, expr, 1%op_or%1, 1, All
	StringReplace, expr, expr, 0%op_or%0, 0, All
	StringReplace, expr, expr, 0%op_or%1, 1, All
	StringReplace, expr, expr, 1%op_or%0, 1, All

	StringReplace, expr, expr, (1), 1, All
	StringReplace, expr, expr, (0), 0, All

	;NOT
	StringReplace, expr, expr, %op_not%1, 0, All
	StringReplace, expr, expr, %op_not%0, 1, All

	if (expr="1" or expr="0")
	  return expr
	if (A_Index>500)
	  return "Invalid or Incomplete Expression`n" tmp "`nAllowed Characters: 1 0 ( ) " op_and " " op_or " " op_not
	}
}
Search with boolean logic evaluator

Code: Select all

string=
(
@utoHo:tkey? is a
better 
language``s's
than 
hello VBScript is always a scr1ipt
 Microsoft word
)
;string = word hello Microsoft
;search = (@utoHo:tkey? OR Auto Hotkey OR Auto-Hotkey) AND language``s's AND NOT (Microsoft OR VBScr|pt)
;search =  NOT (Microsoft word)
search =  (@utoHo:tkey? OR Auto Hotkey OR Auto-Hotkey) AND language``s's AND NOT (Microsoft word OR VBScript is always a script)



op_and=¤    ;AND
op_or=µ     ;OR
op_not=¡     ;NOT
case_sense=1    ;case insensitive
whole_word=0         ;match whole word/sentence

option_start1=
option_start2=
option_end1=
option_end2=
if case_sense=1
   {
   option_start1 .= "i)"
   option_start2 .= "i)"
   }
if whole_word=1
   {
   ;option_start1 .= "\b"                    ;alphanumeric only
   ;option_end1 .= "\b"                     ;alphanumeric only
   ;option_start2 .= "\b"                   ;alphanumeric only
   ;option_end2 .= "\b"                    ;alphanumeric only
   option_start1 .= "(?:(?<=[\s" op_and op_or op_not "()])|^)"  ;alphanumeric + special characters
   option_end1 .= "(?:(?=[\s" op_and op_or op_not "()])|$)"     ;alphanumeric + special characters
   option_start2 .= "(?:(?<=\s)|^)"        ;alphanumeric + special characters
   option_end2 .= "(?:(?=\s)|$)"            ;alphanumeric + special characters
   }
search1 := RegExReplace(RegExReplace(RegExReplace(search,"i)\bAND\b",op_and),"i)\bOR\b",op_or),"i)\bNOT\b",op_not)
while pos := RegExMatch(search1,"((?:(?=\S)[^\s" op_and op_or op_not "()]+\s*)+)",m,A_Index=1?1:pos+StrLen(m))
   search2 := RegExReplace(A_Index=1?search1:search2,option_start1 "\Q" trim(m) "\E" option_end1,RegExMatch(string,option_start2 "\Q" trim(m) "\E" option_end2)>0?1:0)

search3 := RegExReplace(search2,op_and,"&&")    ;AND
search3 := RegExReplace(search3,op_or,"||")    ;OR
search3 := RegExReplace(search3,op_not,"!")    ;NOT

MsgBox % "String: " string "`nExpression0: " search "`nExpression1: " search1 "`nExpression2: " search2 "`nExpression3: " search3 "`neval_logic Result: " eval_logic(search3) 
if (eval_logic(search3)>0)
	MsgBox do something here

ExitApp

eval_logic(expr, op_and:="&&", op_or:="||", op_not:="!") {
tmp:=expr
expr:=RegExReplace(expr,"\s+")
expr:=RegExReplace(expr,"11+","1")
expr:=RegExReplace(expr,"00+","0")
if RegExMatch(expr,"[^10()" op_and op_or op_not "]",m)
   return expr "`n" "Invalid Character: " m "`nAllowed Characters: 1 0 ( ) " op_and " " op_or " " op_not
Loop
   {
	;AND
	x:=expr
	while pos := RegExMatch(expr,"\Q" op_and "\E",m,A_Index=1?1:pos+StrLen(m))
		{
		StringReplace, x, x, 1%op_and%1, 1, All
		StringReplace, x, x, 0%op_and%0, 0, All
		StringReplace, x, x, 0%op_and%1, 0, All
		StringReplace, x, x, 1%op_and%0, 0, All
		StringReplace, x, x, 01, 0, All    ; 0 AND 1 = 0
		StringReplace, x, x, 10, 0, All    ;1 AND 0 = 0
		}
	expr:=x

	;OR
	StringReplace, expr, expr, 1%op_or%1, 1, All
	StringReplace, expr, expr, 0%op_or%0, 0, All
	StringReplace, expr, expr, 0%op_or%1, 1, All
	StringReplace, expr, expr, 1%op_or%0, 1, All

	StringReplace, expr, expr, (1), 1, All
	StringReplace, expr, expr, (0), 0, All

	;NOT
	StringReplace, expr, expr, %op_not%1, 0, All
	StringReplace, expr, expr, %op_not%0, 1, All

	if (expr="1" or expr="0")
	  return expr
	if (A_Index>500)
	  return "Invalid or Incomplete Expression`n" tmp "`nAllowed Characters: 1 0 ( ) " op_and " " op_or " " op_not
	}
}
Last edited by ahcahc on 10 Sep 2014, 11:16, edited 2 times in total.
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 09:37

Hi ahcahc, FanaticGuru, and LinearSpoon,
Sorry for the delayed response, but I am in the US Central Time zone and all of this occurred while I was sleeping. Thanks to all of you for your efforts in improving the solution. I'll try to do some testing later today and Friday (I'm tied up this morning and all day tomorrow). Regards, Joe
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 13:26

Hi ahcahc,

The new eval_logic code works great when the search term is valid (at least, none of my tests has failed yet). However, if the search term contains unpaired parentheses or one of the three invalid search characters, it hangs. For example, a search for

AutoHotkey)

or

AutoHotkey%op_and%

hangs. I don't care about this, but wanted to pass it along to you. I'm going to handle it by checking the search term right after the user inputs it to make sure that it (1) contains an equal number of left and right parentheses and (2) does not contain one of the three invalid characters. Not being a RegEx expert, I'll probably do something like this right after inputting the search term from the user:

Code: Select all

If (InStr(search,op_and) or InStr(search,op_or) or InStr(search,op_not))
{
  MsgBox,4112,Fatal Error,These three characters are not allowed in a search`n%op_and_out%`n%op_or_out%`n%op_not_out%
  ExitApp
}
StringReplace,search,search,(,(,UseErrorLevel
NumLeft:=ErrorLevel
StringReplace,search,search,),),UseErrorLevel
NumRight:=ErrorLevel
If (NumLeft<>NumRight)
{
  MsgBox,4112,Fatal Error,There are %NumLeft% left parentheses and %NumRight% right parentheses in`n%search%
  ExitApp
}
I'm sure you could do it much more elegantly with RegEx, but the above is in my comfort zone. :)

I've defined the three "_out" variables to contain some text describing each character (e.g., "upside down exclamation mark - ASCII code 161"), so it's easier for the user to understand which characters are invalid. Regards, Joe
User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 14:24

ahcahc wrote:

Code: Select all

Haystack:="hello word Microsoft"
Needle := "NOT (Microsoft word)"
Should be true. Need more tweaking.
Superfluous () were causing problems. NOT Microsoft word worked fine but the unneeded NOT (Microsoft word) did not.

Add a check to remove unneeded ().

Code: Select all

;~ Haystack := "Hello world, this is a beta test" ; TRUE
;~ Needle := "(hello AND world) AND (is a beta AND (test AND (NOT production)))"

;~ Haystack:="hello word Microsoft" ; TRUE
;~ Needle := "NOT (Microsoft word)"

Haystack:="hello word Microsoft" ; TRUE
Needle := "word"

;~ Haystack:="hello word Microsoft" ; FALSE
;~ Needle := "NOT word"

;~ Haystack:="hello word Microsoft" ; TRUE
;~ Needle := "NOT Microsoft word"

;~ Haystack:="hello word Microsoft" ; FALSE
;~ Needle := "NOT ( ( (word Microsoft)) )"

;~ Haystack:="hello word Microsoft" ; FALSE
;~ Needle := "NOT (Microsoft word) AND NOT ((hello))"

;~ Haystack:="hello ((Microsoft word Microsoft))" ; FALSE
;~ Needle := "NOT Microsoft word"

;~ Haystack := "Hello world, this is a beta test" ; TRUE
;~ Needle := "(hello AND world) OR (is a alpha AND (test AND (NOT production)))"

;~ Haystack := "Hello world, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; TRUE
;~ Haystack := "Helloworlds, this i% a beta test" ; FALSE
;~ Haystack := "Hello world , this i% a beta test" ; FALSE
;~ Haystack := "Hello world, this i% a beta test end" ; FALSE
;~ Haystack := " World, Hello, this i% a beta test" ; FALSE
;~ Needle := "(%%hello(?=.*world)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"

;~ Haystack := "(Hello) worlds, this i% a beta test" ; TRUE
;~ Haystack := "Hello worlds, this i% a beta test" ; FALSE
;~ Needle := "(%%\(hello\)%% AND %% worlds?,%%) AND (i% a beta AND (%%test$%% AND (NOT production)))"


if SearchBoolean(Haystack, Needle, false)
	MsgBox TRUE
else
	MsgBox FALSE

SearchBoolean(H, N, C:=false) ; Haystack, Needle, Case-sensitive
{
	NT := RegExReplace(N, "AND|OR|NOT", "¥")
	Loop, Parse, NT, ¥
		if T := Trim(A_LoopField, " ()")
			if RegExMatch(T,"^%%(.*)%%$", M)
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)") M1) ? 1 : 0,, 1)
			else
				N := RegExReplace(N, (C ? "" : "i)")"\Q" T "\E", RegExMatch(H, (C ? "" : "i)")"\Q" T "\E") ? 1 : 0,, 1)
	while !(N =1 or N = 0)
	{
		N := RegExReplace(N, "\([\(\s]*(\d)[\)\s]*\)", " $1 ")
		NN := N, NNP := 1
		Loop
			if (P := RegExMatch(NN, "U)\(\s*([^\(]*)\s*\)", M))
				NN := M1, NNP := P
			else
				break
		NNN :=  NN, X := 1
		Loop
		{
			while RegExMatch(NNN, "NOT\s*(\d)", M)
                NNN := RegExReplace(NNN, "NOT\s*\d", X := ! M1,, 1)
			if !RegExMatch(NNN, "(\d)\s*(AND|OR)\s*(\d)", M)
				break
			else if (M2 = "AND")
				X := M1 & M3 & X
			else if (M2 = "OR")
				X := M1 | M3 & X
			NNN := RegExReplace(NNN, "\d\s*(AND|OR)\s*\d", X,, 1)
		}
		N := RegExReplace(N, "\(?\s*" NN "\s*\)?", X,, 1, NNP)
	}
	return N
}
EDIT: Small change to handle single item checks that have no AND OR NOT in them.

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 22:30

Hi FG,

I've been testing your updated code that removes the superfluous parens. It has worked on many complex expressions, but amazingly fails on this simple one:

Haystack:="hello world"
Needle:="goodbye"

Should be FALSE, but comes up TRUE.

You're right about keeping things simple for the users, but I do want a case sensitive/insensitive option and a whole-word option. I'll get those from the users via GUI radio buttons, so it will be very easy for them to understand and use. I don't think that the ability to search for AND, OR, NOT is important, but users are sometimes hard to predict. :) Wildcards is an interesting idea, but my thinking right now is that a non-whole-word search is a good enough wildcard subset, without any of the complexity of wildcard characters.

Nice addition on treating a search that begins and ends with a % as a RegEx, although I already have a radio button for the type of search (Simple, Boolean, RegEx). I think that's an easier interface for the users, although anyone who can do a RegEx can certainly handle the %...% syntax. :) Regards, Joe

UPDATE: I just saw your change to handle single items that have no AND OR NOT in them. That fixed the problem shown above, although I haven't regression tested it. Thanks for the fix!
User avatar
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Boolean Search as the needle searching a haystack

10 Sep 2014, 22:51

Hi LinearSpoon,

Your code has worked on everything I've tried. Nice approach on the test cases - great idea being able to process all of them in a single run via the loop/parse/`n technique.

It throws an error on un-paired parentheses, as you anticipated with the Throw command, and that's fine - as I mentioned to ahcahc, I'll make sure there are an equal number of lefts and rights before calling the search.

I don't understand the "class symbols" statement. It's not a label and not a function and I couldn't find it described in the AHK doc. What is it?
I'm leaving for classes in a few minutes,
Computer Science, I presume? :)

Thanks, Joe

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: doodles333 and 311 guests