Help with Grep issue

Posted: 22 Oct 2022, 19:35
by maitresin
Hi,

What is wrong with my grep? Shouldn't the msgbox show "ABC Group"?


This is a line in the download.htm file
<div class="geodir-post-meta-container bsui sdel-298db276" ><div class="geodir_post_meta text-left d-block geodir-field-banniere"><span class="geodir_post_meta_icon geodir-i-select" style=""><i class="fas fa-flag fa-fw" aria-hidden="true"></i> <span class="geodir_post_meta_title gv-secondary" >Banner: </span></span>ABC Group</div></div>

Code: Select all

numpad2::
FileRead, FileRead1, DOWNLOAD.HTM
g := grep(FileRead1, "sU)Banner: </span></span>(.*)</div></div>")
for i, v in g
{
	result1 := v.1
	msgbox, %result1%
}
return


grep(haystack, needle)
{
    a:=[], match := "", pos := 1
    while pos:=RegExMatch(haystack, needle, match, pos+StrLen(match))
        a[A_Index]:= {"match": match, 1: match1}
    Return a
}

Re: Help with Grep issue

Posted: 22 Oct 2022, 22:17
by boiler
Are you sure your file is right? Run the following:

Code: Select all

FileRead1 = <div class="geodir-post-meta-container bsui sdel-298db276" ><div class="geodir_post_meta text-left d-block geodir-field-banniere"><span class="geodir_post_meta_icon geodir-i-select" style=""><i class="fas fa-flag fa-fw" aria-hidden="true"></i> <span class="geodir_post_meta_title gv-secondary" >Banner: </span></span>ABC Group</div></div>
g:=grep(fileread1, "sU)Banner: </span></span>(.*)</div></div>")
for i, v in g
{
	result1 := v.1
	msgbox, %result1%
}
return


grep(haystack, needle)
{
    a:=[], match := "", pos := 1
    while pos:=RegExMatch(haystack, needle, match, pos+StrLen(match))
        a[A_Index]:= {"match": match, 1: match1}
	Return a
}

Re: Help with Grep issue

Posted: 23 Oct 2022, 17:35
by maitresin
It works like this, but not when the text is read from the file.

I copied the line exactly as it appears in the file, and it works if I put it manually into the variable FileRead1, but when read from the file it refuses to work.

Is there a linefeed or carriage return in the file that blocks the match?

Re: Help with Grep issue  Topic is solved

Posted: 23 Oct 2022, 17:51
by boiler
Maybe instead of sU), try `aU) as the RegEx pattern’s options.

Are you sure there are no other characters within what you’ve shown? Can you attach a sample file that doesn’t work?
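
One way to check for unexpected characters is to dump the character codes that follow "Banner:" in the text as it was read. This is only a rough sketch: it assumes the file is DOWNLOAD.HTM as in the first post and AutoHotkey v1.1.21 or later (for Ord and Format); the variable names and the 60-character window are arbitrary.

```autohotkey
; Diagnostic sketch: show the character codes that follow "Banner:" in the file.
FileRead, FileRead1, DOWNLOAD.HTM
if ErrorLevel
{
    MsgBox, Could not read DOWNLOAD.HTM
    return
}
pos := InStr(FileRead1, "Banner:")
if (!pos)
{
    MsgBox, The text Banner: was not found in the file
    return
}
snippet := SubStr(FileRead1, pos, 60)   ; 60 characters is an arbitrary window
codes := ""
; With no delimiter given, Loop Parse walks the string one character at a time.
Loop, Parse, snippet
    codes .= A_LoopField . "=" . Format("0x{:02X}", Ord(A_LoopField)) . " "
MsgBox, %codes%
return
```

Codes such as 0x0D or 0x0A would point to a line break inside the span, and anything unexpected above 0x7F would point to a non-ASCII character or to text read with the wrong encoding.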

Re: Help with Grep issue

Posted: 23 Oct 2022, 18:01
by maitresin
OK, I found it. Sorry, my mistake: there was a special character with an accent.

Regex does not recognize accent characters like "é,à,è..."

Re: Help with Grep issue

Posted: 23 Oct 2022, 18:29
by boiler
maitresin wrote:
23 Oct 2022, 18:01
Regex does not recognize accent characters like "é,à,è..."
Yes, it does. Run the following:

Code: Select all

FileRead1 = <div class="geodir-post-meta-container bsui sdel-298db276" ><div class="geodir_post_meta text-left d-block geodir-field-banniere"><span class="geodir_post_meta_icon geodir-i-select" style=""><i class="fas fa-flag fa-fw" aria-hidden="true"></i> <span class="geodir_post_meta_title gv-secondary" >Banner: </span></span>àé Group</div></div>
g:=grep(fileread1, "sU)Banner: </span></span>(.*)</div></div>")
for i, v in g
{
	result1 := v.1
	msgbox, %result1%
}
return


grep(haystack, needle)
{
    a:=[], match := "", pos := 1
    while pos:=RegExMatch(haystack, needle, match, pos+StrLen(match))
        a[A_Index]:= {"match": match, 1: match1}
	Return a
}

Re: Help with Grep issue

Posted: 23 Oct 2022, 18:59
by maitresin
It is FileRead that does not read the characters correctly when there is an accent.

I changed only this section of the line: </span></spanàé>ABC Group</div></div>


Test 1 works because the variable stores your input correctly:

Code: Select all

FileRead1 = <div class="geodir-post-meta-container bsui sdel-298db276" ><div class="geodir_post_meta text-left d-block geodir-field-banniere"><span class="geodir_post_meta_icon geodir-i-select" style=""><i class="fas fa-flag fa-fw" aria-hidden="true"></i> <span class="geodir_post_meta_title gv-secondary" >Banner: </span></spanàé>ABC Group</div></div>
msgbox, %FileRead1%
g:=grep(fileread1, "sU)Banner: </span></spanàé>(.*)</div></div>")
for i, v in g
{
	result1 := v.1
	msgbox, %result1%
}
return


grep(haystack, needle)
{
    a:=[], match := "", pos := 1
    while pos:=RegExMatch(haystack, needle, match, pos+StrLen(match))
        a[A_Index]:= {"match": match, 1: match1}
	Return a
}

Test 2: save the line into test.txt. The msgbox shows that the accented characters are wrong, and the code does not work:

Code: Select all

Fileread, FileRead1, test.txt
msgbox, %FileRead1%
g:=grep(fileread1, "sU)Banner: </span></spanàé>(.*)</div></div>")
for i, v in g
{
	result1 := v.1
	msgbox, %result1%
}
return


grep(haystack, needle)
{
    a:=[], match := "", pos := 1
    while pos:=RegExMatch(haystack, needle, match, pos+StrLen(match))
        a[A_Index]:= {"match": match, 1: match1}
	Return a
}


Re: Help with Grep issue

Posted: 23 Oct 2022, 19:53
by boiler
That’s likely because you haven’t saved the files (both the file containing those characters and your script file) with the right encoding: “UTF-8 with BOM”. It’s not because of how RegEx works. What does it show when you put the msgbox, %FileRead1% line after reading the text from your file? If the characters are wrong there, then it can’t be the RegEx’s problem since that hasn’t even occurred yet.
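
As an alternative to re-saving the data file with a BOM, the script can state the file's encoding before reading it. The following is a minimal sketch, assuming test.txt is saved as UTF-8 (with or without BOM) and a Unicode build of AutoHotkey v1.1; building the accented characters with Chr() is an illustrative choice so that the pattern does not depend on how the script file itself was saved.

```autohotkey
; Sketch: read test.txt as UTF-8 even when it has no BOM.
FileEncoding, UTF-8
FileRead, FileRead1, test.txt
if ErrorLevel
{
    MsgBox, Could not read test.txt
    return
}
MsgBox, %FileRead1%   ; the accented characters should display correctly here

; Build the accented characters from their code points (0xE0 and 0xE9)
; so the needle is the same whatever encoding the script file uses.
needle := "sU)Banner: </span></span" . Chr(0xE0) . Chr(0xE9) . ">(.*)</div></div>"
if (RegExMatch(FileRead1, needle, m))
    MsgBox, %m1%
else
    MsgBox, No match
return
```

If the first msgbox still shows wrong characters, the file is probably not UTF-8 at all (for example, it may be saved in an ANSI code page), and the encoding given to FileEncoding would need to match it.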