Some help with regex and ReplaceRegex() method

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
packets25

08 Jun 2018, 10:19

Hi everyone,
I have a problem with this part. I have a script that downloads (scrapes) data from a website.
The big problem is the class name: I only want the date, but the tag that holds it is not the same in every item of the list.

Code:

<div class="content-type-text">
	<h2 class="job-list-title">....</h2>
	<h3 class="job-list-subtitle">...</h3>
	<ul class="tag-group">    ; --> I can't select this class name directly because other tags use it too, so I take everything inside <div class="content-type-text"> instead
		<li class="tag-divider tag-ellipsis-cols2">
			<span class="ellipsis">Barcelona</span>
		</li>
		<li>
			<span class="marked">Hace 27m</span>
			<span class="badge-marked-sticker">Nueva</span>
		</li>
	</ul>
...</div>
This is one element; in other elements, the <span> that holds the date doesn't have the class "marked".
I looked for information and found the regex approach:

Code:

an := IE.document.getElementsByClassName("content-type-text")[A_Index].innerHTML

RegExMatch(an, "></li.*?(?=</ul>)", exp)    ; --> I wanted to select the date here, but I couldn't; it only cuts out a piece of the HTML
; exp = "></li><li><span class=""marked"">Hace 7m</span><span class=""badge-marked-sticker"">Nueva</span></li>"
; in other elements, exp = "></li><li><span>01 de jun</span></li>"
n4 := RegExReplace(exp, ".*?i><s.*?>(.*?)<", "$1")    ; --> then I use this to capture what I need (the date), but it doesn't work
I've tried many expressions to capture that data, and AutoHotkey doesn't seem to allow the "\" character or quotation marks (at least in my case).
It's weird, because when I try to get "Barcelona" it works:

Code:

RegExMatch(an, "col[^>]*>.*?(?=</span></li>)", reg)
n3 := RegExReplace(reg, ".*>(.*)", "$1")
; n3 = "Barcelona"
Thanks a lot for your answers, and I apologize for my English; I know it's not very good.
geek

Re: Some help with regex and ReplaceRegex() method

08 Jun 2018, 10:56

It's not clear exactly what information you want, but it sounds like you want the contents of the <ul class="tag-group"> that is a child of <div class="content-type-text">, right?

You should be able to get exactly the data you need using only IE's querySelector/querySelectorAll methods. To select that element, you could use the following code.

Code:

elements := IE.document.querySelectorAll(".content-type-text .tag-group")
Loop, % elements.length
{
	MsgBox, % elements.Item(A_Index-1).innerText
}
Here are some pages where you can read about the CSS Selectors that you can use in querySelector/querySelectorAll to target the elements that you want.
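As a rough sketch of how a narrower selector might return just the date: in the markup from the first post, the date sits in the first <span> of the last <li> inside .tag-group, whether or not that span has the "marked" class. The selector and the GetJobDates helper name below are assumptions based only on that snippet, and :last-child needs the page to render in IE9 or later document mode, so treat this as a starting point rather than a tested solution.

```autohotkey
; Hypothetical helper (not from the thread): collects the date text
; of every job entry found in the given document object.
; Assumes each entry matches the markup shown in the first post.
GetJobDates(doc)
{
	dates := []
	; First <span> of the last <li> in each .tag-group under .content-type-text
	spans := doc.querySelectorAll(".content-type-text .tag-group li:last-child > span:first-child")
	Loop, % spans.length
		dates.Push(spans.Item(A_Index-1).innerText)
	return dates
}

; Usage, assuming IE is the same InternetExplorer COM object as above
for index, date in GetJobDates(IE.document)
	MsgBox, % date
```

If an entry has a different layout (for example, the date is not in the last <li>), the selector would need to be adjusted for that page.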
swagfag

Re: Some help with regex and ReplaceRegex() method

08 Jun 2018, 11:21

i'm also not sure what you're asking for exactly, so i'll assume you've somehow extracted the following string and want to get the date:

Code:

text := "></li><li><span>01 de jun</span></li>"
match := RegExReplace(text, ".*(?<=<span>)([^<]*).*", "$1")
MsgBox % match
i wouldn't necessarily recommend using RegExReplace that way to retrieve matches, as it requires the pattern to consume the whole haystack
use RegExMatch instead, looped if you need multiple matches
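a minimal sketch of that looped RegExMatch approach, run on a made-up string that combines the two shapes from the first post. RegExReplace only replaces the part that matched, so anything the pattern does not consume stays in the result, which is why RegExMatch with a capture group is the simpler tool here. The pattern and variable names are assumptions for illustration and have not been tested against the real page.

```autohotkey
; Sample haystack built from the snippets in the first post (assumed shape).
; Inside an expression string, a literal quote is written as "".
html := "<li class=""tag-divider""><span class=""ellipsis"">Barcelona</span></li>"
html .= "<li><span class=""marked"">Hace 27m</span><span class=""badge-marked-sticker"">Nueva</span></li>"
html .= "<li><span>01 de jun</span></li>"

; A plain <li> followed by a <span> with any attributes; group 1 is the span text.
; The O) option makes RegExMatch store a match object in m.
needle := "O)<li>\s*<span[^>]*>([^<]*)</span>"

dates := ""
pos := 1
while (pos := RegExMatch(html, needle, m, pos))
{
	dates .= m.Value(1) "`n"
	pos += m.Len(0)    ; continue searching after the current match
}
MsgBox, % dates
```

The first <li> is skipped because it carries a class attribute, so only the two date entries are collected.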
packets25

Re: Some help with regex and ReplaceRegex() method

12 Jun 2018, 07:11

Thanks, GeekDude, I used that method. You're amazing. I still have to fix something with the classes, but I've almost got it.
