Guest
|
Posted: Fri Jul 18, 2008 8:24 pm Post subject: Extracting SubString |
|
|
I'm trying to extract a substring of variable length.
I have a .txt file that has a bunch of lines like this:
cfm?site_id=38500"><b>
cfm?site_id=123"><b>
cfm?site_id=6789345"><b>
cfm?site_id=6783"><b>
cfm?site_id=24650"><b>
cfm?site_id=542326"><b>
First I need to loop through each line, then extract just the number between id= and ">, and then write it to a second .txt file.
Anyone know what this code would look like?
Thanks! |
|
[VxE]
Joined: 07 Oct 2006 Posts: 1071
|
Posted: Fri Jul 18, 2008 8:39 pm Post subject: |
|
|
| Code: | Infile = %A_MyDocuments%\something.txt
outfile = %A_MyDocuments%\output.txt
Loop, Read, %Infile%, %Outfile%
{
FileAppend, % regexreplace( A_LoopReadLine, ?, ? )
} | You provide the ? parts (the needle and the replacement). |
|
SKAN
Joined: 26 Dec 2005 Posts: 5790
|
Posted: Fri Jul 18, 2008 8:52 pm Post subject: |
|
|
This can also be achieved with an InStr() and SubStr() combo, but the code would not be as compact as the RegEx version. |
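A rough sketch of what the InStr() and SubStr() combo could look like for a single line, assuming AutoHotkey v1 expression syntax. The variable names (line, startTag, endTag) and the sample line are made up for illustration and are not from the posts above:

```autohotkey
; Hypothetical sample line, in the format from the first post.
line := "cfm?site_id=38500""><b>"
startTag := "site_id="
endTag := """>"                      ; the two characters ">

startPos := InStr(line, startTag)    ; 1-based position, 0 if not found
if (startPos)
{
    startPos += StrLen(startTag)                      ; first character of the number
    endPos := InStr(line, endTag, false, startPos)    ; search from the number onward
    if (endPos)
        MsgBox, % SubStr(line, startPos, endPos - startPos)
}
```

Inside a file-reading loop, line would be A_LoopReadLine and the MsgBox would become a FileAppend.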
|
Guest
|
Posted: Fri Jul 18, 2008 9:50 pm Post subject: |
|
|
Thanks, I'm giving this a try right now.
What if my haystack or needle string has a quote in it? Like this:
| Code: |
FileAppend, % regexreplace( A_LoopReadLine, "href="script\.cfm\?site_id=", "" )
|
My script is getting jammed on the double quote (") right before the word script in my needle. How can I make this a literal double quote? I tried putting a backslash before it, but that didn't help. Any ideas? |
|
SKAN
Joined: 26 Dec 2005 Posts: 5790
|
Posted: Fri Jul 18, 2008 9:55 pm Post subject: |
|
|
Try 4 double quotes in lieu of 1 , like:
| Code: | | regexreplace( A_LoopReadLine, "href=""""script\.cfm\?site_id=", "" ) |
|
|
Krogdor
Joined: 18 Apr 2008 Posts: 731 Location: The Interwebs
|
Posted: Fri Jul 18, 2008 9:57 pm Post subject: |
|
|
| Code: | | regexreplace( A_LoopReadLine, "href=""script\.cfm\?site_id=", "" ) |
Inside a quoted string, two consecutive double quotes produce one literal double quote. So """Hi!""" would be "Hi!" |
|
Guest
|
Posted: Fri Jul 18, 2008 10:02 pm Post subject: |
|
|
Awesome, thanks! |
|
Guest
|
Posted: Fri Jul 18, 2008 10:04 pm Post subject: |
|
|
Wait, what if I just have one double quote,
like
test"test
I tried
test"""""test
but that didn't work. |
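For a single literal double quote, one doubled pair inside the quoted string should be enough. A small sketch, assuming AutoHotkey v1 expression syntax; the variable names and the sample line are illustrative only:

```autohotkey
; In an expression, a quoted string writes one literal " as "" (two quotes).
s := "test""test"              ; s contains: test"test
MsgBox, % s

; A traditional (non-expression) assignment needs no doubling at all.
t = test"test                  ; t contains: test"test
MsgBox, % t

; The same rule applied to a regex needle like the one discussed earlier.
line := "<a href=""script.cfm?site_id=38500""><b>"   ; hypothetical sample line
MsgBox, % RegExReplace(line, "href=""script\.cfm\?site_id=", "")
```

The doubling only applies inside quoted strings in expressions, which is why a backslash does not help here: the backslash is regex escaping, while the quote problem is at the level of the script's string syntax.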
|
Guest
|
Posted: Fri Jul 18, 2008 10:10 pm Post subject: |
|
|
Okay, I got the double quotes under control now:
| Code: | | regexreplace( A_LoopReadLine, "href=""""script\.cfm\?site_id=", "" ) |
How would I insert a line break after each line written to the new output file? |
|
garry
Joined: 19 Apr 2005 Posts: 1017 Location: switzerland
|
Posted: Fri Jul 18, 2008 10:12 pm Post subject: |
|
|
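Aargh, I never learned or understood regex.
Here is a much longer example script without it:
| Code: | F1=read.txt
F2=new.txt
FileDelete,%F2%
; sample input line: cfm?site_id=38500"><b>
VANAF1=cfm?site_id=              ; text just before the number
UNTIL1="><b>                     ; text just after the number
StringLen,L1,VANAF1
L1:=L1+1                         ; offset from the 0-based match position to the 1-based start of the number
Loop,Read,%F1%
{
    LR=%A_LoopReadLine%
    StringGetPos,VAR1,LR,%VANAF1%    ; 0-based position, or -1 if not found
    StringGetPos,VAR2,LR,%UNTIL1%
    VAR1:=(VAR1+L1)                  ; 1-based start of the number
    VAR2:=(VAR2+1)                   ; 1-based position of the closing quote
    VAR3:=(VAR2-VAR1)                ; length of the number
    StringMid,ADRESS,LR,VAR1,VAR3
    if ADRESS=
        continue                     ; skip lines without a match
    FileAppend,%ADRESS%`r`n,%F2%
}
Run,%F2%
return
|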
|
|
infogulch
Joined: 27 Mar 2008 Posts: 121 Location: KC, MO
|
Posted: Sat Jul 19, 2008 3:46 am Post subject: |
|
|
And a regex version, just because:
| Code: | FRead=read.txt
FWrite=new.txt
FileDelete, %FWrite% ; since it may already exist
Loop, Read, %FRead%
    FileAppend, % RegExReplace(A_LoopReadLine, ".*?site_id=(.*?)"">.*", "$1`n"), %FWrite%
Run %FWrite% ; Edit: not %F2%, since that var doesn't exist here. :P
; About the RegEx:
; The needle: .*?site_id=(.*?)"">.*
;   .*?site_id=   match anything up through site_id=
;   (.*?)         capture anything after that, up until:
;   "">.*         a literal double quote followed by > and anything after that
; The replacement: $1`n
;   the captured subpattern from ( ) above, followed by a LF
|
Last edited by infogulch on Sat Jul 19, 2008 1:38 pm; edited 1 time in total |
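One thing to watch with the RegExReplace approach: when the needle does not match, RegExReplace returns the line unchanged, so a line without site_id= would be copied to the output as is. A possible variant, assuming AutoHotkey v1 and that the IDs are purely numeric (an assumption based on the sample lines), uses RegExMatch and only writes lines that match. Match is just an illustrative variable name:

```autohotkey
FRead = read.txt     ; file names taken from the post above
FWrite = new.txt
FileDelete, %FWrite% ; start with a fresh output file
Loop, Read, %FRead%, %FWrite%
{
    ; (\d+) captures only digits; in the default output mode the
    ; first subpattern is stored in Match1.
    if (RegExMatch(A_LoopReadLine, "site_id=(\d+)"">", Match))
        FileAppend, %Match1%`n   ; no file name: writes to the loop's output file
}
```

Naming the output file on the Loop line lets FileAppend omit it, which keeps the file open for the whole loop instead of reopening it for every line.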
|
garry
Joined: 19 Apr 2005 Posts: 1017 Location: switzerland
|
Posted: Sat Jul 19, 2008 8:00 am Post subject: |
|
|
Thank you, infogulch, for the regex explanation.
(Just replace run %F2% with run %FWrite%.) |
|
infogulch
Joined: 27 Mar 2008 Posts: 121 Location: KC, MO
|
Posted: Sat Jul 19, 2008 1:35 pm Post subject: |
|
|
Oh, right. Can you tell I started with your version?
Thanks! |
|