AutoHotkey Homepage AutoHotkey Community
Put here requests of problems with regular expressions
Guest






Posted: Mon Feb 11, 2008 3:31 am

Anonymous wrote:
When searching the forums for regexreplace I get every thread PhiLho has ever posted to, since the word regexreplace is part of his sig. Is there some way of excluding personal sigs from forum search, so this doesn't happen? Phil is invaluable as an instructor here, but it's screwing up my search.


Actually, I can use "-Philho" as part of my search string, but that's not exactly what I wanted. :)
Heretic
Guest





Posted: Wed Mar 12, 2008 11:17 pm    Post subject: URL Extraction problem

I'm trying to extract only the red-colored URLs from the source below.

Code:
target="_blank">http://pics.site.com/archive/random/fh283fh/fh283fhpl.jpg</a><br>Link URL: <a href="/j.php?id=839523&amp;u=http%3A%2F%2Fwww.site.com%2Flink.php%3Fref%3fhwio3fih3" target="_blank">http://www.site.com/link.php?ref=fhwio3fih3</a> <br><br>



And below is what I've tried.

Code:
FileRead, stripurl, htmlsource.html

intFoundNum := 0
infFoundPos := 1

loop{
   intFoundPos := RegExMatch(stripurl, "target=""_blank"">([^<]*)</a><br>Link[^<]*target=""_blank"">([^<]*)</a>", urlstripped[%A_index%]  ,intFoundPos )

if(intFoundPos=0){
              break
     }else{
              intFoundNum++
     }
     intFoundPos+=strLen( urlstripped[%A_index%] )
 }

 Loop %intFoundNum%{
    FirstURL :=urlstripped[%A_Index%]1
    SecondURL :=urlstripped[%A_Index%]2
 }

msgbox %FirstURL%
msgbox %SecondURL%

and it returns nothing.
Where did I go wrong?
I'm new to AutoHotkey and also don't know RegEx well.
All I want to do is extract the two URLs in the source.
Can anyone help me, please?
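
A possible starting point for the script above, as a sketch rather than a tested fix. Two things in the posted code look likely to cause trouble: the start position is initialised under a different name (infFoundPos) than the one the loop uses (intFoundPos), and [^<]* cannot get past the opening <a href=...> tag that sits between "Link URL:" and the second target="_blank">. The version below assumes AutoHotkey v1 and the same htmlsource.html file; the names html, pos, url and results are illustrative only.

```autohotkey
; Sketch for AutoHotkey v1; htmlsource.html is the file name used in the post.
FileRead, html, htmlsource.html

pos := 1          ; RegExMatch start position (1 = first character)
results := ""

Loop
{
    ; [^>]* (rather than [^<]*) lets the pattern cross the "<a href=..." text
    ; between "Link URL:" and the second target="_blank">.
    pos := RegExMatch(html, "target=""_blank"">([^<]*)</a><br>Link[^>]*target=""_blank"">([^<]*)</a>", url, pos)
    if (pos = 0)
        break
    ; url holds the whole match; url1 and url2 hold the two captured URLs.
    results .= url1 . "`n" . url2 . "`n"
    pos += StrLen(url)   ; continue searching after this match
}

MsgBox, %results%
```

Collecting every pair into one string also avoids the original final loop, which overwrote FirstURL and SecondURL on each pass and so could only ever show the last pair.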
Razlin



Joined: 05 Nov 2007
Posts: 436
Location: canada

Posted: Mon Apr 07, 2008 3:46 pm

I'm trying to find anything that is NOT 02402


I have a document with some errors.

ie

ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:08888: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas

Notice the 08888, for example (it could be any number).

How does one search for anything EXCEPT one specific number?

Here are some attempts:

- ERROR:{[^0]{[^2]}{[^4]}{[^0]{[^2]}
- ERROR:{^[0]{^[2]}{^[4]}{^[0]{^[2]}
and
- ERROR {{^02402]} << doesn't find error 00004

I'm trying to find anything that is NOT 02402

Is that possible?
Oberon



Joined: 18 Feb 2008
Posts: 456

Posted: Mon Apr 07, 2008 5:52 pm

Use a negative lookahead, i.e. ...(?!02402).+
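
To make the lookahead suggestion concrete, here is a minimal sketch for AutoHotkey v1. The sample text is modelled on the log lines quoted above; the variable names (text, pos, m, found) and the loop are assumptions for illustration, not part of the original answer.

```autohotkey
; Sample text modelled on the quoted log lines.
text =
(
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:08888: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
)

pos := 1
found := ""
Loop
{
    ; (?!02402) rejects any "ERROR:" that is directly followed by 02402;
    ; (\d+) then captures whatever other code is there.
    pos := RegExMatch(text, "ERROR:(?!02402)(\d+)", m, pos)
    if (pos = 0)
        break
    found .= m1 . "`n"
    pos += StrLen(m)     ; continue after this match
}
MsgBox, Codes other than 02402:`n%found%
```

With this sample, only the 08888 entry should be reported, because every other "ERROR:" is immediately followed by 02402.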
Wouther



Joined: 01 May 2007
Posts: 79
Location: The Netherlands

Posted: Mon Apr 07, 2008 7:25 pm

Hey all,

I understand basic RegEx, but I really don't know why this does not work:
Code:
code =
(
...Website here...
         Current documents:
         <ul>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
         </ul>
...Website here...
)

Foo("Current documents:", "</ul>")

Foo(param1, param2)
{
   global
   RegExMatch(code, ".*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub)
   MsgBox, % Sub1
}

It's a piece from a website's source code. I think it has something to do with the following I found on this topic:
PhiLho wrote:
I use the surrounding tags, because HTML has other non-significant numbers:
Code:
page =
(
</tr>
<tr id="C1WebGrid1_R2" bgcolor="#F2F2F2">
<td><font color="Black">
<a id="C1WebGrid1_R2_Hyperlink1" NAME="Hyperlink1" tabIndex="103" href="trgovanje.aspx?hartijaID=1105" target="_parent">EDPL-R-A</a>

</font></td><td><font color="Black">ELEKTRODISTRIBUCIJA AD PALE</font></td><td><font color="Black">13.11.2006</font></td><td><font color="Black">3,5300</font></td><td><font color="Black">3,5200</font></td><td><font color="Black">0</font></td><td><font color="Black">3,5100</font></td><td><font color="Black">3,6000</font></td>
</tr>
)

newPos := 1
Loop
{
   fp := RegExMatch(page, "<font color=""Black"">([\d.,]+)</font>", number, newPos)
   If (fp = 0)
      Break
   newPos := fp + StrLen(number)
   results := results . number1 . "`n"
}
MsgBox %results%

So, the main goal is to retrieve that piece (between "Current documents:" and "</ul>") so I can use it in the rest of the script. Any ideas? :)

P.S. I tried something with the "m" option, but it didn't work either... :(
_________________
Printing css/html-formatted text
Wouther



Joined: 01 May 2007
Posts: 79
Location: The Netherlands

Posted: Tue Apr 08, 2008 2:39 pm

OK, this is a bit confusing...
I think the problem has something to do with newlines, because this code won't work: (the file contains exactly the same as the variable)
Code:
code =
(
...Website here...
         Current documents:
         <ul>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
            <li>[date/time] <a href="content/xxxxxx">title</a></li>
         </ul>
...Website here...
)

FileRead, code, code.txt
MsgBox, % code

Foo("<ul>", "</ul>")

Foo(param1, param2)
{
   global
   RegExMatch(code, ".*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub)
   MsgBox, % Sub1
}
Disabling the "FileRead"-line works (so using the variable with exactly the same content as the file works...). Why?
HugoV



Joined: 27 May 2007
Posts: 617

Posted: Tue Apr 08, 2008 3:12 pm

I just tested it and it works in both cases for me. Check the original file, is it perhaps UTF-8?
Wouther



Joined: 01 May 2007
Posts: 79
Location: The Netherlands

Posted: Tue Apr 08, 2008 6:26 pm

HugoV wrote:
I just tested it and it works in both cases for me.
Strange... it doesn't work here. :( I reinstalled the latest version of AHK, so it's not that.
Quote:
Check the original file, is it perhaps UTF-8?
I'm not sure... :oops: I downloaded a website with URLDownloadToFile and used FileRead, if that matters.
Edit: I forgot to mention: thanks for your help! :)
HugoV



Joined: 27 May 2007
Posts: 617

Posted: Tue Apr 08, 2008 7:14 pm

Try this; it works for me:
Code:
RegExMatch(code, "s).*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub)
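
A likely explanation for why the s) option helps, offered as a sketch rather than a confirmed diagnosis: by default, "." in AutoHotkey's RegEx does not match a `r`n newline sequence. A continuation section normally joins its lines with a bare `n, while a downloaded file read with FileRead often contains `r`n, so the same pattern can succeed on the variable and fail on the file. The strings and variable names below are made up for the demonstration.

```autohotkey
; Same text twice: once with bare linefeeds, once with CR+LF line endings.
lfText   := "Current documents:`n<ul>`n<li>title</li>`n</ul>"
crlfText := "Current documents:`r`n<ul>`r`n<li>title</li>`r`n</ul>"

; No options: "." stops at a `r`n pair, so the CR+LF text should not match.
posLf   := RegExMatch(lfText,   "<ul>(.*)</ul>", a)
posCrLf := RegExMatch(crlfText, "<ul>(.*)</ul>", b)

; s) is the DotAll option: "." then matches newline characters as well.
posDotAll := RegExMatch(crlfText, "s)<ul>(.*)</ul>", c)

MsgBox, LF only: %posLf%`nCRLF without s): %posCrLf%`nCRLF with s): %posDotAll%
```

A position of 0 means no match was found; the captured text between the tags ends up in a1, b1 and c1 respectively.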
Wouther



Joined: 01 May 2007
Posts: 79
Location: The Netherlands

Posted: Wed Apr 09, 2008 5:06 pm

Thank you! It works now. :P
Ozay



Joined: 23 Oct 2007
Posts: 15

Posted: Tue Apr 15, 2008 8:55 pm

I have some data which has been OCR'd into text files, which looks like this:

Code:
30 00.01.04 RECORD OF REVISIONS
40 00.01.02 LIST OF CHAPTERS (TABLE OF CON
TENTS)
50 00.02.00 01 GENERAL INFORMATION
60 99.02.00 INTRODUCTION
1.60 99.03.01 KA 138 NAV TRANSFER RELAY
INSTALLATION DNG (REV 0)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE
5-2)
9.60 99.08.02 RS232 SIGNAL LEVELS (TABLE
53) .
9.70 99.08.01 OVERHAUL


Each line should begin with:
## ##.##.## (continues with text)
OR
#.## ##.##.## (more text)
OR
##.## ##.##.## (another section of text)

Sometimes the text will terminate and continue on the next line, and is either text or a number.
Either way, I need a way to automatically go through all the files and correct this. I have been able to get the extra lines which start with text to be fixed, but those with numbers are giving me problems:

Code:
dataA := RegExReplace(data,"m)\R([a-zA-Z_()*`,])"," $1")

and

dataA := RegExReplace(data,"m)\R([^\d{2}|\d.\d{2}|\d{2}.\d{2}])"," $1")


Both seem to work on the test data I am working with except on lines with numbers.

For example the lines:
40 00.01.02 LIST OF CHAPTERS (TABLE OF CON
TENTS)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE
5-2)

Should be:
40 00.01.02 LIST OF CHAPTERS (TABLE OF CONTENTS)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE 5-2)


Thanks!
_________________
Blessed are those who can laugh at themselves for they shall never cease to be amused.
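
One way to approach the line-joining question, sketched for AutoHotkey v1 under stated assumptions: instead of describing what a continuation line looks like, describe what a real entry starts with (##, #.## or ##.##, then a space and ##.##.##) and join every line break that is not followed by such a start. The sample data is taken from the post; the variable names entry and fixed, and the choice to always join with a single space, are assumptions.

```autohotkey
; Sample lines from the post, joined with linefeeds.
data := "40 00.01.02 LIST OF CHAPTERS (TABLE OF CON`nTENTS)`n9.50 99.08.02 429 SIGNAL LEVELS (TABLE`n5-2)`n9.70 99.08.01 OVERHAUL"

; Start of a real entry: ##, #.## or ##.##, a space, then ##.##.## and a space.
entry := "(?:\d{2}|\d\.\d{2}|\d{2}\.\d{2}) \d{2}\.\d{2}\.\d{2} "

; Replace a line break with a space unless the text ends there (\z)
; or the next line begins a new entry.
fixed := RegExReplace(data, "\R(?!\z|" . entry . ")", " ")

MsgBox, %fixed%
```

Note that this always inserts a space, so a word split across lines comes out as "CON TENTS)" rather than "CONTENTS)". Telling a split word apart from two separate words would need an extra rule that the pattern alone cannot supply.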
Ozay



Joined: 23 Oct 2007
Posts: 15

Posted: Thu Apr 17, 2008 5:39 pm

Did I scare everyone off this thread? I tried to explain my problem as well as I could; do I need to clarify further?

This has been plaguing me for weeks, and I thought I would put it to the forums to see if I could get some ideas on how to accomplish it.

Thanks!
Oberon



Joined: 18 Feb 2008
Posts: 456

Posted: Thu Apr 17, 2008 7:15 pm

Ozay wrote:
I have some data which has been OCR'd into text files. which looks like this:...
If your query is anything more than a short question about regex or pcre in AutoHotkey you should create a new thread.
biatche



Joined: 23 Feb 2008
Posts: 59

Posted: Mon May 12, 2008 7:02 am

Say I have this:

testvar = random_key=random_settings hello=hi

where "random" really means it could be anything.

I wish to use RegExMatch to get random_key into a var. If it can't put it into a var, I suppose I could at least get the position:

FoundPos := RegExMatch("%testvar%", "^.*=")

Apparently this does not seem to work.

(By the way, what command should I use to get random_key into a variable? StringLeft?)
Wouther



Joined: 01 May 2007
Posts: 79
Location: The Netherlands

Posted: Mon May 12, 2008 10:29 am

You can save parts of a string with RegexMatch like this:
Code:
testvar = random_key=random_settings hello=hi
RegexMatch(testvar, "^(.*?)=", Sub)
MsgBox, random_key = %Sub1%
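
For comparison with the attempt in the question, a short sketch (AutoHotkey v1; the output variable names greedy and lazy are illustrative) of the two differences that seem to matter: inside an expression a variable is written without percent signs, so "%testvar%" in quotes is just that literal text, and .* is greedy whereas .*? stops at the first "=".

```autohotkey
testvar = random_key=random_settings hello=hi

; Pass the variable itself, not the quoted text "%testvar%".
posGreedy := RegExMatch(testvar, "^.*=", greedy)    ; .* runs on to the last "="
posLazy   := RegExMatch(testvar, "^(.*?)=", lazy)   ; .*? stops at the first "="

; greedy and lazy hold the whole matches; lazy1 holds the captured key.
MsgBox, Greedy match: %greedy%`nLazy match: %lazy%`nKey: %lazy1%
```

Both calls return the position of the match (1 here, since the pattern is anchored with ^), while the third parameter receives the matched text.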

Page 17 of 19