Is it possible to grab text from a web page?


10 replies to this topic
tacopalypse
  • Members
  • 5 posts
  • Last active: Jul 08 2009 06:49 PM
  • Joined: 21 Jun 2009
Let's say I have a web page or other program running as the active window, and it has some text on it that I want to make usable by an AutoHotkey script. The problem is that the text isn't detected by WinGetText, even with DetectHiddenText on, and Window Spy shows nothing. This seems to be the case with most websites: the bulk of the text isn't detected by WinGetText.

Now, one way of doing it would be to quickly send Ctrl-A and Ctrl-C to load the whole page into the clipboard, but I'm looking for a less cumbersome method, something similar to WinGetText that runs in the background. So, does anyone know of a way to do this?

tidbit
  • Administrators
  • 2623 posts
  • Hates playing Janitor
  • Last active: Today, 04:42 PM
  • Joined: 09 Mar 2008
URLDownloadToFile.
Then FileRead it.
Get the text you need (best way would probably be regex.)
FileDelete the file.
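
The four steps above could be sketched roughly as follows. This is only a minimal illustration in AutoHotkey v1 syntax: the URL, the temp file name, and the regex pattern (which pulls out the page's title element) are placeholders chosen for the example, not something taken from this thread.

```autohotkey
; Placeholder URL and temp file name (assumptions for illustration only).
url := "http://www.example.com"
tempFile := A_Temp "\page.html"

; 1. Download the page to a temporary file.
UrlDownloadToFile, %url%, %tempFile%
if ErrorLevel
{
    MsgBox, Download failed.
    ExitApp
}

; 2. Read the whole file into a variable, then 4. delete the file.
FileRead, html, %tempFile%
FileDelete, %tempFile%

; 3. Extract the text you need with a regex.
;    Example pattern: the contents of the <title> element.
;    Options: i = case-insensitive, s = dot also matches newlines.
if RegExMatch(html, "is)<title>(.*?)</title>", match)
    MsgBox, % "Title: " match1
else
    MsgBox, No match found.
ExitApp
```

Note that this only sees what the server returns for a plain request, so pages that require a login or build their content with scripts may come back different from what the browser shows.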

TLM
  • Moderators
  • 3821 posts
  • Last active: Nov 11 2014 07:38 PM
  • Joined: 21 Aug 2006
Well, if it's text from a website, you could just download the page. Of course, it works better if the page has static formatting, because then you just have to use FileReadLine on a known line number. I wrote this example for someone else here.

You can see the whole site here: http://www.ipchicken.com

; Get IP address from http://www.ipchicken.com
; Confirmed to work on XP
; Unconfirmed on Vista or Windows 7

; This script also checks for IE's Online/Offline status.
; It will leave it untouched if you work in online mode.

; Read IE's offline flag: 0 is on (online), 1 is off (offline).
RegRead, lineStatus, HKCU, Software\Microsoft\Windows\CurrentVersion\Internet Settings, GlobalUserOffline
if lineStatus = 1
{
    errorlineStatus = Offline
    Goto, stOnlineMode
}
else
{
    errorlineStatus = Online
    lineStatus = online
    Goto, procDwnld
}

; Switch Mode.

stOnlineMode:
RegWrite, REG_DWORD, HKCU, Software\Microsoft\Windows\CurrentVersion\Internet Settings, GlobalUserOffline, 0

; Process Download.

procDwnld:
UrlDownloadToFile, http://www.ipchicken.com, %A_Temp%\ipaddFile.txt
if ErrorLevel = 1
{
    MsgBox, Page Download Failed! %errorlineStatus%
    ; Put the offline flag back to its original value before quitting.
    if errorlineStatus = Offline
    {
        RegWrite, REG_DWORD, HKCU, Software\Microsoft\Windows\CurrentVersion\Internet Settings, GlobalUserOffline, 1
        ExitApp
    }
    else if errorlineStatus = Online
    {
        RegWrite, REG_DWORD, HKCU, Software\Microsoft\Windows\CurrentVersion\Internet Settings, GlobalUserOffline, 0
        ExitApp
    }
}
if lineStatus = online
{
    Goto, ReadFile
}
else
{
    ; IE was offline before the download, so restore offline mode.
    RegWrite, REG_DWORD, HKCU, Software\Microsoft\Windows\CurrentVersion\Internet Settings, GlobalUserOffline, 1
}

; Reads the IP into var.

ReadFile:
; The IP address sits on line 35 of the downloaded page.
FileReadLine, ipaddLine, %A_Temp%\ipaddFile.txt, 35
StringReplace, ipaddLine, ipaddLine, %A_Space%, , All
StringReplace, ipaddLine, ipaddLine, <br>, , All
Clipboard := ipaddLine
FileDelete, %A_Temp%\ipaddFile.txt
MsgBox, % "IP address is:`n`n" ipaddLine "`n`nResults copied to clipboard"
ExitApp

If the page is not static and you plan on reading from it more than once, you can still download the page. You will just have to use FileReadLine and then do the necessary character removal using StringReplace and/or RegExReplace(), etc.
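
As a rough illustration of that character removal, here is a small sketch using RegExReplace(). The sample line is made up for the example (it is not real output from any site), and the patterns are only one reasonable way to strip tags and surrounding whitespace.

```autohotkey
; Made-up sample line standing in for one read with FileReadLine (assumption).
line := "   <b>192.0.2.1</b><br>   "

; Remove anything that looks like an HTML tag.
line := RegExReplace(line, "<[^>]+>")

; Trim leading and trailing whitespace.
line := RegExReplace(line, "^\s+|\s+$")

MsgBox, %line%
```

A regex like this copes with small layout changes better than removing fixed strings one at a time, though it still depends on the wanted text staying on the same line.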

If you're trying to get text from a window other than a web page, perhaps it can be converted to a text file. Then apply the same process. Just an idea...

hth

tacopalypse
  • Members
  • 5 posts
  • Last active: Jul 08 2009 06:49 PM
  • Joined: 21 Jun 2009
Hmm... URLDownloadToFile seems like it would work, provided it behaves the same as saving the page from the browser using File->Save. However, I haven't been able to test it yet due to another issue: I have to log in to the site, so URLDownloadToFile just returns an error page. Got any ideas on how to log in using AutoHotkey?

TLM
  • Moderators
  • 3821 posts
  • Last active: Nov 11 2014 07:38 PM
  • Joined: 21 Aug 2006
May I ask the URL of the site? Or, if you don't want to say, do you know another site that has the same kind of login?

I tried using an FTP-style login for another site and it doesn't seem to work.

Perhaps there will be more clues here: http://www.autohotke... ... 138#210138

I also need this, so I will keep at it. Let me know if you figure anything out too.

TLM
  • Moderators
  • 3821 posts
  • Last active: Nov 11 2014 07:38 PM
  • Joined: 21 Aug 2006
HOLY CRAP!

I just took a wild chance and edited the values of both the username and password fields in the HTML downloaded from Yahoo's mail web login page, and OMG IT LOGGED ME IN (after I manually pressed Enter, of course) :shock: :shock: :shock: !

Here are the HTML lines:

<input name="login" id="username" value="MY EMAIL ADDRESS @yahoo.ca"

...

<input name="passwd" id="passwd" value="MY PASSWORD"
There's no way it's this easy..
I'm sure this must be some kind of anomalous behavior! Or maybe some kind of symptom of caching.. Gotta be...

Must do some more investigation into this. If it does work, all that would have to be done next is the Enter button press.. WOW..

More to come...

tacopalypse
  • Members
  • 5 posts
  • Last active: Jul 08 2009 06:49 PM
  • Joined: 21 Jun 2009
Heh, I just realized some of the sites I'm trying to do this on have a captcha, so there's pretty much no chance of using an auto-login script. Basically, what I'm trying to do is automatically record a few numbers from the page every time I visit it (after manually logging in).

Two ways I've thought of so far are: quickly sending Ctrl-A + Ctrl-C and then parsing the clipboard, or sending Ctrl-S + Enter and parsing the saved file. Both can be done in under half a second, but a method that runs 100% in the background would still be nicer.
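
The clipboard variant might look something like the sketch below. The F8 hotkey and the number pattern are arbitrary choices for illustration, not anything specified in this thread; it backs up the clipboard first so the user's existing contents are restored afterwards.

```autohotkey
; F8 is an arbitrary hotkey chosen for this example (assumption).
F8::
    savedClip := ClipboardAll       ; back up whatever is on the clipboard
    Clipboard := ""                 ; empty it so ClipWait can detect the copy
    Send, ^a^c                      ; select all, then copy
    ClipWait, 1                     ; wait up to 1 second for text to arrive
    if ErrorLevel
    {
        Clipboard := savedClip
        MsgBox, Copy failed.
        return
    }
    pageText := Clipboard
    Clipboard := savedClip          ; restore the original clipboard
    savedClip := ""                 ; free the backup

    ; Collect every number (integer or decimal) found in the copied text.
    numbers := ""
    pos := 1
    while pos := RegExMatch(pageText, "\d+(?:\.\d+)?", found, pos)
    {
        numbers .= found "`n"
        pos += StrLen(found)
    }
    MsgBox, % numbers
return
```

This still needs the browser window to be active and briefly changes the selection, so it is not a true background method.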

TLM
  • Moderators
  • 3821 posts
  • Last active: Nov 11 2014 07:38 PM
  • Joined: 21 Aug 2006

..the sites I'm trying to do this on have a captcha, so there's pretty much no chance of using an auto-login script...

Eww, yeah, captcha is extra ugly.. Unless there's some way of grabbing the image.. hmmm, how do I do this without creating a major site exploit ;)??


preedit..

More good news. In Firefox, when you load this edited page in a NEW browser window (not a new tab), sign in, and then go to the page in another FF window, it caches your info and keeps you logged in. Which is great.

The second good piece is that this also works in IE.

I'm still working on a way to auto-submit, though. Maybe someone can chime in if they know how to do this?


bbs

sinkfaze
  • Moderators
  • 6365 posts
  • Last active:
  • Joined: 18 Mar 2008
Your best bet is to go to a COM method using tank's iWeb COM functions to extract the information as soon as you're logged in (Internet Explorer only). You can use tank's DOM viewer to get information about page elements you would like to use with the COM functions.
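
For a rough idea of what a COM approach looks like, here is a minimal sketch that reads the visible text of an already open Internet Explorer window. Note the assumptions: it uses the COM support built into AutoHotkey v1.1 and later rather than tank's iWeb functions, it simply takes the first IE window it finds, and the page must have finished loading.

```autohotkey
; Assumes AutoHotkey v1.1+ (built-in COM) and an open Internet Explorer window.
pageText := ""

; Shell.Application lists both Explorer folder windows and IE windows.
for window in ComObjCreate("Shell.Application").Windows
{
    ; Keep only windows hosted by iexplore.exe.
    if InStr(window.FullName, "iexplore.exe")
    {
        ; innerText is the rendered text of the page body, without HTML tags.
        pageText := window.document.body.innerText
        break
    }
}

if (pageText = "")
    MsgBox, No Internet Explorer page found.
else
    MsgBox, % SubStr(pageText, 1, 500)   ; show the first 500 characters
```

Because this talks to the browser directly, it works while the window is in the background and after a manual login, with no clipboard or saved file involved.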

fro01
  • Members
  • 45 posts
  • Last active: Mar 08 2010 01:54 AM
  • Joined: 29 Apr 2009
Send,{Enter}
Hmm, I'm not too sure about this, but the best thing I can think of is to have your script check whether the site you are currently on has any info saved and, if so, enter the info and then send Enter to log in automatically.

If this helps, most websites have the login fields selected by default.

TLM
  • Moderators
  • 3821 posts
  • Last active: Nov 11 2014 07:38 PM
  • Joined: 21 Aug 2006

You can use tank's DOM viewer to get information about page elements you would like to use with the COM functions.


See, now that's what I call useful info. Thanks, sinkfaze!

edit: The dom stuff is simply perfect :D!