NOAA weather data scrape and save


13 replies to this topic
me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
I'm trying to build a script that downloads NOAA data to a file. It works at the moment, but only if a few things are set up first.

You must use Firefox, and its default save option must be set to save as a text file. I tried IE but ran into problems with the way copy and paste worked inside Notepad.
Also, Notepad must save to the Desktop by default. Is there a better way to specify a save location and format regardless of the Save As options?

I'm new at this, so any advice or tips would be appreciated. I'm not sure whether there is a much easier way to do this that I'm not seeing...


; Extract weather data from the NOAA site from a major airport.
; Visit www.srh.noaa.gov/data/obhistory/?C=N;O=A to find all available airports.

; ----------------------------------------------Info Gather
Gui, Font, s10, Verdana 
Gui, Add, DropDownList, vLoc_Value, ABQ|ALB|ATL|AUS|BDL|BHM|BNA|BOI|BOS|BUF|BUR|BWI|CHS|CLE|CLT|CMH|COS|CVG|DAL|DAY|DCA|DEN|DFW|DSM|DTW|ELP|EWR|FLL|GEG|GRR|GSO|GUM|HNL|HOU|HPN|IAD|IAH|IND|ISP|ITO|JAX|JFK|KOA|LAS|LAX|LGA|LGB|LIH|LIT|MCI|MCO||MDW|MEM|MHT|MIA|MKE|MSP|MSY|MYR|OAK|OKC|OMA|ONT|ORD|ORF|ORL|PBI|PDX|PHL|PHX|PIT|PNS|PSP|PVD|PWM|RDU|RIC|RNO|ROC|RSW|SAN|SAT|SAV|SDF|SEA|SFB|SFO|SJC|SJU|SLC|SMF|SNA|STL|SYR|TPA|TUL|TUS|TYS
Gui, Add, Button, default ym, Scrape
Gui, Show, w325 h80, NOAA Weather Data Scrape
Gui, Color, White
;Gui, Add, Picture, x0 y0 w325 h100, %A_WinDir%\system32\ntimage.gif ;use as window background if using win xp
Return

ButtonScrape:
GuiClose:
GuiEscape:
Gui, Submit  ; Save each control's contents to its associated variable.

; ----------------------------------------------Add Variables
SetWorkingDir %A_ScriptDir% ;sets current directory as working directory
Site = http://www.srh.noaa.gov/data/obhistory/K%Loc_Value%.html
Web_data = %Loc_Value%_data.txt

; ----------------------------------------------Launch Firefox
Run Firefox.exe %Site%
WinWait, National Weather Service
  Sleep, 500
  Send, {CTRLDOWN}a{CTRLUP}{CTRLDOWN}c{CTRLUP}
  Sleep, 200
  Send, {LWINDOWN}r{LWINUP}

WinWait, Run, 
IfWinNotActive, Run, , WinActivate, Run, 
WinWaitActive, Run, 
  Sleep, 500
  Send, {SHIFTDOWN}n{SHIFTUP}otepad.exe{ENTER}
  
WinWait, Untitled - Notepad, 
IfWinNotActive, Untitled - Notepad, , WinActivate, Untitled - Notepad, 
WinWaitActive, Untitled - Notepad, 
  Send, {CTRLDOWN}v{CTRLUP}
  Sleep, 500
  Send, {CTRLDOWN}s{CTRLUP}

WinWait, Save As, 
IfWinNotActive, Save As, , WinActivate, Save As, 
WinWaitActive, Save As, 
  Send, %Loc_Value%_data.txt{ENTER}
  Sleep, 1000
  WinActivate
  Send, {ALTDOWN}{F4}{ALTUP}
  Sleep, 500

WinWait, National Weather Service
  Sleep, 500
  Send, {ALTDOWN}{F4}{ALTUP}

Loop, read, %Web_data%, %Loc_Value%_Scrape.txt
{
    IfInString, A_LoopReadLine, .gov, FileAppend, %A_LoopReadLine%`n
    IfInString, A_LoopReadLine, :, FileAppend, %A_LoopReadLine%`n
}

Sleep, 1000
FileDelete, %Web_data%
ExitApp
Return


nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
Look at http://c.ahk.me/URLDownloadToFile
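For readers following along, here is a minimal sketch of what that suggestion could look like in AutoHotkey v1. The airport code and the Desktop output path are illustrative assumptions, not something taken from the thread; the point is only that UrlDownloadToFile lets the script choose the save location itself, without driving Firefox or Notepad.

```autohotkey
; Minimal sketch (AutoHotkey v1). Loc_Value and OutFile are illustrative
; assumptions; substitute the value chosen in the GUI and any folder you like.
Loc_Value := "MCO"
Site := "http://www.srh.noaa.gov/data/obhistory/K" . Loc_Value . ".html"
OutFile := A_Desktop . "\" . Loc_Value . "_Web.htm"   ; explicit save location

UrlDownloadToFile, %Site%, %OutFile%
if ErrorLevel   ; ErrorLevel is 1 when the download failed
{
    MsgBox, Could not download %Site%
    ExitApp
}

FileRead, Source_HTML, %OutFile%   ; the raw page, HTML tags included
MsgBox, % "Downloaded " . StrLen(Source_HTML) . " characters to " . OutFile
```

The file still contains the page's HTML markup, so a tag-stripping or parsing step is needed afterwards.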

me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
Thanks for taking a look. I initially attempted to use this method, but I get the entire webpage including the html. I was not able to find an easy way to remove the html tags to extract time, date, and temperatures.

Any insight on how to remove the html tags, or grab the content inside a specific set of html tags?





VxE
  • Moderators
  • 3622 posts
  • Last active: Dec 24 2015 02:21 AM
  • Joined: 07 Oct 2006

Any insight on how to remove the html tags, or grab the content inside a specific set of html tags?

Take a look at Table_FromHTML() (part of my library for handling TSV tables). E.g:
data := Table_FromHTML( Source_HTML, InStr( Source_HTML, "</form>" ) )
The second parameter tells the function where it should start looking for a "<Table>" tag.

nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
There's also UnHTM() and several like it.

me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012


Thanks for sharing the library of functions. It will take me some time to see whether I can easily adapt it for my project, but I'll certainly try. My initial attempt gave a blank MsgBox, but I'm sure that with more attempts I can get something pulled from the HTML file.

me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012

There's also UnHTM() and several like it.

This function works well, thanks for pointing me in the right direction. I will most likely have to create a loop so that it grabs the data one line at a time, but otherwise it seems to work out of the box. Nice!
The only trouble is that the text comes out concatenated, with no spaces between the blocks of data from the tags that were removed. I'm guessing a space can be introduced somewhere, maybe in RegExReplace, so that the data doesn't look like this...
0416:53NW1410.00Partly CloudySCT3006723
It should look like this:
04 16:53 NW 14 10.00 Partly Cloudy SCT300 67 23

Actually, it would probably be better with a delimiter such as a comma.
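One way to get such a delimiter, sketched here under assumptions rather than taken from UnHTM() itself, is to replace each boundary between two table cells with a comma before the remaining tags are stripped. The sample row below is a hypothetical reconstruction of the HTML behind the concatenated line above.

```autohotkey
; Sketch (AutoHotkey v1): turn cell boundaries into commas, then strip tags.
; Row is an assumed reconstruction of the markup, not copied from the NOAA page.
Row = <td>04</td><td align="right">16:53</td><td>NW 14</td><td>10.00</td><td align="left">Partly Cloudy</td><td>SCT300</td><td>67</td><td>23</td>

; A closing </td> followed by the next opening <td ...> marks a cell boundary.
Row := RegExReplace(Row, "i)</td>\s*<td[^>]*>", ",")
; Remove whatever tags are left (the first <td> and the last </td>).
Row := RegExReplace(Row, "<[^>]+>")

MsgBox, %Row%   ; shows 04,16:53,NW 14,10.00,Partly Cloudy,SCT300,67,23
```

Doing the replacement before the tags are removed matters, because once the tags are gone there is nothing left to mark where one field ends and the next begins.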

HCProfessionals
  • Members
  • 179 posts
  • Last active: Jul 31 2013 12:49 AM
  • Joined: 18 Jun 2007
I already built a program for NOAA using autohotkey :)

http://noaadw.com

me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
That looks nice, but not exactly what I was looking to do. What I wanted to create is more like a data scraper that will write to file the date, time, and temperatures for a given airport. I don't see a way to do that with the program you provided. Maybe I missed that part?

Anyway, thanks for sharing.

TheDewd
  • Members
  • 842 posts
  • Last active: Jun 10 2016 06:55 PM
  • Joined: 28 Mar 2010
FileDelete, KBWG.xml
UrlDownloadToFile, http://www.weather.gov/xml/current_obs/KBWG.xml, KBWG.xml
FileRead, XML, KBWG.xml
Temp := StrX( XML, "<temp_f>", 1, 8, "</temp_f>", 1, 9 )
Condition := StrX( XML, "<weather>", 1, 9, "</weather>", 1, 10 )
; ... Add more here.
Return

F1::
MsgBox, %Temp% - %Condition%
return

StrX(H,BS="",BO=0,BT=1,ES="",EO=0,ET=1,ByRef N=""){  ; http://www.autohotkey.com/forum/topic51354.html
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
<0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
+(0-ET)):(X+P)):(X)))-P)
}


me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
I like the idea of converting html data to xml, but that seems to be the main hurdle for me. The data is not nicely ordered, and getting the temp values involves skipping over many similar table tags.

My new code is not finished, but I should probably post it here so that my new direction can be seen. I think converting it to XML would mean solving the same problem I already have, i.e. skipping over many identical <td> tags, which is the main problem I'm trying to solve right now.

The scrape file it creates will show the table html. Here is a sample...
<tr align="center" valign="top" bgcolor="#eeeeee"><td>15</td><td align="right">13:53</td><td>E 9 G 17</td><td>10.00</td><td align="left">Mostly Cloudy</td><td>SCT047 BKN060</td><td>80</td><td>59</td>








; Extract weather data from the NOAA site for MCO Airport.
; Visit www.srh.noaa.gov/data/obhistory/?C=N;O=A to find all available airports.

; ----------------------------------------------Info Gather
Gui, Font, s10, Verdana 
Gui, Add, DropDownList, vLoc_Value, ABQ|ALB|ATL|AUS|BDL|BHM|BNA|BOI|BOS|BUF|BUR|BWI|CHS|CLE|CLT|CMH|COS|CVG|DAL|DAY|DCA|DEN|DFW|DSM|DTW|ELP|EWR|FLL|GEG|GRR|GSO|GUM|HNL|HOU|HPN|IAD|IAH|IND|ISP|ITO|JAX|JFK|KOA|LAS|LAX|LGA|LGB|LIH|LIT|MCI|MCO||MDW|MEM|MHT|MIA|MKE|MSP|MSY|MYR|OAK|OKC|OMA|ONT|ORD|ORF|ORL|PBI|PDX|PHL|PHX|PIT|PNS|PSP|PVD|PWM|RDU|RIC|RNO|ROC|RSW|SAN|SAT|SAV|SDF|SEA|SFB|SFO|SJC|SJU|SLC|SMF|SNA|STL|SYR|TPA|TUL|TUS|TYS
Gui, Add, Button, default ym, Scrape
Gui, Show, w325 h80, NOAA Weather Data Scrape
Gui, Color, White
;Gui, Add, Picture, x0 y0 w325 h100 
Return

ButtonScrape:
GuiClose:
GuiEscape:
Gui, Submit  ; Save each control's contents to its associated variable.

; ----------------------------------------------Add Variables
SetWorkingDir %A_ScriptDir% ;sets current directory as working directory
Site = http://www.srh.noaa.gov/data/obhistory/K%Loc_Value%.html
Web_html = %Loc_Value%_Web.htm

; ----------------------------------------------Download Data
UrlDownloadToFile, %Site%, %Web_html%

Loop, read, %Web_html%, %Loc_Value%_Scrape.txt
{
    IfInString, A_LoopReadLine, <tr align="center" valign="top" bgcolor=
    FileAppend, %A_LoopReadLine%`n`n   
}

;FileDelete, %Web_html%

ExitApp ;Troubleshooting Purposes
Pause ;###############################



ExitApp
Return


; ----------------------------------------------Functions

StrX(H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009 
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO 
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z 
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html 
} 

UnHTM( HTM ) { ; Remove HTML formatting / Convert to ordinary text     by SKAN 19-Nov-2009 
 Static HT     ; Forum Topic: www.autohotkey.com/forum/topic51342.html 
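 IfEqual,HT,,   SetEnv,HT, % "&aacuteá&acircâ&acute´&aeligæ&agraveà&amp&&aringå&atildeã&au"
 . "mlä&bdquo„&brvbar¦&bull•&ccedilç&cedil¸&cent¢&circˆ&copy©&curren¤&dagger†&dagger‡&deg"
 . "°&divide÷&eacuteé&ecircê&egraveè&ethð&eumlë&euro€&fnofƒ&frac12½&frac14¼&frac34¾&gt>&h"
 . "ellip…&iacuteí&icircî&iexcl¡&igraveì&iquest¿&iumlï&laquo«&ldquo“&lsaquo‹&lsquo‘&lt<&m"
 . "acr¯&mdash—&microµ&middot·&nbsp &ndash–&not¬&ntildeñ&oacuteó&ocircô&oeligœ&ograveò&or"
 . "dfª&ordmº&oslashø&otildeõ&oumlö&para¶&permil‰&plusmn±&pound£&quot""&raquo»&rdquo”&reg"
 . "®&rsaquo›&rsquo’&sbquo‚&scaronš&sect§&shy­&sup1¹&sup2²&sup3³&szligß&thornþ&tilde˜&tim"
 . "es×&trade™&uacuteú&ucircû&ugraveù&uml¨&uumlü&yacuteý&yen¥&yumlÿ"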
 TXT := RegExReplace( HTM,"<[^>]+>" )               ; Remove all tags between  "<" and ">" 
 Loop, Parse, TXT, &`;                              ; Create a list of special characters 
   L := "&" A_LoopField ";", R .= (!(A_Index&1)) ? ( (!InStr(R,L,1)) ? L:"" ) : "" 
 StringTrimRight, R, R, 1 
 Loop, Parse, R , `;                                ; Parse Special Characters 
  If F := InStr( HT, A_LoopField )                  ; Lookup HT Data 
    StringReplace, TXT,TXT, %A_LoopField%`;, % SubStr( HT,F+StrLen(A_LoopField), 1 ), All 
  Else If ( SubStr( A_LoopField,2,1)="#" ) 
    StringReplace, TXT, TXT, %A_LoopField%`;, % Chr(SubStr(A_LoopField,3)), All 
Return RegExReplace( TXT, "(^\s*|\s*$)")            ; Remove leading/trailing white spaces 
}


me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
I have been able to isolate the day and hour and, using UnHTM(), remove the HTML formatting. Extracting the temperature is proving more difficult because the only string I can retrieve looks like this...

<td align="left">Mostly Cloudy</td><td>SCT047 BKN060</td><td>80</td><td>59</td>

The temperature is the 80 in the above example. Any ideas on how to isolate the temp?
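One possible approach, offered as a sketch and not as anything proposed in the thread, is to walk the row with RegExMatch, number each <td> cell as it is found, and then pick the temperature by its column position. It assumes the column order seen in the sample row posted earlier (temperature in the seventh cell); the Field1, Field2, ... variable names are made up for illustration.

```autohotkey
; Sketch (AutoHotkey v1): number every <td> cell, then pick cells by position.
; Row is the sample row posted earlier in the thread.
Row = <tr align="center" valign="top" bgcolor="#eeeeee"><td>15</td><td align="right">13:53</td><td>E 9 G 17</td><td>10.00</td><td align="left">Mostly Cloudy</td><td>SCT047 BKN060</td><td>80</td><td>59</td>

Index := 0, Pos := 1
; RegExMatch returns the match position (0 when nothing more is found).
; Cell receives the whole match, Cell1 the text captured between the tags.
while Pos := RegExMatch(Row, "i)<td[^>]*>(.*?)</td>", Cell, Pos)
{
    Index += 1
    Field%Index% := Cell1        ; pseudo-array: Field1, Field2, ...
    Pos += StrLen(Cell)          ; resume searching after this cell
}

MsgBox, % "Day " . Field1 . ", time " . Field2 . ", temperature " . Field7
```

With the sample row this reports day 15, time 13:53, and temperature 80. If NOAA changes the column layout, the index 7 would need adjusting.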

me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
Ok, so I missed what you were saying. I was not aware that an xml data set was available from NOAA. This might very well be the solution that I was looking for. Xml is much easier to grab. Thanks!




me2space
  • Members
  • 9 posts
  • Last active: Mar 24 2012 01:13 PM
  • Joined: 03 Mar 2012
Thank you everyone for the help. I have changed my original script quite a bit, and I'm now much happier with the results. The HTML parsing was doable but a pain; the XML was easy in comparison. The only drawback is that the XML holds only the current temperature, not three days' worth of readings to pull from. I got around that by running the script hourly as a scheduled task.

Cheers.

; Extract weather data from the NOAA site for major airports.
; Visit http://www.weather.gov/xml/current_obs/ to find all available airports.

; ----------------------------------------------Info Gather

Gui, Font, s10, Verdana 
Gui, Add, DropDownList, vAirport_Code, ABQ|ALB|ATL|AUS|BDL|BHM|BNA|BOI|BOS|BUF|BUR|BWI|CHS|CLE|CLT|CMH
   |COS|CVG|DAL|DAY|DCA|DEN|DFW|DSM|DTW|ELP|EWR|FLL|GEG|GRR|GSO|GUM|HNL|HOU|HPN|IAD|IAH|IND|ISP|ITO
   |JAX|JFK|KOA|LAS|LAX|LGA|LGB|LIH|LIT|MCI|MCO||MDW|MEM|MHT|MIA|MKE|MSP|MSY|MYR|OAK|OKC|OMA|ONT
   |ORD|ORF|ORL|PBI|PDX|PHL|PHX|PIT|PNS|PSP|PVD|PWM|RDU|RIC|RNO|ROC|RSW|SAN|SAT|SAV|SDF|SEA|SFB|SFO
   |SJC|SJU|SLC|SMF|SNA|STL|SYR|TPA|TUL|TUS|TYS
Gui, Add, Button, default ym, Scrape
Gui, Show, w325 h80, NOAA Weather Data Scrape
Gui, Color, White

Sleep, 15000  ;Give time to make selection, otherwise use Default
Send, {Enter}
Return

ButtonScrape:
GuiClose:
GuiEscape:
Gui, Submit  ; Save each control's contents to its associated variable.

; ----------------------------------------------Add Variables
SetWorkingDir %A_ScriptDir% ;sets current directory as working directory
Site = http://www.weather.gov/xml/current_obs/K%Airport_Code%.xml
Web_XML = %Airport_Code%_Web.htm

; ----------------------------------------------Download Data
UrlDownloadToFile, %Site%, %Web_XML%

FileRead, XML, %Web_XML%
Date := StrX( XML, "<observation_time_rfc822>", 1, 29, "</observation_time_rfc822>", 1, 31 )
Temperature := StrX( XML, "<temp_f>", 1, 8, "</temp_f>", 1, 9 )
FileAppend, %Date%`, %Temperature%`t`n, %Airport_Code%_Data.txt


FileDelete, %Web_XML%

ExitApp
Return

; ----------------------------------------------Functions

StrX(H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009 
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO 
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z 
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html 
}
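
Since the script is meant to run hourly as a scheduled task, a GUI-free variant may be easier to automate than waiting 15 seconds and sending Enter. The following is only a sketch under assumptions: the airport code is read from the first command-line parameter (falling back to MCO), the script file name in the usage comment is hypothetical, and RegExMatch is swapped in for StrX() so the example stands on its own. Unlike the StrX() call above, it logs the full observation_time_rfc822 string.

```autohotkey
; Sketch (AutoHotkey v1): GUI-free variant for a scheduled task.
; Hypothetical usage: AutoHotkey.exe NoaaLog.ahk ORD
#NoEnv
SetWorkingDir %A_ScriptDir%

Airport_Code = %1%                 ; first command-line parameter, if any
if (Airport_Code = "")
    Airport_Code := "MCO"          ; same default as the drop-down list

Site := "http://www.weather.gov/xml/current_obs/K" . Airport_Code . ".xml"
Web_XML := Airport_Code . "_Web.xml"

UrlDownloadToFile, %Site%, %Web_XML%
if ErrorLevel                      ; download failed: log nothing this run
    ExitApp

FileRead, XML, %Web_XML%
FileDelete, %Web_XML%

; Capture the text between each pair of tags; Date1 and Temp1 hold the captures.
RegExMatch(XML, "<observation_time_rfc822>(.*?)</observation_time_rfc822>", Date)
RegExMatch(XML, "<temp_f>(.*?)</temp_f>", Temp)

if (Temp1 != "")                   ; skip the entry when no temperature was found
    FileAppend, %Date1%`, %Temp1%`n, %Airport_Code%_Data.txt
ExitApp
```

Checking ErrorLevel and the captured temperature keeps a failed download from writing an empty line into the data file.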