Url Parser

Post your working scripts, libraries and tools
Bruttosozialprodukt
Posts: 462
Joined: 24 Jan 2014, 22:28

Url Parser

01 Mar 2014, 16:25

I wrote a function that allows you to extract any information from any URL you want, and by that I really mean ANY information:
protocol, ftp username, ftp password, subdomain directory folders, domain name, domain extension, port, subdirectory folders

Example:

Code: Select all

urlInfo := ParseUrl("ftp://testuser:testpass@subdomain.ftp.example.com:21/home/index.php/")

MsgBox % urlInfo.protocol ; contains: "ftp"
MsgBox % urlInfo.ftpUser ; contains: "testuser"
MsgBox % urlInfo.ftpPass ; contains: "testpass"
Loop, % urlInfo.subDomainDir.MaxIndex()
    MsgBox % urlInfo.subDomainDir[A_Index] ; first iteration: "subdomain", second iteration "ftp"
MsgBox % urlInfo.domainName ; contains: "example"
MsgBox % urlInfo.domainExt ; contains: "com"
MsgBox % urlInfo.port ; contains: "21"
Loop, % urlInfo.subDir.MaxIndex()
    MsgBox % urlInfo.subDir[A_Index] ; first iteration: "home", second iteration "index.php"

MsgBox % urlInfo.subDomainDir[1] ; contains: "subdomain"
MsgBox % urlInfo.subDomainDir[2] ; contains: "ftp"
MsgBox % urlInfo.subDir[1] ; contains: "home"
MsgBox % urlInfo.subDir[2] ; contains: "index.php"
Here is the function:

Code: Select all

ParseUrl(url) {
    urlObj := Object()
    If (p := InStr(url,"://")) {
        urlObj.protocol := SubStr(url, 1, p-1)
        url := SubStr(url, p+3)
    }
    If (p := InStr(url,"/")) {
        url2 := SubStr(url, p+1)
        url := SubStr(url, 1, p-1)
        Loop, parse, url2, /
        { 
            If (A_LoopField != "")
                urlObj.subDir[A_Index] := A_LoopField
        }
    }
    If (p := InStr(url,"@")) {
        url2 := SubStr(url, 1, p-1)
        url := SubStr(url, p+1)
        If (p := InStr(url2,":")) {
            urlObj.ftpUser := SubStr(url2, 1, p-1)
            urlObj.ftpPass := SubStr(url2, p+1)
        } Else
            urlObj.ftpUser := url2 ; no password given
    }
    If (p := InStr(url,":")) {
        urlObj.port := SubStr(url, p+1)
        url := SubStr(url, 1, p-1)
    }
    ; Split the host into labels (StrSplit needs AutoHotkey v1.1.13+): the last label is
    ; the extension, the one before it the name, and anything in front of those is a
    ; subdomain, kept in left-to-right order.
    parts := StrSplit(url, ".")
    count := parts.MaxIndex()
    If (count >= 2) {
        urlObj.domainExt := parts[count]
        urlObj.domainName := parts[count-1]
        Loop, % count-2
            urlObj.subDomainDir[A_Index] := parts[A_Index]
    } Else
        urlObj.domainName := url
    Return % urlObj
}
If you have any ideas on how to put this in one regex tell me. :)
atnbueno
Posts: 85
Joined: 12 Oct 2013, 04:45

Re: Url Parser

02 Mar 2014, 08:30

RegEx ideas? Here's mine:


Code: Select all

; AutoHotkey Version: AutoHotkey_L 1.1 (Unicode x86)
; Language:           English
; Platform:           Win7 SP1
; Author:             Antonio Bueno <user atnbueno of Google's popular e-mail service>
; Short description:  Playing with URLs and regular expression
; Last Mod:           2014-03-02

#NoEnv
AppTitle := "URL Check"
Menu, Tray, Icon, %A_WinDir%\system32\shell32.dll, 14
Menu, Tray, Tip, %AppTitle% (AHK)

Gui, New, AlwaysOnTop, %AppTitle%
Gui, Font, s10, Consolas
Gui, Add, Text, Section, URL:
Gui, Add, Edit, ys W575 gCheckURL vURL, http://ahkscript.org/boards/viewtopic.php?f=6&t=2512&p=13706#p13706
Gui, Add, Text, xm Section, RegExp:
Gui, Add, Edit, ys W550 gCheckURL vRE, ^(?<Protocol>https?|ftp)://(?:(?<Username>[^:]+)(?::(?<Password>[^@]+))?@)?(?<Domain>(?:[\w-]+\.)+\w\w+)(?::(?<Port>\d+))?/?(?<Path>(?:[^/?# ]*/?)+)(?:\?(?<Query>[^#]+)?)?(?:\#(?<Hash>.+)?)?$
Gui, Add, Text, xm Section, Protocol:
Gui, Add, Edit, ys W50 vProtocol
Gui, Add, Text, ys, Username:
Gui, Add, Edit, ys W100 vUsername
Gui, Add, Text, ys, Password:
Gui, Add, Edit, ys W200 vPassword
Gui, Add, Text, xm Section, Domain:
Gui, Add, Edit, ys W200 vDomain
Gui, Add, Text, ys, Port:
Gui, Add, Edit, ys W25 vPort
Gui, Add, Text, xm Section, Path:
Gui, Add, Edit, ys W550 vPath
Gui, Add, Text, xm Section, Query:
Gui, Add, Edit, ys W400 vQuery
Gui, Add, Text, ys, Hash:
Gui, Add, Edit, ys W75 vHash
Gui, Show
Goto CheckURL
Return

CheckURL:
Gui, Submit, NoHide
RegExMatch(URL, RE, URL_)
GuiControl, , Protocol, %URL_Protocol%
GuiControl, , Username, %URL_Username%
GuiControl, , Password, %URL_Password%
GuiControl, , Domain, %URL_Domain%
GuiControl, , Port, %URL_Port%
GuiControl, , Path, %URL_Path%
GuiControl, , Query, %URL_Query%
GuiControl, , Hash, %URL_Hash%
Return

Esc::
GuiClose:
Gui, Destroy
ExitApp
After the RegEx you can StringSplit both URL_Domain and URL_Path to get closer to your code. I'm aware the RegEx is not foolproof, but I'd appreciate any reasonable improvement to it.
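A minimal sketch of that follow-up split, assuming AutoHotkey v1.1.13 or later. It uses the StrSplit function rather than the StringSplit command, and hard-coded sample values stand in for what RegExMatch would store in URL_Domain and URL_Path. The other variable names are illustrative only.

```autohotkey
; Sample values (assumed) standing in for the RegExMatch output.
URL_Domain := "subdomain.ftp.example.com"
URL_Path := "home/index.php/"

; Split the host on dots: the last label is the extension, the one before it the name.
labels := StrSplit(URL_Domain, ".")
n := labels.MaxIndex()
domainExt := labels[n]
domainName := labels[n-1]

; Every label in front of the domain name is treated as a subdomain.
subDomains := ""
Loop, % n-2
    subDomains .= labels[A_Index] . " "

; Split the path on slashes, skipping empty segments such as the one after a trailing "/".
folders := ""
For index, segment in StrSplit(URL_Path, "/")
{
    If (segment != "")
        folders .= segment . " "
}

MsgBox % "Ext: " . domainExt . "`nName: " . domainName . "`nSubdomains: " . subDomains . "`nFolders: " . folders
```

Taking the last label as the extension is a simplification: a multi-part suffix such as "co.uk" would be split into name "co" and extension "uk".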

Regards,
Antonio

[EDIT]Updates:
- Added "#" and " " to invalid <Path> characters
- Single dot required before the end of <Domain>
- Hyphen allowed in subdomains
- TLD must be at least two characters
[/EDIT]
Last edited by atnbueno on 08 Jun 2014, 12:33, edited 2 times in total.
Bruttosozialprodukt
Posts: 462
Joined: 24 Jan 2014, 22:28

Re: Url Parser

07 Mar 2014, 20:16

Looks good, but something foolproof would be better. ;)
I just realized that I totally forgot the query string and the fragment id.
joedf
Posts: 7696
Joined: 29 Sep 2013, 17:08
Facebook: J0EDF
Google: +joedf
GitHub: joedf
Location: Canada
Contact:

Re: Url Parser

16 Mar 2014, 00:54

nice!
lifeweaver
Posts: 144
Joined: 10 May 2014, 05:57
GitHub: lifeweaver
Location: OH

Re: Url Parser

10 May 2014, 06:35

If you want foolproof, you might check out InternetCrackUrl. It does, however, need the URL_COMPONENTS structure, so you're getting into DllCall, NumPut and such.
Guest10
Posts: 578
Joined: 01 Oct 2013, 02:50

Re: Url Parser

10 May 2014, 12:50

the examples are for ftp://... . can this be used also with http://... varieties? any examples for this? :ugeek:
atnbueno
Posts: 85
Joined: 12 Oct 2013, 04:45

Re: Url Parser

10 May 2014, 13:33

Thanks lifeweaver for the links. I'll look into it (btw, hilarious function name).

And Guest10, my code covers ftp, http and https. Try it.
GeekDude
Posts: 879
Joined: 02 Oct 2013, 22:13

Re: Url Parser

10 May 2014, 13:37

Reminds me of mine:

Code: Select all

global UrlRegEx := "^(?P<Scheme>[^\:]+)\:\/\/(?:(?P<User>[^\@\:]*)\:?(?P<Pass>[^\@\:]*)?\@)?(?P<Domain>[^\/\:]+)(?:\:(?P<Port>[^\/]*))?(?:\/(?P<Path>.+\/)*(?P<File>[^\/]+?(?:\.(?P<Extension>[^\/\.""]*?))?)?(?:\?(?P<Query>[^\?]*?))?(?:\#(?P<Fragment>[^\#]*?))?)?$"
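One way to try this pattern out is RegExMatch in object mode, which lets the named groups be read back by name. This is only a sketch: the pattern is copied from the post above, while the sample URL and the variable names sampleUrl, m and report are assumptions made for illustration.

```autohotkey
; Pattern copied from the post above ("" inside the string is an escaped quote).
UrlRegEx := "^(?P<Scheme>[^\:]+)\:\/\/(?:(?P<User>[^\@\:]*)\:?(?P<Pass>[^\@\:]*)?\@)?(?P<Domain>[^\/\:]+)(?:\:(?P<Port>[^\/]*))?(?:\/(?P<Path>.+\/)*(?P<File>[^\/]+?(?:\.(?P<Extension>[^\/\.""]*?))?)?(?:\?(?P<Query>[^\?]*?))?(?:\#(?P<Fragment>[^\#]*?))?)?$"

; Made-up sample URL that exercises every named group.
sampleUrl := "http://user:pw@www.example.com:8080/dir/sub/page.html?x=1#top"

; The "O)" option makes RegExMatch store a match object in m instead of pseudo-array variables.
if (RegExMatch(sampleUrl, "O)" . UrlRegEx, m))
{
    report := ""
    for index, name in ["Scheme", "User", "Pass", "Domain", "Port", "Path", "File", "Extension", "Query", "Fragment"]
        report .= name . ": " . m.Value(name) . "`n"
    MsgBox % report
}
else
    MsgBox The URL did not match the pattern.
```

Groups that did not take part in the match come back as empty strings, so the same loop also works for URLs without credentials, port, query or fragment.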
Guest10
Posts: 578
Joined: 01 Oct 2013, 02:50

Re: Url Parser

10 May 2014, 17:07

atnbueno wrote:Thanks lifeweaver for the links. I'll look into it (btw, hilarious function name).

And Guest10, my code covers ftp, http and https. Try it.
thanks, tried and it is not just good...it is FANTASTIC! :ugeek:
Bruttosozialprodukt
Posts: 462
Joined: 24 Jan 2014, 22:28

Re: Url Parser

11 May 2014, 07:57

@GeekDude This regex is freaking amazing! (I don't fully understand it, but it seems pretty waterproof.)
I just love it!

For those of you who are interested in a CrackUrl solution:

Code: Select all

url := CrackUrl("http://user:[email protected]:80/qwe/index.php?asd=123#test")
MsgBox % url.scheme "://" url.userName ":" url.password "@" url.hostName ":" url.port url.urlPath url.extraInfo

CrackUrl(url) {
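    ; NOTE: the size (60) and the offsets below match the 32-bit URL_COMPONENTS layout;
    ; 64-bit AutoHotkey needs a 104-byte structure with different offsets.
    VarSetCapacity(myStruct,60,0)
    numput(60,myStruct,0,"Uint") ; dwStructSize: this dll function requires this to be set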
    numput(1,myStruct,8,"Uint") ; SchemeLength
    numput(1,myStruct,20,"Uint") ; HostNameLength
    numput(1,myStruct,32,"Uint") ; UserNameLength
    numput(1,myStruct,40,"Uint") ; PasswordLength
    numput(1,myStruct,48,"Uint") ; UrlPathLength
    numput(1,myStruct,56,"Uint") ; ExtraInfoLength
    DllCall("Winhttp.dll\WinHttpCrackUrl","PTR",&url,"UInt",StrLen(url),"UInt",0,"PTR",&myStruct)
    
    urlObj := Object()
    urlObj.scheme := StrGet(NumGet(myStruct,4,"Ptr"),NumGet(myStruct,8,"UInt"))
    urlObj.userName := StrGet(NumGet(myStruct,28,"Ptr"),NumGet(myStruct,32,"UInt"))
    urlObj.password := StrGet(NumGet(myStruct,36,"Ptr"),NumGet(myStruct,40,"UInt"))
    urlObj.hostName := StrGet(NumGet(myStruct,16,"Ptr"),NumGet(myStruct,20,"UInt"))
    urlObj.port := NumGet(myStruct,24,"Int")
    urlObj.urlPath := StrGet(NumGet(myStruct,44,"Ptr"),NumGet(myStruct,48,"UInt"))
    urlObj.extraInfo := StrGet(NumGet(myStruct,52,"Ptr"),NumGet(myStruct,56,"UInt"))
    Return urlObj
}
Big thanks to RHCP, who taught me a lot about DllCalls and pretty much did all the work on this one!
lmstearn
Posts: 332
Joined: 11 Aug 2016, 02:32
GitHub: lmstearn
Contact:

Re: Url Parser

10 Apr 2019, 05:34

Thanks. There are a variety of regex validation parsers on offer, but will there ever be one to rule them all?
In addition to @GeekDude's, this one for absolute URIs comes close, but what's the performance cost?

Code: Select all

/^[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{
FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i
There is also this. I'm currently using the following, but it hasn't been rigorously tested:

Code: Select all

^(https?://|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
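A rough way to put this short pattern through its paces, sketched under a few assumptions: IsLikelyUrl is a hypothetical helper name, the candidate strings are made up, and the pattern is the one quoted above without the stray trailing quote and parenthesis.

```autohotkey
; Hypothetical helper: returns true when str matches the short pattern quoted above.
IsLikelyUrl(str) {
    static pattern := "^(https?://|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$"
    return RegExMatch(str, pattern) > 0
}

; Made-up candidates: two ordinary web addresses, an ftp URL and a four-letter TLD.
for index, candidate in ["http://ahkscript.org/boards/", "www.example.com", "ftp://example.com", "http://example.info"]
    MsgBox % candidate . " -> " . (IsLikelyUrl(candidate) ? "match" : "no match")
```

The last two candidates should be rejected: the pattern only accepts http, https or a leading "www.", and the {2,3} quantifier limits the extension to two or three letters. It also has no provision for a port or user info.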
The Regular Expressions Cookbook looks like it has some of the answers. Grabbed a copy here. :)
Last edited by lmstearn on 10 Apr 2019, 22:57, edited 1 time in total.
SOTE
Posts: 1035
Joined: 15 Jun 2015, 06:21

Re: Url Parser

10 Apr 2019, 06:37

I'm not trying to rain on any Regex/Regular Expressions parades, but using InStr, SubStr, and Loop Parse are arguably more straightforward and are often visually easier to comprehend. Though I won't argue that if someone is really good at Regex, it can be a moot point.
lmstearn
Posts: 332
Joined: 11 Aug 2016, 02:32
GitHub: lmstearn
Contact:

Re: Url Parser

10 Apr 2019, 09:32

If we're raining on this off-topic, brain-fried regex parade, consider the following from the Cookbook linked above:
You cannot create a regular expression that matches every valid URL without matching any invalid URLs. The reason is that pretty much anything could be a valid URL in some as of yet uninvented scheme.
How about an irregular expression to do it? The assertion that the set of valid URLs depends on time implies that the set of valid URLs is indeterminate, although it must be finite, so long as a string variable is finite. The set of invalid URLs is also finite, although in this case a tested sample from that group justifies its status.
Back to AHK, the related problem of estimating the proportion of valid URLs vs. invalid URLs by Monte Carlo methods on random samples provides further insights.
DuyMinh
Posts: 24
Joined: 05 May 2017, 08:34
Facebook: DuyMinh
Google: Duy Minh

Re: Url Parser

15 Apr 2019, 20:15

This is my function URLSplit for my HttpRequest lib... Maybe it's useful for someone.

Code: Select all

__HTTPRequest_URLSplit(sURL) {
	Local oResult := {}
	; ------------------------------------------------------------
	; Initialize the default keys and values of the result object
	oResult.Port := 80, oResult.Scheme := "http", oResult.iSchemeFlag := 1
	; ------------------------------------------------------------
	; Split off the scheme and the domain
	try {
		RegExMatch(sURL, "iO)^\h*(?:(?:(https?)|(ftp)|(wss?))://)?(www\.)?(.*?)\h*$", aURL1)
		; ------------------------------------------------------------
		; FTP - using WinINet.dll
		if (aURL1[2]) {
			oResult.iSchemeFlag := 3
			oResult.Port := 0
		}
		; ------------------------------------------------------------
		; Websocket
		else if (aURL1[3]) {
			; Check version of windows support wss
			if (!RegExMatch(A_OSVersion, "^(10|WIN_8\.1)"))
				Return __HTTPRequest_DebugError(A_ThisFunc, "Websocket is only supported on Windows 8.1 and later", "", 1)
			; Set the wss parameters the same way as for https
			if (aURL1[3] = "wss" || aURL1[3] = "ws") {
				oResult.iSchemeFlag := 2
				oResult.Scheme := "https"
				oResult.Port := 443
				oResult.WebsocketRequest := 1
			}
		}
		; ------------------------------------------------------------
		; HTTPS
		else if (aURL1[1] = "https") {
			oResult.iSchemeFlag := 2
			oResult.Scheme := "https"
			oResult.Port := 443
		}
	} catch {
		return __HTTPRequest_DebugError(A_ThisFunc, __HTTPRequest_GetLastErrorString(A_LastError), "", A_LastError)
	}

	; ------------------------------------------------------------
	; Split off the username, password, credential and URL
	try {
		RegExMatch(aURL1[5], "O)^(?:(\w+):(\w+)@)?(.+)$", aURL2)
		oResult.Username := aURL2[1]
		oResult.Password := aURL2[2]
	} catch {
		return __HTTPRequest_DebugError(A_ThisFunc, __HTTPRequest_GetLastErrorString(A_LastError), "", A_LastError)
	}
	; ------------------------------------------------------------
	; Split the host and the URI
	try {
		RegExMatch(aURL2[3], "O)^([^\/\:]+)(?::(\d+))?(/.*)?($)", aURL3)
		; ------------------------------------------------------------
		; If it is localhost or there is no protocol
		if (aURL1[1] == "" && !RegExMatch(aURL3[1], "\.\w+$") || aURL3[1] = "localhost")
			return __HTTPRequest_DebugError(A_ThisFunc, "sURL is not in a valid format", "", 2)
		oResult.Host := RegExReplace(aURL1[4] . aURL3[1], "(\#[\w\-]+)$", "", "", 1)
		oResult.URL := aURL3[3]
		; ------------------------------------------------------------
		; If a port was extracted
		if (aURL3[2])
			oResult.Port := aURL3[2]
		; ------------------------------------------------------------
		; Cookie Domain
		oResult.CookieDomain := RegExReplace(oResult.Host, ".*?([\w\-]*?\.?[\w\-]+\.[\w\-]+)$", "$1")
	} catch {
		return __HTTPRequest_DebugError(A_ThisFunc, __HTTPRequest_GetLastErrorString(A_LastError), "", A_LastError)
	}
	return oResult
}
