
UnHTM() :: Remove HTML formatting from a String

 
SKAN



Joined: 26 Dec 2005
Posts: 7159

Posted: Thu Nov 19, 2009 5:25 pm    Post subject: UnHTM() :: Remove HTML formatting from a String

Please do not expect UnHTM() to unformat a whole HTML file. It is handy when you have already parsed a string out of a page and need to convert that string to plain text.

Code:
UnHTM( HTM ) { ; Remove HTML formatting / Convert to ordinary text     by SKAN 19-Nov-2009
 Static HT     ; Forum Topic: www.autohotkey.com/forum/topic51342.html
 IfEqual,HT,,   SetEnv,HT, % "&aacuteá&acircâ&acute´&aeligæ&agraveà&amp&aringå&atildeã&au"
 . "mlä&bdquo„&brvbar¦&bull•&ccedilç&cedil¸&cent¢&circˆ&copy©&curren¤&dagger†&dagger‡&deg"
 . "°&divide÷&eacuteé&ecircê&egraveè&ethð&eumlë&euro€&fnofƒ&frac12½&frac14¼&frac34¾&gt>&h"
 . "ellip…&iacuteí&icircî&iexcl¡&igraveì&iquest¿&iumlï&laquo«&ldquo“&lsaquo‹&lsquo‘&lt<&m"
 . "acr¯&mdash—&microµ&middot·&nbsp &ndash–&not¬&ntildeñ&oacuteó&ocircô&oeligœ&ograveò&or"
 . "dfª&ordmº&oslashø&otildeõ&oumlö&para¶&permil‰&plusmn±&pound£&quot""&raquo»&rdquo”&reg"
 . "®&rsaquo›&rsquo’&sbquo‚&scaronš&sect§&shy­&sup1¹&sup2²&sup3³&szligß&thornþ&tilde˜&tim"
 . "es×&trade™&uacuteú&ucircû&ugraveù&uml¨&uumlü&yacuteý&yen¥&yumlÿ"
 TXT := RegExReplace( HTM,"<[^>]+>" )               ; Remove all tags between  "<" and ">"
 Loop, Parse, TXT, &`;                              ; Create a list of special characters
   L := "&" A_LoopField ";", R .= (!(A_Index&1)) ? ( (!InStr(R,L,1)) ? L:"" ) : ""
 StringTrimRight, R, R, 1
 Loop, Parse, R , `;                                ; Parse Special Characters
  If F := InStr( HT, A_LoopField )                  ; Lookup HT Data
    StringReplace, TXT,TXT, %A_LoopField%`;, % SubStr( HT,F+StrLen(A_LoopField), 1 ), All
  Else If ( SubStr( A_LoopField,2,1)="#" )
    StringReplace, TXT, TXT, %A_LoopField%`;, % Chr(SubStr(A_LoopField,3)), All
Return RegExReplace( TXT, "(^\s*|\s*$)")            ; Remove leading/trailing white spaces
}
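
The lookup list in UnHTM() is built by parsing TXT with both "&" and ";" as delimiters and keeping only the even-numbered fields. Here is a minimal sketch of that split, assuming AutoHotkey v1 as in the function above. The sample string and the List variable are made up for illustration.

```autohotkey
; Illustration only: show how "Loop, Parse" splits text on "&" and ";"
TXT := "Fish &amp; Chips &copy; 2009"
Loop, Parse, TXT, &`;                       ; same delimiter list as in UnHTM()
    List .= A_Index ": [" A_LoopField "]`n"  ; number every field
MsgBox, % List                              ; even-numbered fields are the entity names
```

For this input, fields 2 and 4 should be amp and copy. The parity trick assumes that every "&" in the text starts an entity and every ";" ends one. A bare "&" or ";" in the text (AT&T, for example) shifts the numbering, so entities after it may be left undecoded.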


Code:
; Example:
HTM = <a href="/intl/en/ads/">Advertising&nbsp;Programs</a>
MsgBox, % UnHTM( HTM )
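
A second usage sketch, mixing tags, named entities and a numeric entity. It assumes AutoHotkey v1 and that UnHTM() from the first post is already defined in the same script; the sample string is invented for illustration.

```autohotkey
; Assumes UnHTM() from the first post is included in this script.
HTM = <b>Caf&eacute;</b> &amp; <i>Cr&#232;me</i> &copy; 2009
MsgBox, % UnHTM( HTM )   ; tags are stripped first, then each entity is replaced
```

The named entities (&eacute; &amp; &copy;) go through the HT table lookup, while &#232; is not in the table and so takes the Chr() branch. The characters actually displayed depend on the AutoHotkey build and on the encoding the script file is saved in.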


Thanks to AGermanUser for NeedleRegEx






Code:
; The table of special character entities was created with the following code
Loop % 256-33 {
Transform, F, HTML, % Chr( A := A_Index+33 )
If Strlen(F) > 1 && !Instr( F, "#" )
  list .= "&" SubStr(F,2, StrLen(F)-2) Chr(A )
}
StringLower, List, List
Sort, List, D& U
Clipboard := List
MsgBox, 0, % StrLen( List ), % Clipboard
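
To see what a single pass of the loop above contributes to the table, here is a small sketch for one character. It assumes AutoHotkey v1, where Transform's HTML sub-command is available. The variable Name is made up for illustration.

```autohotkey
; Illustration only: one iteration of the table generator, for a single character
A := 233                                   ; character code of é
Transform, F, HTML, % Chr( A )             ; F should become the entity, e.g. "&eacute;"
Name := SubStr( F, 2, StrLen(F)-2 )        ; drop the leading "&" and the trailing ";"
MsgBox, % F "`n" Name "`n&" Name Chr( A )  ; last line: the form appended to the table
```

Each table entry is therefore "&" plus the entity name plus the character itself, which is why UnHTM() can read the replacement character at the position right after the matched name.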

_________________
Suresh Kumar A N
hugov



Joined: 27 May 2007
Posts: 2181

Posted: Thu Nov 19, 2009 5:46 pm

Hi Skan,

what would be the difference between yours and "[stdlib] unHTML - Strips Tags and Entities from given Source" by derRaphael?
http://www.autohotkey.com/forum/topic38183.html
_________________
Tut 4 Newbies
TF : Text file & string lib, TF Forum
SKAN



Joined: 26 Dec 2005
Posts: 7159

Posted: Thu Nov 19, 2009 9:08 pm

HugoV wrote:
what would be the difference between yours and .... by derRaphael?


Well. There is also StripHTMLItems() posted by Jamie. The functionality of these three is more or less the same, but I wrote mine from scratch to be standalone and compact, and to be an accessory to StrX() [a wrapper for SubStr()] that I am about to post. The examples for StrX() will point to this thread.
hugov



Joined: 27 May 2007
Posts: 2181

Posted: Thu Nov 19, 2009 9:28 pm

Can't wait
First Toy Lab



Joined: 15 Nov 2009
Posts: 21
Location: London, UK

Posted: Thu Nov 19, 2009 9:39 pm

Well done!
_________________


http://www.autohotkey.net/~FirstToyLab/
SKAN



Joined: 26 Dec 2005
Posts: 7159

Posted: Fri Nov 20, 2009 12:25 am

Thanks "First Toy Labs".

HugoV wrote:
Can't wait


Hope I do not disappoint you!

StrX() : http://www.autohotkey.com/forum/topic51354.html
All times are GMT