Convert HTML to Text


15 replies to this topic
Jon
  • Members
  • 349 posts
  • Last active: Aug 30 2011 08:35 PM
  • Joined: 28 Apr 2004
http://www.theabsolu...sware/#viewhtml

VIEWHTML v2.5: Quickie HTML Viewer

Quickie HTML viewer and converter. Are you like me in preferring to read text in easy-to-read DOS text mode? A very useful tool for viewing web documents off-line, with hypertext linking and colours. You can also use this program to simply convert HTML documents to standard text files. Uses extended memory for loading large files. You can now also copy text to the Windows clipboard. Open links in your main browser with the click of the mouse. This program can save a lot of time.

DOWNLOAD Viewhtml (71kb .ZIP, for DOS, also runs in Windows)


It seems to work reasonably well. I've only tested it on a couple of HTML documents.

Usage-

VH (to be prompted for file to view)
VH filename[.htm] (to view file)
VH /b filename.htm (to convert to .txt)
VH /b *.htm (to convert all .htm files to .txt)
VH /? (for help)

Jimmy2Times
  • Members
  • 65 posts
  • Last active: Jul 15 2014 07:56 PM
  • Joined: 07 Apr 2004
God bless you - I was looking for this!
Should clear the forest a bit when parsing html files :)



-by the way, how did you find this, if I may ask? I had looked a lot (download.com, google, zdnet, winfiles, tucows) and couldn't find anything good.

BoBo
  • Guests
  • Last active:
  • Joined: --
PureText 2.0 95/98/Me/NT/2000/XP/2003

Have you ever copied some text from a web page, a word document, help, etc., and wanted to paste it as simple text into another application without getting all the formatting from the original source? PureText makes this simple. Just copy/cut whatever you want to the clipboard, click on the PureText tray icon, and then paste to any application. Better yet, you can configure a hot-key to convert and paste the text for you. The pasted text will be pure and free from all formatting.


Jon
  • Members
  • 349 posts
  • Last active: Aug 30 2011 08:35 PM
  • Joined: 28 Apr 2004

I've looked for something to do this in the past as well but never found anything. I came across it by accident while I was doing a search in Google. I found it here-

http://www.freewareh... ... ine_t.html

lingoist
  • Members
  • 122 posts
  • Last active: Jan 28 2014 03:50 PM
  • Joined: 05 Oct 2004
Do you think there would be an AHK solution to convert HTML into TXT?

Thanks,
Lingoist

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
It should be easy with some regular expressions, for example.
It depends on what you want in the output: pure streamed text, or something more structured that takes into account <p>, <br>, <li>, and so on...

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

    lingoist
    • Members
    • 122 posts
    • Last active: Jan 28 2014 03:50 PM
    • Joined: 05 Oct 2004
    I just wanted:

    - cells of tables in html to become TAB separated (%A_Tab%)
    - <br> or <p> as linefeed (`n)
    - ignore
    - etc...


    PhiLho
    • Moderators
    • 6850 posts
    • Last active: Jan 02 2012 10:09 PM
    • Joined: 27 Dec 2005
    A bit too simplistic, a number of tags provide paragraph separation...
    Here is a quick and dirty script that might do the job on a number of simple files...
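    FileSelectFile htmlFile, 1, , , *.html
    FileRead data, %htmlFile%
    ; Strip leading spaces, then flatten line breaks and tabs into single spaces
    ; (a space rather than nothing, so words on adjacent lines are not glued together)
    data := RegExReplace(data, "m)^ +")
    StringReplace data, data, `r`n, %A_Space%, All
    StringReplace data, data, `n, %A_Space%, All
    StringReplace data, data, %A_Tab%, %A_Space%, All
    data := RegExReplace(data, " +", " ")
    ; Tags that separate paragraphs become line breaks, table cells become tabs
    data := RegExReplace(data, "<br ?.*?>", "`n")  ; search pattern was lost from the post; <br> is assumed
    data := RegExReplace(data, "<p ?.*?>", "`n")
    data := RegExReplace(data, "<h\d ?.*?>", "`n")
    data := RegExReplace(data, "<li ?.*?>", "`n")
    data := RegExReplace(data, "<td ?.*?>", "`t")
    data := RegExReplace(data, "<td ?.*?>", "`t")
    ; Remove every remaining tag
    data := RegExReplace(data, "<.*?>")
    FileAppend %data%, ResultText.txt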
    


    lingoist
    • Members
    • 122 posts
    • Last active: Jan 28 2014 03:50 PM
    • Joined: 05 Oct 2004


    Thank you very much PhiLho!! I would just add 3 lines to your script, ok!

    FileSelectFile htmlFile, 1, , , *.html
    FileRead data, %htmlFile%
    ; Strip leading spaces, then flatten line breaks and tabs into single spaces
    data := RegExReplace(data, "m)^ +")
    StringReplace data, data, `r`n, %A_Space%, All
    StringReplace data, data, `n, %A_Space%, All
    StringReplace data, data, %A_Tab%, %A_Space%, All
    data := RegExReplace(data, " +", " ")
    data := RegExReplace(data, "<br ?.*?>", "`n")  ; search pattern was lost from the post; <br> is assumed
    data := RegExReplace(data, "<p ?.*?>", "`n")
    data := RegExReplace(data, "<h\d ?.*?>", "`n")
    data := RegExReplace(data, "<li ?.*?>", "`n")
    data := RegExReplace(data, "<td ?.*?>", "`t")
    data := RegExReplace(data, "<tr ?.*?>", "`n")  ; added: each table row starts a new line
    data := RegExReplace(data, "<td ?.*?>", "`t")
    data := RegExReplace(data, "<.*?>")
    StringReplace data, data, &nbsp;, %A_Space%, All  ; added: entity was lost from the post; &nbsp; is assumed

    FileAppend %data%, ResultText.txt
    MsgBox %data%  ; added: show the result
    
    


    incith
    • Members
    • 130 posts
    • Last active: Apr 03 2010 03:08 AM
    • Joined: 01 Oct 2005
    How about converting a website to a JPEG or other image format? I'm trying to automate the process of taking an hourly screen capture of a website. The problem comes into play when you have to scroll; it would be neat to make one long (dimensionally) JPEG file as a screenshot of the website. I searched and searched but could only find non-free utilities.

    haichen
    • Members
    • 200 posts
    • Last active: Oct 20 2013 01:14 PM
    • Joined: 05 Feb 2007
    @lingoist
    If you only want to copy some data from tables, have a look at this Firefox extension: table2Clipboard

    engunneer
    • Moderators
    • 9162 posts
    • Last active: Sep 12 2014 10:36 PM
    • Joined: 30 Aug 2005
    If you can programmatically scroll the page and take a screenshot every screenful (with 20-40% overlap), then you can stitch them together using a free tool such as nona, hugin, panotools, enblend, or other stitching software. (You may want or need several of the tools; they tend to be collections of multiple tools to do the task.)

    incith
    • Members
    • 130 posts
    • Last active: Apr 03 2010 03:08 AM
    • Joined: 01 Oct 2005

    This was essentially the idea I brewed up in my head last night: activate the window, send Alt + PrntScrn, send PgDn, repeat... But as I looked through the help files, I could not find a way to tell when you are at the bottom of the document, so as to stop taking screenshots. I need some way to monitor the scrollbar.

    engunneer
    • Moderators
    • 9162 posts
    • Last active: Sep 12 2014 10:36 PM
    • Joined: 30 Aug 2005
    if you compare the latest image with the previous one, and they are the same, then you were at the end of the page.
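One way to do that comparison, assuming each capture has already been saved to disk, is a byte-for-byte file comparison. The sketch below assumes AutoHotkey v1.1 (for the Ptr/UPtr DllCall types). The function name FilesAreIdentical is hypothetical, and the capture step itself is not shown. It is most reliable with an uncompressed format such as BMP, and a page with animated content may never produce two identical captures.

```autohotkey
; Hypothetical helper: returns true if the two files contain exactly the same bytes.
FilesAreIdentical(pathA, pathB)
{
    FileGetSize, sizeA, %pathA%
    FileGetSize, sizeB, %pathB%
    ; A blank size means the file could not be read; different sizes cannot match.
    if (sizeA = "" or sizeB = "" or sizeA != sizeB)
        return false
    ; *c loads the raw bytes without any text conversion.
    FileRead, dataA, *c %pathA%
    FileRead, dataB, *c %pathB%
    ; RtlCompareMemory returns how many leading bytes are equal in both buffers.
    matched := DllCall("ntdll\RtlCompareMemory", "Ptr", &dataA, "Ptr", &dataB, "UPtr", sizeA, "UPtr")
    return (matched = sizeA)
}
```

In the capture loop, the script would stop sending PgDn as soon as FilesAreIdentical(previousShot, latestShot) returns true, where both variable names stand for whatever paths the capture step writes to.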

    incith
    • Members
    • 130 posts
    • Last active: Apr 03 2010 03:08 AM
    • Joined: 01 Oct 2005
    Wise! I'll have to whip something up in a while. Thanks for the idea.