[CMD/COM/DLL] xdoc2txt - Extract text from pdf/doc/xls...

Post by tmplinshi » 12 Oct 2013, 11:41

xdoc2txt can convert many document formats to plain text without needing Acrobat or Word to be installed.

Supported formats: (listed in an image in the original post)

Three versions are provided:
  • xdoc2txt.exe - Command line tool
  • xd2txcom.dll - COM component version
  • xd2txlib.dll - DLL version
Examples:
Command line example:

xdoc2txt.exe -8 test.doc | iconv -f utf-8 -c
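
The same command can probably be driven from an AutoHotkey script as well. What follows is only a rough sketch for AutoHotkey v1.1 (Unicode), not something from the xdoc2txt documentation. It assumes that xdoc2txt.exe sits in the script's folder and that -8 produces UTF-8 output, as the iconv call above suggests. The names docFile and outFile and both paths are placeholders made up for illustration.

```autohotkey
; Sketch only: capture the output of xdoc2txt.exe through a temporary file.
; Assumptions: xdoc2txt.exe is in A_ScriptDir and "-8" emits UTF-8.
docFile := A_ScriptDir "\test.doc"          ; placeholder input document
outFile := A_Temp "\xdoc2txt_out.txt"       ; placeholder temporary file

; cmd.exe performs the ">" redirection; "Hide" keeps the console window hidden.
RunWait, %ComSpec% /c xdoc2txt.exe -8 "%docFile%" > "%outFile%", %A_ScriptDir%, Hide

; *P65001 tells FileRead to decode the file as UTF-8 (code page 65001).
FileRead, text, *P65001 %outFile%
FileDelete, %outFile%
MsgBox, % text
```

Going through a temporary file avoids having to decode the console output of the child process, at the cost of one extra write to disk.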
DLL example:

if !A_IsUnicode {
	MsgBox, Please run this script with the Unicode version of AutoHotkey.
	ExitApp
}

xdoc2txt_load(1)
MsgBox, % xdoc2txt("test.doc")
xdoc2txt_load(0)

xdoc2txt(fileName) { ; by HotKeyIt (http://ahkscript.org/boards/viewtopic.php?f=5&t=267&p=2157#p9515)
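	; ExtractText stores a pointer to the extracted text in fileText; "Ptr*" keeps that pointer intact on 64-bit builds as well.
	fileLength := DllCall("xd2txlib\ExtractText", "Str", fileName, "Int", False, "Ptr*", fileText)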
	Return StrGet( fileText, fileLength / 2 )
}

xdoc2txt_load(Load := True) {
	static hModule

	if Load
		Return, hModule := DllCall("LoadLibrary", "Str", "xd2txlib.dll", "Ptr") ; module handles are pointer-sized
	else
		Return, DllCall("FreeLibrary", "Ptr", hModule)
}
Homepage: http://ebstudio.info/home/xdoc2txt.html

Post by hasantr » 24 Nov 2020, 08:42

Thanks, tmplinshi. I get a lot of use out of this. But one problem is giving me a lot of trouble: some PDF files downloaded from the internet are blocked automatically, and it crashes when trying to open those files. Since I am working with files on the local network, checking for the block and unblocking do not work reliably.
I wonder, do you have any solution that would work with xd2txcom.dll? Thank you, sir.
