Hi. I replaced the files with versions encoded as follows:
italics.ahk - UTF-8 with BOM, PC (CR+LF) line termination
lib.ahk - UTF-8 with BOM, PC (CR+LF) line termination, with the double quotes removed
page_text_after_proofreading.txt - UTF-8 without BOM, Linux (LF) line termination
Your version of "italics.ahk" above runs through the text of "page_text_after_proofreading.txt" but does nothing.
Identifying unique strings in a block of text
Re: Identifying unique strings in a block of text
Win 10 Professional 64bit 21H2 16Gb Ram AHK current as of 2021-12-26 .
-
- Posts: 264
- Joined: 09 Mar 2019, 01:52
- Location: Germany
- Contact:
Re: Identifying unique strings in a block of text
I thought that, too. But there are not so many "hits" in your example.
Launch italics.ahk. Open "page_text_after_proofreading.txt" in an editor (I prefer Notepad++). Click on the text. Press Alt+F12.
After that, search for a ' and you will find one at every changed position.
Did you change the files? Yesterday I had other ones. No matter.
First, you don't need the "dict:=" in lib.ahk anymore. This file is only the dictionary. You could name it lib.txt, so anybody can add more substitutions. So please remove everything (don't forget the } at the end) except the pairs, written without quotes.
Follow the steps above and ta-da!
Re: Identifying unique strings in a block of text
Kobaltauge wrote: ↑19 Sep 2019, 03:02
Launch italics.ahk. Open "page_text_after_proofreading.txt" in an editor (I prefer Notepad++). Click on the text. Press Alt+F12.
After that, search for a ' and you will find one at every changed position.
Did you change the files? Yesterday I had other ones. No matter.
First, you don't need the "dict:=" in lib.ahk anymore. This file is only the dictionary. You could name it lib.txt, so anybody can add more substitutions. So please remove everything (don't forget the } at the end) except the pairs, written without quotes.
I use TextPad, which predates Notepad++. I have both, but prefer TP because I have been using it for about 14 years. They are very close in features, but TP is better at some things.
I added some additional words, and noted the encoding and line termination on the top line. The script works exactly as you mentioned. The results are identical to yours. Unfortunately, the text contains over 100 references that match the dictionary, and the few that were marked are incorrect.
Re: Identifying unique strings in a block of text
Sorry, I don't understand the problem.
I took a closer look and found this:
Unfortunately, the script is doing the right thing there. In the lib, the needle "Kingsborough's Mex. Antiq" has no "." at the end. Therefore the needle differs from the text and is not found. I think that was your original problem: you did not want a needle to be found inside other words.
Re: Identifying unique strings in a block of text
I know that there are duplicates because of typos and missing punctuation. Please ignore these. I only want to enclose those that are identical to the library. There are numerous variations of the same reference source, and I try to include all of them. The book was published in 1883, and these are the result of poor typesetting and proofreading. Also, please don't waste your time on it, and take a break.
Re: Identifying unique strings in a block of text
Kobaltauge, thanks for all your help. For the time being I have reworked my original StrReplace() version so that the references are sorted in descending order by length. Everything gets enclosed, but duplicate words are enclosed both ways. I am also considering using InStr(). Being able to tell the cases apart is the key problem.
This is a part of the reworked code:
it := "''"
in_put := "Jalisco or Nueva Galicia. Cartog. Pac. Coast"
out_put := it . in_put . it
clipwait, 5
clipboard := strreplace(clipboard, in_put, out_put)
in_put := "Provisiones, Cedulas, Instrumentos, etc"
out_put := it . in_put . it
clipwait, 5
clipboard := strreplace(clipboard, in_put, out_put)
in_put := "Alfonso el Sabio, Laz Siete, Partidas"
out_put := it . in_put . it
clipwait, 5
clipboard := strreplace(clipboard, in_put, out_put)
in_put := "Brasseur de Bourbourg, Hist. Nat. Civ"
out_put := it . in_put . it
clipwait, 5
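The repeated blocks above could also be written as one loop. This is only a minimal sketch in AutoHotkey v1.1 syntax, assuming the references are kept one per line in a string. `EncloseAll()` and `LongestFirst()` are hypothetical names, not part of the original script. Sorting longest first only controls the order of the replacements. It does not stop a short reference from being enclosed a second time inside a longer one.

```autohotkey
; Usage: two references taken from the snippet above, one per line.
refs := "Provisiones, Cedulas, Instrumentos, etc`nJalisco or Nueva Galicia. Cartog. Pac. Coast"
clipboard := EncloseAll(clipboard, refs)
return

; Hypothetical helper: encloses every reference found in text with the marker it.
EncloseAll(text, refs, it := "''")
{
    Sort, refs, F LongestFirst          ; longest reference first
    Loop, Parse, refs, `n, `r
    {
        if (A_LoopField = "")
            continue                    ; skip blank lines
        text := StrReplace(text, A_LoopField, it . A_LoopField . it)
    }
    return text
}

; Comparison function for Sort: a negative result puts a before b.
LongestFirst(a, b)
{
    return StrLen(b) - StrLen(a)
}
```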
Re: Identifying unique strings in a block of text
I had another idea. You could search with InStr(). Then compare the result with RegEx. If it matches, go on. If it doesn't match, a pop-up appears with the option to add the new "needle" to the dictionary.
Probably we could build in a way to correct it, too.
As I'm writing this, I realize I am trying to reprogram Word's autocorrect.
Re: Identifying unique strings in a block of text
Kobaltauge wrote: ↑21 Sep 2019, 04:06
I had another idea. You could search with InStr(). Then compare the result with RegEx. If it matches, go on. If it doesn't match, a pop-up appears with the option to add the new "needle" to the dictionary.
Probably we could build in a way to correct it, too.
As I'm writing this, I realize I am trying to reprogram Word's autocorrect.
It was 5am EST when your post notification arrived, and by that time I was pretty incoherent, so only now did I get the chance to reply. I had an idea about InStr(). It finds everything without a problem, but two subsearches are needed (using the position of the main InStr) to decide whether the quotes should be added. The subsearches check the previous 5-10(?) and the following 5-10(?) characters to see whether they contain the quotes, and then act on it. I've never done a nested If() in AHK.
P.S.: The more I think about it, only the characters just before the start of the string and just after its end need to be checked to see whether it is enclosed. This is the first thing to test.
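The check described in the P.S. could look roughly like this. It is a sketch under assumptions, in AutoHotkey v1.1 syntax: `IsEnclosed()` is a hypothetical helper, and the sample text is made up for illustration. It looks only at the two characters directly before the hit and the two directly after it.

```autohotkey
; Usage: walk through every occurrence of the needle with InStr().
text := "see ''Mex. Antiq'' and Mex. Antiq"
needle := "Mex. Antiq"
pos := 0
while (pos := InStr(text, needle, false, pos + 1))
{
    state := IsEnclosed(text, pos, StrLen(needle)) ? "already enclosed" : "needs quotes"
    MsgBox % "Position " . pos . ": " . state
}
return

; Hypothetical helper: true if the hit at pos (len characters long)
; has the quote marker directly before and directly after it.
IsEnclosed(haystack, pos, len, quote := "''")
{
    q := StrLen(quote)
    if (pos <= q)                       ; not enough room for a marker in front
        return false
    return (SubStr(haystack, pos - q, q) = quote) && (SubStr(haystack, pos + len, q) = quote)
}
```

No nested If() is needed this way, because both comparisons are joined with && in a single expression.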
Re: Identifying unique strings in a block of text
One problem was that InStr() catches only the first occurrence. I tested a few RegEx patterns but didn't get the right combination.
With a little help from Google I found this https://autohotkey.com/board/topic/115744-instr-for-multiple-occurrences and could modify "our" script.
findstr() finds all starting positions of the needle. For every position, the script gets the string and the 2 characters before and after it. After that it checks whether these four characters are ''. If not, they are added.
Now lib.ahk should contain only the "needles".
;2019-09-18 12:29 AM
;===================
;this file is #included in the autoexec.ahk
;italics.ahk
;Stolen from https://autohotkey.com/board/topic/115744-instr-for-multiple-occurrences/#entry669623
findstr(h,n,ic=1) ; h=haystack, n=needle ,ic=ignore case
{
    ; collect the start position of every occurrence of n in h, separated by spaces
    while pos := regexmatch(h,(ic?"i)":"")"\Q" n "\E",m,a_index=1?1:pos+strlen(m))
        fp .= pos " "
    return trim(fp)
}
; Alt F12
!f12::
critical, on
autotrim, on
clipboard := "" ; empty the clipboard so that clipwait below really waits for the copy
send, ^a^c ; select all text on the page and copy it
dict := {}
Loop, Read, lib.ahk
{
    row := StrSplit(A_LoopReadLine, ":")
    key := Trim(row[1])
    if (key != "") ; skip blank lines, an empty needle would never stop matching
        dict[key] := Trim(row[2])
}
clipwait, 5
;code ------------------------------------------------------------
for in_put, out_put in dict
{
    strposis := StrSplit(findstr(clipboard, in_put), " ")
    ; go through the hits from last to first, because adding quotes shifts every later position
    Loop % strposis.Length()
    {
        strpos := strposis[strposis.Length() - A_Index + 1]
        if (strpos = "")
            continue
        match := SubStr(clipboard, strpos, StrLen(in_put))
        ; the 2 characters before and after the hit (a start below 1 would count from the end of the text)
        before := (strpos > 2) ? SubStr(clipboard, strpos-2, 2) : ""
        after := SubStr(clipboard, strpos+StrLen(in_put), 2)
        ; insert the quotes at exactly this position instead of replacing by pattern
        if (before != "''" || after != "''")
            clipboard := SubStr(clipboard, 1, strpos-1) . "''" . match . "''" . SubStr(clipboard, strpos+StrLen(in_put))
    }
}
;cleanup ---------------------------------------------------------
;if there are quadruple single quotes
in_put :="''''"
out_put :="''"
clipboard :=strreplace(clipboard, in_put, out_put)
;underscore does not exist in the original
in_put :="_"
out_put :=""
clipboard :=strreplace(clipboard, in_put, out_put)
send, ^v
critical, off
return
*Esc::
ExitApp
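As an alternative to collecting positions first, the same idea can be sketched as a single RegExReplace() call with lookarounds. This is an assumption-laden sketch in AutoHotkey v1.1 syntax, not the script used in this thread: `EncloseOnce()` is a hypothetical name and the sample text is made up. Note one difference: a hit that has '' on only one side is left alone here, whereas the script above would add quotes to it.

```autohotkey
; Usage with a made-up sample text.
sample := "Kingsborough's Mex. Antiq., and ''Kingsborough's Mex. Antiq'' again"
MsgBox % EncloseOnce(sample, "Kingsborough's Mex. Antiq")
return

; Hypothetical helper: encloses every occurrence of needle that is
; neither preceded nor followed by two single quotes.
EncloseOnce(text, needle)
{
    if (needle = "")
        return text                     ; nothing to search for
    ; i)       ignore case
    ; (?<!'')  the hit must not be preceded by ''
    ; \Q...\E  treat the needle as literal text, so "." matches only a period
    ; (?!'')   the hit must not be followed by ''
    ; $0       the text that actually matched
    return RegExReplace(text, "i)(?<!'')\Q" . needle . "\E(?!'')", "''$0''")
}
```

Because all occurrences are handled in one call, no stored position can go stale after quotes are inserted.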
Re: Identifying unique strings in a block of text
Wow, thank you. I was also wondering if we could return to your earlier method. It's much easier to manage. I will let you know my results soon.
Re: Identifying unique strings in a block of text
I can't believe it. Thank you.
I had an error message because #Warn was enabled in the Autoexec.ahk. Disabled it and the results are instantaneous.
- Attachments
-
- fp_undeclared_local_var.jpg (240.89 KiB) Viewed 1119 times
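Judging only by the attachment's file name, the warning was probably about the local variable fp in findstr() being read before anything was assigned to it. If that assumption is right, #Warn could stay enabled by giving fp and pos a starting value, as in this sketch of the same function in AutoHotkey v1.1 syntax:

```autohotkey
findstr(h, n, ic := 1) ; h=haystack, n=needle, ic=ignore case
{
    fp := ""    ; start with an empty result so that "fp .=" never reads an unset variable
    pos := 0
    if (n = "")
        return fp   ; an empty needle would match forever
    ; first search starts at 1, every later search starts right after the previous hit
    while pos := RegExMatch(h, (ic ? "i)" : "") . "\Q" . n . "\E", m, pos ? pos + StrLen(m) : 1)
        fp .= pos . " "
    return Trim(fp)
}
```

The return value is unchanged: the start positions of all occurrences, separated by spaces.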
Re: Identifying unique strings in a block of text
Kobaltauge, I removed two standalone references, "Id" and "Mex", because they could not be uniquely identified. I also inserted "sleep, 10" in the "for" loop, because otherwise the script either ran through instantly and did nothing, or inserted the '' quotes everywhere. Since then it's been a pleasure to work with. Many thanks for your help.
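for in_put, out_put in dict
{
    strposis := StrSplit(findstr(clipboard, in_put), " ")
    ; go through the hits from last to first, because adding quotes shifts every later position
    Loop % strposis.Length()
    {
        strpos := strposis[strposis.Length() - A_Index + 1]
        if (strpos = "")
            continue
        match := SubStr(clipboard, strpos, StrLen(in_put))
        ; the 2 characters before and after the hit (a start below 1 would count from the end of the text)
        before := (strpos > 2) ? SubStr(clipboard, strpos-2, 2) : ""
        after := SubStr(clipboard, strpos+StrLen(in_put), 2)
        if (before != "''" || after != "''")
            clipboard := SubStr(clipboard, 1, strpos-1) . "''" . match . "''" . SubStr(clipboard, strpos+StrLen(in_put))
    }
    sleep, 10
}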
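References like "Mex" can be spotted before running the script, because they also occur inside a longer reference such as "Kingsborough's Mex. Antiq". A minimal sketch in AutoHotkey v1.1 syntax, assuming the needles are available as an array: `AmbiguousNeedles()` is a hypothetical helper, and the list below is only an example. It compares needles with each other, so it will not catch a needle like "Id" that merely appears inside ordinary words of the text.

```autohotkey
; Usage with an example list.
needles := ["Id", "Mex", "Kingsborough's Mex. Antiq", "Brasseur de Bourbourg, Hist. Nat. Civ"]
for k, n in AmbiguousNeedles(needles)
    MsgBox % n . " is part of a longer reference"
return

; Hypothetical helper: returns every needle that also occurs
; inside a longer needle of the same list.
AmbiguousNeedles(needles)
{
    result := []
    for i, small in needles
    {
        for j, big in needles
        {
            if (i != j && StrLen(big) > StrLen(small) && InStr(big, small))
            {
                result.Push(small)
                break                   ; one longer needle is enough
            }
        }
    }
    return result
}
```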