remove lines present in both long sorted lists
Hi all, I have 2 lists, both sorted. I would like to remove the lines that are identical in both lists.
Can somebody please help with an algorithm that is more effective than m*n (comparing each line from the 1st list with each line from the 2nd list)?
Thank you!!
Re: remove lines present in both long sorted lists
The answer will depend on your definition of "more effective".
Re: remove lines present in both long sorted lists
https://biga-ahk.github.io/biga.ahk/#/?id=intersection or https://biga-ahk.github.io/biga.ahk/#/?id=difference may work
They don't employ any superfast tricks or hacks though.
Code: Select all
A := new biga() ; requires https://github.com/biga-ahk/biga.ahk
linesArr1 := ["line 1", "line 2", "line3"]
linesArr2 := ["line 1", "line 2", "line8"]
presentOnly := A.concat(A.difference(linesArr1, linesArr2), A.difference(linesArr2, linesArr1))
; => ["line3", "line8"]
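For comparison, here is a library-free sketch of the same symmetric difference in plain AutoHotkey v1. It uses objects as lookup tables, so each line is looked up by key instead of being compared against every line of the other list. The sample data and all variable names (list1, list2, seen1, seen2, only1, only2) are made up for illustration. Note that object keys are case-insensitive by default, so this assumes that lines differing only in case may be treated as equal.
Code: Select all
; Hypothetical sample data; real lists would come from FileRead.
list1 := "a`nb`n1`nc`nd"
list2 := "a`nc`nd`ne`nf"

; Pass 1: index every line of each list as an object key.
seen1 := {}, seen2 := {}
Loop, Parse, list1, `n, `r
    seen1[A_LoopField] := true
Loop, Parse, list2, `n, `r
    seen2[A_LoopField] := true

; Pass 2: keep only the lines that the other list does not contain.
only1 := "", only2 := ""
Loop, Parse, list1, `n, `r
    if (!ObjHasKey(seen2, A_LoopField))
        only1 .= A_LoopField "`n"
Loop, Parse, list2, `n, `r
    if (!ObjHasKey(seen1, A_LoopField))
        only2 .= A_LoopField "`n"

MsgBox % "Only in list 1:`n" only1 "`nOnly in list 2:`n" only2
The work grows with m + n lookups rather than m*n comparisons, and the lists do not even need to be sorted for this to work.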
Re: remove lines present in both long sorted lists
Thank you, Chunjee! Tested, but it's not very fast when the lists are long.
Re: remove lines present in both long sorted lists
Any other suggestions, please?
- flyingDman
- Posts: 2791
- Joined: 29 Sep 2013, 19:01
Re: remove lines present in both long sorted lists
This, perhaps? (Tested on a 10,000-word list; it took ~1600 ms.)
Code: Select all
var1 =
(
a
b
1
c
d
)
var2 =
(
a
c
d
e
f
g
h
j
)
var1 := strreplace(var1,"`n",",")
var2 := strreplace(var2,"`n",",")
for x,y in strsplit(var1,",")
    if y not in %var2%
        nvar1 .= y "`n"
msgbox % nvar1
for x,y in strsplit(var2,",")
    if y not in %var1%
        nvar2 .= y "`n"
msgbox % nvar2
14.3 & 1.3.7
Re: remove lines present in both long sorted lists
Thank you, but for my lists it still takes 40 minutes.
Re: remove lines present in both long sorted lists
The .= operator can become quite slow with larger variables; using VarSetCapacity() will speed things up, so
VarSetCapacity(nvar1, 10240000) ; ~10 MB should help
or try StrLen: VarSetCapacity(nvar1, StrLen(var1)) (the new output will be smaller than the original, so there is no need to go bigger)
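A minimal sketch of that pre-allocation, assuming AutoHotkey v1. The sample text in var1 is made up. VarSetCapacity() counts bytes rather than characters, so on a Unicode build the length is doubled here; that adjustment is an addition to the suggestion above, not part of it.
Code: Select all
; Hypothetical input; in the thread this would be the text of the first list.
var1 := "a`nb`nc"

; Reserve room for the whole result up front so that .= does not have to
; grow the variable repeatedly. The output here is at most the input
; plus one trailing newline.
VarSetCapacity(nvar1, (StrLen(var1) + 1) * (A_IsUnicode ? 2 : 1))

Loop, Parse, var1, `n, `r
    nvar1 .= A_LoopField "`n"
MsgBox % nvar1
This only removes the cost of repeatedly growing the output variable; it does not change the number of comparisons the loops make.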
Re: remove lines present in both long sorted lists
If you're not tied to AutoHotkey-only solutions, I would look into awk, a command-line utility.
Re: remove lines present in both long sorted lists
The original code is this:
If a match is found, how can I remove the found line from the 2nd list (contentD)?
Code: Select all
Loop, Parse, contentS, `n, `r
{
    StringSplit, arrayS, A_LoopField, |
    Loop, Parse, contentD, `n, `r
    {
        StringSplit, arrayD, A_LoopField, |
        If (arrayS1 = arrayD1) { ; if same files
            If (arrayS2 <> arrayD2) { ; if different paths
                FileMove, % driveD . arrayD2, % driveD . arrayS2, 0 ;-(1 = overwrite)
                Break
            }
        }
    }
}
Re: remove lines present in both long sorted lists
How long are your lists, and what is the length (min, max) of each line?
Re: remove lines present in both long sorted lists
40 mins, my goodness!
Have you remembered to use SetBatchLines, -1?
Re: remove lines present in both long sorted lists
SetBatchLines, -1 is there.
The lists are ~70k lines.
I think that if we remove the found line from the 2nd list (contentD), this will speed up the process.
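One way to avoid rescanning contentD at all, rather than deleting lines from it, might be a lookup object keyed by file name. The sketch below assumes AutoHotkey v1, the "name|path" line format from the original code, and that each file name occurs at most once in contentD (with duplicates, only the last path would be kept). The sample data and the names pathD, parts and moves are hypothetical, and the actual FileMove is left out; the planned moves are only collected as text.
Code: Select all
; Hypothetical sample data in the thread's "name|path" format.
contentS := "a.txt|\new\a.txt`nb.txt|\docs\b.txt"
contentD := "a.txt|\old\a.txt`nb.txt|\docs\b.txt"

; Build the lookup once: file name -> path found in contentD.
pathD := {}
Loop, Parse, contentD, `n, `r
{
    parts := StrSplit(A_LoopField, "|")
    if (parts[1] != "")
        pathD[parts[1]] := parts[2]
}

; One pass over contentS; each lookup replaces a full scan of contentD.
moves := ""
Loop, Parse, contentS, `n, `r
{
    parts := StrSplit(A_LoopField, "|")
    ; Same file name, different path: this is where FileMove would go.
    if (ObjHasKey(pathD, parts[1]) && pathD[parts[1]] != parts[2])
        moves .= pathD[parts[1]] " -> " parts[2] "`n"
}
MsgBox % moves
With the sample data, only a.txt is reported, because b.txt has the same path in both lists. For two lists of ~70k lines this does roughly 140k key operations instead of billions of line comparisons.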
Re: remove lines present in both long sorted lists
Code: Select all
SetBatchLines -1
dir := A_ScriptDir
out = %dir%\result.txt
FileRead, var1, %dir%\test1.txt
FileRead, var2, %dir%\test2.txt
Sort, var1, U
Loop, Parse, var1, `n, `r
    lines++
var2 := "`n" var2 "`r"
start := A_TickCount
Loop, Parse, var1, `n, `r
    (Instr(var2, "`n" A_LoopField "`r"))
    && var2 := "`n" Trim(StrReplace(var2, "`n" A_LoopField "`r"), "`r`n") "`r"
var2 := Trim(var2, "`r`n")
sec := Round((A_TickCount - start) / 1000, 1)
FileRecycle, %out%
FileAppend, %var2%, %out%
; Run, %out%
MsgBox, 64, Result, Lines = %lines%`n`nSeconds = %sec%
Re: remove lines present in both long sorted lists
Thank you for helping, really appreciate it.
Lines in the list are ~120 chars long. I'm running your script, but it has been 10+ minutes and it is still running.
I have a Ryzen 3900X; could something else be wrong?
Re: remove lines present in both long sorted lists
700 seconds; yes, this is an improvement. Thank you!
But as I said, might there be something else wrong?
Re: remove lines present in both long sorted lists
My lists are all numbers or perhaps I have messed up some other way.
Re: remove lines present in both long sorted lists
Or could be both.
Last edited by mikeyww on 03 Dec 2022, 17:31, edited 2 times in total.
Re: remove lines present in both long sorted lists
I'm lost. Who is var3?