remove lines present in both long sorted lists

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Terka
Posts: 157
Joined: 05 Nov 2015, 04:59

Post by Terka » 01 Dec 2022, 19:14

Hi all, I have two lists, both sorted. I would like to remove the lines that are identical in both lists.
Can somebody please help with an algorithm that is more efficient than m*n (comparing each line of the 1st list with each line of the 2nd list)?
Thank you!!
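Because both lists are already sorted, one alternative to the m*n comparison is a single merge-style pass that walks through both lists together, which needs roughly m+n comparisons. Below is a minimal AutoHotkey v1 sketch of that idea, not a solution tested on the real data. The function name SortedDiff and its parameters are made up for illustration. It assumes that both lists are newline-separated and sorted in the same case-insensitive alphabetical order that AutoHotkey's default string comparison uses (for example, as produced by the Sort command with default options). If the lists were sorted by some other rule, matching lines could be missed.

```autohotkey
; Usage example (hypothetical data):
SortedDiff("a`nb`nc`nd", "a`nc`ne", only1, only2)
; only1 is now "b`nd`n" and only2 is "e`n"

; Walks two sorted, newline-separated lists once and collects the lines
; that appear in only one of them. Assumes both lists share the sort order
; used by the string comparison below.
SortedDiff(list1, list2, ByRef only1, ByRef only2)
{
    a := StrSplit(Trim(list1, "`r`n"), "`n", "`r")
    b := StrSplit(Trim(list2, "`r`n"), "`n", "`r")
    n1 := a.Length(), n2 := b.Length()
    i := 1, j := 1
    only1 := "", only2 := ""
    while (i <= n1 && j <= n2)
    {
        ; The "_" prefix forces a string comparison even when a line
        ; looks like a number.
        x := "_" . a[i]
        y := "_" . b[j]
        if (x = y)              ; same line in both lists: drop it from both
        {
            i++
            j++
        }
        else if (x < y)         ; a[i] cannot appear later in b: keep it
        {
            only1 .= a[i] . "`n"
            i++
        }
        else                    ; b[j] cannot appear later in a: keep it
        {
            only2 .= b[j] . "`n"
            j++
        }
    }
    ; Anything left over in either list has no partner in the other one.
    while (i <= n1)
    {
        only1 .= a[i] . "`n"
        i++
    }
    while (j <= n2)
    {
        only2 .= b[j] . "`n"
        j++
    }
}
```

Each line is looked at once, so the running time grows with the combined length of the lists rather than with their product.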

mikeyww
Posts: 26588
Joined: 09 Sep 2014, 18:38

Post by mikeyww » 01 Dec 2022, 20:31

The answer will depend on your definition of "more effective".

Chunjee
Posts: 1400
Joined: 18 Apr 2014, 19:05

Post by Chunjee » 01 Dec 2022, 21:07

https://biga-ahk.github.io/biga.ahk/#/?id=intersection or https://biga-ahk.github.io/biga.ahk/#/?id=difference may work

Code: Select all

A := new biga() ; requires https://github.com/biga-ahk/biga.ahk


linesArr1 := ["line 1", "line 2", "line3"]
linesArr2 := ["line 1", "line 2", "line8"]

presentOnly := A.concat(A.difference(linesArr1, linesArr2), A.difference(linesArr2, linesArr1))
; => ["line3", "line8"]
They don't employ any superfast tricks or hacks though.

Post by Terka » 02 Dec 2022, 04:50

Thank you, Chunjee! I tested it, but it is not very fast when the lists are long.

Post by Terka » 02 Dec 2022, 17:59

Any other suggestions, please?

flyingDman
Posts: 2791
Joined: 29 Sep 2013, 19:01

Post by flyingDman » 02 Dec 2022, 20:57

This, perhaps? (Tested on a 10,000-word list; it took ~1600 ms.)

Code: Select all

var1 =
(
a
b
1
c
d
)

var2 =
(
a
c
d
e
f
g
h
j
)

var1 := strreplace(var1,"`n",",")
var2 := strreplace(var2,"`n",",")

for x,y in strsplit(var1,",")
	if y not in %var2%
		nvar1 .= y "`n"
msgbox % nvar1

for x,y in strsplit(var2,",")
	if y not in %var1%
		nvar2 .= y "`n"
msgbox % nvar2

Post by Terka » 03 Dec 2022, 07:07

Thank you, but for my lists it still takes 40 minutes.

ahk7
Posts: 574
Joined: 06 Nov 2013, 16:35

Post by ahk7 » 03 Dec 2022, 07:44

The .= operator can become quite slow with larger variables; using VarSetCapacity() will speed things up, so
VarSetCapacity(nvar1, 10240000) ; ~10 MB should help
or size it with StrLen: VarSetCapacity(nvar1, StrLen(var1)) (the new output will be smaller than the original, so there is no need to go bigger)
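To make the preallocation idea concrete, here is a minimal AutoHotkey v1 sketch. FilterNotIn is a hypothetical helper that mirrors the filtering loop posted earlier in the thread; its only purpose is to show where VarSetCapacity goes. One point worth flagging: VarSetCapacity takes its size in bytes, and Unicode builds of AutoHotkey use 2 bytes per character, so the character count from StrLen is scaled accordingly. This may reduce reallocation during .= but it does not change the m*n search itself.

```autohotkey
; Returns the lines of var1 that do not occur in var2 (both "`n"-separated).
; Hypothetical helper for illustration; matching is case-insensitive.
FilterNotIn(var1, var2)
{
    ; Reserve the output buffer once. The result is at most about as long
    ; as var1, and capacity is given in bytes, not characters.
    bytesPerChar := A_IsUnicode ? 2 : 1
    VarSetCapacity(out, (StrLen(var1) + 1) * bytesPerChar)

    ; Wrap var2 in newlines so that only whole lines can match.
    haystack := "`n" . StrReplace(var2, "`r") . "`n"

    Loop, Parse, var1, `n, `r
    {
        if (!InStr(haystack, "`n" . A_LoopField . "`n"))
            out .= A_LoopField . "`n"   ; appends into the reserved buffer
    }
    return out
}
```

For example, FilterNotIn("a`nb`nc", "b") returns "a`nc`n".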

Post by ahk7 » 03 Dec 2022, 07:47

If you're not tied to AutoHotkey-only solutions, I would look into awk, a command-line utility.

Post by Terka » 03 Dec 2022, 08:45

The original code is below.
If a match is found, how can I remove the found line from the 2nd list (contentD)?

Code: Select all

Loop, Parse, contentS, `n, `r
{
   StringSplit, arrayS, A_LoopField, |
   Loop, Parse, contentD, `n, `r
   {
      StringSplit, arrayD, A_LoopField, |
      If (arrayS1 = arrayD1) { ; same file name
         If (arrayS2 <> arrayD2) { ; different paths
            FileMove, % driveD . arrayD2, % driveD . arrayS2, 0  ; 0 = do not overwrite (1 = overwrite)
            Break
         }
      }
   }
}
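Regarding the nested loops above: one way to avoid rescanning contentD for every line of contentS, without having to delete lines from it, is to read contentD once into an associative array keyed by the file name and then look each contentS entry up directly. A minimal AutoHotkey v1 sketch follows. PlanMoves is a hypothetical helper, and it makes three assumptions: the name|path line format from the code above, that each file name occurs only once in contentD (if there are duplicates, the first entry wins), and that case-insensitive name matching is acceptable, since AutoHotkey v1 object keys ignore case just as the = comparison does. It returns the planned moves instead of calling FileMove, so the result can be checked before any file is touched.

```autohotkey
; Returns an array of {from, to} pairs: files whose name appears in both
; lists but whose path differs. Hypothetical helper for illustration.
PlanMoves(contentS, contentD)
{
    ; Pass 1: index the destination list by file name.
    pathByName := {}
    Loop, Parse, contentD, `n, `r
    {
        f := StrSplit(A_LoopField, "|")
        key := "#" . f[1]   ; "#" keeps numeric-looking names as string keys
        if (f[1] != "" && !pathByName.HasKey(key))
            pathByName[key] := f[2]
    }
    ; Pass 2: one direct lookup per source line instead of an inner loop.
    moves := []
    Loop, Parse, contentS, `n, `r
    {
        f := StrSplit(A_LoopField, "|")
        key := "#" . f[1]
        if (f[1] != "" && pathByName.HasKey(key) && pathByName[key] != f[2])
            moves.Push({from: pathByName[key], to: f[2]})
    }
    return moves
}

; Possible use with the variables from the code above:
; for index, move in PlanMoves(contentS, contentD)
;     FileMove, % driveD . move.from, % driveD . move.to, 0
```

With two passes of this kind the work grows with the combined number of lines, not with their product.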

Post by flyingDman » 03 Dec 2022, 11:58

How long are your lists, and what is the length (min/max) of each line?

Post by Chunjee » 03 Dec 2022, 12:01

Terka wrote (03 Dec 2022, 07:07):
thank you, but still for my lists 40 minutes.

40 mins, my goodness :oops:

Have you remembered to use SetBatchLines, -1?

Post by Terka » 03 Dec 2022, 12:51

SetBatchLines, -1 is already there.
The lists are ~70k lines.
I think that if we remove the found line from the 2nd list (contentD), this will speed up the process.

Post by mikeyww » 03 Dec 2022, 14:08

Code: Select all

SetBatchLines -1
dir := A_ScriptDir
out  = %dir%\result.txt
FileRead, var1, %dir%\test1.txt
FileRead, var2, %dir%\test2.txt
Sort, var1, U
Loop, Parse, var1, `n, `r
 lines++
var2  := "`n" var2 "`r"
start := A_TickCount
Loop, Parse, var1, `n, `r
 (Instr(var2, "`n" A_LoopField "`r"))
  && var2 := "`n" Trim(StrReplace(var2, "`n" A_LoopField "`r"), "`r`n") "`r"
var2 := Trim(var2, "`r`n")
sec  := Round((A_TickCount - start) / 1000, 1)
FileRecycle, %out%
FileAppend, %var2%, %out%
; Run, %out%
MsgBox, 64, Result, Lines = %lines%`n`nSeconds = %sec%
[Image: Output (image221203-1520-002.png)]

Post by Terka » 03 Dec 2022, 16:50

Thank you for helping, I really appreciate it.
Lines in the list are ~120 chars long. I am running your script, but it has been 10+ minutes and it is still running.
I have a Ryzen 3900X; could something else be wrong?

Post by Terka » 03 Dec 2022, 16:54

700 seconds; yes, this is an improvement. Thank you!
But as I said, might something else be wrong?

Post by Terka » 03 Dec 2022, 16:59

Please, how can I modify the script so that the output is
the shortened test1 and test2 (with the identical lines removed)?
mikeyww wrote:
03 Dec 2022, 14:08


Post by mikeyww » 03 Dec 2022, 17:29

Or could be both. :)

