remove lines present in both long sorted lists

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys

Re: remove lines present in both long sorted lists

Post by mikeyww » 03 Dec 2022, 17:31

To remove from both strings, set var3 equal to var1. When you remove a string from var2, remove it from var3. When I tested this, the time elapsed was the same.

It's possible that your supercomputer is old, and you just need to get a new one. :)

Post by Smile_ » 03 Dec 2022, 20:59

@Terka

Code:

SetBatchLines, -1

C := FileOpen("LISTC.txt", "w")     ; Output
D := {}

TK := A_TickCount
Loop, Read, LISTA.txt               ; Input A
    D["" A_LoopReadLine ""] := ""

Loop, Read, LISTB.txt               ; Input B
    D["" A_LoopReadLine ""] := ""

For Line in D                       ; Every distinct line from A and B (their union)
    C.Write(Line "`n")

C.Close()                           ; Flush and close the output file

Msgbox, % (A_TickCount - TK) / 1000 " s"

Post by mikeyww » 03 Dec 2022, 21:51

It's fast, but I don't think it meets the goal.

Post by Smile_ » 04 Dec 2022, 07:11

mikeyww wrote:
03 Dec 2022, 21:51
It's fast, but I don't think it meets the goal.

I edited the script above. It took about 16 seconds on two files of 70k lines each, with lines about 140 characters long and roughly 75% of the lines differing between the files.

Post by mikeyww » 04 Dec 2022, 08:16

Is yours doing a union? I think it's supposed to be removing the duplicates.

This revision does seem to get the elapsed time down to approximately zero seconds.

Code:

SetBatchLines -1
dir    = %A_ScriptDir%
out   := FileOpen(dir "\result.txt", "w `n")
line  := {}
start := A_TickCount
Loop, Read, %dir%\LISTA.txt
 line[A_LoopReadLine] := True
Loop, Read, %dir%\LISTB.txt
 (!line.HasKey(A_LoopReadLine)) && out.WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Elapsed time, % A_TickCount - start " ms"

[Image: Output (image221204-0837-001.png)]

Post by Smile_ » 04 Dec 2022, 10:21

You don't have to worry about duplicates, because a key's value is simply overwritten when that key is already defined (as I noticed).
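
To illustrate the overwrite behavior, here is a minimal sketch, assuming AutoHotkey v1.1. The sample values and the names `D`, `text`, and `count` are made up for illustration and are not taken from the scripts in this thread.

```autohotkey
; Sketch only: repeated lines collapse into a single key.
D := {}
for _, text in ["apple", "pear", "apple"]   ; "apple" appears twice
    D["" text ""] := ""                     ; second assignment overwrites the existing key

count := 0
for key in D                                ; enumerate the keys that were actually stored
    count++
MsgBox % count " unique keys"               ; shows "2 unique keys"
```

This only shows that duplicates collapse into one key; it does not decide which lines should be kept or dropped. As far as I know, string keys in v1 objects are also case-insensitive, so lines that differ only in letter case would collapse as well.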

Post by mikeyww » 04 Dec 2022, 10:36

It seems that your approach pools the lines instead of omitting any of them. My script omits the "intersecting" lines.

Post by Smile_ » 04 Dec 2022, 11:07

mikeyww wrote:
04 Dec 2022, 10:36
It seems that your approach pools the lines instead of omitting any of them. My script omits the "intersecting" lines.

That depends on the OP's needs. Maybe I misunderstood the OP.

Post by mikeyww » 04 Dec 2022, 11:09

Maybe. This is what I understood from the original post: "Would like to remove lines that are identical in both lists."

Post by malcev » 04 Dec 2022, 11:40

mikeyww, you have a bug in your code: keys that look like numbers are treated as numbers, so 0 and 00 count as the same line.

Code:

a := 0
b := 00
line  := {}
line[a] := True
msgbox % line.HasKey(b)
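
For comparison, a small sketch of the same test with forced string keys, assuming AutoHotkey v1.1. The object names `asNumber` and `asString` are hypothetical. If I read the v1 documentation on object keys correctly, a value concatenated with a quoted literal string is treated as purely non-numeric, so the first result should be true and the second false.

```autohotkey
; Sketch only: numeric-looking keys versus forced string keys.
a := 0
b := 00

asNumber := {}
asNumber[a] := True            ; numeric-looking key, stored as an integer

asString := {}
asString[a ""] := True         ; appending "" forces a string key

MsgBox % "Numeric keys match: " asNumber.HasKey(b)
       . "`nString keys match: " asString.HasKey(b "")
```

The same `""` concatenation has to be used both when storing and when looking up a key; otherwise the two sides would not agree on the key type.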

Post by mikeyww » 04 Dec 2022, 11:46

Fair enough.

Code:

SetBatchLines -1
dir    = %A_ScriptDir%
out   := FileOpen(dir "\result.txt", "w `n")
line  := {}
start := A_TickCount
Loop, Read, %dir%\LISTA.txt
 line[A_LoopReadLine ""] := True
Loop, Read, %dir%\LISTB.txt
 (!line.HasKey(A_LoopReadLine "")) && out.WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Elapsed time, % A_TickCount - start " ms"

Credit to Smile_ & malcev.

Post by Smile_ » 04 Dec 2022, 11:52

Another thing: try it with this example (I got a blank result).

LISTA.txt:

Code:

1
2
4
5
11
12

LISTB.txt:

Code:

1
2
11
12

It's supposed to give this, right?

Code:

4
5

Post by mikeyww » 04 Dec 2022, 11:53

No. The script removes from list B the lines that are in list A.

When the script loops through list B, it writes the output only if list A did not contain that line (key).
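
The one-sided behavior can be sketched with in-memory arrays standing in for the two files, assuming AutoHotkey v1.1. The values come from the LISTA.txt and LISTB.txt example above; the names `listA`, `listB`, `seen`, and `kept` are made up for illustration.

```autohotkey
; Sketch only: keep the lines of list B that list A does not contain.
listA := ["1", "2", "4", "5", "11", "12"]
listB := ["1", "2", "11", "12"]

seen := {}
for _, text in listA
    seen[text ""] := True               ; remember every line of list A as a string key

kept := ""
for _, text in listB
    if (!seen.HasKey(text ""))          ; true only for lines missing from list A
        kept .= text "`n"

MsgBox % kept = "" ? "(nothing left from list B)" : kept
```

Every line of list B also occurs in list A here, so nothing is kept, which matches the blank result. Lines 4 and 5 exist only in list A and would show up only if the roles of the two lists were swapped.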

Post by Smile_ » 04 Dec 2022, 12:12

Yes, I got you: the result is LISTB.txt without any of the lines that also appear in LISTA.txt.
By "Supposed to give this, right?" I meant what the OP wants:
"Hi all, have 2 lists, both sorted. Would like to remove lines that are identical in both lists."
So I thought he wanted to remove the identical lines from both sides (LISTA.txt & LISTB.txt) and keep only the lines that are not shared.

Post by mikeyww » 04 Dec 2022, 12:14

Yes, you are right. I showed an example for one side. The other side could be done in another 32 ms! Or you could build an array holding the intersection and then write only the lines that are not in it.

Technique below: each text line becomes a key in an associative array, and under that key a second-level key is added for each list that contains the line. After both lists are read, a text line with two second-level keys is present in both lists. Therefore, a line is written to the output only when its number of second-level keys is less than two, because that means the line is not present in both lists.

Code:

SetBatchLines -1
dir  = %A_ScriptDir%
out := [], line := [], start := A_TickCount
For each, fn in input := ["LISTA", "LISTB"] {
 Loop, Read, %dir%\%fn%.txt
  line[A_LoopReadLine "", fn] := True
 out[fn] := FileOpen(dir "\" fn "-result.txt", "w `n")
}
For each, fn in input
 Loop, Read, %dir%\%fn%.txt
  (line[A_LoopReadLine ""].Count() < input.Count()) && out[fn].WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Time elapsed, % A_TickCount - start " ms"

[Image: Output (image221204-1356-001.png)]

Post by Descolada » 04 Dec 2022, 15:24

My naive approach which probably sucks:

Code:

SetBatchLines -1

; Assumes sorted arrays with no duplicates
arr1 := [1,2,4,5,11,12], arr2 := [1,2,11,12], new1 := [], new2 := []
;arr1 := ["line 1", "line 2", "line3"], arr2 := ["line 1", "line 2", "line8"]

i := 1, j := 1
Loop {
    if (i > arr1.MaxIndex()) {
        j--
        loop, % arr2.MaxIndex()-j
            new2.Push(arr2[j+A_Index])
        break
    }
    if (j > arr2.MaxIndex()) {
        i--
        loop, % arr1.MaxIndex()-i
            new1.Push(arr1[i+A_Index])
        break
    }
    if (arr1[i] == arr2[j])
        i++, j++
    else {
        if (arr1[i] < arr2[j])
            new1.Push(arr1[i]), i++
        else
            new2.Push(arr2[j]), j++
    }
}

out1 := ""
for _, v in new1
    out1 .= v ", "
out1 := SubStr(out1, 1, -2)

out2 := ""
for _, v in new2
    out2 .= v ", "
out2 := SubStr(out2, 1, -2)

MsgBox, % "First array: " out1 "`nSecond array: " out2

Post by mikeyww » 04 Dec 2022, 16:42

For some reason, it did not seem to work when I tried it with text files.

Post by Terka » 04 Dec 2022, 17:47

@mikeyww 400 ms, great!! Thank you very much!
If you ever need to learn a sport, send me a message; I can help.

Post by mikeyww » 04 Dec 2022, 17:50

Even if you tried to teach me, it wouldn't help in my case! :D

Post by Terka » 05 Dec 2022, 02:11

Everybody is an expert in a different area ;)
