
Re: remove lines present in both long sorted lists

Posted: 03 Dec 2022, 17:31
by mikeyww
To remove from both strings, set var3 equal to var1. When you remove a string from var2, remove it from var3. When I tested this, the time elapsed was the same.

It's possible that your supercomputer is old, and you just need to get a new one. :)

Re: remove lines present in both long sorted lists

Posted: 03 Dec 2022, 20:59
by Smile_
@Terka

Code:

SetBatchLines, -1

C := FileOpen("LISTC.txt", "w")     ; Output
D := {}

TK := A_TickCount
Loop, Read, LISTA.txt               ; Input A
    D["" A_LoopReadLine ""] := ""

Loop, Read, LISTB.txt               ; Input B
    D["" A_LoopReadLine ""] := ""

For Line in D
    C.Write(Line "`n")

C := ""

Msgbox, % (A_TickCount - TK) / 1000 " s"

Re: remove lines present in both long sorted lists

Posted: 03 Dec 2022, 21:51
by mikeyww
It's fast, but I don't think it meets the goal.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 07:11
by Smile_
mikeyww wrote:
03 Dec 2022, 21:51
It's fast, but I don't think it meets the goal.
Edited the above. It took about 16 seconds on two files of 70k lines each, with lines ~140 characters long and ~75% of the lines differing.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 08:16
by mikeyww
Is yours doing a union? I think it's supposed to be removing the duplicates.

This revision does seem to get the elapsed time down to approximately zero seconds.

Code:

SetBatchLines -1
dir    = %A_ScriptDir%
out   := FileOpen(dir "\result.txt", "w `n")
line  := {}
start := A_TickCount
Loop, Read, %dir%\LISTA.txt
 line[A_LoopReadLine] := True
Loop, Read, %dir%\LISTB.txt
 (!line.HasKey(A_LoopReadLine)) && out.WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Elapsed time, % A_TickCount - start " ms"
[Screenshot: Output (image221204-0837-001.png)]

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 10:21
by Smile_
You don't have to worry about duplication, because a key's value is simply overwritten when the key is already defined (as far as I noticed).

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 10:36
by mikeyww
It seems that your approach pools the lines instead of omitting any of them. My script omits the "intersecting" lines.
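The difference between pooling and omitting can be seen on a tiny in-memory example. This is only a sketch: the sample words and the variable names (`listA`, `listB`, `inA`, `union`) are made up for illustration and do not come from the OP's files.

```autohotkey
; Sketch: union (pooling) versus difference (omitting), AutoHotkey v1.1.
; Sample data and names are hypothetical.
listA := ["apple", "banana", "cherry"]
listB := ["banana", "cherry", "date"]

inA := {}, union := {}
for _, item in listA
    inA[item ""] := True, union[item ""] := True   ; remember A's lines, and pool them
for _, item in listB
    union[item ""] := True                         ; pool B's lines; repeated keys are overwritten

unionText := "", diffText := ""
for key in union                                   ; every distinct line from either list
    unionText .= key " "
for _, item in listB                               ; only the lines of B that A does not contain
    if (!inA.HasKey(item ""))
        diffText .= item " "

MsgBox % "Union: " unionText "`nB without A: " diffText
```

The pooled result keeps the shared lines once, while the second result drops them entirely, which is the behavior the thread title asks for.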

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:07
by Smile_
mikeyww wrote:
04 Dec 2022, 10:36
It seems that your approach pools the lines instead of omitting any of them. My script omits the "intersecting" lines.
That depends on the OP's needs. Maybe I misunderstood the OP.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:09
by mikeyww
Maybe. This is what I understood from the OP's request:
"Would like to remove lines that are identical in both lists."

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:40
by malcev
mikeyww, you have a bug in your code:

Code:

a := 0
b := 00
line  := {}
line[a] := True
msgbox % line.HasKey(b)

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:46
by mikeyww
Fair enough.

Code:

SetBatchLines -1
dir    = %A_ScriptDir%
out   := FileOpen(dir "\result.txt", "w `n")
line  := {}
start := A_TickCount
Loop, Read, %dir%\LISTA.txt
 line[A_LoopReadLine ""] := True
Loop, Read, %dir%\LISTB.txt
 (!line.HasKey(A_LoopReadLine "")) && out.WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Elapsed time, % A_TickCount - start " ms"
Credit to Smile_ & malcev.
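For readers wondering why appending `""` helps: as far as I read the AutoHotkey v1.1 object documentation, a numeric-looking string held in a variable is used as an integer key, whereas a value concatenated with a quoted literal is treated as a plain string key. The sketch below contrasts the two; the names `plain`, `forced`, `foundPlain` and `foundForced` are illustrative and not taken from the scripts above.

```autohotkey
; AutoHotkey v1.1 sketch; all variable names are hypothetical.
a := "0", b := "00"

plain := {}
plain[a] := True                     ; numeric-looking value: should be stored under the integer key 0
foundPlain := plain.HasKey(b)        ; "00" should be read as the same integer key

forced := {}
forced[a ""] := True                 ; concatenation with "" should keep the key as the string "0"
foundForced := forced.HasKey(b "")   ; looks up the string "00" instead

MsgBox % "Plain keys: " foundPlain "`nString keys: " foundForced
```

If that reading of the documentation is right, the first lookup reports a match that was never stored and the second does not, which is why lines such as `0` and `00` need the string form to be told apart.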

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:52
by Smile_
Another thing: try it with this example (I got a blank result).

LISTA.txt:

Code:

1
2
4
5
11
12
LISTB.txt:

Code:

1
2
11
12
Supposed to give this right?

Code:

4
5

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 11:53
by mikeyww
No. The script removes from list B the lines that are in list A.

When the script loops through list B, it writes the output only if list A did not contain that line (key).

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 12:12
by Smile_
Yes, I got you: the LISTB.txt output no longer contains any line that is also in LISTA.txt, so what remains is entirely different lines.
By "Supposed to give this right?" I meant what the OP wants:
"Hi all, have 2 lists, both sorted. Would like to remove lines that are identical in both lists."
So I thought he would like to remove the identical lines from both sides (LISTA.txt & LISTB.txt) and leave only the lines that are not shared.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 12:14
by mikeyww
Yes, you are right. I showed an example for one side. The other side could be done in another 32 ms! Alternatively, one could build an array holding the intersection, and then write out the lines that are not in it.

Technique below: each text line becomes a key in the array, and under that key a second dimension records every list that contains the line. After both lists are read, a text line with two items in its second dimension is present in both lists. Therefore, a line is written only when its second dimension holds fewer than two items, because that means the line is not present in both lists.

Code:

SetBatchLines -1
dir  = %A_ScriptDir%
out := [], line := [], start := A_TickCount
For each, fn in input := ["LISTA", "LISTB"] {
 Loop, Read, %dir%\%fn%.txt
  line[A_LoopReadLine "", fn] := True
 out[fn] := FileOpen(dir "\" fn "-result.txt", "w `n")
}
For each, fn in input
 Loop, Read, %dir%\%fn%.txt
  (line[A_LoopReadLine ""].Count() < input.Count()) && out[fn].WriteLine(A_LoopReadLine)
out := ""
MsgBox, 64, Time elapsed, % A_TickCount - start " ms"
[Screenshot: Output (image221204-1356-001.png)]
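To see the counting technique without any files, here is a minimal in-memory sketch that uses the sample lists Smile_ posted earlier in the thread. The names `lists`, `seen` and `result` are hypothetical, and `Count()` assumes AutoHotkey v1.1.29 or later.

```autohotkey
; In-memory sketch of the counting technique (AutoHotkey v1.1.29+ for Count()).
; Names are illustrative; the data is the LISTA/LISTB example from the thread.
lists := {LISTA: [1, 2, 4, 5, 11, 12], LISTB: [1, 2, 11, 12]}

seen := {}
for name, items in lists
    for _, item in items
        seen[item "", name] := True          ; second dimension: which lists contain this line

result := ""
for name, items in lists {
    result .= name ":"
    for _, item in items
        if (seen[item ""].Count() < 2)       ; fewer than two lists contain it, so keep it
            result .= " " item
    result .= "`n"
}
MsgBox % result
```

With this data the message box should list 4 and 5 for LISTA and nothing for LISTB, matching the result Smile_ expected.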

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 15:24
by Descolada
My naive approach which probably sucks:

Code:

SetBatchLines -1

; Assumes sorted arrays with no duplicates
arr1 := [1,2,4,5,11,12], arr2 := [1,2,11,12], new1 := [], new2 := []
;arr1 := ["line 1", "line 2", "line3"], arr2 := ["line 1", "line 2", "line8"]

i := 1, j := 1
Loop {
    if (i > arr1.MaxIndex()) {
        j--
        loop, % arr2.MaxIndex()-j
            new2.Push(arr2[j+A_Index])
        break
    }
    if (j > arr2.MaxIndex()) {
        i--
        loop, % arr1.MaxIndex()-i
            new1.Push(arr1[i+A_Index])
        break
    }
    if (arr1[i] == arr2[j])
        i++, j++
    else {
        if (arr1[i] < arr2[j])    ; parentheses make sure this is parsed as an expression
            new1.Push(arr1[i]), i++
        else
            new2.Push(arr2[j]), j++
    }
}

out1 := ""
for _, v in new1
    out1 .= v ", "
out1 := SubStr(out1, 1, -2)

out2 := ""
for _, v in new2
    out2 .= v ", "
out2 := SubStr(out2, 1, -2)

MsgBox, % "First array: " out1 "`nSecond array: " out2

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 16:42
by mikeyww
For some reason, it did not seem to work when I tried it with text files.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 17:47
by Terka
@mikeyww 400 ms, great!! Thank you very much!
If you ever need to learn a sport, send me a message; I can help.

Re: remove lines present in both long sorted lists

Posted: 04 Dec 2022, 17:50
by mikeyww
Even if you tried to teach me, it wouldn't help in my case! :D

Re: remove lines present in both long sorted lists

Posted: 05 Dec 2022, 02:11
by Terka
Everybody is an expert in a different area ;)