How to find duplicate lines? Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

How to find duplicate lines?

16 Jan 2019, 07:51

Hello,

How do I find duplicate lines and assign duplicate lines to another variable?


For example,

Code: Select all

a := "
(
1
2
3
2
1
)"
Expected value of b is

1
2
2
1

Thanks.
sinkfaze
Posts: 616
Joined: 01 Oct 2013, 08:01

Re: How to find duplicate lines?  Topic is solved

16 Jan 2019, 08:56

Code: Select all

a=
(
1
2
3
2
1
)
temp :=	b:=	a	; pass contents to two variables
Sort, temp, U		; remove duplicates from temp variable
Loop, parse, temp, `n, `r	; parse through temp variable contents
{
	StrReplace(b,A_LoopField,A_LoopField,c)	; count number of occurrences
	if	(c=1)	; if line occurs only once, remove
		b :=	StrReplace(b,A_LoopField)
}
b :=	RegExReplace(RegExReplace(b,"`a)^\v+|\v+$"),"\v+","`n")	; whitespace cleanup
MsgBox %	b
return
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

Re: How to find duplicate lines?

16 Jan 2019, 23:37

Thank you very much.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: How to find duplicate lines?

16 Jan 2019, 23:47

afe wrote:
16 Jan 2019, 23:37
Thank you very much.
If this solved your problem, you might want to mark the topic as solved. :dance:
That helps other users too: when they see the topic is solved, they know the thread contains a working answer.
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

Re: How to find duplicate lines?

17 Jan 2019, 01:31

I know, I am still thinking about this answer.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: How to find duplicate lines?

17 Jan 2019, 22:37

afe wrote:
17 Jan 2019, 01:31
I know, I am still thinking about this answer.
Oops, I thought you had the answer you needed. Here is another reliable solution that is a little more straightforward.

Code: Select all

BadData =
(
Bad
Home
Bad
Love
Good
Heaven
Bad
)

msgbox % BadData

GoodData :=""

Loop, parse, BadData, `n, `r
If not InStr(GoodData, A_LoopField) 
{
   GoodData .= GoodData ? "`n" : ""
   GoodData .= A_LoopField
}

msgbox % GoodData
Below is the same script with explanatory comments.

Code: Select all

BadData =
(
Bad
Home
Bad
Love
Good
Heaven
Bad
)

msgbox % BadData ; show contents of variable

GoodData :=""  ; blank variable

Loop, parse, BadData, `n, `r  ; loop over BadData line by line ("`n" is the delimiter, "`r" is ignored); the current line is in A_LoopField
If not InStr(GoodData, A_LoopField)  ; if the text in A_LoopField does not already occur in GoodData...
{
   GoodData .= GoodData ? "`n" : ""  ; ternary operator: if GoodData already has content, append a newline, otherwise append nothing
   GoodData .= A_LoopField  ; append the current line (".=" concatenates)
}

msgbox % GoodData  ; show contents of variable
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

Re: How to find duplicate lines?

18 Jan 2019, 01:46

Thank you very much. But I want to find duplicates instead of removing duplicates.

Since my actual code needs to match two items in each line, I used the following code.

Code: Select all

a := "
(
1,1
2,2
3
2,3
1,1
)"

Loop, Parse, a, `n
{
	n := 0
	RegExMatch(A_LoopField, "^\d", x1)
	RegExMatch(A_LoopField, "(?<=,)\d", y1)

	Loop, Parse, a, `n
	{
		RegExMatch(A_LoopField, "^\d", x2)
		if ( x2 = x1 )
		{
			RegExMatch(A_LoopField, "(?<=,)\d", y2)
			if ( y2 = y1 )
			{
				++n
				if ( n = 2 )
				{
					r .= A_LoopField . "`n"
					break
				}
			}
		}
	}
}
r := RegExReplace(r, "`n$")
msgbox % r
return
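A minimal sketch of the same idea with a counting pass instead of a nested loop. It is not afe's method: it assumes the fields are separated by commas and compares the first two fields in full (the regex version above only looks at single digits). The names counts, f and key are made up for illustration.

```autohotkey
a := "
(
1,1
2,2
3
2,3
1,1
)"

; Pass 1: count how often each pair of leading fields occurs.
; The "k:" prefix forces string keys, so that numeric-looking keys
; such as "01" and "1" are not merged. Note that string keys in
; AHK v1 objects are case-insensitive.
counts := {}
Loop, Parse, a, `n, `r
{
	f := StrSplit(A_LoopField, ",")
	key := "k:" . f[1] . "," . f[2]	; f[2] is blank when the line has no comma
	counts[key] := counts.HasKey(key) ? counts[key] + 1 : 1
}

; Pass 2: keep every line whose pair occurs more than once, in original order.
r := ""
Loop, Parse, a, `n, `r
{
	f := StrSplit(A_LoopField, ",")
	if (counts["k:" . f[1] . "," . f[2]] > 1)
		r .= A_LoopField . "`n"
}
r := SubStr(r, 1, -1)	; drop the trailing newline
MsgBox % r
return
```

With the sample data, r should hold both 1,1 lines. Each line is read twice in total, rather than once per other line as in the nested loop.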
carno
Posts: 265
Joined: 20 Jun 2014, 16:48

Re: How to find duplicate lines?

18 Jan 2019, 03:15

This archived thread also provides a few other methods:
https://autohotkey.com/board/topic/9168 ... ith-regex/
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: How to find duplicate lines?

18 Jan 2019, 03:26

afe wrote:
18 Jan 2019, 01:46
Thank you very much. But I want to find duplicates instead of removing duplicates.

Since my actual code needs to match two items in each line, I used the following code.
If you are still looking for a simple way to solve it with the same method, the script below will work.
It finds duplicate rows in both your original and your new example. Keep in mind, however, that it parses row by row, using "`n" as the delimiter.

Code: Select all

BadData =
(
1
2
3
2
1
)

msgbox % BadData

GoodData :=""
Duplicate :=""

Loop, parse, BadData, `n, `r
{
	If not InStr(GoodData, A_LoopField) 
	{
		GoodData .= A_LoopField "`n"
	}
	else
	{
		Duplicate .= A_LoopField "`n"
	}
}

msgbox % GoodData
msgbox % Duplicate
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

Re: How to find duplicate lines?

18 Jan 2019, 03:47

Thank you very much.
However, I expect the value of Duplicate to be

1
2
2
1
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: How to find duplicate lines?

18 Jan 2019, 08:35

afe wrote:
18 Jan 2019, 03:47
Thank you very much.
However, I expect the value of Duplicate to be

1
2
2
1
Well, you do have some other options, such as eliminating duplicates and putting the duplicates in a variable. Just for fun, I wanted to see whether the same simple method can produce your expected answer. I don't know if it will be of any use, but as a thought experiment, here you go. :lol:

Code: Select all

BadData =
(
1
2
3
2
1
)

msgbox % BadData

GoodData :=""
Duplicate :=""
UniqueData :=""
AllDuplicates :=""

Loop, parse, BadData, `n, `r
{
	If not InStr(GoodData, A_LoopField) 
	{
		GoodData .= A_LoopField "`n"
	}
	else
	{
		Duplicate .= A_LoopField "`n"
	}
}
Loop, parse, GoodData, `n, `r
{
	If not InStr(Duplicate, A_LoopField) 
	{
		UniqueData .= A_LoopField "`n"
	}
}
Loop, parse, BadData, `n, `r
{
	If not InStr(UniqueData, A_LoopField) 
	{
		AllDuplicates .= A_LoopField "`n"
	}
}

msgbox % Duplicate
msgbox % UniqueData
msgbox % AllDuplicates
Last edited by SOTE on 20 Jan 2019, 02:36, edited 1 time in total.
afe
Posts: 615
Joined: 06 Dec 2018, 04:36

Re: How to find duplicate lines?

18 Jan 2019, 09:13

Thank you. I understand your algorithm. Since the first loop only deduplicates GoodData and does not isolate the lines that occur once, a second loop is needed to filter again.
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: How to find duplicate lines?

18 Jan 2019, 12:34

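I don't mean to pick on you, sinkfaze, but there is a flaw in the answer due to the use of StrReplace(): it matches substrings, so the 3 inside 32 is counted as a second occurrence of the line 3, and 3 is wrongly kept.
Try it on this input; the script below counts whole lines instead: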

Code: Select all

a=
(
1
2
3
32
2
1
)
b := a
elements := []	; maps each line to the number of times it occurs
loop, parse, b, `n, `r
	elements[A_LoopField] := elements[A_LoopField] ? elements[A_LoopField] + 1 : 1
for element, count in elements
	if (count = 1)	; the line occurs only once, so remove it as a whole line
		b := RegExReplace(b, "`am)^\Q" element "\E$\R*")
MsgBox % b
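To see the difference between substring counting and whole-line counting on this input, here is a small sketch. It only uses the count output parameters of StrReplace() and RegExReplace(); the variable names substrCount and lineCount are made up for illustration.

```autohotkey
a := "1`n2`n3`n32`n2`n1"

; StrReplace() counts every occurrence of "3", including the one inside "32".
StrReplace(a, "3", "3", substrCount)

; Anchoring with ^ and $ (multiline option "m") counts only lines that are exactly "3".
RegExReplace(a, "`am)^\Q3\E$", "", lineCount)

MsgBox % "StrReplace count: " substrCount "`nWhole-line count: " lineCount
return
```

This should report a StrReplace count of 2 and a whole-line count of 1, which is why the line 3 survives the c=1 test in the StrReplace() version.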
carno
Posts: 265
Joined: 20 Jun 2014, 16:48

Re: How to find duplicate lines?

19 Jan 2019, 02:43

I was using sinkfaze's version on a very long file (over 4,700 lines) and noticed that it treats not only identical lines as duplicates but also strings contained within other lines (in my longer file, spaces, hyphens and some other cases as well), as this short version demonstrates:

Code: Select all

#NoEnv
#SingleInstance Force

a := "
(
this is example1
this is example2
this is example3 - this is example2 - this is example1
this is example4 - this is example3 - this is example4
this is example3
)"

temp :=	b:=	a	; pass contents to two variables
Sort, temp, U		; remove duplicates from temp variable
Loop, parse, temp, `n, `r	; parse through temp variable contents
{
	StrReplace(b,A_LoopField,A_LoopField,c)	; count number of occurrences
	if	(c=1)	; if line occurs only once, remove
		b :=	StrReplace(b,A_LoopField)
}
b :=	RegExReplace(RegExReplace(b,"`a)^\v+|\v+$"),"\v+","`n")	; whitespace cleanup
MsgBox %	b
Clipboard := b
return
Last edited by carno on 19 Jan 2019, 11:59, edited 2 times in total.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: How to find duplicate lines?

19 Jan 2019, 03:06

carno wrote:
19 Jan 2019, 02:43
I was using sinkfaze's version for a very long file with over 4,700 lines and noticed it not only finds identical lines but also strings within those lines (and in my longer file version, space and dash/hyphen cases as duplicates) as demonstrated in this short version below:

Code: Select all

#NoEnv
#SingleInstance Force

a := "
(
Join
this is example1
this is example2
this is example3 - this is example2 - this is example1
this is example4 - this is example3 - this is example4
this is example3
)"

temp :=	b:=	a	; pass contents to two variables
Sort, temp, U		; remove duplicates from temp variable
Loop, parse, temp, `n, `r	; parse through temp variable contents
{
	StrReplace(b,A_LoopField,A_LoopField,c)	; count number of occurrences
	if	(c=1)	; if line occurs only once, remove
		b :=	StrReplace(b,A_LoopField)
}
b :=	RegExReplace(RegExReplace(b,"`a)^\v+|\v+$"),"\v+","`n")	; whitespace cleanup
MsgBox %	b
Clipboard := b
return
In your example, you don't really have duplicate lines. You have a partial duplicate: this is example3. There are arguably other partial duplicates, but that is debatable since they are not unique to that row. My version, which goes row by row, finds your partial duplicate and puts it in the Duplicate variable, but AllDuplicates stays blank because no row is duplicated in full. The great thing is that the different ways of solving the duplicate-line problem give you different solutions with various strengths and weaknesses.

If you had a fully duplicated row, as in the data below, the script I gave would find it: the partial duplicate goes into the Duplicate variable, and the fully duplicated rows go into AllDuplicates.

Code: Select all

this is example1
this is example2
this is example3 - this is example2 - this is example1
this is example4 - this is example3 - this is example4
this is example3
this is example4 - this is example3 - this is example4
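If partial matches are not wanted, one possible variation on the InStr() approach is to wrap both the stored lines and the search text in newlines, so that only whole rows can match. This is a sketch under that assumption, not SOTE's script; the variable Seen and the shortened sample data are made up for illustration.

```autohotkey
BadData =
(
this is example1
this is example3 - this is example1
this is example3
this is example1
)

Seen := "`n"	; every stored line is kept with a newline on both sides
Duplicate := ""

Loop, Parse, BadData, `n, `r
{
	; Searching for "`n" line "`n" can only match a complete stored line,
	; never a fragment of a longer one. InStr() is case-insensitive by default.
	if (InStr(Seen, "`n" . A_LoopField . "`n"))
		Duplicate .= A_LoopField . "`n"
	else
		Seen .= A_LoopField . "`n"
}

MsgBox % Duplicate
return
```

Here Duplicate should contain only this is example1. The row this is example3 is not reported, because it appears elsewhere only as part of a longer row.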
carno
Posts: 265
Joined: 20 Jun 2014, 16:48

Re: How to find duplicate lines?

19 Jan 2019, 12:32

I ran your version with my big file of over 4,700 lines. It seems case-sensitive and flexible with various options as you mentioned. Thanks! :)
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: How to find duplicate lines?

19 Jan 2019, 14:26

I have some content: FREQUENCY COUNT and REMOVE DUPLICATES, here:
jeeswg's objects tutorial - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=29232

It uses AHK's built-in associative array or a Scripting.Dictionary object to achieve the counts/remove duplicates.
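As a rough illustration of the associative-array approach (not copied from the tutorial), a frequency count can be followed by a second pass that keeps every line occurring more than once. The function name DuplicateLines and the "k:" key prefix are made up for this sketch.

```autohotkey
; Returns every line of text that occurs more than once, in original order.
DuplicateLines(text)
{
	; Pass 1: frequency count. The "k:" prefix forces string keys, so that
	; numeric-looking lines such as "01" and "1" are not merged. String keys
	; in AHK v1 objects are case-insensitive, so "Bad" and "bad" share a count.
	counts := {}
	Loop, Parse, text, `n, `r
	{
		key := "k:" . A_LoopField
		counts[key] := counts.HasKey(key) ? counts[key] + 1 : 1
	}

	; Pass 2: collect the lines whose count is greater than 1.
	out := ""
	Loop, Parse, text, `n, `r
	{
		if (counts["k:" . A_LoopField] > 1)
			out .= A_LoopField . "`n"
	}
	return SubStr(out, 1, -1)	; drop the trailing newline
}

MsgBox % DuplicateLines("1`n2`n3`n2`n1")
return
```

For the input from the first post this should show 1, 2, 2, 1 on separate lines. Because whole lines are used as keys, a line such as 32 does not affect the count for 3 or 2.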
sinkfaze
Posts: 616
Joined: 01 Oct 2013, 08:01

Re: How to find duplicate lines?

21 Jan 2019, 08:30

AlphaBravo wrote:
18 Jan 2019, 12:34
I don't mean to pick on you sinkfaze, but there is a flaw in the answer due to the use of StrReplace()
Not picking on me at all, I was tailoring an answer to the specific data without thinking about the bigger picture. Thanks for catching it!
