Change duplicate lines in file

smbs
Posts: 98
Joined: 27 Feb 2014, 11:07

01 Oct 2017, 09:33

I have a large text file that cannot be sorted (the line order must stay as is). I want to append text to every duplicate line (see the snippet below) so that no duplicate lines remain in the file.
Many thanks.

hhh  mike
some string
k kkpeter
some string 1
h uhjon
some string
45paul
some string
h uhjon
some string
k kkpeter
some string
3344555
some string
k kkpeter
some sting


*****************Result should be 


hhh  mike
some string
k kkpeter
some string 1
h uhjon
some string
45paul
some string
h uhjon-extra1      (text added because this line already appears above)
some string
k kkpeter-extra1    (text added because this line already appears above)
some string
3344555
some string
k kkpeter-extra2    (text added because this line already appears above; 2 is used because 1 is already taken)
some sting
dmg
Posts: 287
Joined: 02 Oct 2013, 01:43
Location: "Twelve days north of Hopeless and a few degrees south of Freezing to Death"

Re: Change duplicate lines in file

01 Oct 2017, 10:50

"My dear Mr Gyrth, I am never more serious than when I am joking."
~Albert Campion
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Change duplicate lines in file

01 Oct 2017, 11:32

Here's an example; these things can be quite simple with arrays. I prepend a z to each key name to avoid any interference with array methods.

q:: ;mark duplicate lines (case insensitive)
vText := "a b c d e f a b c g h i a b c"
vText := StrReplace(vText, " ", "`n")
StrReplace(vText, "`n", "", vCount), vCount += 1
(oArray := {}).SetCapacity(vCount)
vSfx := "-extra"
vOutput := ""
VarSetCapacity(vOutput, StrLen(vText)*2)
Loop, Parse, vText, `n, `r
{
	vTemp := A_LoopField
	if oArray.HasKey("z" vTemp)
		vOutput .= vTemp vSfx oArray["z" vTemp] "`r`n", oArray["z" vTemp] += 1
	else
		vOutput .= vTemp "`r`n", oArray["z" vTemp] := 1
}
MsgBox, % vOutput

vOutput2 := ""
for vKey, vValue in oArray
	vOutput2 .= SubStr(vKey, 2) " " vValue "`r`n"
MsgBox, % vOutput2

oArray := ""
return
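
To show why the z prefix is there, here is a minimal sketch, assuming AutoHotkey v1.1, where (as far as I understand it) a key stored in an object is looked up before the built-in methods. The line content "HasKey" is a hypothetical worst case, not something from the original file.

```autohotkey
; Sketch for AutoHotkey v1.1. The sample line "HasKey" is hypothetical.
vLine := "HasKey"                    ; a line that happens to match a method name

oRaw := {}
oRaw[vLine] := 1                     ; this key now sits in front of the HasKey method
vRawCheck := oRaw.HasKey("abc")      ; no longer a reliable existence check

oSafe := {}
oSafe["z" vLine] := 1                ; the prefixed key is "zHasKey", not a method name
vSafeCheck := oSafe.HasKey("z" vLine) ; the built-in method is still reachable

MsgBox, % "raw check: [" vRawCheck "]`r`nsafe check: [" vSafeCheck "]"
```

Any prefix that makes the key differ from every method name would do; z is just short.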
==================================================

A case-sensitive version:


q:: ;mark duplicate lines (case sensitive)
vText := "a b c d e f a b c g h i a b c"
vText := StrReplace(vText, " ", "`n")
StrReplace(vText, "`n", "", vCount), vCount += 1
oDict := ComObjCreate("Scripting.Dictionary")
vSfx := "-extra"
vOutput := ""
VarSetCapacity(vOutput, StrLen(vText)*2)
Loop, Parse, vText, `n, `r
{
	vTemp := A_LoopField
	if oDict.Exists("" vTemp)
		vOutput .= vTemp vSfx oDict.Item("" vTemp) "`r`n", oDict.Item("" vTemp) += 1
	else
		vOutput .= vTemp "`r`n", oDict.Item("" vTemp) := 1
}
MsgBox, % vOutput

vOutput2 := ""
for vKey in oDict
	vOutput2 .= vKey " " oDict.Item(vKey) "`r`n"
MsgBox, % vOutput2

oDict := ""
return
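
The second script is case sensitive because a Scripting.Dictionary compares keys in binary mode by default, whereas AutoHotkey v1 objects ignore case in key names. If you want the dictionary to ignore case as well, its CompareMode property can be changed, but only while the dictionary is still empty. A minimal sketch (the sample keys are made up):

```autohotkey
; Sketch for AutoHotkey v1.1. The keys "Abc"/"abc" are made-up sample data.
oDict := ComObjCreate("Scripting.Dictionary")
oDict.CompareMode := 1            ; 1 = text compare (ignore case), 0 = binary (default)
oDict.Item("Abc") := 1            ; set CompareMode before adding any items

vFound := oDict.Exists("abc") ? "yes" : "no"   ; looks up the same key in lower case
MsgBox, % "found with different case: " vFound
oDict := ""
```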
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
smbs
Posts: 98
Joined: 27 Feb 2014, 11:07

Re: Change duplicate lines in file

01 Oct 2017, 16:03

Brilliant!
I'm trying to understand how it works, but it's beyond me. I can't make heads or tails of your code!
If you have the patience, could you explain it?
Anyway, thanks very much for the solution; it works great for what I need to do.
Again, many thanks!
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Change duplicate lines in file

01 Oct 2017, 16:31

- Create an array (and set its capacity to the number of lines).
- Create a variable for the output text (and give it a large capacity up front).
- (In both cases, setting the capacity improves performance but isn't strictly necessary.)
- Parse the text line by line.
- If the line hasn't been seen before, append it to the output variable and add it as a key to the array with value 1.
- If the line has been seen before, append the line plus the suffix text and its current count to the output variable, then increment its count in the array by 1.
- At the end there's a demo that shows the contents of the array.
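
The steps above can also be wrapped in a function and pointed at a file, which is what the original question asked for. This is only a sketch for AutoHotkey v1.1: MarkDuplicates(), MarkDuplicatesInFile() and the parameter names are hypothetical, and the capacity tuning is left out for brevity.

```autohotkey
; Sketch for AutoHotkey v1.1. Function and parameter names are hypothetical.
MarkDuplicates(vText, vSfx := "-extra")
{
	oSeen := {}
	vOutput := ""
	Loop, Parse, vText, `n, `r
	{
		vKey := "z" A_LoopField              ; prefix keeps keys clear of method names
		if oSeen.HasKey(vKey)
		{
			; seen before: append line + suffix + current count, then bump the count
			vOutput .= A_LoopField vSfx oSeen[vKey] "`r`n"
			oSeen[vKey] += 1
		}
		else
		{
			; first occurrence: append unchanged and start counting at 1
			vOutput .= A_LoopField "`r`n"
			oSeen[vKey] := 1
		}
	}
	return vOutput
}

MarkDuplicatesInFile(vPathIn, vPathOut)
{
	FileRead, vText, %vPathIn%
	if ErrorLevel
		return 0                             ; could not read the input file
	vOutput := MarkDuplicates(vText)
	FileAppend, %vOutput%, %vPathOut%        ; appends if the output file already exists
	return !ErrorLevel
}
```

It would be called with something like MarkDuplicatesInFile(A_ScriptDir "\input.txt", A_ScriptDir "\output.txt"), where both paths are placeholders. Note that blank lines count as lines too, so repeated blank lines (including the empty piece after a final line break) get the suffix as well.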
smbs
Posts: 98
Joined: 27 Feb 2014, 11:07

Re: Change duplicate lines in file

02 Oct 2017, 04:17

Many thanks.
I studied your code and feel I really learnt a lot about working with arrays!
Thanks for your patience and for sharing your knowledge.
Regards
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Change duplicate lines in file

02 Oct 2017, 07:07

Cheers. Here are two more points:

This forces a string rather than a number: ("" vTemp)
This forces a number rather than a string: (vTemp + 0)
Key names can be numeric or strings. Keeping all key names as strings, even when they look numeric, stops numeric keys that are written differently but have the same value (such as 1 and 01) from overwriting each other:


q::
oArray := {}
oArray[1] := "a"
oArray[01] := "b" ;overwrites first key
oArray["1"] := "c"
oArray["01"] := "d"
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput
oArray := ""
return
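
The same idea applied to a variable rather than a literal, which is the situation in the duplicate-lines script. A minimal sketch, assuming AutoHotkey v1.1 key handling; the value "01" is just a made-up line that looks numeric:

```autohotkey
; Sketch for AutoHotkey v1.1. The sample value "01" is hypothetical.
vTemp := "01"                          ; line content that looks like a number
oArray := {}
oArray["" vTemp] := "string key"       ; concatenating with "" forces a string key
oArray[vTemp + 0] := "numeric key"     ; adding 0 forces the integer key 1

vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput                      ; lists both keys, since neither overwrote the other
oArray := ""
```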

One more note. This line from the first script:
(oArray := {}).SetCapacity(vCount)
is a one-liner for:
oArray := {}
oArray.SetCapacity(vCount)
smbs
Posts: 98
Joined: 27 Feb 2014, 11:07

Re: Change duplicate lines in file

03 Nov 2017, 06:31

Many thanks again.
