Jump to content

Sky Slate Blueberry Blackcurrant Watermelon Strawberry Orange Banana Apple Emerald Chocolate
Photo

Generating unique random integers


  • Please log in to reply
86 replies to this topic
Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
In this particular case the following script is pretty fast:
totalitem = 200000
startitem = 100000

List = 0
Loop % totalitem-1   ; collect all possible values (all different)
   List .= "`n" . A_Index
Sort List, Random    ; very fast random re-order

Loop Parse, List, `n ; for the first 100000 items in the list
   IfGreater A_Index,100000, Break
   Else File .= "`nItem = " A_LoopField + startitem

FileAppend %File%, items.txt
- If the data fits in memory, it is much faster to collect it in an AHK variable and write it with one FileAppend command to a file, than append the data piecewise.
- If the range of the values is not too large, and the number of requested entries is not much less than all of them, the fastest procedure is the following:
--- write all possible values in a variable
--- rearrange them in random order with the Sort..Random command
--- use the beginning of this randomized list
This ensures that all entries are different, and because the bulk of the work is done with compiled code in the Sort command, it is fast.

polyethene
  • Administrators
  • 5517 posts
  • Last active: Nov 15 2014 07:55 AM
  • Joined: 26 Oct 2012
If you want a truly unique random number why not use a time based algorithm like my UUID function. According to the wiki the probably of collision is so rare that "1 trillion UUIDs have to be created every nanosecond for 10 billion years to exhaust the number of UUIDs." Since AutoHotkey is not used for critical crypto security purposes I think it will satisfy most users, in fact the Random command itself is very good anyway. Here's a modified version that returns a 64-bit int which can of course be brought down to any s.f. using integer or float divide:

uuidInt() {
	static n = 0, l
	t := A_Now
	t -= 1970, s
	t := (t . A_MSec) * 10000 + 122192928000000000
	Random, z, 0x10000, 0xfffff
	t += n += l = A_Now, l := A_Now
	Return, z << 40 ^ t | z >> 2 & 0xf
}


Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Unfortunately, it does not work: when you reduce the range to the desired one (100000..200000) you map a huge number of UUID values to same number, so very often there will be collisions.

There are many fast invertible functions published with very complicated structure (like ciphers), which provide pseudorandom sequences with absolutely no repetitions in their range: f(key,0), f(key,1), f(key,2)…. They cannot be distinguished from true random (using reasonable computational resources). However, the problem is the same with them: if the range of the desired random numbers is smaller than that of the function f, any mapping would introduce repetitions.

polyethene
  • Administrators
  • 5517 posts
  • Last active: Nov 15 2014 07:55 AM
  • Joined: 26 Oct 2012

when you reduce the range to the desired one (100000..200000) you map a huge number of UUID values to same number

What do you mean? If you raise LSB would it be such a problem? A performance counter can be used in place of unix epoch for microsecond precision.

There are many fast invertible functions published with very complicated structure (like ciphers), which provide pseudorandom sequences with absolutely no repetitions in their range

What like fibonacci+n? Although AutoHotkey doesn't have support for generators a function could yield a static variable, so I don't see how complicated that would be.

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005

when you reduce the range to the desired one (100000..200000) you map a huge number of UUID values to the same number

What do you mean?

Let us assume that you generate 2 digit numbers, all different: 11, 20, 51, 22, 70, but you only need one digit random numbers. You can return the LS digits: 1, 0, 1, 2, 0. They are not different, even though the original 2-digit numbers were. The request was to generate 100,000 different random numbers between 100,000 and 200,000. You cannot return 64-bit integers instead, so you have to reduce the range, for example, taking your 64-bit numbers mod 200,001 and add 100,000. Here the mod operator has identical results for 0, 200001, 400002, 600003…, that is, you don't necessarily provide unique numbers in the desired range.

There are many fast invertible functions published with very complicated structure (like ciphers), which provide pseudorandom sequences with absolutely no repetitions in their range

What like fibonacci+n? Although AutoHotkey doesn't have support for generators a function could yield a static variable, so I don't see how complicated that would be.

Fibonacci(n) is not invertible and defined on all integers, while we need a finite range. What I meant is, e.g. f(x) := DES(key,x) is a key dependent function, which is invertible by the decryption function g(y) := DES1(key,y): DES1(key,DES(key,x)) = x. Because of the invertibility, f(x) = f(y) if and only if x = y mod 2**64, therefore f(0), f(1),…f(2**64-1) will be all different. But it does not help us solving the problem at hand.

There is another issue with the above uuidInt() function. The Random command is initialized with A_TickCount, which is reset at each start of the PC. We can assume 0..12 hours of operation when the Random command is first called, and it is seeded with one of 12*60*60*100 = 4,320,000 possible values (10 ms tick resolution). The result is only 4 million different random number sequences you could ever encounter in AHK.

Then you mix in A_Now, which is the same for everybody, except the seconds and ms values. Here you add 6,000 possible variations, only. The result is that if hundred thousand people generate these UUID values about the same time, the chance of a collision is close to 50%, far worse than the quoted trillions and billions let us think. If you used the performance counter in place of the tick counter, it could have improved the collision resistance only by 100 fold. Still not acceptable for cryptographic applications.

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
WARNING: Be careful. AHK's built in Random command uses the Mersenne Twister random number generator, initialized with A_TickCount. This is a quite poor function:
1. The number of different random sequences are about 4 million under normal circumstances, but never more than 2**32, which is too small for security related applications.
2. The internal tables are initialized by a weak linear congruential generator. Consequently, the first several hundred entries are correlated by very simple formulae.
3. Observing the generated random numbers an attacker can reconstruct the internal tables, and so he will know all the numbers to be generated in the future.
4. The period = 2**19937-1 is ridiculous. No one will ever be able to generate 2**80 random numbers, so a period of this length is sufficient.
5. It uses a large table, which is often swapped in- and out of the processor cash, slowing it down.

polyethene
  • Administrators
  • 5517 posts
  • Last active: Nov 15 2014 07:55 AM
  • Joined: 26 Oct 2012

the mod operator has identical results

I said diving not modulos, i.e. using a common divisor should ensure equidistant sequential terms. Significance is proportional to accuracy so a weaker cipher can only be expected for smaller range.

The Random command is initialized with A_TickCount, which is reset at each start of the PC. We can assume 0..12 hours of operation when the Random command is first called, and it is seeded with one of 12*60*60*100 = 4,320,000 possible values

Random reseeds every 7.2mins, so for a 20 bit integer the collision probably would be much higher, exponentially perhaps.

Then you mix in A_Now, which is the same for everybody

The MSB is inverted (z << 40) since the year/month values can be considered constant.

Still not acceptable for cryptographic applications.

Again, who in their right mind would use AutoHotkey for this?

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005

div[id]ing not modulos, i.e. using a common divisor should ensure equidistant sequential terms.

It does not matter: many different values mapped to the same shorter value, which, therefore, will not be unique.

Random reseeds every 7.2mins, so for a 20 bit integer the collision probably would be much higher, exponentially perhaps.

My mistake. Earlier it was the tick count.

the seed starts off as the low-order 32-bits of the 64-bit value that is the number of 100-nanosecond intervals since January 1, 1601. This value travels from 0 to 4294967295 every ~7.2 minutes.

It is not reseeding, but wrap around, before the first call of Random. Still, most likely, scripts cannot read this counter arbitrarily often, so the low order bits increment by a large amount, and the number of possible sequences will be much less than 2**32.

The MSB is inverted (z << 40) since the year/month values can be considered constant.

This was the point. The possible variation is little (6000): the ms and s values in 10 ms clock resolution.

who in their right mind would use AutoHotkey for [security]?

I, for example. There is no difference if you encrypt a file with an AHK script or another program. The same hash values are computed. If you send a random number (nonce) to some other entity, it does not matter if it was generated by an AHK script or not, if done properly. Of course, we don't intend to use scripts for protection against malware already running in your computer, but as long as your PC is clean, you can use scripts for many security applications.

polyethene
  • Administrators
  • 5517 posts
  • Last active: Nov 15 2014 07:55 AM
  • Joined: 26 Oct 2012

This was the point. The possible variation is little (6000)

I would have thought 2^20 but as I haven't studied the Mersenne Twister algorithm I can't be so sure.

Generating n-1 random numbers is still weak, the creation time can be learned and a perpetual list can be reverse engineered. uuidInt() hashes millisecond time counts (10-15x more precise than Random clock) with mixed random numbers which is harder to decipher, not to mention that it's dramatically quicker to generate per se.

There is no difference if you encrypt a file with an AHK script or another program.

Yes, speed (have you even tried...?)

jetskijoe
  • Members
  • 7 posts
  • Last active: Jun 11 2007 06:28 PM
  • Joined: 02 Jun 2006
Thanks that script did seem to work well.

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005

There is no difference if you encrypt a file with an AHK script or another program.

Yes, speed (have you even tried...?)

I have. If the file is small, memory resident AHK based encryption can finish faster than an external program is loaded into memory.

jetskijoe
  • Members
  • 7 posts
  • Last active: Jun 11 2007 06:28 PM
  • Joined: 02 Jun 2006
here is my other problem. Thanks again for all the help with the first one it did help but I need to do a count on each random number per store. I can't figure out the best way to do it so I did it via ini read and ini write commands I will post an example below. I found out that is the slow part of the script is there any other way to keep a count of the items. If anyone has any suggestions please let me know. Thanks again Joe:

stores = 200
estore = 100000
totalitem = 200000
startitem = 100000
Total_qty = 100
Store_no = 10000
List = 0
Loop % totalitem-1 {
List .= "`n" . A_Index
}

Loop %stores%
	{ 	
	Sort List, Random    ; very fast random re-order
	Loop Parse, List, `n
	{
		Random, Total_qty, 75, 100
	IfGreater A_Index, %estore%, Break
	Else 
		{
		itemno := A_LoopField + startitem
		itemqutxt =%itemqutxt%`n%itemno%,%Store_no%,%Total_qty%
		IniRead, num, count.ini, count, %itemno%
			num += 1
		IniWrite, %num%, count.ini, count, %itemno%
		}
}
FileAppend,%itemqutxt%, itemqu.txt
	store_no += 1
}


Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Soon we reach the limits of AHK: your script reads/writes the ini file 2*200*200,000 times, which take a lot of time. You can avoid that by using arrays, but AHK was not designed to process 200,000 variables. The internal tables become too large and slow to search for a variable name. Still, it is orders of magnitude faster than 80 million file operations:
#SingleInstance Force

#NoEnv

SetBatchLines -1



stores    = 200

estore    = 100000

totalitem = 200000

startitem = 100000

Total_qty = 100

Store_no  = 10000



List = 0

Loop % totalitem-1 {

   List .= "`n" . A_Index

}



Loop %stores% {

   Sort List, Random

   Loop Parse, List, `n

      IfGreater A_Index,%estore%, Break

      Else {

         Random Total_qty, 75, 100

         itemno := A_LoopField + startitem

         itemqutxt = %itemqutxt%`n%itemno%,%Store_no%,%Total_qty%

         num%itemno% += 1

      }

   FileAppend %itemqutxt%, itemqu.txt

   store_no += 1

}



Loop %totalitem% {

    itemno := startitem + A_Index

    If (num%itemno% <> "")

       items .= itemno . " = " . num%itemno% . "`n"

}

FileAppend %items%, itemcount.txt
It looks like the largest data set processed with AHK so far. Please post the running time you measure and the parameters of your machine. It looks like an excellent benchmark for interpreted languages.

jetskijoe
  • Members
  • 7 posts
  • Last active: Jun 11 2007 06:28 PM
  • Joined: 02 Jun 2006
Will do thanks for the help. I am going to let it run over night on a dual 3.0gh hyper threaded with 3 gig of memory. I will post my times after. I have noticed that AHK is not multi processor so it seems to only use 25% of the cpu. Would it be a good idea to run 4 sets of the script each doing 50 stores ?

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Yes, this parallel processing should give you a 4 fold speedup, but even in one thread it should not take more than a couple of hours.