AutoHotkey Community

It is currently May 27th, 2012, 5:24 am

All times are UTC [ DST ]




Post new topic Reply to topic  [ 17 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Random Number Generator
PostPosted: December 30th, 2005, 3:06 am 
Just on a whim, I decided to make a random number generator for the lottery. In the code below, it correctly generates six non repeating numbers to represent the 6 numbers that are drawn and it loops as many times as I need (in this program--5000) to get hundreds of random lottery drawings and pastes them to the Random_1.txt file. It then takes these figures, counts how many times each number is used, and prints them to the Random_2.txt file. Finally, it reorganizes the figures in the Random_2.txt file and places the most often used numbers in accending order in the Random_3.txt file. Here in lies the problem. After many tests of this, it seems to pick all sixty numbers almost evenly every time. Here's an example of 5000 randomly picked number results:
(the # to the left of the = sign is how many times the # to the right of the = sign came up in Random_1.txt)
537 = 13
534 = 17
531 = 26
523 = 12
521 = 52
520 = 4
519 = 55
519 = 21
519 = 36
519 = 39
518 = 19
515 = 11
514 = 3
514 = 54
513 = 50
513 = 6
512 = 43
511 = 28
509 = 49
509 = 23
509 = 1
508 = 51
507 = 2
505 = 46
505 = 45
504 = 9
503 = 60
503 = 40
501 = 30
501 = 25
501 = 18
500 = 47
500 = 48
499 = 42
499 = 31
499 = 15
498 = 38
497 = 16
496 = 22
496 = 14
495 = 20
494 = 10
494 = 59
490 = 24
489 = 37
488 = 5
488 = 32
487 = 7
486 = 8
483 = 33
483 = 44
482 = 35
481 = 58
480 = 53
476 = 34
475 = 41
471 = 57
457 = 56
456 = 29
444 = 27
It seems to do this every time. It just doesn't seem to be pulling true random numbers. What I would like to know is why this is happening in such a seemingly unrandomly and orderly fashion. Thanks for any input on why this might be happening...

Code:
#SingleInstance Force
SetBatchLines, -1

Loop, 60
{
Number_%A_Index% = 0
}

Loop, 5000
{
Num_1 =
Num_2 =
Num_3 =
Num_4 =
Num_5 =
Num_6 =


Num_1:
Random, Num_1, 1, 60

Num_2:
Random, Num_2, 1, 60
If Num_1 = %Num_2%
   Goto, Num_2

Num_3:
Random, Num_3, 1, 60
If Num_1 = %Num_3%
   Goto, Num_3
If Num_2 = %Num_3%
   Goto, Num_3

Num_4:
Random, Num_4, 1, 60
If Num_1 = %Num_4%
   Goto, Num_4
If Num_2 = %Num_4%
   Goto, Num_4
If Num_3 = %Num_4%
   Goto, Num_4


Num_5:
Random, Num_5, 1, 60
If Num_1 = %Num_5%
   Goto, Num_5
If Num_2 = %Num_5%
   Goto, Num_5
If Num_3 = %Num_5%
   Goto, Num_5
If Num_4 = %Num_5%
   Goto, Num_5

Num_6:
Random, Num_6, 1, 60
If Num_1 = %Num_6%
   Goto, Num_6
If Num_2 = %Num_6%
   Goto, Num_6
If Num_3 = %Num_6%
   Goto, Num_6
If Num_4 = %Num_6%
   Goto, Num_6
If Num_5 = %Num_6%
   Goto, Num_6

FileAppend, `n %Num_1%-%Num_2%-%Num_3%-%Num_4%-%Num_5%-%Num_6%, %A_WorkingDir%\Random_1.txt
}

Loop, Read, %A_WorkingDir%\Random_1.txt,
{
Loop, Parse, A_LoopReadLine, -
   {
   If A_LoopField = 1
      Number_1 += 1
   If A_LoopField = 2
      Number_2 += 1
; ... etc.
   If A_LoopField = 60
      Number_60 += 1
   }
}

FileAppend, %Number_1% = 1 `n, %A_WorkingDir%\Random_2.txt
FileAppend, %Number_2% = 2 `n, %A_WorkingDir%\Random_2.txt
; ... etc.
FileAppend, %Number_60% = 60 `n, %A_WorkingDir%\Random_2.txt
FileRead, Sort_List, %A_WorkingDir%\Random_2.txt
Clipboard =
Clipboard = %Sort_List%
Sort, Clipboard, N R
FileAppend, %Clipboard%, %A_WorkingDir%\Random_3.txt


Report this post
Top
  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 5:17 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
AHK allows your program to be written in a much shorter form:
Code:
Loop 3
   FileDelete Random_%A_Index%.txt
Loop, 60
   Number_%A_Index% = 0

Loop, 5000
{
   If (mod(A_Index,100)=0)
      TrayTip,,%A_Index%   ; Show the progress at the TRAY

   Random, Num, 1, 60
   Loop 5
   {
      Loop
      {
         Random, temp, 1, 60
         If temp not in %Num%
            Break
      }
      Num = %Num%,%temp%
   }
   StringSplit Num_, Num,`,

   FileAppend, %Num_1%-%Num_2%-%Num_3%-%Num_4%-%Num_5%-%Num_6%`n, Random_1.txt
}

Loop, Read, Random_1.txt,
   Loop, Parse, A_LoopReadLine, -
      Number_%A_LoopField% += 1

Loop 60
{
   temp := Number_%A_index%
   FileAppend, %temp% = %A_index%`n, Random_2.txt
}

FileRead, Sort_List, Random_2.txt
Sort, Sort_List, N R
FileAppend, %Sort_List%, Random_3.txt
Having run that three times I got the following values in Random_3.txt:
Code:
562 = 7     557 = 32    552 = 54
541 = 13    551 = 54    536 = 28
534 = 44    537 = 25    534 = 24
530 = 12    533 = 49    531 = 50
528 = 41    530 = 52    528 = 16
526 = 32    527 = 16    526 = 35
523 = 53    526 = 59    524 = 41
522 = 15    525 = 42    523 = 26
520 = 42    521 = 9     523 = 7
519 = 49    521 = 14    521 = 46
519 = 19    520 = 20    519 = 44
518 = 35    518 = 48    516 = 59
518 = 33    515 = 23    515 = 53
518 = 31    514 = 3     514 = 60
517 = 56    513 = 58    514 = 45
516 = 14    513 = 38    512 = 15
515 = 43    512 = 5     512 = 13
514 = 50    512 = 30    511 = 6
514 = 27    511 = 27    509 = 34
511 = 57    511 = 28    509 = 9
511 = 34    510 = 31    508 = 37
510 = 26    508 = 60    507 = 43
510 = 36    507 = 40    506 = 25
508 = 58    507 = 46    505 = 3
507 = 38    506 = 24    505 = 22
507 = 23    506 = 45    503 = 40
505 = 45    505 = 41    503 = 4
502 = 48    501 = 1     503 = 39
502 = 17    500 = 2     502 = 21
502 = 6     500 = 13    501 = 38
500 = 54    500 = 57    500 = 18
498 = 29    499 = 4     499 = 57
493 = 11    499 = 39    499 = 47
493 = 3     496 = 22    498 = 2
493 = 20    496 = 37    496 = 5
492 = 60    494 = 29    496 = 32
490 = 5     494 = 56    496 = 29
489 = 40    492 = 15    495 = 14
487 = 21    492 = 10    494 = 51
487 = 1     491 = 55    494 = 12
486 = 25    489 = 12    492 = 48
486 = 24    489 = 35    491 = 58
486 = 46    487 = 7     489 = 30
485 = 9     486 = 43    489 = 31
484 = 28    485 = 21    488 = 20
483 = 4     484 = 44    488 = 49
483 = 51    483 = 6     487 = 56
482 = 18    479 = 47    486 = 27
482 = 30    478 = 50    485 = 42
480 = 55    478 = 17    482 = 23
478 = 37    478 = 19    481 = 19
478 = 2     477 = 11    475 = 33
476 = 39    475 = 51    474 = 17
474 = 16    471 = 8     473 = 36
473 = 8     470 = 34    471 = 55
473 = 22    465 = 33    470 = 52
470 = 52    465 = 26    469 = 1
468 = 59    465 = 53    462 = 10
463 = 47    463 = 36    459 = 11
459 = 10    463 = 18    450 = 8
I don't see anything wrong with these values. If you see always the same results, it could be, because you did not delete the files, just append the new results to the end.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 5:32 am 
Offline

Joined: November 12th, 2005, 10:25 pm
Posts: 73
Hi.
I wrote some 30 lines before I went experiment it...
I wrote about SRand and stuff from C, but than it seems the function ahk uses for rand already dependes on time :\

But: Your code to me allways give a diferent result :S
I have done some times, and can't get a pattern or prefered number whatsoever.

Could you explain better what's the problem? :\


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 5:36 am 
im still new to ahk and i figured someone would post a better/shorter version. i can learn from this, so thank you! my question is why all 60 numbers always drawn roughly beteween 460 and 560 times? this seems odd, or is it just me. why is not 10 only drawn lets say around 225 times, why are all 60 drawn close to the same amount each time?


Report this post
Top
  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 7:42 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Let's define a random variable x1, such that x1 = 1 if 1 appears among the 6 chosen lottery numbers, and x1 = 0 otherwise. In 5000 lottery drawings, the number of occurrences of the number 1 is the sum of 5000 values of this random variable x1. The so-called Central Limit Theorem states, that the distribution of this sum is approximately normal (bell curve), with standard deviation of the original standard deviation times the square-root of 5000 (= 70.71...). It is the same with all the numbers 1, 2,…60, meaning, that the majority of the values in the first column of Random_3.txt are within +50 and – 50 from the average. This is what we observe, so the AHK random number generator is statistically random (for this test).

In general, the number of occurrences of an event in independent trials, or the sum of many random values of the same distribution, forms a very nice bell curve distribution. Its width (the standard deviation) increases slower than the number of trials, therefore, more and more of the values concentrate to the middle of the possible range of values. Or, the average of many random values of the same distribution varies less and less as we increase the number of trials.


Last edited by Laszlo on December 30th, 2005, 7:49 am, edited 1 time in total.

Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 7:44 am 
Offline

Joined: September 25th, 2005, 4:31 pm
Posts: 610
Royce wrote:
why all 60 numbers always drawn roughly beteween 460 and 560 times


The AHk Random function uses, internally, the Mersenne Twister algorithm to generate pseudo-random numbers. The algorithm is designed to produce numbers over an interval which are equidistributed over 623 dimensions (i.e., subintervals). Thus for a large quantity of generated numbers, the output of the algorithm should approximate the properties of a uniform distribution.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 8:01 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
The phenomenon of concentration of the sums of increasing number of random variables to the center of the range is true even if the random number generator is not uniform. Only (statistical) independence is needed between samples. The many dimensional independence of the Mersenne Twister algorithm does not play a role here.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 9:11 am 
Offline

Joined: September 25th, 2005, 4:31 pm
Posts: 610
Applying the CLT to the sum of independent binomial distributed variables explains the distribution of values for each number.

However, aren't the sums of occurrence a frequency distribution (i.e, representable as a histogram). If you divide each frequency by the number of experiments you would realize the density function of the underlying distribution (i.e, uniform).


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 10:23 am 
WOW! looks like this is all way over my head. :? im not sure how the whole random number generator works as far as the code for it goes, but it just seems odd to me that all 60 numbers are drawn close to the same amout of times. how does the generator know what numbers it has already given and not to give that number more than others. if there is a simple way to explain this to me that would be great, if not, no worries. just was very curious. well Laszlo thanks for the new code, i can at least learn a little more about ahk from that... and shimanov thanks for your input as well.


Report this post
Top
  
Reply with quote  
 Post subject:
PostPosted: December 30th, 2005, 5:26 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Quote:
how does the generator know what numbers it has already given
It does not know. Throw a dice. The probability that you get a 6 is 1/6. Throw it 10 times. What is the probability that you don't have a six? The probability that the first throw is not six times the probably that the second throw is not six, and so on: 5/6*... * 5/6 = (5/6)^10 = 0.1615... The probability that you have 1 six is the sum of 10 cases: the probability that you have it in the first throw (and all other throws are not six), plus the probability that you have it in the second throw (and all other throws are not six), etc. All are the same, so we get 10*(1/6)*(5/6)^9 = 0.323… It is larger than having no six. How about having 2 sixes? It can be in throws 1,2 or in 1,3 ... or in throws 8,10 or in throws 9,10. There are 45 pairs of places in the sequence of throws. In each of them the probability that the sixes occur there is (1/6)^2*(5/6)^8, and 45 of this gives 0.2907...

Does the dice know, that it has to show one six out of 10 throws roughly in every third experiment? Of course, not. It is just an event of probability of 32%. It can come up many likely ways.

Sometimes you have 10 sixes in a row. Very rarely. Not because the dice remembers that it showed already many sixes and now it is time to show something else, but simply because the probability of 10 sixes is low: (1/6)^10 = 1.6538e-8 (once in every 60 million experiments). It means, if you perform just 5000 series of 10 dice throws, you will almost surely not see 10 sixes. This is the same with the frequencies of numbers you encounter in random lottery drawings. The probability that you see any given number between 450 and 560 times is 95%. That is, in the average, you have to perform 20 experiments (of 5000 lottery drawings each) to see an outsider.

If you perform many series of drawings, you will eventually see one, where, for example, the number 13 does not come up at all. But it is very-very unlikely. Its chance is ((59/60)^6)^5000 = 10^-219 (219 zeros after the decimal point). I would bet my yearly salary against a penny, that you never see this event in 100 years of continuous experimenting.

@Shimanov: As any random variable, the frequencies of occurrences can be shown in a histogram, but it is close to normal (bell curve), regardless of the underlying distribution (which only affects the position of the peak).


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: December 31st, 2005, 6:02 am 
Offline

Joined: November 12th, 2005, 10:25 pm
Posts: 73
Wow...
do all of you guys have a degree in maths or study maths at university?? talking about bell curves, ok, but some things I read... Wow. Maybe it's because I'm not english that a miss some phrases :/

@ royce:
basically, the algorythm ahk uses to generate random gives the same probability to all numbers to appear. It is allways the best case, or they aren't true randoms...
because they have the same probability, they will all tend to appear the same amount of times when you extend the tries...
that is what the 'bell' and the other ugly names explained. (I think) :S

you can make your own algorythm, and test it. the more the numbers get closed to each other, the best your algorythm is... not exactly like this, but if the opposite happens, it sucks ;)


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: January 4th, 2006, 10:52 pm 
Offline

Joined: September 25th, 2005, 4:31 pm
Posts: 610
An explanation of frequency distribution and frequency per value:

Mersenne Twister algorithm:

* 623 dimensions - negligible correlation between consecutive samples (NOT independence)

* approximates Discrete Uniform distribution (i.e, each value is equiprobable)

Experiment:

30000 trials

values (X) = 1..60

p = P[X=#] = 1/60

trial outcomes can be considered as a Binomial distribution (i.e., success or failure) with assumption of trial independence
note: the CLT can be applied, to approximate the Binomial distribution, as the number of trials grows large

frequency distribution properties

mean = 30000*p = 500
variance = mean*( 1-p ) = 491
standard deviation = sqrt( variance ) = 22


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: January 4th, 2006, 11:54 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
shimanov wrote:
negligible correlation between consecutive samples (NOT independence)
Any (finite state) pseudorandom number generator is periodic, completely deterministic, including the Mersenne Twister algorithm. Consequently, no two entries are independent or uncorrelated. I referred to statistical independence. In a finite set of samples the measured correlation almost never is exactly 0, nor it is expected to be 0. When a large set of observations cannot disprove independence (cannot find likely correlations), we accept the hypothesis of independence (with a certain confidence level, dependent on the observations and their number).

We can check the average, some other statistical moments (like the square of the standard deviation) of any empirical sequence. They are called (statistical) randomness tests. There are infinitely many tests, a few hundred are commonly used to assess sequences. To get meaningful results at least 10 MB of data is customarily tested. 30,000 numbers are just too few to draw any conclusion.

623 dimension statistical independence could be important for huge physical simulations. For cryptographic applications it is irrelevant, there we need other properties: analyzing a manageable amount of data in reasonable computational time allows no adversary to predict future entries with significantly higher probability than a random guess. (Here the expressions "manageable amount", "reasonable" and "significantly" depends of our understanding, what the limitations of computing are today and in the near future. Sometimes we set the limits wrong, as it happened with the cipher DES.) The Mersenne Twister algorithm does not satisfy this requirement, but it is much faster than any known cryptographically secure pseudorandom number generator.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: January 5th, 2006, 12:35 am 
Offline

Joined: September 25th, 2005, 4:31 pm
Posts: 610
Laszlo wrote:
Mersenne Twister... for cryptographic applications it is irrelevant


You can always use a transformation of its output as after processing with a hash function. But it is a useful, purposeful algorithm nonetheless.

Maybe Schneier's Yarrow algorithm would be a good candidate for implementation. It is apparently free of legal incumbrances. What do you think?


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: January 5th, 2006, 12:46 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Yarrow is OK, as any secure hash, MAC or cipher with a secret key. There are different modes of operations, like counter mode, output feedback mode or their combinations. In general, you have to make sure that if the output is fed back to the input, not all of it is shown. These are secure until proven otherwise. There are provably secure random number generators, like BBS. They are based on some mathematical problems, which are believed to be intractable. These include the discrete logarithm or the integer factorization. Although there are no proofs that they are really hard, you could get a professor position at a university of your choice if you show it otherwise.


Report this post
Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 17 posts ]  Go to page 1, 2  Next

All times are UTC [ DST ]


Who is online

Users browsing this forum: Bing [Bot], Exabot [Bot], HotkeyStick, rbrtryn, XstatyK, Yahoo [Bot] and 86 guests


You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group