Weighted Random Distribution

Helpful script writing tricks and HowTo's
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Weighted Random Distribution

19 Oct 2020, 20:09

Weighted Random Distribution
for AutoHotkey v1.1 (Original article - Thanks Ferry Boender)

Preface

Randomly selecting elements from a set of items is easy. Just generate a random number between 0 and the length of the set minus one, and use that as an index in the set [if it is an array] to get a random entry. The chance that an entry is picked is the same for each entry in the set. This is called even distribution or uniform distribution.

But what if we do not want each entry to appear as much as every other? Suppose we’re creating a question-answer game, and we want the questions the user got wrong previously to appear more often than the question he or she got right? This is called a Weighted Random Distribution, or sometimes Weighted Random Choice, and there are multiple methods of implementing such as random picker.

This article explains these various methods of implementing Weighted Random Distribution along with their pros and cons. We use AutoHotkey as our language of choice, because it has an easy to read syntax, and provides many useful tools which would take many more lines of code in most other languages. Along the way all AutoHotkey “tricks” will be explained.

The latest version of original article is always available here in PDF, HTML and AsciiDoc format.

Methods

In each of the methods below, we assume a default set of choices and weights which we’ll need to make random selections from. When I say “set”, I don’t mean a AutoHotkey set, which is basically an immutable array which does not allow duplicate values, but a collection of choices with their associated weight.

We assume we have the following dictionary [Jargon for an associative array, also known as a hashmap] of choices with their weights:

Code: Select all

w:= {"A": 2, "B": 4, "C": 3, "D": 1}
This means we want B to appear twice as much as A, and we want A to appear twice as much as D. In other words, if we generate ten random picks, we want two of those to be A, four of those to be B, etc. [This wont happen with only ten random picks of course, but you get the point].
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

19 Oct 2020, 20:10

Expanding

One of the easiest solutions is to simply expand our set so that each entry in it appears as many times as its weight. We start with our basic set of choices with their associated weights:

Code: Select all

{"A": 2, "B": 4, "C": 3, "D": 1}
We create a new list [AutoHotkey array], in which we put each choice as many times as its associated weight.

Code: Select all

for i in (w, s:=[], p:=[])
    loop, % (w[i], p[A_Index]:=0)
        s.Push(i)
The list s now looks like this:

Code: Select all

["A", "A", "B", "B", "B", "B", "C", "C", "C", "D"]
We can now make a random selection from the list by generating a random number between 1 and the length of the list, and use that as an index in the list to get our weighted random choice:

Code: Select all

loop, 100000 {
    Random, Rn, 1, t
    for i in w
        if (i = s[Rn], n:=A_Index)
            Break
    p[n]++
}
We keep a score of how many times each choice has appeared in p output array (below), so we can easy extract how many times each of the choices from the list has been randomly selected:

Code: Select all

{20140, 29880, 39986, 9994}
As we can see, our random distribution is weighted properly. Of course there are some pros and cons to using this method.

Pros:
  • We can make very fast random selections from the set. In fact, selection time is O[1], which is the fastest we can attain.
  • It allows for reasonably easy and fast updating of the weights. If we want to lower the weight of a choice, we simply scan the list and remove as many occurrences of the choice as we need. Increasing weight or adding new choices is even simpler, because we can just add as many as we need to the end of the list. The order in which they appear does not matter.
  • It’s very simple. It’s easy to see what’s going on here.
Cons:
  • For large sets, or large values for weights, this will obviously take up too much memory. One possible way to optimize this would be to find the greatest common divisor, but this will take more processing time up front and will make it much slower to update our weights.
  • If you only need to make a few random selections, it will probably not be worth preparing the expanded set.
I’ve looked around the Internet for various Weighted Random Distribution algorithms, and nobody seems to be suggesting this method in particular. I can understand why; it looks amateurish. But from my tests I have found that this is the easiest and fastest way to make Weighted Random Distributions for small sets and small values.

The full example:

Code: Select all

#NoEnv
SetBatchLines, -1

w:= {"A": 2, "B": 4, "C": 3, "D": 1}

for i in (w, s:=[], p:=[], t:=0)
    loop, % (w[i], p[A_Index]:=0)
        s.Push(i), t++

loop, 100000 {
    Random, Rn, 1, t
    for i in w
        if (i = s[Rn], n:=A_Index)
            Break
    p[n]++
}

for i in w
    r.= i ": " p[A_Index] "`n", A_Index<2? sum:= 0: "", sum+= p[A_Index]
MsgBox % r "-------------`nSum: " sum
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

19 Oct 2020, 20:11

In-place [Unsorted]

Instead of expanding the set as in the method used above, we can also keep the set in its current form and simply emulate the expansion of the set in a loop. In order to do this, we first have to know the total weight of the set.

Code: Select all

for i, j in (w, t:= 0, p:= [])
    t+= j, p[A_Index]:= 0
We then select a random value between 1 and that total weight. We emulate the expanded set we’ve seen in the previous method by looping over the elements of the set and keeping score of the total value we’ve seen so far. When that value is larger than the random value we picked, we’ve found our random choice.

Code: Select all

for i, j in (w, z:=0)
    if (Rn <= z+=j, n:=A_Index)
        break
At the positon n of the array w is now our weighted random choice.

Pros:
  • It is very easy and fast to update our set of weights. Adding and removing items; lowering and heightening weights: all are equally fast. All we have to do is keep an eye on our total weight and either update or recalculate it when we add or remove values or change weights.
  • This method uses as little memory as possible. No duplicates of our original set have to be made.
Cons:
  • Selecting random values is somewhat slower because of the added calculation in the loop. The larger our initial set, the slower this becomes. The complexity of selecting is O[n], where n is the number of elements in the set.
The full example:

Code: Select all

#NoEnv
SetBatchLines, -1

w:= {"A": 2, "B": 4, "C": 3, "D": 1}

for i, j in (w, t:= 0, p:= [])
    t+= j, p[A_Index]:= 0

loop, 100000 {
    Random, Rn, 1, t
    for i, j in (w, z:=0)
        if (Rn <= z+=j, n:=A_Index)
            break
    p[n]++
}

for i in w
    r.= i ": " p[A_Index] "`n", A_Index=1? sum:= p[A_Index]: sum+= p[A_Index]
MsgBox % r "-------------`nSum: " sum
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

19 Oct 2020, 20:11

In-place [Sorted]

In theory, we can speed up our previous in-place algorithm by sorting the set before we start selecting from it. Since the set is sorted, we can start scanning through. Since the highest weights will appear at the beginning of the set, and those are the most likely to be chosen randomly, we can get a speed increase when selecting random numbers from our set. Whether we get a speed increase in practice depends on our initial set of weights.

First we prepare a new set which contains our sorted weigths.

Code: Select all

w:= {"A": 2, "B": 4, "C": 3, "D": 1}

for i, j in (w, s:=[], p:=[], t:=0)
    s[-j]? "": s[-j]:= [], s[-j].Push([i, j]), p[A_Index]:= 0, t+= j
You’ll see a little bit of AutoHotkey magic right here, so let me explain. Parsing through w array we store each element pare into a two dimesional array s in the positon based on current element weight (here the greatest first).

s now looks like this:

Code: Select all

[["B", 4], ["C", 3], ["A", 2], ["D", 1]]
Then we walk through the set and instead of adding up the weights we’ve seen so far (like we did in the unsorted in-place example), we subtract the weights from the total weight:

Code: Select all

Random, Rn, 1, t
    for i, j in (s, z:=t, n:=0)
        for, k, l in j
            if (Rn > z-=l.2, n++)
                Break, 2
    p[n]++
Pros:
  • Increase in selection speed over unsorted, as long as our set has at least one weight which is significantly larger than the other weights.
Cons:
  • Speed decrease in pre-selection [due to sorting time].
  • Speed decrease in updating the set of weights [due to resorting].
  • Increase in complexity may not be worth the speed gain.
The full example:

Code: Select all

#NoEnv
SetBatchLines, -1

w:= {"A": 2, "B": 4, "C": 3, "D": 1}

for i, j in (w, s:=[], p:=[], t:=0)
    s[-j]? "": s[-j]:= [], s[-j].Push([i, j]), p[A_Index]:= 0, t+= j

loop, % 100000 {
    Random, Rn, 1, t
    for i, j in (s, z:=t, n:=0)
        for, k, l in j
            if (Rn > z-=l.2, n++)
                Break, 2
    p[n]++
}

for i, j in (s, sum:=n:=0)
    for k, l in j
        r.= l.1 ": " p[++n] "`n", sum+= p[n]
MsgBox % r "-------------`nSum: " sum
Last edited by rommmcek on 19 Oct 2020, 20:26, edited 1 time in total.
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

19 Oct 2020, 20:12

Pre-calculated

One other possibility remains for performing random weighted selections. We can prepare a new set with pre-calculated weights. So instead of calculating the weights in the loops, we already calculate them beforehand. This will give a performance increase in selecting random values because we don’t have to subtract or add values in the loop. One other important optimization this allows us to make is that we can apply a Divide-and-conquer (Wikipedia) algorithm [also known as bisecting] to quickly find the random choice. The drawbacks are that updating the weights becomes much harder.

First we create a new set in which the weights have already been added to each other. In all the other examples we created a set with [choice, weight] members. Now we create [weight, choice] members, so that we can use AutoHotkey’s bisect module.

The bisect module implements a Divide-and-conquer (AutoHotkey forum) algorithm to find the index in the set where a new value should be inserted. It does this by starting in the middle of the set and checks if the value there is higher or lower than the value we need. It then divides the left-hand or right-hand side again and does the same, until it has found the correct index. For this to work, the set must be sorted. In our case, this isn’t a problem, since we sum up our weights the further we get into the set, so the weights will always be sorted.

Code: Select all

for i, j in (w, s:=[], p:=[], t:=0)
    s.Push(t+= j), p[A_Index]:= 0
The set now looks like this:

Code: Select all

[[2, "A"], [5, "C"], [9, "B"], [10, "D"]]
Next we get random weighted selections from the set by picking a random number between 0 and the total weight. We create a temporary fake member [[wRndNr, None]] so we can pass it to the bisect module. The bisect function finds our index for us, and we use that index to get the member of our set.

Code: Select all

loop, 100000 {
    Random, Rn, 1 , (t, u0:=u1:=1, v0:=v1:=s.Length(), vs:=v0-u0)
    while (v0-u0 > 0, u0:= u1, v0:= v1)
        Rn <= s[vs:=(v0+u0)//2]? (u1:=u0, v1:=vs): (u1:=vs+1, v1:=v0)
    p[vs]++
}
Pros:
  • Fast selection, even on large sets, due to the bisection.
Cons:
  • Slow updates of weights due to having to scan through the list for the value we need to update, then have to update all the items in the set after it.
The full example:

Code: Select all

#NoEnv
SetBatchLines, -1

w:= {"A": 2, "B": 4, "C": 3, "D": 1}

for i, j in (w, s:=[], p:=[], t:=0)
    s.Push(t+= j), p[A_Index]:= 0

loop, 100000 {
    Random, Rn, 1 , (t, u0:=u1:=1, v0:=v1:=s.Length(), vs:=v0-u0)
    while (v0-u0 > 0, u0:= u1, v0:= v1)
        Rn <= s[vs:=(v0+u0)//2]? (u1:=u0, v1:=vs): (u1:=vs+1, v1:=v0)
    p[vs]++
}

for i, j in (w, sum:=0)
        r.= i ": " p[A_Index] "`n", sum+= p[A_Index]
MsgBox % time "`n" r "-------------`nSum: " sum
Last edited by rommmcek on 19 Oct 2020, 20:38, edited 1 time in total.
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

19 Oct 2020, 20:22

Conclusion

As you can see there are many different ways of performing random weighted selections. Each has it’s own pros and cons. Here’s a small summary:

If you need speedy selections:
  • Use Expanding if the number of items in your set is low, and your weights are not very high.
  • Use Pre-calculated if your set has lots of numbers and/or your weights are high.
If you need speedy updates: If you need low memory usage: If you need simplicity:
SOTE
Posts: 1129
Joined: 15 Jun 2015, 06:21

Re: Weighted Random Distribution

05 Nov 2020, 22:25

Very well done and great read! :clap:
User avatar
rommmcek
Posts: 1123
Joined: 15 Aug 2014, 15:18

Re: Weighted Random Distribution

12 Nov 2020, 19:50

Coming from you has a double value! Your posts are for me nothing but pure wisdom and shear experience!

P.s.: As mentioned at the start I'm not the author (just adapted it for AuthoHotkey and added some links).

Return to “Tutorials”

Who is online

Users browsing this forum: No registered users and 8 guests