Simple constant time weighted random choice algorithm


Background

So someone posted an article about a “Faster Weighted Random Choice” algorithm on a Discord server I am a member of: https://blog.bruce-hill.com/a-faster-weighted-random-choice

I read through the article and was inspired by the interesting approaches. However, I found the last, and best, algorithm a bit complex. Furthermore, it appears to require additional/special memory to form the lookup table.

I wondered whether there was a simpler approach. I came up with an idea and, after some trial and error, got it working. It has O(NlogN) initialization time and O(1) generation time, and it does not require complex buckets or structures.

The Algorithm

The algorithm assumes you have an array of items, each having a weight assigned.

The initialization phase involves the following steps:

Normalize the weights so that their sum equals 1.0. This can be skipped if the data already has such a weight distribution. (O(N))
Sort the items in ascending order based on their weights. (O(NlogN)).
Adjust the item weights using a special equation (explained later). (O(N))

The complexity of the steps above is O(N + NlogN + N) which reduces to O(NlogN).

The generation phase works as follows:

Pick a random integer i in the range [0, N)

Pick a random float p in the range [0.0, 1.0]

If the value p is smaller than the weight of item i, then return item i.
Otherwise pick a random integer j in the range [i+1, N) and return item j.
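In code, the generation phase is only a handful of lines. Here is a minimal sketch of it, assuming an Item struct with a Weight field holding the adjusted weight and using math/rand/v2 (the complete runnable program appears later in the post):

// pick assumes the items are sorted ascending by their original weights and
// that Weight already holds the adjusted probability from the initialization phase.
func pick(items []Item) Item {
    i := rand.IntN(len(items))
    // Keep item i with probability equal to its adjusted weight.
    // The last item is always kept, since there is nothing to its right.
    if i == len(items)-1 || rand.Float64() < items[i].Weight {
        return items[i]
    }
    // Otherwise pick uniformly among the items to the right of i.
    j := i + 1 + rand.IntN(len(items)-i-1)
    return items[j]
}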

Explanation

So how does the above algorithm work and what is the special equation that adjusts the weights?

The idea of the algorithm is that we pick a random item. The probability of doing this is 1/N. But the item might have a lower weight distribution. To mitigate that, we assign a probability to the item (the adjusted weight) that determines whether we should stick with it or pick some other item. That probability is such that the final probability of sticking with the item is equal to the original normalized weight.

Let’s take the following distribution as example:

    0             1                   2
[--20%--]  [-----30%-----]  [--------50%--------]

ITEM 0: What is the probability of picking item 0? It is the following:

P_final(0) = 1/N x P_keep(0)

There is a 1/N chance to land on the item and then a P_keep(0) chance to stick with it. We need to adjust P_keep(0) such that P_final(0) is 20%.

2/10 = 1/N x P_keep(0)
P_keep(0) x 1/3 = 2/10
P_keep(0) = 6/10

We adjust the weight of item 0 from 0.2 to 0.6.

ITEM 1: Let’s do the same for item 1.

P_final(1) = 1/N x P_keep(1) + (1/N x P_reject(0) x 1/(N-1))

The 1/N x P_keep(1) part is similar to the one for item 0. It is the probability to land on item 1 and to stick with it.

The 1/N x P_reject(0) x 1/(N-1) part is new. It is the probability that item 0 was picked (1/N), that it was rejected (P_reject(0)) and that after the next random item pick, item 1 was selected (1/(N-1)).

NOTE: We always select a random item from the ones to the right, which is the reason for 1/(N-1) in the above case.

The P_reject(0) probability is actually 1 – P_keep(0), so the equation for item 1 becomes the following:

P_final(1) = 1/N x P_keep(1) + (1/N x (1 – P_keep(0)) x 1/(N-1))

We want P_final(1) to equal 30% so we end up with:

3/10 = 1/N x P_keep(1) + (1 – P_keep(0)) / (N x (N – 1))
N x 3/10 = P_keep(1) + (1 – P_keep(0)) / (N – 1)
P_keep(1) = N x 3/10 – (1 – P_keep(0)) / (N – 1)
P_keep(1) = 3 x 3/10 – (1 – 6/10) / 2
P_keep(1) = 9/10 – 2/10
P_keep(1) = 7/10
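As a sanity check, plugging the adjusted weights back into the original equation gives exactly the desired 30%:

P_final(1) = 1/3 x 7/10 + 1/3 x (1 – 6/10) x 1/2 = 7/30 + 2/30 = 9/30 = 3/10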

ITEM 2: The logic is the same for item 2.

P_final(2) = 1/N x P_keep(2) + (1/N x P_reject(0) x 1/(N-1)) + (1/N x P_reject(1) x 1/(N-2))

There is yet one more term here (1/N x P_reject(1) x 1/(N-2)). This one handles the case where item 1 was picked but was not kept.

Resolving P_keep(2) the same way we did for the previous items, we end up with:

P_keep(2) = 1

This makes sense. If we have randomly picked the last item, we don’t have an option to reject it, since there is no item to the right to randomly pick from.
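Plugging all three adjusted weights back in confirms the 50%:

P_final(2) = 1/3 x 1 + 1/3 x (1 – 6/10) x 1/2 + 1/3 x (1 – 7/10) x 1/1 = 10/30 + 2/30 + 3/30 = 15/30 = 5/10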

Now, it appears that we are accumulating terms in the equation for each subsequent item. However, most of the terms are shared. This allows us to accumulate them in a temporary variable during weight recalculation, resulting in a linear algorithm, as long as we process left to right (from smallest to largest).
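Generalizing the three equations above, the adjusted weight for item i (counting from 0, items sorted ascending, w(i) being the normalized weight) is:

P_keep(i) = N x w(i) – (1 – P_keep(0)) / (N – 1) – (1 – P_keep(1)) / (N – 2) – … – (1 – P_keep(i-1)) / (N – i)

Everything after N x w(i) is the shared part: each new item contributes only one extra term, so the whole sum can be carried along in a single running variable while sweeping left to right. That is exactly what the reweight function in the code below does.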

NOTE: The algorithm above only works if the items are sorted in ascending order. Otherwise, for example, the chance of merely landing on the last item (1/N) might already be larger than that item's desired probability. Since there are no follow-up items to bounce to, the algorithm would break. This also shows up during weight recalculation, where P_keep would take values outside the [0.0, 1.0] range.
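If you want to guard against accidentally feeding in unsorted or unnormalized data, a cheap sanity check can be run after reweighting. The following is only a sketch; the validateAdjusted name is made up and it is not part of the implementation below:

// validateAdjusted is a hypothetical check: with normalized weights and
// ascending sort order, every adjusted weight must end up within [0.0, 1.0].
func validateAdjusted(items []Item) error {
    for _, item := range items {
        if item.Weight < 0 || item.Weight > 1 {
            return fmt.Errorf("adjusted weight %.4f of item %d is outside [0, 1]; input is likely unsorted or unnormalized", item.Weight, item.Key)
        }
    }
    return nil
}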

The algorithm in code

Following is a Go implementation of the algorithm.

package main

import (
    "cmp"
    "fmt"
    "math/rand/v2"
    "slices"
    "time"
)

const (
    N = 10        // slice length
    M = 3_000_000 // number of samples
)

type Item struct {
    Key    int
    Weight float64
}

var random = rand.New(rand.NewPCG(
    0x0123456789abcdef,
    uint64(time.Now().UnixNano()),
))

func main() {
    items := make([]Item, N)
    initializeItems(items)
    normalizeItems(items)
    sortItemsAsc(items)
    printItems(items)

    // We keep the original items as they were for comparison reasons.
    // In an actual scenario, the original items can be mutated instead,
    // avoiding the need for extra memory.
    adjustedItems := slices.Clone(items)
    reweight(adjustedItems)
    printItems(adjustedItems)

    histogram := make(map[int]float64)
    for range M {
        item := pickRandomItem(adjustedItems)
        histogram[item.Key] += 1.0
    }
    normalizeHisto(histogram)

    compareItemsToHisto(items, histogram)
}

func initializeItems(items []Item) {
    for i := range items {
        items[i] = Item{
            Key:    i,
            Weight: random.Float64(),
        }
    }
}

func normalizeItems(items []Item) {
    var sum float64
    for _, item := range items {
        sum += item.Weight
    }
    for i := range items {
        items[i].Weight /= sum
    }
}

func reweight(items []Item) {
    count := len(items)
    probabilityPrev := 0.0
    for i := range items {
        item := &items[i]
        // P_keep(i) = N x w(i) minus the accumulated rejection terms
        // of the items before it.
        newWeight := (item.Weight - probabilityPrev) * float64(count)
        item.Weight = newWeight
        if i < count-1 {
            // Accumulate this item's rejection term for the items after it.
            // (Skipped for the last item to avoid a division by zero; the
            // last item is always kept, so there is nothing to accumulate.)
            probabilityPrev += (1 - newWeight) / (float64(count) * float64(count-i-1))
        }
    }
}

func pickRandomItem(items []Item) Item {
    count := len(items)

    index := random.IntN(len(items))
    item := items[index]
    if (index == count-1) || (random.Float64() <= item.Weight) {
        return item
    }

    // Rejected: pick uniformly among the items to the right.
    index += 1 + random.IntN(len(items)-index-1)
    return items[index]
}

func sortItemsAsc(items []Item) {
    slices.SortFunc(items, func(a, b Item) int {
        return cmp.Compare(a.Weight, b.Weight)
    })
}

func printItems(items []Item) {
    fmt.Println("Items")
    for _, item := range items {
        fmt.Printf("%-2d: %.4f\n", item.Key, item.Weight)
    }
    fmt.Println()
}

func normalizeHisto(histogram map[int]float64) {
    var sum float64
    for _, count := range histogram {
        sum += count
    }
    for key, count := range histogram {
        histogram[key] = count / sum
    }
}

func compareItemsToHisto(items []Item, histogram map[int]float64) {
    fmt.Println("Items vs Histogram")
    for _, item := range items {
        fmt.Printf("%-2d: %.4f%% vs %.4f%%\n", item.Key, item.Weight*100, histogram[item.Key]*100)
    }
    fmt.Println()
}

The code has a bit more going on, since it does some printing and histogram tracking in order to compare the final distribution to the initial weights.

One example run produces the following output:

Items
3 : 0.0194
9 : 0.0247
5 : 0.0444
7 : 0.0866
4 : 0.1200
0 : 0.1227
6 : 0.1391
1 : 0.1415
8 : 0.1464
2 : 0.1551

Items // reweighted
3 : 0.1944
9 : 0.1577
5 : 0.2490
7 : 0.5640
4 : 0.8252
0 : 0.8178
6 : 0.9355
1 : 0.9380
8 : 0.9564
2 : 1.0000

Items vs Histogram
3 : 1.9445% vs 1.9555%
9 : 2.4716% vs 2.4794%
5 : 4.4384% vs 4.4365%
7 : 8.6607% vs 8.6557%
4 : 11.9991% vs 12.0222%
0 : 12.2749% vs 12.2709%
6 : 13.9081% vs 13.8896%
1 : 14.1475% vs 14.1375%
8 : 14.6416% vs 14.6365%
2 : 15.5135% vs 15.5162%

Summary

Well, there you go. My personal constant time weighted random choice algorithm.

It has O(1) extra memory complexity (provided the data can be mutated in place), O(NlogN) initialization time complexity, and O(1) generation time complexity.

Knowing the vastness of the internet and research community, I can bet that I am not the first one to come up with it. Someone must have had the same idea. But since everyone is talking mostly about the Alias algorithm, I decided to write my idea here just in case. If someone knows of such an algorithm described in a paper somewhere, please share in the comments.

Side note: The following article is an excellent read on the subject matter: https://www.keithschwarz.com/darts-dice-coins/

Judging by the histogram comparisons, the algorithm appears to work fairly well with both small and large sample counts. However, this is mostly eyeballing it; I am not sure how to measure it rigorously. If anyone has any ideas, please share in the comments.
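One possible approach, sketched below under the assumption that you keep the raw sample counts around (the chiSquared helper is hypothetical and reuses the Item type from the listing above): compute Pearson's chi-squared statistic between the observed counts and the counts expected from the normalized weights, then compare it against a chi-squared distribution with N – 1 degrees of freedom.

// chiSquared computes Pearson's goodness-of-fit statistic: the sum of
// (observed - expected)^2 / expected over all items. Smaller is better;
// compare against a chi-squared table with len(items)-1 degrees of freedom.
func chiSquared(items []Item, counts map[int]float64, samples int) float64 {
    var stat float64
    for _, item := range items {
        expected := item.Weight * float64(samples) // item.Weight is the original normalized weight
        observed := counts[item.Key]               // raw count, before normalizeHisto
        diff := observed - expected
        stat += diff * diff / expected
    }
    return stat
}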

Constructive criticism is more than welcome.
