The cycle of prisoners puzzle

26 Dec, 2024

I learned about this cool math puzzle, from the classic prisoners-with-limited-communication genre,¹ about five years ago. One reason it stuck with me is that, in the Discord server where we discussed it, we came up with two radically different solutions; at the time, I remarked "it's like we're solving different problems". It's fairly difficult (I think it took me a few hours), but very rewarding:

You are in a prison with some unknown (finite) number of other prisoners, each isolated in your own room. Every day, the prison warden chooses an unknown cycle containing all prisoners, including yourself (the cycle may or may not vary between days). Each prisoner then simultaneously sends one bit to the next prisoner in the cycle.

Before the first day, you may send a single strategy email that every other prisoner will read (so every prisoner other than yourself must be following the same strategy). Other than this email and the bits, no communication is allowed.

The warden will set you all free if you can determine the number of prisoners. Can you guarantee doing so in finite time?

Fine print

This is a math puzzle, not a lateral thinking puzzle; you're looking for a logically sound strategy, not a loophole or some way to apply real-world knowledge. (For example, the number of prisoners isn't bounded by Earth's population. Conversely, you may assume every prisoner will read and execute the strategy flawlessly, even if it takes longer than the age of the universe. You can't choose to not send a bit, or squeeze extra information out of the timing of bits being sent or received, or embed a tracking pixel in the email, or anything like that.)
A cycle means that the prisoners are all arranged in a giant circle and their bit goes to the next person in the circle, so every bit gets received by exactly one other prisoner, and every prisoner receives exactly one bit. In particular (and this is maybe a mild hint) the warden will not arrange the prisoners into two or more disjoint circles and have bits go around only within each circle. However, the prisoners get no information about who sent the bit they received or who received the bit they sent.
You only get one guess for the number of prisoners.
"Guaranteed finite time" means that, for each possible number of prisoners $n$ , there is some fixed number $f (n)$ such that the strategy is guaranteed to take no more than $f (n)$ days. However, $f (n)$ can be really, really large. This also means you can't use randomness, which I think makes the puzzle more interesting, though you can try solving it with randomness if you'd like. If it helps, you should pretend the warden also reads your strategy email, monitors all bits, and always chooses the cycle adversarially.
It can also be interesting to optimize the worst-case number of days required to get out of prison, but I didn't go into that.

Sources

I didn't look very hard for sources, but the person who shared this with me cited the xkcd fora, a claim also made by lomont.org. Unfortunately the xkcd link there doesn't work for me right now. This puzzle has also been posted to Puzzling StackExchange and /r/mathriddles. Solutions equivalent to both ones I present below have been posted.

Spoiler padding

Solutions below the Sierpinski gasket!

                                #
                               # #
                              #   #
                             # # # #
                            #       #
                           # #     # #
                          #   #   #   #
                         # # # # # # # #
                        #               #
                       # #             # #
                      #   #           #   #
                     # # # #         # # # #
                    #       #       #       #
                   # #     # #     # #     # #
                  #   #   #   #   #   #   #   #
                 # # # # # # # # # # # # # # # #
                #                               #
               # #                             # #
              #   #                           #   #
             # # # #                         # # # #
            #       #                       #       #
           # #     # #                     # #     # #
          #   #   #   #                   #   #   #   #
         # # # # # # # #                 # # # # # # # #
        #               #               #               #
       # #             # #             # #             # #
      #   #           #   #           #   #           #   #
     # # # #         # # # #         # # # #         # # # #
    #       #       #       #       #       #       #       #
   # #     # #     # #     # #     # #     # #     # #     # #
  #   #   #   #   #   #   #   #   #   #   #   #   #   #   #   #
 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

Upper bounding the number of prisoners

Both solutions I know of start by determining an upper bound $N$ on the number of prisoners (and making it common knowledge).

For each positive integer $n$ , we will attempt to establish that there are at most $2^{n}$ prisoners with the following procedure:

Each prisoner will keep track of their own "on/off" state. To begin the procedure, you will be "on", and every other prisoner will be "off".
For n days, every prisoner sends a 1 iff they are "on". If a prisoner is "off" and receives a 1, they become "on"; otherwise, prisoners preserve their "on/off" state.
- Analysis: During each of these days, if at least one prisoner is still "off", then at least one prisoner must transition from "off" to "on". On the other hand, the number of "on" prisoners can at most double. This means that after $n$ days, either between $n + 1$ and $2^{n}$ prisoners are "on", or all prisoners are "on" and there are at most $2^{n}$ prisoners.
For 2n days, every prisoner sends a 1 iff they are "on". If a prisoner is "on" and receives a 0, they become "off"; otherwise, prisoners preserve their "on/off" state.
- Analysis: During each of these days, if at least one prisoner was "off" and at least one prisoner is still "on", then at least one prisoner must transition from "on" to "off". So, if all prisoners were "on" before these $2^{n}$ days, then all prisoners will still be "on"; otherwise, all prisoners will become "off".
Now:
- If every prisoner is "off", there are more than $n$ prisoners. Increment $n$ and restart the procedure. (The procedure can't repeat indefinitely since at some point $n$ will exceed the true number of prisoners.)
- If every prisoner is "on", you've determined $N = 2^{n}$ to be an upper bound on the number of prisoners. Remember this number and continue with the rest of the strategy.

Solution 1 ("linear algebra")

This solution will use a subroutine we'll call "announcing", which is easier to explain on its own. If each prisoner can determine whether they satisfy some property $P$ themselves. we can "announce" to make it common knowledge whether any prisoner satisfying $P$ exists. This is similar to the core step of the upper bounding procedure. To announce whether anybody satisfies $P$ :

Each prisoner will keep track of their own "on/off" state. Prisoners start out "on" if they satisfy $P$ , "off" otherwise.
Then, for $N$ days, every prisoner sends a 1 iff they are "on". If a prisoner is "off" and receives a 1, they become "on"; otherwise, prisoners preserve their "on/off" state.
Then, for $N$ days, every prisoner sends a 1 iff they are "on". If a prisoner is "on" and receives a 0, they become "off"; otherwise, prisoners preserve their "on/off" state. At the end of the $2 N$ days, either every prisoner is "on", in which case a prisoner satisfying $P$ exists, or every prisoner is "off", in which case no prisoner exists. In either case, the "announcement" is complete.

For the main strategy, we'll maintain an ordered list of disjoint nonempty groups of prisoners, which together contain every prisoner (including yourself). At all times, every prisoner will know how many groups there are and which group they're in. During each phase of the strategy, we'll try to either split a group into two smaller groups, or gather enough information to uniquely determine every group size.

In the beginning, there will be two groups, the first containing yourself and the second containing everybody else.

The main phase of this strategy proceeds as follows:

For each proper nontrivial subset S of groups, in some globally agreed-upon (say lexicographic) order:
- For one day, send 1 if your group is in $S$ , 0 else. This is the "signaling day".
- Then for each group G in the full list of groups, in order:
  - Announce if anybody is both in $G$ and received 0 during the signaling day.
  - Announce if anybody is both in $G$ and received 1 during the signaling day.
  - If both exist, remove $G$ from the list, split it into two smaller groups (of people who received 0 and people who received 1), insert them back into the list of groups in some globally agreed-upon way (say, at the end), and restart the main phase. Otherwise, continue with the next group.

Since all groups are nonempty, there can't be more groups than prisoners, so the main phase can't restart indefinitely. Suppose it terminates when we have a list of $m$ groups. Then, for every proper nontrivial subset of group $S$ , we know a distinct subset of groups $S^{'}$ with equally many prisoners: specifically, the subset of groups that announced receiving 1 during the $S$ loop in the main phase. (Since the cycle includes all prisoners, it must go from some prisoner in $S$ to some prisoner not in $S$ somewhere, so $S$ and $S^{'}$ can't be identical.)

This is a large system of $2^{m} - 2$ linear equations. We also know that the first group, the one with only you in it, has size 1. So we can attempt to solve this system for the size of every group, and from there, the total number of prisoners. The last thing we need to prove is that the solution is unique.

I'll present two proofs of this:

Proof by contradiction

Suppose our system of linear equations has more than one solution. Let A and B be different solutions. Without loss of generality, assume that some group is larger in solution A than in solution B. Let $S$ be the subset of all such groups (that are larger in solution A than in solution B). $S$ can't contain your group, so it's a proper nontrivial subset of groups, so we know a distinct subset of groups $S^{'}$ that has the same number of prisoners as $S$ in either solution.

Let $D = S ⧵ S^{'}$ , and $D^{'} = S^{'} ⧵ S$ . Then $D$ and $D^{'}$ also have the same number of prisoners in either solution, and both are nonempty. However, this is impossible! Because of how $S$ was chosen, every group in $D$ has more prisoners in solution A than in solution B, but no group in $D^{'}$ does.

Constructive proof

This proof is longer and the core step is basically the same as in the proof by contradiction, but it might feel more satisfying.

Let the group sizes be $x_{1}, \dots, x_{m}$ , where we assume your group is group 1 so that $x_{1} = 1$ .

We will write each individual variable $x_{i}$ as a linear combination of (some subset of) $x_{1}, \dots, x_{m}$ , and then, proceeding from $j = m$ down to $2$ , eliminate every use of $x_{j}$ in our linear combinations. Initially, we can write down the trivial $x_{i} = x_{i}$ for every $i$ . At the end of this process, we'll have expressed every variable $x_{1}, \dots, x_{m}$ as a linear combination of $x_{1} = 1$ only, which means we'll have uniquely determined every group size.

Now, to eliminate $x_{j}$ for $j > 1$ , we first need to write down $x_{j}$ as a linear combination of (some subset of) $x_{1}, \dots, x_{j - 1}$ . To do this, let $S$ be the set of $i$ such that the currently written-down linear combination for $x_{i}$ includes $x_{j}$ with a positive coefficient; this is a proper nontrivial subset of ${1, \dots, m}$ because it includes $j$ itself and excludes $1$ . Then, look at the equation the main phase generated that corresponds to $S$ , which is some equation of the form

\sum_{i \in S} x_{i} = \sum_{i \in S^{'}} x_{i}

for $S^{'} \neq S$ . After canceling the terms that appear on both sides, we know that, if we replace every variable with its currently written-down linear combination, we'll get a linear equation in terms of only $x_{1}, \dots, x_{j}$ where the coefficient of $x_{j}$ is positive on the left side and nonpositive on the right. This means we can solve this equation for $x_{j}$ in terms of $x_{1}, \dots, x_{j - 1}$ , exactly as desired.

Once we have a linear combination for $x_{j}$ , we can plug it into every other linear combination we have written down to rewrite all of them in terms of $x_{1}, \dots, x_{j - 1}$ .

As previously described, by repeating this process until $i = 2$ , we'll have written every variable $x_{1}, \dots, x_{m}$ in terms of only $x_{1} = 1$ , so we can determine all group sizes and the number of prisoners.

Solution 2 ("coin passing")

We're going to give each prisoner other than yourself an imaginary stack of $N^{2}$ coins.² Sending a 1 will represent passing a coin to the next prisoner in the cycle; receiving a 1 will represent receiving a coin. Prisoners will keep track of how many coins they have at all times.

At a high level, this strategy's goal is to get most of the coins passed to you. If you can guarantee you've received all but $< N^{2}$ of the coins, then you can just divide the number of coins you have by $N^{2}$ and round up to get the number of other prisoners (and then add 1 for yourself).

Every day there will be a "threshold" $T$ , and every prisoner other than yourself will pass a coin to somebody else iff they started the day with more than $T$ coins. You will never pass a coin to anybody else, only receive coins to count them.

We claim that the strategy above has the desired effect if you just loop $T$ from 0 to $N - 1$ for a total of $N^{5}$ loops ( $N^{6}$ days).

Intuitively, this works because whenever $T$ hits a number of coins that nobody has, the coins will become more evenly distributed, until nobody has $N$ or more coins.³ To prove this rigorously, consider the quantity "the sum of squares of number of coins held by each prisoner (excluding yourself)". We claim that this quantity only stays the same or decreases each day, and will decrease at least once every time $T$ loops from $0$ to $N - 1$ :

Every day, each prisoner's coin count at most increases or decreases by 1 (if they received a coin and didn't pass one, resp. if they didn't receive a coin and did pass one). For simplicity, we'll pretend you start every day with 0 coins and end with 0 or 1 coins, so that your behavior is consistent with any other prisoner with 0 coins and that equally many prisoners gain and lose coins. Suppose there are $K$ such prisoners of either type.
Everybody whose coin count increased must not have passed a coin, so they must have started the day with $T$ or fewer coins. For each of those $K$ people, the square of their coin count went up by at most $2 T + 1$ .
Everybody whose coin count decreased must have passed a coin, so they must have started the day with more than $T$ coins. For each of those $K$ people, the square of their coin count went down by at least $2 T + 1$ .
Putting it together, the coin receivers' sum of squares of coin counts increased by at most $K (2 T + 1)$ , while the coin passers' sum of squares of coin counts decreased by at least $K (2 T + 1)$ , so the overall sum of squares of coin counts must have stayed the same or decreased. Furthermore, the only way the sum of squares can stay exactly the same is if only people with $T + 1$ coins passed coins on net and only people with $T$ coins received coins on net. This means the multiset of coin counts held by prisoners stayed exactly the same (that is, for every $M$ , the number of prisoners with exactly $M$ coins at the start of the day is the same as the number of prisoners with exactly $M$ coins at the end of the day). And, this cannot happen if anybody started the day with more than $T$ coins but nobody started the day with exactly $T$ coins.
At any given point, if anybody has $N$ or more coins, then there must exist some nonnegative integer $M < N$ such that nobody has exactly $M$ coins. This means that during the next loop of $T$ going from 0 to $N - 1$ , the sum of squares must decrease at some point at or before we hit $T = M$ .

At the beginning, the sum of squares is $p N^{4}$ where $p$ is the number of prisoners other than yourself, which is less than $N^{5}$ . The sum of squares can't decrease below 0, so after $N^{5}$ loops, we can guarantee that no prisoner (other than yourself) has $N$ or more coins. So, as described initially, you can count how many coins you've collected, divide by $N^{2}$ , and round up to determine the number of other prisoners.

I don't actually know a comprehensive resources of puzzles of this type. There are a few collected at sigh.github.io (among other kinds of puzzles).↩
Many constant factors can be optimized out of my choices of parameters. I went for the simplest mathematical expressions.↩
The most coins the warden can prevent you from getting in steady state is $1 + 2 + \dots + (N - 1)$ , in a state where one prisoner has 1 coin, another has 2, and so on. To preserve this state, on a day with threshold $T$ , the warden can arrange all prisoners with $> T$ coins contiguously with the prisoner with $T + 1$ coins at the start, followed by all prisoners with $\leq T$ coins contiguously with the prisoner with $T$ coins at the start.↩