Overview
My new puzzle app, SpinShuffle, challenges you to unscramble a grid of tiles. The tiles are attached to horizontal and vertical wheels and you spin the wheels to move the tiles, in a quest to arrange them in the right order. This article won’t make much sense if you haven’t played the game, so first head over to spinshuffle.net, download the app for free and try a few puzzles.
If you want to get competitive with the puzzles, the primary measure of success is the clock. The person who solves in the shortest time is declared the winner (unless they resort to hints or use more than the ‘par’ number of moves). But I wanted to add some extra spice to the challenge by measuring precision – that is, gauging how ‘directly’ you find the solution. The more you go down blind alleys and have to backtrack, the lower your precision score.
In this article we’ll look at the question of how we can quantify the precision concept. It turns out that we can measure precision for each tile, which will be the key to producing a cool Wordle-style heat map. And by taking the average over all the tiles we can easily obtain an overall precision score.
The raw ingredient: tile move history
In designing this feature, I assume that the SpinShuffle app does not know the solution to your puzzle. It has to wait for you to find it! This means that we can’t measure the precision until the puzzle has been solved. We don’t define a precision score for solutions that take more than the ‘par’ number of moves, since these solutions are already imprecise by definition.
Once you have solved the puzzle correctly, the only input the precision calculator needs is your move history. This is your full history, including every move and undo that you made. Even though the typical solution for a hard word puzzle is five wheel spins, it’s not unusual for me to make 100+ spins and undos while searching for the solution.
In measuring precision, we are interested in how each tile moves – we don’t care which wheels were used to move it. So the first chore is to convert the sequence of wheel spins into a list of tile move histories (one for each tile). A tile move history is a sequence of one or more tile positions, starting with the tile’s starting position and ending with its final position. Each time the tile moves, we add an entry to its move history.
Very importantly, we mark each entry in the move history with a flag to indicate whether or not it was an undo move. Think about it: the player’s solution uses the ‘par’ number of moves (otherwise we wouldn’t even be calculating its precision). Therefore, every wrong move that they made must have been undone at some point.
Here’s an example of a typical tile move history:
[1,2] → [1,5] → [1,2]u → [3,2] → [1,2]u → [3,2] → [3,3]
This tile started out at row 1, column 2 (rows and columns are both numbered from zero, so this is on the second row). The player then spun a horizontal wheel that moved it to column 5. Then they realized that this was wrong, so they used the Undo button to move it back (of course, they could have been spinning other wheels that moved other tiles in between all of these actions).
Next, they used a vertical wheel to move it to row 3, column 2. This was the right thing to do, but their next action was to undo it! This actually happens quite often, for example when you realize that the move is the right idea, but you have to set up some other tiles before spinning that wheel. Finally, they moved the tile to row 3, column 3, which was the correct final position.
The key measures of a move history
The above example shows us the three statistics that will influence our precision score:
- Solution path length
- Number of wrong moves
- Number of undos of correct moves
Let’s look at each one …
Solution path length
This is the number of times that the tile had to move in order to get from its start position to its final position (the placing of the tile in its initial position is counted as one ‘move’). The longer the path length, the more complex was the part that the tile played in the puzzle. We should penalize the player less for making mistakes on a tile that had to move a lot than for moving a tile that didn’t need to move at all!
In the above example the solution path is [1,2] → [3,2] → [3,3], so the solution path length is three.
Number of wrong moves
This is the number of times during play that the tile’s actual path deviated from the solution path.
In this example it happened only once: when the path was [1,2] → [1,5], thus the number of wrong moves is one.
Number of undos of correct moves
This is the number of times during play that the most recent move was undone when the actual path matched the solution path.
In this example it happened once, when the path was [1,2] → [3,2] and the [3,2] move was undone.
Converting three statistics to a precision score
Now we have the above three statistics (which are all positive integers or zero) for each tile. This is where things get interesting. We would like to have a ‘precision score’ for each tile. A percentage in the range (0,1]
would be nice. A score of 1 would mean that you moved the tile directly along its final path without a single wrong move or undo. A score of 0 should be impossible (no matter how many times you hit Undo); you must have some precision since, by the rules of our algorithm, you found the solution eventually.
The first step is to combine the number of wrong moves and the number of undos of correct moves, since they are both instances of mistakes that lower your precision. In completely arbitrary fashion, I declare that undoing a correct move is “half as bad” as making a wrong move. Thus, we can declare a mistake score:
Mistake Score = [Number of Wrong Moves] + 0.5 * [Number of Undos of Correct Moves]
As explained earlier, we should penalize the user less harshly for mistakes that they make on tiles with long solution paths. This suggests that we should measure the ratio of mistakes to path length:
Mistake Percentage = [Mistake Score] / [Solution Path Length]
Normalizing the score
Is the Mistake Percentage the score we have been seeking? No, for two reasons:
- It is in the range
[0, ∞)
, whereas we want a percentage in the range(0,1]
. (Okay, it’s never infinity, but the point is that it will be a big number if you take a tile that didn’t need to move and keep moving it back and forth). - A high mistake percentage is bad, whereas we want a high precision score to denote good performance.
These two issues suggest that a negative exponential function could be helpful. After all, if 𝑥 is in the range [0, ∞)
, then exp(-𝑥)
is in the range (0,1]
(and the magnitude is reversed).
My first attempt used e as the exponential base, but this produced pretty extreme swings in the precision scores as I experimented with making mistakes in solving puzzles. Eventually I decided to use a different exponential base number – but which one? Again picking a completely arbitrary standard, I decided that a Mistake Percentage of 25% would be regarded as “90% precision” (why not?). As a reminder, this is equivalent to making (and undoing) one wrong move on a tile that has to move three times (remember, the starting position counts as one “move”).
This benchmark enables us to calculate the exponential base:
b-0.25 = 0.9
-0.25 ln(b) = ln(0.9)
b = eln(0.9) / -0.25
b ≈ 1.52
My un-scientific experiments with this base have been pretty positive. When I feel like I did a pretty good job on a puzzle, I get a precision score in the 80-100% range. On a really hard Daily Challenge that takes me 25+ minutes, my precision score drops to the 30-50% range.