SpinShuffle precision scoring

Overview

My new puzzle app, SpinShuffle, challenges you to unscramble a grid of tiles. The tiles are attached to horizontal and vertical wheels, and you spin the wheels to move the tiles in a quest to arrange them in the right order. This article won’t make much sense if you haven’t played the game, so first head over to spinshuffle.net, download the app for free, and try a few puzzles.

If you want to get competitive with the puzzles, the primary measure of success is the clock. The person who solves in the shortest time is declared the winner (unless they resort to hints or use more than the ‘par’ number of moves). But I wanted to add some extra spice to the challenge by measuring precision – that is, gauging how ‘directly’ you find the solution. The more you go down blind alleys and have to backtrack, the lower your precision score.

In this article we’ll look at how to quantify this notion of precision. It turns out that we can measure precision for each tile, which will be the key to producing a cool Wordle-style heat map. And by taking the average over all the tiles we can easily obtain an overall precision score.

The raw ingredient: tile move history

In designing this feature, I assume that the SpinShuffle app does not know the solution to your puzzle. It has to wait for you to find it! This means that we can’t measure the precision until the puzzle has been solved. We don’t define a precision score for solutions that take more than the ‘par’ number of moves, since these solutions are already imprecise by definition.

Once you have solved the puzzle correctly, the only input the precision calculator needs is your move history. This is your full history, including every move and undo that you made. Even though the typical solution for a hard word puzzle is five wheel spins, it’s not unusual for me to make 100+ spins and undos while searching for the solution.

In measuring precision, we are interested in how each tile moves – we don’t care which wheels were used to move it. So the first chore is to convert the sequence of wheel spins into a list of tile move histories (one for each tile). A tile move history is a sequence of one or more tile positions, starting with the tile’s starting position and ending with its final position. Each time the tile moves, we add an entry to its move history.

Very importantly, we mark each entry in the move history with a flag to indicate whether or not it was an undo move. Think about it: the player’s solution uses the ‘par’ number of moves (otherwise we wouldn’t even be calculating its precision). Therefore, every wrong move that they made must have been undone at some point.

Here’s an example of a typical tile move history:

[1,2] → [1,5] → [1,2]u → [3,2] → [1,2]u → [3,2] → [3,3]

This tile started out at row 1, column 2 (rows and columns are both numbered from zero, so this is on the second row). The player then spun a horizontal wheel that moved it to column 5. Then they realized that this was wrong, so they used the Undo button to move it back (of course, they could have been spinning other wheels that moved other tiles in between all of these actions).

Next, they used a vertical wheel to move it to row 3, column 2. This was the right thing to do, but their next action was to undo it! This actually happens quite often, for example when you realize that the move is the right idea, but you have to set up some other tiles before spinning that wheel. Finally, they moved the tile to row 3, column 3, which was the correct final position.
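Concretely, a tile move history might be represented like this (a C# sketch – the TileMove record and its names are mine, not necessarily the app’s):

using System.Collections.Generic;

// The example history: [1,2] → [1,5] → [1,2]u → [3,2] → [1,2]u → [3,2] → [3,3]
var history = new List<TileMove>
{
    new(1, 2, IsUndo: false),   // starting position
    new(1, 5, IsUndo: false),   // wrong move
    new(1, 2, IsUndo: true),    // undo of the wrong move
    new(3, 2, IsUndo: false),   // correct move…
    new(1, 2, IsUndo: true),    // …which gets undone anyway
    new(3, 2, IsUndo: false),   // the same correct move, again
    new(3, 3, IsUndo: false),   // correct final position
};

// One entry in a tile’s move history: a board position plus a flag
// marking whether the entry was produced by an undo.
public record TileMove(int Row, int Col, bool IsUndo);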

The key measures of a move history

The above example shows us the three statistics that will influence our precision score:

  1. Solution path length
  2. Number of wrong moves
  3. Number of undos of correct moves

Let’s look at each one …

Solution path length

This is the number of times that the tile had to move in order to get from its start position to its final position (the placing of the tile in its initial position is counted as one ‘move’). The longer the path, the more complex the part that the tile played in the puzzle. We should penalize the player less for making mistakes on a tile that had to move a lot than for moving a tile that didn’t need to move at all!

In the above example the solution path is [1,2] → [3,2] → [3,3], so the solution path length is three.

Number of wrong moves

This is the number of times during play that the tile’s actual path deviated from the solution path.

In this example it happened only once, when the path was [1,2] → [1,5], so the number of wrong moves is one.

Number of undos of correct moves

This is the number of times during play that the most recent move was undone when the actual path matched the solution path.

In this example it happened once, when the path was [1,2] → [3,2] and the [3,2] move was undone.
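Given a history in this form, all three statistics can be extracted mechanically. Below is a rough sketch of how (C# again, with my own names and one plausible reading of the definitions – not the app’s actual code). The idea: the positions that survive all the undos form the solution path, and a replay then classifies each move and undo against that path.

using System.Collections.Generic;
using System.Linq;

// TileMove is the record sketched in the earlier listing.
public static class TileStats
{
    public static (int PathLength, int WrongMoves, int UndosOfCorrect)
        Measure(IReadOnlyList<TileMove> history)
    {
        // Pass 1: cancel each undo against the most recent surviving move.
        // What survives is the solution path (starting position included).
        var solution = new List<(int Row, int Col)>();
        foreach (var m in history)
        {
            if (m.IsUndo) solution.RemoveAt(solution.Count - 1);
            else solution.Add((m.Row, m.Col));
        }

        // Pass 2: replay the history, comparing the actual path with the
        // solution path after every action.
        var actual = new List<(int Row, int Col)>();
        int wrongMoves = 0, undosOfCorrect = 0;
        foreach (var m in history)
        {
            if (m.IsUndo)
            {
                // If the actual path matched the solution path just before
                // this undo, a correct move is being thrown away.
                if (IsPrefix(actual, solution)) undosOfCorrect++;
                actual.RemoveAt(actual.Count - 1);
            }
            else
            {
                actual.Add((m.Row, m.Col));
                // A move that leaves us off the solution path is a wrong move.
                if (!IsPrefix(actual, solution)) wrongMoves++;
            }
        }

        return (solution.Count, wrongMoves, undosOfCorrect);
    }

    private static bool IsPrefix(List<(int Row, int Col)> actual, List<(int Row, int Col)> solution) =>
        actual.Count <= solution.Count && actual.SequenceEqual(solution.Take(actual.Count));
}

Running Measure over the example history returns a path length of three, one wrong move and one undo of a correct move – matching the counts above.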

Converting three statistics to a precision score

Now we have the above three statistics (which are all positive integers or zero) for each tile. This is where things get interesting. We would like to have a ‘precision score’ for each tile. A percentage in the range (0,1] would be nice. A score of 1 would mean that you moved the tile directly along its final path without a single wrong move or undo. A score of 0 should be impossible (no matter how many times you hit Undo); you must have some precision since, by the rules of our algorithm, you found the solution eventually.

The first step is to combine the number of wrong moves and the number of undos of correct moves, since they are both instances of mistakes that lower your precision. In completely arbitrary fashion, I declare that undoing a correct move is “half as bad” as making a wrong move. Thus, we can declare a mistake score:

  • Mistake Score = [Number of Wrong Moves] + 0.5 * [Number of Undos of Correct Moves]

As explained earlier, we should penalize the user less harshly for mistakes that they make on tiles with long solution paths. This suggests that we should measure the ratio of mistakes to path length:

  • Mistake Percentage = [Mistake Score] / [Solution Path Length]
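Applying these formulas to our example tile: the Mistake Score is 1 + 0.5 × 1 = 1.5, and dividing by the solution path length of three gives a Mistake Percentage of 0.5.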

Normalizing the score

Is the Mistake Percentage the score we have been seeking? No, for two reasons:

  1. It is in the range [0, ∞), whereas we want a percentage in the range (0,1]. (Okay, it’s never infinity, but the point is that it will be a big number if you take a tile that didn’t need to move and keep moving it back and forth).
  2. A high mistake percentage is bad, whereas we want a high precision score to denote good performance.

These two issues suggest that a negative exponential function could be helpful. After all, if 𝑥 is in the range [0, ∞), then exp(-𝑥) is in the range (0,1] (and the ordering is reversed).

My first attempt used e as the exponential base, but this produced pretty extreme swings in the precision scores as I experimented with making mistakes in solving puzzles. Eventually I decided to use a different exponential base number – but which one? Again picking a completely arbitrary standard, I decided that a Mistake Percentage of 25% would be regarded as “90% precision” (why not?). As a reminder, this is equivalent to making (and later undoing) one wrong move on a tile whose solution path length is four – that is, a tile that has to make three real moves, plus the starting position, which counts as one “move”.

This benchmark enables us to calculate the exponential base:

b^(−0.25) = 0.9

−0.25 · ln(b) = ln(0.9)

b = e^(ln(0.9) / −0.25)

b ≈ 1.52
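Putting it all together: a tile’s precision score is the base raised to the power of minus its Mistake Percentage, and (as promised in the overview) the overall score is just the average over all tiles. A minimal sketch, again in C# with my own names:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Precision
{
    // Chosen so that a Mistake Percentage of 25% maps to 90% precision.
    private static readonly double Base = Math.Exp(Math.Log(0.9) / -0.25);  // ≈ 1.52

    public static double TileScore(int pathLength, int wrongMoves, int undosOfCorrect)
    {
        double mistakeScore = wrongMoves + 0.5 * undosOfCorrect;
        double mistakePercentage = mistakeScore / pathLength;
        return Math.Pow(Base, -mistakePercentage);  // always in (0, 1]
    }

    // Overall precision: the average of the per-tile scores.
    public static double OverallScore(IEnumerable<double> tileScores) => tileScores.Average();
}

For the example tile (path length three, Mistake Percentage 0.5) this gives 1.52^(−0.5) ≈ 0.81 – about 81% precision.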

My un-scientific experiments with this base have been pretty positive. When I feel like I did a pretty good job on a puzzle, I get a precision score in the 80-100% range. On a really hard Daily Challenge that takes me 25+ minutes, my precision score drops to the 30-50% range.

NSubstitute code refactoring VS extension

The code for this extension is available here.

NSubstitute is great for mocking heavy interfaces during unit testing, but I’ve always found setting it up to be a pain. Suppose we want to test this (unrealistic) C# code:

public interface IHeavyConcern
{
    Task<Guid> SaveExternalDocumentAsync<T>(string fileName, IDictionary<int, T> docData);
    Task<Guid> SaveExternalDocumentAsync<T>(string fileName, IDictionary<int, T> docData, bool overwrite);
}

class MyClass
{
    private readonly IHeavyConcern _heavyConcern;

    public MyClass(IHeavyConcern heavyConcern)
    {
        _heavyConcern = heavyConcern;
    }

    public Task<Guid> ThingToTestAsync()
    {
        return _heavyConcern.SaveExternalDocumentAsync("file1.txt", new Dictionary<int, string> { { 100, "one hundred" } });
    }
}

We want to call ThingToTestAsync but we want to mock the Save method. We need something like this:

        var heavyConcern = Substitute.For<IHeavyConcern>();

        heavyConcern.SaveExternalDocumentAsync<decimal>(Arg.Any<string>(), Arg.Any<System.Collections.Generic.IDictionary<int, decimal>>(), Arg.Any<bool>())
            .Returns(Guid.NewGuid());

In this case (which comes up a lot) we don’t care what arguments are passed to the Save method; we just want to give it some basic realistic behaviour (in this case, returning a fake new document ID). We do this by configuring the substitute method, but this requires careful coding of an Arg.Any call for each argument. Not my idea of fun!

Anyway, I finally got around to making some lemonade out of this lemon. It’s an opportunity to create my first Visual Studio code refactoring extension (a VSIX) and publish it on the Visual Studio Marketplace. Even better, it’s a chance to have a lot of fun with the awesome power of Roslyn.

This post describes how I built the extension. If you just want to use it, you don’t need to know any of this – you can just install the extension into your Visual Studio (go to Extensions ➝ Manage Extensions ➝ Online and search for ‘NSubstitute’).

The magic that links our refactoring “offering” to the user’s code is the CodeRefactoringProvider class. To get this I had to install the Visual Studio SDK (which is an option in the VS installer). Then I created a new VSIX code refactoring project, which gave me the skeleton of a refactoring extension – essentially, a class that overrides the CodeRefactoringProvider base class (a minimal skeleton is sketched just after the list below). The main chore is to implement the ComputeRefactoringsAsync method, which is called when the user clicks the lightbulb (or presses Ctrl+.). Here we need to do three things:

  1. Look at the context (i.e. the current location within the user’s code) and decide whether our offering is applicable. If it is not (which is the case 99.9% of the time) then bail out – fast! Taking too long here is a great way to bring VS to its knees.
  2. Figure out exactly what refactoring(s) to suggest. This is the fun part. What’s especially interesting is that we must do our analysis without using the usual .NET reflection facilities. All we have access to is Roslyn’s syntactic and semantic models. As we will see, these are similar to the Type that we know and love in reflection, although maybe not quite as convenient.
  3. Bundle the refactoring(s) into a CodeAction and add it to the context so that VS will display an item in its popup menu.
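For orientation, here’s roughly what that skeleton looks like (the class name is mine; the attribute and base class come from the Roslyn SDK):

using System.Composition;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CodeRefactorings;

[ExportCodeRefactoringProvider(LanguageNames.CSharp, Name = nameof(NSubstituteArgsRefactoringProvider))]
[Shared]
internal class NSubstituteArgsRefactoringProvider : CodeRefactoringProvider
{
    public sealed override async Task ComputeRefactoringsAsync(CodeRefactoringContext context)
    {
        // 1. Bail out fast if our refactoring doesn’t apply here.
        // 2. Work out what refactoring(s) to suggest.
        // 3. Register a CodeAction for each suggestion.
    }
}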

Testing applicability

Visual Studio passes our ComputeRefactoringsAsync method a CodeRefactoringContext object and the first two lines of my implementation are taken straight from the boilerplate code in the VSIX project template:

            var root = await context.Document.GetSyntaxRootAsync(context.CancellationToken).ConfigureAwait(false);

            // Find the node at the selection.
            var node = root.FindNode(context.Span);

For those who are new to Roslyn I should explain that the ‘node’ we’re given here is an item from the syntactic model. In other words, it tells us everything that Roslyn can infer from looking at the raw characters of our C# code, but it lacks the deeper understanding that the semantic model will bring. We’ll definitely need the semantic model at some stage, but first we should look at the syntactic model to see if we can ‘fail fast’. So what situation is applicable to our refactoring? Our ‘ideal’ implementation would look for all of the following criteria:

  1. The user’s selection should be inside the argument list of a method invocation.
  2. That argument list should be empty (we don’t want to start competing with the user; we expect to generate all the arguments – of course they can edit our code afterwards).
  3. The user’s project should use NSubstitute (well, duh).
  4. The invocation should be firing on an instance of an NSubstitute mock.

We can implement the first two right away:

            if (!node.IsKind(SyntaxKind.ArgumentList))
            {
                return;
            }
            var argListNode = node as ArgumentListSyntax;
            if (argListNode.Arguments.Any())
            {
                return;
            }

            var invocationNode = node.Parent as InvocationExpressionSyntax;
            if (invocationNode == null)
            {
                return;
            }

Point #3 requires the semantic model (and is a little bit expensive). But there is no practical way to do #4 – it is really a runtime thing. It seems that we will throw up the occasional false positive, but hopefully this will happen only when the user is working on unit test code (thanks to #3). To reduce this further we can apply a fifth criterion (that we can implement):

5. The invocation should be made on an instance of an interface.

This will avoid a false positive when the user is writing a method call on a concrete class instance. However, you have probably noticed the catch: NSubstitute can create mocks for non-interface types. But I will counter by saying that mocking of non-interface types is generally frowned upon in unit testing. If you want to do that then our code refactoring extension won’t assist you!

Here then is the implementation of #5 and #3:

            var model = await context.Document.GetSemanticModelAsync(context.CancellationToken).ConfigureAwait(false);

            var invocationTypeInfo = model.GetTypeInfo((invocationNode.Expression as MemberAccessExpressionSyntax).Expression);

            if (invocationTypeInfo.Type.TypeKind != TypeKind.Interface)
            {
                return;
            }

            if (!model.Compilation.ExternalReferences.Any(r => r.Display.Contains("NSubstitute")))
            {
                return;
            }

Computing the code output

Once we have decided that our offering applies, we need to look at the semantic model and guess what code the user wants to write. The interesting thing is that at this point, where the user is invoking the popup menu, their code probably contains errors. A typical line will look like this…

substituteObj.SomeMethod()

… and the cursor will be in between the brackets. Getting into this state is easy, because when you type SomeMethod(, Visual Studio will automatically add the closing bracket and leave the cursor between the brackets. Now just hit Ctrl+. and our extension should be invoked. Notice that there is no semicolon closing the statement (a syntax error) and no arguments (a semantic error). Fortunately, Roslyn is so awesome that it can still figure out what’s going on. It is not flummoxed by the malformed statement – instead it goes ahead and builds a list of candidates for what the user might be trying to write. These give us just what we need:

            IMethodSymbol[] candidateMethods;

            var invocationSymbolInfo = model.GetSymbolInfo(invocationNode);
            if (invocationSymbolInfo.CandidateReason == CandidateReason.OverloadResolutionFailure)
            {
                var overloadCandidates = invocationSymbolInfo.CandidateSymbols;
                candidateMethods = overloadCandidates.Cast<IMethodSymbol>().ToArray();
            }
            else
            {
                return;
            }

Notice that there could be more than one candidate, because the interface might declare multiple overloads for the method name that the user has written. We will now build a CodeAction for each one. Each CodeAction wraps an asynchronous function that will modify the user’s Document object (and thus their code). I won’t go through the whole thing here; I just want to highlight a few key points. We first ask Roslyn for a collection of the overload’s parameters:

            var paramTypeNames = matchingMethodSymbol.Parameters.Select(p => p.Type.ToString()).ToArray();

Here we break away from Roslyn and turn the types into plain old strings, such as “decimal”. This is almost what we want – for a decimal parameter we want to insert the code string Arg.Any<decimal>(). But there’s a wrinkle: generics. A parameter type like IEnumerable<string> won’t cleanly convert to the string “IEnumerable<string>”. Fortunately, I found a nice function written by StackOverflow user @HappyNomad that solves the problem. It actually creates a tiny C# script consisting of the code typeof(<p.Type.ToString()>) and runs it through Roslyn to create a TypeSyntax node. (Hmm… it seems odd for us to be going from the semantic model world back to the syntax model world. Oh well; it works).

We’re getting there; once we have a TypeSyntax node for a parameter we can do the fiddly work of building a Roslyn syntax tree that represents the call to Arg.Any<T>(). There are cleaner ways to do this, but here’s the low-tech pattern that I used:

        private static ArgumentSyntax CreateArgAny(TypeSyntax argGenericType)
        {
            var genArg = SyntaxFactory.TypeArgumentList(SyntaxFactory.SeparatedList(new[] { argGenericType }));
            var calledMethodNameExp = SyntaxFactory.GenericName(SyntaxFactory.Identifier("Any"), genArg);
            var methodCallExp = SyntaxFactory.MemberAccessExpression(SyntaxKind.SimpleMemberAccessExpression, SyntaxFactory.IdentifierName("Arg"), calledMethodNameExp);
            var invocation = SyntaxFactory.InvocationExpression(methodCallExp);
            return SyntaxFactory.Argument(invocation);
        }

We join all of these together in a single Argument List syntax node:

            var argumentSyntaxes = smartGenTypes.Select(CreateArgAny);
            var arglist = SyntaxFactory.ArgumentList(SyntaxFactory.SeparatedList(argumentSyntaxes));

Now our code action can alter the user’s document. As you would expect nowadays, all the Roslyn data structures are immutable. So to make our change we create a new node, representing the original invocation node with our argument list added to it, then create a new document root node in which the old invocation node is replaced with our new node:

            var oldInvNode = invocationNode;
            var newInvNode = oldInvNode
                                .WithArgumentList(generatedArgList)
                                .WithTrailingTrivia(SyntaxFactory.CarriageReturnLineFeed);
            var newRoot = root.ReplaceNode(oldInvNode, newInvNode);
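The last job is to wrap this change in the CodeAction and register it with the context so that VS shows it in the lightbulb menu. Schematically (the title wording and the methodName variable here are mine, for illustration):

            var codeAction = CodeAction.Create(
                $"Substitute all args with Arg.Any<T>() – {methodName}",  // menu item text; wording is illustrative
                ct => Task.FromResult(context.Document.WithSyntaxRoot(newRoot)));

            context.RegisterRefactoring(codeAction);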

Creating the menu item(s)

I wanted slightly different functionality depending on whether there are overloads. If there’s just one flavour of the method then you should see just one popup menu item – you select it, hit Enter and bam – it’s done. But if there are multiple overloads then that popup menu item should have sub-items and you will have to choose which flavour you want. This was easy: the SDK lets us create a CodeAction that contains a collection of other CodeActions. This CodeAction automatically becomes a containing menu item with sub-items.
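In recent Roslyn versions this shows up as an overload of CodeAction.Create that accepts the child actions – treat the exact signature as my assumption and check your Microsoft.CodeAnalysis version:

            // Assumption: this Create overload exists in your Microsoft.CodeAnalysis version.
            var groupedAction = CodeAction.Create(
                "Substitute all args with Arg.Any<T>()",
                nestedActions: overloadActions.ToImmutableArray(),  // hypothetical: one CodeAction per overload
                isInlinable: false);                                // false => always shown as a sub-menu

            context.RegisterRefactoring(groupedAction);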

Functional-style programming in JavaScript with Ramda

[Screenshot from the Mandelbrot web app.]

Introduction

The Mandelbrot set is a computer geek’s dream.  It has everything you want for a coding demo.  It’s a simple idea but it produces beautiful images.  And it’s computationally intensive so it can be used to evaluate optimizations.

Let’s use it to practise my current curiosity: functional JavaScript using Ramda.  The source code is on GitHub; you can see it running on Plunker. This article is aimed at people who are familiar with JavaScript and who are curious about functional programming.


Commands and Queries

I’m not sure whether I’m using the accepted terminology here.  People nowadays mean something else by “Command/Query separation.”  But I swear that I first heard the term used (years and years ago) to describe what I consider to be one of the most important principles of method design, and it’s the name I stick with.  What I’m talking about is the idea of classifying methods into two broad categories:

  • Commands are methods that change the state of the world.  To put it another way, they leave visible side effects.
  • Queries are functions that figure out a result and give it to you.
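A tiny C# illustration of the distinction (my own example, not from the original post):

using System.Collections.Generic;
using System.Linq;

public record Order(decimal Price);

public class OrderBook
{
    private readonly List<Order> _orders = new();

    // Command: changes the state of the world; returns nothing.
    public void Add(Order order) => _orders.Add(order);

    // Query: figures out a result and gives it to you; no side effects.
    public decimal TotalValue() => _orders.Sum(o => o.Price);
}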

Mistake #1: Abuse of methods

Before we talk about this problem, I have a bit of a theory about how it arises. Your typical developer reads a bunch of books in school about this cool thing called Object Oriented Programming, and how it has these awesome features called inheritance and polymorphism and stuff like that. Then they go on to higher education and they do the mandatory first year programming course. The lecturer starts at the beginning and slowly explains about methods and how they enable the scaling up of design. And the developer-to-be is thinking “Yeah, I know this already… can we just skip to the good stuff about classes and vtables and overrides? Methods are easy and boring.”


Thoughts on naming things

I’m sure you’ve heard the saying: “There are only two hard things in software – cache invalidation, and naming things.”  I’ve thought about this quote for years and I still can’t decide whether I agree that there are indeed only two hard things.  But I definitely agree that Naming Things is hard.

Naming is hard because it gets right to the heart of what we’re doing: building a model in order to solve a real world problem. To be useful, that model must be a clever description of reality that captures the essence of the problem while ignoring what’s irrelevant. And the way we describe things is with words.


Angular 2 directive – square

This article is part of a series – the contents page is here.

In the last article we looked at the component that displays a piece.  It was pretty simple, not least because it didn’t support any kind of user input.  This time we’ll deal with the display of an individual board square.  We’ll make the square respond to mouse clicks so that the user can select it.
