Monte Carlo tree search is a staple algorithm in artificial intelligence. It powers some of the leading AIs in strategy games like chess and Go, and it is relatively easy to implement. This article describes the methodology of Monte Carlo tree search.

Monte Carlo tree search (MCTS) is a heuristic search algorithm used to choose a move in a turn-based strategy game. It does not require a static evaluator, which makes it more effective than algorithms like minimax for games that lack an accurate one.

Monte Carlo method

MCTS belongs to a class of computational algorithms known as Monte Carlo methods. These algorithms use repeated random sampling to estimate a numerical result.

A famous example is throwing darts uniformly at random at a square board and counting the fraction that land inside the quarter circle inscribed in it. Since the quarter circle covers \(\pi/4\) of the square's area, that fraction approaches \(\pi/4\), so multiplying it by 4 yields a good approximation of Pi after enough throws.

[Interactive demo: throw darts at the board and watch the Pi estimate converge.]
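For reference, the whole experiment fits in a few lines of TypeScript. This is a minimal sketch (the function name is my own):

function estimatePi(throws: number): number {
  let inside = 0;
  for (let i = 0; i < throws; i++) {
    // Throw a dart uniformly at the unit square
    const x = Math.random();
    const y = Math.random();
    // Count it if it lands inside the quarter circle of radius 1
    if (x * x + y * y <= 1) {
      inside++;
    }
  }
  // The quarter circle covers pi/4 of the square, so scale by 4
  return (4 * inside) / throws;
}

console.log(estimatePi(1_000_000)); // prints roughly 3.14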

How does MCTS work?

A search tree is a tree in which each path from the root to a leaf represents a possible sequence of moves in a turn-based game.

[Figure: an example search tree]
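To make the structure concrete, here is a minimal node shape. This is a sketch with names of my own; the full implementation later in this article uses a TreeNode<MoveWithPlayouts> class instead:

interface SearchNode<Move> {
  move: Move | null;            // move that led to this position (null at root)
  wins: number;                 // accumulated playout results (the weight)
  playouts: number;             // number of simulations through this node
  children: SearchNode<Move>[]; // one child per explored follow-up move
}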

The idea of MCTS is to grow the search tree by playing random moves until the game ends, then use the result to update the weights of the nodes along the path back to the root. (A node's weight corresponds to the evaluation of its position.)

More formally, each iteration of MCTS consists of 4 stages. Each stage is summarised briefly below (a code sketch follows the list), and will make more sense with the upcoming example.

  • Selection
    • Start from the current position (root) and traverse to a leaf node. The traversal can be random or guided by a selection policy such as UCT (described below).
  • Expansion
    • If this position has not been simulated from before, do not expand it; simulate from it directly. Otherwise, expand the node by creating children corresponding to all valid moves in the position, then proceed to simulate from the 1st child node.
  • Simulation
    • Both players make random moves until the game ends. This stage is also called playout or rollout.
  • Backpropagation
    • Use the result of the game to update the weight of parent nodes.
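Here is a high-level sketch of one iteration, reusing the SearchNode shape from earlier. The game-specific steps are passed in as functions since they vary by game; the real implementation for Ultimate Tic-Tac-Toe appears at the end of this article:

function mctsIteration<Move>(
  root: SearchNode<Move>,
  selectChild: (node: SearchNode<Move>) => SearchNode<Move>, // e.g. top UCT
  expand: (node: SearchNode<Move>) => void,    // add a child per valid move
  simulate: (node: SearchNode<Move>) => number // random playout: 1 / 0.5 / 0
): void {
  // Selection: walk from the root to a leaf, remembering the path
  const path: SearchNode<Move>[] = [root];
  let node = root;
  while (node.children.length > 0) {
    node = selectChild(node);
    path.push(node);
  }

  // Expansion: only expand a leaf that has been simulated from before
  if (node.playouts > 0) {
    expand(node);
    if (node.children.length > 0) {
      node = node.children[0];
      path.push(node);
    }
  }

  // Simulation: random moves until the game ends
  const result = simulate(node);

  // Backpropagation: fold the result into every node on the path
  for (const n of path) {
    n.wins += result;
    n.playouts += 1;
  }
}

A full search runs many such iterations, then plays the root child with the most playouts (or the best win rate).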

Example: Tic-Tac-Toe

[Figure: the current Tic-Tac-Toe position]

This is the current position. Our AI plays X, and it uses MCTS to decide its next move.

[Figure: the initial tree, a single node with 0 playouts]

The initial tree consists of 1 node (our current position) with 0 playouts.

Selection

We start from the root. In this case, the root is itself a leaf, so selection is done.

Expansion

[Figure: the tree after expanding the root with 2 children]

We treat the root as already visited and expand it immediately (implementations differ on this detail). In practice, we create a child node for each valid move, but in this example we create nodes for only 2 of the moves for the sake of simplicity.

Simulation

[Figure: a random playout from the new child node]

We make random moves until the game ends.
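A random playout is easy to sketch against a generic game interface. The names here are hypothetical; the article's actual simulatePlayout for Ultimate Tic-Tac-Toe appears below:

interface Game {
  isOver(): boolean;
  validMoves(): string[];
  play(move: string): void;
  result(): number; // 1 for a win, 0.5 for a draw, 0 for a loss
}

function randomPlayout(game: Game): number {
  // Both players pick uniformly random moves until the game ends
  while (!game.isOver()) {
    const moves = game.validMoves();
    game.play(moves[Math.floor(Math.random() * moves.length)]);
  }
  return game.result();
}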

Backpropagation

[Figure: the tree after backpropagation, with both nodes at 0.5/1]

In our simulation, the game ended in a draw. We update our child node to 0.5/1 (0.5 as we drew, and 1 as we had 1 simulation).

We then update its parent node to reflect this. Notice that the numbers on a node should be equal to the sum of the numbers on its children.

Next steps

[Figure: the tree after a second iteration; the right child is at 0/1 and the root at 0.5/2]

Assume we did another iteration and simulated the right leaf node, which we lost. How do we perform the selection stage now?

Instead of traversing to a leaf node at random, we can use the UCT algorithm (Upper Confidence bounds applied to Trees), defined as follows.

\[\frac{\text{wins of current node}}{\text{total playouts of current node}} + c \sqrt{\frac{\ln{(\text{total playouts of parent node})}}{\text{total playouts of current node}}}\]

\(c\) is an exploration constant; theory suggests \(\sqrt{2}\), and this article uses \(c = 2\). We start at the root node and traverse downwards: at each tree level, we choose the child with the highest UCT, until a leaf node is reached. Division by zero counts as infinity, so this prioritises leaf nodes that have never been simulated.

The UCT formula balances exploitation and exploration: the first term favours nodes with a higher simulated chance of winning, while the second term favours nodes that have been visited relatively rarely. More compactly, with \(w_i\) the wins recorded at the current node, \(s_i\) its total playouts and \(s_p\) the total playouts of its parent:

\[\frac{w_i}{s_i} + c \sqrt{\frac{\ln{s_p}}{s_i}}\]
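In code, reusing the SearchNode shape from earlier (a sketch; \(c\) defaults to 2 to match this article's examples):

function uctScore<Move>(
  node: SearchNode<Move>,
  parentPlayouts: number,
  c = 2
): number {
  // Unvisited nodes score infinity, so they are simulated first
  if (node.playouts === 0) return Infinity;
  return node.wins / node.playouts
    + c * Math.sqrt(Math.log(parentPlayouts) / node.playouts);
}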

[Figure: UCT values of the two children]

Going back to our example, we choose the left child node as it has the higher UCT (using \(c=2\)) and continue to the expansion stage.
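Concretely, the root now has 2 playouts, so the exploration term is \(2\sqrt{\ln 2 / 1} \approx 1.66\) for both children. The left child (0.5/1) scores roughly \(0.5 + 1.66 = 2.16\), while the right child (0/1) scores roughly \(0 + 1.66 = 1.66\); since the exploration terms are equal, the left child's higher win rate decides.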

[Figure: the tree after expanding the left child]

And the algorithm continues… but our example ends here.

MCTS in practice

MCTS is often combined with neural networks for games like chess, Go and bridge. Some of the strongest engines, such as AlphaGo, AlphaZero and Leela Chess Zero, pair MCTS with deep learning.

MCTS is a good choice for games that lack an accurate static evaluator, like Ultimate Tic-Tac-Toe.

This is my attempt at creating an AI for Ultimate Tic-Tac-Toe, written in TypeScript. The source code for MCTS is copied below. It is quick to implement, spanning only a few hundred lines.

See if you can beat the AI! Press Config and select Human against AI Hard, then start a new game.

private executeMCTSIter(boardCopy: GlobalBoard): void {
  // Selection
  let currNode = this.head;
  let currMarkType = this.markType;
  // Keep track of parents for backpropagation
  let parents: TreeNode<MoveWithPlayouts>[] = [];

  while (!currNode.isLeaf()) {
    parents.push(currNode);
    currNode = this.getChildWithTopUCT(currNode);
    let moveWithEval = currNode.value!;
    // Update board
    boardCopy.setCellValueWithMove(moveWithEval.markType!, moveWithEval.move);
    boardCopy.updateActiveBoards(moveWithEval.move.localIndex);
    // Keep track of mark type
    currMarkType = this.getOtherMarkType(currMarkType);
  }

  // Expansion
  if (currNode.value!.getNumPlayouts() > 0) {
    const moves = this.getValidMovesThatDoNotLose(boardCopy);
    const children = moves.map(move => {
      return new TreeNode(
        new MoveWithPlayouts(move, boardCopy.copy(), currMarkType)
      );
    });
    currNode.setChildren(children);

    if (children.length > 0) {
      parents.push(currNode);
      currNode = children[0];
      const moveWithEval = currNode.value!;
      // Update board
      boardCopy.setCellValueWithMove(
        moveWithEval.markType!, moveWithEval.move
      );
      boardCopy.updateActiveBoards(moveWithEval.move.localIndex);
      // Keep track of mark type
      currMarkType = this.getOtherMarkType(currMarkType);
    }
  }

  // Simulation
  const chosenNode = currNode;
  const result = this.simulatePlayout(boardCopy, currMarkType);
  chosenNode.value!.update(result);

  // Backpropagation
  for (let node of parents) {
    node.value!.update(result);
  }
}

protected simulatePlayout(board: GlobalBoard,
                          currentMarkType: MarkType): BoardStatus {
  let boardCopy = board.copy();

  // Simulate game until it ends
  while (boardCopy.getStatus() === BoardStatus.InProgress) {
    let move = this.getSmartRandomMove(boardCopy, currentMarkType);

    boardCopy.setCellValue(currentMarkType, move.globalIndex,
                            move.localIndex);
    boardCopy.updateActiveBoards(move.localIndex);

    // Switch player
    currentMarkType = this.getOtherMarkType(currentMarkType);
  }

  return boardCopy.getStatus();
}

Further reading

  • Dr. John Levine of the University of Strathclyde gave a fantastic lecture on MCTS. I highly recommend watching this video.
  • The Wikipedia article on MCTS is very well maintained, especially the Principle of operation section.
  • GeeksforGeeks has a good article with basic pseudocode for the algorithm.