Why We Built a New Backgammon Engine in 2026

The state of backgammon AI in 2026 is awkward. The two engines the community treats as canonical were both designed in a very different era: gnubg (2003 architecture, last full release 2023) and eXtreme Gammon (proprietary, Windows-only, no API). Both work. Both are loved. Neither is shaped for the way developers ship apps today.

We needed an engine that exposes evaluations over HTTP, supports both short backgammon and Russian-style long narde, runs on a contemporary GPU, and is maintained by people who answer email. None existed. So we built one.

This is the story of how Nardex got built — what we ripped from the academic playbook, what we threw out, what surprised us, and what’s next.

Backgammon AI in 2026 — a quick map

Three systems matter today:

  • gnubg (1999–today): open-source, GPL, the gold-standard reference for academic comparisons. The network architecture is a direct descendant of TD-Gammon. It runs on Linux/macOS/Windows, but the build chain on modern macOS is painful. No API; you shell out to the binary, pipe XGID, and parse text.
  • eXtreme Gammon (XG): closed-source Windows desktop app. Match-equity tables are state-of-the-art; the GUI is the standard for serious cube analysis. No API of any kind — automation requires GUI scripting, which is unsuitable for production services.
  • wildbg: newer open-source effort (Rust, 2022+). Promising architecture, but the project’s focus is the engine itself rather than a hosted product. No HTTP endpoint, no managed inference.

Notice the gap: every option requires you to embed an inference runtime in your app or on your server. If you want to ship a backgammon analysis feature in a web app or mobile client, you have to bundle a few hundred MB of binaries plus the model weights, manage CUDA installation if you want GPU, and deal with version skew between client and server.

That is fine for desktop tools and academic work. It is not fine for a content-creator tool, a coaching platform, or a chess.com-style social product where backgammon is one feature among many.

Why we needed a new engine

The trigger was concrete: we wanted to ship a backgammon and long-narde site (you’re on it) with mobile apps, real-time PvP, and post-game analysis for every player. We could not glue gnubg subprocess calls into the request path of a multi-tenant web service. The math was obvious: the team would spend more time keeping gnubg alive than building product features.

Three concrete needs that the existing options didn’t meet:

  1. A clean HTTP analyze-position endpoint: submit position + dice + played moves, get back ranked alternatives with equity and probability vectors. No subprocess, no parsing, no version pinning.
  2. Native long-narde support: Russian narde has a different starting position, a different movement direction, no hits, and distinct mars/koks scoring. Neither gnubg nor XG models it, so you would have to write your own engine for that variant. We did, and made it part of the same product.
  3. Modern hardware acceleration: gnubg and XG both run inference on CPU. For a multi-tenant API, sub-second latency at moderate concurrency requires GPU. We wanted CUDA, TensorRT, and OpenVINO, pluggable via ort execution providers.
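To make the first need concrete, here is a minimal Rust sketch of how a client might rank returned alternatives and compute the equity given up by the played move. The names `Candidate`, `rank_candidates`, and `equity_loss` are illustrative assumptions, not the actual Nardex response schema.

```rust
/// One candidate play and its evaluated equity.
/// Field names are hypothetical; the real API schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub moves: String, // e.g. "24/18 13/11"
    pub equity: f64,
}

/// Sort candidates best-first by equity.
pub fn rank_candidates(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.equity.total_cmp(&a.equity));
    candidates
}

/// Equity given up by the played move relative to the best candidate.
/// Expects `ranked` to be sorted best-first (see `rank_candidates`).
/// Returns None if the list is empty or the played move is not in it.
pub fn equity_loss(ranked: &[Candidate], played: &str) -> Option<f64> {
    let best = ranked.first()?.equity;
    ranked
        .iter()
        .find(|c| c.moves == played)
        .map(|c| best - c.equity)
}
```

A post-game analysis view would typically call `rank_candidates` once per decision and flag moves whose `equity_loss` exceeds some threshold of its choosing.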

So instead of patching one of the existing engines, we built from scratch.

Architecture: Rust + ONNX + CUDA

The core stack:

  • Engine: Rust, in crates/engine/. Position representation, legal-move generation, evaluator interface. No async, no allocations on hot paths. Backgammon and narde live in parallel modules with the same surface (backgammon::adapter::Adapter, narde::adapter::Adapter) so the higher layers stay variant-agnostic.
  • Inference: ONNX models loaded via ort. One model per game variant per phase (more on phases below). Execution providers: CUDA for production, CPU for fallback, TensorRT and OpenVINO available as build features.
  • Server: Axum (HTTP), Tokio (runtime), SeaORM (Postgres), Redis (sessions, rate limits). Same Rust workspace as the engine, so engine evaluations are direct function calls — no subprocess, no IPC.
  • Frontend: SvelteKit (Svelte 5 runes) + Tailwind. Renders the same JSON the API returns; the dashboard you see is just an opinionated client of the same endpoint.

The architectural decision that matters most: the engine knows nothing about HTTP, and the server knows nothing about ONNX. Each layer touches only the abstractions of the layer below. This sounds obvious, but the temptation to short-circuit is real, and short-circuiting is what makes engines un-redeployable years later.
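One way to picture that layering is a pair of traits with a variant-agnostic function on top. This is a sketch under assumptions: the trait shapes, `best_successor`, and the toy implementations are invented for illustration and are not the real `Adapter` surface from crates/engine/.

```rust
/// Engine layer: a game variant exposes its position type and move generation.
/// Hypothetical trait for illustration only.
pub trait Adapter {
    type Position;
    /// All positions reachable by playing `dice` from `pos`.
    fn successors(&self, pos: &Self::Position, dice: (u8, u8)) -> Vec<Self::Position>;
}

/// Evaluation layer: scores a position; higher is better for the player choosing.
/// In production this might wrap an ONNX session; here it is only a trait.
pub trait Evaluator<P> {
    fn equity(&self, pos: &P) -> f64;
}

/// Higher layer: picks the best successor without knowing the variant
/// or how equities are computed.
pub fn best_successor<A, E>(
    adapter: &A,
    evaluator: &E,
    pos: &A::Position,
    dice: (u8, u8),
) -> Option<(A::Position, f64)>
where
    A: Adapter,
    E: Evaluator<A::Position>,
{
    adapter
        .successors(pos, dice)
        .into_iter()
        .map(|next| {
            let equity = evaluator.equity(&next);
            (next, equity)
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Toy variant: a "position" is a single pip count and a move subtracts pips.
pub struct ToyAdapter;

impl Adapter for ToyAdapter {
    type Position = i32;
    fn successors(&self, pos: &i32, dice: (u8, u8)) -> Vec<i32> {
        let (a, b) = (i32::from(dice.0), i32::from(dice.1));
        vec![pos - a, pos - b, pos - a - b]
    }
}

/// Toy evaluator: fewer pips left is better.
pub struct ToyEvaluator;

impl Evaluator<i32> for ToyEvaluator {
    fn equity(&self, pos: &i32) -> f64 {
        -f64::from(*pos)
    }
}
```

Because `best_successor` only sees the two traits, swapping the toy types for a real backgammon or narde adapter would not require touching it.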

Multi-phase networks: a long narde quirk

The simplest backgammon evaluators use a single network; gnubg already splits contact, crashed, and race positions across separate nets. We segment further by game phase, and the reason is long narde.

In long narde, the position passes through structurally distinct phases:

  • Blocking: both sides still have checkers in their starting quarter, contact is possible, and prime building is the dominant tactic.
  • X escaped: white has cleared its starting quarter and black has not. Asymmetric race-and-block.
  • O escaped: the mirror case, with black clear and white still in its starting quarter.
  • Race: no contact is possible and both sides are racing to bear off.

A single network struggles to learn all four phases well — race positions and blocking positions look almost like different games. So we train one network per phase plus a unified network for early positions. At inference, position phase is detected in O(1) from checker counts, the right network is invoked, and you get a sharper evaluation than a one-size-fits-all model would deliver.
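The phase routing described above could look roughly like the following Rust sketch. It makes a loud simplifying assumption: "both sides have left their starting quarter" is treated as the Race phase, whereas the real detector presumably also checks that no contact is possible. The struct, function names, and model file names are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Blocking,
    XEscaped,
    OEscaped,
    Race,
}

/// Checkers each side still has in its own starting quarter.
/// Assumed to be maintained incrementally so detection stays O(1).
#[derive(Debug, Clone, Copy)]
pub struct QuarterCounts {
    pub x_in_start: u8,
    pub o_in_start: u8,
}

/// Simplified phase detection from the two counts alone.
/// Assumption: both counts at zero is treated as a race.
pub fn detect_phase(counts: QuarterCounts) -> Phase {
    match (counts.x_in_start == 0, counts.o_in_start == 0) {
        (false, false) => Phase::Blocking,
        (true, false) => Phase::XEscaped,
        (false, true) => Phase::OEscaped,
        (true, true) => Phase::Race,
    }
}

/// Route a phase to its network. File names are made up for illustration.
pub fn model_file(phase: Phase) -> &'static str {
    match phase {
        Phase::Blocking => "narde_blocking.onnx",
        Phase::XEscaped => "narde_x_escaped.onnx",
        Phase::OEscaped => "narde_o_escaped.onnx",
        Phase::Race => "narde_race.onnx",
    }
}
```

The exhaustive `match` means adding a fifth phase later fails to compile until every routing site handles it, which is a cheap guard against evaluating a position with the wrong network.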

For backgammon the same idea applies but with fewer phases (contact / race). The cost is an extra ~2 MB of model weights and a phase-detection branch — both negligible.

Training pipeline: what we actually shipped

The pipeline lives in crates/coach/:

  1. Position generation — three sources feed narde_positions (and backgammon_positions):
    • Random positions sampled from a distribution that resembles real game frequencies
    • Self-play games at the current network strength
    • Real played games imported from public sources, decision points extracted via extract-expert-positions
  2. Rollout evaluation — each new position is evaluated by playing many short games from it with the current network. The averaged outcome becomes the training label.
  3. Training — supervised regression of the network on (position, label) pairs, with phase routing baked in.
  4. Tournament — candidate model plays a series of matches against the current production model.
  5. Promotion — winner of the tournament becomes the new production model; loser is rejected. All metadata (model UUID, phase, equity-loss vs prior champion, gating thresholds) lives in Postgres.
  6. Reclassification — older positions are re-evaluated by the new champion, refining their labels for future training cycles.
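Two of the steps above, rollout labeling and promotion gating, reduce to small pieces of arithmetic. The sketch below shows one plausible shape for them in Rust; the function names, the `TournamentResult` fields, and the margin-per-game gate are assumptions for illustration, not the actual code or thresholds in crates/coach/.

```rust
/// Average the outcomes of many rollouts from one position into a training label.
/// Each outcome is the points won (positive) or lost (negative) in one rollout.
pub fn rollout_label(outcomes: &[f64]) -> Option<f64> {
    if outcomes.is_empty() {
        return None;
    }
    Some(outcomes.iter().sum::<f64>() / outcomes.len() as f64)
}

/// Aggregate result of a candidate-vs-champion tournament.
pub struct TournamentResult {
    pub candidate_points: i64,
    pub champion_points: i64,
    pub games: u32,
}

/// Hypothetical gate: promote only if the candidate's average point margin
/// per game reaches `min_margin_per_game`.
pub fn should_promote(result: &TournamentResult, min_margin_per_game: f64) -> bool {
    if result.games == 0 {
        return false;
    }
    let margin = (result.candidate_points - result.champion_points) as f64
        / f64::from(result.games);
    margin >= min_margin_per_game
}
```

A gate on average margin rather than raw win count is only one option; a real pipeline might prefer a statistical test so that a lucky short tournament cannot promote a weaker model.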

This loop runs continuously on an RTX 5080. Current state of the database:

  • ~20.6 million narde positions (random + self-play + extracted)
  • ~192,000 expert positions (decision points from real played games)
  • Multi-phase networks for narde (Blocking / Race / X-Escaped / O-Escaped)
  • Backgammon phases (Contact / Race) in development

The single most useful addition over the last six months has been the expert-positions extractor. Random positions teach the network the mechanics; self-play teaches it strategy at its own level; real expert games teach it where the interesting decisions actually are. Training on a curated set of 192k decisions found in real played games reduced equity error on a held-out portion of that set more than any architectural tweak we tried.

Benchmarks

A full latency-and-accuracy report is planned to accompany the public API release. What we can say now, honestly:

  • Hosted API call times are dominated by network round-trip from your region to the EU data center; CUDA inference of a single position is sub-millisecond on contemporary GPUs.
  • Equity error against gnubg 2-ply on a held-out backgammon test set is small. On the top-move metric, the remaining disagreements are concentrated in opening positions where several plays score within 0.005 equity of each other.
  • For long narde no external reference engine exists. Internally we validate against hand-curated expert positions from the Russian Ministry of Sports rulebook plus self-play tournament games; the network’s grade calibration matches expert intuition on the curated set.
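For readers who want to picture the two backgammon metrics mentioned above, here is one common way to compute them, sketched in Rust. This is an assumption about the methodology, not a description of our benchmark harness: the `Comparison` struct and both function names are invented, and the 0.005 tolerance is simply the figure quoted above.

```rust
/// One test position, scored entirely by the reference engine:
/// the equity of the reference's own best move, and the equity the
/// reference assigns to the move our engine picked.
pub struct Comparison {
    pub reference_best: f64,
    pub reference_of_ours: f64,
}

/// Mean equity given up, according to the reference engine.
pub fn mean_equity_error(cases: &[Comparison]) -> Option<f64> {
    if cases.is_empty() {
        return None;
    }
    let total: f64 = cases
        .iter()
        .map(|c| c.reference_best - c.reference_of_ours)
        .sum();
    Some(total / cases.len() as f64)
}

/// Fraction of positions where our move is within `tolerance` equity of the
/// reference's best move, so near-ties do not count as disagreements.
pub fn top_move_agreement(cases: &[Comparison], tolerance: f64) -> Option<f64> {
    if cases.is_empty() {
        return None;
    }
    let agreeing = cases
        .iter()
        .filter(|c| c.reference_best - c.reference_of_ours <= tolerance)
        .count();
    Some(agreeing as f64 / cases.len() as f64)
}
```

Counting near-ties as agreement matters most in openings, where several plays can be practically equal and a strict "same move" metric would overstate the difference between engines.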

We will publish the full numbers once we are happy that the comparison setup is reproducible and fair (same opponents, same depth, same rollout count). Vague benchmarks help nobody; we’d rather wait.

Long narde — the underrepresented variant

Long narde (Russian narde) is the version played in Russia, the post-Soviet space, and the Caucasus. It uses the same board, same checkers, same dice as short backgammon — but starting position, direction, hit rules, and scoring all differ. There are 30+ million people who know how to play it, and effectively zero modern AI tooling for it.

Why? Two reasons. One: the academic backgammon community settled on short backgammon as their standard around 1970, before personal computers made games accessible everywhere. Two: when neural-net training methods like TD-Gammon arrived in the 1990s, they were applied to the variant the researchers were familiar with. Long narde was never a research object.

We see this as an obvious gap. The Russian Ministry of Sports rulebook (Order No. 734 of 22 July 2024) formalised long narde’s rules, and the doubling cube was officially adopted (article 23). There is now a stable, citable rule reference, which is exactly what an engine needs to claim correctness.

Our long-narde model uses the same architecture as the backgammon model but with a different input encoding (194 inputs vs 186 for backgammon), separate phase detection (Blocking / Race / X-Escaped / O-Escaped vs the binary backgammon Contact/Race), and a different output head (win/lose × normal/mars/koks vs win/lose × normal/gammon/backgammon). Training data flows through the same pipeline.
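To show how a six-way output head turns into a single equity number, here is a small Rust sketch. It assumes the head yields a probability distribution over six disjoint outcomes (win/lose × three result sizes) and that the caller supplies the point value of each result size. The 1/2/3 values below are the standard cubeless backgammon values; using the same values for normal/mars/koks is an assumption that should be checked against the rule set in force.

```rust
/// Probabilities of six disjoint outcomes, in this assumed order:
/// [win small, win medium, win large, lose small, lose medium, lose large].
/// For backgammon: normal / gammon / backgammon.
/// For long narde: normal / mars / koks.
pub type OutcomeProbs = [f64; 6];

/// Cubeless equity: expected points, wins counted positive, losses negative.
/// `points[k]` is the value of result size k (small, medium, large).
pub fn equity(probs: &OutcomeProbs, points: &[f64; 3]) -> f64 {
    (0..3)
        .map(|k| points[k] * (probs[k] - probs[k + 3]))
        .sum()
}

/// Standard cubeless backgammon values; assumed here for narde as well.
pub const ASSUMED_POINTS: [f64; 3] = [1.0, 2.0, 3.0];
```

For example, with probabilities `[0.5, 0.1, 0.02, 0.3, 0.07, 0.01]` and the assumed point values, the equity is 1·0.2 + 2·0.03 + 3·0.01 = 0.29.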

If you ship anything in the Russian-speaking market, this matters. If you don’t, it’s a curiosity. Either way, supporting it cost us nothing extra: the engine is structured so that adding a variant is “implement the adapter, train the networks”, not “rewrite half the engine”.

What’s next

Roadmap for the next two quarters:

  • Public API GA — close out private beta, publish pricing tiers, ship a self-serve developer portal. Until then, access is via lead form on /developers.
  • Self-host evaluation — evaluate whether to ship a self-hosted variant for customers with regulatory constraints (EU + Russia). Container, CPU fallback, key management.
  • Deeper search — current API returns ply-zero only. Ply-2 and ply-3 are on the roadmap; the engine code has the depth parameter but the cache and rate-limit budgets need re-tuning.
  • Match-equity tables — open-source MET data based on our self-play tournaments, similar to what XG ships. Low cost for us, high value for tournament directors.
  • Browser-side wasm inference — for low-latency offline play. Small phase networks should fit comfortably in a wasm bundle if we quantize.
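The "deeper search" item is easier to reason about with the basic lookahead step written out. The Rust sketch below averages, over the 21 distinct dice rolls, the opponent's best reply to a position. It is a generic illustration under assumptions: `lookahead_equity` and its closure parameters are invented names, and it does not claim to match how the engine's depth parameter is implemented or how plies are numbered.

```rust
/// The 21 distinct rolls with their probabilities:
/// doubles occur 1/36 of the time, every other roll 2/36.
pub fn distinct_rolls() -> Vec<((u8, u8), f64)> {
    let mut rolls = Vec::with_capacity(21);
    for hi in 1..=6u8 {
        for lo in 1..=hi {
            let ways = if hi == lo { 1.0 } else { 2.0 };
            rolls.push(((hi, lo), ways / 36.0));
        }
    }
    rolls
}

/// One level of lookahead for a position where the opponent is on roll.
///
/// `successors(pos, roll)` lists the positions the opponent can reach; it is
/// expected to return the unchanged position (as a pass) when no move is legal.
/// `static_eval(pos)` is our equity in a position where we are on roll.
/// Returns None if `successors` yields nothing for some roll.
pub fn lookahead_equity<P>(
    pos: &P,
    successors: impl Fn(&P, (u8, u8)) -> Vec<P>,
    static_eval: impl Fn(&P) -> f64,
) -> Option<f64> {
    let mut total = 0.0;
    for (roll, prob) in distinct_rolls() {
        // The opponent picks the reply that is worst for us.
        let worst_for_us = successors(pos, roll)
            .iter()
            .map(|reply| static_eval(reply))
            .min_by(|a, b| a.total_cmp(b))?;
        total += prob * worst_for_us;
    }
    Some(total)
}
```

Each extra level multiplies the number of network evaluations by 21 rolls times the number of legal replies per roll, which is a plausible reason the cache and rate-limit budgets would need re-tuning before deeper plies are exposed.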

If any of this lines up with something you’re building, get in touch via the developers page.

Try the API

The fastest way to evaluate Nardex is to look at the comparison with gnubg and XG and the 5-minute API quickstart. Both pages assume you’re a developer and won’t waste your time on marketing copy.

If you’d like context on the games themselves, backgammon rules and long narde rules are the entry points. They’re intended for end-users; the developer pages assume you already know the games.

We’re a small team. Email replies tend to be quick. If you want to know whether we’ll support a particular use case before you put your access request through, ask.