Methodology

Full math, modelling choices, and limitations. This is the long page — the rest of the study is non-technical; this is where the equations, formal definitions, and citations live. The five public tabs all sit on top of the pipeline described here.

1. Notation & state

The tracking transformer's input is a single frame: a snapshot of all 22 players plus the ball at a single 5 Hz sample. We represent each frame as a sequence of 23 tokens (22 players + 1 ball). Each token is a 7-dimensional feature vector.

1.1 Per-token features

Feature | Range | Definition
x_norm | \([-1, 1]\) | Pitch x-coordinate (length axis), centred on midfield, scaled so each goal line sits at \(\pm 1\).
y_norm | \([-1, 1]\) | Pitch y-coordinate, centred and scaled identically.
vx | m/s | x-velocity from finite differences, clamped to \([-25, 25]\).
vy | m/s | y-velocity from finite differences, clamped to \([-25, 25]\).
is_attacking_side | \(\{0,1\}\) | 1 iff the token belongs to the team currently in possession (ball token: 0).
is_goalkeeper | \(\{0,1\}\) | 1 iff the player was identified as GK via the extreme-x heuristic over a calibration window.
has_possession | \(\{0,1\}\) | 1 iff this token is the ball, or the player currently nearest the ball as a possession proxy.
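
A minimal sketch of how one player token could be assembled from raw positions. It assumes a 105 m × 68 m pitch, positions in metres with the origin at the centre spot, and a backward finite difference for velocity; the pitch dimensions, `make_token`, and its argument names are illustrative assumptions, not names from the pipeline.

```python
PITCH_LENGTH_M = 105.0  # assumption: standard pitch length
PITCH_WIDTH_M = 68.0    # assumption: standard pitch width
DT_S = 0.2              # 5 Hz sampling -> 0.2 s between frames
V_CLAMP = 25.0          # velocity clamp from the feature table, in m/s


def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def make_token(pos_now, pos_prev, on_possession_team, is_gk, has_ball):
    """Build the 7-dim feature vector for one player token.

    pos_now and pos_prev are (x, y) in metres, one frame apart.
    """
    x, y = pos_now
    px, py = pos_prev
    x_norm = clamp(x / (PITCH_LENGTH_M / 2), -1.0, 1.0)
    y_norm = clamp(y / (PITCH_WIDTH_M / 2), -1.0, 1.0)
    vx = clamp((x - px) / DT_S, -V_CLAMP, V_CLAMP)  # backward difference (assumed)
    vy = clamp((y - py) / DT_S, -V_CLAMP, V_CLAMP)
    return [x_norm, y_norm, vx, vy,
            float(on_possession_team), float(is_gk), float(has_ball)]


# A player halfway into the right half, moving 1 m along x in one frame.
token = make_token((26.25, -17.0), (25.25, -17.0), True, False, True)
```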

Player ordering within the player block is arbitrary; the transformer is permutation-equivariant over tokens (no role one-hots, no positional encoding on the player axis). This is a deliberate inductive bias — identity is what the model has to infer from geometry + velocity, not what it is told.
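
The equivariance claim can be checked numerically. The sketch below uses a single-head NumPy self-attention layer with random weights as a stand-in for the real encoder (an assumption made for brevity): shuffling the 22 player tokens before the layer gives the same result as shuffling its outputs afterwards.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over tokens, with no positional encoding."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
x = rng.normal(size=(23, 7))                       # one frame: 22 players + ball
w_q, w_k, w_v = (rng.normal(size=(7, 8)) for _ in range(3))

# Shuffle the player block, keep the ball as the last token.
perm = np.concatenate([rng.permutation(22), [22]])

out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Equivariance: permuting the inputs permutes the outputs the same way.
equivariant = np.allclose(out[perm], out_perm)
```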

1.2 Sampling & coordinates

2. VAEP framework

VAEP (Valuing Actions by Estimating Probabilities) is Decroos et al.'s framework for assigning a continuous expected-goal value to every on-ball action. The headline idea: an action's value is the change in expected goal difference it produces, where "expected goal difference" is the probability the possessing team scores in the next \(K\) actions minus the probability they concede in the same window.

\[ V(a) = \big[P_{\mathrm{score}}(s_a) - P_{\mathrm{score}}(s_{a-1})\big] - \big[P_{\mathrm{concede}}(s_a) - P_{\mathrm{concede}}(s_{a-1})\big] \]

where \(s_a\) is the post-action game state, \(s_{a-1}\) the pre-action state, and \(P_{\mathrm{score}}(s)\), \(P_{\mathrm{concede}}(s)\) are estimated by gradient-boosted classifiers conditioned on the previous three actions (SPADL encoding). The standard look-ahead is \(K = 10\) actions.

Equation derivation & implementation note

If the previous action \(a-1\) was performed by the opposing team, \(P_{\mathrm{score}}(s_{a-1})\) and \(P_{\mathrm{concede}}(s_{a-1})\) must be swapped before taking the difference, so that the prior state is expressed from the perspective of the team performing \(a\):

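# research/src/chemistry/vaep/model.py (excerpt)
import numpy as np

# True where the previous action was performed by the same team
same = (teams == prev_team)
# Prior state value from the acting team's perspective (swapped after a turnover)
prev_value_self_persp = np.where(same, prev_pS - prev_pC, prev_pC - prev_pS)
df["vaep_value"] = (pS - pC) - prev_value_self_persp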
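
The perspective swap can be exercised end to end on toy numbers. This is a self-contained sketch, assuming `p_score[i]` and `p_concede[i]` are the post-action probabilities from the acting team's point of view; the function name `vaep_values` and the choice to give the first action NaN (it has no prior state) are illustrative assumptions, not the behaviour of `model.py`.

```python
import numpy as np


def vaep_values(p_score, p_concede, teams):
    """VAEP value per action; the first action has no prior state and gets NaN."""
    p_score = np.asarray(p_score, dtype=float)
    p_concede = np.asarray(p_concede, dtype=float)
    teams = np.asarray(teams)

    prev_ps, prev_pc = p_score[:-1], p_concede[:-1]
    same = teams[1:] == teams[:-1]
    # Swap score/concede when possession changed between the two actions.
    prev_value = np.where(same, prev_ps - prev_pc, prev_pc - prev_ps)

    values = np.full(p_score.shape, np.nan)
    values[1:] = (p_score[1:] - p_concede[1:]) - prev_value
    return values


values = vaep_values(
    p_score=[0.02, 0.05, 0.03],
    p_concede=[0.01, 0.01, 0.04],
    teams=["A", "A", "B"],
)
```

In this example the second action (same team) is worth (0.05 − 0.01) − (0.02 − 0.01) = 0.03, and the third (after a turnover) is worth (0.03 − 0.04) − (0.01 − 0.05) = 0.03.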

\(P_{\mathrm{score}}\) and \(P_{\mathrm{concede}}\) are trained independently via XGBoost on SPADL-encoded action sequences. Out-of-fold AUCs typically land in the 0.85–0.92 range for \(P_{\mathrm{score}}\) and 0.65–0.75 for \(P_{\mathrm{concede}}\).

Reference: Decroos T., Bransen L., Van Haaren J., Davis J. (2019). Actions Speak Louder than Goals: Valuing Player Actions in Soccer. KDD 2019. See §9 for citation.

3. JOI / JDI (event-based baseline)

Bransen & Van Haaren (MIT Sloan, 2020) layer pair chemistry on top of VAEP. Joint Offensive Impact (JOI) sums VAEP across consecutive-action pairs of teammates; Joint Defensive Impact (JDI) assigns credit to defending pairs when opponents under-perform their expected offensive impact.

3.1 Joint Offensive Impact (JOI)

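For each ordered pair of consecutive actions \((a_i, a_{i+1})\) performed by teammates \(p\) and \(q\), both VAEP values contribute to \(\mathrm{JOI}(p, q)\). Writing \(S_{p,q}\) for the set of such action pairs:

\[ \mathrm{JOI}(p, q) = \sum_{(a_i,\, a_{i+1}) \in S_{p,q}} \big[ V(a_i) + V(a_{i+1}) \big] \]

Normalised per 90 shared minutes, with \(m_{p,q}\) the minutes \(p\) and \(q\) spent on the pitch together: \( \mathrm{JOI90}(p, q) = 90 \cdot \mathrm{JOI}(p,q) / m_{p,q} \).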
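
A sketch of the JOI sum and its per-90 normalisation on a toy action stream. The data layout (tuples of player, team, VAEP value), the treatment of \((p, q)\) as an unordered pair, and the name `joi_per_90` are assumptions made for illustration.

```python
from collections import defaultdict


def joi_per_90(actions, shared_minutes):
    """Sum VAEP over consecutive actions by two different teammates.

    actions: list of (player, team, vaep_value) in match order.
    shared_minutes: dict mapping frozenset({p, q}) -> minutes on pitch together.
    """
    joi = defaultdict(float)
    for (p, team_p, v_p), (q, team_q, v_q) in zip(actions, actions[1:]):
        if team_p == team_q and p != q:
            joi[frozenset((p, q))] += v_p + v_q
    return {
        pair: 90.0 * total / shared_minutes[pair]
        for pair, total in joi.items()
        if shared_minutes.get(pair, 0) > 0
    }


actions = [
    ("p", "A", 0.02), ("q", "A", 0.03),   # p -> q: contributes 0.05
    ("r", "B", -0.01),                     # opponent action breaks the chain
    ("q", "A", 0.01), ("p", "A", 0.04),   # q -> p: contributes 0.05
]
joi90 = joi_per_90(actions, {frozenset(("p", "q")): 45.0})
```

With 0.10 of summed VAEP over 45 shared minutes, the pair's JOI90 is 0.20.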

3.2 Joint Defensive Impact (JDI) — and what event data cannot see

JDI is the event-based baseline's hard half. Defenders' value lives mostly off-ball, but event data only records on-ball touches. The authors are explicit:

"The match event data only describes the actions that actually happened in the match but not the actions that players prevented from happening, for instance, by smart runs or clever positioning."
— Bransen & Van Haaren (2020), §3.2 (Joint Defensive Impact).

Their workaround: for every opposing player \(o\) the defending pair faced, compute the gap between \(o\)'s expected and actual offensive impact (OI) and split it across same-team pairs by a responsibility share \(R_m(p, q, o)\) derived from a 5×5 grid distance heuristic.

\[ \mathrm{JDI}_m(p, q) \;=\; \sum_o \big(\mathbb{E}[\mathrm{OI}_m(o)] - \mathrm{OI}_m(o)\big) \cdot R_m(p, q, o) \cdot \frac{\mathrm{mins}_m(p, q, o)}{90} \]
OI, expected OI, responsibility share — full definitions

Per-match OI: \( \mathrm{OI}_m(o) = \sum_a V(a) \) over passes, crosses, dribbles, take-ons, and shots by player \(o\) in match \(m\).

Expected OI: \(o\)'s per-90 mean across prior matches in the dataset, Bayesian-shrunk toward a positional prior (GK / DEF / MID / FWD) when fewer than \(M_0 = 700\) minutes have been played.

Responsibility share: \( r_p(o) = (1/d(p, o)) / \sum_{k} (1/d(k, o)) \), where \(k\) ranges over the defending players and \(d\) is the Euclidean distance in the 5×5 grid after mirroring \(o\)'s cell into the defending coordinate frame. The pair share is the simple mean \(R_m(p, q, o) = \tfrac{1}{2}(r_p(o) + r_q(o))\).

Aggregated across matches and normalised per 90 shared minutes gives JDI90.
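
The per-match JDI formula can be sketched as follows. Grid cells are given as (column, row) tuples that are assumed to be already mirrored into the defending frame; the distance floor `MIN_DIST` (to avoid dividing by zero when a defender shares the opponent's cell) and all function and key names are assumptions, since the source does not say how that case is handled.

```python
import math

MIN_DIST = 0.5  # assumption: floor on grid distance to avoid division by zero


def responsibility_shares(defender_cells, opponent_cell):
    """Inverse-distance shares r_p(o) over all defenders, summing to 1."""
    inv = {
        p: 1.0 / max(math.dist(cell, opponent_cell), MIN_DIST)
        for p, cell in defender_cells.items()
    }
    total = sum(inv.values())
    return {p: w / total for p, w in inv.items()}


def pair_jdi(p, q, opponents, defender_cells):
    """Per-match JDI for the pair (p, q), following the formula above."""
    total = 0.0
    for o in opponents:
        r = responsibility_shares(defender_cells, o["cell"])
        pair_share = 0.5 * (r[p] + r[q])
        gap = o["expected_oi"] - o["oi"]   # positive when o under-performs
        total += gap * pair_share * o["shared_minutes"] / 90.0
    return total


defender_cells = {"p": (1, 2), "q": (3, 2), "k": (2, 4)}
opponents = [{"cell": (2, 2), "expected_oi": 0.30, "oi": 0.10, "shared_minutes": 90}]
jdi = pair_jdi("p", "q", opponents, defender_cells)
```

Here the grid distances are 1, 1, and 2, so the shares are 0.4, 0.4, and 0.2; the pair share is 0.4 and the pair is credited 0.4 × 0.20 = 0.08.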

The JDI quote above is the foil for §4–5: the tracking transformer does see those off-ball actions, frame by frame, for all 22 players.

4. Our tracking transformer

Architecture adapted from Sumer Sports's open-source SportsTrackingTransformer (originally an NFL model). Three changes for soccer: 23-token state instead of 22, two BCE heads (frame-VAEP) instead of an NFL-specific target, and an encode_with_attention path that is a first-class output rather than a debug hook.

4.1 Architecture

\[ \text{Input}\ (B, 23, 7) \;\to\; \mathrm{BatchNorm_{features}} \;\to\; \mathrm{Linear}\ (7 \to d) \;\to\; \big[\mathrm{TransformerEncoderLayer}(d, H)\big]^{L} \;\to\; \mathrm{Heads} \]
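
A PyTorch sketch of this pipeline, to make the tensor shapes concrete. The hyperparameters (`d_model=64`, 4 heads, 2 layers), the mean pooling over tokens, the linear heads, and the class name are all assumptions; this is not the Sumer Sports implementation or the study's trained model.

```python
import torch
from torch import nn


class TrackingTransformerSketch(nn.Module):
    """BatchNorm over features -> Linear(7 -> d) -> L encoder layers -> heads."""

    def __init__(self, n_features=7, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.norm = nn.BatchNorm1d(n_features)
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Two heads returning logits, suitable for a BCE-with-logits loss.
        self.score_head = nn.Linear(d_model, 1)
        self.concede_head = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (B, 23, 7)
        # BatchNorm1d normalises dim 1, so move the feature axis there and back.
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        h = self.encoder(self.embed(x))        # (B, 23, d_model)
        pooled = h.mean(dim=1)                 # assumption: mean pooling over tokens
        return (self.score_head(pooled).squeeze(-1),
                self.concede_head(pooled).squeeze(-1))


model = TrackingTransformerSketch().eval()
with torch.no_grad():
    score_logit, concede_logit = model(torch.randn(8, 23, 7))
```

No positional encoding is added on the token axis, which keeps the encoder permutation-equivariant over players as described in §1.1.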

4.2 Heads & targets

Three heads have been trained on this backbone. The two relevant to chemistry are the frame-VAEP specialists; the xT-regression head is the original supervised target and is retained for evaluation.

Train / validation split. 36 PFF WC22 matches used for training (629,634 frames at 5 Hz); 8 held-out matches (138,793 frames) used for the val metrics reported below. The val set is match-disjoint from train — the model never sees any frame from those 8 matches during training. This is a single train/val split, not k-fold cross-validation; the headline numbers are best read as "held-out on 8 of 44 matches," not "out-of-fold across folds."

The chemistry pipeline (§5) uses the two single-head specialists, because the shared backbone attends disproportionately to GKs and defenders (P(concede) carries tighter spatial signal than P(score) and dominates the gradient). The score specialist's top-10 off-off pairs overlap the shared model's (legacy / retained for comparison only) by 7/10; new entries include Dembélé + Mbappé (was #7 on shared) and Brazil's Vinícius + Raphinha (absent from the shared top-10 entirely).

4.3 Attention extraction

The transformer exposes encode_with_attention which returns the pooled encoding alongside the full per-layer per-head attention tensor:

\[ \mathrm{Attn} \in \mathbb{R}^{B \times L \times H \times T \times T}, \quad T = 23 \]

For chemistry we use the ball token as the query: the row \(\mathrm{Attn}[\,b,\,:,\,:,\,\text{ball},\,0{:}P\,]\) with \(P = 22\), averaged across the \(L\) layers and \(H\) heads, gives a length-22 vector of attention weights over the player tokens. Before any renormalisation it sums to less than 1, because the ball's attention to itself is excluded. It answers: "given the ball token's query, how much weight does the model place on each player when forming its prediction at frame \(t\)?".

Python implementation (excerpt)
# research/scripts/extract_aw_joi.py
ball_attn = attn[:, :, :, BALL_TOKEN, :NUM_PLAYER_SLOTS]   # (B, L, H, 22)
ball_attn = ball_attn.mean(dim=(1, 2))                      # (B, 22)
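
The same slicing can be tried on a synthetic attention tensor. The sketch below uses NumPy with random softmax rows in place of real model output, and assumes the ball is the last token (index 22), which is what the `:NUM_PLAYER_SLOTS` slice in the excerpt implies. The final renormalisation step is optional and is an assumption, not something the excerpt shows.

```python
import numpy as np

B, L, H, T = 4, 2, 3, 23
NUM_PLAYER_SLOTS = 22
BALL_TOKEN = 22  # assumption: ball is the last token

rng = np.random.default_rng(0)
logits = rng.normal(size=(B, L, H, T, T))
exp = np.exp(logits)
attn = exp / exp.sum(axis=-1, keepdims=True)              # softmax over keys

# Ball token as query, player tokens as keys, averaged over layers and heads.
ball_attn = attn[:, :, :, BALL_TOKEN, :NUM_PLAYER_SLOTS]  # (B, L, H, 22)
ball_attn = ball_attn.mean(axis=(1, 2))                   # (B, 22)

# The ball's self-attention was dropped, so each row sums to less than 1.
row_sums = ball_attn.sum(axis=-1)

# Optional: rescale each row into a distribution over the 22 players.
renormalised = ball_attn / row_sums[:, None]
```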

5. From attention to AW-JOI / AW-JDI

Attention-Weighted JOI / JDI are the pair-level aggregates that feed the leaderboard, the Whiteboard, and the team chemistry density. The construction is the tracking-data analogue of Bransen's JOI / JDI: instead of summing VAEP across consecutive on-ball actions of two teammates, we sum frame-level prediction-deltas weighted by how much the model was jointly attending to both players at that frame.

5.1 Frame-level building blocks

Per frame \(t\), per same-team pair \((p, q)\):
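
A reconstruction of the two building blocks, read off the aggregation kernel quoted in §5.2 rather than from a separate specification. Here \(a_p(t)\) is assumed to denote the ball-query attention on player \(p\) at frame \(t\) (§4.3), taken from the score specialist for \(c_{\mathrm{score}}\) and from the concede specialist for \(c_{\mathrm{concede}}\); the superscripts are notation introduced here.

\[ c_{\mathrm{score}}(p, q, t) = a^{\mathrm{score}}_p(t)\, a^{\mathrm{score}}_q(t), \qquad c_{\mathrm{concede}}(p, q, t) = a^{\mathrm{concede}}_p(t)\, a^{\mathrm{concede}}_q(t) \]

\[ \Delta P_{\mathrm{score}}(t) = P_{\mathrm{score}}(t+1) - P_{\mathrm{score}}(t), \qquad \Delta P_{\mathrm{concede}}(t) = P_{\mathrm{concede}}(t+1) - P_{\mathrm{concede}}(t) \]

Both deltas are forward differences over one 0.2 s frame step and are set to 0 at the final frame.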

5.2 Pair sums

AW-JOI sums over offence-side moves (positive score-delta) using the score-specialist's attention; AW-JDI sums over defence-side moves (positive concede-delta — i.e. concede risk grew, and the defending pair gets credit for what attention the model placed on them while it grew) using the concede-specialist's attention:

\[ \mathrm{AW\text{-}JOI}(p, q) \;=\; \sum_{t} c_{\mathrm{score}}(p, q, t) \cdot \max\!\big(\Delta P_{\mathrm{score}}(t),\ 0\big) \]

\[ \mathrm{AW\text{-}JDI}(p, q) \;=\; \sum_{t} c_{\mathrm{concede}}(p, q, t) \cdot \max\!\big(\Delta P_{\mathrm{concede}}(t),\ 0\big) \]

Both are normalised per 90 shared on-pitch minutes for the pair: \(\mathrm{AW\text{-}JOI90}(p, q) = 90 \cdot \mathrm{AW\text{-}JOI}(p, q) / m_{p, q}\), analogously for AW-JDI90.

Source — verified against research/scripts/extract_aw_joi.py
# research/scripts/extract_aw_joi.py (excerpt; aggregation kernel)
# Forward differences -- last frame's dv = 0
dv_score[:-1]   = p_score[1:]   - p_score[:-1]
dv_concede[:-1] = p_concede[1:] - p_concede[:-1]
w_joi = np.clip(dv_score,   0.0, None)
w_jdi = np.clip(dv_concede, 0.0, None)

# c[p,q,t] = a_p(t) * a_q(t); contributions = c * weight
a_i = ball_attn_nz[:, iu]                  # (M, K) over upper-tri pairs
a_j = ball_attn_nz[:, ju]
c   = a_i * a_j
contribs = c * weight_nz[:, None]          # (M, K)
# masked to same-team pairs, accumulated into pair_sums[(p,q)]
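
For readers who want to run the kernel, here is a self-contained sketch on a three-frame, three-slot toy input. It follows the excerpt's steps (forward difference, clip at zero, product of the two attention weights, same-team mask) but the function name, the input layout, and the assumption that each slot holds the same player in every frame are illustrative choices, not the script's actual interface.

```python
import numpy as np


def aw_pair_sums(ball_attn, p_head, team_ids):
    """Attention-weighted pair sums for one specialist head.

    ball_attn: (M, P) ball-query attention per frame and player slot.
    p_head:    (M,)  predicted probability per frame (score or concede).
    team_ids:  (P,)  team label per player slot.
    """
    dv = np.zeros_like(p_head)
    dv[:-1] = p_head[1:] - p_head[:-1]      # forward difference, last frame = 0
    weight = np.clip(dv, 0.0, None)         # keep only frames where p rose

    iu, ju = np.triu_indices(ball_attn.shape[1], k=1)   # all pairs p < q
    same_team = team_ids[iu] == team_ids[ju]
    iu, ju = iu[same_team], ju[same_team]

    c = ball_attn[:, iu] * ball_attn[:, ju]             # (M, K) joint attention
    totals = (c * weight[:, None]).sum(axis=0)          # (K,)
    return {(int(p), int(q)): float(s) for p, q, s in zip(iu, ju, totals)}


ball_attn = np.array([[0.5, 0.3, 0.2],
                      [0.4, 0.4, 0.2],
                      [0.1, 0.6, 0.3]])
p_score = np.array([0.10, 0.30, 0.20])
team_ids = np.array([0, 0, 1])

pair_sums = aw_pair_sums(ball_attn, p_score, team_ids)
# Per-90 normalisation, assuming 45 shared minutes for the pair.
aw_joi90 = {pair: 90.0 * s / 45.0 for pair, s in pair_sums.items()}
```

Only the first frame has a positive score delta (0.20), so the single same-team pair (0, 1) accumulates 0.5 × 0.3 × 0.20 = 0.03, or 0.06 per 90.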

Note on the JDI sign convention. Bransen's JDI rewards defenders when opponents under-perform expected offence. Our AW-JDI is the tracking-side analogue: positive \(\Delta P_{\mathrm{concede}}\) means concede risk grew over the next 0.2 s, so the attention paid to a defensive pair in that instant is credited to them as "they were in the picture while the team was being broken down." High AW-JDI90 thus flags defensive engagement, not defensive success per se — the leaderboard treats it as the "defensive co-watched-ness" score.

6. Team Chemistry Density (TCD)

TCD is the team-level scalar that summarises "how many of this squad's pairs are above the league baseline?". It is the metric most pages sort by. Definition:

6.1 Pool medians

We compute two pool-level reference values across all 31 squads (one team excluded for data completeness):
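
Judging by how the thresholds are used in §6.2 and described in §7 ("pool-median threshold per role"), the two reference values can plausibly be written as below. This is a reconstruction; in particular, taking the median over individual pairs pooled across all 31 squads is an assumption.

\[ \tilde{m}_{\mathrm{off}} = \mathrm{median}\,\{\mathrm{AW\text{-}JOI90}(p,q) : (p,q) \in \mathrm{off}\text{-}\mathrm{off},\ \text{all teams}\} \]

\[ \tilde{m}_{\mathrm{def}} = \mathrm{median}\,\{\mathrm{AW\text{-}JDI90}(p,q) : (p,q) \in \mathrm{def}\text{-}\mathrm{def},\ \text{all teams}\} \]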

Empirical pool medians on the WC22 corpus: \(\tilde{m}_{\mathrm{off}} \approx 0.467\), \(\tilde{m}_{\mathrm{def}} \approx 0.322\).

6.2 Per-team counts

For each team:

\[ n_{\mathrm{off}} = \#\{(p,q) \in \mathrm{off}\text{-}\mathrm{off} : \mathrm{AW\text{-}JOI90}(p,q) > \tilde{m}_{\mathrm{off}}\} \]

\[ n_{\mathrm{def}} = \#\{(p,q) \in \mathrm{def}\text{-}\mathrm{def} : \mathrm{AW\text{-}JDI90}(p,q) > \tilde{m}_{\mathrm{def}}\} \]

\[ n_{\mathrm{cross}} = \#\{(p,q) \in \mathrm{cross} : \Delta_{p,q} > 0\} - \#\{(p,q) \in \mathrm{cross} : \Delta_{p,q} < 0\} \]

where \(\Delta_{p,q} = \mathrm{AW\text{-}JOI90}(p,q) - \mathrm{AW\text{-}JDI90}(p,q)\) — the "off-vs-def net" of a cross-line pair (an attacker–defender pair contributes positively when its offensive co-watched-ness exceeds its defensive co-watched-ness).

\[ \mathrm{TCD} \;=\; n_{\mathrm{off}} + n_{\mathrm{def}} + n_{\mathrm{cross}} \]
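
The three counts and their sum can be sketched in a few lines. The record layout (dicts with `bucket`, `aw_joi90`, `aw_jdi90`) and the function name are assumptions for illustration; the thresholds are the pool medians quoted in §6.1, and the pair values are made up.

```python
M_OFF = 0.467  # pool median for off-off pairs (from §6.1)
M_DEF = 0.322  # pool median for def-def pairs (from §6.1)


def tcd(team_pairs, m_off=M_OFF, m_def=M_DEF):
    """Team Chemistry Density: n_off + n_def + n_cross for one squad."""
    n_off = sum(
        p["bucket"] == "off-off" and p["aw_joi90"] > m_off for p in team_pairs
    )
    n_def = sum(
        p["bucket"] == "def-def" and p["aw_jdi90"] > m_def for p in team_pairs
    )
    deltas = [
        p["aw_joi90"] - p["aw_jdi90"] for p in team_pairs if p["bucket"] == "cross"
    ]
    # Net count: pairs with positive delta minus pairs with negative delta.
    n_cross = sum(d > 0 for d in deltas) - sum(d < 0 for d in deltas)
    return n_off + n_def + n_cross


toy_pairs = [
    {"bucket": "off-off", "aw_joi90": 0.60, "aw_jdi90": 0.10},  # above m_off
    {"bucket": "off-off", "aw_joi90": 0.40, "aw_jdi90": 0.10},  # below m_off
    {"bucket": "def-def", "aw_joi90": 0.10, "aw_jdi90": 0.35},  # above m_def
    {"bucket": "cross",   "aw_joi90": 0.50, "aw_jdi90": 0.30},  # delta > 0
    {"bucket": "cross",   "aw_joi90": 0.20, "aw_jdi90": 0.40},  # delta < 0
    {"bucket": "cross",   "aw_joi90": 0.60, "aw_jdi90": 0.10},  # delta > 0
]
toy_tcd = tcd(toy_pairs)
```

For the toy squad, \(n_{\mathrm{off}} = 1\), \(n_{\mathrm{def}} = 1\), and \(n_{\mathrm{cross}} = 2 - 1 = 1\), giving a TCD of 3. Because \(n_{\mathrm{cross}}\) is a net count, it can be negative for a squad whose cross-line pairs lean defensive.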

6.3 Worked example — Argentina (winner, TCD rank 7)

Values taken directly from team_chemistry_vs_paper.json:

Component | Value
\(n_{\mathrm{off}}\) (off–off pairs above \(\tilde{m}_{\mathrm{off}}\)) | 19
\(n_{\mathrm{def}}\) (def–def pairs above \(\tilde{m}_{\mathrm{def}}\)) | 12
\(n_{\mathrm{cross}}\) (cross-line net, off-positive − def-positive) | 67
TCD | 98
TCD rank (of 31 teams) | 7
Sanity check: Argentina has 168 within-team pairs (squad 26, three role buckets), so 19 + 12 + 67 = 98 strong-or-net pairs is roughly 58% of the within-team pair pool. The Brazil row at TCD 121 (rank 2) is denser still; Qatar at TCD 54 (rank 27) is the floor among full data rows. Each tab's "strong pair" counts on the study derive from these same three numbers.

7. Reconciliation with earlier drafts

Earlier drafts reported a chemistry-vs-finish Spearman ρ ≈ 0.78 using a pre-TCD strong-pair count with a hard 0.4 threshold. The unified TCD metric (pool-median threshold per role + cross-net) yields ρ = 0.704 on the same 31 teams (p < 0.001). The new definition is the one shipped on every public page; the older 0.78 figure is retained only in the headline archive for traceability.
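
For reference, Spearman's ρ is the Pearson correlation of the two rank vectors. A dependency-free sketch follows; the toy inputs are invented, and how "finish" is encoded in the study (here, higher means a better finish) is an assumption.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data, not study results: four teams' TCD and finish score.
toy_tcd_values = [10, 40, 20, 30]
toy_finish = [1, 3, 2, 4]
rho = spearman_rho(toy_tcd_values, toy_finish)
```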

8. Limitations

9. References

  1. Decroos, T., Bransen, L., Van Haaren, J., & Davis, J. (2019). Actions Speak Louder than Goals: Valuing Player Actions in Soccer. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19), pp. 1851–1861. DOI: 10.1145/3292500.3330758.
  2. Bransen, L., & Van Haaren, J. (2020). Player Chemistry: Striving for a Perfectly Balanced Soccer Team. MIT Sloan Sports Analytics Conference 2020. arXiv: 2003.01712.
  3. Sumer Sports. SportsTrackingTransformer (open-source implementation of a permutation-equivariant transformer over tracking data, originally NFL). GitHub: github.com/SumerSports/SportsTrackingTransformer.
  4. Singh, K. (2019). Introducing Expected Threat (xT). Blog post: karun.in/blog/expected-threat.html.
  5. Benhida, M., et al. (2025). Tactical analysis of Morocco's 2022 World Cup performance via PCA and K-means clustering of FIFA match KPIs. Applied Sciences. DOI: 10.3390/app15189994.

Citation note: a peer-reviewed Jordet/Aksum-style paper on visual scanning in soccer was on the candidate list for this section but is omitted because the specific DOI/URL could not be verified at write time; we would rather drop than misattribute. The Sumer Sports reference cites the open-source repository directly because no associated peer-reviewed paper is known to exist.

10. How to reproduce

All intermediate parquet files are exported as CSV on the Downloads page, along with the trained checkpoints. Python 3.12+; use uv (not pip) for everything.

Bash recipe (event pipeline + transformer + AW-JOI/JDI)
# 1. Env setup
uv sync --extra dev

# 2. Tests (skips real-data tests if data dirs are empty)
uv run pytest -q

# 3. Wiring smoke test on synthetic data
uv run python -m wc2026_tracking_transformer.train fit \
  --config configs/local_cpu.yaml

# 4. Train the frame-VAEP specialists
uv run python scripts/train_score_only.py   --pff-n 44 --epochs 6
uv run python scripts/train_concede_only.py --pff-n 44 --epochs 6

# 5. Extract per-pair AW-JOI / AW-JDI from both specialists
uv run python research/scripts/extract_aw_joi.py --combine-after

# 6. Render a clip
uv run python scripts/render_pff_gif.py --match 10502
