Architecture

Pipeline Flow

From raw data scraping through master file build, modeling phases, scouting overlay, cross-pipeline reconciliation, final merge, and into the playable EHM database. The full architecture, end to end.

The big picture

The Pivot Pipeline runs in six sequential stages. Each stage takes the output of the previous one and adds value: collecting raw data, normalizing it into master files, modeling per-player attributes, layering on scouting intelligence, reconciling players who appear in multiple leagues, and finally producing the EHM-importable database.

Legend: Data Input · Modeling · Scouting Overlay · Reconciliation / Merge · Final Output
Stage 01: Data Collection
60+ sources scraped, downloaded, or pulled via API into raw CSVs.
NHL: MoneyPuck · EH · EDGE · LB · HSC (20+ sources, 880 players)
AHL: AHLTracker (single-source, 2,000 players)
European: Sportality · hokej.cz · liiga.fi · penny-del · nlicedata · hockeyslovakia (12 leagues, 5,500 players)
Junior: chl.ca · CHN · HockeyTech · GameSheet (6 leagues, 4,000 players)
Russian: khl.ru (KHL + VHL, 1,500 players)
Career / Tags: EliteProspects (EP2EHM, U23, style tags); all players, multi-year
Scouting Prose: The Athletic (Wheeler) · EP U23 reports (630+ NHL prospects)
Stage 02: Master File Build
All raw inputs joined per-player into one canonical master CSV per league pipeline.
NHL Master: build_master_vNext.py (~1,000 columns, 1 row/player)
AHL Master: build_ahl_master.py (~750 columns)
European Masters: build_european_master.py (7 country builds)
Junior Masters: build_junior_master.py (CHL · NCAA · USHL · NAHL)
KHL/VHL Master: build_khl_master.py
What this does: Each pipeline has its own column convention (NHL uses nhlmisc_*, edgeSpeed_*, ehgar_*; SHL uses shl_*; etc.). The master build joins every available data source by player ID, normalizes column types, applies hard-fail checks for required signals, and produces one consolidated master file per league.
Stage 03: Modeling
Each pipeline runs its own sequence of modeling scripts, calibrated to that league's data depth.
NHL Track (10 phases)
Phase 2 — Physicality
Phase 3 — Defense
Phase 4 — Skating
Phase 5 — Offense / Cognition
Phase 5b — Career Mentals
Phase 6 — Faceoffs
Phase 7 — Goalies
Phase 7b — Goalie Mentals
Phase 8 — CA & Roles
Minor / Junior Track (3 phases)
Skater Model (unified)
Goalie Model (unified)
Phase 3 — CA & Roles
Minor leagues: AHL · ECHL · SPHL
Junior leagues: CHL · NCAA · USHL · NAHL
International Track (3 phases)
Skater Model (league-aware)
Goalie Model (league-aware)
Phase 3 — CA & Roles
Countries: SWE · FIN · GER · CZE · SUI · SVK · RUS
The NHL pipeline is the gold standard. Other pipelines mirror its architecture but compress phases when data signal doesn't justify separating them. With 20+ analytical sources, the NHL can isolate physicality from defense from skating from offense as separate modeling phases. With single-source AHLTracker data, the AHL collapses Phases 2-6 into one skater model because the signals overlap too much for separate treatment.
Stage 04: Scouting Overlay
Hand-scouted prospect intelligence layers attribute nudges on top of analytics.
Wheeler Pipeline: The Athletic prose (471 NHL prospects, 17 categories)
EP U23: tool grades + prose (Future Value, 6 tool grades)
EP Style Tags: all players (Sniper, Power Forward, etc.)
Hand-Curated CHL/NCAA: top ~215 prospects (9-category nudges)
Max-Not-Sum Resolution: multi-source dedup (no double-stacking adjustments)
What scouting catches that stats can't: release mechanics, edge work, compete level, body language, whether a 19-year-old's stride has another gear coming. Statistical models capture what a player did; scouting prose captures how and why. Based on expert observation, the overlay typically nudges a modeled attribute up or down by 1–2 points, rarely more.
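One way to read "max-not-sum" is that when several scouting sources nudge the same attribute, only the single largest-magnitude nudge is applied. The sketch below assumes that reading; the function names, source keys, and attribute names are hypothetical, and the real resolution rules (for example, how conflicting signs are handled) may differ.

```python
def resolve_nudges(nudges_by_source):
    """Collapse per-source nudges for one player into one nudge per attribute.

    nudges_by_source: {source: {attribute: nudge}}  (hypothetical layout)
    Assumption: keep the largest-magnitude nudge; on ties the first source wins.
    """
    resolved = {}
    for nudges in nudges_by_source.values():
        for attribute, nudge in nudges.items():
            current = resolved.get(attribute)
            if current is None or abs(nudge) > abs(current):
                resolved[attribute] = nudge
    return resolved


def apply_nudges(ratings, resolved, lo=1, hi=20):
    """Add the resolved nudges to modeled ratings, clamped to the 1-20 scale."""
    return {
        attribute: max(lo, min(hi, rating + resolved.get(attribute, 0)))
        for attribute, rating in ratings.items()
    }


# Two sources both like the same player's shot: +2 and +1 resolve to +2, not +3.
nudges = {
    "wheeler": {"Wristshot": 2, "Skating": -1},
    "ep_u23": {"Wristshot": 1},
}
resolved = resolve_nudges(nudges)
final = apply_nudges({"Wristshot": 19, "Skating": 12, "Checking": 8}, resolved)
```

Taking the maximum rather than the sum means a prospect covered by three outlets is not rated higher than an equally good one covered by a single outlet.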
Stage 05: Cross-Pipeline Reconciliation & Final Merge
Players in multiple leagues are reconciled; all pipelines merge into one EHM-ready database.
Multi-League ID: identify overlap players (callups, loans, mid-season trades)
Primary League: quality-preferred selection (where they actually played most)
Secondary Signals: bounded supplementary nudges (don't throw away other-league data)
Final Master: combined output (~14,500 players unified)
The "Eklund problem": A Swedish kid who plays 43 SHL games and then comes over for a 12-game AHL audition shouldn't be rated on the AHL's tiny sample. Reconciliation identifies these players, picks the right primary signal source (43 SHL > 12 AHL), and uses the supplementary league as a bounded adjustment rather than a coin-flip override. NCAA-to-NHL signings, AHL callups, European loans — all handled identically.
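The primary-plus-bounded-adjustment idea can be sketched as follows. This is an illustration under simplifying assumptions, not the reconciliation script itself: it picks the primary league purely by games played (the league-quality preference is left out), weights each secondary stint by its share of total games, and caps the combined adjustment at a hypothetical `max_adjust` of one point. The ratings in the example are made up.

```python
def reconcile(stints, max_adjust=1.0):
    """Reconcile one attribute for a player who appears in several leagues.

    stints: list of {"league": str, "games": int, "rating": float}
    Returns (primary_league, reconciled_rating).
    """
    # Primary signal source: the league where the player actually played most.
    primary = max(stints, key=lambda s: s["games"])
    total_games = sum(s["games"] for s in stints)

    # Secondary leagues pull the rating toward their own value, weighted by
    # their share of games played.
    adjust = 0.0
    for stint in stints:
        if stint is primary:
            continue
        weight = stint["games"] / total_games
        adjust += weight * (stint["rating"] - primary["rating"])

    # Bound the adjustment so a tiny sample can never override the primary.
    adjust = max(-max_adjust, min(max_adjust, adjust))
    return primary["league"], primary["rating"] + adjust


# 43 SHL games vs. a 12-game AHL audition; ratings are illustrative only.
league, rating = reconcile([
    {"league": "SHL", "games": 43, "rating": 14.0},
    {"league": "AHL", "games": 12, "rating": 9.0},
])
# league is "SHL"; the AHL stint lowers the rating by at most max_adjust.
```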
Stage 06: EHM Database Export
Final ratings exported in EHM editor format, ready for database import.
Staff Export: EHM editor format (all 39+ skater attrs, 20+ goalie attrs)
QA Reports: distribution + audit HTML (per-pipeline quality checks)
Premier Pivot Rosters: released to community (TBL Forum · Steam Workshop · Discord)

How the modeling actually shapes ratings

Inside each modeling phase, the same general flow applies. Raw data signals are converted to per-60 rates, normalized within position groups, blended with reliability weighting (small samples regress toward neutral averages), and mapped to the 1–20 EHM attribute scale via normal-CDF distribution shaping.

Raw signal (e.g. Hits/60)
Position-group z-score
Reliability-weighted blend
Normal-CDF mapping
Modeled rating (1–20)
Scouting nudge ±1–2
Final attribute
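The flow above, up to the modeled rating, can be sketched with the Python standard library. This is an assumption-laden illustration rather than the pipeline's formulas: the shrinkage form `toi / (toi + k)`, the constant `k`, and the linear percentile-to-rating mapping are stand-ins, and the function names are hypothetical. The Pipeline Guide remains the reference for what each phase actually computes.

```python
from statistics import NormalDist, mean, pstdev


def per60(count, toi_minutes):
    """Convert a raw count to a per-60-minute rate."""
    return count * 60.0 / toi_minutes


def zscores(rates):
    """Z-score each rate against its own position group."""
    mu, sigma = mean(rates), pstdev(rates)
    return [(r - mu) / sigma if sigma else 0.0 for r in rates]


def shrink(z, toi_minutes, k=500.0):
    """Reliability weighting: small samples regress toward neutral (z = 0).

    k is a hypothetical tuning constant, not a value from the pipeline.
    """
    weight = toi_minutes / (toi_minutes + k)
    return weight * z


def to_rating(z, lo=1, hi=20):
    """Map a z-score to the 1-20 scale through the standard normal CDF."""
    percentile = NormalDist().cdf(z)
    return max(lo, min(hi, round(lo + percentile * (hi - lo))))


def rate_group(players):
    """players: list of (name, count, toi_minutes) within one position group."""
    rates = [per60(count, toi) for _, count, toi in players]
    return {
        name: to_rating(shrink(z, toi))
        for (name, _, toi), z in zip(players, zscores(rates))
    }


# Hypothetical hit totals for three defensemen with equal ice time.
ratings = rate_group([("a", 10, 1000), ("b", 50, 1000), ("c", 90, 1000)])
```

Because the z-score is shrunk before the CDF mapping, a player with very little ice time lands near the middle of the scale regardless of how extreme the raw rate looks.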

For the full phase-by-phase breakdown — what data each phase reads, what attributes it produces, what the formulas actually do — see the Pipeline Guide.

Where to dig deeper

This page is the architectural overview. Three places to go from here: