Evaluation of ECG Representations Must Be Fixed

Zachary Berger^*

MIT CSAIL & MGH

Daniel Prakah-Asante^*

MIT CSAIL & MGH

John Guttag

MIT CSAIL

Collin M. Stultz

MIT CSAIL & MGH

^* Indicates equal contribution


tl;dr: Current evaluation of ECG representations is flawed. We must revise and standardize the field's benchmarking practice to ensure that progress is reliable and clinically meaningful.

Abstract

This position paper argues that current benchmarking practice in 12-lead ECG representation learning must be fixed to ensure progress is reliable and aligned with clinically meaningful objectives. The field has largely converged on three public multi-label benchmarks (PTB-XL, CPSC2018, CSN) dominated by arrhythmia and waveform-morphology labels, even though the ECG is known to encode substantially broader clinical information. We argue that downstream evaluation should expand to include an assessment of structural heart disease and patient-level forecasting, in addition to other evolving ECG-related endpoints, as relevant clinical targets. Next, we outline evaluation best practices for multi-label, imbalanced settings, and show that when they are applied, the literature's current conclusion about which representations perform best is altered. Furthermore, we demonstrate the surprising result that a randomly initialized encoder with linear evaluation matches state-of-the-art pre-training on many tasks. This motivates the use of a random encoder as a reasonable baseline model. We substantiate our observations with an empirical evaluation of three representative ECG pre-training approaches across six evaluation settings: the three standard benchmarks, a structural disease dataset, hemodynamic inference, and patient forecasting.

BibTeX

@misc{berger2026ECGfix,
  title={Position: Evaluation of ECG Representations Must Be Fixed},
  author={Zachary Berger and Daniel Prakah-Asante and John Guttag and Collin M. Stultz},
  year={2026},
  eprint={2602.17531},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.17531},
}
