Our observation and modelling capabilities for understanding large-scale terrestrial evapotranspiration (ET) have advanced in recent decades. However, discrepancies remain between remotely sensed products and land surface models (LSMs), and neither can be directly validated. We propose to reconcile observations and simulations of ET through a diagnostic framework composed of an observation-model-theory triplet. The Evapotranspiration Temporal VARiance Decomposition (EVARD), a theoretical tool, is used as a benchmark to estimate and diagnose ET variance across the contiguous United States (CONUS) with datasets that include hydroclimatic observations, GRACE-based terrestrial water storage, four ET observation products, and four LSMs. Five experiments are designed to evaluate and diagnose possible errors and uncertainties in the ET temporal variance estimated by the four observation-based ET products and the four LSM simulations. We quantify the sources of ET variance from climate and watershed storage components, evaluate biases and uncertainties by intercomparing multi-source, multi-variable observation products, and diagnose processes that may be missing in LSMs. This study calls for advancing hydrologic knowledge by seeking congruence among models, data, and theories.