The Measurement Reformation · Paper I

What Replaces the Engagement Metric

Alternative Performance Indicators for Platforms That Are Not Optimizing for Capture

The Institute for Cognitive Sovereignty · 2026 · Research Paper · Open Access · CC BY-SA 4.0

ICS-2026-MR-001 · Published March 6, 2026 · 20 min read
0 · Alternative cognitive welfare metrics with any regulatory mandate for platform adoption
$246B · Annual global programmatic advertising spend whose targeting depends on engagement metric optimization
4 · Specific alternative metrics proposed, defined, and assessed for gaming resistance in this paper
“When a measure becomes a target, it ceases to be a good measure.”
— Charles Goodhart, 1975, on what happens to metrics when they are made into management objectives
Section I

The Engagement Metric and What It Measures

Engagement, as measured by modern advertising-funded platforms, is a composite of behavioral signals: time-on-platform, content interactions (likes, shares, comments, clicks), return sessions, and scroll depth. These signals are aggregated into platform-level engagement metrics — daily active users, monthly active users, time-in-app — that serve simultaneously as product health indicators, advertiser audience quality guarantees, and executive compensation benchmarks.

The metric captures something real. Engagement measures the extent to which a platform's users are present and interacting with its content. This is not a meaningless quantity. High engagement means people are using the product. The problem is not that engagement is uninformative — it is that engagement, as measured, is systematically agnostic about the mechanism by which it is produced. The metric does not distinguish between a session that generated engagement through interest and a session that generated the same engagement through compulsion. The metric does not distinguish between interaction driven by satisfaction and interaction driven by anxiety. It does not distinguish between a user who chose to spend three hours on the platform and a user who spent three hours because the platform's design prevented them from leaving. The number is the same.

This indifference is not incidental — it is structural. The advertising business model requires engagement independent of quality, because the advertising targeting infrastructure requires behavioral data volume independent of how that volume was generated. An anxiety-induced session and a satisfaction-induced session of equal length produce equal quantities of behavioral data for targeting. From the perspective of the advertising revenue model, they are equivalent. The metric that drives platform design is therefore optimized for that revenue model, not for user welfare, because the revenue model is what pays for the platform.

The Measurement Crisis series (MC-001 through MC-006) documented how this metric inversion operates across domains — how GDP fails to see welfare, how the BMI corrupts clinical practice, how test scores colonize learning. The engagement metric is the same phenomenon applied to attention infrastructure: the measure has become the objective, and in becoming the objective, it has replaced the underlying thing it was supposed to track.


Section II

Why the Metric Cannot Be Reformed — Only Replaced

A common response to the engagement metric problem proposes to supplement engagement with additional metrics — to add welfare signals to the existing measurement architecture without removing engagement as the primary objective. This approach cannot work, for the same reason that adding a welfare survey to a business model built on maximizing cigarette consumption cannot work. The supplementary metric will be tracked; it will be reported; it will not govern product decisions when it conflicts with the primary metric. And it will frequently conflict with the primary metric, because engagement maximization and user welfare maximization are not the same objective.

The Goodhart's Law dynamic is determinative here. When engagement became the primary metric, it became the objective that platform teams were evaluated against. Features that increased engagement were approved; features that decreased engagement were not. The engagement metric is now embedded in the performance management system that governs every design decision at scale. Adding a supplementary metric does not change the objective function — it adds a reporting requirement that will be satisfied by the data team while the product team continues to optimize for engagement.

This is not speculation. Meta's own internal research, disclosed in the Haugen documents, shows that the company was aware of the welfare harms produced by its engagement optimization, tracked those harms in internal research, and did not change the engagement objective function. Adding welfare signals to the reporting architecture without changing the primary optimization target does not produce different outcomes — it produces better reports.

The conclusion is structural: the engagement metric cannot be reformed within an engagement-maximizing business model. It can only be displaced by an alternative metric that becomes the primary objective. The alternative metrics proposed in Section III are designed to be primary objectives — metrics that, if optimized for, would produce systematically different design incentives than engagement maximization produces.

Named Condition — ICS-2026-MR-001
The Metric Void
The condition in which the dominant performance metric in a behavioral technology system has been structurally decoupled from the welfare of the people whose behavior generates it — producing a measurement architecture that tracks something real (behavioral engagement) while remaining systematically indifferent to whether that engagement represents value or compulsion, satisfaction or harm. The Metric Void is not an absence of measurement; it is measurement of the wrong thing at the right resolution, producing high-fidelity data about a question that does not serve the interests of the measured.

Section III

Four Alternative Metrics

The following four metrics are proposed as replacements for engagement as the primary performance indicator for platforms committed to cognitive sovereignty as a design objective. Each is operationally defined in Section IV. Each is assessed for gaming resistance — the extent to which it resists being gamed in ways that produce high metric scores without producing the underlying outcome the metric is meant to measure.

Metric 1: Voluntary Return Rate

Voluntary return rate measures the proportion of sessions followed by a return visit initiated without a platform notification prompt within a defined time window (proposed: 24 hours). A session that ends and is followed by a user-initiated return visit — the user typing the URL, opening the app, clicking a bookmark — counts as a voluntary return. A session that ends and is followed by a return initiated by a push notification does not count as voluntary. The metric captures whether users found sufficient value in a session to choose to return, rather than whether the platform's notification architecture successfully reinstated the seeking loop.
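As a concrete illustration, the sketch below computes voluntary return rate from hypothetical session and notification logs; the record shapes and field names are assumptions, not any platform's actual schema. Note that the notification log counts email and SMS alongside push, per the broad definition of "notification" that Section IV identifies as necessary for gaming resistance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETURN_WINDOW = timedelta(hours=24)  # proposed window from the definition above

@dataclass
class Session:
    user_id: str
    start: datetime
    end: datetime

@dataclass
class Notification:
    user_id: str
    sent_at: datetime
    channel: str  # count every platform-initiated channel: push, email, SMS

def voluntary_return_rate(sessions: list[Session],
                          notifications: list[Notification]) -> float:
    """% of sessions followed within RETURN_WINDOW by a user-initiated return.

    A return counts as voluntary only if no platform-initiated notification
    reached the user between the session's end and the return's start. The
    final session per user is excluded, since its follow-up is unobserved.
    """
    by_user: dict[str, list[Session]] = {}
    for s in sessions:
        by_user.setdefault(s.user_id, []).append(s)

    counted = voluntary = 0
    for user, user_sessions in by_user.items():
        user_sessions.sort(key=lambda s: s.start)
        sent = sorted(n.sent_at for n in notifications if n.user_id == user)
        for prev, nxt in zip(user_sessions, user_sessions[1:]):
            counted += 1
            returned_in_window = nxt.start - prev.end <= RETURN_WINDOW
            prompted = any(prev.end <= t <= nxt.start for t in sent)
            if returned_in_window and not prompted:
                voluntary += 1
    return voluntary / counted if counted else 0.0
```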

Metric 2: Session Satisfaction Rating

Session satisfaction rating is a self-reported measure collected at natural session endpoints — a brief prompt presented when a user voluntarily ends a session or closes the application. The prompt asks a single question: “How satisfied were you with this session?” on a five-point scale. This is a subjective measure, and its value depends on consistent administration at genuine session endpoints rather than within-session prompts designed to capture positive emotional states. The metric tracks whether users found their time on the platform worthwhile in their own assessment.

Metric 3: Attentional Completion Rate

Attentional completion rate measures the proportion of content items consumed to their natural conclusion versus abandoned mid-consumption. For articles, this is read-to-end versus scrolled-past. For videos, this is watched-to-end versus partially viewed. For podcast episodes, listened-to-completion versus skipped. The metric captures the extent to which users engage with content intentionally and with sustained attention, as opposed to scrolling rapidly through a feed without dwelling on any individual item. High attentional completion rates indicate that content matched user interests at a depth that warranted sustained attention; low rates indicate scroll-optimized engagement patterns.
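A minimal sketch of the computation follows, assuming illustrative consumption telemetry records. The minimum-length floor and completion cutoff are hypothetical parameters; the floor implements the mitigation Section IV proposes against gaming the metric with trivially completable short-form content.

```python
from dataclasses import dataclass

MIN_ITEM_LENGTH_SECONDS = 60.0  # hypothetical floor; excludes trivially short items
COMPLETION_CUTOFF = 0.95        # hypothetical fraction counting as consumed to conclusion

@dataclass
class ConsumptionEvent:
    item_id: str
    item_length_seconds: float  # estimated read time, video duration, or episode length
    consumed_seconds: float     # how far the user actually got

def attentional_completion_rate(events: list[ConsumptionEvent]) -> float:
    """Proportion of qualifying content items consumed to their natural
    conclusion rather than abandoned mid-item."""
    qualifying = [e for e in events
                  if e.item_length_seconds >= MIN_ITEM_LENGTH_SECONDS]
    if not qualifying:
        return 0.0
    completed = sum(1 for e in qualifying
                    if e.consumed_seconds / e.item_length_seconds >= COMPLETION_CUTOFF)
    return completed / len(qualifying)
```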

Metric 4: Post-Session Well-Being Score

Post-session well-being score is a self-reported measure collected at session endpoints, distinct from session satisfaction. The prompt asks: “How do you feel after this session compared to before?” on a five-point scale from “Much worse” to “Much better.” The distinction from session satisfaction is important: satisfaction measures the quality of the experience; well-being measures the user's affective state after the experience. A session can be satisfying (engaging, interesting, stimulating) while leaving the user feeling worse — this is the characteristic profile of doom-scrolling and anxiety-inducing feed consumption. A platform that optimizes for post-session well-being scores would have incentives to design sessions that leave users in better states than they entered.
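Because Metrics 2 and 4 share an administration model, one sketch can cover both. It aggregates the two self-report prompts, counting only responses gathered at genuine voluntary session endpoints — the administration-fidelity condition both metrics depend on. The record shape and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EndpointResponse:
    session_id: str
    voluntary_endpoint: bool     # True only if the user, not the platform, ended the session
    satisfaction: int | None     # 1-5: "How satisfied were you with this session?"
    wellbeing_delta: int | None  # 1-5: "How do you feel after this session compared to before?"

def self_report_scores(responses: list[EndpointResponse]) -> tuple[float, float]:
    """Mean session satisfaction and mean post-session well-being, discarding
    any prompt not administered at a genuine voluntary session endpoint."""
    valid = [r for r in responses if r.voluntary_endpoint]
    sat = [r.satisfaction for r in valid if r.satisfaction is not None]
    wb = [r.wellbeing_delta for r in valid if r.wellbeing_delta is not None]
    return (mean(sat) if sat else float("nan"),
            mean(wb) if wb else float("nan"))
```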


Section IV

Operational Definitions

The entries below provide operational definitions for each proposed metric, specifying the data source required, the technical implementation, and the primary gaming vector — the way a platform committed to the appearance of metric performance without the underlying reality could achieve high scores while producing the same welfare harms as engagement optimization.

Voluntary Return Rate
  • Operational definition: % of sessions followed within 24h by a return visit not triggered by a platform push notification
  • Data source: session logs plus notification trigger logs
  • Primary gaming vector: suppressing push notifications to inflate the voluntary-return classification while deploying email/SMS alternatives
  • Gaming resistance: moderate. Requires defining “notification” broadly; auditable via user device telemetry.

Session Satisfaction Rating
  • Operational definition: mean response (1–5) to “How satisfied were you with this session?”, collected at genuine voluntary session endpoints
  • Data source: user self-report at session end
  • Primary gaming vector: prompting at emotionally positive within-session moments rather than session endpoints; priming with positive framing
  • Gaming resistance: low. Prompt timing and framing are controllable by the platform; requires third-party audit of implementation.

Attentional Completion Rate
  • Operational definition: % of content items consumed to natural conclusion (article read-to-end, video watched-to-end) versus abandoned mid-item
  • Data source: content consumption telemetry
  • Primary gaming vector: optimizing for short-form content that is quick to complete; artificially segmenting long content
  • Gaming resistance: moderate. Can be gamed by content format, but format manipulation is visible; requires minimum content length thresholds.

Post-Session Well-Being Score
  • Operational definition: mean response (1–5) to “How do you feel after this session compared to before?”, collected at genuine session endpoints
  • Data source: user self-report at session end
  • Primary gaming vector: prompting during emotionally positive post-session windows; excluding sessions ended under negative conditions
  • Gaming resistance: low. Same vulnerability as the satisfaction rating; requires third-party administration or device-level collection.

The two self-report metrics (session satisfaction and post-session well-being) have low gaming resistance when administered by the platform being measured. This is a genuine limitation. The mitigation is third-party administration: the metrics are collected by an independent auditor via device-level tooling rather than by the platform's own application. This requires regulatory mandate or audit framework infrastructure of the kind that the Measurement Reformation paper (MR-004) specifies. In the absence of third-party administration, voluntary return rate and attentional completion rate — which derive from platform behavioral telemetry rather than self-report — are the more robust initial targets.
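One way a third-party auditor could check administration fidelity for the self-report metrics is to compare the platform's prompt logs against independently collected session-end telemetry. The sketch below is a hypothetical such check; the tolerance value and log shapes are assumptions.

```python
from datetime import datetime, timedelta

ENDPOINT_TOLERANCE = timedelta(seconds=30)  # hypothetical matching window

def prompt_timing_fidelity(prompt_times: list[datetime],
                           session_ends: list[datetime]) -> float:
    """Fraction of self-report prompts shown at (or very near) a genuine
    session endpoint. A low value suggests prompts are being fired at
    emotionally favorable within-session moments — the gaming vector
    identified above."""
    if not prompt_times:
        return float("nan")
    at_endpoint = sum(
        1 for t in prompt_times
        if any(abs(t - e) <= ENDPOINT_TOLERANCE for e in session_ends)
    )
    return at_endpoint / len(prompt_times)
```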


Section V

Evidence for Each Metric's Validity

Each proposed metric requires a validity argument: evidence that the metric is actually measuring the underlying construct it is intended to track, rather than a correlated but distinct variable.

Voluntary return rate has conceptual validity as a measure of value: a user who returns without a prompt has exercised volition. The behavioral economics literature on prompted versus unprompted behavior (Thaler and Sunstein, 2008; Fogg, 2009) establishes that notification-triggered behavior is categorically distinct from habit-driven voluntary behavior. Platforms that produce higher voluntary return rates have, by definition, produced experiences that users chose to reinstate without external prompting. The construct validity is strong; the measurement validity depends on the operational definition of “voluntary.”

Session satisfaction rating maps to the hedonic well-being literature's established measure of task satisfaction (Kahneman et al., 2004; Diener et al., 1985). The experience sampling method — collecting momentary assessments at behavioral endpoints — has thirty years of validation in both laboratory and field settings. The construct is well-established; the challenge is administration fidelity, not construct validity.

Attentional completion rate maps to the attention and cognitive engagement literature's distinction between shallow and deep processing (Craik and Lockhart, 1972; Carr, 2010). Content consumed to completion is processed more deeply, retained more durably, and associated with greater subjective satisfaction than content scrolled past. The metric is a behavioral proxy for the depth of engagement as opposed to its breadth — tracking whether users are attending to content or merely being exposed to it.

Post-session well-being score maps directly to the affect research literature on media use and emotional outcomes (Twenge et al., 2018; Verduyn et al., 2015; Shakya and Christakis, 2017). The studies that document negative affective outcomes from social media use are, in effect, measuring the construct that this metric proposes to track routinely. The metric formalizes what the academic research literature has been measuring episodically into a continuous platform-level performance indicator.


Section VI

Commercial Adoption Barriers

None of these metrics will be voluntarily adopted as primary performance indicators by advertising-funded platforms under current market conditions. The reason is structural: the advertising revenue model requires behavioral data volume, which requires engagement, which requires engagement maximization as the objective function. A platform that optimizes for voluntary return rate rather than total engagement time will produce lower total behavioral data volume, which reduces advertising targeting precision, which reduces CPMs, which reduces revenue. The commercial incentive structure selects directly against the proposed metrics.

This is not a barrier that can be overcome through persuasion, voluntary commitment, or ESG pressure alone. The constraint is financial. A public company that adopts voluntary return rate as its primary metric and sees engagement decline will face investor pressure to reverse the change. A private company faces no such constraint but also no competitive pressure to adopt welfare metrics when engagement metrics continue to determine advertising market share.

Counterpoint
Alternative metrics could serve as premium positioning and trust-building instruments

A platform that demonstrably produces higher voluntary return rates and post-session well-being scores could market this differentiation as a premium attribute to both users and advertisers. The advertising industry is increasingly attentive to brand safety and contextual quality; advertisers paying a premium for placement in positive-affect contexts have a financial interest in the existence of platforms that can credibly claim positive-affect optimization. This is the trust-premium argument that the Design Covenant series (DC-005) makes for voluntary commitments generally.

The response is that the trust premium is real but insufficient at scale. A trust premium works for niche platforms serving premium users; it does not work for mass-market platforms competing for advertising budgets that measure CPMs in cents. The trust premium cannot close the revenue gap produced by engagement decline at scale. This is why the Measurement Reformation paper (MR-004) argues that regulatory mandate, not voluntary adoption, is the necessary mechanism.


Section VII

What the Replacement Demands

The replacement of engagement metrics with cognitive welfare metrics demands three things that do not exist in the current institutional environment.

First, it demands operational standardization: the metrics must be defined consistently across platforms so that voluntary return rate at one platform means the same thing as voluntary return rate at another. Without standardized definitions, platforms will implement the metrics in ways that optimize their own scores rather than the underlying construct. Standardization requires institutional authority — a regulatory body, an industry standards organization, or an international body — with the capacity to mandate and audit the operational definitions.
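To make the standardization requirement concrete, a standardized definition could be published as a machine-readable specification that auditors check implementations against. The sketch below shows one hypothetical shape for such a specification; the schema and identifiers are assumptions, though the parameter values restate those proposed in Sections III and IV.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricStandard:
    metric_id: str      # versioned identifier, e.g. "voluntary_return_rate/v1"
    definition: str     # the prose operational definition being standardized
    parameters: dict    # thresholds every conforming implementation must share
    administrator: str  # "platform" or "third_party"

STANDARDS = [
    MetricStandard(
        metric_id="voluntary_return_rate/v1",
        definition="% of sessions followed within the window by a return "
                   "not triggered by any platform-initiated notification",
        parameters={"return_window_hours": 24,
                    "notification_channels": ["push", "email", "sms"]},
        administrator="platform",  # behavioral telemetry; moderate gaming resistance
    ),
    MetricStandard(
        metric_id="post_session_wellbeing/v1",
        definition="Mean 1-5 response to the post-session well-being prompt "
                   "at genuine voluntary session endpoints",
        parameters={"scale_min": 1, "scale_max": 5},
        administrator="third_party",  # low resistance under self-administration
    ),
]
```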

Second, it demands audit infrastructure: the self-report metrics require third-party collection to be meaningful, and all metrics require independent verification of implementation fidelity. This is analogous to the financial audit infrastructure that makes accounting metrics meaningful — the metrics are only as reliable as the audit function that verifies them. No such infrastructure exists for platform welfare metrics. Building it requires the kind of independent assessment body that MR-004 specifies.

Third, it demands regulatory mandate: the commercial adoption barriers documented in Section VI are insurmountable without external constraint. Platforms will not voluntarily adopt metrics that reduce their advertising revenue unless they are required to report those metrics under conditions where failure to improve them carries consequences. The Legal Architecture series (LA-001 through LA-005) specifies the statutory anatomy that could supply this mandate. The Measurement Reformation is the target set those statutes would have to track.

The Metric Void will persist as long as engagement remains the objective function. Specifying the replacements is necessary but not sufficient. The next three papers in this series address what else is required: a composite individual-level index (MR-002), population-level collective health measures (MR-003), and the institutional analysis of what it would take for any of these alternatives to become standard (MR-004).


Selected Sources

  • Goodhart, C. (1975). Problems of monetary management: The UK experience. Papers in Monetary Economics. Reserve Bank of Australia.
  • Kahneman, D., Krueger, A.B., Schkade, D., et al. (2004). A survey method for characterizing daily life experience: The Day Reconstruction Method. Science, 306(5702), 1776–1780.
  • Verduyn, P., Lee, D.S., Park, J., et al. (2015). Passive Facebook usage undermines affective well-being: Experimental and longitudinal evidence. Journal of Experimental Psychology: General, 144(2), 480–488.
  • Shakya, H.B., & Christakis, N.A. (2017). Association of Facebook use with compromised well-being: A longitudinal study. American Journal of Epidemiology, 185(3), 203–211.
  • Haugen, F. (2021). Facebook internal research documents. Submitted to the U.S. Securities and Exchange Commission and released to Congress.
  • Twenge, J.M., Joiner, T.E., Rogers, M.L., & Martin, G.N. (2018). Increases in depressive symptoms, suicide-related outcomes, and suicide rates among U.S. adolescents after 2010 and links to increased new media screen time. Clinical Psychological Science, 6(1), 3–17.
  • Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
How to Cite

The Institute for Cognitive Sovereignty. (2026). What Replaces the Engagement Metric [ICS-2026-MR-001]. The Institute for Cognitive Sovereignty. https://cognitivesovereignty.institute/measurement-reformation/what-replaces-the-engagement-metric

References

Internal: This paper is part of The Measurement Reformation (MR series), Saga V. It draws on and contributes to the argument documented across 20 papers in 5 series.
