by Melanie Klein, Beth Jarosz, and Chris Dick

The dataindex.us team is excited to launch the Data Checkup – a comprehensive framework for assessing the health of federal data collections, highlighting key dimensions of risk and presenting a clear status of data well-being.

When we started dataindex.us, one of our earliest tools was a URL tracker: a simple way to monitor whether a webpage or data download link was up or down. In early 2025, that kind of monitoring became urgent as thousands of federal webpages and datasets went dark.
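As a rough illustration, a check like that can be as simple as requesting a URL and recording whether it responds successfully. The sketch below is hypothetical: it uses a placeholder URL and is not the actual dataindex.us tracker.

    # Minimal sketch of an up/down check for a webpage or data download link.
    # Illustrative only; the URL below is a placeholder, not a monitored endpoint.
    import requests

    def check_url(url: str, timeout: float = 10.0) -> bool:
        """Return True if the URL responds with a successful (2xx/3xx) status code."""
        try:
            response = requests.head(url, timeout=timeout, allow_redirects=True)
            return response.ok
        except requests.RequestException:
            # Timeouts, DNS failures, and connection errors all count as "down."
            return False

    if __name__ == "__main__":
        url = "https://example.gov/data/download.csv"  # placeholder
        print("up" if check_url(url) else "down")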

As many of those pages came back online, often changed from their original form, we realized URL tracking wasn’t sufficient. Threats to federal data are coming from multiple directions, including loss of capacity, reduced funding, targeted removal of variables, and the termination of datasets that don’t align with administration priorities.

The more important question became: how do we assess the risk that a dataset might disappear, change, or degrade in the future? We needed a way to evaluate the health of a federal dataset that was broad enough to apply across many types of data, yet specific enough to capture the different ways datasets can be put at risk. That led us to develop the Data Checkup.

Once we had an initial concept, we brought together experts from across the data ecosystem for feedback. The current Data Checkup framework reflects input from more than 30 colleagues.

The result is a framework built around six dimensions:

  • Historical Data Availability

  • Future Data Availability

  • Data Quality

  • Statutory Context

  • Staffing and Funding

  • Policy

Each dimension is assessed and assigned a status that communicates its level of risk:

  • Gone

  • High Risk

  • Moderate Risk

  • No Known Issue

Together, these assessments provide a more complete picture of dataset health than availability checks alone.
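To make the structure concrete, one way to picture a checkup is as a record that pairs each of the six dimensions with one of the four statuses, along with the reasoning behind each call. The Python sketch below is a hypothetical representation, not the actual dataindex.us data model.

    # Hypothetical sketch of how a Data Checkup could be represented in code.
    # The dimension and status names mirror the framework described above;
    # the data model itself is an assumption, not the dataindex.us implementation.
    from dataclasses import dataclass, field
    from enum import Enum

    class Status(Enum):
        GONE = "Gone"
        HIGH_RISK = "High Risk"
        MODERATE_RISK = "Moderate Risk"
        NO_KNOWN_ISSUE = "No Known Issue"

    DIMENSIONS = (
        "Historical Data Availability",
        "Future Data Availability",
        "Data Quality",
        "Statutory Context",
        "Staffing and Funding",
        "Policy",
    )

    @dataclass
    class DimensionAssessment:
        dimension: str                  # one of DIMENSIONS
        status: Status                  # assigned level of risk
        explanation: str = ""           # reasoning behind the assigned status
        citations: list[str] = field(default_factory=list)  # supporting sources

    @dataclass
    class DataCheckup:
        dataset: str
        assessments: list[DimensionAssessment] = field(default_factory=list)

In this picture, a single dataset's checkup would hold six DimensionAssessment entries, one per dimension, each carrying its status, explanation, and citations.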

The Data Checkup is designed to serve the needs of both data users and data advocates. It supports a wide range of use cases, including academic research, policy decision-making, journalism, advocacy, and litigation.

Different users will have different tolerances for data risk. A researcher may be most concerned about changes to methodology, while a litigator may be most concerned that a statutory publication deadline was missed. With multiple dimensions of risk assessed, the Data Checkup gives users the flexibility to focus on the risks most relevant to their work.

By surfacing risks early, before data is lost, degraded, or unusable, the Data Checkup helps data users and advocates make informed decisions about using, maintaining, and protecting the federal datasets they rely on.

The Data Checkup Framework

Let’s take a closer look at the Data Checkup framework, exploring each dimension and the levels of risk it can be assigned.

Historical Data Availability

Assessment of the availability of existing (historical) data and resources associated with the data collection.

  • Gone: Historical data files are no longer publicly available.

  • High Risk: Some historical data files have been removed.

  • Moderate Risk: Some historical data elements have been removed.

  • No Known Issue: Historical data remain accessible with no known alterations.

Future Data Availability

Assessment of whether continued data collection and publication are likely and on schedule.

  • Gone: Data collection and publication have been terminated.

  • High Risk: Statutory publication deadline missed, and/or collection/publication skipped, and/or Information Collection Request (ICR) expired for more than one year.

  • Moderate Risk: Typical or intended publication date missed and/or collection/publication delayed; ICR expired for up to one year.

  • No Known Issue: Data published on time or as expected, and ICR active or renewed before expiration.

Data Quality

Assessment of whether continued data collection is of similar quality to historical data.

  • Gone: Data collection and publication have been terminated.

  • High Risk: Reductions in granularity, timeliness, or frequency.

  • Moderate Risk: Potential or emerging risk to granularity, timeliness, or frequency.

  • No Known Issue: Maintained or improved granularity, timeliness, or frequency.

Statutory Context

Assessment of the strength of statutory requirements and authorization, as well as program reliance on continued data collection and publication.

  • High Risk: Statutory authorization is vague, and/or alternative data collections could serve as substitutes, and/or there is no known programmatic use.

  • Moderate Risk: Not explicitly required by statute, but required for the implementation of a state or federal program, and no alternative data collections could serve as substitutes.

  • No Known Issue: Statutorily required; and/or the statutory authorization explicitly names the collection and makes clear what must be collected, with no alternative data collections that could serve as substitutes; and/or required for the implementation of a federal program.

Staffing and Funding

Assessment of whether the agency has sufficient staff, budget, and leadership stability to sustain data collection and publication.

  • Gone: All of the staff in the division or agency are gone and/or all funding has been terminated.

  • High Risk: 40% or more of staff lost, and/or 1,000 or more staff lost, and/or budget cut by 20% or more, and/or leadership removed.

  • Moderate Risk: 10-39% of staff lost, and/or 500-999 staff lost, and/or budget cut by 10-19%, and/or a threatened change in leadership.

  • No Known Issue: Less than 10% of staff lost, less than 10% of budget cut, and no known change in leadership.

Policy

Assessment of whether specific policy actions have been or are being taken to alter or discontinue data collection and publication.

  • Gone: Data collection and publication have been terminated.

  • High Risk: ICR changes driven by presidential action; a negative policy note posted on the site; or other significant changes made in accordance with Administration priorities.

  • Moderate Risk: Proposed or pending changes; statements by administration officials suggesting a change is being considered or planned.

  • No Known Issue: No notable changes since January 2025 affecting what data is collected and published.

Putting the Framework into Practice

Here you can see the Data Checkup framework applied to a subset of datasets. At a high level, it provides a snapshot of dataset well-being, allowing you to quickly identify which datasets are facing risks.

Zooming in on a specific dataset reveals additional information, including citations and explanations for why each dimension was assigned its risk level. This detailed view helps users understand the reasoning behind the assessment.

What’s Next for the Data Checkup

This launch marks the first version of the Data Checkup. We expect the framework to go through multiple iterations as the risk landscape evolves over time.  

As the tool evolves, new assessments will be published, the coverage of monitored datasets will expand, and the assessment will be automated (where possible). Data Checkups will be updated quarterly, with additional ad hoc updates in response to major developments.

We welcome your feedback. If there is a dataset you would like to see monitored, or if you have expertise on the health of a dataset, we’d love to hear from you.

Together, we hope the Data Checkup serves as a shared resource for monitoring the health of federal datasets.