by Melanie Klein, Beth Jarosz, and Chris Dick
The dataindex.us team is excited to launch the Data Checkup – a comprehensive framework for assessing the health of federal data collections, highlighting key dimensions of risk and presenting a clear status of data well-being.
When we started dataindex.us, one of our earliest tools was a URL tracker: a simple way to monitor whether a webpage or data download link was up or down. In early 2025, that kind of monitoring became urgent as thousands of federal webpages and datasets went dark.
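For illustration only, a check like that can be as simple as requesting each tracked URL and recording whether it responds. The sketch below is a hypothetical version of such a check, not the actual tracker; the URLs are placeholders.

```python
# Minimal sketch of an up/down check for a list of tracked URLs (illustrative only).
import requests

urls = [
    "https://example.gov/data/dataset.csv",    # placeholder, not a real target
    "https://example.gov/docs/codebook.pdf",   # placeholder, not a real target
]

for url in urls:
    try:
        resp = requests.head(url, timeout=10, allow_redirects=True)
        status = "up" if resp.status_code < 400 else f"down (HTTP {resp.status_code})"
    except requests.RequestException as exc:
        status = f"down ({type(exc).__name__})"
    print(f"{url}: {status}")
```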
As many of those pages came back online, often changed from their original form, we realized URL tracking wasn’t sufficient. Threats to federal data are coming from multiple directions, including loss of capacity, reduced funding, targeted removal of variables, and the termination of datasets that don’t align with administration priorities.
The more important question became: how do we assess the risk that a dataset might disappear, change, or degrade in the future? We needed a way to evaluate the health of a federal dataset that was broad enough to apply across many types of data, yet specific enough to capture the different ways datasets can be put at risk. That led us to develop the Data Checkup.
Once we had an initial concept, we brought together experts from across the data ecosystem to review it. The current Data Checkup framework reflects feedback from more than 30 colleagues.
The result is a framework built around six dimensions:
Historical Data Availability
Future Data Availability
Data Quality
Statutory Context
Staffing and Funding
Policy
Each dimension is assessed and assigned a status that communicates its level of risk:
Gone
High Risk
Moderate Risk
No Known Issue
Together, these assessments provide a more complete picture of dataset health than availability checks alone.
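To make the structure concrete, here is a minimal sketch in Python of how a checkup might be represented as data: one status per dimension for each dataset, plus notes explaining each rating. The names and the example record are ours for illustration and are not taken from the dataindex.us codebase.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    HISTORICAL_AVAILABILITY = "Historical Data Availability"
    FUTURE_AVAILABILITY = "Future Data Availability"
    DATA_QUALITY = "Data Quality"
    STATUTORY_CONTEXT = "Statutory Context"
    STAFFING_AND_FUNDING = "Staffing and Funding"
    POLICY = "Policy"


class Status(Enum):
    GONE = "Gone"
    HIGH_RISK = "High Risk"
    MODERATE_RISK = "Moderate Risk"
    NO_KNOWN_ISSUE = "No Known Issue"


@dataclass
class DataCheckup:
    """One checkup: a dataset plus a status (and rationale) per dimension."""
    dataset: str
    assessments: dict[Dimension, Status]
    notes: dict[Dimension, str] = field(default_factory=dict)  # citations / explanations


# Hypothetical checkup record, purely for illustration
checkup = DataCheckup(
    dataset="Example Federal Survey",
    assessments={
        Dimension.HISTORICAL_AVAILABILITY: Status.NO_KNOWN_ISSUE,
        Dimension.FUTURE_AVAILABILITY: Status.MODERATE_RISK,
        Dimension.DATA_QUALITY: Status.NO_KNOWN_ISSUE,
        Dimension.STATUTORY_CONTEXT: Status.NO_KNOWN_ISSUE,
        Dimension.STAFFING_AND_FUNDING: Status.HIGH_RISK,
        Dimension.POLICY: Status.MODERATE_RISK,
    },
)
```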
The Data Checkup is designed to serve the needs of both data users and data advocates. It supports a wide range of use cases, including academic research, policy decision-making, journalism, advocacy, and litigation.
Different users will have different tolerances for data risk. A researcher may be most concerned about changes to methodology, while a litigator may be most concerned that a statutory publication deadline was missed. With multiple dimensions of risk assessed, the Data Checkup gives users the flexibility to focus on the risks most relevant to their work.
By surfacing risks early, before data is lost, degraded, or unusable, the Data Checkup helps data users and advocates make informed decisions about using, maintaining, and protecting the federal datasets they rely on.
The Data Checkup Framework
Let’s take a closer look at the Data Checkup framework, exploring each dimension and the risk levels it can be assigned.
Historical Data Availability
Assessment of the availability of existing (historical) data and resources associated with the data collection.
| Status | Description |
| --- | --- |
| Gone | Historical data files are no longer publicly available. |
| High Risk | Some historical data files have been removed. |
| Moderate Risk | Some historical data elements have been removed. |
| No Known Issue | Historical data remain accessible with no known alterations. |
Future Data Availability
Assessment of whether continued data collection and publication are likely and on schedule.
| Status | Description |
| --- | --- |
| Gone | Data collection and publication have been terminated. |
| High Risk | Statutory publication deadline missed, collection/publication skipped, and/or ICR (Information Collection Request) expired for more than one year. |
| Moderate Risk | Typical or intended publication date missed and/or collection/publication delayed; ICR expired up to one year. |
| No Known Issue | Data published on time or as expected, and ICR active or renewed before expiration. |
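As a rough sketch of how the ICR portion of this rubric could be applied (our own reading of the table above, covering only the expiration criterion, not missed publication deadlines):

```python
from datetime import date


def icr_status(expiration: date, today: date) -> str:
    """Map an ICR expiration date to a Future Data Availability status,
    following the rubric above: active ICR -> No Known Issue; expired up
    to one year -> Moderate Risk; expired more than one year -> High Risk."""
    if expiration >= today:
        return "No Known Issue"
    days_expired = (today - expiration).days
    return "Moderate Risk" if days_expired <= 365 else "High Risk"


# Hypothetical dates, purely for illustration
print(icr_status(date(2025, 3, 1), date(2025, 9, 1)))  # Moderate Risk
print(icr_status(date(2024, 1, 1), date(2025, 9, 1)))  # High Risk
```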
Data Quality
Assessment of whether continued data collection is of similar quality to historical data.
| Status | Description |
| --- | --- |
| Gone | Data collection and publication have been terminated. |
| High Risk | Reductions in granularity, timeliness, or frequency. |
| Moderate Risk | Potential or emerging risk to granularity, timeliness, or frequency. |
| No Known Issue | Maintained or improved granularity, timeliness, or frequency. |
Statutory Context
Assessment of the strength of statutory requirements and authorization, as well as program reliance on continued data collection and publication.
| Status | Description |
| --- | --- |
| High Risk | Statutory authorization is vague, and/or there are alternative data collections that could serve as substitutes, and/or there is no known programmatic use. |
| Moderate Risk | Not explicitly required by statute, but required for the implementation of a state or federal program, and there are no alternative data collections that could serve as substitutes. |
| No Known Issue | Statutorily required and/or statutory authorization explicitly names the collection and makes clear what must be collected, and there are no alternative data collections that could serve as substitutes, and/or required for implementation of a federal program. |
Staffing and Funding
Assessment of whether the agency has sufficient staff, budget, and leadership stability to sustain data collection and publication.
| Status | Description |
| --- | --- |
| Gone | All of the staff in the division or agency are gone and/or all funding has been terminated. |
| High Risk | 40% or more of staff lost, and/or 1,000 or more staff lost, and/or budget cut by 20% or more, and/or leadership removed. |
| Moderate Risk | 10-39% of staff lost, and/or 500-999 staff lost, and/or budget cut by 10-19%, and/or threatened change in leadership. |
| No Known Issue | Less than 10% of staff lost, less than 10% of budget cut, and no known change in leadership. |
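The staffing and budget thresholds in this table can be read as a simple classification rule. The sketch below is our own illustration of that reading (it does not model the leadership criteria and is not how assessments are actually produced):

```python
def staffing_funding_status(pct_staff_lost: float, staff_lost: int, pct_budget_cut: float) -> str:
    """Classify Staffing and Funding risk from the thresholds in the table above.
    Leadership changes are not modeled in this sketch."""
    if pct_staff_lost >= 100 or pct_budget_cut >= 100:
        return "Gone"
    if pct_staff_lost >= 40 or staff_lost >= 1000 or pct_budget_cut >= 20:
        return "High Risk"
    if pct_staff_lost >= 10 or staff_lost >= 500 or pct_budget_cut >= 10:
        return "Moderate Risk"
    return "No Known Issue"


# Hypothetical inputs, purely for illustration
print(staffing_funding_status(pct_staff_lost=12, staff_lost=300, pct_budget_cut=5))  # Moderate Risk
```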
Policy
Assessment of whether specific policy actions have been or are being taken to alter or discontinue data collection and publication.
| Status | Description |
| --- | --- |
| Gone | Data collection and publication have been terminated. |
| High Risk | Presidential Action-driven ICR; negative policy note on the site; other significant changes in accordance with Administration priorities. |
| Moderate Risk | Proposed or pending changes; statements by administration officials suggesting a change is being considered or planned. |
| No Known Issue | No notable changes since January 2025 affecting what data is collected and published. |
Putting the Framework into Practice
Here you can see the Data Checkup framework applied to a subset of datasets. At a high level, it provides a snapshot of dataset wellbeing, allowing you to quickly identify which datasets are facing risks.

Zooming in on a specific dataset reveals additional information, including citations and explanations for why each dimension was assigned its risk level. This detailed view helps users understand the reasoning behind the assessment.

What’s Next for the Data Checkup
This launch marks the first version of the Data Checkup. We expect the framework to go through multiple iterations as the risk landscape changes.
As the tool evolves, we will publish new assessments, expand the coverage of monitored datasets, and automate assessments where possible. Data Checkups will be updated quarterly, with additional ad hoc updates in response to major developments.
We welcome your feedback. If there is a dataset you would like to see monitored, or if you have expertise on the health of a dataset, we’d love to hear from you.
Together, we hope the Data Checkup serves as a shared resource for monitoring the health of federal datasets.