Hitting the Books: America needs a new public data system

We've got to crunch the numbers to form a more perfect union.

Former Senior Editor

Sat, Jul 25, 2020, 11:00 AM·8 min read

Earlier this month the Trump administration stripped the CDC of its control over the nation’s coronavirus data. By insisting that all case reporting be funneled through the White House, the administration further undermined public trust in its pandemic response and tainted any future release of information with the prospect of having been politicized. But incidents like this are symptomatic of a deeper problem, Julia Lane, a professor of public policy at NYU, explains in her new book, Democratizing Our Data: A Manifesto. She argues that the steady decline in the quality of government-produced data that we’ve seen in recent years is a threat not just to our information-based economy but to the very foundations of our democracy itself.

In the excerpt below, Lane illustrates the challenges that government employees face when given incomplete or biased data and still expected to do their duties, as well as the enormous benefits we can reap when data is effectively and ethically leveraged for the public good. Democratizing Our Data is already available on Amazon Kindle and will be for sale in print on September 1st.

Nowadays when people have an appointment to go to across town, their calendar app obligingly predicts how long it’s going to take to get there. When they go to Amazon to research books that might be of interest, Amazon makes helpful suggestions—and asks for feedback on how to make its platform better. If they select photos from Google Photos, it suggests people to send them to, prompts with other photos it thinks are like the ones selected, and warns if the zip file is going to be especially big. Our apps today are aware of multiple dimensions of the data they manage for us, they update that information in real time, and suggest options and possibilities based upon those dimensions. In other words, the private sector sets itself up for success because it uses data to provide us with useful products and services.

The government, not so much. Lack of data makes Joe Salvo’s job much more difficult. He is New York City’s chief demographer, and he uses the Census Bureau’s American Community Survey (ACS) data to prepare for emergencies like Hurricane Sandy. He needs to use data to decide how to get older residents to physically accessible shelters: operationally, where to tell a fleet of fifty buses to go to pick up and evacuate seniors. He needs data on the characteristics of the local population for the Mayor’s Office for People with Disabilities. He needs to identify areas with large senior populations to tell the Metropolitan Transportation Authority where to send buses. He needs to identify neighborhoods with significant vulnerable populations so that the Department of Health and Mental Hygiene can install emergency generators at Department of Health facilities. But the products produced by the federal statistical system do not provide him with the value that he needs. The most current data from the prime source about the US population, the ACS, is released two years after collection, and that itself reflects five-year moving averages.
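To see why a five-year moving average is a problem for emergency planning, consider a minimal Python sketch. It assumes a simple unweighted trailing mean, which is a simplification and not the Census Bureau's actual estimation method, and the population figures are invented for illustration. The function name `trailing_average` is likewise hypothetical.

```python
def trailing_average(values, window=5):
    """Return the mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical senior population of one neighborhood, by collection year.
# These numbers are made up for illustration; they are not ACS figures.
years = list(range(2010, 2020))
seniors = [1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000]

averaged = trailing_average(seniors)

# Each estimate is labeled with the last year in its five-year window.
for end_year, estimate in zip(years[4:], averaged):
    print(f"window ending {end_year}: {estimate:.0f}")
```

In this toy series the population doubles in 2015, but the averaged estimate does not reach 2,000 until the window ending in 2019. A two-year release delay on top of that would push the full picture back further still.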

Creating value for the consumer is key to success in the private sector. The challenge to statistical agencies is figuring out how to get set up for success and produce high-quality data as measured against the same checklist by providing access to data while at the same time protecting privacy and confidentiality.

The problem is that the checklist for agencies is even longer with additional requirements so that Joe Salvo and his counterparts can do their jobs better. One requirement, given that the United States is a democracy, is that statistics should be as unbiased as possible—so that all residents, whatever their characteristics, are counted and that they are treated equally in measurement. Correcting for the inevitable bias in source data is an important role for statistical agencies. Another requirement is that collecting the data is cost-effective, so that the taxpayer gets a good deal. A third requirement is that the information collected is consistent over time so that trends can easily be spotted and responded to. Agencies need outside help from both stakeholders and experts to ensure all these requirements are met. That requires access to data, which requires dealing with confidentiality issues.

The value that is generated when governmental agencies can straightforwardly provide access and produce new measures can be great. For example, the same people who bring you the National Weather Service and its weather predictions, the National Oceanic and Atmospheric Administration (NOAA), have provided scientists and entrepreneurs with access to data to develop new products, such as predicting forest fires and providing real-time intelligence services for natural disasters in the United States and Canada. City transit agencies share transit data with private-sector app developers who produce high-quality apps that offer real-time maps of bus locations, expected arrival times at bus stops, and more.

But other cases, when the government has confidential data, which is the case for most statistical agencies, are different. We need to be able to rely on our government to keep some data very private, but that will often mean that we have to give up on the granularity of government data that are produced. If, for example, the IRS provided so much information about taxpayers that it was possible to know how much money a given individual made, the public would be outraged.

So many government agencies have to worry about two things: (1) producing data that have value and (2) at the same time ensuring that the confidentiality of data owners is protected. This can be done. Some—smaller—governments have succeeded better than others in creating data systems that live up to the checklist of the desired features while at the same time protecting privacy.

Take the child services system as an example. To put child services in context, almost four in ten US children will be referred to their local government for possible child abuse or neglect by the time they’re eighteen. That’s almost four million referrals a year. Frontline caseworkers have to make quick decisions on these referrals. If they are wrong in either direction, the potential downside is enormous: Children incorrectly screened because of inadequate or inaccurate data could be ripped away from loving families. Or, conversely, also as a result of poor data, children could be left with abusive families and die. Furthermore, there could be bias in decisions, leaving black or LGBTQ parents more likely to be penalized, for example.

In 2014, Allegheny County’s Office of Children, Youth and Families (CYF) in Pennsylvania stepped up to the plate to use its internal data in a careful and ethical manner to help caseworkers do their job better. The results have captured national attention, as reported in a New York Times Magazine article. CYF brought in academic experts to design an automatic risk-scoring tool that summarizes information about a family to help the caseworker make better decisions. The risk score, a number between 1 and 20, makes use of a great deal of the information about the family in the county’s system, such as child welfare records, jail records, and behavioral health records, to predict adverse events that can lead to placing a child in foster care.

An analysis of the effectiveness of that tool showed that a child whose placement score at referral is the highest possible—20—is twenty-one times more likely to be admitted to a hospital for a self-inflicted injury, seventeen times more likely to be admitted for being physically assaulted, and 1.4 times more likely to be admitted for suffering from an accidental fall than a child with a risk score of 1, the lowest possible. An independent evaluation found that caseworker decisions that were informed by the score were more accurate (cases were more likely to be correctly identified as needing help and less likely to be incorrectly identified as not needing help), case workloads decreased, and racial bias was likely to be reduced. On the eight-item checklist Allegheny County hit on all items. They produced a new product that was used, was cost effective, and produced real-time, accurate, complete, relevant, accessible, interpretable, granular, and consistent data. And CYF didn’t breach confidentiality. But most importantly, Allegheny County worked carefully and openly with advocates for parents, children, and civil rights to ensure that the program was not built behind closed doors. They worked, in other words, to ensure that the new measures were democratically developed and used.

The Allegheny County story is one illustration of how new technologies can be used to democratize the decision of how to balance the ever-present tradeoff between the utility of new measurement and the risk of compromising confidentiality. They took advantage of the potential to create useful information that people and policy makers need while at the same time protecting privacy. That potential can be made real in other contexts by making the value of data clearer to the public. While that utility/cost tradeoff has typically been made by a small group of experts within an agency, there are many new tools that can democratize the decision by providing more information to the public. This chapter goes into more detail about the challenges of and new approaches to the utility/cost tradeoff. There are many lessons to be learned from past experiences.
