By Jordan McNea
As Covid-19, the disease caused by the novel coronavirus, has put half the world under some sort of lockdown, infected over a million people, and caused financial ruin, people in power have grossly misrepresented the data provided by epidemiologists and data scientists every step of the way. Whether through malicious intent or pure ignorance, many people in positions of power misled the public during the important early stages of the outbreak, when the damage from this pandemic could still have been largely contained. Effective data communication is the hardest part of any data scientist’s job, but that job usually just entails getting managerial support from a limited number of higher-ups. In contrast, getting hundreds of millions, if not billions, of people to understand the severity of this virus and the importance of social distancing, washing hands, and restricting travel (all important elements in flattening the curve) requires even more attention to communicating data. This blog aims to demonstrate ways in which data have been misinterpreted and to explain why those misinterpretations are wrong and, frankly, dangerous to millions of at-risk lives.
For well over 100 years statisticians have suffered from an inability to present their analyses in such a way that it is understood and implemented by non-statisticians: laymen and people in charge. Consulting engineer Willard Brinton describes this problem in his 1914 book Graphic Methods for Presenting Facts when he writes,
“Time after time it happens that some ignorant or presumptuous member of a committee or a board of directors will upset the carefully-thought-out plan of a man who knows the facts, simply because the man with the facts cannot present his facts readily enough to overcome the opposition.”
The intervening decades saw pioneers in the field of data visualization such as Edward Tufte, and the development of visualization tools like Matplotlib in Python and ggplot2 in R. Despite these advances, data scientists still have trouble communicating complex analysis in a way that is both easy to understand and accurately conveys its scientific rigor. Scott Berinato’s January 2019 article, “Data Science and the Art of Persuasion,” refers to this as the last-mile problem, in that visualization and communication are generally the last steps in an analytical project. According to Berinato, the last-mile problem arises from three root causes, each of which can lead to data being misunderstood.
The first is the “Statistician’s Curse,” which gets its name from statisticians spending less time on visual communication and using language that is confusing to those without a stats-heavy background. Next is the “Factory and the Foreman,” where a person in a position of power has an idea or project that they want to push through, so the “foreman” asks the data scientists to create charts and analysis that back up the desired viewpoint. Last is the “Convenient Truth,” where the information design team builds charts that oversimplify complex ideas, which leads decision-makers to draw the wrong conclusions from the analysis.
All three of these have come into play as the coronavirus has engulfed the world.
Factory and the Foreman
As straightforward as reporting confirmed cases and deaths of coronavirus seems like it should be, some common misconceptions have shaped the conversation around the pandemic. An overreliance on the numbers without acknowledgement of their flaws (including underrepresentation of the true number of cases) has caused people to falsely believe that the outbreak is not as serious as it is.
In the days leading up to March 11th, when Utah Jazz center Rudy Gobert tested positive for Covid-19 and the entire sports world shut down, sports leagues were combative toward city ordinances that restricted large public gatherings. As public health officials urged teams like the Golden State Warriors not to allow fans to attend games, the Warriors initially refused to comply. The team played a game against the Los Angeles Clippers with fans in attendance on March 10th before agreeing to play in an empty arena starting on March 12th, while citing how this would lead to a multi-million dollar loss. However, they never got the chance to play in the empty arena, as the league was suspended the night before.
At the time, there were only 1,080 confirmed cases and 31 deaths in the United States, but also a serious lack of testing capabilities. On the night of Gobert’s positive test, the NBA was able to procure 58 tests from a private company; the state of Oklahoma, where the Jazz were slated to play, had a testing capacity of only 100 tests per day.
During this time, doctors such as Ashish Jha, the Director of the Harvard Global Health Institute, were saying things like, “without testing, you have no idea how extensive the infection is. You can’t isolate people. You can’t do anything.” Leaders were more worried about their profits than the human toll; they relied on confirmed cases during these early stages rather than on the insight of epidemiologists, who were trying to emphasize that the seriously low testing numbers meant that conclusions based on confirmed cases were likely misleading. This example is a spin on the “Factory and the Foreman” issue, and it likely set us back weeks in our fight to contain the pandemic.
Another way in which people have misinterpreted the data is by not understanding how exponential growth works. Since the tenth reported death of coronavirus in the United States, deaths have steadily doubled every three days. This exponential growth means that taking a snapshot in time and making judgments solely based on that number—without factoring in how it’ll be completely outdated within days—can lead to terrible decision-making.
On March 26th, former New York City mayor and attorney to President Trump, Rudy Giuliani, tweeted, “Approximately, 7,500 people die every day in the United States. That’s approximately 645,000 people so far this year. Coronavirus has killed about 1,000 Americans this year. Just a little perspective.”
Within days that number of deaths had soared to well above 3,000, the same number of people who perished during the 9/11 terrorist attacks. Giuliani’s client, President Trump, stated at the time, as a goal, that between 100,000 and 240,000 people would die, about the same number of American soldiers killed in battle during World War II.
In a four-day span, the number of possible deaths went from being downplayed to being compared to the toll of a war. This is a clear example of “Convenient Truth,” where someone in a position of power cherry-picked a statistic and came to their own, wrong conclusion. When faced with an exponential growth curve, the lesson is not to take a number and run with it, but to understand that until the curve begins to level off, things are going to change fast.
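The arithmetic behind that lesson can be sketched in a few lines of Python. This is an idealized illustration, not a forecast: it assumes the three-day doubling time mentioned above stays constant, which real outbreaks do not do. The function name `projected_count` and the starting value of 1,000 (the figure quoted in the tweet) are chosen only for illustration.

```python
def projected_count(initial: float, days: float, doubling_time_days: float = 3.0) -> float:
    """Idealized exponential growth: the count doubles every `doubling_time_days`.

    Assumption: the doubling time is constant. Real epidemics slow down as
    behavior changes, so this only shows how quickly a snapshot goes stale.
    """
    return initial * 2 ** (days / doubling_time_days)


# Start from a snapshot of 1,000 and look ahead in three-day steps.
for day in range(0, 16, 3):
    print(f"day {day:2d}: about {projected_count(1000, day):,.0f}")
# day  0: about 1,000
# day  3: about 2,000
# day  6: about 4,000
# day  9: about 8,000
# day 12: about 16,000
# day 15: about 32,000
```

Under this simplified assumption, a figure that looks small on the day it is quoted is 32 times larger about two weeks later, which is why a single snapshot is a poor basis for comparison with a steady baseline such as daily deaths from all causes.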
Due to its long incubation period, many asymptomatic carriers, and lack of historical data, creating models to estimate coronavirus death tolls and total cases has proven extremely difficult and requires making assumptions.
The most influential assumption is how seriously the public is following federal guidelines to stay at home and to commit to social distancing. If, as the White House Coronavirus Task Force believes, all federal guidelines are followed closely, then models predict that Covid-19 will kill between 100,000 and 240,000 Americans. On the other hand, experts who believe the guidelines are not being followed closely enough, and that those who are following them will slowly lessen their commitment, predict anywhere between 263,000 and 1.7 million deaths.
That’s a huge difference in estimates, but there’s a possibility that neither is wrong. If President Trump had rejected the advice of epidemiologists and opened up the country on Easter as he had initially planned, then the likelihood of seeing over a million deaths would be much greater. However, Trump eventually backed away from his Easter wish, and as a result, estimated deaths were revised downward.
As it appears that, for the most part, people are taking quarantining seriously, and there’s some evidence that reported cases are starting to plateau in the places that are hardest hit, death estimates have dropped even further. This decline should be seen as a sliver of good news during bad times. However, because the statisticians developing these models are struggling to explain, in a way everyone understands, why their estimates are changing, the public has misinterpreted the change in predictions as proof that these data scientists and epidemiologists don’t know what they’re doing. This inability of statisticians to clearly explain the models is a classic example of the “Statistician’s Curse.”
While two of the doctors heading up the White House Coronavirus Task Force, Deborah Birx and Anthony Fauci, who have over 70 years of medical experience combined, have tried to explain how success in social distancing has caused the model estimates to decrease, members of the media have passed on misinformation, whether through ignorance or ill intent. One such example came on April 6th, when conservative radio host Bill Mitchell tweeted out to his over 500,000 followers, “People like [Bill] Gates are now calling 200,000 dead a ‘worst case scenario.’ Last week, Birx called it a ‘best-case scenario.’ When will the panic merchants ever be held accountable for their lies?” Attempts like this to smear the names of some of the nation’s leading doctors on infectious disease will likely sow doubt among segments of the public and entice people to break quarantine in the belief that the pandemic is not as serious as it is.
As epidemiologists attempt to guide the public through this difficult time in world history, the last-mile problem of effectively communicating their findings has proven to be a challenge. Whether because people want to believe flawed numbers, pick convenient truths, or misinterpret model predictions, misuse of data and breakdowns in communication have happened every step of the way. This blog provided examples of powerful people who cherry-picked numbers to support the outcome they wanted, who did not understand how exponential growth works, and who attacked the credibility of doctors because they did not understand how models change as inputs and assumptions change. As this pandemic will likely affect our lives until a vaccine is made, let’s hope that scientists can better explain their data and that the public, the media, and government officials improve their ability to understand complicated results accurately.
Boice, Jay. “Best-Case And Worst-Case Coronavirus Forecasts Are Very Far Apart.” FiveThirtyEight, FiveThirtyEight, 2 Apr. 2020, fivethirtyeight.com/features/best-case-and-worst-case-coronavirus-forecasts-are-very-far-apart/.
Brinton, Willard Cope. Graphic Methods for Presenting Facts. Nabu Press, 2010.
Bump, Philip. “Analysis | Here’s How Many Americans Have Already Died to Defeat the Nazis and the Confederacy.” The Washington Post, WP Company, 14 Aug. 2017, http://www.washingtonpost.com/news/politics/wp/2017/08/14/heres-how-many-americans-have-already-died-to-defeat-the-nazis-and-the-confederacy/.
Canales, Katie. “San Francisco Just Banned All Large Gatherings Exceeding 1,000 People, the Latest in a Series of Restrictions to Stem the Coronavirus Outbreak.” Business Insider, Business Insider, 11 Mar. 2020, http://www.businessinsider.com/san-francisco-bans-large-gatherings-1000-people-2020-3#the-world-health-organization-declared-the-disease-a-pandemic-earlier-on-wednesday-1.
“Coronavirus Live Updates: Global Cases Exceed 1 Million.” The New York Times, The New York Times, 3 Apr. 2020, http://www.nytimes.com/2020/04/03/world/coronavirus-news-updates.html.
Grief, Andrew. “Warriors Fans Prepare for ‘Bizarre’ Home Game with Empty Arena.” Los Angeles Times, Los Angeles Times, 12 Mar. 2020, http://www.latimes.com/sports/story/2020-03-11/warriors-fans-prepare-for-bizarre-home-game-with-empty-arena.
Jhaveri, Hemal. “The NBA’s Access to Coronavirus Testing Helped the League Protect Players. The Rest of Us Deserve the Same.” USA Today, Gannett Satellite Information Network, 12 Mar. 2020, ftw.usatoday.com/2020/03/coronavirus-testing-kits-utah-jazz-rudy-gobert-nba-season-postponed.
Kliff, Sarah, and Julie Bosman. “Official Counts Understate the U.S. Coronavirus Death Toll.” The New York Times, The New York Times, 5 Apr. 2020, http://www.nytimes.com/2020/04/05/us/coronavirus-deaths-undercount.html.
Liptak, Kevin, et al. “Trump Says He Wants the Country ‘Opened up and Just Raring to Go by Easter,’ despite Health Experts’ Warnings.” CNN, Cable News Network, 25 Mar. 2020, http://www.cnn.com/2020/03/24/politics/trump-easter-economy-coronavirus/index.html.
Lovelace, Berkeley, and Dan Mangan. “White House Predicts 100,000 to 240,000 Will Die in US from Coronavirus.” CNBC, CNBC, 1 Apr. 2020, http://www.cnbc.com/2020/03/31/trump-says-the-coronavirus-surge-is-coming-its-going-to-be-a-very-very-painful-two-weeks.html.
“New York City Will Only Test Hospitalized Patients for Coronavirus.” NBC New York, NBC New York, 21 Mar. 2020, http://www.nbcnewyork.com/news/new-york-city-will-only-test-hospitalized-patients-for-coronavirus/2338096/.
Valenzuela, Sarah. “Utah Jazz Used More than Half of Oklahoma’s Daily Coronavirus Testing Capacity.” Nydailynews.com, New York Daily News, 14 Mar. 2020, http://www.nydailynews.com/sports/basketball/ny-utah-jazz-oklahoma-coronavirus-20200314-iroayd23q5htrcece466ghiate-story.html.