How to Lie with Statistics

"There are three kinds of lies: lies, damned lies, and statistics." - Disraeli

Darrell Huff's book "How to Lie with Statistics", first published in 1954, has an attention-grabbing title, but it could just as well have been called "A Guide to Spotting Statistical Trickery".

This book exposes common statistical manipulations used to mislead audiences. Across its ten chapters, the author explains in a light-hearted way how to critically evaluate statistical claims and avoid being deceived by distorted data.

In this age of both human- and AI-driven misinformation, this book is more relevant than ever.

Chapter 1: The Sample with the Built-in Bias

Trick: Presenting data based on a biased sample that doesn't accurately represent the population it claims to reflect. For example, a survey might use a sample of Yale graduates with known addresses to estimate the average income of all Yale graduates. This method leaves out the graduates who could not be traced, who may not be as financially successful, resulting in an inflated average.

How to Tackle It:

  • Scrutinise the sample selection process for sampling bias. Look for any factors that might have skewed the sample, such as excluding certain groups or relying on self-reported data that could be biased.
  • Consider the source's motivations. Who conducted the study and what might they have to gain from presenting the data in a particular light?
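
To see the effect in numbers, here is a minimal Python sketch. The incomes and the "known address" flags are invented for illustration and are not figures from the book; the only point is that averaging the graduates a survey can reach may overstate the average for everyone.

```python
from statistics import mean

# Hypothetical graduates: (annual income, has a known address on file).
# All figures are invented for illustration.
graduates = [
    (12_000, False),
    (15_000, False),
    (18_000, True),
    (20_000, False),
    (25_000, True),
    (40_000, True),
    (90_000, True),
]


def average_income(records):
    """Mean income over a list of (income, reachable) records."""
    return mean(income for income, _ in records)


# The whole population versus only those the survey can reach.
true_average = average_income(graduates)
surveyed = [g for g in graduates if g[1]]
surveyed_average = average_income(surveyed)

print(f"All graduates:      {true_average:,.0f}")
print(f"Reachable sample:   {surveyed_average:,.0f}")
```

In this made-up data the harder-to-reach graduates happen to earn less, so the surveyed average comes out well above the true one. Real surveys can be skewed in either direction, which is why the selection process matters.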

Chapter 2: The Well-Chosen Average

Trick: Using the word "average" without specifying the type of average: mean, median, or mode. This ambiguity allows manipulators to choose the average that best suits their agenda, even if it's not the most representative measure. For instance, reporting the mean wage of employees can hide income disparities if a few high earners skew the results.

How to Tackle It:

  • Always clarify the type of average used. If it's not specified, be sceptical and consider whether the chosen average might obscure the true distribution of data.
  • Request additional information about the data spread. Ask about the range, distribution, and median to gain a more comprehensive understanding of the numbers.
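
The gap between the different averages is easy to check for yourself. The sketch below uses Python's standard `statistics` module on a hypothetical wage list (the figures are invented); the `describe` helper is just an illustrative name.

```python
from statistics import mean, median, mode

# Hypothetical annual wages at a small firm (invented figures).
# One very high earner pulls the mean far above what most staff earn.
wages = [25_000, 25_000, 30_000, 32_000, 35_000, 40_000, 250_000]


def describe(values):
    """Return the three common 'averages' plus the range of the data."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": (min(values), max(values)),
    }


summary = describe(wages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

All three numbers can honestly be called "the average wage", yet here the mean is roughly twice the median. Reporting the range alongside them makes the skew visible.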

Chapter 3: The Little Figures That Are Not There

Trick: Omitting crucial information that could impact the interpretation of the data. This includes neglecting to mention the sample size, margin of error, or the range of values. For example, a report might cite an "average" family size of 3.6 persons without acknowledging the wide range of family sizes in reality.

You can freeze or roast if you ignore the range

How to Tackle It:

  • Always look for the sample size and measures of significance. A small sample size or a lack of statistical significance can render the results unreliable.
  • Consider the range of values. Focus not just on the average but also on the spread of the data to understand the variability within the sample.

Chapter 4: Much Ado About Practically Nothing

Trick: Highlighting statistically significant differences that are so small they have little practical relevance. This often involves exaggerating the importance of minor variations to support a specific claim. The case of Old Gold cigarettes claiming to have the least amount of harmful substances, even though the differences were negligible, exemplifies this tactic.

How to Tackle It:

  • Assess the practical significance of the difference. Don't be swayed by statistically significant findings if the differences are too small to matter in real-world applications.
  • Consider the context and potential biases. Is the emphasis on small differences driven by a particular agenda or marketing strategy?
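
A rough way to separate "statistically significant" from "large enough to matter" is to compute a test statistic and an effect size side by side. The sketch below is only an illustration under simplifying assumptions: two equally sized groups, a shared standard deviation, invented measurements, and the conventional rule-of-thumb thresholds (1.96 for a 5% two-sided test, 0.2 for a "small" Cohen's d). The function names are hypothetical.

```python
import math


def z_statistic(mean_a, mean_b, sd, n):
    """Two-sample z statistic for equal group sizes and a shared standard deviation."""
    standard_error = math.sqrt(sd**2 / n + sd**2 / n)
    return (mean_b - mean_a) / standard_error


def effect_size(mean_a, mean_b, sd):
    """Cohen's d: the difference in means measured in standard deviations."""
    return (mean_b - mean_a) / sd


# Hypothetical measurements (invented): two brands differ by 0.2 units
# on a scale where individual items vary with a standard deviation of 5.
z = z_statistic(100.0, 100.2, sd=5.0, n=100_000)
d = effect_size(100.0, 100.2, sd=5.0)

print(f"z = {z:.2f}")  # well beyond 1.96, the usual 5% two-sided cutoff
print(f"d = {d:.2f}")  # far below 0.2, the rule of thumb for a 'small' effect
```

With a large enough sample, almost any difference clears the significance bar, so the effect size is the number to look at when judging whether the difference matters.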

Chapter 5: The Gee-Whiz Graph

Trick: Manipulating graphs to create a misleading visual impression. This frequently involves truncating the graph by removing the zero line, which makes a small increase appear much larger. For instance, Newsweek ran a graph under the headline "Stocks Hit a 21-Year High" in which the vertical axis was truncated at the 80 mark.

The same data can be represented in different ways to suit the agenda

How to Tackle It:

  • Always check for a zero line on graphs. If it's missing, the visual representation of the data could be exaggerated.
  • Pay attention to the scale and proportions of the graph. Look for distortions in the axes that might create a misleading impression of the trend.
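
How much a truncated axis exaggerates a change can be worked out directly. In this sketch, `apparent_ratio` is a hypothetical helper and the values are invented; it compares how tall the two bars are drawn when the axis starts at zero versus at some higher baseline.

```python
def apparent_ratio(old, new, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if baseline >= min(old, new):
        raise ValueError("baseline must lie below both values")
    return (new - baseline) / (old - baseline)


# Hypothetical values (invented): a rise from 20 to 22, i.e. 10%.
honest = apparent_ratio(20, 22)                  # axis starts at zero
truncated = apparent_ratio(20, 22, baseline=18)  # axis starts at 18

print(f"Axis from 0:  new bar looks {honest:.2f}x as tall")
print(f"Axis from 18: new bar looks {truncated:.2f}x as tall")
```

The underlying increase is the same 10% in both cases; only the starting point of the axis changes how dramatic it looks.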

Chapter 6: The One-Dimensional Picture

Trick: Using images to represent data in a way that distorts the actual numerical relationships. This often involves changing the size of images disproportionately to exaggerate differences. For example, wages might be depicted using moneybags, where one is twice the height and width of the other to represent a two-to-one ratio. The larger bag then covers four times the area, creating a misleading visual impression of a four-to-one difference.

One of the furnaces is made to look about three times as big as the other though the numbers don't have that proportion

How to Tackle It:

  • Be wary of images used to represent numerical data. Consider whether the visual representation accurately reflects the underlying numbers.
  • Focus on the actual numerical values. Don't be swayed by the size or prominence of images; instead, pay attention to the accompanying figures.
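
The arithmetic behind the moneybag trick is short enough to sketch. The function names below are hypothetical; the code simply shows what happens when both height and width are scaled by the same factor, and what scale factor would keep the drawn area proportional to the value instead.

```python
import math


def visual_impression(linear_scale):
    """Area ratio of the drawn picture, and the volume ratio a reader may
    infer if the picture is read as a solid object scaled in every dimension."""
    return linear_scale**2, linear_scale**3


def honest_linear_scale(value_ratio):
    """Scale factor for height and width so that the AREA matches the value ratio."""
    return math.sqrt(value_ratio)


area_ratio, volume_ratio = visual_impression(2)
print(f"Doubling height and width: {area_ratio}x the area, {volume_ratio}x the implied volume")
print(f"To show a true 2:1 ratio by area, scale each side by {honest_linear_scale(2):.2f}")
```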

Chapter 7: The Semiattached Figure

Trick: Presenting accurate data that is irrelevant to the claim being made. This involves linking a statistic to a conclusion that it doesn't actually support. For instance, an advertisement might claim an antiseptic is effective because it kills germs in a test tube, without proving its effectiveness in the human body.

How to Tackle It:

  • Assess the relevance of the data to the claim. Does the statistic genuinely support the conclusion being drawn?
  • Scrutinise percentages and their bases. Are percentages presented in a way that misrepresents the actual situation or uses a misleading baseline?
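
One quick illustration of why the base of a percentage matters: a 50% cut followed by a 50% rise does not bring you back to where you started, because the second percentage is taken from a smaller base. The starting wage of 100 and the helper name are assumptions made purely for this sketch.

```python
def apply_percent_change(value, percent):
    """Apply a percentage change measured against the current value."""
    return value * (1 + percent / 100)


wage = 100.0                                       # hypothetical starting wage
after_cut = apply_percent_change(wage, -50)        # 50% of 100 is removed
after_raise = apply_percent_change(after_cut, 50)  # 50% of the reduced wage is added

print(f"Start: {wage}, after cut: {after_cut}, after raise: {after_raise}")
```

The final wage is 75, not 100, so a claim that the cut was "restored" would depend on quietly switching the base.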

Chapter 8: Post Hoc Rides Again

Trick: Confusing correlation with causation. This involves assuming that because two things are related, one must cause the other. For example, a study might report that cigarette smokers have lower college grades and conclude that smoking causes poor grades, without considering other potential factors like personality traits or study habits.

How to Tackle It:

  • Remember that correlation does not imply causation. Just because two variables are related doesn't mean one causes the other.
  • Consider alternative explanations and the direction of causality. Explore other factors that might contribute to the correlation and whether the cause-and-effect relationship might be reversed or more complex.

A related pitfall is data dredging: searching through data for correlations and reporting the ones that turn up, without acknowledging that they may in fact be the result of chance.
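
The risk from data dredging can be estimated with a one-line formula. Assuming the comparisons are independent and each uses the conventional 5% significance level (both are simplifying assumptions, and the function name is hypothetical), the chance that at least one comparison looks "significant" purely by chance is 1 - (1 - 0.05)^k for k comparisons.

```python
def chance_of_spurious_hit(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests on pure
    noise comes out 'significant' at level `alpha`."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    print(f"{k:>3} comparisons: {chance_of_spurious_hit(k):.0%} chance of a false positive")
```

Under these assumptions, checking twenty unrelated variables already gives a better than even chance of finding a "significant" correlation somewhere, even when nothing real is going on.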

Chapter 9: How to Statisticulate

Trick: Using statistical techniques to intentionally mislead the audience. This can involve manipulating data, choosing biased samples, or employing deceptive visuals. 

How to Tackle It:

  • Be sceptical of any statistical claim, especially from interested parties. Consider the source's motivations and potential biases.
  • Examine the methodology and data sources. Look for any signs of manipulation, such as cherry-picking data or using inappropriate statistical techniques.

Chapter 10: How to Talk Back to a Statistic

This chapter provides a framework for critically evaluating statistical claims by asking five key questions:

  • Who Says So? Identify the source and consider their potential biases and motivations.
  • How Do They Know? Examine the methodology, data collection process, and sample size to assess the reliability of the data.
  • What's Missing? Look for omitted information, such as the margin of error, range of values, or relevant comparisons, that could impact the interpretation of the data.
  • Did Somebody Change the Subject? Check for a switch between the raw data and the conclusion, ensuring that the presented statistic actually supports the claim being made.
  • Does It Make Sense? Apply common sense and logic to assess the plausibility of the claim and consider whether the statistic aligns with your own observations and understanding.

By applying these questions and the lessons learned throughout the book, readers can become more discerning consumers of statistical information and avoid being misled by manipulated data.
