How to Lie with Statistics – Misleading Ways to Use Numbers
The internet makes it easier than ever to find useful information. Unfortunately, it's also easy to find half-truths and lies. Learning how to lie with statistics will show you the most common methods used to misrepresent data. Next time you come across them, you'll know what to look for.
As the saying goes, "a half-truth is the best lie." As you'll see, the top methods for how to lie with statistics typically rely on factual data, which makes it all the more difficult to spot misinformation.
So, let's see how to lie with statistics, and let's dissect some real-life examples. This way, you'll be better prepared to separate fact from fiction.
In This Guide
- How to Lie with Statistics - Top Ways to Misrepresent Data
- Logical Fallacies When Using Statistics
- What Information Should You Trust?
- In Conclusion
- References
How to Lie with Statistics - Top Ways to Misrepresent Data
There are multiple techniques for how to lie with statistics. The numbers don't have to be fabricated, either. As you'll see, we can use factual data but still twist the truth either willingly or by mistake.
We can do this in two ways - either by misrepresenting the data itself or by using flawed reasoning to interpret findings. First, let's see how to lie with statistics by misrepresenting the numbers.
Misusing Averages
In everyday speech, we often say "average" to denote a typical value representative of a bigger data set. But in a mathematical sense, "average" can refer to several different measures, primarily the mean or the median.
Each of these "averages" can represent a different number, so the distinction is important.
Here's how each is calculated:
1. The mean adds all the values in a set and divides the total sum by the count of values. For example, the mean of {2, 30, 40, 300} is the sum (372) divided by 4, which equals 93.
2. The median arranges the values in a set from smallest to largest and takes the middle value based on the total number of values.
In a set of 5 values, the median would be the third value in the middle. If the set has an even number of values, we must calculate the mean value of the two middle numbers. For example, the median of {2, 30, 40, 300} is (30 + 40) / 2, which equals 35.
The mean and median averages can vary considerably depending on the range of values in the total data set. Opting to use one over the other can skew the readers' perception, even if the numbers being used are factually correct.
Using the mean as the "average" could be especially misleading if the data set contains major outliers.
This often happens when reporting average salary figures, for example. Including the salaries of the top earners can skew the mean value considerably, making it seem like the "average" salary is higher than it actually is.
Source: Australia Institute Research
Say you have a group of 10 people. Two people earn $20k/year, six people earn $70k/year, and two people earn $300k/year.
Calculating the mean value would give us an "average" salary of $106k/year. But looking at the actual salary distribution in the data set, you wouldn't say the average person in the group earns six figures a year when the majority earn $70k/year.
That's why using the median is preferable when dealing with outliers.
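A quick Python sketch (using the hypothetical salaries from the example above) shows how far apart the two "averages" land:

```python
from statistics import mean, median

# Hypothetical salaries from the example above, in thousands of dollars per year
salaries = [20, 20, 70, 70, 70, 70, 70, 70, 300, 300]

print(f"Mean ('average') salary: ${mean(salaries):.0f}k")   # $106k - pulled up by the two top earners
print(f"Median salary:           ${median(salaries):.0f}k")  # $70k - what most people in the group earn
```

The mean lands at $106k while the median stays at $70k, so which "average" gets reported makes a big difference.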
Real Example
PolitiFact shows in a 2024 article how "average" salary figures can be misleading. In an April 8 speech, President Joe Biden stated that the average salary in the semiconductor field is $110,000.
Source: PolitiFact (Instagram)
There's some truth here; the "average" (mean) salary in the semiconductor industry is over $100k/year. However, this figure includes all the jobs within the industry, including the high-paying ones that do require a degree.
According to PolitiFact, people without a college degree in the semiconductor industry earn around $40k/year. Those with an associate degree earn up to $70k, while those with graduate degrees make up to $160k.
The highest salaries skew the mean average higher, so the numbers don't tell the whole story. In this case, the industry's "average" salary doesn't actually apply to the average worker.
Conflating Percentages
Conflating percentage changes and percentage point changes is an equally inconspicuous way to paint different pictures using the same data.
Here's how they differ:
1. The percentage change measures the relative change between two values and expresses the result as a percentage of the original value. For example, if the share of unemployed people in a given city went from 4% to 8%, that would represent a doubling of the original value - a +100% percentage change. If the share went from 4% to 2%, that's a halving of the original value, so a -50% percentage change.
2. The percentage point change measures the absolute difference between two percentages.
Using the percentage point change in the previous scenario, if the share of unemployed workers went from 4% to 8%, that would be a +4 percentage point increase. If the share of unemployed workers halved from 4% to 2%, that's a -2 percentage point change.
It would be equally correct to use either +100% or +4 percentage points to report on the example above. But each number could have a different impact on readers, with bigger numbers being more attention-grabbing.
Using the relative percentage change could overstate the magnitude of a given change, especially when the absolute change is otherwise small.
If we don't know the absolute percentage point difference, we're left wondering about the change's true impact.
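To keep the two measures straight, here's a minimal Python sketch that computes both for the unemployment example above:

```python
def percentage_change(old, new):
    """Relative change, expressed as a percentage of the original value."""
    return (new - old) / old * 100

def percentage_point_change(old, new):
    """Absolute difference between two percentages."""
    return new - old

old_share, new_share = 4.0, 8.0  # share of unemployed people, in percent

print(f"Percentage change:       {percentage_change(old_share, new_share):+.0f}%")          # +100%
print(f"Percentage point change: {percentage_point_change(old_share, new_share):+.0f} pp")  # +4 pp
```

Both outputs describe the same underlying change from 4% to 8%; they just answer different questions.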
Real Example
Using the percentage change is common in medical reporting and is not meant to mislead the average reader.
The relative value change provides a standardized way to compare and communicate the effects of interventions across different studies or populations.
However, you can imagine how the title below could make the findings seem much worse to the layperson if we don't know the actual baseline risk and the absolute percentage point change in risk.
Medical reporting typically uses relative risk figures
Let's say the absolute risk of Type 2 diabetes in the general population is 3%. Then, a 50% relative risk increase would bump the risk of developing diabetes to 4.5% (a +1.5 percentage point increase). One figure sounds far more worrisome than the other.
In other cases, using the percentage change can mislead readers and sway public opinion on a given topic.
A 52% increase in murder rate sounds disastrous, but what would this actually mean for the average Tallahassee inhabitant?
According to PolitiFact, this 52% increase compares the murder rate in 2002-2009, before Andrew Gillum became mayor, with the rate in 2010-2017, during his tenure. In 2002-2009, the murder rate in Tallahassee was 4.6 murders per 100,000 citizens.
From 2010 to 2017, the murder rate increased to 7 murders per 100,000 citizens. This translates to a percentage change of 52%.
However, this doesn't mean the absolute murder rate in Tallahassee is now anywhere near 50%. At 7 per 100,000, it's still under 0.1 per 1,000 residents.
Using Misleading Graphs
Numbers can tell us a lot about a topic, but pictures help us better visualize the data. Graphs also help us communicate information in a quick and easy-to-read format.
You barely have to read a graph because the visuals show you the information at a glance. Except the format of a graph isn't always so straightforward. Sometimes, graphs can give you the wrong idea until you look closer.
There are two main ways to create a misleading graph, even while using the correct data:
Using Unlabeled Graphs
Unlabeled graphs omit critical information needed to accurately interpret the data. Without all the context surrounding the visuals, we can't actually understand the meaning or relevance of the information. Consider this example:
An unlabeled chart showing the change in "average temperature"
Here, we have a graph that tells us basically nothing except that the average temperature is going up. Is this increase something to worry about?
We can't tell, because we don't know the timeframe or real magnitude of the change, so we can't draw an informed conclusion from this picture.
However, the information used to plot this graph is factual. We used the average high temperature in Washington, D.C., from January to June to create this graph.
The vertical axis represents the temperature change, and the horizontal axis represents each of the six months. With this information, we now know this chart shows us a seasonal fluctuation in temperature.
Using Graphs Not Starting at 0
Graphs measure value changes on a scale, typically starting from 0 as a reference point.
When graphs don't start at 0, this can distort the proportions of bars and lines and exaggerate the visual change between two values. This distortion can happen even when the absolute change is very small.
Starting the graph at a different reference point can make even minute differences look like dramatic changes.
Bonus points if the graphs use fancy visuals, which can sometimes lead to unintentionally hilarious results like this:
A bar graph of average female height, starting at 5 ft as a reference point
Source: More Than My Height (via Reddit)
This graph starts at 5'0'' as a reference point, which distorts the visual difference between the heights on the chart. This is how a 5-inch difference makes Indian women appear dramatically shorter than Latvian women.
Here's how the chart would look if it started from 0:
Bar graph of average female height, starting at 0 as a reference point
As you can see, the visual difference between the bars is much smaller now. Note that the vertical axis represents the same heights as before, but in centimeters.
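If you'd like to reproduce the effect yourself, here's a minimal matplotlib sketch that plots the same two bars twice - once with a truncated axis and once with an axis starting at zero. The height figures are made up for illustration, not taken from the original chart:

```python
import matplotlib.pyplot as plt

# Illustrative (made-up) average heights in centimeters
countries = ["India", "Latvia"]
heights_cm = [155, 168]

fig, (ax_truncated, ax_full) = plt.subplots(1, 2, figsize=(8, 4))

# Misleading version: the y-axis starts just below the shortest bar
ax_truncated.bar(countries, heights_cm)
ax_truncated.set_ylim(150, 172)
ax_truncated.set_title("Axis starts at 150 cm")

# Honest version: the y-axis starts at zero
ax_full.bar(countries, heights_cm)
ax_full.set_ylim(0, 180)
ax_full.set_title("Axis starts at 0")

for ax in (ax_truncated, ax_full):
    ax.set_ylabel("Average height (cm)")

plt.tight_layout()
plt.show()
```

The data is identical in both panels; only the axis limits change, and with them the visual impression.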
Real Example
Below, you can see an example of a graph that doesn't start at zero. To make things worse, the chart's y-axis is also unlabeled, so we can't see at first glance where the data range starts or stops.
Source: Media Matters
This chart shows the difference between the number of people with a job and the number of people on welfare. The problem is that the graph makes a 6.8% difference look like a 500% difference because of the distorted scale.
For more information about this graph and similar blunders, check out Media Matters' coverage (linked in the references).
Cherry Picking Information
"Cherry picking" refers to the act of using only the information that supports one's point, leaving out the bigger context and contradictory data.
While the information presented might be true by itself, we can still arrive at erroneous conclusions if we don't know the bigger picture.
Cherry-picking can occur at any point when working with data, including when collecting, analyzing, and referencing data to draw conclusions.
How It Works
Let's say we wanted to show that protein powder improves heart health, so we compared two groups of people - those who do and those who don't consume protein powder. We chose to look only at protein powder users who also go to the gym. This would be a sampling bias, because gym-goers typically partake in other heart-healthy habits, like regular exercise and eating a balanced diet.
Our analysis might show that people who use protein powder have better heart health. But we can't tell if the results are due to the protein or other differences between gym goers and the broader population.
Another way to support our premise is to select only the studies that show protein powder is heart-healthy.
For example, we could have 10 studies that show protein powder has no effects on heart health and 2 studies that say protein powder improves heart health. We then reference only the latter two studies, even though the majority of the data points in a different direction.
Real Example
One of the best examples of data cherry-picking is the infamous chocolate hoax study, which found that eating chocolate helps with weight loss.
However, according to study lead author John Bohannon, the study design was intentionally shoddy, and the research team selectively picked positive findings to support the benefits of chocolate.
As the author explained, if you use a small sample of people and measure a large number of metrics, you're almost guaranteed to find a noteworthy result by chance.
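Here's a rough simulation of that idea - not a reconstruction of the actual chocolate study, just a sketch assuming numpy and scipy are available. Two groups are drawn from the same distribution, so there is no real effect, yet testing 20 unrelated metrics usually turns up at least one "significant" result:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=1)

n_per_group = 15   # small sample in each group
n_metrics = 20     # many unrelated outcomes measured

false_positives = []
for metric in range(n_metrics):
    # Both groups come from the same distribution: any "effect" is pure noise
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p_value = ttest_ind(treatment, control)
    if p_value < 0.05:
        false_positives.append((metric, round(p_value, 3)))

print("'Significant' findings by pure chance:", false_positives)
```

With a 5% significance threshold and 20 metrics, roughly one spurious "finding" is expected on average - and a cherry-picker only needs one.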
The same concept applies to any other topic where zooming in or out on a data set alters the findings. This often happens in discussions about climate change, for example.
Changes in global mean sea level across different timespans
Using the same data but focusing only on a limited timespan can give the illusion of a stable trend. While a statement might be true, it could still misrepresent the wider picture.
Using a Semi-Attached Figure
A "semi-attached figure" is an unrelated piece of information cited to support a claim. Two topics might seem connected at first glance, but closer inspection shows the cited data doesn't prove the point.
"If you can't prove what you want to prove, demonstrate something else and pretend that they are the same thing." - Darrell Huff, author of How to Lie with Statistics
Semi-attached figures can be difficult to spot because they make intuitive sense. As a general rule, when someone uses data from a study on one topic to try to prove a different point, that's a semi-attached figure.
What It Looks Like
Imagine a random commercial promoting a biotin-infused shampoo. The ad starts off by mentioning the importance of biotin, truthfully claiming that biotin is an essential vitamin that plays a role in hair health.
Then, the ad claims their new shampoo formula contains biotin. Based on this information, we might be tempted to infer this shampoo helps us achieve healthy-looking hair. Both statements in the ad are true, but our conclusion might be incorrect.
While biotin is an essential nutrient for hair health, the information cited doesn't prove that topical application of biotin-infused shampoo makes hair healthier. Maybe the shampoo works; maybe it doesn't. Based on the data provided, we can't support either conclusion.
Real Example
In 2009, an ad for Kellogg's Frosted Mini-Wheats cereal claimed the product could boost children's attentiveness by nearly 20%. The claim was based on a study commissioned by Kellogg, which showed that children who ate the cereal improved their attentiveness.
Source: VINnews
The problem is that the data didn't actually support the claim. The cited study didn't compare Kellogg's cereal to other cereals or other breakfast foods. In fact, the control group in the study received no breakfast at all.
After being sued and reaching a settlement, Kellogg agreed to drop the 20% claim. However, the company could still claim that kids who eat a filling breakfast (like Mini-Wheats) are more attentive than kids who skip breakfast.
A similar case occurred with Coca-Cola's VitaminWater advertising a few years back when the company made product health claims based on the vitamin content in the beverage.
Logical Fallacies When Using Statistics
Logical fallacies are common reasoning errors made when building arguments. The data we use could be factual, and the resulting argument might sound convincing, but the conclusion is logically flawed.
Logical fallacies can be innocent mistakes, but they can be just as insidious as intentionally misrepresenting data. Let's look at some of the most common logical fallacies and how to lie with statistics using them.
Correlational Fallacies
A correlational fallacy means deducing a cause-and-effect relationship between two elements only because they appear to be associated. For example, we might notice that people who own ashtrays are more likely to get lung cancer. In this scenario, ashtray ownership and lung cancer rates are correlated, but it would be factually wrong to claim one causes the other.
In this case, smoking is the third factor that influences both ashtray ownership and increased rates of lung cancer.
Reverse causation is another type of correlational fallacy where two events are correlated, but the cause and effect are reversed. The observed effect is assumed to be the cause and vice versa.
For example, there's a correlation between painkiller use and chronic headaches because people with chronic headaches take painkillers to relieve the pain. Saying that painkillers cause headaches would be a reverse causation fallacy.
The Post Hoc Fallacy
The post hoc fallacy is very similar to correlational fallacies. It draws a cause-and-effect conclusion from two events simply because one occurs after the other.
It sounds something like this: event A happened, then event B happened; therefore, A caused B. Here's how it would look in practice...
Sugar consumption in the US started going down in 2000. A few years later, the obesity rate went up. Therefore, eating less sugar made Americans put on weight. And we have a fancy-looking chart to show you the correlation.
Example of a false correlation between lower sugar consumption and increasing obesity rates
Maybe we should all start eating more sugar to prevent weight gain. That's obviously not the case. This just goes to show that two things may not be causally related, even if they happen in close succession.
Real Example
We could easily find correlations to support any claim, including the opposite of what we know to be true. For example, the smoker's paradox or the obesity paradox are both based on correlations between smoking or overweight status and better health outcomes.
Advocates use the obesity paradox to support the Health At Every Size movement (Source: Lindo Bacon)
Several studies have found a correlation between lower weight and increased mortality, particularly in heart failure patients.
These findings are sometimes interpreted as "people who are overweight are less likely to die than normal-weight people" or even "overweight people live longer."
But the correlation between higher BMI and better survival doesn't prove that being overweight causes better outcomes. In fact, this might be a case of reverse causation, where lower weight is seen as a cause rather than a consequence of poorer health.
The Base Rate Fallacy
The base rate fallacy is a reasoning error where someone focuses primarily on isolated information (e.g., an event occurring in a specific case) and ignores the general picture (the base rate of the event occurring at large).
For example, let's say you have to flip a coin 10 times, and there's a 50/50 baseline chance that a coin toss will land on either heads or tails.
Your coin lands on heads 7 times in a row, so you might conclude there's a higher chance the next 3 flips will also be heads. But the base rate hasn't changed: each new flip still has a 50% chance of landing on either side.
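A quick simulation makes the point: among sequences that happen to start with seven heads, the next flip still comes up heads about half the time. This sketch only uses Python's built-in random module:

```python
import random

random.seed(0)
trials = 200_000
streaks = 0        # sequences that start with 7 heads
next_heads = 0     # how often flip 8 is heads in those sequences

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(8)]  # True = heads
    if all(flips[:7]):
        streaks += 1
        next_heads += flips[7]

print(f"Streaks of 7 heads: {streaks}")
print(f"P(heads on flip 8 | 7 heads so far) ~ {next_heads / streaks:.3f}")  # stays near 0.5
```

Past flips don't change the base rate - each new toss is still 50/50.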
Real Example
The following statistics were used to claim that the vaccinated population in the UK suffered more deaths than the unvaccinated population. This is factually true, but the conclusion that the vaccine is ineffective or dangerous would be incorrect.
Source: Facebook
According to Reuters, far more people in the UK had taken the vaccine than not. Vaccinated people far outnumber unvaccinated ones, so the statistic ignores the base rate of vaccinated people in the total population.
Since the vaccinated population is much higher to begin with, more deaths are expected in this group. Adjusting for group size would give a fairer comparison.
When the mortality rate is compared per 100,000 people, the rate in the unvaccinated population was actually higher.
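Here's a minimal sketch of that adjustment, using made-up group sizes and death counts rather than the real UK figures. Scaling deaths to a per-100,000 rate reverses the picture painted by the raw counts:

```python
# Hypothetical figures for illustration only - not the actual UK data
vaccinated = {"population": 40_000_000, "deaths": 800}
unvaccinated = {"population": 4_000_000, "deaths": 400}

def deaths_per_100k(group):
    """Mortality rate normalized by group size."""
    return group["deaths"] / group["population"] * 100_000

print("Raw deaths - vaccinated:", vaccinated["deaths"], "| unvaccinated:", unvaccinated["deaths"])
print(f"Per 100,000 - vaccinated: {deaths_per_100k(vaccinated):.1f} "
      f"| unvaccinated: {deaths_per_100k(unvaccinated):.1f}")
```

In this toy example the vaccinated group records twice as many deaths simply because it's ten times larger; per 100,000 people, its mortality rate is five times lower.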
What Information Should You Trust?
It's always best to go straight to the source and get the facts yourself instead of taking somebody's claims at face value. This is easier said than done.
All studies have limitations that can affect results; credible sources acknowledge these shortcomings.
Although the scientific method helps us make sense of the world around us, our conclusions are only as accurate as the data they're based on. Someone can correctly cite a source that supports their claim, but the facts can still be wrong.
Data is simply inaccurate sometimes. That's why we see contradictory information on the same topic and why you could find sources to support any idea.
The evidence hierarchy explains which data provides the strongest evidence
But there are ways to better understand it all. The evidence hierarchy helps us prioritize information and understand where different findings fit into the larger context.
Source: FACSIAR, NSW Government
As a general rule, information from studies higher up the evidence hierarchy is more reliable. A factsheet from FACSIAR (linked in the references) explains how the different study types compare.
The studies at the base of the pyramid provide weaker evidence and carry more limitations than the studies at the top. This doesn't mean they're guaranteed to be wrong, but they're more likely to be.
Comparing Levels of Evidence
Case studies are an example of "weaker" evidence. They're low on the evidence pyramid because they are more likely to be influenced by unintentional bias.
Such studies typically involve smaller groups of participants and rely on methods like interviews or observations, which can have limitations that can impact study findings.
Smaller groups may not adequately represent the broader population, so the findings could be skewed by chance. Data collection methods like surveys can also lead to inaccurate findings because of participants' subjective interpretations of survey questions.
Small study findings don't always pan out when repeated on larger populations. However, they provide insight into potential trends and play a crucial role in generating new hypotheses for future research.
Meta-analyses are at the other end of the pyramid and provide the strongest level of evidence, so their findings are generally more reliable.
A meta-analysis pools together multiple studies that address the same question, which boosts the overall sample size and lowers the risk of biased results.
Each meta-analysis can encompass hundreds or even thousands of studies, offering a thorough summary of the prevailing findings on a research topic.
Thanks to this, they give us a quick overview of the scientific consensus and help us avoid confirmation bias when faced with contradictory information.
Confirmation bias occurs when we favor information that supports our opinion, despite evidence to the contrary.
But even these comprehensive analyses aren't perfect. The quality of a meta-analysis depends on the quality of the studies in it. A collection of predominantly low-quality data will give us misleading results, a phenomenon often referred to as "garbage in, garbage out."
The key takeaway is that no research method is perfect, and science can get it wrong sometimes. But those truly seeking knowledge will change their opinions when mounting evidence proves them wrong, rather than clinging to incorrect beliefs for personal comfort.
In Conclusion
Understanding how data can be manipulated helps us discern truth from fiction. The most common methods include misleading average figures, conflating percentage changes with percentage point changes, and using distorted graphs or cherry-picked information.
But it's also important to look beyond surface-level statistics and consider the context and methodology behind the data. Even seemingly credible numbers and studies can lie, so it's best to do our due diligence instead of taking "facts" at face value.
Ultimately, critical thinking and a skeptical approach to statistics will help us navigate the sea of information and avoid being deceived by interest groups or false advertising.
References
- Is Biden right that you don't need a college degree to make $110,000 in the semiconductor field? (PolitiFact)
- Florida PAC levels scary and misleading claim about Tallahassee's murder rate under Andrew Gillum (PolitiFact)
- Dishonest Fox Chart Overstates Comparison Of Welfare To Full-Time Work By 500 Percent (Media Matters)
- Study showing that chocolate can help with weight loss was a trick to show how easily shoddy science can make headlines (The Independent)
- Court Rejects Settlement on School Claims for Frosted Mini-Wheats (Education Week)
- Per capita consumption of caloric sweeteners in the United States from 2000 to 2022 (in pounds) (Statista)
- Prevalence of obesity and severe obesity among U.S. adults from 1999 to 2018 (Statista)
- The misleading smoker's paradox (European Society of Cardiology)
- Study shows 'obesity paradox' does not exist: waist-to-height ratio is a better indicator of outcomes in patients with heart failure than BMI (European Society of Cardiology)
- 'Fat people live longer than those who are skinny' (The Independent)
- Smoking and reverse causation create an obesity paradox in cardiovascular disease (Wiley)
- Fact Check: Misleading data used to claim COVID vaccines do more harm than good (Reuters)
- The Scientific Process (American Museum of Natural History)
- What is an evidence hierarchy? (NSW Government)
- What Is a Case Study? (Verywell Mind)
- The Role of Meta-Analysis in Scientific Studies (Verywell Mind)
- APA Dictionary of Psychology (American Psychological Association)