Introduction to Statistics and Epidemiology

Last edited on February 1, 2024

Statistics and Epidemiology in Medicine is important for clinicians to understand. There can often be a difference in clinical outcomes seen in patients in clinical trials, compared to real-world, community settings. Clinical trials usually exclude patients with multiple diagnoses or comorbidities, whereas in the real world, patients have multiple diagnoses and conditions all the time. This disconnect means it can be challenging for clinicians to interpret the efficacy of treatments from clinical trials, systematic reviews, and meta-analyses.^[1] Having a solid understanding of statistical principles can allow you to critically appraise the literature and understanding whether you should trust a study or not!

Rhyme to Remember What Type I and Type II Errors Are

Think like an “evil Big Pharma” drug company that is trying to market a drug:

Type I Error = “Type I? We have won!!”
- i.e. - you have a false positive, this means the drug doesn't work in real life, but by chance/error the study shows it does.
Type II Error = “Type II? What to do?!”
- i.e. - you have a false negative, this means the drug really does work in real life, but the study (falsely, by statistical chance) shows it doesn't work!

Study Findings

If you were looking at a study looking at the association between smoking and lung cancer, you would apply the following table below to look at the probability of a Type I and Type II error.

Type I and II Errors: Study Findings

	Positive Finding in Real World	Negative Finding in Real World
Positive Finding in Study	(a) True Inference	(b) 👻 False Positive (Type I Error)
Negative Finding in Study	(c) 👻 False Negative (Type II Error)	(d) True Inference

Medical Tests

If you were looking at a study looking at how good a test was at detecting lung cancer, you would apply the following table below to look at the probability of a Type I and Type II error.

Type I and II Errors: Medical Tests/Investigations

	Diseased in Real World	Non-Diseased in Real World
Positive Test Result	(a) True Positive	(b) 👻 False Positive (Type I Error)
Negative Test Result	(c) 👻 False Negative (Type II Error)	(d) True Negative

Sensitivity and Specificity are fixed properties of a test and their values never change:

For sensitivity (the “true positive rate”), you are interested in knowing how many people with disease you are able to identify with your test. To determine sensitivity, you need to compare the number of patients who were identified as having the disease, compared to the number of patients who actually have the disease:
- A highly sensitive test rarely misses people with the disease and rarely has false negatives
- You want high sensitivity in a test when it is really bad to miss a disease (e.g. - brain cancer)
- A highly Sensitive test when Negative rules OUT the disease (SNOut)
For specificity (the “true negative rate”), this concept is slightly less intuitive, even though the equation is equally simple compared to sensitivity. Remember that with sensitivity, we focus on the patients who have disease. With specificity, we focus on the patients who do not have disease.
Thus, a test that has high specificity will not identify healthy patients as having disease. With a highly specific test, we would expect that there are a lot of true negatives and only a small amount of false positives (or even better, none at all!).
When we talk about specificity, we are interested in the proportion of people without disease who have a negative test.
- A highly specific test rarely will rarely give false positives
- You want a highly specific test when a false positive result might be harmful to the patient (e.g. - there is some debate that mammography in young women might not be highly specific for breast cancer, and place these patients at risk for further invasive procedures like biopsies and unnecessary psychological stress.)
- A highly Specific test when Positive rules IN the disease (SPIn)

A Non-Mathematical Example

Imagine you have a car with an alarm. In an ideal world, your car alarm will only trigger if someone tries to break into it, and would never be triggered by anything else. You have a dial where you can adjust the threshold for your car alarm.

Since you have a very fancy car, you initially set a low threshold for your car alarm to activate, so it activates even when a small vibration sets it off. In this case, your alarm is very sensitive, since every little movement will trigger it. However, is is not very specific, meaning even non-break in activities will set it off.

After being frustrated by the alarm going off all the time, you now set a high threshold for your alarm. Now your car is very specific for break-ins only. Unfortunately, your car was broken into twice, and it only went off once. Your alarm is thus, not very sensitive, since it missed 50% of the break ins.

In an ideal world, you would have a car alarm that is very sensitive and very specific, perhaps even 100% sensitive and 100% specific. Unfortunately, such a perfect car alarm does not exist. The same is true of medical tests and screening tools.

In An Ideal World...

In an ideal world involving a low prevalence disease, one would want a high sensitivity screening test, that picks up all possible cases first. Then, one would want a high specificity confirmatory test that would for sure rule in the disease.

Sensitivity and Specificity

	Diseased in Real World	Non-Diseased in Real World
Positive Test Result	True Positive (TP)	False Positive (FP)
Negative Test Result	False Negative (FN)	True Negative (TN)
Measure	Sensitivity = (TP) ÷ (TP+FN)	Specificity = (TN) ÷ (FP+TN)

High/Low Sensitivity and Specificity

Sensitivity	Specificity	Interpretation
High	Low	If our test has a high sensitivity but a low specificity, the test will be very good at finding all the disease because it is sensitive. BUT, it will also be easily tricked into thinking the disease is there when it is not because it is not very specific. Our results will show a lot of true positives (which is good), but also a lot of false positives (which is bad).
Low	High	If the test has low sensitivity but high specificity, when we get a positive result, we can be pretty sure that the disease is present. However, if we have a negative result, we can’t be too sure that we have a healthy patient because maybe the test was not sensitive enough to pick up the disease in that particular patient. This type of test is thus most helpful to us when it gives a positive result, because we can be pretty confident that the patient has the disease for sure.

Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are another two concepts that relate to Type I and Type II errors:

Positive predictive value (PPV): probability that someone who has a positive test result actually has the disease.
- The PPV varies directly with patient's pretest probability of having said disease (i.e. - their baseline risk)
Negative predictive value (NPV): Probability that someone who has a negative test result actually does not have the disease.
- The NPV varies inversely with prevalence of the disease or with patient's pretest probability of having said disease

Positive Predictive Value (PPV) and Negative Predictive Value (NPV)

	Diseased in Real World	Non-Diseased in Real World	Measure
Positive Test Result	TP	FP	Positive Predictive Value = (TP) ÷ (TP+FP)
Negative Test Result	FN	TN	Negative Predictive Value = (TN) ÷ (FN+TN)
Measure	Sensitivity = (TP) ÷ (TP+FN)	Specificity = (TN) ÷ (FP+TN)

Accuracy is the final concept related to Type I and Type II errors, and can be calculated below.

Accuracy

	Diseased in Real World	Non-Diseased in Real World	Measure
Positive Test Result	TP	FP	Positive Predictive Value = (TP) ÷ (TP+FP)
Negative Test Result	FN	TN	Negative Predictive Value = (TN) ÷ (FN+TN)
Measure	Sensitivity = (TP) ÷ (TP+FN)	Specificity = (TN) ÷ (FP+TN)	Accuracy = (TP + TN) ÷ (TP + FN + FP + TN)

The terms prevalence and incidence are often used incorrectly in describing how common a disease or disorder is. Here are the correct definitions:^[2]

Prevalence: looks at ALL existing cases of an illness or disease. Think about looking at a photo of 1,000 people (the population), and asking, how many people have black hair? The number of individuals with that feature is used to calculate prevalence.
- Prevalence can also be calculated from a 2×2 table
- Prevalence = (TP + FN) ÷ (TP + FN + FP + TN)
There are 3 ways to express prevalence:
- Point prevalence: The number of cases at a certain time (e.g. - a survey on December 2020 asking if you are actively smoking).
- Period prevalence: The number of cases over a certain time frame, usually 12 months (e.g. - a survey asking if you have smoked between January 2020 to December 2020)
- Lifetime prevalence: The number of cases over one's total lifetime. (e.g. - a survey asking if you have ever smoked in your life)
Incidence: Incidence looks at the number new cases over a period of time. For example, in 1,000 people, if 50 were diagnosed with lung cancer over 2 years, the incidence is 50 cases per 1,000 persons, or 5.0% over a two year period, or 25 cases per 1,000 person-years (incidence rate) – because the incidence proportion (50 per 1,000) is divided by the number of years (2).

Prevalence

	Diseased in Real World	Non-Diseased in Real World	Measure
Positive Test Result	TP	FP	PPV
Negative Test Result	FN	TN	NPV
Measure	Sensitivity	Specificity	Accuracy = (TP + TN) ÷ (TP + FN + FP + TN) Prevalence = (TP + FN) ÷ (TP + FN + FP + TN)

The prevalence of a disease or disorder can change depending on where you are looking for the disease. A disease will have a much higher prevalence in a specialist referral hospital, compared to a primary care setting, which will have a much lower prevalence. Take a look at the example below.

Specialist Hospital

	Diseased in Real World	Non-Diseased in Real World	Measure
Test (+)	50	10	PPV = 50/60 = 83%
Test (-)	5	100	NPV = 100/105 = 95%
Measure	Sensitivity = 50/55 = 91%	Specificity = 100/110 = 91%

The prevalence of the disease in this specialist clinic is the number of real cases of disease, divided by the total number of individuals:

(TP+FN) ÷ (TP+FP+FN+TN) = (50+5) ÷ (50+5+10+100) = 33% prevalence

Primary Care

	Diseased in Real World	Non-Diseased in Real World	Measure
Test (+)	50	100	PPV = 50/150 = 33%
Test (-)	5	1000	NPV = 1000/1005 = 99.5%
Measure	Sensitivity = 50/55 = 91%	Specificity = 1000/1100 = 91%

The prevalence of the disease in the primary care setting is the number of real cases of disease, divided by the total number of individuals:

(TP+FN) ÷ (TP+FP+FN+TN) = (50+5) ÷ (50+5+100+1000) = 3% prevalence

The Take Away Message From This Example

Sensitivity and specificity of a test are immune to changes in prevalence
The PPV and NPV of the test does change with the prevalence of a disease

There are a number of different kinds of data:

Continuous (e.g. - weight, age)
Dichotomous (e.g. - a yes/no diagnosis, yes/no medications)
Categorical (e.g. - type of housing [apartment, condo, house])
Time to event (e.g. - time until death/time until re-hospitalization)
Time trends (e.g. - rates of hospital visits over time, number of falls per year)

For each type of data, there are different univariate and multivariable analytic methods.

Univariate analysis – compares only two variables or different categories of 1 variable (two means of blood pressure or depressed vs. non-depressed)
Multivariable analyses – More than one variable is included in the analysis
- Allows for risk adjustment
- Dependent variable = outcome variable
- Independent variables = predictive factors or variables that need to be accounted for (e.g. - to avoid confounding)

Efficacy is the extent to which an intervention does more good than harm under ideal circumstances (i.e. - in a clinical trial under strict conditions).
Effectiveness is the extent to which an intervention does more good than harm when provided under usual circumstances of healthcare practice (i.e. - in a real world setting). Effectiveness studies are also known as pragmatic trials in research studies, due to the pragmatic nature of the research.^[3]^[4]

Mnemonic

The memory aid “Efficacy is a DELICACY” can be used to remember the that efficacy is measured under strict, fancy, expensively run, randomized clinical trial conditions – a “delicacy”!

See main article: Critical Appraisal and Statistics

Once you know the basics of statistics, it is then important to be able to understand how statistics can be misinterpreted and be able to critically appraise the literature!

Research

Articles

stats-ca

¹⁾ Humphreys, K., & Williams, L. M. (2017). What can treatment research offer general practice?. The Lancet Psychiatry.

²⁾ EUPATI: Epidemiologic concepts: incidence and prevalence

³⁾ Kim, S. Y. (2013). Efficacy versus effectiveness. Korean journal of family medicine, 34(4), 227-227.

⁴⁾ Gartlehner G, Hansen RA, Nissman D, et al. Criteria for Distinguishing Effectiveness From Efficacy Trials in Systematic Reviews. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006 Apr. (Technical Reviews, No. 12.) 1, Introduction.

Introduction to Statistics and Epidemiology

Table of Contents

Primer

Statistics Basics

Measurement

Validity

Type I and Type II Errors

Rhyme to Remember What Type I and Type II Errors Are

Study Findings

Type I and II Errors: Study Findings

Medical Tests

Type I and II Errors: Medical Tests/Investigations

Sensitivity and Specificity

A Non-Mathematical Example

In An Ideal World...

Sensitivity and Specificity

High/Low Sensitivity and Specificity

Positive and Negative Predictive Value (PPV and NPV)

Positive Predictive Value (PPV) and Negative Predictive Value (NPV)

Accuracy

Accuracy

Prevalence vs. Incidence

Prevalence

High and Low Prevalence

Specialist Hospital

Primary Care

The Take Away Message From This Example

Statistics 101

Data Types

Analysis

Epidemiology Basics

Efficacy vs. Effectiveness

Mnemonic

Critical Appraisal

Resources

Online Courses

Research

Articles

References