THREE CLASSES OF STATISTICS IN PSYCHIATRIC RESEARCH

The word “statistics” derives from a term used for “numbers describing the state”; that is, the original statistics were numbers used by rulers of states to better understand their populations. Thus, the first statistics were simply counts of things (such as the population of towns, or the amount of grain produced by a particular town). Today, we call these kinds of simple counts or averages “descriptive statistics,” and they are used in almost every research study to describe the demographic and clinical characteristics of the participants.

Modern psychiatric research also involves two additional classes of statistics: psychometric statistics and inferential statistics. Most psychiatric studies will involve all three classes of statistics.

In psychiatric research, demographic variables (such as gender and height) can be measured objectively. However, most of our studies also require the measurement of variables that are not as objective (e.g., clinical diagnoses and rating scales of psychopathology). Here, we usually cannot directly measure the characteristics we are really interested in, so instead we rely on a subject’s score on either self-report or investigator-administered scales. Psychometrics is concerned with how reproducible a subject’s score is (i.e., how reliable it is), and how closely it measures the characteristic we are really interested in (i.e., how valid it is).

Psychiatric researchers study relatively small samples of subjects, usually with the intent to generalize their findings to the larger population from which their sample was drawn. This is the realm of inferential statistics, which is based on probability theory. Researchers are reporting inferential statistics when you see the telltale p-values and asterisks denoting statistical significance in the text and tables of the Results sections.

All three kinds of statistics (descriptive, psychometric, and inferential) are present in most published papers in psychiatric research, and are considered in a particular order, for the following reasons. First, without reliable and valid measures, neither of the other kinds of statistics will be meaningful. For example, if we rely solely on clinicians’ judgments of patient improvement, but the study clinicians rarely agree on whether a particular patient has improved, any additional statistics will be meaningless. Likewise, a variable can be measured very reliably, as with a patient’s cell phone number, and yet not be valid for any of the purposes of the study. Second, descriptive statistics are needed to summarize the many individual subjects’ scores into summary statistics (such as counts, proportions, averages [or means], and standard deviations) that can then be compared between groups. Inferential statistics would be impossible without first having these summary statistics. Third, without inferential statistics and their computed probability values, the researcher cannot generalize any positive findings beyond the particular group being studied (and this is, after all, the usual goal of a research study).
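To make the clinician-agreement example above concrete, the following is a minimal sketch of the kappa coefficient, which corrects the raw agreement between two raters for the agreement expected by chance. The function name, the two clinicians, and their ratings are hypothetical and serve only as an illustration; a real study would normally use a validated statistics package.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: proportion of subjects given the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over rating categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical judgments of improvement for ten patients.
clinician_1 = ["improved", "improved", "not", "not", "improved",
               "not", "improved", "not", "improved", "not"]
clinician_2 = ["improved", "not", "not", "improved", "improved",
               "not", "not", "not", "improved", "improved"]

kappa = cohens_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up data set the clinicians agree on 6 of 10 patients, but 5 of 10 agreements would be expected by chance, so kappa is only about 0.2. Raw percent agreement can therefore overstate how reliable a clinical judgment is.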

Table 62-1 illustrates the characteristics of each class, as well as the order in which the classes must be considered, since each successive class rests on the foundation of the preceding class.

Table 62-1 The Three Classes of Statistics Used in Psychiatric Research (in Order of Applicability)

Class of Statistic	Purpose	Examples
Psychometric Statistics	Measures of reliability and validity of rating scales and other measures. Once measures are shown to have adequate reliability and validity, they can then be used as descriptive statistics.	• Test-retest reliability coefficient • Intraclass correlation coefficient • Kappa coefficient • Sensitivity • Specificity
Descriptive Statistics	Statistics used to summarize the scores of many subjects in a single count or average to describe the group as a whole. After descriptive statistics have been computed for one or more samples, they can then be used to compute inferential statistics to attempt to generalize these results to the larger population from which these samples were drawn.	• Mean • Median • Standard deviation • Variance • Estimates of effect size • Proportions • Percentages • Mean differences • Odds ratios
Inferential Statistics	Statistics used to compute probability estimates, which in turn are used to generalize descriptive statistics to the larger population from which the samples were drawn.	• t-statistic • F-statistic • χ² statistic • Confidence intervals
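The progression in Table 62-1 from descriptive to inferential statistics can be sketched in a few lines. The example below first computes each group's mean and standard deviation (descriptive statistics) and then combines them into a pooled-variance t-statistic (an inferential statistic). The scores, group labels, and function name are hypothetical, and the choice of the equal-variance form of the t-test is an assumption made for simplicity.

```python
import math
import statistics


def pooled_t_statistic(group_1, group_2):
    """Two-sample t-statistic (equal-variance form) and its degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    mean1, mean2 = statistics.mean(group_1), statistics.mean(group_2)
    # statistics.variance returns the sample variance (n - 1 denominator).
    var1, var2 = statistics.variance(group_1), statistics.variance(group_2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Hypothetical end-of-trial symptom scores (lower = fewer symptoms).
drug = [14, 16, 12, 18, 15, 13]
placebo = [20, 22, 19, 24, 21, 20]

# Descriptive statistics: summarize each group.
for label, scores in (("drug", drug), ("placebo", placebo)):
    print(f"{label}: mean = {statistics.mean(scores):.2f}, "
          f"SD = {statistics.stdev(scores):.2f}")

# Inferential statistic: built from the summaries above.
t_value, df = pooled_t_statistic(drug, placebo)
print(f"t({df}) = {t_value:.2f}")
```

Converting the t-statistic into a p-value requires the t distribution with the stated degrees of freedom, which is not done here; a printed t table or a statistics package would supply that step.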

Concrete Examples of the Three Classes of Statistics in a Research Article

To provide a concrete example of these sometimes abstract concepts, consider a fictional study based on the simplest research design in psychiatric research: a randomized double-blind trial of a new drug versus a placebo pill for obsessive-compulsive disorder (OCD).

Figures 62-1 through 62-3 contain the annotated Method and Results sections for this fictional study, showing how the various psychometric statistics are presented in the Method section, while descriptive statistics are presented in the Method and Results sections, and inferential statistics are presented in the Results section (for definitions of terms used in these figures, refer to the section on statistical terms and their definitions).

Figure 62-1 Fictitious Method section annotated to illustrate psychometric statistics.

Figure 62-2 Fictitious Results section annotated to illustrate descriptive statistics.

Figure 62-3 Fictitious Results section annotated to illustrate inferential statistics.

Experiment-wise Error Rate

Researchers should test only a few carefully selected hypotheses (specified before collecting their data!) if their obtained p-values are to have any meaning. The more statistical tests you perform, the greater the chance of finding at least one significant result by chance alone. Table 62-2 illustrates this phenomenon.

Table 62-2 Experiment-wise Error: Did the Researcher Find a Single Result Significant Solely by Chance?

Number of Statistical Tests Performed at p < .05	Probability of at Least One False-Positive Finding*
1	.05
2	.09
3	.14
4	.18
5	.22
6	.26
7	.30
8	.33
9	.36
10	.40
15	.53
20	.64
30	.78
40	.87
50	.92

* Experiment-wise error rate.

One should not be impressed by a researcher who conducts eight t-tests, finds one significant at p < .05, and proceeds to interpret the findings as confirming their theory. Table 62-2 shows us that with eight statistical tests at p < .05, the researcher had a 33% chance of finding at least one result significant by chance alone.
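The probabilities in Table 62-2 are consistent with the formula 1 − (1 − α)^k for k tests, each performed at significance level α. This formula assumes that the tests are independent of one another; that assumption is ours, since the text does not state how the table was computed. A minimal sketch, with a hypothetical function name:

```python
def experimentwise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    # Probability that every test avoids a false positive is (1 - alpha)^k,
    # so the complement is the chance of at least one false positive.
    return 1 - (1 - alpha) ** num_tests


for k in (1, 2, 3, 4, 5, 8, 10, 20, 50):
    print(f"{k:>2} tests: {experimentwise_error_rate(k):.3f}")
```

The computed values agree with the table to within about .01. For eight tests, for example, the formula gives roughly .34, in line with the one-in-three chance described above.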

Selecting an Appropriate Statistical Method

The two key determinants in choosing a statistical method are (1) your research goal, and (2) the level of measurement of your outcome (or dependent) variable(s). Table 62-3 illustrates the key characteristics of the various levels of measurement and provides examples of each.

Table 62-3 Levels of Measurement of Variables

Level of Measurement	Description of Level	Examples
Continuous (also known as interval or ratio)	A scale on which there are approximately equal intervals between scores	• Beck Depression Scale • Diastolic blood pressure • Age of subject
Ordinal (also known as ranks)	A scale in which scores are arranged in order, but intervals between scores may not be equal	• Class ranking in school • Any continuous measure that has been converted to ranks
Nominal (also known as categorical)	Scores are simply names for different groups, but the scores do not imply magnitude. Often used to define groups based on experimental treatment or diagnosis	• Diagnostic category • Ethnicity • Zip code of residence
Dichotomous (also known as binary)	A special case of a nominal variable in which there are only two possible values	• Gender (M or F) • Survival (Y or N) • Response (Y or N)
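As Table 62-3 notes, a continuous measure can be converted to a lower level of measurement. The sketch below converts a set of hypothetical continuous scores to ranks (ordinal), giving tied scores the average of the ranks they span, and then to a yes/no variable (dichotomous). The scores, the cutoff of 20, and the function name are illustrative assumptions, not values taken from any particular scale.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; ties share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` across any run of tied scores.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Average of the 1-based positions pos + 1 through end + 1.
        shared_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared_rank
        pos = end + 1
    return ranks


# Hypothetical continuous scores for five subjects.
scores = [12, 30, 12, 7, 21]

ranks = average_ranks(scores)                  # ordinal
above_cutoff = [s >= 20 for s in scores]       # dichotomous (assumed cutoff)

print(ranks)
print(above_cutoff)
```

Each conversion discards information: the ranks no longer show how far apart two subjects' scores were, and the dichotomous variable shows only which side of the cutoff each subject fell on.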