EEG Interrater Reliability
James D. Geyer
Paul G. Cox
Paul R. Carney
INTRODUCTION
As with most facets of medicine, a number of myths surround electroencephalography and its interpretation. Many requesting physicians believe that the EEG can provide far more specific information than can be leveraged from the recorded data. Even more problematic is the belief that the interpretation of the EEG recording represents a definitive diagnostic answer. In fact, the EEG interpretation itself should be viewed as data requiring clinical correlation. At best, it is based on a sample of cortical activity detectable at scalp electrodes at a particular time with limited representation of the sleep and wake states. Furthermore, the interpreting physician and the individuals utilizing the data should recognize the limitations of interrater reliability in assigning significance to the EEG report. We should not become overconfident in the independent validity of our interpretations and should always have a healthy skepticism of their relationship to the truth, especially when they lack reasonable correlation with the clinical history and other data.
The observer/interpreter is an important source of error in the final report of an EEG.1–13 This issue has been a topic of concern among expert electroencephalographers for many years.14 Despite recognition of this problem by leaders in the field, interrater reliability continues to be poorly understood in the EEG community, and those further removed from reading EEGs may be completely unaware of the issue.
ASSESSMENT OF INTERRATER AGREEMENT (OR VARIABILITY)
The kappa coefficient has been described as the ideal statistic for quantifying agreement on dichotomous variables. The kappa calculation assumes that the rated items are independent. Kappa is calculated using the formula in Eq. 24.1, where Pr(a) is the relative observed agreement between raters and Pr(e) is the hypothetical probability of chance agreement, estimated from the observed responses as the probability that each observer assigns each category at random1:
κ = [Pr(a) − Pr(e)] / [1 − Pr(e)]    (Eq. 24.1)
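The kappa formula can be made concrete with a small worked example. The sketch below, with hypothetical labels from two EEG readers (the data and variable names are illustrative, not from the chapter), computes Pr(a) as the fraction of records on which the readers agree and Pr(e) from each reader's marginal label frequencies:

```python
# Hypothetical example: two EEG readers each label the same 10 records
# as epileptiform (1) or non-epileptiform (0). Data are illustrative.
from collections import Counter

rater_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

n = len(rater_a)

# Pr(a): relative observed agreement between the two raters
pr_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Pr(e): chance agreement from each rater's marginal label frequencies
counts_a = Counter(rater_a)
counts_b = Counter(rater_b)
categories = set(rater_a) | set(rater_b)
pr_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

kappa = (pr_a - pr_e) / (1 - pr_e)
print(round(kappa, 3))  # 0.583
```

Here the readers agree on 8 of 10 records (Pr(a) = 0.80), but because each labels 60% of records epileptiform, chance alone would produce agreement of Pr(e) = 0.52, so kappa credits only the agreement beyond chance.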
Fleiss’ kappa is a statistical measure for assessing the reliability of agreement among a fixed number of raters. Table 24.1 details the interpretation of the kappa score, from poor to nearly perfect agreement. Other kappa measurements, such as Cohen’s kappa, assess the agreement between two raters or the intrarater reliability of a single interpreter across repeated readings.15
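Fleiss’ kappa generalizes this idea to a fixed panel of raters. In the sketch below (a minimal illustration with invented data, not from the chapter), each row counts how many of 4 readers assigned an EEG segment to each of 3 categories; per-segment agreement is averaged and corrected for chance:

```python
# Hypothetical sketch of Fleiss' kappa: 4 readers classify 5 EEG
# segments into one of 3 categories. Entry counts[i][j] is the number
# of readers assigning segment i to category j. Data are illustrative.
counts = [
    [4, 0, 0],
    [2, 2, 0],
    [0, 4, 0],
    [1, 1, 2],
    [0, 0, 4],
]
n = len(counts)       # number of rated segments
k = sum(counts[0])    # raters per segment (fixed)

# p_j: overall proportion of assignments falling in category j
p = [sum(row[j] for row in counts) / (n * k) for j in range(len(counts[0]))]

# P_i: extent of agreement among the raters on segment i
P = [(sum(c * c for c in row) - k) / (k * (k - 1)) for row in counts]

P_bar = sum(P) / n              # mean observed agreement
P_e = sum(pj * pj for pj in p)  # chance agreement
kappa = (P_bar - P_e) / (1 - P_e)
print(round(kappa, 3))  # 0.549
```

A kappa near 0.55 would fall in the "moderate" band of the Landis and Koch scale referenced in Table 24.1.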
TABLE 24.1 Landis and Koch interrater reliability scores

Kappa | Strength of agreement
---|---
<0.00 | Poor
0.00–0.20 | Slight
0.21–0.40 | Fair
0.41–0.60 | Moderate
0.61–0.80 | Substantial
0.81–1.00 | Almost perfect