16.1 Machine Learning: An Answer to Historic Challenges in Psychiatry?
Advances in our understanding of the human body and in technology have revolutionized modern medicine, allowing us to treat many conditions that were once considered a death sentence. An improved understanding of biological processes and the development of disease biomarkers have led to the growth of “precision medicine,” which enables more objective diagnoses and individualized treatments that are more efficient and effective. The core concept of integrating precision medicine into the diagnosis and treatment of disease is now commonplace and growing in many areas of medicine, notably the use of genomics in oncology.

However, diagnosis and treatment in psychiatry remain largely dependent on observable and subjectively reported symptoms, without objective biomarkers (1, 2). In addition, individual variability among patients contributes to a wide variation in responses to psychiatric treatment. For example, after initial treatment, over 50% of patients with major depressive disorder do not reach remission (3–5). Psychiatric research studies have suggested that there are biologically defined “subgroups” or “biotypes” of mental disorders, an observation that has pushed for a shift toward classifying psychiatric conditions as “brain disorders” (2). In order to elucidate these subgroups, the National Institute of Mental Health developed the Research Domain Criteria (RDoC), which aim to determine the mechanisms that result in dysfunction through basic science rather than symptomatology. The RDoC framework calls for research that integrates behavioral, biological, and environmental factors to facilitate the development of objective measures of psychopathology (6).

Notably, such an undertaking requires massive data collection and analysis methods that go beyond the abilities of traditional statistical approaches. Consequently, machine learning (ML) techniques have provided a promising avenue for analyzing the large datasets acquired in psychiatric research and supporting new discoveries. Briefly, ML is a branch of computer science and artificial intelligence that involves developing and validating algorithms that learn patterns from large datasets and subsequently make predictions on previously “unseen” observations (7). Because of their ability to handle high-dimensional and large datasets, ML techniques and algorithms are well suited to play a key role in the redefinition of the clinical tools used in the diagnosis and treatment of mood disorders (7). In this chapter, we briefly discuss key concepts used in ML and explore how such concepts and the ensuing tools are used in the study and treatment of mood disorders.
16.2 Machine Learning Techniques
ML techniques can be classified into three broad categories, namely supervised ML, unsupervised ML, and reinforcement learning. In this section, we briefly explore these broad categorizations and introduce specific use cases for such methods in the context of research in mood disorders.
16.2.1 Supervised ML
In supervised learning, a ML algorithm is developed and “trained” using a set of observations with corresponding labels. For example, in the context of a mood disorders study, a set of observations may represent neuroimaging scan data from healthy controls and bipolar disorder (BD) patients coupled with corresponding labels (BD +1, healthy controls −1) (Figure 16.1). These observations are used to “train” an algorithm to recognize characteristics in the data (in this case, the neuroimaging scans) that differentiate the target groups (e.g., healthy controls vs. BD patients). The resulting “trained” algorithm is then evaluated using a subset of “novel” labeled observations not included in the algorithm “training” process (8, 9). The supervised ML techniques most commonly used in the mood disorders domain include support vector machines (SVMs), relevance vector machines (RVMs), Elastic Net, and the Least Absolute Shrinkage and Selection Operator (LASSO), among others, as highlighted in Table 16.1. Typical clinical and research applications of supervised ML currently include disease predictive classification (e.g., healthy vs. bipolar disorder (10)) and decoding of continuous clinical scales (e.g., Beck Depression Inventory (11)) using biological data such as neuroimaging scans.
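To make this workflow concrete, the sketch below trains a linear support vector machine on synthetic labeled data and evaluates it on held-out observations. It assumes the scikit-learn library; the feature matrix, group sizes, and effect size are simulated values chosen purely for illustration and do not come from any of the cited studies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Simulated feature matrix: 200 subjects x 500 imaging-derived features,
# with a small mean shift in the "patient" group (+1) vs. controls (-1).
n_subjects, n_features = 200, 500
y = np.repeat([1, -1], n_subjects // 2)
X = rng.normal(size=(n_subjects, n_features))
X[y == 1, :20] += 0.8  # hypothetical group difference confined to 20 features

# Hold out a test set that the classifier never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Train a linear support vector machine and evaluate on the held-out set.
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("held-out accuracy:", round(accuracy_score(y_test, clf.predict(X_test)), 2))
```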
Table 16.1 Common methods used in machine learning pipelines
Methods | Model details and categorization (e.g., supervised or unsupervised) |
---|---|
Linear regression models | Supervised (e.g., LASSO and Elastic Net regularized regression) |
Linear and nonlinear kernel-based models | Supervised (e.g., support vector machines, relevance vector machines) |
Decision trees | Supervised (including tree ensembles such as Random Forest and gradient boosting) |
Additive models | Supervised |
Artificial neural networks and deep learning | Supervised or unsupervised |
Multivariate data dimensionality reduction | Unsupervised (e.g., principal component analysis) |
Multidimensional data clustering | Unsupervised (e.g., K-means, hierarchical clustering) |
Model evaluation metrics | Used to assess trained models (e.g., accuracy, sensitivity, specificity, area under the curve) |
16.2.2 Unsupervised Learning
Unlike supervised ML, where the input data are labeled (e.g., disease +1 vs. healthy −1), in unsupervised ML the input data are not labeled, and the main goal is to find hidden patterns within a dataset. To this end, unsupervised ML techniques largely utilize data dimensionality reduction methods (e.g., principal component analysis) coupled with data clustering techniques (e.g., K-means) to identify hidden patterns and clusters. Unsupervised ML techniques have recently been used to identify unique biological groupings or clusters in mood disorders, also known as “biotypes” (12).
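The sketch below illustrates this two-step pattern, reducing a simulated high-dimensional dataset with PCA and then clustering the reduced representation with K-means; the number of components, cluster count, and embedded subgroup structure are arbitrary assumptions made for demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Simulate 300 unlabeled subjects with a hidden two-group structure
# on a subset of 1,000 high-dimensional features.
X = rng.normal(size=(300, 1000))
X[:150, :30] += 1.0  # hypothetical hidden subgroup difference

# Step 1: reduce dimensionality; Step 2: cluster in the reduced space.
X_reduced = PCA(n_components=10, random_state=1).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X_reduced)

print("subjects assigned to each cluster:", np.bincount(labels))
```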
16.2.3 Reinforcement Learning
Reinforcement learning entails “training” an algorithm to take specific actions that maximize cumulative rewards. Notably, these algorithms mimic the human decision-making process, in which there is often an arbitrary number of actions to choose from, by learning from positive outcomes (i.e., reward) or negative outcomes (i.e., punishment). Typical examples of reinforcement learning applications have included mapping positive and negative prediction errors to the firing of dopaminergic neurons in mood and affective disorders (13, 14). More recently, reinforcement learning algorithms have increasingly been used to select optimal treatments (e.g., antidepressants), as they mimic the trial-and-error process used to select treatments in clinical practice.
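As a toy illustration of this trial-and-error logic, the following sketch runs an epsilon-greedy multi-armed bandit over three hypothetical “treatments” with made-up response probabilities; it is a simplified simulation, not a clinically validated treatment-selection algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Three hypothetical treatments with made-up probabilities of a positive response.
true_response_rates = np.array([0.35, 0.50, 0.65])
n_treatments = len(true_response_rates)

counts = np.zeros(n_treatments)   # how often each treatment has been tried
values = np.zeros(n_treatments)   # running estimate of each response rate
epsilon = 0.1                     # probability of exploring a random treatment

for step in range(2000):
    if rng.random() < epsilon:
        arm = int(rng.integers(n_treatments))   # explore
    else:
        arm = int(np.argmax(values))            # exploit the current best estimate
    reward = float(rng.random() < true_response_rates[arm])  # 1 = response, 0 = no response
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean update

print("estimated response rates:", np.round(values, 2))
print("most frequently selected treatment:", int(np.argmax(counts)))
```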
Beyond the three categories of ML algorithms highlighted earlier (i.e., supervised, unsupervised, and reinforcement learning), there are three overarching concepts that guide practitioners in establishing and validating ML algorithms before they are reported in research products or deployed for clinical purposes. We introduce these concepts below.
16.2.4 Selection of Algorithm Training and Validation Samples
An “objective” ML algorithm is one that is able to “generalize” its results to a novel sample that it was not previously exposed to. Therefore, the first step in developing an “objective” ML algorithm is to split the dataset into independent “training” and “validation” sets. The “training” set is used to “train” the algorithm by identifying the best algorithm parameters, whilst the “validation” set is used to establish whether the final algorithm/model generalizes by making accurate and objective predictions. Consequently, it is common practice to separate a dataset into these two groups before embarking on a ML project.
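A minimal sketch of this first step, assuming scikit-learn and simulated data, is shown below; the stratified split keeps the class balance similar across the training and validation sets, and the validation set is touched only once, after training is complete.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 50))      # simulated predictors (e.g., imaging features)
y = rng.integers(0, 2, size=100)    # simulated binary labels

# Stratified 80/20 split: the validation set is set aside before any modeling
# and is used only once, to assess the final model.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=3)

print("training subjects:", len(y_train), "| validation subjects:", len(y_val))
```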
16.2.5 Feature and Data Dimensionality Reduction
Raw data, particularly in psychiatric research domains such as neuroimaging and genomics, are often acquired in high dimensions (e.g., >100,000 voxels) and may also contain measurement noise. In the context of ML, this is referred to as the “curse-of-dimensionality” or “small-n-large-p” problem, in which there is a significantly larger number of predictors (e.g., neuroimaging voxels) than observations (i.e., subjects) (15). This may greatly hamper a ML algorithm, which may fit noise in the training data rather than the true underlying pattern and consequently fail to generalize, a problem known as overfitting (15). Therefore, to circumvent this problem, data dimensionality reduction and feature reduction tools such as principal component analysis (PCA) or univariate t-tests, among other techniques, are often employed to extract a subset of features or predictors (e.g., neuroimaging voxels) that are meaningful to the ML task at hand. The subset of features or predictors extracted using these techniques is subsequently used to “train” a ML model instead of the original raw data. Previous research in this domain has shown that feature reduction techniques lead to ML models with higher accuracy and better generalization ability, that is, more accurate predictions on previously “unseen” observations in a validation sample.
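The sketch below illustrates one such feature reduction strategy on simulated small-n-large-p data: a univariate filter (an ANOVA F-test, closely related to the univariate t-test mentioned above) followed by a linear classifier. Wrapping both steps in a single scikit-learn pipeline ensures the feature selection is fit on training folds only; all dataset dimensions and the number of retained features are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)

# Small-n-large-p setting: 80 subjects, 5,000 simulated features.
X = rng.normal(size=(80, 5000))
y = np.repeat([0, 1], 40)
X[y == 1, :25] += 0.7   # hypothetical signal restricted to 25 features

# Univariate filter (ANOVA F-test) keeps the 100 most discriminative features,
# then a linear SVM is trained on that reduced feature set.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=100)),
    ("clf", SVC(kernel="linear")),
])
scores = cross_val_score(pipe, X, y, cv=5)
print("cross-validated accuracy:", round(scores.mean(), 2))
```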
16.2.6 Model Training and Parameter Optimization
Training a ML algorithm entails establishing parameters that maximize prediction accuracy and promote model generalizability to a novel or previously “unseen” sample. To achieve this goal, it is common practice to use cross-validation methods that support selection of “best-fit” parameters. In N-fold cross-validation (e.g., 10-fold or 5-fold), the data are randomly separated into N subgroups; the algorithm is “trained” on N − 1 subgroups and tested on the left-out subgroup. This is repeated so that each subgroup is left out once, and model prediction errors and accuracy are estimated across the N folds. Upon completion, the model and model parameters with the highest accuracy or lowest error are selected to establish the final model, and the final accuracy on a validation sample determines the generalizability of the model. In Table 16.1, we have briefly outlined key ML techniques and their categorizations.
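The following sketch shows 5-fold cross-validation used to choose a single model parameter (the SVM regularization strength C) on a training set, followed by one evaluation on a held-out validation set; the candidate parameter grid and the simulated data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 200))
y = np.repeat([0, 1], 75)
X[y == 1, :15] += 0.9   # hypothetical group difference

# Keep an independent validation set aside before any parameter tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=5)

# 5-fold cross-validation on the training set selects the "best-fit" C.
search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)

print("best C:", search.best_params_["C"])
print("validation accuracy:", round(search.score(X_val, y_val), 2))
```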
16.3 Applications of Machine Learning Techniques to Neuroimaging and Clinical Data in Mood Disorders
16.3.1 Diagnostic Classification of Mood Disorders, Decoding Clinical Variables, Identification of Unique Disease Subtypes and Supporting Mechanistic Understanding
Despite recent progress, our understanding of the mechanistic pathophysiology of major mood disorders such as BD and major depressive disorder (MDD) remains limited. Early neuroimaging studies used mass-univariate statistical methods coupled with neuroimaging scan data to elucidate critical insights into brain structural and functional differences between patients with mood disorders and healthy controls. For example, these studies reported fronto-limbic structural abnormalities in BD patients (44). Volumetric and structural connectivity abnormalities in the anterior cingulate cortex (ACC) have also been reported in patients with MDD (45). More recently, neuroimaging studies have leveraged ML techniques to classify or distinguish individual patients with mood disorders from healthy controls. For example, in a systematic review of fifty-one research studies, Librenza-Garcia and colleagues observed that ML coupled with structural and functional neuroimaging scans can accurately differentiate BD patients from healthy controls and from other psychiatric diagnoses such as MDD (10). Another recent systematic review observed gray matter volume reductions in the bilateral insula, right superior temporal gyrus, bilateral anterior cingulate cortex, and left superior medial frontal cortex in MDD and BD patients as compared to healthy controls (44). Predictive white matter abnormalities in the genu of the corpus callosum were also observed in both MDD and BD patient groups (44).

Other studies have attempted to predict or decode continuous clinical rating scales from neuroimaging scans, followed by an examination of the brain regions involved in predicting such scales. For example, Mwangi and colleagues (11) reported prediction of the self-reported Beck Depression Inventory (BDI) in patients with MDD using structural neuroimaging scans coupled with a kernel-based relevance vector regression ML algorithm. This study reported a correlation between actual and predicted BDI scores of Pearson r = 0.694 (p < 0.0001), and the medial frontal cortex, superior temporal gyrus, and parahippocampal gyrus were heavily involved in decoding the BDI scores. In another study (46), the BDI and the Snaith-Hamilton Pleasure Scale (SHAPS) were accurately predicted in a cohort of fifty-eight patients with MDD using a supervised linear regression ML technique and functional connectivity data, and several functional networks associated with anhedonia and negative mood were identified as the main contributors. Another study predicted the Functioning Assessment Short Test (FAST) score (47) in a cohort of thirty-five patients with BD type I using a supervised support vector regression ML algorithm and structural neuroimaging scan data (48). The FAST score measures functional impairment in BD and was predicted by volumetric reductions in the left superior and left rostral medial frontal cortex as well as right lateral ventricular enlargement, indicating that a supervised ML algorithm together with structural neuroimaging scans can predict functional impairment in BD patients.
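As a schematic of this “decoding” approach, the sketch below predicts a simulated continuous clinical score from simulated imaging features with support vector regression (used here as a readily available stand-in for the kernel-based relevance vector regression in the cited work) and reports the correlation between actual and cross-validation-predicted scores, mirroring how such studies are typically evaluated. All data and parameters are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(6)
n_subjects, n_features = 120, 300

# Simulated imaging features and a continuous clinical score that depends
# (noisily) on the first 10 features.
X = rng.normal(size=(n_subjects, n_features))
weights = np.zeros(n_features)
weights[:10] = 1.0
score = X @ weights + rng.normal(scale=2.0, size=n_subjects)

# Cross-validated predictions of the score, then correlate actual vs. predicted.
predicted = cross_val_predict(SVR(kernel="linear", C=1.0), X, score, cv=5)
r, p = pearsonr(score, predicted)
print(f"Pearson r = {r:.2f}, p = {p:.1e}")
```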
Similarly, multinational studies from the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium have reported successful diagnostic classification of MDD (49) and BD (48, 50) patients as compared to healthy controls using neuroimaging scans from thousands of patients acquired at multiple centers around the world.
Recently, there has been a shift in psychiatric research toward the identification of data-driven disease subtypes, also referred to as phenomapping, which has partly been inspired by the NIMH’s RDoC framework (6). To this end, researchers have leveraged unsupervised ML techniques such as multivariate data dimensionality reduction coupled with high-dimensional data clustering algorithms capable of identifying unique disease subtypes in BD and MDD. For instance, Wu and colleagues (37) used an unsupervised ML approach to cluster neurocognitive data derived from BD-I and BD-II patients into two distinct subtypes. The data-derived subtypes were subsequently validated using a linear regression Elastic Net ML algorithm coupled with fractional anisotropy (FA) and mean diffusivity (MD) measures from brain diffusion tensor imaging (DTI), with 92% and 75.9% accuracy, respectively. Abnormalities in the inferior fronto-occipital fasciculus and forceps minor of the corpus callosum white matter tracts of patients with BD were found to be the major contributors in separating the two data-derived subtypes of BD from healthy controls. In another study (51), a data-driven approach was used to identify transdiagnostic subtypes of mood disorders that span multiple clinical diagnoses; this study applied a hierarchical data clustering algorithm to identify unique subgroups that were subsequently validated using an independent sample. However, despite promising results from the phenomapping literature, we should remain cautiously optimistic, as attempts to replicate such disease subtypes in independent samples have in some cases not been successful (52).
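A simplified sketch of this phenomapping-and-validation design is given below: hierarchical (agglomerative) clustering derives two subtypes from one simulated feature set, and an elastic-net penalized logistic regression then tests whether an independent simulated feature set separates them. Both feature sets, the embedded subtype structure, and the choice of two clusters are illustrative assumptions rather than a reproduction of the cited analyses.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_subjects = 200
latent = np.repeat([0, 1], n_subjects // 2)   # hidden subtype structure

# Feature set used to *discover* subtypes (e.g., neurocognitive measures).
X_discover = rng.normal(size=(n_subjects, 20)) + latent[:, None] * 1.2
# Independent feature set used only to *validate* the discovered subtypes.
X_validate = rng.normal(size=(n_subjects, 100))
X_validate[:, :10] += latent[:, None] * 0.8

# Step 1: hierarchical clustering derives two data-driven subtypes.
subtypes = AgglomerativeClustering(n_clusters=2).fit_predict(X_discover)

# Step 2: an elastic-net penalized classifier tests whether the independent
# features separate the derived subtypes.
validator = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0, max_iter=5000)
acc = cross_val_score(validator, X_validate, subtypes, cv=5).mean()
print("cross-validated subtype separability:", round(acc, 2))
```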
16.3.2 Prediction of Treatment Response
The prediction of treatment response, such as identifying individual patients with MDD who are likely to respond to a particular antidepressant, is a well-documented problem in psychiatry (53, 54). Therefore, in the past decade, a plethora of studies in mood disorders have employed ML techniques to predict individual patients’ likelihood of responding to antidepressants or mood stabilizers. For instance, Webb and colleagues (55) examined whether a ML technique could recommend individualized treatment in an eight-week trial of sertraline versus placebo in a cohort of 216 depressed individuals. This study observed that a ML technique can identify a subset of MDD patients optimally suited for sertraline based primarily on a few clinical and demographic variables. Another study using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset (56) developed a supervised gradient boosting ML algorithm to predict which patients may benefit from citalopram following a twelve-week course of treatment. The ML algorithm achieved an accuracy of 64.6%, and twenty-five clinical variables selected by an Elastic Net ML algorithm were the top contributors to the observed accuracy. Numerous other studies have used a similar supervised ML approach to predict patients’ likelihood of response to antidepressants in MDD (57–60), electroconvulsive therapy in MDD (61, 62), and lithium in BD (63) using structural/functional neuroimaging scans, electroencephalogram (EEG), and clinical/demographic data. Although it is not yet common practice in psychiatry, recent studies in oncology have begun to use reinforcement learning algorithms to implement automated adaptive radiation protocols in treating lung cancer (64). The reinforcement learning approach may be particularly well suited for adaptive protocols in MDD, as it mimics the current gold standard of selecting optimal antidepressants through a “trial and error” process (65). Lastly, although there has been significant progress in optimizing treatments for patients with mood disorders using ML techniques, the majority of studies have largely used retrospective data, and the resulting ML models have not been translated into actual clinical practice.
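To illustrate the general shape of such treatment-response models, the sketch below trains a gradient boosting classifier on simulated clinical/demographic variables and inspects which variables drive the prediction. The simulated “response” rule, sample size, and number of variables are invented for demonstration and are unrelated to the STAR*D data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(8)
n_patients, n_variables = 400, 25   # e.g., symptom items and demographics

X = rng.normal(size=(n_patients, n_variables))
# Invented rule: "response" depends weakly on the first five variables plus noise.
response = (X[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n_patients) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, response, test_size=0.3, stratify=response, random_state=8)

model = GradientBoostingClassifier(random_state=8).fit(X_train, y_train)
pred = model.predict(X_test)

print("balanced accuracy:", round(balanced_accuracy_score(y_test, pred), 2))
# Feature importances indicate which variables drive the prediction.
print("top variables:", np.argsort(model.feature_importances_)[::-1][:5])
```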
16.3.3 Prediction of Other Clinical Outcomes Such As Suicide, Medication Side Effects and Clinical Staging
ML techniques have also been a powerful asset in assessing and predicting other clinical outcomes such as suicidality and medication side effects, and, to some extent, recent studies have been successful at establishing disease stages. Two recent studies used large electronic medical record (EMR) datasets as input predictors with a number of supervised ML algorithms (e.g., Elastic Net, Random Forest, and LASSO) and managed to predict suicide risk among patients in a psychiatric hospital or emergency department with specificity and sensitivity greater than 0.7 (66, 67). Interestingly, Passos and colleagues (68) reported accurate identification of individual suicide attempters (accuracy = 72%, sensitivity = 72.1%, and specificity = 71.3%) in a preliminary study of a cohort of 144 patients with BD and MDD. The kernel-based relevance vector ML technique used in this study identified previous hospitalizations for depression, a history of psychosis, cocaine dependence, and posttraumatic stress disorder (PTSD) comorbidity as the most relevant predictors of suicide attempt in mood disorders. This further highlights that ML techniques can not only aid in the prediction of psychiatric patients at risk of attempting suicide but can also guide researchers to the clinical factors that contribute to such events and open novel avenues for clinical interventions.

Prediction of medication side effects has also shown promise as a prime application for ML techniques. For example, although lithium is a first-line treatment in BD, the risk of developing renal insufficiency reportedly discourages its use (69). A study of 5,700 patients receiving lithium treatment reported that a regression ML technique applied to EMR data was able to predict renal insufficiency risk with an area under the curve (AUC) of 0.81 (69). The authors observed that older age, female sex, history of smoking, history of hypertension, overall burden of medical comorbidity, and diagnosis of schizophrenia or schizoaffective disorder were the major contributing factors in predicting renal insufficiency among those receiving lithium treatment. This highlights that such ML tools can support clinicians in making informed decisions and facilitate the development of strategies that reduce negative outcomes such as side effects.

Lastly, we highlight the use of ML techniques in predicting and validating disease stages in mood disorders. A recent study showed that structural brain scans can distinguish BD patients from healthy controls and found that a subgroup of patients characterized by more lifetime manic episodes and psychiatric hospitalizations had markedly higher gray and white matter density loss (70). The authors concluded that ML coupled with structural neuroimaging scans is able to stratify BD patients into clinical stages (e.g., early-stage vs. late-stage BD) in line with the recently proposed clinical staging model of BD (71–74).
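Because these studies are typically summarized by sensitivity, specificity, and AUC, the sketch below shows how such metrics are computed for a simple penalized logistic regression risk model on simulated EMR-style predictors; the data, decision threshold, and model choice are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(9)
n_patients, n_predictors = 1000, 40   # simulated EMR-style predictors

X = rng.normal(size=(n_patients, n_predictors))
# Invented rule: the outcome depends weakly on the first four predictors.
outcome = (X[:, :4].sum(axis=1) + rng.normal(scale=2.0, size=n_patients) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, outcome, test_size=0.3, stratify=outcome, random_state=9)

model = LogisticRegression(penalty="l1", solver="liblinear").fit(X_train, y_train)
prob = model.predict_proba(X_test)[:, 1]
pred = (prob >= 0.5).astype(int)   # illustrative 0.5 decision threshold

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print("sensitivity:", round(tp / (tp + fn), 2))
print("specificity:", round(tn / (tn + fp), 2))
print("AUC:", round(roc_auc_score(y_test, prob), 2))
```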
