Controversies in Artificial Intelligence in Neurosurgery

Artificial intelligence (AI) has evolved from science fiction to a technology infiltrating everyday life. In neurosurgery, clinicians and researchers are exploring ways to implement this powerful tool to improve the safety and efficiency of the perioperative process. Current applications include preoperative diagnosis, intraoperative detection and recommendations, and technical skills assessment and feedback. Although the potential benefits are evident, AI integration into neurosurgical workflows requires discussions around ethical regulations, cybersecurity, privacy concerns, and data and algorithm ownership.

Key points

• Artificial intelligence (AI) may become the cornerstone of technology that helps clinicians with decision-making and increases situational awareness.
• There is growing literature and evidence on AI’s utility in assessing surgeons’ and trainees’ technical ability.
• The surgical society may integrate scaled and standardized data collection to yield the best outcomes in AI integration in surgery.
• Future operating rooms may become fully AI-integrated to monitor the intraoperative process, aiding the surgeon to increase safety and efficiency.

Introduction

Surgical practice has transcended its origins in rudimentary techniques, rituals, and shared beliefs, evolving into a domain driven by scientific rigor and technological advancement. Although human intellect has guided this progression to date, we now face the rise of advanced computational systems that can supplement our intelligence. Recent strides in data availability, computational prowess, and scientific innovation herald a new epoch of artificial intelligence (AI) systems with far-reaching effects across society. These AI systems, adept at performing large-scale data analyses through methods often inscrutable to human comprehension, present unprecedented opportunities for scientific and practical medical advancements. Can AI systems and surgeons collaborate to enhance the safety and efficiency of neurosurgical procedures?

Across medicine, AI is often touted as offering unprecedented opportunities for precision, efficiency, and personalized care (Fig. 1). The future of the neurosurgical operating room may require a fusion of human and machine intelligence to achieve lasting breakthroughs. By integrating high-fidelity AI applications with comprehensive intraoperative data, AI can be a valuable tool for monitoring the operating room, identifying critical information, and intervening in suboptimal scenarios. AI technologies, such as machine learning and computer vision, are increasingly employed in various stages of surgical procedures, from preoperative planning to intraoperative assistance and postoperative monitoring. These technologies offer the promise of significant improvements in patient outcomes and operational efficiencies. However, the rapid adoption of AI in surgery also brings forth a multitude of controversies and challenges that need to be addressed.

In this article, we explore vital controversies shaping the discourse around AI in neurosurgery as of mid-2024, providing a balanced examination of the pros and cons of using AI inside and outside the operating room. These controversies present a complex interplay of potential benefits and serious concerns. Ethical implications, data privacy and security, effects on surgical training and employment, regulatory and validation challenges, and bias and fairness in AI algorithms all require significant consideration. For instance, if AI supplants surgeons to enhance surgical precision and reduce human error, who is ultimately accountable for its decisions and actions? Similarly, the extensive use of patient data to train AI systems can lead to more personalized treatments but can also heighten the risk of data breaches and privacy violations. In addressing these controversies, we provide a comprehensive overview of the current landscape and highlight critical considerations that will define the future of AI in neurosurgery.

What is artificial intelligence?

First, we define critical terms such as AI, machine learning (ML), and deep learning. At its core, AI refers to the demonstrated capability of a machine to reason without explicit instruction, imitating human behavior. AI is rapidly transforming numerous fields, including health care, finance, and technology. AI can be seen today in various sub-disciplines, including natural language processing, computer vision, and robotics. These disparate use cases are united by a desire to create machine systems capable of performing tasks that typically require human intelligence (Fig. 2).

In machine learning (ML, a subset of AI), algorithms enable computers to learn from data to make predictions or classifications. Unlike traditional programming, where explicit instructions are given, ML algorithms learn from data to identify patterns and relationships within datasets. A subset of machine learning called deep learning uses a computational system called a neural network with many layers and interconnections that are similar in general principle to the functioning of the human brain. Although many other types of machine learning systems exist, neural networks have come to dominate the field of machine learning in recent years. The primary types of machine learning include supervised, unsupervised, and reinforcement learning.

In supervised learning, algorithms are trained on labeled data, meaning the input data are presented to the algorithm paired with the correct output. For example, an input data point might be a text or an image from a computed tomography (CT) scan. The label would be the diagnosis associated with that text or image, such as “subarachnoid hemorrhage.” A given input can have multiple labels depending on the data structure. The goal of a supervised algorithm is to learn the relationship between input data and their labels to generalize to new, unseen data with the same underlying relationships. Typical applications include image classification, speech recognition, and medical diagnosis. For instance, in medical imaging, a supervised learning model might be trained on thousands of pairs of MRI images and pathology reports to predict the text of a pathology report from a previously unseen MRI image.
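
To make the idea of learning from labeled pairs concrete, the following is a minimal sketch of a supervised classifier, assuming a deliberately simple nearest-centroid method rather than any system used clinically. The two numeric features per scan and all values are hypothetical, and the function names `fit_centroids` and `predict` are ours.

```python
from collections import defaultdict
from math import dist

def fit_centroids(features, labels):
    """Learn one mean feature vector (centroid) per label from labeled examples."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, x):
    """Assign an unseen example to the label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Hypothetical, made-up features per scan: (mean intensity, fraction of bright pixels)
train_x = [(0.80, 0.30), (0.75, 0.35), (0.85, 0.28),
           (0.20, 0.02), (0.25, 0.05), (0.15, 0.03)]
train_y = ["hemorrhage", "hemorrhage", "hemorrhage",
           "normal", "normal", "normal"]

model = fit_centroids(train_x, train_y)   # training uses inputs AND labels
prediction = predict(model, (0.78, 0.31)) # inference on a new, unseen input
```

The essential point is that the labels are available during training. Real diagnostic models learn far richer representations directly from images, but the train-then-generalize pattern is the same.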

On the other hand, unsupervised learning refers to algorithms that discover patterns in data without being presented with explicit labels. These unsupervised models find a hidden structure or pattern within the data. Techniques such as clustering and dimensionality reduction fall under this category. For example, we might be able to identify common patterns among cells on tumor histology slides to separate slides containing cancer cells from those without cancer, even without knowing which pathologic specimens are cancerous or non-cancerous. Once the unsupervised algorithm separates the 2 groups, an expert can review them and say which group is which.
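
The clustering step can be sketched as follows, assuming a basic k-means procedure with 2 clusters. The slide features and values are hypothetical, and seeding the centers with the first and last points is a simplification chosen only to keep the example deterministic.

```python
from math import dist

def two_means(points, iterations=10):
    """Split points into 2 clusters with k-means; no labels are used."""
    # Simplifying assumption: seed the centers with the first and last points.
    centers = [points[0], points[-1]]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points (keep it if empty).
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters, centers

# Hypothetical, made-up features per slide: (nuclear density, mean nuclear size)
slides = [(0.90, 0.80), (0.85, 0.90), (0.95, 0.85),
          (0.20, 0.10), (0.15, 0.20), (0.10, 0.15)]

clusters, centers = two_means(slides)
```

The algorithm returns 2 unnamed groups. Deciding which group corresponds to cancerous tissue remains the expert review step described above.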

Most commonly utilized in robotics, reinforcement learning involves training an agent to make a sequence of decisions by rewarding it for desirable actions and penalizing it for undesirable ones. This type of learning is loosely inspired by behavioral psychology and is used in various applications in medicine where behavior can be modeled. Using reinforcement learning, agents or computational representations of human users can be taught to perform biomedical tasks, such as functions within the electronic medical record.
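
The reward-and-penalty loop can be illustrated with a toy example, assuming tabular Q-learning, one common reinforcement learning method. The 5-cell corridor environment, the reward values, and the hyperparameters are all hypothetical and have no clinical meaning.

```python
import random

N_STATES = 5                 # cells 0..4 on a line; cell 4 is the goal
ACTIONS = ("left", "right")

def step(state, action):
    """Move one cell; reaching the goal pays +1, every other move costs 0.01."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else -0.01
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the long-run value of taking action in state.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward = step(state, action)
            # Q-learning update toward reward plus discounted best future value.
            target = reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# Greedy policy learned for each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The agent is never told that moving right is correct. It infers this from the rewards it accumulates, which is the defining difference from supervised learning.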

In addition to the specific-purpose models described earlier, a new category of AI models called generative AI has emerged. The vital feature of generative AI is that the output of a generative AI model is itself new content of the kind that could serve as input to a model. For example, rather than generating a numerical prediction (“5 cases of brain aneurysm”) or a classification of a piece of text (“patient is not disabled”), a generative text model might generate new text as its output, allowing for conversational interactions. Large language models are a type of generative model that produces text, whereas other generative models are capable of producing art and other images. Some generative models convert between data modalities, such as converting text to images. In contrast, others convert within a single modality, such as transforming an MRI into a simulated CT scan of the same patient.

A final subtype of the generative AI model is the foundation model. These models are not optimized for a single task, such as classifying whether an image is a cat or a dog, but can perform many different tasks. Many generative AI models, such as large language models, can be considered foundation models because they can perform a wide range of tasks, from answering questions in text to creating art or images.

Artificial intelligence systems outside the operating room: pathologic and radiographic image interpretation

The most immediate application of AI in neurosurgery is the ability of AI systems to perform image classification and prediction tasks. The high performance of AI image classifiers on pathologic and radiographic interpretation represents a transformative advancement in medical diagnostics—and the first example of where human practitioners will be forced to grapple with AI systems in medical practice. For instance, we can see these systems used for stroke detection from CT scans and finding tumor cells on digital pathology slides. Using AI systems might lead to earlier and more accurate diagnoses, potentially improving patient care.

However, the increasing reliance on AI for critical medical decisions raises significant ethical concerns about accountability and autonomy. There is a risk that AI might make decisions that are difficult for humans to interpret or contest, leading to potential moral and legal dilemmas. As AI continues to evolve and integrate into medical practice, these challenges must be addressed to ensure its benefits are realized without compromising ethical standards.

Accuracy and reliability

AI algorithms have demonstrated remarkably high levels of accuracy in interpreting pathologic and radiographic images. These systems can analyze vast amounts of data rapidly and consistently, identifying subtle patterns and anomalies that human practitioners might miss. This capability allows for earlier and more accurate diagnoses, which is crucial in improving patient outcomes and facilitating personalized treatment plans. By leveraging AI, health care providers can enhance their diagnostic accuracy and efficiency, leading to better-informed clinical decisions and optimized patient care.

However, despite their high accuracy, AI systems are capable of error, and concerns about the reliability of these algorithms in diverse clinical settings persist. Variability in image quality, differences in patient populations, and other contextual factors can significantly impact the performance of AI models. Additionally, the “black box” nature of some AI systems, meaning that a human cannot view the explicit decision-making process of the system, can pose significant challenges. This lack of transparency can make it difficult for clinicians to trust and validate AI-generated results, potentially leading to hesitation in adopting these technologies in clinical practice. It is not clear that adopting AI technology will improve the performance of radiologist-AI or pathologist-AI teams equally. Ensuring the reliability and transparency of AI systems is essential for their successful integration into health care.

Training data and bias

AI systems trained on large and diverse datasets have shown the ability to generalize well to new data, which can significantly reduce diagnostic disparities and improve care across different patient demographics. By systematically analyzing extensive datasets, AI can standardize interpretations and reduce the variability often seen between different observers. This consistency is particularly beneficial in medical diagnostics, where uniformity in interpretation can lead to more equitable and accurate patient outcomes. The ability of AI to generalize across various data points ensures that diagnostic tools remain effective in diverse clinical scenarios, potentially enhancing the overall quality of health care.

However, the benefits of AI in improving diagnostic consistency and reducing disparities are contingent upon the representativeness of the training data. If the data used to train these AI systems do not reflect the broader patient population, there is a risk that these models could perpetuate or even exacerbate existing biases in health care. For instance, an AI model predominantly trained on data from one demographic group may exhibit poor performance when applied to images from patients of other demographic groups, leading to diagnostic inaccuracies and disparities in treatment. Ensuring that AI systems are trained on diverse and comprehensive datasets is crucial, yet it presents significant challenges. Achieving this level of diversity requires extensive data collection and curation efforts to encompass a wide range of variables in the broader patient population.
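
One simple way to look for the kind of performance gap described above is to report accuracy separately for each demographic group rather than as a single pooled number. The sketch below assumes binary labels and uses made-up data. The group names "A" and "B" and the function `accuracy_by_group` are hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: fraction of correct predictions} for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical, made-up ground truth, model predictions, and group membership
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = accuracy_by_group(y_true, y_pred, groups)
gap = max(report.values()) - min(report.values())  # largest between-group difference
```

In this toy data the pooled accuracy of 75% hides the fact that the model is right for every patient in group A but for only half of group B. Accuracy is only one possible metric, and sensitivity or specificity by group may matter more for a given diagnostic task.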

Integration into clinical workflow and hospital operations

AI can enhance the efficiency of clinical workflows by quickly and accurately interpreting images, allowing radiologists and pathologists to focus on more complex cases. This can reduce workload and burnout among clinicians, improving overall health care delivery. If AI can streamline diagnostic processes, its implementation might reduce diagnostic errors and the need for repeat imaging. If the promise of AI is realized in practice, by improving the efficiency and accuracy of image interpretation, AI can contribute to more cost-effective health care delivery.

On the other hand, integrating AI into existing clinical workflows presents several challenges. Most AI algorithms are difficult to integrate within electronic health record systems and other health care information technology infrastructure. Additionally, clinicians may require training to effectively use AI tools, and there may be resistance to adopting new technologies due to concerns about job security and changes in traditional roles. The development, implementation, and maintenance of AI systems can be expensive. Smaller health care facilities or those in resource-limited settings may find it challenging to invest in AI technologies. Lastly, the economic benefits of AI must be balanced against the potential impact on employment, as the automation of certain tasks could affect jobs in radiology and pathology.

Artificial intelligence systems within the operating room

Due to success in image classification and prediction tasks outside of the operating room, AI has been proposed as a tool to study visual data collected within the operating room (Fig. 3). Surgical videos store detailed information capturing live, meticulous, high-stakes procedural work and critical anatomic structures. In current surgical practice, surgical video recording is used only in a limited fashion for skills assessment, training, and quality improvement. This limited use is mainly due to the time-consuming, subjective evaluation required of human experts, whereas automated systems such as AI promise both a more objective and less human-dependent quantification of surgical processes and a data-driven automated assessment. AI may present itself as a valuable and powerful tool to leverage video data, understand composites of surgical expertise, and provide feedback to the surgeon and the team without needing continuous human input.