Development of the Quantified Human



$$ T = a + b\,\log_2\!\left(\frac{2D}{W}\right) \qquad (7.1) $$



where T is the average time required to complete the movement, D (the distance to the target) divided by W (the width of the target) serves as a proxy for accuracy, and a and b are device-dependent constants.
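
To make the relationship concrete, here is a small worked example; the device constants a and b below are hypothetical values chosen only for illustration.

```python
import math

def fitts_movement_time(a, b, distance, width):
    """Predicted movement time (s) from Fitts' law: T = a + b * log2(2D / W)."""
    index_of_difficulty = math.log2(2 * distance / width)  # in bits
    return a + b * index_of_difficulty

# Hypothetical pointing device: a = 0.10 s, b = 0.15 s/bit.
# Reaching 200 mm to a 20 mm wide target gives an index of difficulty of ~4.32 bits,
# so the predicted movement time is roughly 0.75 s.
print(fitts_movement_time(a=0.10, b=0.15, distance=200, width=20))
```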

Fitts’ law showed a linear relationship between task difficulty and movement time that has proved to be remarkably robust. Although there have been minor modifications since then, the mathematical relationship applies under a variety of conditions, with different limbs, and holds true even without overt motor movements [11]. More fundamentally, it advanced Shannon’s work by providing the first empirical determination of the information capacity of the human motor system [12]. Providing a mathematically valid description of human performance was not only revolutionary for its time, it continues to be relevant and advantageous today.

Despite the cognitive revolution, much of the human sciences have remained rooted in empirical versus theory-driven studies. Indeed, as critical as the development and application of information theory has been, it has largely remained more descriptive than explanatory [13]. MacKenzie [14] stated that “despite being robust and highly replicable, Fitts’ law remains an analogy waiting for a theory.” In many ways, the real fruits of the cognitive revolution have yet to be picked. What distinguishes an engineering discipline is an objective feedback control mechanism. What’s needed now is to “close the loop,” where the physical and mental states of the operator are fed back into the machine, making the human a more seamless part of the overall system.

John B. Watson, in his explanation of the goals of behaviorism, says, “Its theoretical goal is the prediction and control of behavior [15].” Without that, as Donald Kennedy so aptly put it, neuro-imaging is akin to post-modern phrenology [16]. As Proctor and Vu stated in their review of Human Factors research progress, “One implication of an emphasis on paradigm shifts is that past research is of little relevance because it is from ‘old’ paradigms. This view is reinforced within human factors because the field deals with new, increasingly sophisticated technologies” [17].

What we propose in this chapter is a framework that reconciles the behaviorists’ demand for objective data with the cognitive desire to understand mental processes directly. If one hopes to design human performance with the same precision as a circuit (or in concert with a circuit), a more quantitative, data-driven approach to human augmentation is needed.

With this in mind, we present the sense-assess-augment (SAA) framework [18], which is based loosely on the adaptive system framework originally proposed by Feigh et al. [19]. It begins with the human, sensing their physical, physiological, and psychological state. Sensing is the most mature piece of the paradigm, thanks to considerable commercial investment in athletics, healthcare, and productivity. Sensors exist or are in development that can measure a huge range of parameters, from brain activity, eye movement, and skin temperature to biological performance markers such as blood glucose levels or molecules like orexin that indicate the onset of fatigue. Assessment involves interpreting the data from multiple individual sensors and merging it into actionable information. The challenge is to empirically make sense of the data in relation to individual baselines and the needs of the task at hand. Ideally, this is a task shared by both human and machine and happens both in real time and across a lifetime. Finally, based on the assessment, the appropriate augmentation is delivered. Augmentation can take many forms, including the redistribution of tasks from man to machine, just-in-time or chronic administration of drugs, external hardware, environmental changes, or even genetic engineering. We will discuss each of these pieces in more detail below.
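
To make the loop concrete, the sketch below walks through one pass of sense, assess, and augment. It is a minimal illustration under our own assumptions: the class, the thresholds, and the decision rules are hypothetical placeholders, not part of the published framework.

```python
from dataclasses import dataclass

@dataclass
class OperatorState:
    """A hypothetical snapshot of sensed operator data for one assessment cycle."""
    heart_rate: float          # beats per minute
    eeg_engagement: float      # arbitrary engagement index derived from EEG features
    blood_glucose: float       # mg/dL
    hours_awake: float

def sense() -> OperatorState:
    # Placeholder: in practice these values would stream from wearable sensors.
    return OperatorState(heart_rate=88, eeg_engagement=0.35,
                         blood_glucose=70, hours_awake=19)

def assess(state: OperatorState, baseline: OperatorState) -> str:
    # Toy rules comparing current readings to an individual baseline.
    if state.eeg_engagement < 0.6 * baseline.eeg_engagement:
        return "attention_lapse"
    if state.blood_glucose < 0.8 * baseline.blood_glucose or state.hours_awake > 18:
        return "fatigue"
    return "nominal"

def augment(assessment: str) -> str:
    # Augmentation could equally mean task reallocation, environmental changes, etc.
    actions = {
        "attention_lapse": "raise automation level and simplify the display",
        "fatigue": "recommend rest, caffeine, or glucose; offload routine tasks",
        "nominal": "no intervention",
    }
    return actions[assessment]

baseline = OperatorState(heart_rate=65, eeg_engagement=0.70,
                         blood_glucose=95, hours_awake=8)
print(augment(assess(sense(), baseline)))
```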

Each piece of the framework is critical to the design and deployment of human augmentation. Sensing without assessment is frustrating. It is, in fact, one of the most common complaints of consumers trying to make sense of the athletic, health, and productivity data they are collecting. Awash in a flood of data, many ask: what does the data mean and how do I alter my performance accordingly?

Augmentation without the sensing and assessment components is not only potentially dangerous, but breeds distrust among the public and policy-makers. For example, the Air Force pilots responsible for the friendly-fire deaths of Canadian troops in Afghanistan in 2003 implicated “go pills” as the cause of the accident. Although the official investigation found no contribution of the drug to the outcome, the public and media [20] were not persuaded. Physiological monitoring and assessment might have provided objective proof of whether the cause was poor judgment by the pilot, a side effect of a widely used drug, or a combination of the two that stemmed from individual susceptibility.

Absent the framework described above, the sensing and augmentation communities have largely worked independently, and the assessment piece has lacked a research leader to make significant progress to bridge them. As we will discuss in detail, if there is one lesson learned from the decades of attempting to deliver the many promises of human performance augmentation, it is the necessity and interdependence of the three steps (Fig. 7.1).



Fig. 7.1
Sense-assess-augment framework (From Galster and Johnson [18])


7.2.1 Sense


In their article “Beyond Asimov: The Three Laws of Responsible Robotics [21],” Woods and Murphy proposed alternatives to Asimov’s classic laws of robotics, stating “The capability to respond appropriately—responsiveness—may be more important to human-robot interaction than the capability of autonomy.” An unfortunate case in point is the 2010 drone attack that killed 23 Afghan civilians. The primary cause of the accident cited by Air Force and Army officials was information overload [22]. In addition to keeping track of video from the drone, operators were also engaged in “dozens of instant-message and radio exchanges with intelligence analysts and troops on the ground.” They failed to mentally account for the children that were part of the civilian assembly.

This is but one illustration that stems from a lack of shared perception between human and machine. There is not only a need, but now the opportunity, to push beyond simple measurements of human experimental feedback, such as filling out surveys or asking people, “Was your workload diminished or not?” Despite unprecedented technological advances, our ability to assess an individual’s or team’s physical, psychological, and physiological readiness is startlingly unsophisticated. We are blind, for example, to any number of problems that plague human operators:



  • When boredom or data overload leads to prolonged lapses of attention
  • When emotional resilience hits its breaking point
  • When exhaustion or hunger degrade cognitive abilities

The emerging field of neuroergonomics aims to remedy this by decoding the functioning of a healthy brain at work [23]. The work is highly interdisciplinary, drawing from human factors, ergonomics, neuroscience, and machine learning to develop adaptive interfaces that sense and respond to changes in an individual’s executive function, an umbrella term that refers to cognitive processes such as planning, working memory, task switching, initiative, and others. These studies are important because, as founder Parasuraman [24] explains, more traditional cognitive science and neuroscience work “often fails to capture the complexity and dynamics of behavior as it occurs naturally in everyday settings. In other cases, the tasks used in laboratory studies may have little or no relation to those confronting people in everyday life.”

Another important milestone in personal sensing came in 2007, when two editors at Wired magazine noticed that trends in life logging, personal genomics, location tracking, and biometrics were starting to converge. Gary Wolf, one of the founders of what became known as Quantified Self, stated, “These new tools were being developed for many different reasons, but all of them had something in common: they added a computational dimension to ordinary existence.” Today nearly anyone can record a half dozen physiological data streams in his quest to become fitter or healthier, including a log of alpha rhythms to diagnose sleep quality. For an elite athlete or corporate executive, the sky is the limit in terms of quantified physiological parameters. This convergence turned the development of unobtrusive, wearable, and robust sensors into a commercial industry, enabling performance tracking at the individual level at a cost that would have been unfathomable just a decade ago.

The combination of neuroergonomics and individual tracking allows us to finally escape the tyranny of the “average user” that has dominated HCI philosophies. As discussed earlier, many protocols originate from Fitts and others, who examined the most complicated pairing of man and machine at that time—the airplane cockpit. The idea of an average user worked for pilots, who simply had to distinguish one knob from another on a panel; given the right training, the time-accuracy trade-offs did not vary significantly across the population of users. It was also fine for distinguishing the utility of a keyboard versus a mouse. The same cannot be said for today’s information-saturated, multi-tasking knowledge worker. There’s huge variability in executive function between individuals, as well as differences that alter performance from hour to hour and from day to day. Thus, the complexity and the number of parameters that must now be optimized together fundamentally change how we need to approach HCI.

Topol describes how individual tracking is already leading to massive changes in the approach to healthcare in his book The Creative Destruction of Medicine [25]. An example of particular relevance to HCI and the SAA model is blood glucose monitoring. Until a few years ago, the only way for diabetics to monitor their glucose levels in day-to-day life was with finger sticks, using a device to lance a finger to produce a drop of blood, which must then be smeared onto a test strip and read by a small device. This procedure is usually performed four times a day and is inconvenient and somewhat painful; more importantly, it still runs the risk of missing large spikes or drops due to food intake, exercise, or incorrect insulin dosages. Today, continuous glucose monitoring is possible with a sensor that samples glucose levels from the interstitial fluid just beneath the skin using a small, indwelling 27 gauge needle. The device still has its downsides, such as cost and the need to calibrate readings with finger sticks every 12 h. However, the sensor is robust enough that wearers can exercise and shower as normal. Topol describes additional sensors in development, noting “contact lenses can be embedded with particles that change color as the blood sugar rises or falls or the glucose level can be assessed through tears. Another imaginative solution has been dubbed a ‘digital tattoo’ in which nanoparticles are injected to the blood that bind glucose, and emit a fluorescent signal that is quantified by a reader on a smart phone.”
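
The periodic finger-stick calibration mentioned above is, at its core, a simple regression problem: pair the sensor's raw interstitial readings with reference blood values and correct subsequent readings accordingly. The sketch below assumes a two-point linear calibration; the numbers are illustrative and this is not any vendor's actual algorithm.

```python
# Minimal two-point linear calibration of a continuous glucose sensor
# against reference finger-stick measurements. Values are illustrative only.

# (raw interstitial sensor signal, reference blood glucose in mg/dL)
calibration_pairs = [(1.8, 90), (3.1, 160)]

(x1, y1), (x2, y2) = calibration_pairs
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1

def calibrated_glucose(raw_signal: float) -> float:
    """Convert a raw sensor reading to an estimated blood glucose (mg/dL)."""
    return slope * raw_signal + intercept

print(calibrated_glucose(2.4))  # ~122 mg/dL with the calibration pairs above
```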

The challenge for HCI is to become equally imaginative in what to sense and how to sense it. The artificial intelligence and HCI communities have continued to focus on how the human can better access and utilize computer technology, without mention of how sensing of the human condition and capabilities might also augment the machine. For example, Sandberg writes, “What is new is the growing interest in creating intimate links between the external systems and the human user through better interaction. The software becomes less an external tool and more of a mediating ‘exoself.’ This can be achieved through mediation, embedding the human within an augmenting ‘shell’ such as wearable computers or virtual reality, or through smart environments in which objects are given extended capabilities [26].”

We now have the sensors and digital infrastructure to “remotely and continuously monitor each heart beat, moment-to-moment blood pressure readings, the rate and depth of breathing, body temperature, oxygen concentration in the blood, glucose, brain waves, activity, mood—all the things that make us tick [27].” And in response, we can imagine a machine that uses this information to assess the cognitive and affective state of its user and dynamically alter its level of automation and complexity in response. This is not a new idea—the field of human/brain-computer interface has sought such an adaptive interface since man became so dependent on his machine counterpart. But most of the instruments used to examine mental workload today, such as electroencephalography (EEG), electrocorticography (ECoG), functional near-infrared spectroscopy (fNIRS), and functional magnetic resonance imaging (fMRI), were designed for laboratory use, where wearability, comfort, portability, and robustness are not concerns. In their review, Pickup et al. note [28], “The notion [of mental workload] has found widespread acceptance as of value in assessing the impact of new tasks, in comparing the effects of different or job interface designs and in understanding the consequences of different levels of automation.” This highlights that much of the prior HCI work focused on initial design considerations rather than true adaptability.

Beyond simple user experience, however, these instruments miss the more common and frequent sources of performance decrement, such as lack of sleep, low blood glucose, emotional distress, and sickness. Nor do they account for the growing stream of information arriving through mobile and social media. A recent survey [29] found that 75 % of workers access social media on the job from their personal mobile devices at least once a day (and 60 % access it multiple times a day). Without the ability to pinpoint the source of increased mental workload in real time, the proper augmentation strategy may not be implemented.

Biomarkers are essential to this endeavor. In addition to the readouts from EEG for example, peripheral measures largely associated with the autonomic nervous system have proven to be salient as well [30]. Biomarkers can mean different things: blood oxygen levels, eye movements, perspiration levels, posture, or any number of molecular metabolites.

Molecular monitoring has been aided significantly by the development of flexible, dissolvable electronics. Advances in electronics and microfluidics have led to miniaturized “lab-on-a-chip” devices and unobtrusive wearable psychophysiologic sensors that can support the rapidly emerging need to instrument the user and monitor physical and mental states. This monitoring, when fed back into the machine system, can provide a “check engine light” for the operator, as well as drive adaptive autonomy based on the real-time needs of the operator to improve overall sociotechnical team mission performance. Recent scientific studies have elucidated several molecular targets of opportunity. For example, the neuropeptide orexin A (hypocretin) has been implicated in arousal and alertness. Deficiency of orexin A results in narcolepsy, while other studies [31] suggest orexin is the central switch between sleep and wake states. Previously, monitoring this peptide in patients required a sample of cerebrospinal fluid—an impractical obstacle to widespread adoption. However, recent advances in biofunctionalized sensors have increased the sensitivity of orexin detection by more than three orders of magnitude (to picomolar levels), allowing the peptide to be detected in saliva—a far more practical biomatrix for sampling.

Another molecular target of opportunity is neuropeptide Y (NPY). This 36-amino-acid peptide is produced by the hypothalamus and has been implicated in learning and memory. In one study [32], animals whose behavior was extremely disrupted by induced stress displayed significant downregulation of NPY in the brain, compared with animals whose behavior was minimally or partially disrupted and with unexposed controls. One-hour post-exposure treatment with NPY significantly reduced the prevalence of extreme disruption and reduced trauma-cue freezing responses, compared with controls. Although most studies on NPY have been performed with rodents, there is accumulating evidence [33], from the genetic level to the physiological, implicating NPY as a potential ‘resilience-to-stress’ factor in humans as well.

Diabetics are not the only ones who need to be concerned with blood glucose levels. Previous studies have shown not only decreased cognitive performance with low blood glucose, but also that increasing blood glucose can partially compensate for decreases in procedural memory due to sleep deprivation [34], a condition that is increasingly common among workers across industries.

As mentioned, one of the biggest challenges is developing sensors that do not themselves impinge on human performance. Current “wet electrode” EEG monitoring, for example, is cumbersome enough to preclude its use except in the most critical applications, where lapses in performance could mean loss of life (e.g., air traffic controllers). Arguably, the future of human performance monitoring may benefit most from advances in materials science, such as recent work [35] on flexible, dissolvable, and unobtrusive electronics. Transient electronics, made of biocompatible metals and encased in silk, are meant to be implanted into the body, do their work for days, weeks, or even months, and then safely dissolve and be resorbed.

In addition to measuring the human directly, we must also sense the environment to discover the right correlates for understanding degraded executive function in context. Lighting, noise levels, and temperature can all impact cognitive function and, perhaps just as importantly, offer some of the easiest potential solutions.


7.2.2 Assess


Man’s ability to understand is often outstripped by his ability to measure. Assessment of the context of psychophysiologic and performance data represents a key underdeveloped area in many systems and a priority for future research. Knowledge of context, and of changes in context, allows human team members to disambiguate under-constrained data that can have different meanings in different settings. Machine reasoning that interprets sensor data related to the environment, the system, task planning, and the user’s physical and cognitive state will allow the system to share some level of perception with the human operator in the proper context. Fundamentally, assessment addresses three questions: whom to augment, under what conditions, and how to quantify the effects.

To better understand the challenge of assessment, consider the landmark work of Yerkes and Dodson [36], who in 1908 proposed a relationship between adverse reinforcement and discrimination learning in rats. What became known as the Yerkes-Dodson Law, popularized decades later in a review by Hebb [37], resembles an inverted, U-shaped curve, as shown below. The Hebbian version proposes that at low arousal, people are lethargic and perform badly. As arousal increases, performance also increases, but only to a point, after which increasing arousal actually decreases performance. Arousal in this context has often been equated with stress (Fig. 7.2).



Fig. 7.2
Hebbian version of the Yerkes-Dodson law
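
For readers who want an explicit functional form, the inverted U is sometimes illustrated with a simple concave curve of performance against arousal. The parameterization below is purely illustrative (our own choice of a Gaussian shape with an arbitrary optimum) and, as the criticisms discussed next make clear, should not be mistaken for a validated model.

```python
import numpy as np

def illustrative_inverted_u(arousal, optimum=0.5, width=0.25):
    """Toy Gaussian-shaped performance curve peaking at an 'optimal' arousal level."""
    return np.exp(-((arousal - optimum) ** 2) / (2 * width ** 2))

arousal = np.linspace(0, 1, 11)
for a, p in zip(arousal, illustrative_inverted_u(arousal)):
    print(f"arousal={a:.1f}  relative performance={p:.2f}")
```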

Thus one might assume that, given the variety and robustness of sensors today, it should be straightforward to assess from physiological data when a user is experiencing less than optimal arousal in an operational setting, and thus to enhance the human or adjust the computer interface accordingly to maximize performance. However, there have been many criticisms of the Yerkes-Dodson Law, much of the criticism relating to misinterpretation [38] of the original work. For example, many modern references use terms such as arousal, stress, and performance, terms that were never used in the original paper and remain vague and un-quantified today. Nor was the original work, which was performed using rats, intended to extend to the relationship between stress and performance in humans. Even those experiments performed with rats produced notable exceptions to the expected curvilinear response. For example, as Easterbrook [39] describes in his paper on cue-utilization theory, “On some tasks, reduction in the range of cue utilization under high stress conditions improves performance. In these tasks, irrelevant cues are excluded and strong emotionality is motivating. In other tasks, proficiency demands the use of a wider range of cues, and strong emotionality is disorganizing. There seems to be an optimal range of cue utilization for each task.” Thus, Easterbrook goes on to explain, tasks can be considered complex if they involve attention to multiple cues and simple if they involve focused attention to a single cue. This may constitute Easterbrook’s definition of difficulty, but it is by no means widely accepted.

The problem extends throughout the human performance community, as well as medicine. In a now famous article, Ioannidis [40] suggested that much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. His conclusions are in keeping with the issues of the Yerkes-Dodson Law as well: (1) the smaller the studies, the less likely the research findings are to be true; (2) the smaller the effect, the less likely the research findings are to be true; (3) the greater the financial and other interests, the less likely the research findings are to be true; (4) the hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.

The enduring appeal of the Yerkes-Dodson Law lies in its fit with our intuition. We have all encountered cases where arousal, in the form of a cup of coffee or an impending deadline, allowed us to focus and perform better than we might have otherwise. Likewise, we have all experienced stress, in the form of a cold or an overflowing email inbox, that appeared to degrade performance. What’s missing from many of the studies today is the ability to determine the context of stress or arousal, and the patterns that link that context to individual performance. This is critical for determining whether augmentation is needed and for predicting the improvement in performance based on the augmentation selected. Such an approach requires, at least initially, the fusion of far more sensory data from both the individual and the environment than most research currently includes. As Tapscott and Williams warn [41], “When the devices we use to capture and process data are sparsely distributed and intermittently connected, we get an incomplete, and often outdated, snapshot of the real world.”

The most common approach to pattern recognition is based on models, such as Markov models or neural networks, which provide some general knowledge of the system they are observing. However, both of these approaches require large sets of training data in order to produce accurate results. For example, a study might monitor EEG channels combined with heart rate data as a participant is put through scenarios believed to represent low and high mental workload tasks. The training data establishes classification criteria for the two states. As the individual is then tested with real tasks, one sees attribution of low and high mental workload, typically with accuracy in the 80–85 % range. An issue arises when baselining takes so long that the test subject enters a high-stress, disengaged mental state before the experimental portion even begins. Data that are supposed to indicate stress are then already one or two standard deviations above baseline, so little variation is seen in the assessed response.
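
A typical version of the workflow just described can be sketched with scikit-learn on synthetic features. The feature choices, simulated values, and resulting accuracy are placeholders rather than results from any particular study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic features: [EEG theta/alpha ratio, heart rate] for low (0) vs. high (1) workload.
n = 200
low = np.column_stack([rng.normal(1.0, 0.2, n), rng.normal(65, 5, n)])
high = np.column_stack([rng.normal(1.6, 0.3, n), rng.normal(80, 6, n)])
X = np.vstack([low, high])
y = np.array([0] * n + [1] * n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# The caveat in the text applies here as well: if the operator's baseline drifts
# (e.g., readings already elevated before the task starts), the same trained
# model can misclassify nearly everything as "high workload".
```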

All of this data, however, is taken in a laboratory setting with very controlled parameters and tasks. If the characteristics of the data being analyzed deviate significantly from the training model, then previously learned data sets must be relearned along with the new data set. This means that without retraining, a model that relies on select EEG channels to produce impressive accuracy rates for a vigilance task, for example, often does not work well when applied to a different task in a different setting. This becomes even more problematic when you consider that the average worker engages in several tasks as part of his work, each of which may rely on a distinct assessment or augmentation. One task may require intense vigilance while another may require a mix of creativity, abstract thinking, and the ability to forecast. Today, a study that focuses on assessment of vigilance will be of little consequence to an assessment of creativity.

The answer, then, is not to collect less data, at least initially. The goal is to collect as much data as possible to discover the relevant performance patterns for each individual. This will likely require a data-driven algorithm that needs no a priori knowledge of the underlying system and can operate without a closed data set. The algorithm would be capable of learning and would include some or all of the following features (a toy sketch of the general idea follows the list):

1. Would not need to be tuned based on expected features in a data stream.
2. Able to learn and recognize patterns in an unlabeled data stream.
3. Works online—the process of learning and recognition would occur simultaneously without an offline training phase.
4. Quickly converges to recognize data patterns after only a few occurrences.
5. Finds patterns in nonlinear, nondeterministic, and non-Markovian systems.
6. Interpretable structure and produces an interpretable model.
7. Hierarchical pattern detection for the determination of context.
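
A full realization of these requirements remains an open research problem, but the flavor of points 2 through 4 can be conveyed with a deliberately simple online learner that clusters incoming feature vectors as they arrive, with no labels and no offline training phase. This is a toy sketch of the general idea, not a proposed solution; the class name and threshold are our own placeholders.

```python
import math

class OnlinePatternLearner:
    """Toy incremental clustering: each centroid stands in for a learned 'pattern'."""

    def __init__(self, distance_threshold=0.5):
        self.distance_threshold = distance_threshold
        self.centroids = []   # one centroid per discovered pattern
        self.counts = []      # how many samples each pattern has absorbed

    def observe(self, sample):
        """Assign the sample to the nearest pattern, or start a new one; learn as we go."""
        if self.centroids:
            dists = [math.dist(sample, c) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.distance_threshold:
                # Update the matched centroid with a running mean (online learning).
                n = self.counts[best] + 1
                self.centroids[best] = [c + (x - c) / n
                                        for c, x in zip(self.centroids[best], sample)]
                self.counts[best] = n
                return best
        self.centroids.append(list(sample))
        self.counts.append(1)
        return len(self.centroids) - 1

learner = OnlinePatternLearner()
for reading in [(0.9, 1.1), (1.0, 1.0), (3.0, 3.1), (1.05, 0.95), (2.9, 3.0)]:
    print("pattern id:", learner.observe(reading))   # prints 0, 0, 1, 0, 1
```
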
Such a system is particularly important when trying to merge sensor data across multiple time scales. Many parameters can be measured on an hourly or daily basis, but the trends that indicate the source of aberrations may not be apparent for months. For example, Sky Christopherson, a former Olympic cyclist turned technology CEO, started having health problems despite a lifetime of fitness and healthy eating. Because of his familiarity with personal tracking as an elite athlete, he started collecting a range of biomarkers and environmental data to discover where he could make significant, positive contributions to his health, including sleep, diet, stress, exercise, and traditional physiological measures such as blood pressure. One of the most significant causes of stress was related to sleep quality, and only after collecting data for nearly 9 months did he notice trends that varied with the season. His assessment was that the real issue was room temperature, which varied with outdoor temperature, so he installed a water-filled cushion on his bed that actively regulated body temperature year round. Although he implemented other changes, the effects were profound. He not only reversed his health issues, but in the process set a world cycling record at the age of 35—a feat previously thought biologically impossible due to declining testosterone levels [42].

It would also be desirable to add a predictive function to the learning algorithm. The simplest method for predicting the next state is to use the empirical probability of each state, calculated from the number of times the system has output that state. Lacking a model for the underlying system itself, this approach might in fact be the only reasonable method of prediction. Future activities include enabling hierarchical and orthogonal learners to detect patterns of patterns, detecting spatial patterns within the model, determining similarity measurements between patterns, and incorporating visualizations of the model to assist human decision-makers in the post-processing step in identifying meaningful and more nuanced patterns.
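
The frequency-based prediction just described amounts to counting observed states and guessing the most common one; a few lines are enough to sketch it (illustrative only, with hypothetical state labels).

```python
from collections import Counter

class FrequencyPredictor:
    """Predict the next state as the one observed most often so far."""

    def __init__(self):
        self.counts = Counter()

    def update(self, state):
        self.counts[state] += 1

    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else None

predictor = FrequencyPredictor()
for state in ["nominal", "fatigue", "nominal", "nominal", "attention_lapse"]:
    predictor.update(state)
print(predictor.predict())  # "nominal"
```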

Performance assessments would ideally be quantified relative to an individual baseline collected over time. As we saw with the confusion around the Yerkes-Dodson Law, simply classifying someone as tired or stressed provides little correlation to performance. But if it were possible to know, for example, when someone’s critical thinking ability decreased by 25 %, a better decision as to how and when to address the symptom of fatigue could be made. Nor should these assessments occur only at the tactical level. Supervisors and leaders are just as likely, if not more so, to be suffering from lack of sleep and exercise, poor nutrition, and information overload that can impair decision-making.
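
As a minimal illustration of what quantifying against an individual baseline might look like in practice, the sketch below compares recent task scores to a personal baseline; the scores and the 25 % threshold are arbitrary placeholders.

```python
from statistics import mean

def relative_performance(recent_scores, baseline_scores):
    """Fractional change of recent performance relative to the individual's baseline."""
    return (mean(recent_scores) - mean(baseline_scores)) / mean(baseline_scores)

baseline = [0.82, 0.80, 0.85, 0.79, 0.84]   # e.g., critical-thinking task scores over weeks
recent = [0.60, 0.63, 0.58]                  # scores from the current session

change = relative_performance(recent, baseline)
if change <= -0.25:
    print(f"critical thinking down {abs(change):.0%} from baseline; consider augmentation")
```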

Initially, such a system might sound too complex to manage, much less design. However, Kurzweil’s view of complexity in his book How to Create a Mind is undoubtedly relevant. He points out:

We might ask, is a forest complex? The answer depends on the perspective you choose to take. You could note that there are many thousands of trees in the forest and that each one is different. You could then go on to note that each tree has thousands of branches and that each branch is completely different. Then you could proceed to describe the convoluted vagaries of a single branch. Your conclusion might be that the forest has a complexity beyond our wildest imagination. But such an approach would literally be a failure to see the forest for the trees. Certainly there is a great deal of fractal variation among trees and branches, but to correctly understand the principles of a forest you would do better to start by identifying the distinct patterns of redundancy with stochastic (that is, random) variation that are found there. It would be fair to say that the concept of a forest is simpler than the concept of a tree. [43]

Of course, there are challenges to such an approach as well. As the volume of raw data from various sensors increases, the problem of finding underlying sources within the information becomes more difficult and time consuming. Increasing the spatial resolution increases the number of data channels. Increasing the temporal resolution increases the sampling rate. Such a system would likely require long periods of data collection and analysis, along with input directly from the user, before it was capable of reliably recommending appropriate augmentation strategies.
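
To give a sense of the scale involved, a back-of-the-envelope calculation (the channel count, sampling rate, and sample size are illustrative assumptions, not figures from the text):

```python
# Rough data-volume estimate for a single high-resolution physiological sensor.
channels = 64          # e.g., EEG electrode count (spatial resolution)
sample_rate_hz = 256   # samples per second per channel (temporal resolution)
bytes_per_sample = 4

bytes_per_day = channels * sample_rate_hz * bytes_per_sample * 60 * 60 * 24
print(f"{bytes_per_day / 1e9:.1f} GB per day")  # ~5.7 GB/day for this one sensor
```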
