Next, we calculated an overall measure of the percentage single task to combined task change by taking the median of the percentage change scores across the two tasks. The median and spread of scores from these derived measures are shown in Figure 7.2, which illustrates that the majority of participants (15) had equal or better memory span under combined task compared with single task conditions. Four participants also had equal or better verification span performance under combined task conditions. The overall measure of combined task performance was around 94 per cent of single task performance. This reinforces the case that combining
These results cannot be attributed to a lack of power or sensitivity in the design. Each participant was performing at their own span level, and acted as their own control with combined task performance compared with single task performance at span. As such, performance levels were not at ceiling or at floor. These data would be very difficult to incorporate within a single flexible resource model or attention switching model of working memory. They also reinforce the view (e.g. Caplan and Waters 1999) that taking only a storage measure might provide a misleading picture of the capacity of working memory as a whole.
A possible caveat is that participants were somehow putting more ‘cognitive effort’ into the combined task condition, thereby effectively increasing the total resource available. This seems unlikely given that we adopted a span procedure that is widely used in assessments of working memory span as well as in measures of digit or word span.
A further possibility is that the one per second presentation rate for memory span single task might have been too rapid for two-digit numbers relative to the time available in the combined task condition. In this last condition, with two problems to verify within ten seconds, a participant might respond at a leisurely pace, allowing time to encode and rehearse presented totals. As the number of problems increased, the amount of time available for encoding each two-digit total would be reduced accordingly, but participants might have been strategic in their use of timesharing between the tasks, allowing more time for the memory span combined task, thereby giving rise to the combined task improvement in span. In order to explore this possibility, we examined the verification response times in single and combined task conditions. If participants were indeed failing to use their full capacity in the single component task conditions, we might expect a different pattern of response times in the single and in the combined task conditions. Any trade-off in the combined task with participants taking advantage of time between sums for verification to focus on encoding and rehearsing the numbers should show up in slower response times for the verification component of the task.
Mean correct response times are illustrated in Figure 7.3 for two, three and four problems. There were too few correct responses for five or more problems. An ANOVA revealed no overall difference in response time between single and combined task, F(1,18) < 1. However, there was a significant reduction in time as the number of problems increased, F(2,36) = 18.42; MSE = 0.032; p < 0.001, and a significant interaction, F(2,36) = 11.18; MSE = 0.025; p < 0.001. From Figure 7.3 it is clear that the reduction in correct response time occurred primarily in the single task condition; participants could improve their performance under single task conditions until they reached their span level. With the combined task, the response time function is largely flat. This indicates that participants may well be attempting to free up time in the combined task condition, possibly to switch between processing and rehearsal of the items for recall. This might offer a partial explanation for the slightly higher combined task memory span, but does not offer a complete account because the combined task memory span and verification span measures resulted from performance on the longer lists of problems when verification demands as well as storage demands were very high. However, it is possible that participants might have benefited from having longer for rehearsal under combined task conditions relative to single task conditions. Specifically, in the single task condition, the mean memory performance was 3.2 items (see Figure 7.1), with a rate of 1.5 s for presentation of each item, including interitem intervals. In the combined task condition, mean memory span was 3.8 items, and with an 11.5 s presentation time (10 s plus 3 × 0.5 s interstimulus intervals), this equates to around 3 s for each item.
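The per-item timing comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration that uses the values quoted in the text for Experiment 1; the variable names are hypothetical and are not taken from the original analysis.

```python
# Values quoted in the text for Experiment 1 (variable names are illustrative).
single_task_rate_s = 1.5      # seconds per item, including interitem interval
combined_window_s = 10.0      # seconds available for verification in a trial
n_intervals = 3               # interstimulus intervals counted in the text
interval_s = 0.5              # duration of each interstimulus interval
combined_span_items = 3.8     # mean combined task memory span

# Total presentation time in the combined task condition.
combined_total_s = combined_window_s + n_intervals * interval_s

# Time available per remembered item in the combined task condition.
combined_rate_s = combined_total_s / combined_span_items

print(f"Combined task presentation time: {combined_total_s:.1f} s")
print(f"Combined task time per item: {combined_rate_s:.2f} s")
print(f"Ratio to single task rate: {combined_rate_s / single_task_rate_s:.2f}")
```

On these figures the combined task allows roughly twice as much time per item as the single task, which is the discrepancy that Experiment 2 was designed to remove.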
Experiment 2
In Experiment 1, there was a suggestion that participants might have had effectively a longer period in which to encode and rehearse the numbers for recall for some of the trials in the combined task condition compared with the single task memory condition. Experiment 2 aimed to address this possibility by presenting the single task memory span materials at a slower rate. Moreover, given the lack of evidence for a trade-off between memory and verification components of the tasks, Experiment 2 also served to test the reliability of the findings from Experiment 1.
Method
Participants. Eighteen new participants were tested, all native Norwegian speakers from the University of Bergen, Norway. There were 9 males and 9 females (mean age 24 years, range 21 to 27 years).
Procedure. Tasks and procedure were as for Experiment 1, except that during the memory single task, items were displayed for 2 seconds each, with a 0.5 second interitem delay. The materials were those for Experiment 1 except that the arithmetic problems were allocated in a pseudo-random fashion to different lists, as were two-digit numbers for recall.
Results
One participant performed at floor on single task memory span, and these data were discarded. Span measures for the remaining 17 participants were calculated for both memory and verification during single and combined task procedures (Figure 7.4). An ANOVA confirmed that the memory span means did not differ between single and combined task conditions, F(1,16) < 1. Verification span combined task performance was poorer than single task performance by around 15 per cent and this difference was statistically reliable, F(1,16) = 5.12; MSE = 0.692; p < 0.05.
The combined percentage change measure revealed that there was no overall combined task effect (98 per cent of single task performance). As in Experiment 1, participants were performing at their own span level and therefore this lack of an overall drop in performance cannot be attributed to performance levels being at floor or ceiling.
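The overall measure can be sketched as follows. The chapter does not give the exact formula, so this is a minimal illustration that assumes each task score is the combined task span expressed as a percentage of the single task span, with the overall score taken as the median of the two task percentages. The function names and the example values are hypothetical.

```python
from statistics import median


def percent_of_single(single: float, combined: float) -> float:
    """Combined task span as a percentage of single task span (assumed formula)."""
    if single <= 0:
        raise ValueError("single task span must be positive")
    return 100.0 * combined / single


def overall_percent(memory: tuple[float, float],
                    verification: tuple[float, float]) -> float:
    """Median of the two per-task percentages for one participant.

    Each argument is a (single, combined) pair of span scores.
    With exactly two values the median equals their mean.
    """
    return median([percent_of_single(*memory),
                   percent_of_single(*verification)])


# Hypothetical participant: memory span unchanged, verification span reduced.
example = overall_percent(memory=(4.0, 4.0), verification=(5.0, 4.0))
print(example)
```

A value of 100 on this measure would indicate no overall cost of combining the tasks, so the reported 98 per cent corresponds to a negligible drop.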
Mean correct verification times for two, three, and four problems in single and combined task conditions showed a very similar pattern to that for Experiment 1, with no overall difference in correct response time between single and combined task, F(1,12) = 1.57; MSE = 0.056; n.s., an effect of number of problems, F(2,24) = 9.46; MSE = 0.030; p < 0.001, and an interaction, F(2,24) = 11.49; MSE = 0.023; p < 0.001, with a reduction only in single task response time as the number of problems increased.
Mean memory scores under combined task conditions (Figure 7.4) show that participants averaged 3.3 items, or approximately 3.3 s per item, including any time for verification. Single task conditions involved presentation of an item once every 2.5 seconds, with no interpolated
Discussion
The major impact of increasing single task exposure time for memory stimuli was that there was no combined task advantage for memory span. As in Experiment 1, there was no overall drop in performance resulting from combined task demands, and no evidence that memory span was being protected at the expense of verification or vice versa. Increased time for the memory task alone allowed the presentation time for single task and dual task span to be more comparable, but the total resource available for span as a single task appears to be the same when it is combined with arithmetic verification. The same interpretation broadly applies to the verification task. The dual task drop was significant, but it comprised a rather modest 15 per cent cost for performing verification (at span), and memory (at span) in combination.
The speeding of responses as the number of verification problems increased points to some deployment of additional resources as they are required within the single task condition until performance breaks down. As for Experiment 1, the lack of a reduction in response time under combined task conditions is consistent with participants switching between verification and rehearsal of the items for recall. The lack of an overall drop in performance between single and dual task conditions is not consistent with the use of a single flexible resource for both processing and storage, and is more consistent with the view that there are separate pools of resource available that can work in parallel for memory and for semantic processing.
Discussion of experiments 1 and 2
We set out to explore further a previous finding (Duff and Logie 2001) that separate cognitive resources might support respectively processing and temporary storage in working memory span tasks. Our new data are consistent with our previous findings and demonstrate that the apparent fractionation of processing and storage is unlikely to be due to distinctiveness in the codes used for retaining the items for recall relative to the processing task. Specifically, the cognitive demands of retaining a number sequence that is set at the limits of memory span for each individual participant appear to result in very little constraint on the ongoing processing when both memory and processing are task requirements. Likewise, demanding processing conditions appear not to constrain the ability of participants to retain as many items as they could when that processing demand was absent. Because the overall time for presenting the items in the combined task condition changed very little with the length of the sequence, our results are unlikely to be influenced by the length of time between item presentation and recall. The use of a longer single task presentation time for memory span in Experiment 2 demonstrated that the results of Experiment 1 are unlikely to be due to the amount of time for rehearsal in the combined task conditions.
The lack of evidence for a trade-off in performance between processing and storage demands is not compatible with the Barrouillet view that attention switching is required between the two task components, and suggests that maintenance of items for recall may continue even when participants are performing a highly demanding cognitive task. This points to the suggestion that there are separate cognitive resources for memory and for processing in the verification task, and that these can operate largely in parallel rather than relying on the switching of a single attentional resource between them. The notion of separate pools of attentional resource is not unprecedented (e.g. Hunt and Lansman 1981; Wickens 1984; Wickens and Liu 1988), and is supported by evidence from more recent studies (e.g. Bayliss et al. 2003; Cocchini et al. 2002).
One further approach to investigating whether or not memory and processing components of working memory span are independent is to consider whether performance on the memory and processing measures tends to covary. Waters and Caplan (1996) reported low correlations between these measures in a sample of 94 participants who were mainly undergraduate students aged 18–37. They suggested that a combined measure of processing and of memory would provide a more reliable measure than would a memory score on its own. More recently, Waters and Caplan (2003) reported scores on a range of working memory measures derived from 139 participants spread across a wide age range (18–80+), with at least high school education. The working memory measures each had high internal consistency, but the tasks showed only modest inter-correlations. Moreover, test–retest reliability improved dramatically when combined scores for at least two different tests were used. The participant numbers in Experiments 1 and 2 were adequate for investigating the impact of experimental manipulations in task demands. However, exploration of the processing and storage elements from an individual differences perspective would require rather larger numbers of participants who are more heterogeneous with respect to age and educational background. This was the aim of Experiment 3.
Experiment 3
In Experiment 3 we adopted an individual differences approach to explore the relationship between processing and storage in working memory span tasks. We took advantage of the opportunity to collect data via a web site in cooperation with the BBC. Collecting data via the internet has the major advantage that it allows for a diverse and very large sample of participants. Sample numbers are typically in the thousands and can be in the tens or even hundreds of thousands (e.g. Reimers, in press; Reimers and Maylor 2005), and both experimenter and demand effects are minimized. The problems associated with web-based experiments are now fairly well understood and there are documented procedures for maximizing the quality of the data obtained (e.g. Birnbaum 2004; Reips 2002). The experimental procedure can be standardized with control of presentation format and times, and of retention and response intervals, and allows for collection of real-time response times as well as accuracy data. Disadvantages include a lack of control of the conditions under which the test session is completed (e.g. background noise, time of day, illumination), and multiple attempts by the same individuals. The issue of multiple attempts can be handled by selecting for inclusion only the first attempt from any one computer, using a cookie system for recording previous attempts. The experimental environment will add noise to the data, but it is reasonable to assume that this is a random effect that will have little or no impact when set against the large numbers of participants. Dropout rate can be a serious problem if the experiment involves large numbers of trials or takes overly long to complete. This limits the range of measures and the number of data points per measure, but this again is compensated by having large numbers of participants so that between-participant designs can generate very substantial amounts of data across conditions.
The experiment reported here involved measuring working memory span via a BBC web site by means of sentence verification and recall of sentence-final words. Sentence verification times and accuracy were recorded and memory was tested using a serial reconstruction procedure. Working memory span was one of several measures of cognitive ability based on self report or on memory performance including digit span, visual pattern span, spatial orientation and binding of perceptual features in working memory.
Method
Participants. A total of 49,902 sets of data were collected via a BBC web site over a two-month period with participants drawn from a total of 150 countries. Of these, 41,917 comprised the first attempt from any particular computer. Each participant was asked to state their age, sex, the highest level of education they had reached – primary school, secondary or high school, technical or vocational college, other college, university graduate, postgraduate or professional degree, and to rate their general health – excellent, very good, good, moderate or poor. Participants who did not provide all four of these demographic details were excluded. Because the web site was in English, it is likely that most if not all participants were reasonably fluent in English even if it was a second language. However, given evidence that memory span may vary by language spoken and language fluency (e.g. Naveh-Benjamin and Ayres 1986), for the analysis reported here, only participants from countries for which the dominant language is English are included. This represented the majority of the total participants who provided demographic details and included Australia, Canada, Ireland, New Zealand, the United Kingdom and the United States. Information regarding first language was not requested, but this selection criterion should maximize the number of people for whom English is their native language. Finally, only participants reporting their age as between 16 and 60 years, and who rated their general health as good or better were included. Analyses of the complete data set will be reported elsewhere, and the experiment is ongoing at the time of writing, so the eventual data set may be much larger. However, the focus in this chapter is on the data for healthy adults without the possible impact of younger or older age, and the data patterns for the sample sizes reported here are unlikely to change with a larger sample.
The resulting sample size was 24,630, with 14,955 female and 9,675 male, mean age 31.73 years, SD = 11.56.
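The inclusion criteria described above can be summarized as a simple filter. The following is a hedged sketch rather than the procedure actually used: the `Record` fields, the `include` function and the country labels are hypothetical, and the age range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass
from typing import Optional

# Countries named in the text as having English as the dominant language.
ENGLISH_DOMINANT = {"Australia", "Canada", "Ireland", "New Zealand",
                    "United Kingdom", "United States"}

# Health ratings that count as "good or better" on the five-point scale.
GOOD_OR_BETTER = {"excellent", "very good", "good"}


@dataclass
class Record:
    """One submitted data set (hypothetical field names)."""
    first_attempt: bool          # first attempt from this computer
    age: Optional[int]
    sex: Optional[str]
    education: Optional[str]
    health: Optional[str]
    country: str


def include(r: Record) -> bool:
    """Apply the inclusion criteria in the order described in the text."""
    if not r.first_attempt:
        return False
    # All four demographic details must have been provided.
    if None in (r.age, r.sex, r.education, r.health):
        return False
    return (r.country in ENGLISH_DOMINANT
            and 16 <= r.age <= 60        # assumed inclusive bounds
            and r.health in GOOD_OR_BETTER)
```

Applying such a filter to every submitted record and counting those retained would give the analysed sample size.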
Web-based tests
All of the tests described below were programmed by BBC staff to function within Macromedia Flash, a system that allows the running of real time data collection via the Internet.
Working memory span involved presenting short sentences similar to those used by Baddeley, Logie, Nimmo Smith et al. (1985) for verification against semantic knowledge. Sentences were either typically true, e.g. ‘flies are insects’ or typically false, e.g. ‘mobile phones are made of cheese’. Participants had to use their mouse to click on a ‘true’ or ‘false’ button on the screen as quickly as possible, and to remember the final word of the sentence. A second sentence was then presented for response and so on until the sequence of sentences was complete. At this stage, a set of 20 words was shown in a square array in the left two thirds of the screen, with the correct sentence final words given in random positions while the remaining words were unrelated foils. Participants had to click on each of the sentence final words that they could remember and drag them to boxes arranged vertically on the right of the screen in the order in which they were presented. There was no maximum time for a true/false response or for recalling the words. Sequences started with two sentences and increased to a maximum of six sentences, with two sequences for each sequence length. The test stopped if the participant was unable to recall all of the sentence final words for two successive sequences. Working memory span was taken as the average of the two longest sequences for which the sentence-final words were recalled correctly.
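The stopping rule and the span score described above can be sketched in code. This is an illustration under stated assumptions, not the BBC implementation: the function name and input format are hypothetical, a span of zero is assumed when no sequence is recalled correctly, and a single correct sequence is assumed to count on its own.

```python
def working_memory_span(trials):
    """Score one participant's working memory span.

    trials: list of (sequence_length, recalled_correctly) pairs in the
    order presented, e.g. two sequences at each length from 2 to 6.
    """
    administered = []
    consecutive_failures = 0
    for length, correct in trials:
        administered.append((length, correct))
        consecutive_failures = 0 if correct else consecutive_failures + 1
        # The test stops after two successive failed sequences.
        if consecutive_failures == 2:
            break

    # Lengths of correctly recalled sequences, longest first.
    passed = sorted((length for length, ok in administered if ok), reverse=True)
    if not passed:
        return 0.0  # assumption: no correct sequence gives a span of zero
    longest_two = passed[:2]
    return sum(longest_two) / len(longest_two)


# Hypothetical participant: passes both length-2 sequences, one each at
# lengths 3 and 4, then fails two sequences in a row.
trials = [(2, True), (2, True), (3, True), (3, False),
          (4, True), (4, False), (5, False), (5, False)]
print(working_memory_span(trials))
```

The same scoring logic would apply to digit span, with sequence lengths running from three to nine.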
Digit span comprised presentation of random digit sequences of increasing length, with digits shown one at a time at a rate of one per second in the centre of the screen. At the end of the sequence, a blank box appeared in the centre of the screen and participants had to type in the digit sequence in the order shown. Sequence length started with three digits and increased to a maximum of nine with two sequences at each length. The test stopped if the participant was unable to correctly recall the digit sequence on two successive occasions. This test was originally devised by Jacobs (1887) but is included in most standard measures of mental ability and is widely used as a measure of short-term verbal memory capacity. Its correlations with other measures of cognitive ability such as reading comprehension tend to be rather lower than those reported for working memory span (e.g. Daneman and Hannon, this volume).
Visual pattern span was measured by presenting square matrix patterns with white and blue squares for immediate recall, based on a procedure from Logie and Pearson (1997; see also Della Sala, Gray, Baddeley et al. 1999; Wilson, Scott and Power 1987). Each pattern was shown for two seconds. It was then replaced with a blank matrix in the same location and participants were to click on the squares that had previously been filled in blue. The patterns started with a 3 × 3 matrix, then 3 × 4 (5 blue squares for both), then 4 × 4 (8 blue squares), then 4 × 5, up to a maximum of 5 × 5 (9 blue squares for both), with two patterns at each level. The test stopped when participants failed to recall all of the squares correctly on two successive trials.
Spatial orientation involved presenting a male figure, arms outstretched with a blue ball in one hand and a white ball in the other. On any one trial, the figure was shown in one of four positions, namely facing the viewer and upright, with their back to the viewer and upright, facing the viewer and upside down or with their back to the viewer and upside down. The task was to click on the words ‘left’ or ‘right’ at the bottom of the screen to indicate in which hand the figure held the blue ball. Participants had to complete as many trials as possible in a period of 30 seconds. This task was based on the Manikin test used by Logie and Baddeley (1983).
Memory binding comprised presentation of colored shapes at one of four positions on the computer screen, top, right, bottom or left. The shapes appeared for 2 seconds for every shape shown; for example, two shapes would have a total of four seconds display time. Following removal of the shapes, four colored patches appeared along the top of the screen, and four outlines of the shapes appeared down the left of the screen. The task was to recall the color, shape and position previously displayed, first by clicking on a color, then clicking on the shape shown in that color, and then clicking on the position in which that colored shape had been shown. No shape or color was repeated on any one trial. All three features had to be recalled correctly, thereby giving a measure of the binding in memory of the three features for each item shown on a given trial. The test started with one item for recall and increased to a maximum of four items. The test stopped if participants failed to recall all three features for an item on two successive trials. In all cases, the color of each item was drawn from the set red, yellow, green and blue, and no color was repeated on any one trial. Participants were randomly allocated to one of four conditions. In conditions one and two, participants were shown geometric shapes drawn from the set square, circle, triangle and diamond. In conditions three and four, the shapes were of animals, namely camel, penguin, elephant and pig, based on the Snodgrass and Vanderwart (1980) line drawings. In conditions one and three, for trials with two, three or four items, the items were shown simultaneously on the screen for respectively 4 seconds, 6 seconds and 8 seconds. In conditions two and four, the items were shown consecutively at a rate of 2 seconds per item. For each trial, the combination of shape, color and position for each item was allocated at random, and each trial involved different combinations.
However, these combinations were identical for all participants in a given condition. A detailed analysis of these data will be reported elsewhere. For the purposes of this chapter, data were collapsed across conditions.
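The four memory binding conditions form a two by two design (shape set by presentation mode), which can be laid out as a small lookup. The sketch below is illustrative only; the names `CONDITIONS`, `SHAPE_SETS` and `display_schedule` are hypothetical, and the code simply restates the display times given in the text.

```python
SECONDS_PER_ITEM = 2.0

# Shape sets named in the text.
SHAPE_SETS = {
    "geometric": ["square", "circle", "triangle", "diamond"],
    "animal": ["camel", "penguin", "elephant", "pig"],
}

# Condition number -> (shape set, presentation mode).
CONDITIONS = {
    1: ("geometric", "simultaneous"),
    2: ("geometric", "consecutive"),
    3: ("animal", "simultaneous"),
    4: ("animal", "consecutive"),
}


def display_schedule(condition: int, n_items: int):
    """Return a list of (items_on_screen, duration_s) display events."""
    if not 1 <= n_items <= 4:
        raise ValueError("trials contain between one and four items")
    _, mode = CONDITIONS[condition]
    if mode == "simultaneous":
        # All items together, 2 s per item in total.
        return [(n_items, SECONDS_PER_ITEM * n_items)]
    # One item at a time, 2 s each.
    return [(1, SECONDS_PER_ITEM)] * n_items


print(display_schedule(1, 3))
print(display_schedule(2, 3))
```

Total display time is the same in both presentation modes, so the conditions differ only in whether the items are seen together or one at a time.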