Dec 15, 2022

Make Working Memory Work For You

*Image by* *Mateo Vrbnjak* on *Unsplash*.

This is part 4 in a series on what we know about how we learn and how this knowledge should inform how we teach. The series is intended for teachers, students, and developers of education technology who want to be more informed about their practice. See also parts 1, 2, and 3.

Let’s start with a lesson on RNA folding:

The RNA folding problem describes the twin problems of the prediction and design of RNA molecular structures. Four types of RNA nucleotides — adenine, uracil, guanine, and cytosine — rest on a flexible ribo-phosphate backbone, forming chemical bonds of varying strengths with each other as the molecule wraps upon itself. Predicting these structures from proposed designs has been proven to be NP-hard. Designing RNA structures to fold into pre-defined targets is even more difficult…

Have I lost you yet? If you already know a lot about RNA, maybe not. If you don’t know your ribonucleic from your deoxyribonucleic, maybe so.

When students have to pay attention to too many things at once, it stresses their working memory. This “stress” is called cognitive load.

What it is.

Working memory describes one particular characteristic of human minds: our limited, short-term, processing power. Our working memory processes what we pay attention to, but there are limits to how much we can process at one time. Cognitive load describes the burden that a learning experience places on the working memory of the learner.

A little cognitive load can be good: if students are hardly engaging with the material, not much learning will take place. But too much load is bad for learning.

Imagine your attention span as a funnel, with your teacher pouring water into it. If they pour slowly, all of the water can go through the funnel. But if they pour too fast, the funnel gets backed up, and some of the water spills out. This is “cognitive overload”: it’s what happens when you try to teach someone too many new things at once.

Cognitive load can also inhibit learning when it imposes a mental processing cost without a corresponding learning benefit. This can be the case with taking lecture notes verbatim. The student spends so much effort transcribing the words that she can’t process the meaning of the lecture very deeply.

How it works.

We’ve talked before about how cognitive effort often helps learning. The research on cognitive load adds an important qualification: the right kind of cognitive effort helps learning. We have a limited amount of mental effort to spend during any learning experience. If we spend it unwisely, by attending to a ringing phone, checking our newsfeeds, or dwelling on that terrible joke the teacher told, learning suffers.

The research on cognitive load also contributes to two larger principles that come up in other research contexts.

Prior knowledge matters.

We all understand this at some level. I’m not going to explain something to a child in the same way that I explain it to an adult. And your Pokemon-obsessed child will probably remember new Pokemon (and their special attributes) better than you will, because they already have a lot of Pokemon knowledge to connect them to.

The “expertise reversal effect” is a nice illustration of the impact of prior knowledge. The same information that helps novices learn something can impose learning costs on those with more experience. Extra information becomes redundant and even confusing for more advanced learners.

But it’s easy to forget the importance of prior knowledge. As we learn more and more about something, our understanding about what it’s like to “not know” fades. The most common term for this is “expert blindness.” The practical effect is that it becomes harder for experts to know what students don’t know, which makes it challenging to design instruction. This leads to cognitive overload: instructors providing too much information (too quickly) for students to handle.

Learning activities have to be aligned with learning goals.

Well this seems perfectly obvious, doesn’t it?

But it’s more difficult than it seems. Often, we think that a certain learning activity leads to desired outcomes and we turn out to be wrong. The research on cognitive load suggests that we should identify and remove aspects of the learning experience that drain mental resources for no good (learning) purpose. Figuring out which aspects are which can be quite challenging.

But consider the following two presentations of the same problem.

Both ask the same question. On the right-hand side, however, I’ve made the problem “artificially” more challenging. Instead of putting the angle and side labels inside the triangle, I’ve added another layer of labeling and moved them further away from the angles they describe. This imposes a small “extra” cognitive load. Instead of being able to look at the triangle and read the measurements off directly (as on the left), a student has to switch back and forth between the labels and the actual triangle as they solve the problem.

Is imposing this extra load worth it? I think most teachers would say no. Maybe the students’ eye movements will get faster, but that’s not an activity aligned with the (presumed) learning goals about geometry.

How we know it works.

Certain basic premises of cognitive load have their roots in broader psychological research. The distinction between long-term and working memory has some complexities, but it is one of the long-standing tenets of cognitive psychology. The idea of a limited working memory as a feature of our brains has been repeatedly explored.

It’s also reasonably clear that distractions from the main learning task impair learning. Phone calls that interrupt lessons lead to lower learning gains. People doing distracting things on their laptops distract fellow classmates, leading to lower learning gains. External interruptions are just not good.

The bigger question is: how and when do additional working memory burdens help (as opposed to hurt)? Answering this question is tricky, because sometimes doing something that really challenges our working memory improves learning. And sometimes it inhibits learning.

Within the cognitive load framework, the explanation is that “good” cognitive load helps learning; and “bad” cognitive load hurts it. But which is which?

Early research in cognitive load focused on mathematical problem-solving. John Sweller’s key insight was that students could spend a lot of time solving math problems without actually learning how to solve them very well. That’s because the students kept applying inefficient methods for solving these problems, without ever discovering more efficient methods. Giving students more guidance, however, helped them solve future, structurally identical problems.

A second line of cognitive load research has focused on the design of effective multimedia presentations. Ask students to watch an animation, video, or slide show. And vary where you put the text, vary the animation style, vary just about anything you can imagine. This line of research led to a series of recommended best practices in instructional design (the recent book "e-Learning and the Science of Instruction" is probably the best resource, but this Wikipedia article provides a decent summary).

One of the prevailing limitations in these studies has been measuring cognitive load accurately and distinguishing between “good” and “bad” load. The very same activity that helps one learner in one context can impair learning for a different learner in a different context, and it can be difficult to predict which beforehand.

Research on the “modality effect” is illustrative. A long series of research studies supported the idea that multimedia instruction should take advantage of both visual and audio “channels” to make learning more efficient. Instead of presenting both graphics and text visually, presenting the same graphic visually and the same text as audio confers learning advantages.

Except sometimes it doesn’t. When researchers tried to extend these findings to more practical classroom settings, some even found a “reverse modality effect” — splitting information up into visual and audio channels led to lower learning gains.

The explanation for these contrasting findings seems to lie in whether the instruction is “system-paced” or “learner-paced.” The studies that established the modality effect involved short multimedia lessons that students had no control over. The studies finding no modality effects (or even reverse modality effects) gave students the opportunity to learn from instructional materials on their own. This meant the students could seek out information that they wanted; they weren’t just strapped in for the multimedia ride.

Another example of the difficulty of knowing which kind of load is good or bad beforehand is the research on multiple representations. Representations, in this case, has a broad meaning: it might include text descriptions, pictures, simulations, models, symbols, flow-charts… you get the idea.

If you provide students with two representations that contain essentially the same information, it’s redundant. And redundancy seems to tax working memory. So from this perspective, presenting students with multiple representations of the same concept would seem to impair learning.

Of course, multiple representations might also facilitate learning. When students work with multiple representations they can make meaning by comparing the representations to each other. This can lead to stronger knowledge of the underlying concept.

Research supports both of these ideas. Integrating multiple representations does impose high cognitive load. And, in some cases, it’s just too high and students don’t seem to learn much. But techniques that support meaning-making can make this “high load” activity pay off. Asking students to explain differences and similarities between the representations, for example, helps. Students who have enough prior knowledge to integrate these representations can really benefit from doing so.

Often, the research on cognitive load is summarized by just stating “don’t overload students with information.” But these findings reveal considerable nuance about how to use the brain’s processing power effectively.

A number of factors are at work here. Whether cognitive load is good (or bad) for learning depends on prior knowledge, on how much control learners have when learning the material, on supporting cues and scaffolds, and probably on some other stuff we haven’t figured out yet.

How to implement it.

So, given this research, what are the practical takeaways? Are there practical takeaways?

Here are a few.

Representational fidelity can be bad. There’s a through-line in the development of education technology that says something like, “making things more realistic is better.” Instead of still images, use animations. Instead of simple animations, use more realistic ones. Instead of standard definition television, use high definition. Instead of video, use VR. The research on cognitive load helps illustrate why this isn’t usually true. Realistic images contain lots of things to be distracted by, which can impose higher irrelevant cognitive load. Simple images don’t.

Simplification comes up in other situations, too. It’s not uncommon for undergraduate science teachers to use graphs from research papers in their slides. But the purpose of the original graph and the purpose of the instructional graph are different.

Authors submitting their papers for peer review are often trying to pack as much information as possible into their graphs. Research papers are also aimed at an expert audience, so readers already know the meanings of certain symbols and they’ve seen plenty of graphs like this before. It doesn’t take much effort for experts to read the graphs.

Instructional graphs, however, are about teaching students something about what’s illustrated. The audience is different: students, who have less prior knowledge and different fundamental needs. There’s usually too much complexity in the original graphs for them to be ideal teaching tools. So simplify them.

Another technique to lower excessive cognitive load is the slow reveal. Let students grasp one aspect of a new idea before introducing more complexity. Some of the best examples of this, in my opinion, come from game tutorials. The best game tutorials layer lessons about game mechanics in a seamless way.

By the same logic, instruction can be faded out. A huge literature has developed around how to provide learners with the right support at the right time. Several studies have demonstrated how fading worked examples can facilitate the acquisition of problem-solving skills, especially when paired with self-explanation prompts. This is a way to combat the “expertise reversal effect” mentioned earlier.
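
For developers of education technology, here is a minimal sketch of what fading a worked example might look like in code. It assumes one particular scheme, in which the final steps of a solution are handed over to the learner first and more steps are handed over with each new problem. The function names, the placeholder text, and the sample algebra problem are all hypothetical, chosen only for illustration; they are not taken from any of the studies mentioned above.

```python
from typing import List


def fade_worked_example(steps: List[str], n_faded: int,
                        blank: str = "[your turn]") -> List[str]:
    """Return the steps with the last `n_faded` replaced by a blank.

    Assumption: fading starts from the end of the solution, so the
    learner completes the final steps first.
    """
    # Clamp so we never fade fewer than zero or more than all steps.
    n_faded = max(0, min(n_faded, len(steps)))
    n_shown = len(steps) - n_faded
    return steps[:n_shown] + [blank] * n_faded


def fading_sequence(steps: List[str]) -> List[List[str]]:
    """Build versions running from fully worked to fully faded."""
    return [fade_worked_example(steps, k) for k in range(len(steps) + 1)]


# Hypothetical worked example for illustration only.
steps = [
    "Write the equation: 2x + 3 = 11",
    "Subtract 3 from both sides: 2x = 8",
    "Divide both sides by 2: x = 4",
]

for version in fading_sequence(steps):
    print(version)
```

The first version shows every step, and the last leaves every step to the learner. In practice, how quickly to move from one version to the next would depend on the learner’s prior knowledge, which this sketch does not try to measure.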

I use cognitive load to remind myself to ask important questions. What am I really trying to teach? Are the activities that I’m having students do really about what I want to teach? Can they handle this amount of information at once? Can I re-structure it in a way that helps them?

These are questions that we have to revisit over and over again.