Over the past year, we have all heard about various hardware-level security vulnerabilities affecting the microprocessors that power our modern infrastructure. These issues, with names like “Meltdown”, “Spectre”, “Speculative Store Buffer Bypass”, and “Foreshadow”, collectively known as speculative execution side-channel vulnerabilities, affect a performance optimization in which microprocessors attempt to guess ahead about future program behavior in order to reduce the time spent waiting for data to be loaded from slower external memories. Today, yet another set of vulnerabilities was disclosed, known as Microarchitectural Data Sampling (MDS). These are similar to those we have seen before, but they involve different parts of the processor.
The vulnerabilities lie in the implementation of speculative execution, in which the processor tries to guess which instructions may be needed next. They exploit the possibility of reading stale data from small internal buffers that sit between different parts of the processor. The disclosed variants are:
- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)
Not all processors are affected by all variants of MDS.
According to Intel, speaking to Wired in May 2019, Intel’s researchers discovered the vulnerabilities in 2018, before anyone else. Other researchers had also agreed to keep the exploit confidential since 2018.
On 14 May 2019, in a disclosure coordinated with Intel, various groups of security researchers, including teams from Austria’s Graz University of Technology, Belgium’s Catholic University of Leuven, and the Netherlands’ Vrije Universiteit Amsterdam, published the discovery of the MDS vulnerabilities in Intel microprocessors, which they named Fallout, RIDL, and ZombieLoad. Three of the TU Graz researchers were from the group that had discovered Meltdown and Spectre the year before.
The state of the processor
Modern processors are designed to perform calculations upon data under the control of stored programs. Both the data and the program are contained within relatively large memory chips that are attached to the external interfaces of the processor. If you’ve ever looked at a modern server motherboard, these look like parallel rows of small circuit boards (known as DIMMs, or Dual Inline Memory Modules) that run along each side of the central processor chip. The processor itself has small amounts of internal memory, known as caches, that are loaded with copies of data from memory to perform calculations. Data must usually be “pulled” into the caches (an automatic process) for the “execution units” to perform calculations.
These caches form a hierarchy in which the innermost level runs at the core speed of the processor’s execution units, while the outer levels are progressively slower (yet still much faster than the external memory chips). We often say in computer engineering that you can have small and fast or big and slow memories, but the laws of physics don’t let you have both at the same time. As a result, these innermost caches are tiny, too small in fact for even a single whole cat image (e.g. 32 kilobytes), but they are augmented by multiple additional larger levels of caches, up to many megabytes in size. Compared to the external memories (which may be many gigabytes), even the larger caches are small. As a result, data is frequently pulled into the caches, and then evicted and written back to memory for other data to be loaded.
Caches keep copies of data from external memory, so every entry in the cache has both an address and an associated data value. At the innermost level, caches are further split into those that keep copies of data and those that keep copies of program instructions. When a program adds two numbers together, the values are typically first loaded into the innermost processor cache (the “L1” data cache, or L1D$), then the calculation is performed, and finally, the result is also stored in the L1 data cache. Eventually, the result will also be written back to the correct corresponding external memory location, but this can happen at a more leisurely pace. In most modern systems, results remain in the caches until space needs to be freed up, because such data might be used again, and keeping it in the caches reduces the time to access it.
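To make the address/value pairing and the delayed write-back concrete, here is a toy model of a write-back cache. It is a sketch under simplifying assumptions: the `TinyCache` class and its methods are hypothetical, each entry holds a single value rather than a multi-byte cache line, and eviction uses a least-recently-used policy, which real hardware only approximates.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical write-back cache: each entry pairs an address with a data value."""

    def __init__(self, capacity, memory):
        self.capacity = capacity      # number of entries the cache can hold
        self.memory = memory          # dict standing in for external memory
        self.entries = OrderedDict()  # address -> [value, dirty], least recently used first
        self.hits = 0
        self.misses = 0

    def _fill(self, addr):
        """Ensure addr is cached, evicting the least recently used entry if full."""
        if addr in self.entries:
            self.hits += 1
            self.entries.move_to_end(addr)   # mark as most recently used
            return
        self.misses += 1
        if len(self.entries) >= self.capacity:
            old_addr, (old_value, dirty) = self.entries.popitem(last=False)
            if dirty:                        # only modified data is written back
                self.memory[old_addr] = old_value
        self.entries[addr] = [self.memory.get(addr, 0), False]

    def load(self, addr):
        self._fill(addr)
        return self.entries[addr][0]

    def store(self, addr, value):
        self._fill(addr)
        self.entries[addr] = [value, True]   # result stays cached; memory is updated later


memory = {0: 2, 1: 3, 2: 0}
cache = TinyCache(capacity=2, memory=memory)
cache.store(2, cache.load(0) + cache.load(1))   # add two numbers; result lands in the cache
print(memory[2])     # 0: the result has not been written back yet
cache.load(1)        # hit
cache.load(0)        # miss: evicts the dirty entry for address 2, writing it back
print(memory[2])     # 5
```

The sum only reaches `memory` when its entry is evicted to make room, mirroring the “leisurely” write-back described above.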
Processor decisions: Now vs. then
In the earliest days of computing, processors ran at similar speeds to the memories to which they were attached, and which contained the data upon which they would operate. Over time, the relative difference in performance became more acute. Today, a processor may be whole orders of magnitude faster than its external memories. Thus it is critical to ensure that data is available to the processor right when it needs it. Sophisticated hardware such as “prefetchers” helps to pull data that may be needed into the caches ahead of time, but there are still many situations in which the processor needs data that is not cached.
A common example is that of a branch condition. The processor may hit a piece of code that needs to make a decision – is it raining today? The answer may be stored in a piece of data contained in the external memory. Rather than waiting for the data to be loaded into the cache, the processor can enter a special mode known as speculation in which it will guess ahead. Perhaps the last few times this piece of code ran, the result was that it was not raining. The processor may then guess the same is still true. It will thus begin to “speculatively” execute ahead in the program based upon this potential condition, all while the data needed is being loaded from memory (“resolved”). If the guess is right, significant time can be saved.
Conversely, if the guess is wrong, the processor must discard any transient operations that it has performed as a result of mis-speculation. It will then recompute the correct result. All of this happens under the hood, in the processor implementation, the so-called “microarchitecture”. This differs from the programmer-visible model of the world known as an “architecture”. At an architectural level, the program tests a condition (is it raining?) and only ever executes the correct path in the program as a result. The fact that the processor may speculate underneath is supposed to be invisible to the programmer, and for decades it was assumed to be.
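The guess, execute, verify, and roll-back cycle can be sketched as a toy model. The `LastOutcomePredictor` class and `run_branch` function are hypothetical, and the sketch assumes a predictor that simply repeats the branch’s previous outcome; real branch predictors are far more elaborate.

```python
class LastOutcomePredictor:
    """Hypothetical predictor: guess that a branch goes the same way as last time."""

    def __init__(self):
        self.last = {}

    def predict(self, branch_id):
        return self.last.get(branch_id, False)   # default guess: not raining

    def update(self, branch_id, taken):
        self.last[branch_id] = taken


def run_branch(predictor, registers, is_raining):
    """Speculate on 'is it raining?' before the condition resolves; return True on mis-speculation."""
    guess = predictor.predict("raining?")
    checkpoint = dict(registers)              # architectural state before speculating
    # Transient work on the guessed path while the condition is still being loaded.
    registers["plan"] = "umbrella" if guess else "sunglasses"
    mispredicted = guess != is_raining        # the load has now resolved
    if mispredicted:
        registers.clear()
        registers.update(checkpoint)          # discard the transient results
        registers["plan"] = "umbrella" if is_raining else "sunglasses"  # recompute
    predictor.update("raining?", is_raining)
    return mispredicted


predictor = LastOutcomePredictor()
regs = {}
outcomes = [run_branch(predictor, regs, r) for r in [False, False, True, True, False]]
print(outcomes)       # [False, False, True, False, True]
print(regs["plan"])   # sunglasses
```

Architecturally, `regs` only ever ends up holding the correct plan; the mis-speculated work is invisible in it. Note that this model rolls back everything it touched, which is exactly the assumption the next section shows real hardware does not fully honor.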
The lessons of Meltdown and Spectre
At the start of 2018, Spectre and Meltdown taught us that these assumptions were no longer valid. It was discovered that a determined attacker could leverage “side channels” to exfiltrate information from the microarchitectural state and make it visible at an architectural level. This is typically done by exploiting the fundamental property of caches: they make access to recently used data faster. Thus, it is possible to write code that will perform timing analysis on memory access, inferring whether a location is present in the caches or not. By performing a second (speculative) memory access that depends upon the value of sensitive data contained in a first memory location, an attacker can use cache side-channel timing analysis to reconstruct the value of the sensitive data.
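The timing inference can be simulated in the style of the well-known Flush+Reload technique: evict a set of probe entries, let the secret-dependent access happen, then time a reload of each entry. This is a sketch under stated assumptions: the latencies are invented constants plus random noise, and `SimulatedCache`, `victim`, and `recover_byte` are hypothetical. In a real attack the secret-dependent access happens speculatively and the timings come from the hardware.

```python
import random

CACHE_HIT_NS = 5      # assumed latency of a cached access
CACHE_MISS_NS = 100   # assumed latency of an access that goes to external memory
STRIDE = 4096         # assumed spacing between probe entries (one per possible byte value)


class SimulatedCache:
    """Hypothetical cache that only tracks which addresses are present."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Return a simulated access time and leave addr cached afterwards."""
        latency = CACHE_HIT_NS if addr in self.lines else CACHE_MISS_NS
        self.lines.add(addr)
        return latency + random.randint(0, 10)   # measurement noise


def victim(cache, secret_byte):
    # The secret-dependent access: which probe entry is touched encodes the secret.
    cache.access(secret_byte * STRIDE)


def recover_byte(cache, run_victim):
    for value in range(256):                     # flush every probe entry
        cache.flush(value * STRIDE)
    run_victim(cache)                            # victim leaves exactly one entry cached
    timings = [cache.access(v * STRIDE) for v in range(256)]  # reload and time each entry
    return min(range(256), key=lambda v: timings[v])          # fastest entry = secret


leaked = recover_byte(SimulatedCache(), lambda c: victim(c, 0x42))
print(hex(leaked))   # 0x42
```

Because the assumed noise (at most 10 ns) never closes the gap between hits and misses, the fastest reload always identifies the secret here; real measurements are noisier and are usually repeated.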
Meltdown exploited such a sequence of loads in combination with an additional optimization, common to many different processors, that allows a load from privileged memory to proceed in parallel with the permission check that determines whether the access is allowed. To save time, the processor may speculate that the access is permitted, allowing a second “dependent” load to be performed based upon the content of the first. When the access check eventually completes, the processor kills the speculative state and generates the appropriate access violation error, but by then the second load has altered the cache in a way that can later be measured by the attacker to reconstruct the content of the privileged memory.
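The Meltdown sequence can be modeled in a few lines. The `ToyCPU` class and `meltdown_sketch` function are hypothetical; the model assumes the load completes before its permission check, and membership in the `cached` set stands in for the timing scan a real attacker would perform.

```python
class ToyCPU:
    """Hypothetical CPU model in which a load runs ahead of its permission check."""

    def __init__(self, memory, privileged):
        self.memory = memory          # address -> byte value
        self.privileged = privileged  # addresses user code may not read
        self.cached = set()           # microarchitectural state: survives a squash

    def user_load_then_probe(self, addr, probe_base, stride):
        # 1. The load proceeds in parallel with the permission check.
        value = self.memory[addr]
        # 2. A dependent load whose address is computed from the loaded value.
        self.cached.add(probe_base + value * stride)
        # 3. The permission check completes: the result is discarded and a fault is raised,
        #    but the cache footprint from step 2 remains.
        if addr in self.privileged:
            raise PermissionError("access violation")
        return value


def meltdown_sketch(cpu, addr, probe_base=0x10000, stride=4096):
    cpu.cached.clear()                       # stands in for flushing the probe array
    try:
        cpu.user_load_then_probe(addr, probe_base, stride)
    except PermissionError:
        pass                                 # architecturally, the read failed
    # Stand-in for the timing scan: find the probe entry that is now cached.
    for value in range(256):
        if probe_base + value * stride in cpu.cached:
            return value
    return None


cpu = ToyCPU(memory={0xFFFF0000: 0x53}, privileged={0xFFFF0000})
print(hex(meltdown_sketch(cpu, 0xFFFF0000)))   # 0x53
```

The architectural result is a fault, yet the privileged byte is recovered from the cache state left behind by step 2.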
Meltdown exists in part because of the complexity of the infrastructure used to handle loads (reads) from memory and stores (writes) to memory. Modern application programs don’t operate directly upon the address of data in physical memory. Instead, they use “virtual memory”, an abstraction in which each program sees nicely linear and uniform memory addresses, which are translated by the processor’s Memory Management Unit (MMU) into physical addresses. This happens using a set of operating-system-managed structures known as page tables, and navigating (“walking”) through these takes some time. To speed things up, the processor has a series of Translation Lookaside Buffers (TLBs) that store a small number of recent translations.
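A page-table walk with a TLB in front of it can be sketched as follows. The layout is an assumption chosen for brevity: two levels with 10-bit indices and 4 KiB pages, as in classic 32-bit x86 paging, whereas modern 64-bit processors use four or five levels. `ToyMMU` and `PageFault` are hypothetical names.

```python
from collections import OrderedDict

PAGE_SHIFT = 12          # 4 KiB pages (assumed)
INDEX_BITS = 10          # 10-bit index per level, as in classic 32-bit x86 paging (assumed)
INDEX_MASK = (1 << INDEX_BITS) - 1


class PageFault(Exception):
    pass


class ToyMMU:
    """Hypothetical two-level page-table walker with a small LRU TLB."""

    def __init__(self, root, tlb_size=4):
        self.root = root              # level-1 table: index -> level-2 table (dict)
        self.tlb = OrderedDict()      # virtual page number -> physical frame number
        self.tlb_size = tlb_size
        self.walks = 0                # number of slow page-table walks performed

    def _walk(self, vpn):
        # On real hardware each level is a separate memory access, which is why walks are slow.
        self.walks += 1
        l2_table = self.root.get((vpn >> INDEX_BITS) & INDEX_MASK)
        if l2_table is None or (vpn & INDEX_MASK) not in l2_table:
            raise PageFault(hex(vpn << PAGE_SHIFT))
        return l2_table[vpn & INDEX_MASK]

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)             # TLB hit: no walk needed
        else:
            pfn = self._walk(vpn)                 # TLB miss: walk, then remember the result
            if len(self.tlb) >= self.tlb_size:
                self.tlb.popitem(last=False)      # evict the least recently used translation
            self.tlb[vpn] = pfn
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


mmu = ToyMMU(root={0: {1: 0x2A}})   # maps virtual page 1 to physical frame 0x2A
first = mmu.translate(0x1234)       # TLB miss: walks the tables
second = mmu.translate(0x1FF0)      # same page: TLB hit
print(hex(first), hex(second), mmu.walks)   # 0x2a234 0x2aff0 1
```

The second translation reuses the cached TLB entry, so only one walk is performed; an address with no mapping raises `PageFault`.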
We mitigated Meltdown by observing that successful exploitation required secret data either to be present in the innermost data cache or to have a valid virtual memory mapping. For performance reasons, Linux and other operating systems used to have a mapping installed for all operating system memory in the address space of every running application. KPTI (Kernel Page Table Isolation) removes this by splitting the set of translations so that no useful translation is present during a Meltdown attack. Thus we trade some performance (having to switch page tables every time a program calls upon the OS in a system call) for improved security. Newer hardware removes the need for KPTI.
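The effect of KPTI on what user mode can translate, and its cost in page-table switches, can be sketched with plain dictionaries. Everything here is a simplifying assumption: `build_page_tables` and `table_switches` are hypothetical, the kernel address is illustrative, and real KPTI still keeps a minimal set of kernel entry and exit mappings in the user table, which this model omits.

```python
PAGE_SHIFT = 12
KERNEL_PAGE = 0xFFFF800000000000 >> PAGE_SHIFT   # illustrative kernel virtual page (assumed)


def build_page_tables(user_map, kernel_map, kpti):
    """Return (table used in user mode, table used in kernel mode).

    Tables are dicts mapping virtual page -> physical frame. Without KPTI one table
    holds everything; with KPTI user mode gets a table without kernel mappings.
    """
    full = {**user_map, **kernel_map}
    if not kpti:
        return full, full
    return dict(user_map), full


def table_switches(n_syscalls, kpti):
    """Simplified cost model: with KPTI each system call switches tables on entry and exit."""
    return 2 * n_syscalls if kpti else 0


user_map = {0x400: 0x10}
kernel_map = {KERNEL_PAGE: 0x99}
for kpti in (False, True):
    user_table, _ = build_page_tables(user_map, kernel_map, kpti)
    print(kpti, KERNEL_PAGE in user_table, table_switches(1000, kpti))
# False True 0
# True False 2000
```

With KPTI enabled, the kernel page has no translation while user code runs, which is what starves a Meltdown attack; the price is the extra table switch on every kernel entry and exit.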
How MDS differs
MDS has some similarities with the previous vulnerabilities, as well as some important differences. MDS is a family of vulnerabilities in different (related) components of the processor. Unlike Meltdown, MDS doesn’t allow an attacker to directly control the target memory address from which they would like to leak data. Instead, MDS is a form of “sampling” attack in which an attacker can leverage cache side-channel analysis to repeatedly measure the stale content of certain small internal processor buffers that are used to store data as it is being loaded into the caches or written back to memory. Through sophisticated statistical analysis, it is then possible to reconstruct the original data.
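The statistical side of a sampling attack can be illustrated with a simple majority vote over many noisy samples. This is a deliberately simplified sketch: `noisy_sample` and `reconstruct` are hypothetical, the model assumes the attacker knows which byte position each sample belongs to (real attacks must work this out), and the 60% error rate is an arbitrary illustration.

```python
import random
from collections import Counter


def noisy_sample(secret, error_rate, rng):
    """One hypothetical sampling attempt, returning (position, byte).

    Simplifying assumption: the attacker learns which position a sample belongs to.
    With probability error_rate the byte is unrelated stale data instead of the secret.
    """
    pos = rng.randrange(len(secret))
    if rng.random() < error_rate:
        return pos, rng.randrange(256)
    return pos, secret[pos]


def reconstruct(samples, length):
    """Majority vote per position over all collected samples."""
    votes = [Counter() for _ in range(length)]
    for pos, value in samples:
        votes[pos][value] += 1
    return bytes(v.most_common(1)[0][0] if v else 0 for v in votes)


rng = random.Random(1)            # fixed seed so the sketch is repeatable
secret = b"hunter2"
samples = [noisy_sample(secret, 0.6, rng) for _ in range(5000)]
recovered = reconstruct(samples, len(secret))
print(recovered)
```

Even when most individual samples are wrong, the noise is spread over 256 possible values, so the true byte for each position dominates the vote once enough samples are collected.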
Each of the variants of MDS relies upon an optimization in the load path of Intel processors. When a load is performed, the processor performs address generation and dispatches an internal operation (known as a µop, or micro-op) to control the execution unit performing the load. Under certain conditions, the load may fault (for example, because it attempts to access memory marked as not having a valid translation), or it might be complex enough that it requires an “assist”. This is a special process in which the processor begins to execute a simple built-in microcode program to help handle a load that can’t be implemented easily in pure hardware. In either case, the design of the processor is such that it may speculate beyond the pending fault.
During the resulting window of speculation, before handling the fault or assist, the processor may speculatively forward stale values from internal buffers to subsequent operations. The designers knew that the processor would shortly be handling the fault or assist, and would throw out any speculated activity as a result, so this was perhaps not seen as a problem, especially in light of the previous assumptions about speculation being a “black box”. Unfortunately, this allows a modified Meltdown-like attack to extract the stale buffer state.
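The stale-forwarding step can be added to the earlier Meltdown-style model. This is a hypothetical sketch: `StaleBuffer` stands in for any of the small buffers named in the MDS variants, `faulting_load` models a load that faults or needs an assist, and, unlike a real attacker, the loop below chooses which slot it samples, purely to keep the example deterministic.

```python
STRIDE = 4096   # assumed spacing of probe entries, as in a cache side channel


class StaleBuffer:
    """Hypothetical model of a small internal buffer (e.g. a fill or store buffer)."""

    def __init__(self, entries):
        self.slots = [0] * entries
        self.cursor = 0

    def record(self, value):
        # Data moving to or from the L1 cache passes through a slot and is left behind.
        self.slots[self.cursor] = value
        self.cursor = (self.cursor + 1) % len(self.slots)


def faulting_load(buffer, slot, footprint):
    """A load that faults or needs an assist. Before the fault is handled, a stale value
    is forwarded to the dependent operation, which leaves a cache footprint."""
    stale = buffer.slots[slot]          # no address match required: whatever is there
    footprint.add(stale * STRIDE)       # dependent access encodes the stale value
    raise PermissionError("fault handled; transient results discarded")


buf = StaleBuffer(entries=4)
for byte in b"key!":                    # victim activity passing through the buffer
    buf.record(byte)

leaked = []
for slot in range(4):                   # simplification: attacker picks the slot
    footprint = set()
    try:
        faulting_load(buf, slot, footprint)
    except PermissionError:
        pass
    leaked.append(next(v for v in range(256) if v * STRIDE in footprint))
print(bytes(leaked))   # b'key!'
```

Note that `faulting_load` never looks at an address supplied by the attacker: it forwards whatever the buffer happens to hold, which is why MDS is a sampling attack rather than a targeted read.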
Reports vary as to whether affected Intel processors date back to 2011 or 2008, and the fixes may be associated with a performance drop. Intel reported that processors manufactured in the month before the disclosure have mitigations against the attacks.
Intel characterized the vulnerabilities as “low-to-medium” impact, disagreeing with the security researchers, who characterized them as major, and with their recommendation that operating system vendors should completely disable hyperthreading. Nevertheless, attackers can exploit the ZombieLoad vulnerability to steal information recently accessed by the affected microprocessor.
Fixes to operating systems, virtualization mechanisms, web browsers, and microcode are necessary. Microcode is the implementation of processor instructions on the processor itself, and microcode updates are commonly delivered as a motherboard firmware (BIOS or UEFI) patch. As of 14 May 2019, applying available updates on an affected PC system was the most that could be done to mitigate the issues.
- Intel incorporated fixes in its processors starting shortly before the public announcement of the vulnerabilities.
- On 14 May 2019, mitigations were released for the Linux kernel, and Apple, Google, Microsoft, and Amazon released emergency patches for their products to mitigate ZombieLoad.
- On 14 May 2019, Intel published a security advisory on its website detailing its plans to mitigate ZombieLoad.
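On Linux systems with the MDS mitigations applied, the kernel reports the current status in the sysfs file `/sys/devices/system/cpu/vulnerabilities/mds`. The following is a minimal sketch for reading it; `parse_mds_status` and `read_mds_status` are hypothetical helpers, and the example status strings follow the kernel’s MDS documentation as best understood here, so treat their exact wording as an assumption and check the documentation for your kernel version.

```python
from pathlib import Path

MDS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(text):
    """Split a status line such as 'Mitigation: Clear CPU buffers; SMT vulnerable'."""
    head, _, detail = text.strip().partition(":")
    return {
        "state": head.strip(),   # e.g. 'Not affected', 'Vulnerable', 'Mitigation'
        "details": [part.strip() for part in detail.split(";") if part.strip()],
    }


def read_mds_status(path=MDS_STATUS):
    """Return the parsed status, or None if the file is missing (older kernel, non-Linux)."""
    try:
        return parse_mds_status(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_mds_status())
```

A `details` entry mentioning SMT indicates how the kernel views hyperthreading on that machine, which relates to the disagreement over disabling hyperthreading described above.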