[GTALUG] Fwd: Spectre attacks: exploiting speculative execution

David Collier-Brown davec-b at rogers.com
Tue Jan 16 07:23:44 EST 2018


And another on Spectre

--dave



-------- Forwarded Message --------
Subject: 	Spectre attacks: exploiting speculative execution
Date: 	Tue, 16 Jan 2018 06:00:00 +0000
From: 	adriancolyer <>

https://blog.acolyer.org/2018/01/16/spectre-attacks-exploiting-speculative-execution/

Spectre attacks: exploiting speculative execution

Spectre attacks: exploiting speculative execution 
<https://spectreattack.com/spectre.pdf> Kocher et al., /2018/

Yesterday we looked at Meltdown and some of the background on how modern 
CPUs speculatively execute instructions. Today it’s the turn of Spectre 
of course, which shares some of the same foundations but is a different 
attack, not mitigated by KAISER. On a technical front, Spectre is as 
fascinating as it is terrifying, introducing a whole new twist on ROP 
<https://blog.acolyer.org/2017/12/06/the-dynamics-of-innocent-flesh-on-the-bone-code-reuse-ten-years-later/>.

    This paper describes practical attacks that combine methodology from
    side-channel attacks, fault attacks, and return-oriented programming
    that can read arbitrary memory from the victim’s process… These
    attacks represent a serious threat to actual systems, since
    vulnerable speculative execution capabilities are found in
    microprocessors from Intel, AMD, and ARM that are used in billions
    of devices.

Spectre is a step-up from the already very bad Meltdown:

  * it applies to AMD and ARM-based processors (so works on mobile
    phones, tablets, and so on)
  * it only assumes that speculatively executed instructions can read
    from memory the victim process could access normally (i.e., no page
    fault or exception is triggered)
  * the attack can be mounted in just a few lines of JavaScript – and a
    few lines of JavaScript can easily be delivered to your system from,
    for example, an ad displayed on a site that you would otherwise trust.

    … in order to mount a Spectre attack, an attacker starts by locating
    a sequence of instructions within the process address space which
    when executed acts as a covert channel transmitter which leaks the
    victim’s memory or register contents. The attacker then tricks the
    CPU into speculatively and erroneously executing this instruction
    sequence…


      Let’s play tennis

You’re watching an epic clay-court tennis match with long rallies from 
the baseline. Every time the ball is received deep in the forehand side 
of the court, our would-be-champion plays a strong cross-court forehand. 
Her opponent plays the ball deep into the forehand side of the court 
once more, and almost reflexively starts moving across the court to get 
into position for the expected return. But this time her speculative 
execution is in vain — the ball is played down the line at pace leaving 
her wrong-footed. By forcing a branch misprediction the now champion 
tennis player has won the point, game, set and match!

Spectre exploits speculative execution of instructions following a 
branch. Inside the CPU, a /Branch Target Buffer/ (BTB) keeps a mapping 
from addresses of recently executed branch instructions to destination 
addresses. The BTB is used to predict future code addresses (“it’s going 
to be a cross-court forehand”) even before decoding the branch 
instructions. Speculative execution of the predicted branch improves 
performance.


      Attack foundations

Consider this simple code fragment:

	if (x < array1_size)
	  y = array2[array1[x] * 256];

On line 1 we test to see whether |x| is in bounds for |array1|, and if 
it is, we use the value of |x| as in index into |array1| on line 2. We 
can execute this code fragment lots of times, always providing a value 
of x that is in bounds (the cross-court forehand). The BTB learns to 
predict that the condition will evaluate to true, causing speculative 
execution of line 2. Then, wham, we play the forehand down-the-line, 
passing a value of x which is out of bounds. Speculative execution of 
line 2 goes ahead anyway, before we figure out that actually the 
condition was false this time.

Suppose we want to know the value of sensitive memory at address $a$. We 
can set |x = a - (base address of array1)| and then the lookup of 
|array1[x]| on line 2 will resolve to the secret byte at $a$. If we 
ensure that |array1_size| and |array_2| are not present in the 
processor’s cache (e.g., via a clflush) before executing this code, then 
we can leak the sensitive value.

When the code runs…

 1. The processor compares the malicious value of |x| against
    |array1_size|. This results in a cache miss.
 2. While waiting for |array1_size| to be fetched, speculative execution
    of line 2 occurs, reading the data, $k$, at the target address.
 3. We then compute |k * 256| and use this as an index into |array2|.
    That’s not in the cache either, so off we go to fetch it.
 4. Meanwhile, the value of |array1_size| arrives from DRAM, the
    processor realises that the speculative execution was erroneous in
    this case, and rewinds its register state. But /crucially, the
    access of array2 affects the cache as we saw yesterday with Meltdown/.
 5. The attacker now recovers the secret byte $k$, since accesses to
    |array2[n*256]| will be fast for the case when $n = k$, and slow
    otherwise.

You can even do this from JavaScript, using code like this:


(Enlarge <https://adriancolyer.files.wordpress.com/2018/01/spectre-js.jpeg>)

Note that there’s no access to the |clflush| instruction from 
JavaScript, but reading a series of addresses at 4096-byte intervals out 
of a large array does the same job. There’s also a small challenge 
getting the timer accuracy needed to detect the fast access in |array2|. 
The Web Workers feature of HTML5 provides the answer – creating a 
separate thread that repeatedly decrements a value in a shared memory 
location yields a timer that provides sufficient resolution.

For a C code example, see appendix A in the paper. Unoptimised, this 
code can read about 10KB/second an an i7 Surface Pro 3.


      The art of misdirection

With /indirect/ branches we can do something even more special. An 
indirect branch is one that jumps to an address contained in a register, 
memory location, or on the stack.

    If the determination of the destination address is delayed due to a
    cache miss and the branch predictor has been mistrained with
    malicious destinations, speculative execution may continue at a
    location chosen by the adversary.

This means we can make the processor speculatively execute instructions 
from any memory location of our choosing. So all we have to do, as in 
the gadgets of Return-Oriented Programming 
<https://blog.acolyer.org/2015/12/01/rop/> (ROP), is find some usable 
gadgets is the victim binary.

    … exploitation is similar to return-oriented programming, except
    that correctly-written software is vulnerable, gadgets are limited
    in their duration, but need not terminate cleanly (since the CPU
    will eventually recognize the speculative error), and gadgets must
    exfiltrate data via side channels rather than explicitly.

In other words, this is an even lower barrier than regular ROP gadget 
chaining. Code executing in one hyper-thread of x86 processors can 
mistrain the branch predictor for code running on the same CPU in a 
different hyper-thread. Furthermore, the branch predictor appears to use 
only the low bits of the virtual address: “/as a result, an adversary 
does *not* need to be able to even execute code at any of the memory 
addresses containing the victim’s branch instruction./”


      A Windows example

A proof-of-concept program was written that generates a random key the 
goes into a infinite loop calling |Sleep(0)|, loading the first bytes of 
a file, calling Windows crypto functions to compute the SHA-1 hash of 
the key and file header, and then prints out the hash whenever the 
header changes. When compiled with optimisation flags, the call to 
|Sleep(0)| will be made with file data in registers |ebx| and |edi| 
(normal behaviour, nothing special was done to cause this).

In |ntdll.dll| we find the byte sequence |13 BC 13 BD 13 BE 13 12 17|, 
which when executed corresponds to:

     adc  edi, dword ptr [ebx+edx+13BE13BDh]
     adc  dl, byte ptr [edi]

If we control |ebx| and |edi| (which we do, via the file header), then 
we can control which address will be read by this code fragment.

Now the first instruction of the sleep function is |jmp dword ptr 
ds:[76AE0078h]|, which we can target for branch mistraining. (The actual 
destination changes per reboot due to ASLR).

  * Simple pointer operations were used to locate the indirect jump at
    the entry point for sleep, and the memory location holding the
    destination for the jump
  * The memory page containing the destination for the jump was made
    writable using copy-on-write, and modified to change the jump
    destination to the gadget address. Use the same method, a |ret 4|
    instruction was written at the location of the gadget. (These
    changes are only visible to the attacker, not the victim).
  * A set of threads are launched to mistrain the branch predictor. This
    is a bit fiddly (see the details in section 5.2), but once an
    effective mimic jump sequence is found the attacker is able to read
    through the victim’s address space.


      Variations and mitigations

Spectre is not really so much an individual attack, as a whole new class 
of attacks: speculative execution may affect the state of other 
microarchitectural components, and virtually any observable effect of 
speculative execution can lead to leaks of sensitive information. Timing 
effects from memory bus contention, DRAM row address selection status, 
availability of virtual registers, ALU activity, and the state of the 
branch predictor itself need to be considered.

    The conditional branch vulnerability can be mitigated if speculative
    execution can be halted on potentially-sensitive execution paths…
    Indirect branch poisoning is even more challenging to mitigate in
    software. It might be possible to disable hyperthreading and flush
    branch prediction state during context switches, although there does
    not appear to be any architecturally-defined method for doing this.

Anything we can do in software or microcode though, should at best be 
seen as stop-gap countermeasures pending further research.


      The last word

    The vulnerabilities in this paper, as well as many others, arise
    from a longstanding focus in the technology industry on maximizing
    performance. As a result, processors, compilers, device drivers,
    operating systems, and numerous other critical components have
    evolved compounding layers of complex optimizations that introduce
    security risks. As the costs of insecurity rise, these design
    choices need to be revisited, and in many case alternate
    implementations optimized for security will be required.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gtalug.org/pipermail/talk/attachments/20180116/0da405a7/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a795b4f89a6d096f314fc0a2c80479c1
Type: application/unknown
Size: 61 bytes
Desc: not available
URL: <http://gtalug.org/pipermail/talk/attachments/20180116/0da405a7/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spectre-js.jpeg
Type: application/unknown
Size: 60 bytes
Desc: not available
URL: <http://gtalug.org/pipermail/talk/attachments/20180116/0da405a7/attachment-0001.bin>


More information about the talk mailing list