<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>And another on Spectre</p>
<p>--dave<br>
</p>
<div class="moz-forward-container"><br>
<br>
-------- Forwarded Message --------
<table class="moz-email-headers-table" border="0" cellspacing="0"
cellpadding="0">
<tbody>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:
</th>
<td>Spectre attacks: exploiting speculative execution</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date: </th>
<td>Tue, 16 Jan 2018 06:00:00 +0000</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">From: </th>
<td>adriancolyer<br>
<br>
</td>
</tr>
</tbody>
</table>
<a class="moz-txt-link-freetext" href="https://blog.acolyer.org/2018/01/16/spectre-attacks-exploiting-speculative-execution/">https://blog.acolyer.org/2018/01/16/spectre-attacks-exploiting-speculative-execution/</a><br>
<br>
<base
href="https://blog.acolyer.org/2018/01/16/spectre-attacks-exploiting-speculative-execution/">
<p><a href="https://spectreattack.com/spectre.pdf"
moz-do-not-send="true">Spectre attacks: exploiting speculative
execution</a> Kocher et al., <em>2018</em></p>
<p>Yesterday we looked at Meltdown and some of the background on
how modern CPUs speculatively execute instructions. Today it’s
the turn of Spectre of course, which shares some of the same
foundations but is a different attack, not mitigated by KAISER.
On a technical front, Spectre is as fascinating as it is
terrifying, introducing a whole new twist on <a
href="https://blog.acolyer.org/2017/12/06/the-dynamics-of-innocent-flesh-on-the-bone-code-reuse-ten-years-later/"
moz-do-not-send="true">ROP</a>.</p>
<blockquote>
<p> This paper describes practical attacks that combine
methodology from side-channel attacks, fault attacks, and
return-oriented programming that can read arbitrary memory
from the victim’s process… These attacks represent a serious
threat to actual systems, since vulnerable speculative
execution capabilities are found in microprocessors from
Intel, AMD, and ARM that are used in billions of devices.
</p>
</blockquote>
<p>Spectre is a step up from the already very bad Meltdown:</p>
<ul>
<li>it applies to AMD and ARM-based processors as well as Intel
(so it works on mobile phones, tablets, and so on)</li>
<li>it only assumes that speculatively executed instructions can
read from memory the victim process could access normally
(i.e., no page fault or exception is triggered)</li>
<li>the attack can be mounted in just a few lines of JavaScript
– and a few lines of JavaScript can easily be delivered to
your system from, for example, an ad displayed on a site that
you would otherwise trust. </li>
</ul>
<blockquote>
<p> … in order to mount a Spectre attack, an attacker starts by
locating a sequence of instructions within the process address
space which when executed acts as a covert channel transmitter
which leaks the victim’s memory or register contents. The
attacker then tricks the CPU into speculatively and
erroneously executing this instruction sequence…
</p>
</blockquote>
<h3>Let’s play tennis</h3>
<p>You’re watching an epic clay-court tennis match with long
rallies from the baseline. Every time the ball is received deep
in the forehand side of the court, our would-be-champion plays a
strong cross-court forehand. Her opponent plays the ball deep
into the forehand side of the court once more, and almost
reflexively starts moving across the court to get into position
for the expected return. But this time her speculative execution
is in vain — the ball is played down the line at pace leaving
her wrong-footed. By forcing a branch misprediction the now
champion tennis player has won the point, game, set and match!</p>
<p>Spectre exploits speculative execution of instructions
following a branch. Inside the CPU, a <em>Branch Target Buffer</em>
(BTB) keeps a mapping from addresses of recently executed branch
instructions to destination addresses. The BTB is used to
predict future code addresses (“it’s going to be a cross-court
forehand”) even before decoding the branch instructions.
Speculative execution of the predicted branch improves
performance.</p>
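<p>To make the BTB idea concrete, here is a minimal sketch in C of a table that remembers where each branch last went. The table size, the direct-mapped layout, and the <code>btb_update</code> and <code>btb_predict</code> names are assumptions for illustration only; real predictors are far more elaborate and largely undocumented.</p>
<pre>
```c
/* Toy branch target buffer: a hypothetical model, not a real CPU design. */
enum { BTB_SLOTS = 16 };

struct btb_entry {
    unsigned long branch;  /* address of a recently executed branch */
    unsigned long target;  /* where that branch went last time */
    int valid;             /* 0 until the slot has been written */
};

static struct btb_entry btb[BTB_SLOTS];

/* Record the destination a branch actually took. */
void btb_update(unsigned long branch, unsigned long target)
{
    struct btb_entry *e = btb + (branch % BTB_SLOTS);
    e->branch = branch;
    e->target = target;
    e->valid = 1;
}

/* Predict the next destination; returns 0 when there is no history. */
unsigned long btb_predict(unsigned long branch)
{
    const struct btb_entry *e = btb + (branch % BTB_SLOTS);
    if (e->valid) {
        if (e->branch == branch)
            return e->target;
    }
    return 0;
}
```
</pre>
<p>After <code>btb_update(0x4010, 0x5000)</code>, calling <code>btb_predict(0x4010)</code> returns <code>0x5000</code>, while a branch the table has never seen yields 0.</p>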
<h3>Attack foundations</h3>
<p>Consider this simple code fragment:</p>
<pre class="brush: cpp; title: ; notranslate">if (x &lt; array1_size)
    y = array2[array1[x] * 256];
</pre>
<p>On line 1 we test to see whether <code>x</code> is in bounds
for <code>array1</code>, and if it is, we use the value of <code>x</code>
as an index into <code>array1</code> on line 2. We can execute
this code fragment lots of times, always providing a value of x
that is in bounds (the cross-court forehand). The branch predictor learns to
predict that the condition will evaluate to true, causing
speculative execution of line 2. Then, wham, we play the
forehand down-the-line, passing a value of x which is out of
bounds. Speculative execution of line 2 goes ahead anyway,
before we figure out that actually the condition was false this
time.</p>
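<p>The training effect described above can be sketched with a two-bit saturating counter, a textbook way to predict whether a branch is taken. This is an assumed model chosen for simplicity, not a description of any particular CPU, and the function names are hypothetical.</p>
<pre>
```c
/* Two-bit saturating counter: values 0..3, an assumed textbook predictor. */
enum { STRONG_NOT_TAKEN = 0, STRONG_TAKEN = 3 };

/* Counter values 2 and 3 predict that the bounds check passes. */
int predicts_in_bounds(int counter)
{
    return counter >= 2;
}

/* Move the counter one step towards the observed outcome, saturating at the ends. */
int train(int counter, int was_in_bounds)
{
    if (was_in_bounds)
        return counter == STRONG_TAKEN ? counter : counter + 1;
    return counter == STRONG_NOT_TAKEN ? counter : counter - 1;
}

/* Feed `rounds` in-bounds calls, then report the prediction for the next call. */
int prediction_after_training(int rounds)
{
    int counter = STRONG_NOT_TAKEN;
    while (rounds-- > 0)
        counter = train(counter, 1);
    return predicts_in_bounds(counter);
}
```
</pre>
<p>In this model a handful of in-bounds calls is enough to make <code>prediction_after_training</code> return 1, so the next call is predicted as in bounds whatever value of <code>x</code> it actually carries.</p>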
<p>Suppose we want to know the value of sensitive memory at
address $a$. We can set <code>x = a - (base address of array1)</code>
and then the lookup of <code>array1[x]</code> on line 2 will
resolve to the secret byte at $a$. If we ensure that <code>array1_size</code>
and <code>array2</code> are not present in the processor’s
cache (e.g., via a clflush) before executing this code, then we
can leak the sensitive value.</p>
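<p>The index calculation is plain address arithmetic. A small sketch, treating addresses as unsigned numbers and assuming <code>array1</code> has one-byte elements (the helper names are made up for this example):</p>
<pre>
```c
/* Index that makes array1[x] alias a chosen byte. Addresses are plain numbers here. */
unsigned long malicious_index(unsigned long secret_addr, unsigned long array1_base)
{
    /* Unsigned subtraction wraps, so this also works when the secret lies below array1. */
    return secret_addr - array1_base;
}

/* Address that array1[x] resolves to, assuming one-byte elements. */
unsigned long resolved_address(unsigned long array1_base, unsigned long x)
{
    return array1_base + x;
}
```
</pre>
<p>Feeding the result of <code>malicious_index</code> back into <code>resolved_address</code> gives the secret address again, which is exactly what the out-of-bounds read relies on.</p>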
<p>When the code runs…</p>
<ol>
<li>The processor compares the malicious value of <code>x</code>
against <code>array1_size</code>. This results in a cache
miss.</li>
<li>While waiting for <code>array1_size</code> to be fetched,
speculative execution of line 2 occurs, reading the data, $k$,
at the target address. </li>
<li>We then compute <code>k * 256</code> and use this as an
index into <code>array2</code>. That’s not in the cache
either, so off we go to fetch it.</li>
<li>Meanwhile, the value of <code>array1_size</code> arrives
from DRAM, the processor realises that the speculative
execution was erroneous in this case, and rewinds its register
state. But <em>crucially, the access of array2 affects the
cache as we saw yesterday with Meltdown</em>.</li>
<li>The attacker now recovers the secret byte $k$, since
accesses to <code>array2[n*256]</code> will be fast for the
case when $n = k$, and slow otherwise. </li>
</ol>
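<p>The steps above can be sketched as a small simulation. It models the cache as 256 flags instead of measuring real access times, so it shows the logic of the leak and is not a working exploit. All function names here are hypothetical.</p>
<pre>
```c
/* Simulated cache state for the 256 probe lines array2[n * 256]. No real timing. */
enum { PROBE_LINES = 256 };
static int line_cached[PROBE_LINES];

/* Stand-in for flushing array2 from the cache (clflush on real hardware). */
void flush_probe_lines(void)
{
    int n;
    for (n = 0; n != PROBE_LINES; n++)
        line_cached[n] = 0;
}

/* Stand-in for the speculative read of array2[k * 256]: only the cache changes. */
void speculative_touch(unsigned char k)
{
    line_cached[k] = 1;
}

/* Probe every line; the one that would load fast reveals k. Returns -1 if none. */
int recover_byte(void)
{
    int n;
    for (n = 0; n != PROBE_LINES; n++)
        if (line_cached[n])
            return n;
    return -1;
}
```
</pre>
<p>Calling <code>flush_probe_lines()</code>, then <code>speculative_touch(k)</code>, then <code>recover_byte()</code> returns <code>k</code>. On real hardware the probe step would time each access, and the rewind of register state would leave the cached line untouched.</p>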
<p>You can even do this from JavaScript, using code like this:</p>
<p><img
src="https://adriancolyer.files.wordpress.com/2018/01/spectre-js.jpeg?w=640"
alt="" moz-do-not-send="true"></p>
<p>Note that there’s no access to the <code>clflush</code>
instruction from JavaScript, but reading a series of addresses
at 4096-byte intervals out of a large array does the same job.
There’s also a small challenge getting the timer accuracy needed
to detect the fast access in <code>array2</code>. The Web
Workers feature of HTML5 provides the answer – creating a
separate thread that repeatedly decrements a value in a shared
memory location yields a timer that provides sufficient
resolution.</p>
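<p>The eviction trick is easy to picture in C as well. The sketch below reads one byte every 4096 bytes from a caller-supplied buffer. Whether this actually evicts a given line depends on the cache geometry and the buffer size, neither of which is modelled here, and <code>evict_by_striding</code> is a hypothetical name.</p>
<pre>
```c
/* Walk a buffer in 4096-byte steps, issuing one read per step. */
enum { STRIDE = 4096 };

/* Returns how many reads were issued. The volatile pointer keeps the loads in place. */
unsigned long evict_by_striding(const volatile unsigned char *buf, unsigned long len)
{
    unsigned long reads = 0, offset;
    unsigned char sink = 0;
    for (offset = 0; len > offset; offset += STRIDE) {
        sink ^= buf[offset];  /* the read itself is what matters */
        reads++;
    }
    (void)sink;
    return reads;
}
```
</pre>
<p>The return value is only there to make the walk easy to check: a buffer of eight strides produces eight reads.</p>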
<p>For a C code example, see appendix A in the paper. Unoptimised,
this code can read about 10KB/second on an i7 Surface Pro 3.</p>
<h3>The art of misdirection</h3>
<p>With <em>indirect</em> branches we can do something even more
special. An indirect branch is one that jumps to an address
contained in a register, memory location, or on the stack.</p>
<blockquote>
<p> If the determination of the destination address is delayed
due to a cache miss and the branch predictor has been
mistrained with malicious destinations, speculative execution
may continue at a location chosen by the adversary.
</p>
</blockquote>
<p>This means we can make the processor speculatively execute
instructions from any memory location of our choosing. So all we
have to do, as in <a
href="https://blog.acolyer.org/2015/12/01/rop/"
moz-do-not-send="true">Return-Oriented Programming</a> (ROP),
is find some usable gadgets in the victim binary.</p>
<blockquote>
<p> … exploitation is similar to return-oriented programming,
except that correctly-written software is vulnerable, gadgets
are limited in their duration, but need not terminate cleanly
(since the CPU will eventually recognize the speculative
error), and gadgets must exfiltrate data via side channels
rather than explicitly.
</p>
</blockquote>
<p>In other words, this is an even lower barrier than regular ROP
gadget chaining. Code executing in one hyper-thread of x86
processors can mistrain the branch predictor for code running on
the same CPU in a different hyper-thread. Furthermore, the
branch predictor appears to use only the low bits of the virtual
address: “<em>as a result, an adversary does <strong>not</strong>
need to be able to even execute code at any of the memory
addresses containing the victim’s branch instruction.</em>”</p>
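<p>Why low-bit indexing matters can be shown with a toy table that has no address tag. The 4096-entry size is an assumption picked for the example (the real number of bits used is not documented), and the names are hypothetical.</p>
<pre>
```c
/* Assumed predictor table indexed by the low bits of the branch address only. */
enum { TABLE_SIZE = 4096 };
static unsigned long predicted_target[TABLE_SIZE];

/* Keep only the low bits of the address. */
unsigned long slot_of(unsigned long branch_addr)
{
    return branch_addr % TABLE_SIZE;
}

/* Training: remember where the branch at this address went. */
void train_branch(unsigned long branch_addr, unsigned long target)
{
    predicted_target[slot_of(branch_addr)] = target;
}

/* Prediction: any address with the same low bits sees the same entry. */
unsigned long predict_branch(unsigned long branch_addr)
{
    return predicted_target[slot_of(branch_addr)];
}
```
</pre>
<p>If the attacker trains a branch at <code>0x00401078</code>, the prediction for a victim branch at <code>0x76AE1078</code> comes from the same slot, because both addresses end in <code>0x078</code>. The attacker never needs to execute code at the victim's address.</p>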
<h3>A Windows example</h3>
<p>A proof-of-concept program was written that generates a random
key, then goes into an infinite loop calling <code>Sleep(0)</code>,
loading the first bytes of a file, calling Windows crypto
functions to compute the SHA-1 hash of the key and file header,
and printing out the hash whenever the header changes. When
compiled with optimisation flags, the call to <code>Sleep(0)</code>
will be made with file data in registers <code>ebx</code> and <code>edi</code>
(normal behaviour, nothing special was done to cause this).</p>
<p>In <code>ntdll.dll</code> we find the byte sequence <code>13
BC 13 BD 13 BE 13 12 17</code>, which when executed
corresponds to:</p>
<pre> adc edi, dword ptr [ebx+edx+13BE13BDh]
adc dl, byte ptr [edi]
</pre>
<p>If we control <code>ebx</code> and <code>edi</code> (which we
do, via the file header), then we can control which address will
be read by this code fragment.</p>
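<p>The address arithmetic of the first gadget instruction can be sketched as follows. The displacement <code>13BE13BDh</code> comes from the byte sequence above. Treating <code>edx</code> as a known value, and the helper names themselves, are assumptions for this example. Registers are 32 bits wide, so results are masked to 32 bits.</p>
<pre>
```c
/* Displacement encoded in: adc edi, dword ptr [ebx+edx+13BE13BDh] */
#define GADGET_DISP 0x13BE13BDu
#define MASK32 0xFFFFFFFFu

/* Address the first instruction reads, with 32-bit wraparound. */
unsigned long first_read_address(unsigned long ebx, unsigned long edx)
{
    return (ebx + edx + GADGET_DISP) & MASK32;
}

/* Value of ebx that makes the first instruction read from `target`. */
unsigned long ebx_for_target(unsigned long target, unsigned long edx)
{
    return (target - GADGET_DISP - edx) & MASK32;
}
```
</pre>
<p>Plugging the output of <code>ebx_for_target</code> back into <code>first_read_address</code> with the same <code>edx</code> returns the chosen target. The second instruction then uses the updated <code>edi</code> as the address of the byte it reads.</p>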
<p>Now the first instruction of <code>Sleep</code> is <code>jmp
dword ptr ds:[76AE0078h]</code>, which we can target for
branch mistraining. (The actual destination changes per reboot
due to ASLR.)</p>
<ul>
<li>Simple pointer operations were used to locate the indirect
jump at the entry point for <code>Sleep</code>, and the memory location
holding the destination for the jump.</li>
<li>The memory page containing the destination for the jump was
made writable using copy-on-write, and modified to change the
jump destination to the gadget address. Using the same method, a
<code>ret 4</code> instruction was written at the location of
the gadget. (These changes are only visible to the attacker,
not the victim).</li>
<li>A set of threads is launched to mistrain the branch
predictor. This is a bit fiddly (see the details in section
5.2), but once an effective mimic jump sequence is found the
attacker is able to read through the victim’s address space. </li>
</ul>
<h3>Variations and mitigations</h3>
<p>Spectre is not so much an individual attack as a whole
new class of attacks: speculative execution may affect the state
of other microarchitectural components, and virtually any
observable effect of speculative execution can lead to leaks of
sensitive information. Timing effects from memory bus
contention, DRAM row address selection status, availability of
virtual registers, ALU activity, and the state of the branch
predictor itself need to be considered.</p>
<blockquote>
<p> The conditional branch vulnerability can be mitigated if
speculative execution can be halted on potentially-sensitive
execution paths… Indirect branch poisoning is even more
challenging to mitigate in software. It might be possible to
disable hyperthreading and flush branch prediction state
during context switches, although there does not appear to be
any architecturally-defined method for doing this.
</p>
</blockquote>
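<p>The quoted passage talks about halting speculation. A different software idea, shown here purely as an illustration and not taken from the quoted text, is to clamp the index without a branch, so that even a mispredicted path cannot use an out-of-bounds value. The function name is hypothetical, and a compiler is free to turn the comparison back into a branch, so this is a sketch of the idea and not a guaranteed defence.</p>
<pre>
```c
/* Branch-free clamp: returns x when x is in bounds, 0 otherwise. */
unsigned long clamp_index(unsigned long x, unsigned long size)
{
    /* (size > x) is 1 or 0; subtracting it from 0 gives an all-ones or all-zero mask. */
    unsigned long mask = 0UL - (unsigned long)(size > x);
    return x & mask;
}
```
</pre>
<p>With this helper, <code>array1[clamp_index(x, array1_size)]</code> reads element 0 for any out-of-bounds <code>x</code>, and never reads an attacker-chosen address.</p>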
<p>Anything we can do in software or microcode, though, should at
best be seen as a stop-gap countermeasure pending further
research.</p>
<h3>The last word</h3>
<blockquote>
<p> The vulnerabilities in this paper, as well as many others,
arise from a longstanding focus in the technology industry on
maximizing performance. As a result, processors, compilers,
device drivers, operating systems, and numerous other critical
components have evolved compounding layers of complex
optimizations that introduce security risks. As the costs of
insecurity rise, these design choices need to be revisited,
and in many cases alternate implementations optimized for
security will be required.
</p>
</blockquote>
</div>
</body>
</html>