Page MenuHomePhabricator

Investigate applicability of SPECTRE (CVE-2017-5753) to scribunto
Closed, ResolvedPublic

Description

(AFAICT. I'm not very well versed on some of this stuff)

We use the scribunto extension (aka LuaSandbox php extension) to provide lua scripting on our sites. I believe this uses luajit so that lua runs jit-ed in the same process as php.

Additionally (as pointed out by Tim Starling), lua supports a high resolution timer in the form of os.clock(). (In fact LuaSandbox overrides the default os.clock() to provide a higher resolution timer). Tim says the precision is 1µs.

https://spectreattack.com/spectre.pdf details (among other things) an attack where branch prediction misses could result in out of bounds reads that can than be leaked by a side channel attack on cpu caches. In section 4.3 they give a proof of concept for javascript where the attack was used to read private info in the browser process that js shouldn't have access to, using the fact it was JIT-ed and there was a high resolution timer.

So maybe a similar issue might affect our Lua implementation.

[Note, the other MELTDOWN/SPECTRE bug is T184256]

Event Timeline

Err, nevermind, we appearently don't use luajit

As you already noted, we don't use LuaJIT. It's an option when compiling the luasandbox PHP extension, but I'm not sure to what extent it ever actually worked and, since it isn't built or tested in CI or by anyone I know of, support for it may well have bitrotted since the option was added.

os.clock() is most likely going to be returning thread CPU usage time determined using clock_gettime( CLOCK_THREAD_CPUTIME_ID ), although on platforms lacking that constant it will use CLOCK_REALTIME (which is wall clock time) instead. The standard Lua 5.1 implementation uses clock() instead (process CPU usage time). The override, as far as I know, is more to return a more accurate Lua CPU time usage value (particularly in threaded environments) than to provide a value with higher resolution.

On terbium, a small test program reports that clock() has a resolution of 1μs. clock_gettime() might have a resolution as good as 1ns.

I did a small test script. if you load the same data twice in a row, os.clock() is definitely fine grained enough to distinguish the first load from the second load (Seems to be a difference of about 0.5μs). Whether or not that's from CPU cache or other caching affects, I have no idea.

I tried to naively implement the bounds check variant as described in the spectre paper. Which didn't work for me. However that's kind of unsurprising, because even if its possible to do, it seems like you'd have to probably do a lot more than the naive implementation of it to make it work.

Anyways, my current thinking is os.clock() is probably on the verge of being fine grained enough, but its probably very hard to pull this off when using an interpreter instead of a jit. In any case, I am probably a bit out of my depth on this issue.

So maybe we should reduce the timer precision to 20 microseconds like firefox did as a just in case measure against these types of timing attacks (and potentially other yet undiscovered timing attacks), and then call it a day.

Ok.

At the Scribunto level, this should do it:

To also do it in the luasandbox PHP extension:

Regarding luajit, if we're not supporting it and it theoretically has a security downside, I think we should remove support for it. If you try and build luasandbox without liblua5.1-dev installed, the error message makes it appear that you should install luajit if you don't read closely:

...
checking pkg-config is at least version 0.9.0... yes
checking for lua >= 5.1 lua < 5.2... no
checking for lua5.1... no
checking for lua-5.1... no
checking for luajit... no
configure: error: Package requirements (luajit) were not met:

No package 'luajit' found

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables LUA_CFLAGS
and LUA_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

(full log P6655)

Fine by me, here's a patch.

+2 to the luajit patch. With that commit message I suppose we could merge it publicly?

I'm hoping either @tstarling or @Bawolff could review the timer patches. How realistic/theoretical are the security implications? Do we need to deploy a new package version secretly? And should we be getting CVE ids for this or not? (or does it fall under the SPECTRE one)

I don't think there's any need to handle it as a security release when there's no demonstrated vulnerability. I think the patches on this bug should be published on Gerrit. In any case, I've reviewed them and they look fine to me. I was a bit concerned as to whether the timing resolution thing is overly paranoid, but I suppose it is OK.

Exploiting branch misprediction would require the processor to speculatively execute several VM ops. The Intel 64 and IA-32 Software Developer Manual states that in NetBurst microarchitecture (and presumably later), speculative execution can proceed through up to 126 instructions, 48 loads or 24 stores (whichever comes first). One way to assess the plausibility of branch misprediction vulnerabilities in Lua would be to compare these numbers against the typical number of instructions, loads and stores per VM op, which should be measurable with perf.

Another barrier to the exploitation of SPECTRE in Lua is the relatively abstract nature of values and tables compared to JavaScript. The closest thing Lua has to a byte array type is a string, but there's not even an opcode for accessing an offset into a string, it is delegated via the global metatable for the string type. Integer-indexed tables are stored compactly as an array of TValue. This type has a 64-bit value followed by a 32-bit type, so the whole thing is presumably padded out to 16 bytes. The most useful part of the value type is probably the lua_Number, a 64-bit floating point number in our configuration. Extracting an arbitrary bit from a 64-bit floating point number within the speculative execution window would (hopefully) be challenging. And even if you could do that, the whole TValue is aligned on 16 byte boundaries, with 8 of the 16 bytes effectively unreachable by the userspace.

I don't think there's any need to handle it as a security release when there's no demonstrated vulnerability. I think the patches on this bug should be published on Gerrit. In any case, I've reviewed them and they look fine to me. I was a bit concerned as to whether the timing resolution thing is overly paranoid, but I suppose it is OK.

Thanks, I uploaded and merged the patches in Gerrit, and backported them to the currently supported Scribunto branches: https://gerrit.wikimedia.org/r/#/q/topic:T184156 / https://gerrit.wikimedia.org/r/#/q/I2b5cc177bded1a9b5600d77116e67817841204be

Bawolff claimed this task.

I don't think there's any need to handle it as a security release when there's no demonstrated vulnerability. I think the patches on this bug should be published on Gerrit. In any case, I've reviewed them and they look fine to me. I was a bit concerned as to whether the timing resolution thing is overly paranoid, but I suppose it is OK.

Thanks, I uploaded and merged the patches in Gerrit, and backported them to the currently supported Scribunto branches: https://gerrit.wikimedia.org/r/#/q/topic:T184156 / https://gerrit.wikimedia.org/r/#/q/I2b5cc177bded1a9b5600d77116e67817841204be

Ok, looks like this is done. I agree with Tim that handling this as a security release is unnessary. Thanks everyone.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".Feb 23 2018, 6:56 PM