
New found CPU bug could seriously downgrade performance for most of us


Azimech

Recommended Posts

Wow. I hope some of you don't plan a long career in IT, because you won't last long :)

Now if you work with data-centers and want to stress a bit more, then have a look at this and read at least the last few lines.


6 minutes ago, sarbian said:

Wow. I hope some of you don't plan a long career in IT, because you won't last long :)

Now if you work with data-centers and want to stress a bit more, then have a look at this and read at least the last few lines.

*suffers a sympathetic brain aneurysm*


7 minutes ago, softweir said:

Having done a lot of reading and YouTube watching to find out more about these exploits, I'm amazed anybody even realised they existed. Using the exploits takes a lot of very real cunning. At the same time, I am not at all surprised these exploits exist: there is no way an engineer could have predicted that these techniques were possible. After all, it took ten years before some bright spark realised the exploit existed. Because of this, I am now firmly signed on to the "excrements happens" hypothesis in this instance, rather than the conspiracy theory or even the "people are dumb" explanation.

There is a large gap between "That may potentially be used in strange ways" and "This is an exploit that uses that". Predictors being strange is not new. This story is from 2005 and shows how strange they can be :)

But yes, it's clearly a case of "excrements happens".


2 hours ago, sarbian said:

There is a large gap between "That may potentially be used in strange ways" and "This is an exploit that uses that". Predictors being strange is not new. This story is from 2005 and shows how strange they can be :)

But yes, it's clearly a case of "excrements happens".

That was fascinating!


Well looks like I am going to have to do a full update of Ubuntu.

4 hours ago, sarbian said:

Wow. I hope some of you don't plan a long career in IT, because you won't last long :)

Now if you work with data-centers and want to stress a bit more, then have a look at this and read at least the last few lines.

My days of being a sysadmin are long over, and good riddance. All I have to do is keep my wife off the click bait.


10 hours ago, wumpus said:

Allowing the branch predictor to access memory it shouldn't. It seems Intel (and ARM) "check their privilege" a wee bit late, allowing speculatively predicted operations to use privileged data they shouldn't and to fiddle with branch prediction. Once this data has been cleared out, the effects of which branch predictions succeeded and failed are still visible.

Allowing the branch predictor to update with data that "shouldn't exist". Rolling back all speculated operations after mispredicting branches is a real pain, and rolling back the branch prediction updates not only suffers that pain but also loses data (and thus makes branch prediction miss more often). But it turns out that it may be possible to use the data learned from the mispredictions to determine what happened with data that shouldn't be accessed.

This is really not that easy. First of all, an early privilege check is bad. The behavior on a privilege-check failure is a segfault. What do you propose a speculative segfault should look like? Should it start invoking signal handlers speculatively too? The solution of just running with it, assuming permissions are green, and segfaulting if you actually take the branch, is the correct one. Anything else leads to even more insanity that potentially has just as many exploits, all at the cost of real performance with no gain to show for it.

Rolling back properly, yes, that's the idea. And all register and memory state is rolled back. The problem is, a speculative cache miss is still a cache miss, and results in lines being read. Un-reading lines of memory from cache is not something you can do on any sane system. And you still can't read any data from cache that has been speculatively obtained unless you have permission to read the data, so even that's not the problem. The problem is that a speculative branch can read a byte of memory, then read from memory at some base offset + multiplier * value read. Now the cache line hit depends on the value, and you can use a timing attack to figure out which cache line it was. There is no caching scheme in a modern CPU that protects you from this while providing half-reasonable cache performance.
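A toy sketch of that read-then-index pattern (a Python simulation with made-up names; a real attack measures hardware cache timing, this only models which line the dependent load touches):

```python
LINE_SIZE = 64    # bytes per cache line (typical)
NUM_LINES = 256   # one candidate line per possible byte value

def speculative_gadget(secret_byte, touched_lines):
    # Model of the speculative sequence: read a byte, then read from
    # base + multiplier * value; the line it touches encodes the byte.
    addr = secret_byte * LINE_SIZE
    touched_lines.add(addr // LINE_SIZE)

def timing_probe(touched_lines):
    # The attacker "times" a read of each of the 256 candidate lines;
    # the one already cached (fast) reveals the secret byte.
    for line in range(NUM_LINES):
        if line in touched_lines:  # stands in for "this read was fast"
            return line

touched = set()
speculative_gadget(0x2A, touched)  # the CPU squashes the result, not the fill
print(timing_probe(touched))       # recovers 0x2A == 42
```

The point of the model: the architectural result of the speculative read is discarded, but the set of touched cache lines is not, and that set is enough.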

There are variations on all of the above that make some CPUs more vulnerable than others. That's why the initial version of the attack didn't work on AMD processors. But there are variations of it that will. You can't fix this without a complete rework of the architecture, and that will come with enormous costs in development time and performance setbacks.

 

The patches that exist out there do not fix any of this. They simply make it so that the attacker doesn't know where to look. With the virtual memory space being so vast that it might as well be infinite, if you give each process a unique page table, the attacker won't be able to figure out which memory to read. They can still read any memory they like, but they don't know the address. The downside, of course, is the page-table switching time, which causes a performance hit.
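A toy model of that page-table trick (hypothetical names and sizes): two processes map the same kernel data at different randomized addresses, so an address learned against one process is useless against another.

```python
import random

def make_page_table(seed):
    # Hypothetical model: each process's page table places the kernel
    # data at its own randomized virtual address in a 47-bit space.
    rng = random.Random(seed)
    return {"kernel_secret": rng.randrange(1 << 47)}

pt_a = make_page_table(seed=1)
pt_b = make_page_table(seed=2)

# An address discovered while attacking process A...
leaked_addr = pt_a["kernel_secret"]
# ...almost surely points at nothing interesting in process B, and
# blind guessing has a ~2**-47 chance per probe.
print(leaked_addr == pt_b["kernel_secret"])
```

The attack primitive still works; the defense only removes the attacker's knowledge of where to aim it, which is exactly why it costs switching time rather than fixing the leak.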

The other part of it (and I might be mis-reading it, so if somebody knows better, please correct me) is that the patches that got pushed out only prevent user-space programs from reading kernel-space memory. I think the global tables for all user-space programs are still the same. If so, this still leaves any number of machines out there vulnerable to cross-origin attacks.


11 hours ago, K^2 said:

This is really not that easy. First of all, an early privilege check is bad. The behavior on a privilege-check failure is a segfault. What do you propose a speculative segfault should look like? Should it start invoking signal handlers speculatively too? The solution of just running with it, assuming permissions are green, and segfaulting if you actually take the branch, is the correct one. Anything else leads to even more insanity that potentially has just as many exploits, all at the cost of real performance with no gain to show for it.

Rolling back properly, yes, that's the idea. And all register and memory state is rolled back. The problem is, a speculative cache miss is still a cache miss, and results in lines being read. Un-reading lines of memory from cache is not something you can do on any sane system. And you still can't read any data from cache that has been speculatively obtained unless you have permission to read the data, so even that's not the problem. The problem is that a speculative branch can read a byte of memory, then read from memory at some base offset + multiplier * value read. Now the cache line hit depends on the value, and you can use a timing attack to figure out which cache line it was. There is no caching scheme in a modern CPU that protects you from this while providing half-reasonable cache performance.

As far as I know, the AMD systems either delay the checks or delay throwing the segfaults. Unreading the cacheline is certainly possible (simply marking it invalid), but replacing the old values is the real problem and likely just as visible to the attacker. Using some form of skewed (or hashed) cache may make it difficult for the attacker to measure the performance of the old cache line (this was brought up in the realworldtech forums, but nobody commented on it). It might take a secure hash (absolutely performance-killing) to make this work.

There are a few defenses against an attack on the cache:

- Don't depend on privileged code. This seems to be the current strategy (and the patches): don't let memory accesses touch, or branches target, unavailable areas.

- Wait for the branch predictor to "catch up" before filling the cache line. While this appears performance-killing, I don't think any cache fill starts until 8 cycles (probably more) after the initial attempt to execute the load/store. In practice, this would likely use something that looks a lot like a victim cache to buffer the cache line until the load/store instruction is retired. With a more or less associative victim/buffer cache, it should hide which lines are affected.

There are all kinds of nasty holes in CPUs. My favorite is that you can construct a Turing machine out of the MMU page-replacement scheme: this isn't a flaw in any implementation, it is exactly what the architectural spec demands. But nobody is all that worried about somebody taking over the page-replacement data. It almost always makes more sense to attack the software than the hardware, and nobody is protecting the software with quite the same rigor.


7 hours ago, wumpus said:

Unreading the cacheline is certainly possible (simply marking it invalid), but replacing the old values is the real problem and likely just as visible to the attacker.

The fact that the line is marked dirty is the exact thing the attack uses. It pre-fetches data from its own space that it knows is going to hit the lines of interest, so it knows these cache lines are clean. Then it executes the attack and tries to fetch these lines again, timing the reads. Lines marked dirty during speculative execution will take longer to read. And because the actual data in privileged memory determines which cache lines get hit, it effectively lets you read privileged memory. Albeit very slowly.
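The prime-then-time sequence described above, as a deterministic toy model (names are hypothetical; a real attack uses hardware timing instead of inspecting the cache contents directly):

```python
NUM_LINES = 256

def prime(cache):
    # Step 1: fill every monitored line with the attacker's own data,
    # so all of them will read back fast.
    for line in range(NUM_LINES):
        cache[line] = "attacker"

def victim_speculative_access(cache, secret_byte):
    # Step 2: the victim's speculative load displaces one line; which
    # one depends on the privileged byte.
    cache[secret_byte] = "victim"

def probe(cache):
    # Step 3: re-read every line and "time" it; the line that lost the
    # attacker's data is the slow one, and its index is the secret.
    return [line for line in range(NUM_LINES) if cache[line] != "attacker"]

cache = {}
prime(cache)
victim_speculative_access(cache, 0x42)
print(probe(cache))  # [66], i.e. the secret byte 0x42
```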

A hashed cache will do nothing but slightly slow down the attack. I don't need a reversible hash. I just need a known hash, or one I can experiment with in advance. Map 256 lines in memory to the corresponding cache lines, and you're golden.
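Why a merely known hash doesn't help, sketched in Python (the XOR "hash" is a stand-in for whatever fixed line-index scrambling the cache uses):

```python
def cache_line(addr, key=0x5F):
    # Stand-in for a fixed, non-secret line-index hash.
    return (addr ^ key) % 256

# Experiment in advance: map each of the 256 byte values to the line
# its probe address hashes to. This is the "known hash" precomputation.
line_to_value = {cache_line(v): v for v in range(256)}

# Later, the timing attack reveals which line was hit...
hit_line = cache_line(0x42)
# ...and the precomputed table inverts the scrambling for free.
print(line_to_value[hit_line])  # 66, i.e. 0x42
```

Nothing here needs the hash to be reversible, only stable, which is why only a secret (and expensive) hash would change the picture.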

7 hours ago, wumpus said:

Wait for the branch predictor to "catch up" before filling the cache line. 

For a lot of tasks, cache performance is CPU performance. If speculative execution can't cache, you might as well not bother with speculative execution. This is true, for example, for every single video game out there. Are you prepared to go ten years back in terms of CPU performance in your video games? Nobody else is either.

AMD's CPUs are not safe from this attack. They were saved by some minor inefficiencies and a different way in which they handle memory pages.

I'm not saying this is fundamentally unfixable, but it will take major changes in architecture. Not some quick fixes and patches.


58 minutes ago, K^2 said:

I'm not saying this is fundamentally unfixable, but it will take major changes in architecture. Not some quick fixes and patches.

I am afraid you are very right there. And I can just imagine software engineers and CPU designers in many companies having very long, overtime-heavy sessions trying to work out just how far they can get away with "a quick patch" and what the heck they will do when "a quick patch" fails and they need to implement a proper fix.


4 hours ago, K^2 said:

For a lot of tasks, cache performance is CPU performance. If speculative execution can't cache, you might as well not bother with speculative execution. This is true, for example, for every single video game out there. Are you prepared to go ten years back in terms of CPU performance in your video games? Nobody else is either.

You missed the whole point of the answer: you use a buffer much like a victim cache (presumably caching while entering *and* leaving, while a true victim cache only caches while leaving). This likely has to be fully associative (to avoid the same issues as hashing) but doesn't have to be big. Since you would look up both simultaneously, you don't lose performance (well, some speed thanks to the extra capacitance on the lines, but nothing major; victim caches are a thing, and don't kill performance).

I suspect that AMD will at least look at this type of thing for Zen 3, but don't count on seeing anything beyond what they've already done.  Presumably Intel has to do a bit more (but luckily did most of what was needed *years* ago with the PTI data).


5 hours ago, wumpus said:

You missed the whole point of the answer: you use a buffer much like a victim cache (presumably caching while entering *and* leaving, while a true victim cache only caches while leaving).

I'll just add a second cache miss. Or a third. Or a fourth. If I know the size of the victim cache, I'll exhaust it preemptively. If not, I'll just have multiple passes of my code before checking the cache. It's practically free for me as an attacker to add extra misses, while adding layers of victim cache is prohibitively expensive after one, or at most two, layers. That's the reason why the victim cache on every proposed architecture is tiny. This is not the solution.
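The exhaustion argument as a toy FIFO model (the buffer size and names are made up): once the attacker issues more misses than the buffer holds, the sensitive line spills into the real, timeable cache.

```python
from collections import deque

VICTIM_SIZE = 4  # victim caches are tiny; hypothetical size

def run_victim_buffer(accesses, size=VICTIM_SIZE):
    # FIFO buffer that hides speculative fills until retirement; once
    # full, the oldest line spills into the real cache, where a timing
    # attack can see it.
    buffer, spilled = deque(), []
    for line in accesses:
        if len(buffer) == size:
            spilled.append(buffer.popleft())
        buffer.append(line)
    return spilled

secret_line = 0x42
filler = [1000 + i for i in range(VICTIM_SIZE)]  # attacker's extra misses
print(run_victim_buffer([secret_line] + filler))  # [66]: the secret leaked
```

The asymmetry is the whole point: each extra attacker miss is one instruction, while each extra buffer entry is fully associative hardware.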


15 hours ago, K^2 said:

I'll just add a second cache miss. Or a third. Or a fourth. If I know the size of the victim cache, I'll exhaust it preemptively. If not, I'll just have multiple passes of my code before checking the cache. It's practically free for me as an attacker to add extra misses, while adding layers of victim cache is prohibitively expensive after one, or at most two, layers. That's the reason why the victim cache on every proposed architecture is tiny. This is not the solution.

Now that you can look for this sort of stuff, you could probably do that from a higher level, and that's how I would try to patch it.
Looking for suspicious behavior is standard and works well, but it demands that the behavior is known to be suspicious.


  • 1 month later...

I took a break from KSP for various reasons, and there have been several Linux kernel patches covering Meltdown/Spectre, plus other stuff such as graphics drivers.

KSP does seem to have slowed down, but making a comparison is a little tricky, as you need to have made some sort of speed measurement before all the patching was done, and then use the same craft in a similar game state.

Edited to add: Did some tests, and KSP wasn't maxing out my CPU or RAM. It's possible that KSP is a little more sensitive to high part counts than it was before the patching frenzy. But I would hate to have to be paging to disk.

 

 

Edited by Wolf Baginski
added result of a test
