Question to the Developers: In Unity 5 64 Bit, Will Multiple Cores Be Used For Single Ships?

cantab · August 23, 2015

Hmm. If that SQUADcast transcript is accurate / Max's claim is accurate (not that I mistrust anybody, it's just a grapevine effect), then I sure hope that it doesn't break saves, because Kidonia could REALLY use an FPS boost lately.

There's no reason to expect save breakage and good reason to expect Squad to try and avoid it.

The fix for FPS issues is to get a better CPU; this is how PC gaming works. My Core system is two years old and I laugh at the idea of a 700-part ship being monstrous, so it's pretty easy to fix those FPS issues, even on a budget. Get an Intel CPU, boom, improved FPS.

Go and build a 1500-part ship and you'll get lag and FPS issues. It's the nature of a game like KSP, where the system load depends on what the player does. In any event, while "throw hardware at it" is a fix, it's not an especially good fix for any performance problem.

Streetwind · August 23, 2015

One core with HT has the performance of about 1.5 cores (it varies: sometimes you can get as much as 1.9, sometimes it's just 1.1). So a quad core with HT is like a six core without it.
Of course, you only feel it if the program you are trying to run is multithreaded.

Reality isn't nearly that favorable. Expect 20% on average across all applications.
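For a quad core, the gap between those two estimates is easy to put in numbers. A minimal sketch, assuming a perfectly multithreaded workload and the same HT gain on every core; the scaling factors are the ones quoted above, not measurements, and `effective_cores` is a hypothetical helper:

```python
# Back-of-envelope "effective core" counts for a quad core with Hyper-Threading,
# using the per-core HT scaling factors quoted in the thread (not measurements).
PHYSICAL_CORES = 4


def effective_cores(physical_cores: int, ht_factor: float) -> float:
    """Throughput in units of one non-HT core, assuming the workload is fully
    multithreaded and every core gets the same HT gain."""
    return physical_cores * ht_factor


for label, factor in [
    ("no HT", 1.0),
    ("worst case quoted", 1.1),
    ("20% average", 1.2),
    ("1.5x claim", 1.5),
    ("best case quoted", 1.9),
]:
    print(f"{label:>18}: {effective_cores(PHYSICAL_CORES, factor):.1f} cores")
```

At the 20% average, a quad core with HT behaves more like 4.8 cores than 6.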

Simultaneous multithreading (the concept Intel calls Hyper-Threading in its own products) works by interleaving instructions from two threads wherever there are gaps in one thread's loading of the execution units. Most software loads the execution units well and leaves almost no gaps, so SMT only gets room to step in if there's an interrupt or if one thread goes on an extended branch jump or loop.

Though yes, there are certain cases where a gain of over 75% can be had, and then SMT really shines.
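The gap-filling argument can be illustrated with a deliberately crude toy model (hypothetical, not how any real core schedules instructions): one issue slot per cycle, and each thread independently has an instruction ready in a given cycle with probability `u`. The slot only sits idle when both threads are stalled at once, so the two-thread gain works out to 2 - u:

```python
# Toy model of SMT gap filling. Assumptions (not real hardware behavior):
# one issue slot per cycle, and each thread is ready to issue with
# probability u per cycle, independently of the other thread.


def single_thread_rate(u: float) -> float:
    # Alone, the thread keeps the slot busy a fraction u of the cycles.
    return u


def smt_rate(u: float) -> float:
    # With two threads sharing the slot, it is idle only when both stall.
    return 1.0 - (1.0 - u) ** 2


def smt_speedup(u: float) -> float:
    """Two-thread throughput relative to one thread; simplifies to 2 - u."""
    return smt_rate(u) / single_thread_rate(u)


for u in (0.95, 0.8, 0.5, 0.1):
    gain = smt_speedup(u) - 1.0
    print(f"thread busy {u:.0%} of cycles -> SMT gain {gain:.0%}")
```

In this model a thread that already keeps the slot 95% busy gains only about 5% from a sibling thread, while one that stalls most of the time comes close to doubling. Real gains also depend on cache and memory contention, which the model ignores.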

John FX · August 23, 2015

As you may see, multi-threading in PhysX SDK 3.3 is indeed functional and fairly effective, showing significant performance improvements in the case of convex-convex collisions (2x faster on average, 3 threads vs. a single thread) and stacking (1.88x faster), and smaller but noticeable performance gains in the case of collisions between primitives (1.5x faster) and joint (1.2x faster) calculations.

As a downside, additional worker threads increase the memory footprint of the scene.

Also, we have discovered that Scene Queries (such as raycasts and sweep tests) show the same performance regardless of the number of threads.

In any case, the improved multi-threading capabilities of PhysX 3.x make it even more consistent and future-proof, especially when compared with the previous generation of the PhysX physics engine.

SDK 2.8.4 settings - default, 1 thread. SDK 3.3 settings - default, SAP broadphase, legacy contact generation, 1 - 3 threads.

System: i7 2600K CPU, GTX 580 GPU, 8 GB RAM, Win 7 x64

From the PhysXInfo website.

It seems to me that we may get at most a 2x performance boost in certain situations, although in the real world it may be lower.

It won't be world-changing, although IMHO I suspect the FPS boost will be around 30%, and any boost is welcome. The benefit of many cores seems limited.
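How a physics speedup turns into an FPS gain follows Amdahl's law: only the share of frame time spent in physics gets faster. A minimal sketch, assuming the frame is CPU-bound and FPS is inversely proportional to frame time; the physics shares are hypothetical, since nobody here has measured KSP's actual split:

```python
# Amdahl's law applied to one frame: only the physics share of frame time
# benefits from extra PhysX threads. The shares below are hypothetical.


def overall_speedup(physics_share: float, physics_speedup: float) -> float:
    """Whole-frame speedup when `physics_share` of the frame time becomes
    `physics_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - physics_share) + physics_share / physics_speedup)


for share in (0.3, 0.5, 0.8):
    for s in (1.2, 2.0):
        gain = overall_speedup(share, s) - 1.0
        print(f"physics {share:.0%} of frame, {s}x faster -> FPS {gain:+.0%}")
```

Under these assumptions, a 2x physics speedup yields a 30% FPS gain only if physics takes roughly 46% of the frame time.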

EDIT: Also, the memory footprint increases with extra threads, so the move to 64-bit will be good...

Renegrade · August 23, 2015

As for the actual performance difference between Sandy Bridge and today's architectures at the high end... people tend to underestimate it a bit. Even if each architecture step only brings a couple percent, that can compound over the years. Upon the release of the first Skylake CPUs earlier this month, my go-to tech website (in German, apologies) tested every top-end model from the i7-2600K to the i7-6700K. They found a 40% performance advantage for the Skylake chip over Sandy Bridge in applications and workstation tasks, and a 15% advantage in gaming under typical graphics-limited scenarios. This compounded into a 31% overall advantage across the entire benchmark suite. And it did this while consuming roughly 10% less power under non-synthetic full load - despite running far away from its sweet spot.

I don't buy it. My own personal synthetic benchmarking is showing a line that's basically proportional to clock rate for CPUs in the AthlonXP/Pentium III/Pentium M/Core lines. (well, leaving aside memory performance - the onboard memory controllers in the later processors boosted memory access enormously - as much as four hundred percent faster. Also leaving aside the Pentium IV, which is about the same performance clock-for-clock as the almighty 286, or maybe a lump of rock)

Even if the percentage improvements that Intel claims are true and universal and stacking (e.g. improving the even-numbered instructions by 5% in one architecture and then the odd-numbered ones by 5% in the next architecture does not result in an overall performance improvement of 10%, let alone 10.25%), I'm still only calculating a 26.24% improvement over Sandy Bridge (assuming a 6% compounding universal improvement, which is probably quite overgenerous), which is completely erased by my i7-3820's effective 31.25% overclock (later x820s have decreasing base clock rates, as low as 3200 MHz).
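The percentages in the post above can be checked directly. A minimal sketch; it assumes the 26.24% figure comes from four generational steps (Sandy Bridge to Ivy Bridge, Haswell, Broadwell and Skylake), and treats an "X% improvement" as the affected work running (1 + X) times faster:

```python
# Checking the arithmetic above. An "X% improvement" is modeled as the
# affected work running (1 + X) times faster.


def compounded(per_step: float, steps: int) -> float:
    """Total speedup if every generation speeds up *all* work by per_step."""
    return (1.0 + per_step) ** steps


def disjoint_halves(first: float, second: float) -> float:
    """Total speedup if one generation speeds up only half of the work by
    `first` and the next speeds up only the other half by `second`."""
    new_time = 0.5 / (1.0 + first) + 0.5 / (1.0 + second)
    return 1.0 / new_time


# Assumed: Sandy Bridge -> Ivy Bridge -> Haswell -> Broadwell -> Skylake.
skylake_vs_sandy = compounded(0.06, 4)
print(f"6% per step, 4 steps:    {skylake_vs_sandy - 1:.2%}")
print(f"5% even + 5% odd halves: {disjoint_halves(0.05, 0.05) - 1:.2%}")
print(f"5% twice on all work:    {compounded(0.05, 2) - 1:.2%}")

overclock = 0.3125  # effective overclock quoted for the i7-3820
print("overclock covers the gap:", 1.0 + overclock >= skylake_vs_sandy)
```

Compounded 6% over four steps gives 26.25% when rounded (the post truncates to 26.24%), two 5% improvements on disjoint halves of the work give exactly 5% overall, and the quoted 31.25% overclock is indeed larger than the compounded figure.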

Even if that 31% overall advantage is accurate and not some sort of rabid neophile fanboyism, again, why would I put down another grand just to run at the same speed for slightly less power? I could save that same difference in power by turning the refrigerator down a notch. Or by avoiding 1 out of 400 car trips. Or by not buying a new CPU. Or by reinstalling Windows.

TL;DR: Moore's Law is dead.

Superfluous J · August 23, 2015

(Well, since this thread is utterly off topic already anyway...)

I'm not complaining. Any reply by a non-Squad-developer is off topic really, so all this discussion is just keeping the thread in a position to be seen by them.

...some day.

mattinoz · August 23, 2015

One scenario where lag is a major issue is bases, with numerous craft in physics range. Even with one craft per thread, that'll be a huge boon for colonisers. Just to be safe, I'm getting a hexcore with hyperthreading...

If the engine handling orbital position is broken out into its own process that can handle any number of craft, couldn't they then reduce the physics bubble of each ship down to, say, 100 m outside the furthest part, or even smaller?

The other question that always comes to my mind is: why do docked ships become one?

At least in an orbital situation, couldn't they be a bunch of very tight bubbles orbiting in formation? If one or more of these parts accelerate, that produces a collision that accelerates the next bubble.
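One way to think about the docking question is in terms of "simulation islands", a common idea in rigid-body engines: bodies linked by joints have to be solved together, while groups with no joints between them are independent and could in principle be solved separately, for example on different threads. A hypothetical sketch using union-find; the part names are made up, and this is not a description of how KSP or PhysX actually organize vessels:

```python
# Toy "simulation island" grouping with union-find. Parts joined to each
# other land in the same island; islands share no joints, so in principle
# each could be solved independently. Part names are made up.


def find(parent: dict, x: str) -> str:
    # Walk up to the root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def islands(parts: list, joints: list) -> list:
    parent = {p: p for p in parts}
    for a, b in joints:
        parent[find(parent, a)] = find(parent, b)  # merge the two groups
    groups = {}
    for p in parts:
        groups.setdefault(find(parent, p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


parts = ["podA", "tankA", "engineA", "podB", "tankB"]
joints = [("podA", "tankA"), ("tankA", "engineA"), ("podB", "tankB")]
print(islands(parts, joints))                          # two separate vessels
print(islands(parts, joints + [("engineA", "podB")]))  # docked together
```

Once a docking joint links the two vessels, everything collapses into a single island; the formation-flying idea above amounts to keeping those islands apart.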

Streetwind · August 23, 2015

I don't buy it. My own personal synthetic benchmarking is showing a line that's basically proportional to clock rate for CPUs in the AthlonXP/Pentium III/Pentium M/Core lines. (well, leaving aside memory performance - the onboard memory controllers in the later processors boosted memory access enormously - as much as four hundred percent faster. Also leaving aside the Pentium IV, which is about the same performance clock-for-clock as the almighty 286, or maybe a lump of rock)
Even if the percentage improvements that Intel claims are true and universal and stacking (e.g. improving the even-numbered instructions by 5% in one architecture and then the odd-numbered ones by 5% in the next architecture does not result in an overall performance improvement of 10%, let alone 10.25%), I'm still only calculating a 26.24% improvement over Sandy Bridge (assuming a 6% compounding universal improvement, which is probably quite overgenerous), which is completely erased by my i7-3820's effective 31.25% overclock (later x820s have decreasing base clock rates, as low as 3200 MHz).
Even if that 31% overall advantage is accurate and not some sort of rabid neophile fanboyism, again, why would I put down another grand just to run at the same speed for slightly less power? I could save that same difference in power by turning the refrigerator down a notch. Or by avoiding 1 out of 400 car trips. Or by not buying a new CPU. Or by reinstalling Windows.

I can't comment on your testing methodology, since I don't know it. I do, however, know some things about the internal workings of CPUs, and there are significant differences in instruction execution between generations. The same execution unit in a modern architecture performs the same mathematical operation in significantly fewer cycles than an olden-day Pentium, and there are many other factors, such as the length of the pipeline, the design of the branch prediction unit, the width of the front end and so on, that have a direct impact on performance. Of course, the "AthlonXP/Pentium III/Pentium M/Core lines" you mention have pretty much nothing to do with the statements of mine you quoted, since at no point in my post did I talk about any of them, but even among those there should be noticeable IPC differences.

The problem with synthetic benchmarks is identical to their advantage: they test one single thing with a very high degree of reliability and repeatability. That's great if you want to check how that one single thing has developed over time. It's not so great if you want to check how the CPU as a whole has developed over time, because the synthetic benchmark is incapable of testing the whole CPU. I'm fairly confident that your testing results are 100% accurate, because there exist synthetic benchmarks that are almost completely clock-speed dependent. They have their reason for existing, but they're not representative of all types of applications.

Being representative is an important topic in performance testing. Obviously the only way to be 100% representative is to test every piece of software out there, which is completely impossible. The question, then, is how you choose your selection of test software so that it is as representative as possible. The website I took the numbers above from chose a mixture of close-to-real-world synthetic benchmarks like SunSpider and actual workstation tasks like scripted sequences of actions in Adobe Photoshop; the article is worth a look if you happen to be able to read German. By the very nature of downselecting to one or two handfuls of programs, the result is not 100% representative. Surely other websites with different testing regimes and different setups will come to slightly different numbers. But you can make an argument that it's a ballpark figure you can orient yourself by. As a rule of thumb, that's roughly where it should land, especially if you happen to be buying for a specific task that is actually represented in the benchmark, like using Adobe Photoshop or playing GTA V.

And re: overclocking - indeed, if you have an older CPU like Sandy Bridge, overclocking can noticeably close the gap, especially for gaming, where CPU performance is secondary in the first place and the gaps are much smaller. But keep in mind that a.) overclocking is a niche practice, and only a tiny fraction of PC users actually do it; and b.) you're running the CPU outside of its specs. Some CPUs can handle it well. Others, not so much. I vividly remember the Sandy Bridge days, when people would make a sport out of buying five different CPUs, testing which one would overclock the highest, and returning the other four. Pissed off a lot of retailers, that did, to the point where they nowadays track customers by their returns behavior and outright ban people who abuse it from shopping. At least, here in Germany.

The article did look at overclocking as well, but rated the CPUs based on their stock speeds. You could make an argument saying "these are K-models, almost everyone who buys them overclocks". But the article is also this website's first look at the architecture overall. The findings there are likely going to be reasonably representative across the entire product range. Well, maybe with the exception that the further you drop down in TDP, the better the Skylake architecture should do compared to its predecessors, because it is designed for lower-power applications. The matchup of K-models against older K-models is probably Skylake's worst-case scenario. And that's why a Sandy Bridge K-series clocked to the crazy 4.6-4.9 GHz that some of them could pull will still be competitive today... for a little while, in some use cases. The age of giants like that has ended, and they will be sedately outpaced by architectures that have no interest in competing with them on the oldtimer's terms. It might be a different story with the enthusiast X-series platform; I admittedly don't pay attention to it because it's not for me.

Edited August 23, 2015 by Streetwind

sam.johnson841 · August 25, 2015

What did you guys make of the bit on this in Devnote Tuesday? I think they meant that they don't know yet.

Sam

cantab · August 25, 2015

As you may see, multi-threading in PhysX SDK 3.3 is indeed functional and fairly effective, showing significant performance improvements in the case of convex-convex collisions (2x faster on average, 3 threads vs. a single thread) and stacking (1.88x faster), and smaller but noticeable performance gains in the case of collisions between primitives (1.5x faster) and joint (1.2x faster) calculations.

When it comes to KSP, this might pour some cold water on the hype train's fire. Collisions between disconnected objects are not a major part of KSP; it's not like a game that has to check what bullets are hitting all the time, though I'll grant that the game may still need to spend time checking for them even if they're rare. Joints between parts are a major part of KSP, and that's where the improvement is modest.

On the bright side, that 20% is *just* PhysX. Unity may well bring more benefits, and Squad's own code yet more. I'll make an optimistic guess at 50% better performance than 1.0.
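Which per-category speedup matters depends on how much physics time each category actually takes. A minimal sketch that weights the PhysXInfo factors by time share; the joint-heavy split is a hypothetical guess for a typical KSP vessel, not a measurement:

```python
# Combine per-category PhysX 3.3 speedups (the PhysXInfo numbers quoted
# above) into one physics speedup, weighted by time share. The KSP time
# split used below is a hypothetical guess.

SPEEDUPS = {"joints": 1.2, "primitive collisions": 1.5, "convex collisions": 2.0}


def combined_speedup(time_share: dict) -> float:
    """Each category's time shrinks by its own factor; shares must sum to 1."""
    if abs(sum(time_share.values()) - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    new_time = sum(share / SPEEDUPS[cat] for cat, share in time_share.items())
    return 1.0 / new_time


joint_heavy = {"joints": 0.8, "primitive collisions": 0.15, "convex collisions": 0.05}
print(f"joint-heavy physics: {combined_speedup(joint_heavy):.2f}x")
```

With physics dominated by joints, the combined figure stays close to the 1.2x joint speedup, which is the point about joints above.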

Eric S · August 25, 2015

Joints between parts are a major part of KSP and that's where the improvement is modest.

Well, from a purely code-optimization point of view (without algorithm changes), 50% isn't that modest, and that's what the connected rigid body microbenchmarks are showing (unless I misread them).

Not saying I expect to see a 50% overall improvement, but it's within the realm of possibilities. I do think that they'll need to optimize the resource algorithms in order for us to see that kind of improvement, though.

Superfluous J · August 26, 2015

From the devnotes:

After last week's Squadcast there have been questions about how exactly multithreaded PhysX will work in Kerbal Space Program. One of the biggest improvements is that it is optimized for multi-threaded processing. The short answer is, we aren't sure how it'll work exactly. The insides of PhysX are hardcoded deep under Unity's hood, which means we have very little access to it from our end.
"Our own code hasn't changed to support multithreaded physics simulation, so how exactly it will handle KSP objects like multi-rigidbody vessels is not something we can see in detail." -Felipe (HarvesteR)
There also seems to be some confusion about the difference between the terms multi-threading and multi-core. When we talk about cores, we are generally talking about hardware. Software, however, deals with threads. Threads and cores are related, but they are different concepts altogether. Threads are virtual, abstract entities used by software to split its workload for separate processing. Cores are the hardware used to actually process multiple threads at the same time.
What this means for KSP is that there's no way to know which CPU cores will get used for any given vessel, part, or anything along those lines, and we're also not sure how simulation is split into separate threads. The way PhysX handles its multithreaded execution is internal to PhysX, and we don't really get to see behind those curtains… That's not a bad thing though. We get to focus on making the game on top of the physics engine, and PhysX handles the low-level simulation/computation stuff.
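The threads-versus-cores distinction in the devnote can be shown in a few lines. The program below picks its own thread count (3, echoing the 1-3 threads in the PhysXInfo benchmark) independently of how many CPUs the OS reports, and has no say in which core runs which thread; that is up to the operating system. `simulate_chunk` is a hypothetical stand-in for real work. In a standard CPython build these threads will not even run Python code in parallel because of the global interpreter lock, which is itself a reminder that a thread count says little about core usage:

```python
# Threads are a software decision; cores are hardware. This program chooses
# its own thread count and leaves core assignment to the operating system.
import os
import threading


def split(work: list, n_chunks: int) -> list:
    """Divide the workload into n_chunks interleaved slices."""
    return [work[i::n_chunks] for i in range(n_chunks)]


def simulate_chunk(chunk: list, results: list, index: int) -> None:
    # Hypothetical stand-in for real per-object work; each thread writes
    # only to its own slot, so no lock is needed.
    results[index] = sum(x * x for x in chunk)


work = list(range(10_000))
n_threads = 3  # chosen by the software, not by the hardware
chunks = split(work, n_threads)
results = [0] * n_threads
threads = [
    threading.Thread(target=simulate_chunk, args=(chunk, results, i))
    for i, chunk in enumerate(chunks)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results)
print("logical CPUs reported by the OS:", os.cpu_count())
print("threads used:", n_threads, "result:", total)
```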

So this seems to be a pretty big "we don't know exactly but it'll be better but probably not as better as you thought."

Which I'm satisfied with. It answers my question in the first post.

Edited August 26, 2015 by 5thHorseman
Fixed html formatting
