Performance Improvements


I suggested performance improvements to Kerbal Space Program support, and they asked me to post them here because they thought it would greatly help the developers.

First off, I love KSP2 and think it's a great game. Y'all added a lot of features I really appreciate, like procedural wings, coloring (man do I love coloring), and the SAS controls and UI, which I like so much more. I'm also really excited for all the features in the roadmap.

Secondly, I'll say what I believe to be true and what I think you should do working off of those assumptions. I don't mean to sound pretentious; I'm just writing this because I want to help if at all possible. I don't have that much experience, so I could be completely wrong, and if my assumptions are wrong then my suggestions won't help much either. I do believe, though, that if you knew all this you wouldn't be having the performance issues you're currently having.

My assumptions: KSP2 is currently built in Unity, which uses the PhysX engine for its physics. PhysX is really good for collision detection, but only that. So your structure is probably something along these lines: every part is a prefab, and players are basically dragging prefabs together to build a ship that is then loaded into the world. These prefabs probably all inherit from some part class that handles gravity, joint forces, and aerodynamics. Fuel tanks and engines (where you're currently having performance issues when users add several) then have additional scripts handling fuel and force application. This is probably all within FixedUpdate, which is called serially. That means every single part executes its script sequentially on the main thread, which is why computers with strong single-core performance are doing well even without many cores. Most modern CPUs are built with multiple cores and threading in mind because of how much more efficient it is, but that doesn't help here, because most of Unity's APIs aren't thread safe and can only be called from the main thread. Final assumption: you're not using the GPU for a lot of your calculations.

Where I believe you can go from here, working off of those assumptions: It's kind of infeasible to just improve performance in place. Sure, you can make your functions better, but only so much better, and there are some calculations you just have to perform. Once you get to colonization you also get ridiculously large structures and a lot more physics. Unity may not support multithreading (which honestly won't help too much at a colony part level anyway), but it does support compute shaders, which would allow you to perform a massive number of calculations very quickly. For most of your parts you probably have the same function, just with different inputs depending on size and shape. Engines probably all share the same script that handles fuel consumption, just with different thrust and ISPs. That is perfect for compute shaders, which are really good at running the same function on different inputs in parallel. Unity even allows you to pass structs to a compute shader, which means each part could be simplified to a struct of its variables and sent to the compute shader. Each block of the compute shader could be a different function or calculation (e.g. aerodynamics, gravity, fuel consumption) and each thread a different part (or be really fancy and use textures for input and output through RGBA values). That means most modern computers should be able to handle 1024 parts at about the same rate as 1 part, because most support about 1024 threads per block. Worst case you only get 512 parts, but even so, Unity's compute shaders allow for a third dimension, which means expanding beyond 1024/512 parts is simple. And if there is no GPU, it will default to the fastest available device, which will be the CPU, and now you've threaded it anyway, since as far as I know the CPU just simulates the GPU across its threads when there isn't one. Basically, anywhere you have for loops, double for loops, or scripts on every part that perform the same calculation, you could probably offload to the GPU to increase performance.
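To make the "each part is a struct, each thread handles one part" idea concrete, here is a minimal sketch in Python that stands in for a compute shader on the CPU. Everything named here is hypothetical and not taken from KSP2 or Unity: the `PartState` fields, the `fuel_kernel` function, and the 1024 threads per group (the figure used in the post; real limits depend on the GPU and API). The fuel burn uses the standard rocket relation, mass flow = thrust / (Isp * g0).

```python
import math
from dataclasses import dataclass

THREADS_PER_GROUP = 1024  # assumption: figure quoted in the post, varies by hardware
G0 = 9.80665              # standard gravity in m/s^2, used to convert Isp to mass flow


@dataclass
class PartState:
    """Hypothetical per-part struct; field names are made up for illustration."""
    mass_kg: float    # total mass of the part, including any fuel it holds
    thrust_n: float   # 0 for parts that are not engines
    isp_s: float      # specific impulse in seconds, 0 for non-engines
    fuel_kg: float


def fuel_kernel(part: PartState, dt: float) -> PartState:
    """One 'thread' of the kernel: same function for every part, different inputs."""
    if part.thrust_n <= 0.0 or part.isp_s <= 0.0:
        return part  # non-engine parts pass through unchanged
    mass_flow = part.thrust_n / (part.isp_s * G0)  # kg per second
    burned = min(part.fuel_kg, mass_flow * dt)     # cannot burn more than is left
    return PartState(part.mass_kg - burned, part.thrust_n,
                     part.isp_s, part.fuel_kg - burned)


def group_count(num_parts: int) -> int:
    """Thread groups needed so that every part gets its own thread."""
    return math.ceil(num_parts / THREADS_PER_GROUP)


def dispatch(parts, dt):
    """CPU stand-in for a GPU dispatch; on a GPU these calls would run in parallel."""
    return [fuel_kernel(p, dt) for p in parts]
```

With this layout, 1025 parts would need `group_count(1025) == 2` groups, which is the "expanding beyond 1024 parts" case: more groups rather than a different kernel.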

Concerns:

- GPUs are slower. Yes. But say you have 1000 calculations and the CPU runs each one 3 times as fast (approximately, maybe). The GPU does all 1000 calculations in parallel in about 3 units of CPU time, while the CPU, working through them one at a time, needs 1000 units. That is roughly 1000/3 = 333 times longer.

- GPU answers are a large array you'd have to loop through on the CPU anyway. Just do a thread reduction on the GPU, so that only a few partial results come back to the CPU.

- It'll slow down rendering since we're using the GPU now. Well, rendering waits for the CPU calculations anyway, and your structure would look something like this: CPU (initial calculations and storage) -> GPU (parallel calculations) -> CPU (handle those calculations and store them for the next frame) -> GPU (render). Besides, slower rendering doesn't really matter if your calculations are the bottleneck.

- You don't have that many parts. Yeah, the GPU is actually worse if you only have like 10 calculations. The best approach, I'd say, is to offload to the GPU once you reach a threshold and otherwise use the CPU. But I imagine for colonization you will have to use the GPU.
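The trade-offs in the concerns above can be put into a toy cost model. This is only a sketch under stated assumptions: one CPU calculation costs 1 unit, a GPU thread is 3 times slower (the rough figure from the first concern), 1024 threads run at once, and the fixed `overhead` of 30 units for transfer and dispatch is an invented number purely for illustration. The function names are hypothetical too.

```python
import math


def cpu_time(n_calcs: int, t_cpu: float = 1.0) -> float:
    """Serial cost: one calculation after another on a single thread."""
    return n_calcs * t_cpu


def gpu_time(n_calcs: int, t_cpu: float = 1.0, slowdown: float = 3.0,
             threads: int = 1024, overhead: float = 0.0) -> float:
    """Parallel cost: work runs in waves of `threads` calculations.

    Each wave takes `slowdown` times as long as one CPU calculation,
    and `overhead` is a fixed cost for moving data and launching the work.
    """
    waves = math.ceil(n_calcs / threads)
    return overhead + waves * slowdown * t_cpu


def choose_device(n_calcs: int, overhead: float = 30.0) -> str:
    """Threshold rule: use the GPU only when the model says it is faster."""
    if gpu_time(n_calcs, overhead=overhead) < cpu_time(n_calcs):
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    for n in (10, 100, 1000, 10240):
        print(n, cpu_time(n), gpu_time(n, overhead=30.0), choose_device(n))
```

In this model, 1000 calculations cost 1000 units on the CPU and 3 units on the GPU before overhead, while 10 calculations stay on the CPU because the fixed overhead dominates. Where the real threshold sits would have to be measured on actual hardware.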

I have two examples, one using CUDA and the other a compute shader, to show how many calculations you can achieve. The compute shader performs collisions, collision reactions, and gravitational forces (which are bad for the GPU since there's a cube and a square root involved in the calculation) about 1.05e10 (like 10 billion-ish) times a second. That is 10 blocks of 1024 threads, each with a for loop going over all 10240 objects, at an average of 100 fps: 10 * 1024 * 10240 * 100. Granted, I took a lot of shortcuts to make it stupid efficient, which you won't exactly be able to do in Unity. The CUDA program follows the same principles as the compute shader since it runs on a GPU, but it also has a reduction implementation that's quite simple since it's only thread reduction. It performs about 20,000,000 calculations in 0.3 seconds. If you partition each function onto a block, you can do the block reduction on the CPU, since block reduction on the GPU is difficult and can be risky. You have to go back to the CPU anyway to store the information for later frames, and looping over blocks isn't going to be that long of a task. Both of these examples use OpenGL and C, but Unity's compute shaders are pretty easy to use and comparable in principle; you just use HLSL instead of GLSL.
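The split described above (reduce within each block on the GPU, then combine the per-block results on the CPU) can be sketched as follows. This is a Python stand-in, not the CUDA program from the post: `block_reduce` mimics the usual tree reduction the threads of one block would perform, and `reduce_on_cpu` is a hypothetical name for the final loop over block results. The block size of 1024 is the figure used in the post.

```python
def block_reduce(values):
    """Tree reduction as one block's threads would do it: halve the active range each step."""
    buf = list(values)
    n = 1
    while n < len(buf):
        n *= 2
    buf += [0.0] * (n - len(buf))  # pad to a power of two with the identity for addition
    stride = n // 2
    while stride > 0:
        for tid in range(stride):  # on a GPU these iterations run concurrently
            buf[tid] += buf[tid + stride]
        stride //= 2               # a real kernel would synchronize threads here
    return buf[0]


def reduce_on_cpu(values, block_size=1024):
    """Per-block partial sums (the GPU's job), then a short CPU loop over the blocks."""
    partials = [block_reduce(values[i:i + block_size])
                for i in range(0, len(values), block_size)]
    return sum(partials), len(partials)


if __name__ == "__main__":
    forces = [float(i) for i in range(10240)]  # stand-in for per-object results
    total, blocks = reduce_on_cpu(forces)
    print(total, blocks)
```

For 10240 values the CPU only has to add up 10 partial results, which is why looping over blocks on the CPU is cheap compared with looping over every element.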

I have the repositories for the two examples but don't particularly want to blast my personal information around (they have my name and some other info in them), so if requested I'll send links to them.

I hope this is helpful, and that I don't just sound like a jerk telling you things you already knew. I think that with great performance nobody cares as much about bugs, since they can just immediately restart instead of waiting 10 minutes to see results again. I think performance is especially important since you want to make science accessible to everyone, and most people not properly exposed to science don't have modern hardware. I'd also be happy to talk through any questions or concerns you have.

Thank you for your time.

4 hours ago, Cosmus said:

I like your perspective. In your opinion, was Unity a good choice for a game like KSP?

I don't think Unity was a good choice for KSP2, or at least I think there were better choices.

Honestly, I think they shouldn't have used any game engine. The only nice thing about game engines is the built-in tools that make development easier, but Unity isn't built for a physics simulation. No game engine is, really. They're building something very unique, so rather than helping them with nice tools, a game engine is just holding them back. It means performance is worse and they have to do a lot of workarounds to get the systems they want in place.
