How to play KSP with Unity 2019 on Old Potatoes!!


Lisias

That's the History.

When KSP 1.8 came, using Unity 2019, my old Mac Mini 5.1 (i5, 2 cores, 4 HyperThreads) couldn't handle it. I just could not fire up KSP and keep the machine usable - the whole thing started to stutter. Facebook, Youtube, command line terminals, you name it. Everything was stuttering, it was impossible to watch a video!

Since I was going to buy a new old rig anyway (that by pure chance ended up being a Mac mini 6.2 - 4 Cores, 8 HTs), I didn't give it too much attention. The new old Potato ended up handling KSP >= 1.8.0 and that allowed me to keep using it for development (although KSP 1.7.3 is still way more performant, which is the reason my main playing is still on 1.7).

And so KSP 1.12 came, and screwed everything again: KSP 1.12 did to my Mac Mini 6.2 what KSP 1.8 did to the 5.1. Krap.

Oh, well. Life goes on. I still can run KSP 1.12 for some "quick" and small tests and use 1.11 for the main workload until I figure out a way to buy yet another new old potato. :P 

But then I realized I had made a less than ideal decision on a thingy called Refunding from KSP-Recall. At that time, I had pretty little time to fool around and did things as fast as I could in order to work around the problem under my nose, rushed the thingy into the Mainstream and went back to the day job, and by doing this I ended up bullying the GC. I then optimised the memory usage a bit, but the bullying just would not stop. It ended up being (another) bug on KSP that was causing a memory leak, that was triggering the GC a lot, that was being sabotaged by spinlocks on the waiting threads!

Checking the worst processor hogs on the KSP process, it came to my attention that almost 100% of the CPU time was being used on a system call related to Semaphores inside a Unity thread: dispatch_semaphore_wait_slow, which was spinning around os_semaphore_wait, which in turn was calling semaphore_wait_trap.

Process-Thread-View.png

Interesting. Checking the other threads of the KSP's process, I got horrified!

Process-ManyThread-View.png

LOTS AND LOTS OF THREADS hogging up 100% of the CPU, a clear indication of busy waits!

It's no surprise Refunding was provoking a memory leak - there are so many threads busy waiting for the GC that there's no CPU left for the GC itself, and so we have a deadlock!!!

Process-All-Threads-Screwed.png

Every single thread, including OSX HID Input (Human Interface Devices), is bogging the CPU at 100%! It's evident now why the input is so sluggish when your crafts get too big for your rig!!!!

Digging a bit more on the subject around the Internet, I came across C code like this:

        if (mutex_sem != 0)
            kr = semaphore_wait_signal_trap(cond_sem, mutex_sem);
        else
            kr = semaphore_wait_trap(cond_sem);

Which explains a lot - semaphore_wait_trap is used without a mutex, and so somebody somewhere is using a spinlock to do the job - you know, we need to synchronize things between threads, right?

Remembering a very productive exchange with @darthgently, I decided to use one of the tricks he taught me, the MONO_THREADS_PER_CPU environment variable. It tells mono to, well, limit the number of threads per CPU. :) 

By limiting this number, we would have fewer spinlocks on the process, and so the poor CPU would have a better chance to do its job instead of spinning around the same code waiting for something that will never happen because the CPU is being completely screwed up by the waiting threads.
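To make the cost of a busy wait concrete, here is a minimal Python sketch. It is not Unity's or Mono's actual code, and the function names are made up for illustration. Both waits below wait for the very same signal, but the busy wait burns CPU time the whole while, and the blocking wait just sleeps until it is woken up.

```python
import threading
import time


def cpu_spent_waiting(wait, delay=0.3):
    """Return the CPU seconds this process burns while `wait(event)` runs.

    A timer thread sets the event after `delay` seconds, playing the role
    of the job everybody is waiting for.
    """
    event = threading.Event()
    timer = threading.Timer(delay, event.set)
    start = time.process_time()  # CPU time, not wall clock time
    timer.start()
    wait(event)
    timer.join()
    return time.process_time() - start


def busy_wait(event):
    # Spinlock style: keep asking "are we there yet?" without ever sleeping.
    while not event.is_set():
        pass


def blocking_wait(event):
    # Mutex/condition style: the thread sleeps until somebody signals it.
    event.wait()


if __name__ == "__main__":
    print(f"busy wait:     {cpu_spent_waiting(busy_wait):.3f}s of CPU")
    print(f"blocking wait: {cpu_spent_waiting(blocking_wait):.3f}s of CPU")
```

The exact numbers depend on the machine, so none are quoted here. The point is only that the first one grows with the waiting time and the second one should not.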

Since I'm on a UNIX machine, this is what I did:

KSP-with-MONO-THREADS-PER-CPU.png

The command MONO_THREADS_PER_CPU=1 ./KSP.app/Contents/MacOS/KSP will set the environment variable and then call the executable `KSP`. On exit, the environment variable will be lost, so no chance for it to linger and end up screwing up some other process you call later.
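For those who prefer a script (or are on a system where the inline VAR=value syntax is not available), the same idea can be sketched in Python: copy the current environment, add MONO_THREADS_PER_CPU to the copy, and hand that copy only to the child process. The helper names are mine, and this is just one possible way to do it.

```python
import os
import subprocess


def env_with_mono_threads(threads_per_cpu, base_env=None):
    """Return a copy of the environment with MONO_THREADS_PER_CPU set.

    The parent environment is left untouched, so the variable does not
    linger around for other programs.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MONO_THREADS_PER_CPU"] = str(threads_per_cpu)
    return env


def launch(command, threads_per_cpu=1):
    """Run `command` (a list of strings) with the variable set only for it."""
    return subprocess.run(command, env=env_with_mono_threads(threads_per_cpu))
```

Calling launch(["./KSP.app/Contents/MacOS/KSP"], threads_per_cpu=1) from the KSP directory should be equivalent to the command above. Adjust the path to wherever your executable lives.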

And that, my friends, solved the problem for me. My old Mac Potato 5.1 is now able to run KSP 1.8.1. Barely: the game is still sluggish, but the rest of the machine is usable! I can even watch Youtube videos while running KSP 1.8.1, something that was impossible for this machine 17 months ago!

My less old Mac Potato 6.2 managed to withstand some more abuse from KSP 1.8.0 to KSP 1.11.2 because it has twice the cores of my previous rig, but now on KSP 1.12.0 someone increased the default number of threads per CPU again (or something similar) [edit: see note below], and it screwed up my i7: there's no point in increasing the working threads with spinlocks, all you will get is more threads waiting for something that will never happen because your threads are preventing everything else from running!

Note: Since the last time I revisited this article, I managed to diagnose the reason KSP 1.12.x screwed up my puny Mac Potato 6.2 so badly: THE TEXTURES. Squad essentially quadrupled the VRAM footprint and this completely wrecked my GPU, as it has a maximum of 1536MB of VRAM. By manually shrinking the texture sizes to a quarter of the original size (more or less the sizes of the KSP 1.5 era), KSP 1.12.x behaves nicely on my rig! More info on this thread.

 

Aftermath

Right now, I'm able to run KSP >= 1.8 on my oldest rig by using this trick. I'm now trying to figure out the best compromise of threads for my rig (2 appears to be acceptable, I will try 4 on my next time window for this). I'm pretty confident that a similar trick will "solve" the issue for my less old Mac Potato, so soon I'll be able to test drive things (and diagnose problems) on KSP 1.12.x, something that until now I was not able to do properly.

These hacks are not solving the bad performance of KSP itself, although that is getting slightly better (or less worse) too. What will solve the problem for good is using MUTEXES instead of spinlocks, and I do not currently know if that is doable by options or environment variables.

I will update this article with my findings as they happen.

 

Addenda

On KSP 1.7.3

Out of curiosity, I fired up KSP 1.7.3 (Unity 2017) and inspected the process the same way I did on KSP 1.8.1:

KSP173-Process-ManyThread-View.png

We still have lots of threads screwing up the cores with semaphore_wait_trap, but we also have some others that don't!

Some unnamed threads are using nanosleep, and the OSX HID Input one is using mach_msg_trap.

We now have good evidence that my thesis has teeth.

 

On KSP 1.3.1

On KSP 1.3.1 (Unity 5) I got similar results!

KSP131-Process-ManyThread-View.png

It's worth mentioning that KSP 1.3.1 and older are simply the best performing KSP versions on my i5. Period.

I think we have a pattern here. Unity is (ab)using dispatch_semaphore_wait_slow on Version 2019, and this is royally screwing KSP on anything not top notch (and probably hindering top notch machines as they would probably perform better without this mess).

 

Conclusions

Besides being a Bad Move™ (and Unity would have better results on the field without using that kind of stunt), what's really screwing up things is not necessarily the spinlocks, but using them where a proper MUTEX is really mandatory. The HID Input thread appears to be one of them, at least.

I think it's more than due time for Unity to start getting their excrements together and do things right. For a change.

This article is also published on my site.

Edited by Lisias
Notes.

Just now, Commodore_32 said:

hi there

So, this will very likely reduce the CPU overload and very possibly remove my TLA issue from my other thread

But the question is, how do i run this code and where?, I'm very unfamiliar with C++ or MS DOS so yeah

You are running on Windows, right? Well, on Windows I only remember how to set system-wide Environment Variables. Which is not exactly optimal, because we want a way to set it up only for KSP 1.8.0 and newer.

Anyway, this link explains how to do it:

https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html

You want to create a new ENVVAR called MONO_THREADS_PER_CPU and set it to 1, and see what happens. If your problem persists, it will not help you - so redo the steps above and remove this variable, as it will hinder your Unity games for sure.

On the other hand, if the stunt sticks, you will want to change 1 to a bigger value, the biggest you can set without screwing up your computer (or triggering the TLA again). It's a trial and error process.

Let me know if it works for you! This appears to be important.

Alright, I did the steps and made the "MONO_THREADS_PER_CPU" Environment Variable with a value of 1, I will restart and test later to see if it helps, thanks for the information.


One thing to note is that the environment variable will only affect mono processes run from the shell/cli/terminal it is set within. If you set up a shortcut to launch a terminal to run KSP from a script, with the env var setting in the script, the env var will effectively be in a sandbox unless you run another mono app from within that same shell/cli/terminal


20 minutes ago, Curveball Anders said:

Excellent analysis!

Just two questions:

1. What's the default value?

2. Does it matter how many cores and threads you have?

 

I don't know what the default is. KSP reportedly only uses one core ever, which is why this setting can help. By limiting to a single thread it apparently prevents multiple threads from fighting/waiting for resources via spinlocks. This is a great find Lisias. I was just dinking around with the var but it never occurred to me to set it to 1. I've been trying values between 5 and 20, lol

Would it make any difference to use this on a higher-end system?  and if so, any suggested values for it?


52 minutes ago, linuxgurugamer said:

Would it make any difference to use this on a higher-end system?  and if so, any suggested values for it?

Don't have a clue, it can backfire easily.

As a rule of thumb, you should not have more threads on spinlocks than the available CPUs (or hyper threads). But by doing so, you sabotage parallelism on your rig.

So there's a sweet spot where you don't screw up your cores too much, to the point of saturation, and still have decent parallelism.

Apparently this sweet spot is 2 for the i5 mobile - more than this and the whole machine starts to stutter without any benefit for the game.

Xeon cores will probably withstand more abuse.

But you will need to trial-and-error your way on your rig.

 

49 minutes ago, darthgently said:

So far, on a 12 core i7, I'm getting about ~30% (guesstimate) faster scene switching with it set to 1.  Actual ascent of a small craft is about the same from what I can tell, but will check with a larger craft later

Geez!!! That much?

Edited by Lisias
tyop! Surprised?

A guess.  And I'd just restarted the game.  I do have a lot of mods that generate a lot of usage during scene changes.  I dunno.  I'll be more rigorous later

I have an i9900, will have to test later


I did some timings on a Mac Mini 5.1 (i5-2415M 2.3 / 16GB RAM) and KSP 1.8.1.

I just fired it up and stopped the chronometer on the first chord of the intro music:

default                 5:48
MONO_THREADS_PER_CPU=1  5:10
MONO_THREADS_PER_CPU=2  4:48 (!!!)
MONO_THREADS_PER_CPU=3  5:02
MONO_THREADS_PER_CPU=4  5:03
MONO_THREADS_PER_CPU=5  5:00
MONO_THREADS_PER_CPU=6  5:04
MONO_THREADS_PER_CPU=2  5:00
default                 5:33
MONO_THREADS_PER_CPU=2  5:22 (???)

The values are inconclusive. Apparently, the best loading times are obtained with 2 threads per CPU on my rig, but the schizophrenic cache mechanism used by MacOS may be screwing up these values greatly. (The use of a spinning disk is also a factor, for sure.)

(The tests were made one after another in a row, with the first test not counted.)

I will redo this test at night, with the machine rebooted (something on Unity appears to need it sometimes) and without anything else running, to see what I get.
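For anyone wanting to crunch the numbers, here is a small Python sketch that averages the loading times from the table above per setting. The data is copied from the table; the function names are mine, and a plain mean over so few runs is only a rough indication.

```python
from collections import defaultdict
from statistics import mean

# Timings copied from the table above, as (setting, "m:ss") pairs.
RUNS = [
    ("default", "5:48"),
    ("1", "5:10"),
    ("2", "4:48"),
    ("3", "5:02"),
    ("4", "5:03"),
    ("5", "5:00"),
    ("6", "5:04"),
    ("2", "5:00"),
    ("default", "5:33"),
    ("2", "5:22"),
]


def to_seconds(stamp):
    """Convert an "m:ss" string into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_by_setting(runs):
    """Group the runs by MONO_THREADS_PER_CPU setting and average each group."""
    grouped = defaultdict(list)
    for setting, stamp in runs:
        grouped[setting].append(to_seconds(stamp))
    return {setting: mean(values) for setting, values in grouped.items()}


if __name__ == "__main__":
    for setting, seconds in mean_by_setting(RUNS).items():
        print(f"{setting:>7}: {seconds:6.1f}s")
```

The three runs with the value 2 alone range from 4:48 to 5:22, a 34 second spread, which is more than the difference between the averages of the settings 1 to 6. So "inconclusive" is the right word until the test is redone under cleaner conditions.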


Geez…. Things are worse than I thought. The use of spinlocks in user-space was already chewed out by Linus Torvalds!

It's a trend, a nasty and naughty trend. To the point that sooner or later this needs to be addressed by the anti programmed obsolescence guys. These developers are deprecating hardware for no reason. :(

https://linux.slashdot.org/story/20/01/06/012251/linus-torvalds-calls-bloggers-linux-scheduler-tests-pure-garbage

 

 

On 7/6/2021 at 3:28 PM, darthgently said:

So far, on a 12 core i7, I'm getting about ~30% (guesstimate) faster scene switching with it set to 1.  Actual ascent of a small craft is about the same from what I can tell, but will check with a larger craft later

I didn't get much improvement, other than on the loading time - and even that only as long as I don't reload the game more than 2 or 3 times in a row.

Obviously, I'm already bottlenecking something else on the MacCrap 5.1. However, by setting mono threads to 2, I feel the thing is slightly less sluggish, and the whole machine is now performing better. And 8°C (to 10!!) cooler than using the default. I'm serious, my machine is running cooler this way.

Which, again, points to a bottleneck somewhere else on the rig - almost surely on the GPU (that, frankly, is terrible - Intel HD 3000).

Another noticeable enhancement is the Main Menu animation - way smoother. Which again suggests I'm bottlenecking the GPU - the Main Menu animation is somewhat spartan compared to the Flight Scene.

—— POST EDIT ----

I removed some GPU intensive add'ons (such as Waterfall, that I was testing successfully on 1.8.1) and it improved the framerate a lot (almost to the level of 1.7.3 on the same machine!!!). The CPU temperature rose a degree too, so I was right about bottlenecking the GPU. :)

Edited by Lisias
Brute force post edit.

I don't know what changed in the last few days of CKAN updates or what but now about every time I try to launch a craft it gets hung up on endless messages like the following.  I've had these before, as we previously msgd about, but now they never end and the craft never loads.  What a mess.  It would fill up my drive if I didn't kill KSP.  I've said it before: the main challenge in KSP isn't rocket science.  It is KSP (+mods)

[LOG 10:57:57.284] [PartSet]: Failed to add Resource 1566956177 to Simulation PartSet:54008 as corresponding Part Mk0 Liquid Fuel Fuselage-1825441388 SimulationResource was not found.

Edited by darthgently

Found this, it appears to be related to the problem when it happens on MacOS:

https://developer.apple.com/forums/thread/124155

Apparently, it describes the situation I think we are having on KSP: a main, high priority thread being screwed up by waiting for lower priority threads. Which should be nonsense in a sane World, as high priority threads "don't call", they are "called".

The "Busy Wait" (or SpinLock) I detected may be an undesired and unexpected side effect of a priority inversion on the main thread, and this may be the real problem. Which suggests a less than ideal overall design of the process. On an educated (but somewhat blind) guess, an inversion of control should be applied here instead, where the low priority thread would "call" the high priority thread when the job is done, instead of having the high priority thread asking the lower priority one "are we there yet?" all the time...
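The inversion of control idea can be sketched in a few lines of Python. This is a hypothetical illustration, not how Unity or KSP do things, and all the names are made up: the worker notifies the waiting thread through a callback when the job is done, so nobody needs to poll.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_job():
    # Stand-in for the work done by the lower priority thread.
    return sum(range(100_000))


def run_with_callback():
    """Run slow_job in a worker and get notified instead of polling."""
    done = threading.Event()
    results = []

    def on_done(future):
        # Called once the job is finished: the worker "calls" us.
        results.append(future.result())
        done.set()

    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(slow_job).add_done_callback(on_done)
        done.wait()  # the waiting thread sleeps, it does not spin
    return results[0]
```

Python threads have no priorities, so this only shows the shape of the design: the waiting side sleeps until it is called, and it is the finished job that does the calling.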

Edited by Lisias
better phrasing

  • 2 weeks later...

RL hit me in the balls badly on the last 10 days, and so just now I could (minimally) restore core functionalities on my (recently murdered by Apple's <insert your non forum compliant favourite expletive here about how to call people that does less than smart decisions>) MacMini 6.2 . 

At first glance, yeah - KSP 1.12.x just screwed my MacMini 6.2 performance the same way KSP 1.8 did to my MacMini 5.1.

HOWEVER.

At least for KSP 1.12.1 (as I essentially missed 1.12.0 due to the hard disk recovery problems), I realised that setting MONO_THREADS_PER_CPU=1 did not help at all. At least, initially.

The stutter on the rest of the machine was slightly less bad with the MONO_THREADS_PER_CPU stunt, indeed, but it still stuttered a bit (moving windows around, playing Youtube videos, etc). And the Main Menu animation was simply horrible - less than 1 frame per second I guess, as the animation was "jumpy", with the kerbals sometimes freezing for 2 or 3 seconds.

That was unexpected. I was pretty convinced that the MONO_THREADS_PER_CPU stunt would solve the issue again.

Then I noticed that apparently it was not exactly the performance of the CPU that was being crippled by KSP, but the video updates. The apparent performance of small programs on the terminal apparently wasn't affected, only the terminal updates (with the MONO_THREADS_PER_CPU stunt active, just to make it clear).

So apparently I have another problem with similar effects - two different lightning bolts hitting me at the same spot, if you prefer.

Thinking about it, I checked the KSP change log to see what had changed from 1.11.2 to 1.12.0 and 1.12.1, hoping a light bulb would spark somewhere inside my dull head, and then I got an insight: a lot of visual enhancements were deployed on KSP 1.12. Well, it makes sense: this rig uses a shared memory GPU with up to 1.5G of VRAM, 3 times better than the previous rig but still less than the 2G that is usually the minimum recommended for gaming nowadays (I will ignore the fact that Steam recommends 1GB of VRAM, and states 512MB as the minimum).

So I played a bit, or better, A LOT with the Settings, always starting from the default ones, in order to reach the smallest set of changes from the default values that would make my rig usable again - at least, on a vanilla (no Mods) installation. And I managed to do that.

So, if you are running KSP 1.12.x on an old rig and/or are facing some serious performance issues, this is what worked for me:

  1. Go to Settings, click Graphics, click Reset Settings, click Accept
  2. Set Render Quality to Good.
  3. Set Texture Quality to Quarter Res.
  4. Set V-Sync to Every Second V-Blank.
  5. Set Frame Limit to Default.
  6. Click Accept

And this solved the second performance issue for me (because, yeah, I still need to use the MONO_THREADS_PER_CPU stunt in order to avoid stuttering too much the rest of the rig).

I still haven't installed any Add'Ons on this thingy; I will probably be forced to reduce somewhat the quantity of add'ons installed for the tests, which will be an annoyance to say the least.

But at least I'm back to business…. (sort of)

— — — POST EDIT — — — 

Installed TweakScale and the minimum set of Add'Ons I use for toying with it, and I had to reduce the Texture Quality to Eighth...

Edited by Lisias
post edit

  • 1 month later...
On 7/7/2021 at 1:56 AM, Lisias said:

@steve_v, if you are still around, I think you will like this one.

I'm not, not really. This is just an echo.

With the end of updates and all the bugs and performance problems it left unsolved I kinda lost interest TBH.
But seriously, spinlocks for thread synchronisation? What is this, 1995?


18 hours ago, steve_v said:

But seriously, spinlocks for thread synchronisation? What is this, 1995?

To tell you the truth, the last time I had read about people resorting to stunts like this was in the 80's, when 8-bit computers were still a thing.

The 6502 has a pretty fast and nice instruction to minimize delays while waiting for some hardware to be ready. When you need to react really fast - but really, really fast, to the point that an NMI would be too slow (due to the internal states needed to save the PC into the stack and jump into the handler) - you would hook the signal to the SOB (pun not intended) pin, and then spinlock waiting for it to change a bit on the P register using a TestAndBranch instruction (only 3 CPU cycles, against the 7 cycles needed by the NMI, and even that only if the CPU wasn't on a SYNC state).

The x86 has the WAIT instruction for similar results: a spinlock implemented internally in micro-code, you can't beat this in software.

And even the venerable IBM-XT already used interrupts for reading the keyboard instead of a busy-wait checking the keyboard I/O pins as it was done on the Apple2 and similar machines. And we are talking 1981 and 1982 here.

If you are not working with micro-controllers (single-task, single thread), if you even think of spinlocks you should have your commit privileges revoked and be sent back to a BASIC programming school.

Edited by Lisias
Grammars. Don't you hate this thing?