ADVANCED USERS - Improve KSP performance on Intel CPUs with HyperThreading [TESTED]


1096bimu

So it has been claimed that:

Okay. I have a very nice i7 2600 in my PC. It's wicked fast and very power efficient for a gaming PC processor and I love it lots.

The problem is it has HyperThreading which in most modern games isn't an issue any more as they multi-thread just fine. However, as most people know, KSP (or at least the physics?) is only capable of using 1 'CPU' or 'Core'. With HyperThreading, Intel divide each physical core into 2 virtual CPUs so, for example, my i7 has 4 physical cores and HT makes that look like 8 virtual cores. Each virtual core is only capable of using a maximum of 50% of the cycles available on the physical core it is on. So what? So that means KSP is confined to 50% of one CPU core's CPU power when it's running. As good as my i7 is, that makes it a bit slow. So what's the miracle cure?

This is for advanced users only who are 100% confident and comfortable changing their motherboard settings in their BIOS/UEFI setup screen, I also provide zero warranty and take zero responsibility for you messing up your PC doing this. It's purely experimental and done at your own risk.

When you boot your PC up to play KSP, enter the BIOS/UEFI settings screen (this is usually a key press at the power-on self-test screen like Del or F2). If in doubt do some RTFM for your motherboard. Navigate to the CPU settings and look for the HyperThreading option. Set it to OFF if you can. You shouldn't need to change anything else. Save your new settings and reboot.

When you reboot the PC after the change you will notice if you open Task Manager in Windows that there are half the number of CPU graphs (if you have it set to graph per CPU) and they match the number of physical cores on your CPU. Also, now, KSP can use 100% of one of the physical cores to run meaning you get 2x the available CPU power you had before. It runs WAAAY smoother on my rig now I've done this and as far as I can see Windows 7 doesn't care and neither do any of my other games or programs.

I have a feeling it might be a possible cure for some people's lag issues.

I'm sorry, but I really have a problem with certain kinds of behavior: in particular, overconfidence, false claims or misleading information, and illogical responses like:

Argue it out if you must. It worked for me (quantifiable improvement, no voodoo involved), it might not work for others. I'm done here.

I may yet ask for this thread to be deleted because a helpful suggestion has basically turned into a mud-throwing match, which wasn't my intention.

===============================================================================================================================

So here are a few hours of my time dedicated to testing this myth.

aGsHmZj.gif

Test Platform:

Intel Core i7 3930K 6-core CPU, 6x256KB L2, 12MB L3 @ 4.0GHz

Gigabyte X79S-UP5

4 x 4GB DDR3 9-10-9-27 2T 1866MHz

ATI Radeon HD 7970 3GB @ 1100MHz

Windows 7 Ultimate X64

Procedure:

VjtCrlu.jpg

Test rocket used, a Saturn V replica with 781 parts. I got it from the spacecraft exchange forum.

The test is performed by launching the same vehicle with MechJeb auto launch, with the same ascent path settings.

The frame times are recorded for 120 seconds with Fraps.

Hit Esc, restart the launch and do the same thing 2 more times.

The procedure is repeated after reboot with other CPU and memory setups.

*No other vehicle or debris is present in the scene

*No other mission is ongoing during the launch

*The same webpage (the General Discussion section of this forum) is open in Firefox 4 and minimized while the test is running.

CPU settings:

*To account for the possibility that total CPU resources influence the results, the same test is run with different settings to simulate other i7 models, since they all have a similar architecture.

Default mode:

2cSQKGo.png

i7-2600 mode:

Reduced frequency to 3.8GHz

Reduced core count to 4

Reduced memory bandwidth to dual channel 1600MHz

nVk8Qb0.png

i7 4500U (mobile) mode:

Reduced frequency to 3.0GHz

Reduced core count to 2

Reduced memory bandwidth to dual channel 1333MHz

fDE7mKW.png

Results:

For maximum precision, FPS is calculated from the recorded frame times by dividing the total number of frames rendered by the timestamp of the last frame, i.e. the total elapsed time. (higher is better)

Render time of each frame is calculated by subtracting the previous frame's timestamp from the current frame's timestamp. (lower is better)

Standard deviation of the render times is calculated to assess frame jitter. (lower is better)
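The three metrics above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet itself: it assumes the Fraps log has already been read into a list of cumulative timestamps in milliseconds, the function name `frame_metrics` is made up, and the sample standard deviation is an assumption (the post does not say which variant was used).

```python
import statistics

def frame_metrics(timestamps_ms):
    """Return (average FPS, per-frame render times in ms, jitter in ms).

    timestamps_ms: cumulative time of each frame in milliseconds,
    in recording order (assumed input format).
    """
    if len(timestamps_ms) < 3:
        raise ValueError("need at least 3 frames to compute jitter")

    # FPS: total frames rendered divided by the last timestamp (in seconds).
    total_seconds = timestamps_ms[-1] / 1000.0
    average_fps = len(timestamps_ms) / total_seconds

    # Render time: current timestamp minus the previous one.
    render_times = [cur - prev for prev, cur in zip(timestamps_ms, timestamps_ms[1:])]

    # Jitter: sample standard deviation of the render times (assumption).
    jitter = statistics.stdev(render_times)
    return average_fps, render_times, jitter

# Made-up timestamps, for illustration only.
fps, times, jitter = frame_metrics([10.0, 20.0, 40.0, 50.0])
print(fps, times, jitter)
```

Computing FPS from the total frame count and total time, rather than averaging per-frame FPS values, avoids giving short frames too much weight.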

nCIkGlv.png

Analysis:

-Disabling HT not only fails to improve game performance, it consistently reduced performance on all 3 settings.

-The less total CPU performance is available, the more disabling HT reduces game performance.

-Disabling HT also increases frame jitter, which increases perceived lag even if average FPS is unchanged.

Conclusion:

Disabling Hyper Threading will only reduce game performance under all 3 tested configurations.

Original spreadsheet for detailed results:

https://docs.google.com/file/d/0B9Rbd1PQiNycYXI5cU9BU2N3R3M/edit?usp=sharing

Extraordinary claims require extraordinary evidence.

The easiest person to fool in the entire world is yourself.

Edited by 1096bimu

Wow, that's pretty conclusive. You really put a lot of work into this, which is much appreciated. It is a little disappointing that there's no magic bullet to frame rate issues, but it's a valuable service to put myths to rest.


dat scientific method

This was a triumph.

I'm making a note here: HUGE SUCCESS.

It's hard to overstate my satisfaction.

1096bimu Science

He does what he does

because he can.

For the good of all of us.

Except the ones who are dead.

But there's no sense crying over every mistake.

Sadly there are so many voodoo priests in this forum.


Thanks for putting the time in to do this. It does suck though that there is no workaround. The game gets unplayable for me at about 800 parts and I find myself having to limit my creativity because of it. Idk what Squad can do when it comes to Unity, but something has to change. In fact this should be priority number one. Get this game OPTIMIZED... not ... flags..


800? Pretty decent machine, I must say. Mine starts to mess up around 350 parts loaded...


Working with the limits can help.

I'm aiming at getting individual craft down to 50 parts. Then even docking etc. should not exceed 200.


Maybe I'm reading it wrong, but you state in your conclusion that no-HT causes performance drops, but the blue bars on the left graph all go out farther than the red, and you state that the farther out the bar goes, the better. In the right graph, two out of the three blue bars are shorter, which you state is better. Do your graphs not correlate with your results?


For FPS, higher is better; HT outperformed no HT in all 3 cases.

For frame jitter, lower is better; in two of the cases HT outperformed no HT, and for the middle one the reverse is true.

However, these differences are really small and probably not even statistically significant (which means they could have been caused by chance alone), and FPS is probably more important than jitter anyway.
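To put a rough number on "could have been caused by chance alone", here is a minimal sketch of an exact permutation test on per-run average FPS, assuming three runs per setting as in the procedure. The FPS values are made up for illustration and are not taken from the spreadsheet; `permutation_p_value` is a hypothetical helper name.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference of means.

    Counts how many ways of splitting the pooled runs into two groups
    give a mean difference at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so float rounding does not hide an exact tie.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical per-run average FPS, three runs per setting.
ht_on = [30.1, 30.4, 30.2]
ht_off = [29.8, 29.9, 29.7]
print(permutation_p_value(ht_on, ht_off))
```

With three runs per setting there are only 20 possible splits, so even when every HT-on run beats every HT-off run the smallest possible two-sided p-value is 2/20 = 0.1. Three runs alone cannot reach the usual 0.05 threshold with this test, which fits the caution about significance.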


So, are you basing your conclusion on the assumption that the negligible difference, if the experiment were done a few hundred times, would actually show a not-so-negligible negative effect rather than the slight positive effect?


I take the frame jitter information to be complementary, and the FPS data to be dominant. But you're right, it is probably more statistically valid to say that there is no difference between HT on and off.


Aiding science at midnight... woo!... *passes out*.


Are you sure that lesser CPUs with smaller caches will act the same way? I'm not denying your results, but I'm not sure you've "simulated" 8MB and 4MB shared cache CPUs: shutting down cores on a 12MB shared cache processor still leaves the full 12MB available, doesn't it? It's been suggested that on-die cache size is an important factor in large mathematical calculations. That suggestion isn't mine, and I'm not saying it will make any difference; I wouldn't expect HT on or off to make much difference at all unless the system was loaded with other tasks. My point is just that presenting data as "simulated" across a range of high to low end CPUs just by changing clock speed/core count may not be entirely accurate.


Well, that's why it's called a simulation; nobody said it was gonna be entirely accurate. My guess is that having less cache might favor no HT, since there are probably fewer threads to keep track of? But again, that's just a guess.


I love you. Nothing like shooting down misinformation with extreme prejudice.

Only one minor thing to note - if you are using XP, the scheduler is unaware of which logical cores are on the same physical core, and so it's *possible* for two CPU-intensive threads to run slower. Still, extremely niche case. This is a nonissue in w7 - and i'd assume in vista+w8 too - as the scheduler is aware of which logical cores share physical cores and so schedules threads appropriately.

Oh, and for additional hilarity, the only way a single physical core would effectively run two logical threads at half speed is if both threads were no more than while(true) loops. In real world usage scenarios the worst-case performance per thread given two on the same physical core tends to be around 65-75% - assuming both would use 100% if they could. Depends on the type of work, of course.
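As a back-of-the-envelope illustration of those figures (plain arithmetic, not a measurement; the helper name is made up), compare the claimed hard 50% split with the 65-75% worst case when two busy threads share one physical core:

```python
def combined_throughput(per_thread_fraction, threads=2):
    """Total work done by `threads` busy threads sharing one physical core,
    as a fraction of what a single thread would get with the core to itself."""
    return per_thread_fraction * threads

# 0.50 is the claimed hard split; 0.65 and 0.75 are the worst-case figures above.
for fraction in (0.50, 0.65, 0.75):
    total = combined_throughput(fraction)
    print(f"{fraction:.0%} per thread -> {total:.0%} of one core in total")
```

A hard 50% cap would mean the core never does more than 100% in total, whereas 65-75% per thread adds up to 130-150%. And that slowdown only applies when a second busy thread is actually sharing the core.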

Edit: Oop, i see the original OP (who is being responded to by this thread) mentioned using W7. Yep, complete FUD and/or placebo effect.

Edited by ZigZagJoe

Great stuff! The original post this was directed at was so far off base I didn't bother wasting the time to reply lol.

Now if you're up for more science:

Determine your max OC using hyperthreading

Determine your max OC using no hyperthreading (should be marginally more stable and reach a slightly higher clock than above)

Benchmark both of those in KSP similar to what you did here. I'm thinking the non-hyperthreading for this will win the performance test 8)

