[0.90] TestFlight [0.4.6.1][04FEB15] - Configurable, extensible, parts research & reliability system

Agathorn · January 2, 2015

Looks like we crossed posts. I updated my prior message.

Thanks. Is your custom failure module working with 0.1.0 as well?

I've reproduced the bugs on my dev build, so I'll push a release out soon that fixes things up.

including interfering with other KSP windows

Can I get more details on this please?

- - - Updated - - -

I have tracked down both the issue with the MSD not showing failures, as well as why the configs are bad. The MSD update is already fixed, and I am working on fixing the configs right now. Once that is done I will push a 0.2.1 patch.

kujuman · January 2, 2015

In both 0.1.0 and 0.2.0, I've attempted to add a custom failure module, basically a direct copy of failure_explode for now. The game is not finding the module.

Module Manager addition


 MODULE
 {
  name = TestFlightFailure_Turbopump
  failureTitle = Turbopump sync error
  failureType = mechanical
  severity = major
  weight = 32
 }

The code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using KSP;
using UnityEngine;
using TestFlightAPI;

namespace TestFlight
{
    // Custom failure module; the class name must match "name" in the MODULE config node.
    public class TestFlightFailure_Turbopump : TestFlightFailureBase
    {
        public override void DoFailure()
        {
            Debug.Log("Turbopump sync error detected!");
        }

        public override bool CanAttemptRepair()
        {
            return false;
        }

        public override bool AttemptRepair()
        {
            return false;
        }

        private void FailPump()
        {
            // Stub: nothing implemented yet.
        }
    }
}

The error from KSP

[ERR 12:50:48.691] Cannot find a PartModule of typename 'TestFlightFailure_Turbopump'

I just need to fiddle with it some more to get it working; I'm very much at the guess-and-check stage at this point. The .dll this compiles to is being put in GameData\TestFlight\ right now. I'll keep trying new locations, namespaces, etc. until I get success. Don't worry about it.

On the GUI issues, I've reinstalled 0.2.0, and they are no longer there. I also got rid of some other mods in the install, so I probably was misattributing them. The bug was the prevention of KSP windows (the Esc menu, Alt-F2 menu, and the Alt-F12 menu) from appearing, which in my experience was a nullref in the GUI from a mod (they were the bane of AdvSRB for a while).

Edited January 2, 2015 by kujuman

Agathorn · January 2, 2015

I haven't yet done a full test of the API from completely outside my project, although inside my project the existing core modules are actually a sub-project that links the API .dll file, so it's roughly the same thing.

Just to clarify the obvious, you are referencing the TestFlightAPI.dll in your project? The error you are getting from KSP implies that KSP is not loading the PartModule. I would troll through the startup sequence in your log and see if your DLL is even getting loaded. That's where I would start anyway.

One thing I had planned to do, and forgot, was build a completely separate example module project as a quickstart for modders. So I will do that as well.

- - - Updated - - -

On the GUI issues, I've reinstalled 0.2.0, and they are no longer there. I also got rid of some other mods in the install, so I probably was misattributing them. The bug was the prevention of KSP windows (the Esc menu, Alt-F2 menu, and the Alt-F12 menu) from appearing, which in my experience was a nullref in the GUI from a mod (they were the bane of AdvSRB for a while).

Well, the bug with the MSD not updating when certain types of failures occurred was causing an NRE.

kujuman · January 2, 2015

I haven't yet done a full test of the API from completely outside my project, although inside my project the existing core modules are actually a sub-project that links the API .dll file, so it's roughly the same thing.
Just to clarify the obvious, you are referencing the TestFlightAPI.dll in your project? The error you are getting from KSP implies that KSP is not loading the PartModule. I would troll through the startup sequence in your log and see if your DLL is even getting loaded. That's where I would start anyway.

Yes, TestFlightAPI.dll is being referenced. KSP appears to load my .dll fine:

[LOG 14:03:21.673] Load(Assembly): /TestFlight_Failures_01
[LOG 14:03:21.674] AssemblyLoader: Loading assembly at C:\Program Files (x86)\KSP 0.90 DEV\GameData\TestFlight_Failures_01.dll

And still

[ERR 14:04:01.739] Cannot find a PartModule of typename 'TestFlightFailure_Turbopump'

I've never used interfaces or anything, so I wouldn't be surprised if I'm screwing up something there. My assumption is that KSP is failing to recognize TestFlightFailure_Turbopump as a PartModule, but like I said, I have no idea how interfaces work

---Edit---

I'm going to check if there's anything in the Alt-F12 screen

---Edit 2---

Found the issue! My .dll was loading into KSP before TestFlightAPI.dll, and so it was cleaned up when the loader couldn't find the base class. The module appears to work in game now.

I renamed my .dll to zzTestFlight_Failures_01.dll and placed it in the same folder as TestFlightAPI.dll to get it to load. Now on to making the module do some fun things.

Edited January 2, 2015 by kujuman

Agathorn · January 2, 2015

My guess at this point would be that it isn't inheriting from TestFlightFailureBase properly. Though that makes no sense because it wouldn't even compile in that case. Perplexing.

- - - Updated - - -

---Edit 2---
Found the issue! My .dll was loading into KSP before TestFlightAPI.dll, and so it was cleaned up when the loader couldn't find the base class. The module appears to work in game now.
I renamed my .dll to zzTestFlight_Failures_01.dll and placed it in the same folder as TestFlightAPI.dll to get it to load. Now on to making the module do some fun things.

Awesome! And now I feel slightly bad, because that did occur to me earlier, but then when I was typing up a response I forgot to ask about it! Damned senior moments. I think it might be best for me to rename TestFlightAPI.dll to _TestFlightAPI.dll then.

- - - Updated - - -

New Release

v0.2.1 Alpha

https://github.com/jwvanderbeck/TestFlight/releases/tag/v0.2.1

Change Log

v0.2.1 Alpha Release

Fixed Master Status Display not showing some failures
Fixed configs for stock parts

Agathorn · January 4, 2015

v0.2.1 experimental 1

https://github.com/jwvanderbeck/TestFlight/releases/tag/v0.2.1e1

This is an EXPERIMENTAL release. Experimental releases are very much "development snapshots" and are released for the express purpose of getting user feedback or testing on a very specific bug or feature that is being worked on. If testing an experimental release, please limit feedback to the scope of that release.

This is an alpha release and thus should be assumed to be buggy, and capable of breaking your game and game saves.

Experimental release of the new reworked GUI. This is still very much a WIP. This release adds a settings GUI and enables the user to modify TestFlight settings directly in the UI. Settings are saved as you update them.

Added new GUI system courtesy of **TriggerAu**
Added new Settings dropdown to the TestFlight Window. This panel will allow you to modify all global TestFlight settings.

Agathorn · January 4, 2015

FYI, something I forgot to mention: GUI Experimental 1 is a test of functionality, not appearance. What I am looking for here is feedback on the usability of the new UI. Experimental 2's focus will be on looks.

razark · January 4, 2015

I've been running 0.2.1 for a few flights now.

1. Failures seem to be rather frequent. Is this how it should be, or is it something that you're planning on toning down later?

2. I've seen fuel tank leaks and engine thrust loss. I previously had problems with the small capsule, but now it doesn't even show the debug option. Did I break something, or was this intentional?

3. Failures are now showing on the main panel. I'm wondering if it might be possible to have an option to suppress display of parts that have not failed, to avoid clutter.

4. Is there any thought on tying this into the contract system? It would make a lot more sense than the current part testing contracts. Maybe a "Get us X amount of data from part Y in Z environment".

5. Is there any thought on tying this into the science system? For every so much flight data you return, you get so many science points.

The new UI in v0.2.1.e1:

I've had no problem

It moves! (But it doesn't stay in the new location.)

What is the "Enable HUD in Flight Scene" option? Some of the items have mouse-over descriptions, but this was one that did not.

Agathorn · January 4, 2015

I've been running 0.2.1 for a few flights now.
1. Failures seem to be rather frequent. Is this how it should be, or is it something that you're planning on toning down later?

This is something that is really hard to say. I know what you mean and I haven't figured out how to approach it yet. The way it works right now is that every X period of time, it polls all the parts for a potential failure based on reliability, and if they fail that check, then they get a random failure. The problem is that if you are checking for a failure, say, every 30 seconds, then even something with, say, 80% reliability has a lot of chances to potentially fail.

In the new UI you will see that you can change the time between failure checks, and one way to "make it easier" is to simply make it check less often. I'm not sure if that is the solution or not though. On the other hand, I could do something like it has to fail 2 checks before it fails, or something like that but then you start stepping on the whole reliability system.

tl;dr I don't have a solid answer for you but I'd love input and thoughts on the matter.
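To put rough numbers on the polling problem described above, here is a minimal sketch. It assumes each check is independent and that "80% reliability" means an 80% chance of passing any single check; neither assumption is confirmed against TestFlight's code, and `PollingSketch` and its parameter names are made up for illustration.

```csharp
using System;

// Hypothetical helper for illustration only; not part of TestFlight.
public static class PollingSketch
{
    // Chance that a part fails at least one check during a flight,
    // assuming every check is an independent roll against the same reliability.
    public static double ChanceOfAnyFailure(
        double reliabilityPerCheck,   // e.g. 0.8 for "80% reliable"
        double flightSeconds,         // how long the part is being polled
        double checkIntervalSeconds)  // time between failure checks
    {
        int checks = (int)Math.Floor(flightSeconds / checkIntervalSeconds);
        // The part survives only if it passes every single check.
        return 1.0 - Math.Pow(reliabilityPerCheck, checks);
    }
}
```

Under those assumptions, `PollingSketch.ChanceOfAnyFailure(0.8, 300.0, 30.0)` covers 10 checks in a five-minute flight and comes out at roughly 0.89, which matches the impression that failures are frequent. Doubling the interval to 60 seconds halves the number of checks, so the result depends on the polling rate rather than only on the part.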

2. I've seen fuel tank leaks and engine thrust loss. I previously had problems with the small capsule, but now it doesn't even show the debug option. Did I break something, or was this intentional?

No, it being on the capsule was actually a mistake, and that was fixed in 0.2.1.

3. Failures are now showing on the main panel. I'm wondering if it might be possible to have an option to suppress display of parts that have not failed, to avoid clutter.

This is actually the intention behind the upcoming Flight HUD which you mention below. The in flight HUD will be a very condensed view only showing failures.

4. Is there any thought on tying this into the contract system? It would make a lot more sense than the current part testing contracts. Maybe a "Get us X amount of data from part Y in Z environment".

I hadn't thought about that, and in fact TestFlight was conceived long before contracts were added, but it's a neat idea. At the moment I know nothing about how contracts work in the API, but I will look into it. Can I possibly get you to submit that as a feature request on GitHub so I don't forget?

5. Is there any thought on tying this into the science system? For every so much flight data you return, you get so many science points.

That could get OP pretty quickly I think, but let me do some thinking on the matter.

The new UI in v0.2.1.e1:
I've had no problem
It moves! (But it doesn't stay in the new location.)
What is the "Enable HUD in Flight Scene" option? Some of the items have mouse-over descriptions, but this was one that did not.

Guess I forgot to save the window position. See above for the info on the HUD.

Thanks for your detailed feedback. Very much appreciated!

kujuman · January 4, 2015

0.2.1 Ex1

In addition to the two points by razark (the GUI not staying put is particularly an issue, since it can get in the way of KSP windows; for example, the AppLauncher and the show options panel button interfere with the staging list in the VAB):

►Global Reliability Modifier: This wasn't intuitive for me. I was expecting it to be a multiplier like some of the other settings, so when it went negative I became very...concerned. It took me a bit to realize that it added percentage points to reliability. So a setting of -25 would turn a 100% reliable part into 75% reliable? And a 50% reliable part into 25%? It makes sense for it to be additive (so a 100% reliable part could be changed to being 99%, etc.), but I just had a bit of a time figuring this out.

EDIT: Just set this at 25, and now the MSD is reporting reliabilities at like 1500% - 2000%, so now I think it's a multiplier?

►I do like that the options drop-down is persistent between scenes.

►Flying the default KerbalX, the MSD still was jittery with the options panel open; with the options open, the MSD was longer than my resolution height (I think 1600x900).

- - - Updated - - -

This is something that is really hard to say. I know what you mean and I haven't figured out how to approach it yet. The way it works right now is that every X period of time, it polls all the parts for a potential failure based on reliability, and if they fail that check, then they get a random failure. The problem is that if you are checking for a failure, say, every 30 seconds, then even something with, say, 80% reliability has a lot of chances to potentially fail.
In the new UI you will see that you can change the time between failure checks, and one way to "make it easier" is to simply make it check less often. I'm not sure if that is the solution or not though. On the other hand, I could do something like it has to fail 2 checks before it fails, or something like that but then you start stepping on the whole reliability system.
tl;dr I don't have a solid answer for you but I'd love input and thoughts on the matter.

I have some ideas on this. I don't really have the statistics background to make the math work off the top of my head, but if given a bit of time I could probably make it work.

1) Failures should not be constant rate for some items, particularly ones that "start". For example, engines have a very high risk of not starting correctly, but once they are stabilized (0-6 seconds or whatever), the risk of failure is pretty low. Fuel tanks and similar items that don't "start" maybe need a constant risk of failure.

2) The current system of failures means that when failures do occur, they tend to occur in packs. This is sorta desirable, I'd imagine that a failure would increase the risk of failures for some time. So if a vessel experiences a MAJOR failure, maybe bump the vessel failure rate up by 25% for 3 seconds or something?

3) The concept used in real life for reliability is Mean Time Between Failures (MTBF). Fuel tanks maybe should have a MTBF and a StandardDeviation rating rather than a %.

3.1) MTBF calculations would be divided into periods.

3.2) We assume (for simplicity) that actual failure times are normally distributed around MTBF for certain items (like fuel tanks).

3.a) At the start of a period, the partmodule rolls a random number (0-1), this is our P-value

3.b) Use the P-value to reverse-lookup a z-value from a normal table (see example 2 under the table here: http://www.normaltable.com/)

3.c) Multiply the z-value by the StandardDeviation in the partmodule to determine the failure offset time from MTBF.

3.d) Save the FailureTime as UT.

3.e) At FailureTime, roll to determine which failure occurs, and do the failure.

3.f) After the failure is repaired (or at launch), start again at 3.a

Example:


A fuel tank has a MTBF of 1500 seconds and a StandardDeviation of 300 seconds.

When the fuel tank is launched (or activated, or whatever), roll a random number, 0-1. We rolled a 0.236. Our p-value is 0.236 (3.a)

Look up the z-value from our p-value (we may be able to compute this as well; there probably is a Math function on the system). For reference, normaltable.com only goes 0.5 -> 1, so we do 1 - 0.236 = 0.764, and multiply the z-value by -1. The z-value of 0.764 is right about 0.72. We multiply by -1 to evaluate 0.236, and our z-value is -0.72. (3.b)

-0.72 * 300 is -216. -216 is the failure offset time from MTBF (3.c)

FailureTime is UT + MTBF + Failure offset time. If we just launched a new file, UT is 0 seconds. 0 + 1500 + (-216) = 1284. Save 1284 in the part module. (3.d)

Not sure of the most efficient way to do this in game, but when UT (the master time in KSP) reaches 1284 (21m24s after liftoff), roll a failure using the way it's already done. (3.e)

I have some ideas about using FloatCurves (or AnimationCurves, same thing really) for more complicated cases such as part activation, but you get the idea. For things that cycle like solar panels, a duty limit might make sense, so instead of a mean time between failures, we evaluate mean cycles between failures. Same math and concept, just without worrying about the clock.
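Steps 3.a through 3.d can be sketched in a few lines. This is only an illustration under stated assumptions: `MtbfSketch` and its method names are hypothetical, the normal CDF uses the well-known Abramowitz-Stegun erf approximation, and the "reverse lookup" is done by bisection instead of a printed table. Nothing here touches the KSP or TestFlight API.

```csharp
using System;

// Hypothetical helper for illustration only; not part of TestFlight or KSP.
public static class MtbfSketch
{
    // Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
    // (absolute error on the order of 1e-7, plenty for scheduling a failure).
    public static double NormalCdf(double z)
    {
        double x = Math.Abs(z) / Math.Sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                    + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.Exp(-x * x);
        return z >= 0.0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Step 3.b: the "reverse lookup" from p-value to z-value, done by bisection.
    public static double InverseNormalCdf(double p)
    {
        if (p <= 0.0 || p >= 1.0) throw new ArgumentOutOfRangeException("p");
        double lo = -8.0, hi = 8.0;
        for (int i = 0; i < 60; i++)
        {
            double mid = 0.5 * (lo + hi);
            if (NormalCdf(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    // Steps 3.c and 3.d: offset from MTBF, then the absolute failure time.
    public static double NextFailureTime(double now, double mtbf, double stdDev, double p)
    {
        double offset = InverseNormalCdf(p) * stdDev;
        return now + mtbf + offset;
    }
}
```

With the numbers from the example, `MtbfSketch.NextFailureTime(0.0, 1500.0, 300.0, 0.236)` gives a z-value of about -0.72 and a failure time of about 1284 seconds. In a real module the p-value would come from a random roll (step 3.a), and a very low roll could put the failure time before the current time, so a clamp would probably be wanted.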

---EDIT---

I'll look at it more later (and this is also so I don't lose it), but there is an open source stats package (not sure of the license) at https://github.com/mathnet/mathnet-numerics/tree/master/src/Numerics/Distributions and it looks chock full of things. The exponential distribution looks like it'd be good for engines.

Edited January 4, 2015 by kujuman

Agathorn · January 4, 2015

0.2.1 Ex1
In addition to the two points by razark (the GUI not staying put is particularly an issue, since it can get in the way of KSP windows; for example, the AppLauncher and the show options panel button interfere with the staging list in the VAB):

So what you are saying is that in the VAB it is too far to the right? I will look into that (in addition to letting the window be moved properly)

►Global Reliability Modifier: This wasn't intuitive for me. I was expecting it to be a multiplier like some of the other settings, so when it went negative I became very...concerned. It took me a bit to realize that it added percentage points to reliability. So a setting of -25 would turn a 100% reliable part into 75% reliable? And a 50% reliable part into 25%? It makes sense for it to be additive (so a 100% reliable part could be changed to being 99%, etc.), but I just had a bit of a time figuring this out.
EDIT: Just set this at 25, and now the MSD is reporting reliabilities at like 1500% - 2000%, so now I think it's a multiplier?

It is indeed supposed to be a straight +/- modifier. At -25% your reliability should be 75% on a 100% reliable part. So what you are seeing is definitely a bug.
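For reference, the intended additive behaviour could look something like the sketch below. The name `ApplyGlobalModifier` is hypothetical, and clamping the result to the 0-100 range is an assumption added here, not something confirmed above.

```csharp
using System;

// Hypothetical helper for illustration only; not TestFlight's actual code.
public static class ModifierSketch
{
    // Adds the global modifier as percentage points, not as a multiplier.
    public static double ApplyGlobalModifier(double baseReliabilityPercent, double modifierPoints)
    {
        double adjusted = baseReliabilityPercent + modifierPoints;
        // Assumption: keep the displayed reliability within 0% to 100%.
        return Math.Max(0.0, Math.Min(100.0, adjusted));
    }
}
```

So `ApplyGlobalModifier(100.0, -25.0)` returns 75 and `ApplyGlobalModifier(50.0, -25.0)` returns 25, while a setting of +25 can never push a part past 100%.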

►Flying the default KerbalX, the MSD still was jittery with the options panel open; with the options open, the MSD was longer than my resolution height (I think 1600x900).

Doh, that isn't good. I will make it so that the window has a maximum height and scrolls if it needs more space.

1) Failures should not be constant rate for some items, particularly ones that "start". For example, engines have a very high risk of not starting correctly, but once they are stabilized (0-6 seconds or whatever), the risk of failure is pretty low. Fuel tanks and similar items that don't "start" maybe need a constant risk of failure.

The idea was that reliability modules would take care of this by increasing or decreasing the aggregate reliability under certain conditions. So for example if an engine is throttled higher or is under higher heat loads, a reliability module might penalize the overall reliability. It could also do the same thing for initial ignition.. sort of. Since parts are only polled every so often, that wouldn't work quite as expected.

I like your ideas on an MTBF system, except I fear it makes things overcomplicated, and it is also hard to fit into the gameplay, because your calculated UT offset might occur when the part isn't even on an active vessel.

All these ideas are good, but would require some major refactoring of the system. Now I'm not against doing that, and I am not going to dismiss good ideas just because they mean more work, or major changes to my "vision", so let me spend some time mulling over things and see how it works out. I know for one thing I don't currently like how failures tend to happen in batches, so I'd like to introduce some variability in that anyhow.

On the flip side I will point out how failures were extremely common especially in the early days of unmanned rocketry. Look at RTV-G-4 Bumper with 3 successful flights out of 8, or Vanguard, 3 successes out of 11 launches. I look at something like this page (http://www.windows2universe.org/space_missions/unmanned_table.html) and I think it captures exactly what I am trying to provide. At the top of the list, the early launches, you have lots of failed missions and as you get farther down the list, more and more missions start succeeding. But even then there are the occasional failures, and even to this day they still happen every now and then.

I think those early days of your space program should be well sprinkled with problems such as stages not igniting, or thrust being lower than expected, antennas not deploying, and yes, big explosions every now and then. That is what I am striving to provide, and with everyone's great ideas I think we will get there.

Let me spend some time thinking on ways of reworking the system to be a bit more fun and dynamic.

kujuman · January 4, 2015

Let me spend some time thinking on ways of reworking the system to be a bit more fun and dynamic.

Oh, I wasn't even aware of the reliability modules. I think if the core system is able to receive instant failures from reliability modules on parts, then there's no need to change anything. Maybe a public method in the core module; FailSinglePart(Part) could roll the failure mode. A reliability module could then call FailSinglePart(this.part) so the failure doesn't have to wait until the update timer (this may be what you had in mind, I didn't read through the code yet). I think that'd allow the best of both worlds, since a reliability module could do MTBFs.

Agathorn · January 4, 2015

Oh, I wasn't even aware of the reliability modules. I think if the core system is able to receive instant failures from reliability modules on parts, then there's no need to change anything. Maybe a public method in the core module; FailSinglePart(Part) could roll the failure mode. A reliability module could then call FailSinglePart(this.part) so the failure doesn't have to wait until the update timer (this may be what you had in mind, I didn't read through the code yet). I think that'd allow the best of both worlds, since a reliability module could do MTBFs.

Yeah, the problem is that right now it doesn't work that way. Currently the system is essentially a "pull" model rather than a "push". Everything is controlled by the Scenario, which polls all the parts and asks for an update. I did it that way planning to eventually support TestFlight on ALL craft, not just the active vessel. Unfortunately, after some research it turns out that really isn't doable. So TestFlight will have to be restricted to the active vessel, and thus I can rework the system to a "push" model instead, which will allow things to work like we are talking about, and more.

So once I am finished with the GUI rewrite, I'm going to get everything merged back in, stabilized a bit, then break off into another dev branch to rework the system to a push-based system.
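A rough picture of what a "push" arrangement could look like, using the FailSinglePart idea suggested above. Everything in this sketch is hypothetical: `CoreSketch`, `IgnitionReliabilitySketch`, and their members are stand-ins written without the KSP or TestFlight API, and parts are represented by plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the core module; not TestFlight's real class.
public class CoreSketch
{
    private readonly Random rng;
    private readonly List<string> failureModes;

    // Record of failures triggered so far, as "part: failure" strings.
    public readonly List<string> Triggered = new List<string>();

    public CoreSketch(IEnumerable<string> failureModes, int seed)
    {
        this.failureModes = new List<string>(failureModes);
        this.rng = new Random(seed);
    }

    // Push entry point: called the moment a reliability module decides
    // the part has failed, instead of waiting for the next polling tick.
    public string FailSinglePart(string partName)
    {
        string mode = failureModes[rng.Next(failureModes.Count)];
        Triggered.Add(partName + ": " + mode);
        return mode;
    }
}

// Hypothetical reliability module that checks only at engine ignition.
public class IgnitionReliabilitySketch
{
    private readonly CoreSketch core;
    private readonly string partName;
    private readonly double ignitionFailureChance;
    private readonly Random rng;

    public IgnitionReliabilitySketch(CoreSketch core, string partName,
                                     double ignitionFailureChance, int seed)
    {
        this.core = core;
        this.partName = partName;
        this.ignitionFailureChance = ignitionFailureChance;
        this.rng = new Random(seed);
    }

    // Returns true if the ignition attempt pushed a failure to the core.
    public bool OnIgnition()
    {
        if (rng.NextDouble() < ignitionFailureChance)
        {
            core.FailSinglePart(partName);
            return true;
        }
        return false;
    }
}
```

The point of the design is that the module owns the decision about when to roll (ignition, an MTBF timer, a throttle change), and the core only picks and applies the failure mode.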

razark · January 4, 2015

This is just a quick observation from last night, as I haven't tried to replicate it. I loaded a ship on the pad, and then paused to go do something for a few minutes. When I got back and unpaused, I noticed that every part showed a failure status. It seems as though the failure polling is still taking place while the game is paused on an active ship.

I'm loading up the game to see if I can replicate it.

Edit:

Same vehicle I was using for tests last night, but now I'm getting "TestFlight is not currently tracking any vessels."

Edited January 4, 2015 by razark

Agathorn · January 4, 2015

When you say paused, you mean you had the menu open? I think this might be a consequence of the "pull" system as I outlined above. The part is probably being polled anyway because the game doesn't really pause.

razark · January 4, 2015

Hit "Esc", and sit there with the "Resume/Space Center/Revert" options.

Hrm. And now I've gone from "not tracking vessel" to showing two failures.

mysteriosmind · January 5, 2015

How about only checking for a failure, when there is some change in situation that could result in that failure, instead of checking in regular intervals? So for a rocket engine the system checks for a failure every time you start/stop, accelerate/decelerate the engine, its temperature changes a lot (either by use or getting really cold in space), g-forces changing a lot, it running at full throttle for a long time, it moving into a strong magnetic field (Jool, Kerbol), the outside pressure changing, etc. ... A system like that would take away a bit of the randomness of the failures while at the same time making them meaner, as you are more likely to encounter them when you use a part.

As for earlier missions ending in failures more often, how about having more than one reliability value? Like you got the part value you already use, that goes up relatively fast, then one for the part class (like all Liquid fuel+Oxidizer engines) that takes longer to go up and maybe even a third one for your entire space program that takes really long to change. The chance for a failure would then take them all into account. That way the reliability goes up the longer you play while suffering a penalty to it when trying out untested technologies (new part classes).

Ohh and here are a few more failures I thought of (just ignore those that have been mentioned before):

- Engine throttle being locked in place (you can't accelerate/decelerate, nor turn it off)

- Docking port failing to engage (you get it in the moment they normally connect and have to repair it before you can make another attempt)

- Structural failures (basically the part blowing up) for all parts when going really fast in the atmosphere (drag) or pressure getting really high or colliding at a lower speed than the normal crash tolerance

- Everything that uses electric charge using a lot more than normally

- Batteries losing a certain amount of the max charge

- Batteries not being able to recharge

- Fuel tanks having the fuel flow to the engine interrupted (the engine acting as if no fuel is left, even though there is)

- electric circuit being interrupted, so that electric charge can't flow through a certain part until fixed (maybe instead of just electric charge it could happen for every resource)

NonWonderDog · January 6, 2015

Ultimately I don't think you're going to be able to avoid the literature on reliability analysis if you want something that feels "right." You at least need to use Bayes' Theorem to set it up so the update rate doesn't change the number of failures per hour.

The wiki page is a decent start: http://en.wikipedia.org/wiki/Failure_rate

This random Powerpoint is okay: http://www.wilsonconsultingservices.net/MTBF_M2.pdf

The simplest way that would give good results would be to report the reliability of each part as MTBF, with a constant hazard rate derived from that (2000 hours MTBF = 0.0005 failures/hour = 1.39e-7 failures/sec). That's an exponential failure density, which is fine. If you want to be fancy you can let parts define one each of exponential, Weibull, lognormal, etc. failure densities and use the greatest hazard rate (which would let you define a bathtub curve for parts), but I don't really know how you'd report that data to the player. Presumably you'd have to average it into the MTBF score somehow.
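
As a rough sketch of that conversion (Python, with made-up function names, not TestFlight code): the MTBF gives a constant hazard rate, and the chance of failing within one update of length dt is 1 - e^(-lambda*dt). Computed that way, the chance of surviving an hour is the same whether the part is checked once a minute or fifty times a second.

```python
import math

SECONDS_PER_HOUR = 3600.0

def hazard_rate_per_second(mtbf_hours):
    """Constant hazard rate (failures/second) for an MTBF given in hours."""
    return 1.0 / (mtbf_hours * SECONDS_PER_HOUR)

def failure_probability(mtbf_hours, dt_seconds):
    """Probability of a failure during a single update of dt_seconds."""
    lam = hazard_rate_per_second(mtbf_hours)
    # expm1 keeps precision when lam * dt is tiny.
    return -math.expm1(-lam * dt_seconds)

def survival_over(mtbf_hours, total_seconds, steps):
    """Chance of surviving total_seconds when checked in `steps` equal updates."""
    p_step = failure_probability(mtbf_hours, total_seconds / steps)
    return (1.0 - p_step) ** steps

lam = hazard_rate_per_second(2000.0)            # about 1.39e-7 failures/second
coarse = survival_over(2000.0, 3600.0, 60)      # one check per minute
fine = survival_over(2000.0, 3600.0, 180000)    # fifty checks per second
```

Both `coarse` and `fine` come out at e^(-0.0005), which is the point: the update rate drops out of the result.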

Plus, you could have it literally say "mean time between failures = 12 seconds" on the starting boosters. That's a lot more fun than "50%."

Making engines more likely to fail during starts, etc., is a good idea, but it needs to be tied to the overall reliability. I'd recommend you just have a multiplier to the base hazard rate that applies for two seconds or something, maybe as a one-shot floatcurve. So if your engine is 0.01/hour likely to fail at any time, it's up to 10x more likely to fail (0.1/hour) during the two seconds after an engine start. And maybe the reliability is increased 100x when it's off. Again, all of that is hard to communicate to the player. (Maybe you need separate active/inactive reliability scores.)
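
A minimal sketch of that multiplier idea, in Python. The linear ramp below is only a stand-in for a float curve, and the function names and default values (10x peak, two-second window, 100x more reliable when off) are illustrative assumptions taken from the numbers in this post.

```python
import math

def hazard_multiplier(seconds_since_start, engine_running,
                      peak=10.0, window=2.0, idle_factor=0.01):
    """Multiplier applied to the base hazard rate (all defaults illustrative)."""
    if not engine_running:
        return idle_factor  # 100x more reliable while shut down
    if seconds_since_start < window:
        # Linear decay from `peak` at ignition down to 1.0 at the end of the window.
        return peak - (peak - 1.0) * (seconds_since_start / window)
    return 1.0

def step_failure_probability(base_rate_per_hour, multiplier, dt_seconds):
    """Failure probability for one update, given the scaled hazard rate."""
    rate_per_second = base_rate_per_hour * multiplier / 3600.0
    return -math.expm1(-rate_per_second * dt_seconds)

p_ignition = step_failure_probability(0.01, hazard_multiplier(0.0, True), 0.02)
p_steady = step_failure_probability(0.01, hazard_multiplier(30.0, True), 0.02)
p_idle = step_failure_probability(0.01, hazard_multiplier(0.0, False), 0.02)
```

Because the multiplier scales the hazard rate rather than the per-update probability, the result still does not depend on how often the part is checked.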

For inactive vessels, since you'll be using Bayes' theorem you can just sum up the probability that a failure occurred while you were away. And boo on your Kerbals for not telling you their life support was broken, I guess?
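
For the constant-hazard case, the "while you were away" probability has a closed form, so a single roll on return is enough. A sketch in Python with hypothetical function names:

```python
import math
import random

def probability_failed_while_away(mtbf_hours, hours_away):
    """Chance of at least one failure during an absence of hours_away."""
    return -math.expm1(-hours_away / mtbf_hours)

def failed_while_away(mtbf_hours, hours_away, rng):
    """Roll once when the vessel is loaded again."""
    return rng.random() < probability_failed_while_away(mtbf_hours, hours_away)

rng = random.Random(0)  # seeded only to make the sketch repeatable
p_week = probability_failed_while_away(2000.0, 7 * 24)
failed = failed_while_away(2000.0, 7 * 24, rng)
```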

Edited January 6, 2015 by NonWonderDog
terrible spelling

Agathorn · January 6, 2015

How about only checking for a failure, when there is some change in situation that could result in that failure, instead of checking in regular intervals? So for a rocket engine the system checks for a failure every time you start/stop, accelerate/decelerate the engine, its temperature changes a lot (either by use or getting really cold in space), g-forces changing a lot, it running at full throttle for a long time, it moving into a strong magnetic field (Jool, Kerbol), the outside pressure changing, etc. ... A system like that would take away a bit of the randomness of the failures while at the same time making them meaner, as you are more likely to encounter them when you use a part.

This is actually what I am leaning towards.

Ohh and here are a few more failures I thought of (just ignore those that have been mentioned before):
- Engine throttle being locked in place (you can't accelerate/decelerate, nor turn it off)
- Docking port failing to engage (you get it in the moment they normally connect and have to repair it before you can make another attempt)
- Structural failures (basically the part blowing up) for all parts when going really fast in the atmosphere (drag) or pressure getting really high or colliding at a lower speed than the normal crash tolerance
- Everything that uses electric charge using a lot more than normally
- Batteries losing a certain amount of the max charge
- Batteries not being able to recharge
- Fuel tanks having the fuel flow to the engine interrupted (the engine acting as if no fuel is left, even though there is)
- electric circuit being interrupted, so that electric charge can't flow through a certain part until fixed (maybe instead of just electric charge it could happen for every resource)

Thanks!

Ultimately I don't think you're going to be able to avoid the literature on reliability analysis if you want something that feels "right." You at least need to use Bayes' Theorem to set it up so the update rate doesn't change the number of failures per hour.
The wiki page is a decent start: http://en.wikipedia.org/wiki/Failure_rate
This random Powerpoint is okay: http://www.wilsonconsultingservices.net/MTBF_M2.pdf

Thanks for the links, I appreciate the reading.

The simplest way that would give good results would be to report the reliability of each part as MTBF, with a constant hazard rate derived from that (2000 hours MTBF = 0.0005 failures/hour = 1.39e-7 failures/sec). That's an exponential failure density, which is fine. If you want to be fancy you can let parts define one each of exponential, Weibull, lognormal, etc. failure densities and use the greatest hazard rate (which would let you define a bathtub curve for parts), but I don't really know how you'd report that data to the player. Presumably you'd have to average it into the MTBF score somehow.
Plus, you could have it literally say "mean time between failures = 12 seconds" on the starting boosters. That's a lot more fun than "50%."

Isn't that just very unrealistic though? My main problem with an MTBF system is that the mean time would have to be stupidly low for gameplay reasons, and it would just plain feel silly to me. 12 seconds MTBF on a rocket sounds silly. Or is it just me?

Making engines more likely to fail during starts, etc., is a good idea, but it needs to be tied to the overall reliability. I'd recommend you just have a multiplier to the base hazard rate that applies for two seconds or something, maybe as a one-shot floatcurve. So if your engine is 0.01/hour likely to fail at any time, it's up to 10x more likely to fail (0.1/hour) during the two seconds after an engine start. And maybe the reliability is increased 100x when it's off. Again, all of that is hard to communicate to the player. (Maybe you need separate active/inactive reliability scores.)

This is similar to my thinking. For game play reasons the player really has to have *some* indication of the overall reliability of a part, even if it can fluctuate. One thing I am toying with now, and have in my GUI mockups, is a "Resting Reliability" and a "Momentary Reliability".

For inactive vessels, since you'll be using Bayes' theorem you can just sum up the probability that a failure occurred while you were away. And boo on your Kerbals for not telling you their life support was broken, I guess?

True, I could do something like that, but doesn't that seem like it could be pretty harsh? Seems unfair to hit you with a failure that you couldn't do anything about, or even know about.

Edited January 6, 2015 by Agathorn

razark · January 6, 2015

As for earlier missions ending in failures more often, how about having more than one reliability value? Like you got the part value you already use, that goes up relatively fast, then one for the part class (like all Liquid fuel+Oxidizer engines) that takes longer to go up and maybe even a third one for your entire space program that takes really long to change. The chance for a failure would then take them all into account. That way the reliability goes up the longer you play while suffering a penalty to it when trying out untested technologies (new part classes).

I like this idea. It would give the frequent early failures, but drop off as your team gets better at running a space program.

Possibly an increase in reliability (or an increase in the rate that reliability increases) as the R&D facility is upgraded?

Agathorn · January 6, 2015

Ultimately I don't think you're going to be able to avoid the literature on reliability analysis if you want something that feels "right." You at least need to use Bayes' Theorem to set it up so the update rate doesn't change the number of failures per hour.
The wiki page is a decent start: http://en.wikipedia.org/wiki/Failure_rate
This random Powerpoint is okay: http://www.wilsonconsultingservices.net/MTBF_M2.pdf
The simplest way that would give good results would be to report the reliability of each part as MTBF, with a constant hazard rate derived from that (2000 hours MTBF = 0.0005 failures/hour = 1.39e-7 failures/sec). That's an exponential failure density, which is fine. If you want to be fancy you can let parts define one each of exponential, Weibull, lognormal, etc. failure densities and use the greatest hazard rate (which would let you define a bathtub curve for parts), but I don't really know how you'd report that data to the player. Presumably you'd have to average it into the MTBF score somehow.
Plus, you could have it literally say "mean time between failures = 12 seconds" on the starting boosters. That's a lot more fun than "50%."
Making engines more likely to fail during starts, etc., is a good idea, but it needs to be tied to the overall reliability. I'd recommend you just have a multiplier to the base hazard rate that applies for two seconds or something, maybe as a one-shot floatcurve. So if your engine is 0.01/hour likely to fail at any time, it's up to 10x more likely to fail (0.1/hour) during the two seconds after an engine start. And maybe the reliability is increased 100x when it's off. Again, all of that is hard to communicate to the player. (Maybe you need separate active/inactive reliability scores.)
For inactive vessels, since you'll be using Bayes' theorem you can just sum up the probability that a failure occurred while you were away. And boo on your Kerbals for not telling you their life support was broken, I guess?

Would you be willing to help with the math on this? It really isn't my strong suit, and I've been reading up and trying to figure it out, but it's tough.

NonWonderDog · January 6, 2015

Isn't that just very unrealistic though? My main problem with a MTBF system is that the mean time would have to be stupidly low for gameplay reasons and it would just plain feel silly to me. 12 seconds MTBF on a rocket sounds silly. Or is it just me?

It depends on your opinion of Kerbal engineering.

Here are MTBF numbers for the SSME in 1993:

http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19930012456_1993012456.pdf

The MTBF for in-flight shutdown is between 112 and 8.3 (!!) flights, depending on power level. A flight is eight and a half minutes, which works out to MTBF times of roughly 1 to 16 hours. For NASA's premier lifter engine!
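
For anyone checking that conversion, the arithmetic is just flights times minutes per flight. A small Python sketch, with the 8.5-minute flight taken from the figure above:

```python
FLIGHT_MINUTES = 8.5  # duration of one flight, as stated above

def flights_to_hours(mtbf_flights, flight_minutes=FLIGHT_MINUTES):
    """Convert an MTBF expressed in flights into hours of operation."""
    return mtbf_flights * flight_minutes / 60.0

low = flights_to_hours(8.3)    # a little under 1.2 hours
high = flights_to_hours(112)   # a little under 16 hours
```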

It has to vary per part, though. A jet engine has about 100,000 hours MTBF.

I'll see if I can put something together on the math later.

kujuman · January 6, 2015

I'd just like to add that I'm still planning on writing more failure modules once things settle down again. I have a module working where the lf/o ratio gets knocked off by some random amount.

If you'd like, I could put together a guide on making a 3rd party failure module (very simple really) to give you more dev time.

Agathorn · January 7, 2015

I'd just like to add that I'm still planning on writing more failure modules once things settle down again. I have a module working where the lf/o ratio gets knocked off by some random amount.
If you'd like, I could put together a guide on making a 3rd party failure module (very simple really) to give you more dev time.

Very cool and I am glad you found it easy to do. That was certainly my goal! Right now things are definitely a bit rocky, and there will be some major changes to the API coming when I overhaul things for the Push vs Pull setup and the new Reliability and Failure systems (which I plan to type up a proposal for soon).

After those changes, though, if you would like to help with some documentation, that would be exceptional!

NonWonderDog · January 7, 2015

The thing that will make this mod hard, and the reason why I said you'd need an understanding of the literature, is the randomness. If you try to balance it by play, you will simply never get a good result. You have to have a result in mind and the ability to implement it, and then have the willingness to ignore people (or yourself!) who complain that their engines fail every launch or never fail at all. You have to design the reliability of the entire population of RT-10 boosters amongst all players, and trust that individual RT-10 failures follow the pattern.

To that end, you have to understand real failure distributions, and the math needed to model them. It's actually not that bad as long as you keep things simple.

If we take the simplest case, say we have 1000 widgets, and 10% of them fail each day. 100 widgets will fail the first day. There are only 900 left, so 90 widgets will fail the second day. 81 will fail on the third, 73 on the fourth, etc. As you see, the simplest model has an exponential failure distribution. This constant risk of failure -- a constant hazard rate -- is the basic assumption behind the Mean Time Between Failures metric.
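
The widget numbers are easy to reproduce. A short Python sketch (the function name is made up for illustration) that tracks the expected failures per day:

```python
def expected_failures_per_day(population, daily_failure_fraction, days):
    """Expected number of failures on each day under a constant daily risk."""
    failures = []
    surviving = float(population)
    for _ in range(days):
        failed_today = surviving * daily_failure_fraction
        failures.append(failed_today)
        surviving -= failed_today
    return failures

per_day = expected_failures_per_day(1000, 0.10, 4)
# Each day's count is 0.9 times the previous day's: a geometric decay,
# which is the discrete-time version of the exponential distribution.
ratios = [later / earlier for earlier, later in zip(per_day, per_day[1:])]
```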

The hazard rate h(t) is actually somewhat of an abstract concept in order to account for non-constant rates. It's equal to the expected number of failures in a population divided by all the accumulated time of all the items in the population, over an infinitesimal time slice, given that every item in the population is t hours old. Stated in a way that actually makes sense, the probability of an item experiencing its first failure over the next dt hours, given that it has survived to time t, is equal to h(t)*dt as dt approaches zero.

But for an exponential distribution, it's easy. The math works like this:

Hazard rate is constant, so let's call it lambda. MTBF is equal to 1/lambda for an exponential distribution. Hazard rate can be estimated directly from a sample population (the measured failure rate): if there are 10 failures in a sample of 100 devices scheduled to operate for 100 hours each, the hazard rate is 10/(100*100) = 0.001 failures/hour, which is an MTBF of 1000 hours.

The probability of a failure at time t (divided by duration) is f(t) = lambda*e^-(lambda*t). This is the failure density function. In the widget example lambda is roughly 0.1 per day, so about 100 of the 1000 fail on the first day (f(0) = 0.1), 90 on the second (f(1) ≈ 0.09), 81 on the third (f(2) ≈ 0.08), and so on.

The probability that that item will have failed after t hours is the integral of the density function, F(t) = 1 - e^-(lambda*t). This is the failure distribution function. F(infinity) is equal to one.

The probability that an individual item will survive is one minus that, or R(t) = e^-(lambda*t). This is the reliability after t hours.

With our 1000 hour MTBF, the probability that any individual item will survive for 100 hours is:

R(100) = e^(-0.001*100) = 90.5%

The probability that it survives for 1000 hours is:

R(1000) = e^(-0.001*1000) = 36.8% (Yes, 63.2% of our samples have failed after the mean time between failures. Math is weird.)
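
The three functions above fit in a few lines. A Python sketch that reproduces the two reliability figures for the 1000 hour MTBF (function names are just for illustration):

```python
import math

def failure_density(lam, t):
    """f(t) = lambda * e^(-lambda * t)"""
    return lam * math.exp(-lam * t)

def failure_distribution(lam, t):
    """F(t) = 1 - e^(-lambda * t): probability of having failed by time t."""
    return 1.0 - math.exp(-lam * t)

def reliability(lam, t):
    """R(t) = e^(-lambda * t): probability of surviving to time t."""
    return math.exp(-lam * t)

mtbf_hours = 1000.0
lam = 1.0 / mtbf_hours

r_100 = reliability(lam, 100.0)     # about 0.905
r_1000 = reliability(lam, 1000.0)   # about 0.368
```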

And honestly, if you don't go beyond an exponential failure distribution that's all you need. The probability of surviving for the next 1000 hours is 36.8% no matter how many thousands of hours it has survived so far, so you can just use the reliability function and be done with it. Only thing to keep in mind is possible numerical precision issues past 100,000 hours MTBF, and any fiddling you might have to do to get all the bits out of Unity's RNG.

Things get a lot more difficult with a variable hazard rate. If I'm feeling particularly brave I'll try to work out the math for a Weibull distribution, since front-loaded failures would make launches a bit more exciting, but I'll ignore wear-out as probably a bad idea for gameplay.

In the most general case, I think you only need the reliability at time t (when you last checked), the reliability at time t+dt (now), and Bayes' theorem to determine if something should fail. You should be able to foist the reliability calculation itself off on another module. (At least, that's how I remember probability working. I'll have to run through that.) I get incredibly confused when I think about the *second* failure using that method, though...
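
A sketch of that general case in Python, under the assumption that the check is the conditional probability P(fail between t and t+dt | survived to t) = 1 - R(t+dt)/R(t). The reliability function is passed in, so it could come from another module; the Weibull parameters below are arbitrary illustration values, and only the first failure is handled.

```python
import math
import random

def exponential_reliability(t, mtbf):
    return math.exp(-t / mtbf)

def weibull_reliability(t, scale, shape):
    # shape < 1 gives front-loaded (early) failures
    return math.exp(-((t / scale) ** shape))

def conditional_failure_probability(reliability, t, dt):
    """P(fail in (t, t+dt] | survived to t) = 1 - R(t+dt) / R(t)."""
    r_now = reliability(t)
    if r_now <= 0.0:
        return 1.0
    return 1.0 - reliability(t + dt) / r_now

def should_fail(reliability, t, dt, rng):
    """One random draw against the conditional failure probability."""
    return rng.random() < conditional_failure_probability(reliability, t, dt)

def exp_r(t):
    return exponential_reliability(t, mtbf=1000.0)

def weibull_r(t):
    return weibull_reliability(t, scale=100.0, shape=0.5)

# Exponential: the same risk for a new part and a 5000-hour-old part.
p_new = conditional_failure_probability(exp_r, 0.0, 1000.0)
p_old = conditional_failure_probability(exp_r, 5000.0, 1000.0)

# Weibull with shape < 1: much riskier in the first hour than later on.
early = conditional_failure_probability(weibull_r, 0.0, 1.0)
late = conditional_failure_probability(weibull_r, 100.0, 1.0)
```

With the exponential function plugged in this reduces to the constant 63.2% per 1000 hours from above, which is a useful sanity check on the formula.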

Edited January 7, 2015 by NonWonderDog
