
[1.12.4] ScrapYard (SYD) The Common Part Inventory - v2.2.99.0-prerelease `<Project Zelda II>` edition [08 Jan 2023]


zer0Kerbal


On 5/28/2023 at 7:15 PM, Gotmachine said:

As I mentioned, persistentId does indeed change. It needs to, since it is, well, persisted into a craft file, but at the same time must be guaranteed to be unique for every instantiated part. So when KSP loads a craft, the ids that are persisted can already be in use (and often will be, since a craft tends to be loaded multiple times, for example by launching it as a vessel, then loading it again in the editor, or simply launching multiple vessels out of the same craft file).

However, I caught this happening when there's no craft alive in the game. There's a race condition happening somewhere, triggered by something I'm still trying to determine.

Additionally, and this is yet another problem: from KSP 1.4.1 up to some release I haven't cared to check yet, the persistentId was already initialised by the time PartModule.OnAwake()** is called, but at some point (I have only checked KSP 1.12.5 so far) that attribute started to still be 0 at OnAwake, and anyone relying on the previous behaviour would be caught with their pants down.

** NOTE: Not necessarily. Since I'm loading and reloading the damned thing consecutively, there's a chance that the part.persistentId that I'm interpreting as being already initialised is, in truth, uninitialised data from a previous PartModule incarnation that by <insert your non Forum compliant favourite expletive here> "luck" ended up being relocated to a new PartModule by some really, really, really less than ideal memory allocation algorithm. But this is still Unity, with a butchered GC from an old Mono release. I can't say this is impossible.

** NOTE AGAIN: If it were a leftover from a previous incarnation, the first logged occurrence of the part's persistentId in the KSP.log would be zero or a random value. I just confirmed that the first logged occurrence of a persistentId already had the correct value, so this really is the right value read from the craft file in memory - and, so, this value is already known before OnAwake() is called and could have been checked and fixed by KSP at that time. Could, but as it appears, it was not.

 

 

On 5/28/2023 at 7:15 PM, Gotmachine said:

Said otherwise, there is no bug and no "misbehaviour". persistentId changing is by design, and always has been. It's even documented in the official API: https://www.kerbalspaceprogram.com/ksp/api/class_game_events.html#a87bb044eae00585d041c7f2babf47c72

With all due respect, unless further information suggesting otherwise is provided, this sounds like an ugly kludge to me.

See, at least in the initial implementation (probably 1.4.0, but I have only checked 1.4.1 so far), the persistentId was already initialised correctly by PartModule.OnAwake(). Any clash on a unique identifier should already be known before OnAwake(): after all, the code was able to fetch it from the craft file by that point, and so could have verified and fixed any collision with an existing part before calling the PartModule's OnAwake() - done that way, we would not have had to cope with a mutating persistentId at all.

I fail to see the need for this callback, but this ship has already sailed - we need to cope with it as it is right now.

It will be interesting to see how this GameEvent behaves on my Test Bed, where I can trigger the race condition I diagnosed above - once I determine exactly how to reproduce it deterministically, because at this time, besides having a hunch about how to trigger it, it's still a try-and-try-again-until-"success" process.

 

 

On 5/28/2023 at 8:49 PM, magico13 said:

I'm just stopping by to mention how weird it is for something I did over 5 years ago to have butterfly effected like this. I don't remember exactly the reasoning for swapping to persistent id off of the guid that I was using, except perhaps a somewhat ironic desire to use more stock functionality and less custom code. At the time it didn't seem to cause any issues but I'll admit I wasn't always the most careful developer.

At that time it didn't cause any (observable) issues (as long as KSP was behaving). It worked at that time; it just didn't age very well as KSP kept being developed, with further features (and bugs) being introduced.

You did find something in KSP - see this post (and also this one) on my issue tracker. [Kraken knows how many times I fell into similar traps over the years - you are not alone in this]

— — EDIT — — 

Additionally, @magico13, this post essentially exempts you from any wrongdoing. That piece of code is a bit cheesy :P by today's standards, but it was not causing any harm.

The need to initialise the internal structures twice, once in OnStart and again in OnInitialize, may also be seen as redundant, but my experience with KSP taught me that sometimes redundant code is a last-resort attempt to work around an undiagnosed situation. Since I couldn't reproduce such a situation on a clean KSP installation, I'm guessing that you were probably fighting an unhappy interaction with some third-party code - a not so uncommon occurrence even nowadays.

Edited by Lisias
Brute force post merging. And some notes. And a post edit

53 minutes ago, Lisias said:

we would not have had to cope with a mutating persistentId at all.

You want to be able to cope with a persistentId mutation; not being able to would be a design hole in the KSP API.
Let's say you have a "controller" PartModule of some sort that has some functionality where the user should select another part on the same vessel/ship.
This means you need to persist a reference to that other part, which is exactly the use case persistentId was designed for.
So you grab the persistentId of the target part and persist it on the controller module.
Now, when the ship is re-loaded, you want to find the targeted part again, finding it by its persistentId (FlightGlobals has methods for that).

This is where you want to get notified if the id has changed, which is what the onPartPersistentIdChanged event is for.
And you want to get notified of it after the module has been loaded (ie, Load()/OnLoad() called), because otherwise you can't check whether the id that changed matches an id you had saved.
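To make that concrete, here is a minimal sketch of such a controller module (not taken from any existing mod; the class name, the field name, the exact event signature and the uint-KSPField persistence are assumptions to be verified against the game assembly):

public class ModuleTargetTracker : PartModule
{
    // Persisted into the craft/save file alongside this module.
    [KSPField(isPersistant = true)]
    public uint targetPartPersistentId = 0;

    public override void OnStart(StartState state)
    {
        base.OnStart(state);
        GameEvents.onPartPersistentIdChanged.Add(OnPersistentIdChanged);
    }

    private void OnDestroy()
    {
        GameEvents.onPartPersistentIdChanged.Remove(OnPersistentIdChanged);
    }

    // Assumed signature: the event hands over the old and the new id as two uints.
    private void OnPersistentIdChanged(uint oldId, uint newId)
    {
        // If the part we were pointing at got a new id on load, follow it.
        if (targetPartPersistentId == oldId)
            targetPartPersistentId = newId;
    }
}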

Edited by Gotmachine

On 5/28/2023 at 9:32 PM, Gotmachine said:

You want to be able to cope with a persistentId mutation, not being able to would be a design hole in the KSP API.

This is a fait accompli, so I have no other choice but to accept it and keep walking. It doesn't really matter whether it makes sense to me or not - so I'll silently complain about the subject to myself from now on.

 

On 5/28/2023 at 9:32 PM, Gotmachine said:

This is where you want to get notified if the id has changed, which is what the onPartPersistentIdChanged event is for.
And you want to get notified of it after the module has been loaded (ie, Load()/OnLoad() called), because otherwise you can't check whether the id that changed matches an id you had saved.

So this is how things happen in the log now, in an ideal scenario where:

  1. A single craft called "Untitled Space Craft" exists in the VAB folder
    1. Let's focus only on the root part, mk1pod
  2. An instance was previously launched and left on the Launch Pad
  3. Editor is launched, and the previously loaded craft file is loaded automatically
    1. And it's the same one that is on the LaunchPad now
  4. There's a PartModule called KSP-Recall.Test.PersistentId on each Part, whose only reason of existence is to log which phase of its life cycle it is in at the moment. This is how I know what's being called below (a sketch of such a probe module is shown right after the sequence below).

This is the sequence of events I'm getting:

  1. Part mk1pod(Clone) is created, an InstanceID is given, and OnAwake is called. The persistentId is loaded from the craft file on KSP 1.4.1, and it's zero on KSP 1.12.5
  2. Part mk1pod (it was renamed in the process), with the same InstanceID from the last step, has its OnLoad called. The persistentId was changed (as I have the same craft already on the LaunchPad)
  3. Part mk1pod, with the same InstanceID and same persistentId from the last step, has its OnInitialize() called.
  4. "Untitled Space Craft loaded!" is logged
  5. Part mk1pod, with the same InstanceID and same persistentId from the last step, has its OnSave() called.
  6. Part mk1pod, with the same InstanceID and same persistentId from the last step, has its OnStart() called.
  7. "Autogen thumbnail" is logged.
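For reference, the lifecycle probe mentioned in item 4 of the scenario above can be as simple as the sketch below. This is not the actual KSP-Recall code, just an illustration of the idea; the class name is made up, the module would be attached to every part via a ModuleManager patch, and the exact set of overridable callbacks should be double-checked against the PartModule base class:

using UnityEngine;

public class ModuleLifeCycleProbe : PartModule
{
    private void Report(string phase)
    {
        // persistentId is a uint; GetInstanceID() is the Unity object id (shown in hex, as in the logs below).
        Debug.Log(string.Format("[LifeCycleProbe] Part {0}:{1:X} has persistentId={2} at {3}.",
            part.name, part.GetInstanceID(), part.persistentId, phase));
    }

    public override void OnAwake()                 { base.OnAwake();       Report("OnAwake"); }
    public override void OnLoad(ConfigNode node)   { base.OnLoad(node);    Report("OnLoad"); }
    public override void OnInitialize()            { base.OnInitialize();  Report("OnInitialize"); }
    public override void OnSave(ConfigNode node)   { base.OnSave(node);    Report("OnSave"); }
    public override void OnStart(StartState state) { base.OnStart(state);  Report("OnStart"); }
    private void OnDestroy()                       { Report("OnDestroy"); } // plain Unity message, not a PartModule override
}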

Okay, things make sense and it's exactly what you are telling me it is. The reason I'm wasting the Forum's bandwidth with this information is that there are only 4 situations in which the part's persistentId could be being overwritten by ScrapYard:

I'm not arguing with you about unnecessarily writing values back into Part.persistentId being bad. It is, and this will be fixed. Period.

However - and this is the point where I think ScrapYard is more of a messenger than a perpetrator (at best, an involuntary accomplice) - of those 4 situations above, two just write back a value that is already there (the OnStart and OnInitialize ones), so no harm can possibly be done, and a third is not called unless the user clicks on something inside ScrapYard's Window, and none of us had done that in our tests.

The remaining option is the MakeFresh() method, which would generate a new persistentId for the current part that should be remembered, and so should really be written into the current Part's persistentId if it's not there yet (or at least that's what I understand from this code). It's worth mentioning that FlightGlobals.CheckPartpersistentId calls GameEvents.onPartPersistentIdChanged if a new persistentId is issued (and it probably always is, as the persistentId argument given is 0).

This may be a mistake, or perhaps not. But the real problem I'm dealing with is that the [FlowGraph] LogSpam was not happening earlier this month, when I spent a whole week bashing my sorry SAS on the matter - except for a couple of times when I was using a savegame provided by a user that reported the problem. More interesting yet, once I quit to the Main Menu and reloaded it (i.e., had written the SFS file and then loaded it again), the problem ceased to happen - I didn't even restart KSP. And after this, I just couldn't reproduce the LogSpam no matter what (using that savegame or new ones).

And the user had KSPCF installed, while I had not.

This last Friday was the first time ever I could easily reproduce the LogSpam, and using a savegame that I had just created on my rig, which never had KSPCF installed. So, yeah, as of last Friday I could rule KSPCF out of this equation, but not before.

But we still have this weird misbehaviour in the wild, and now I have concrete evidence that there's something that needs to happen and nowadays most of the time doesn't (or something that shouldn't happen, but nowadays constantly does) that suddenly makes the LogSpam cease on the very same rig and savegame where the damned thing was happening half an hour before.

All we can do now is suppress the LogSpam on ScrapYard, but whatever is happening under the bonnet will still keep happening - it just will no longer be logged in the KSP.log by something triggered by ScrapYard.

And I want to diagnose it, because it may explain some other pretty weird happenings I'm getting sometimes (like some PartModules giving me the finger after a Revert to Launch).

Edited by Lisias
tyops as usulla…

BREAKING NEWS.

I just got evidence that the problem is not, and never was, the persistentId. What we have is a collision on a thingy called cid (the CraftId) on the part.

First, I determined that the cid thingy is not changing with the persistentId:

[LOG 18:41:57.873] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod(Clone):FFFC2988 has persistentId=91659911 at OnAwake. The cid is 2480147.
[LOG 18:41:57.875] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC2988 has persistentId=91659911 at OnLoad. The cid is 2480147.
[LOG 18:41:57.884] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC2988 has persistentId=91659911 at OnInitialize. The cid is 2480147.
[LOG 18:41:57.885] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC2988 has persistentId=91659911 at OnSave. The cid is 2480147.
[LOG 18:41:57.886] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC2988 has persistentId=91659911 at OnStart. The cid is 2480147.
[LOG 18:42:45.857] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod(Clone):FFFC10FC has persistentId=2366749665 at OnAwake. The cid is 2480147.
[LOG 18:42:45.861] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC10FC has persistentId=1795537535 at OnLoad. The cid is 2480147.
[LOG 18:42:45.871] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC10FC has persistentId=1795537535 at OnInitialize. The cid is 2480147.
[LOG 18:42:45.871] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC10FC has persistentId=1795537535 at OnSave. The cid is 2480147.
[LOG 18:42:45.874] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC10FC has persistentId=1795537535 at OnStart. The cid is 2480147.
[LOG 18:42:45.876] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFC2988 has persistentId=91659911 at OnDestroy. The cid is 2480147.

And later I realised that the FlowGraph's LogSpam is complaining about the cid, not the persistentId!!

[LOG 01:54:36.612] [UIMasterController]: ShowUI
[ERR 01:54:36.714] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294401494
[ERR 01:54:36.714] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294401494

--- Craft File
 PART
    {
            part = mk1pod.v2_4294401494      <--- here! The cid is the number after the underscore!
            partName = Part
            persistentId = 1768470123
            
--- SFS file
PART
    {
    	name = mk1pod.v2
    	cid = 4294401494                     <--- here!
    	uid = 3987364613
    	mid = 1624790515
    	persistentId = 1935682175
    	launchID = 1

I've been chasing ghosts until now; there's nothing wrong with the persistentId handling by ScrapYard, as it appears.

Unfortunately, the ClassID is not properly documented in the KSP API. Details on KSP-Recall Issue 67.

And, again, there's still that test session I have documented on Issue 67 where the LogSpam suddenly stopped and didn't come back until rebooting the game - not to mention the whole week in late April/early May where I was trying to reproduce the damned thing and just could not!

Edited by Lisias
Corrections. I misunderstood ClassID with CraftID.

14 hours ago, Lisias said:

And later I realised that the FlowGraph's LogSpam is complaining about the cid, not the persistentId!!

The error message is indeed logging cid/craftId (not ClassID, that's something entirely different), but that is just to give a way to find the offending part back from the log entry.
The messages come from the SCC graph, which actually doesn't care about any id; it holds direct part references and detects the issue by comparing those, way after the problem actually occurred.
The underlying cause of duplicate entries in the SCC graph lies in the DeltaVStageInfo class, which uses persistentId to build the part lists.

Edited by Gotmachine

15 hours ago, Gotmachine said:

The error message is indeed logging cid/craftId (not ClassID, that's something entirely different), but that is just to give a way to find back the offending part from the log entry.

Yep, point taken. ClassID & CraftId (cid) are different values in my savegame, and after fixing the code, things match now.

 

15 hours ago, Gotmachine said:

The messages come from the SCC graph, which actually doesn't care about any id; it holds direct part references and detects the issue by comparing those, way after the problem actually occurred.

SCC would be Strongly Connected Components? 

Anyway, I fixed the test bed and came up with this after loading a craft file twice in a row (with another already lingering on the LaunchPad).

[LOG 15:46:51.738] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod(Clone):FFFCCE8E has persistentId=438584920 at OnAwake. The ClassID is 2480147 and the CraftId (cid) is 4294758030.
[LOG 15:46:51.742] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCE8E has persistentId=91659911 at OnLoad. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:46:51.762] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCE8E has persistentId=91659911 at OnInitialize. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:46:51.768] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCE8E has persistentId=91659911 at OnSave. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:46:51.791] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCE8E has persistentId=91659911 at OnStart. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:56.513] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod(Clone):FFFCCA78 has persistentId=438584920 at OnAwake. The ClassID is 2480147 and the CraftId (cid) is 4294756984.
[LOG 15:53:56.516] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnLoad. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:56.524] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnInitialize. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:56.525] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnSave. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:56.527] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnStart. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:56.529] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCE8E has persistentId=91659911 at OnDestroy. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:59.005] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnSave. The ClassID is 2480147 and the CraftId (cid) is 4294757984.
[LOG 15:53:59.388] [KSP-Recall.Test.PersistentId] INFO: Part mk1pod:FFFCCA78 has persistentId=631267469 at OnDestroy. The ClassID is 2480147 and the CraftId (cid) is 4294757984.

We have two incarnations of the mk1pod (from that omnipresent Untitled Space Craft file): the one with InstanceID FFFCCE8E and the one with FFFCCA78. They never shared a persistentId, but the ClassID and the CraftID are the same for both of them on this test run.

One interesting thing I noted is that, for a brief moment, from [LOG 15:53:56.513] to [LOG 15:53:56.529], both parts are alive in memory - with FFFCCE8E being destroyed only after FFFCCA78 has been started.

If these parts are used to populate and depopulate some internal data structures, we have a situation in which the new part's OnAwake, OnLoad, OnInitialize and OnStart have to live with a dying previous incarnation still in memory and, so, still in those data structures. If for any reason the new Part ends up sharing the same persistentId as the previous incarnation, we have a window of opportunity for the LogSpam to be triggered. Worse, the new part will probably not be registered in those data structures and be left orphaned - and this opens a whole new can of worms inside the game.

So, yeah, I think we may have identified where the problem may be happening - as long as someone screws up the persistentId.

Problem: until now, none of my logs even remotely suggests that ScrapYard is changing the persistentId of the part at any point before the LogSpam happens! I had instrumented the suspect code to yell when it would change the current this.part.persistentId to a new value, and that code remained silent for the whole ordeal - and, again, I recently reached some mysterious event (and I don't have the slightest idea what it is yet) in which the LogSpam just doesn't happen, sometimes in the same test session where the damned thing was happening half an hour [5 minutes!!!] ago. Not to mention early May, when the problem just didn't happen no matter what I did!!!

My (current) conclusion is that, until new information is gathered somehow, ScrapYard is not directly involved in this persistentId mess (at least, not until the first batch of LogSpam happens). Assuming, of course, that the problem is really related to the persistentId and not something else.

The LogSpam mentioning the cid is, well, unfortunate, because this test session just proved that a rogue Part can share the same cid with a genuine one - so we need to figure out by inference which part is screwing things up in the SCC graph. Logging the InstanceID would be better IMHO.

Now, with a way better understanding of the current mechanics, my next task will be to redo the same tests with the instrumented ScrapYard installed too, and then redo the same analysis and see what changes - while hoping the LogSpam doesn't vanish again, of course.

 

15 hours ago, Gotmachine said:

The underlying cause of duplicate entries in the SCC graph lies in the DeltaVStageInfo class, which uses persistentId to build the part lists.

So we have a definitive source for the trigger of this problem. Well, this helps to narrow down possible targets - the persistentId needs to be involved in the mess somehow.

 

— — POST EDIT — — 

It happened again. The LogSpam happened ONCE, and then just vanished from the log.

Forum rules prevent me from further expressing my thoughts at this moment.

I'm documenting this test session as Forum-friendly as I can at this moment at https://github.com/net-lisias-ksp/KSP-Recall/issues/67

Edited by Lisias
POST EDIT

Just to confirm, the Unity+KSP mod environment doesn't provide functionality to break on object variable write? I know that if this were native code, I'd be reaching for that right now, and I believe that .NET under Windows provides such support via WinDbg, but Unity is running its own VM so WinDbg can't access the C# level of things.

I still have my isolated install which is getting a reliable repro, but due to neither having a debugger capable of attaching at the C# level nor being familiar with KSP's internal structure I unfortunately can't help chip away at finding the underlying culprit, beyond saying it's something between KSP and SYD.

 

EDIT: Or wait, can't Harmony decorate KSP's methods that write the ID? Barring a debugger, I'd probably try using that to log call stacks at write, and see if anything curious shows up.

Edited by TruePikachu

6 hours ago, TruePikachu said:

Just to confirm, the Unity+KSP mod environment doesn't provide functionality to break on object variable write? I know that if this were native code, I'd be reaching for that right now, and I believe that .NET under Windows provides such support via WinDbg, but Unity is running its own VM so WinDbg can't access the C# level of things.

Yes, it's possible - but it's pretty tricky to do so in a multithreaded environment. You set a breakpoint on a thread, then you break the chain of events in which the problem happens.

I'm seriously thinking that the "vanishing LogSpam effect" is happening because I'm injecting a lot of debugging messages into the KSP.log, breaking the "timing" needed for the bug to be triggered.

 

6 hours ago, TruePikachu said:

I still have my isolated install which is getting a reliable repro, but due to neither having a debugger capable of attaching at the C# level nor being familiar with KSP's internal structure I unfortunately can't help chip away at finding the underlying culprit, beyond saying it's something between KSP and SYD.

Reading the logs with the extra messages, and comparing with previous logs with fewer messages, my current best guess is some kind of race condition somewhere - perhaps between SYD and KSP, perhaps inside KSP itself, with SYD only making it more prone to happen.

Interestingly enough, the LogSpam ceased to happen exactly on KSP 1.12.5, where the persistentId is still 0 at OnAwake and where I later flagged SYD writing a different value to persistentId (finally!!) - something that is not happening on KSP 1.4.1, where the persistentId was already initialised with the correct value at OnAwake but the LogSpam is still happening. [edit: I can't affirm that anymore. I may have screwed up this specific test due to biasing]

Since SYD does the write-back stunt on OnInitialize and also on OnStart, by which time the persistentId usually should be correctly initialised, for this to happen the other thread (or coroutine - it can also be a coroutine) appears to be getting delayed somehow (perhaps by the huge amount of log messages I'm shoving into KSP.log?) to a point in which the value is not yet initialised at SYD's OnInitialize.

Yeah… Apparently that code that writes back the persistentId is that mysterious event that should happen (or a sign that something else happened or ceased to happen) in order to suppress the LogSpam on my 1.12.5 rig (and not vice versa).

The current difference between my KSP 1.4.1 and 1.12.5 rigs is the huge amount of logging that 1.12.5 is configured to emit (as a lot of KSP-Recall modules are active on 1.12.x, but only one or two on 1.4.1), and I'm betting this was one of the reasons I wasn't able to reproduce the problem earlier this month: an obscene amount of logging that I reactivated in the last test session (and that probably was active early in the month) - I really should have preserved those logs, but since the problem was not happening in them, it didn't occur to me.

 

6 hours ago, TruePikachu said:

EDIT: Or wait, can't Harmony decorate KSP's methods that write the ID? Barring a debugger, I'd probably try using that to log call stacks at write, and see if anything curious shows up.

Code injection, the Rational Purify way! :)

Yes, it helps, and it helps a lot - as long as you are not handling race conditions or unsynchronised code, because the decorators will also inject new latencies, and this will change the behaviour of the test subject.

But it's still worth a shot, because you don't need to inject decorators everywhere, only at a very few key points, so there's a chance it will not screw up the test. You will not be able to prove the problem is not there (due to the latencies injected by the new code), but with luck you may prove the problem is there, if the latencies are harmless enough and the problem is triggered.

It's not too different from what I'm doing by brute force on SYD (see my last post on issue 67), but since we can't do the same on KSP itself, it's a nice idea. I would like to instrument the FlowGraph LogSpam to give more details about the part that is already there and the part that is being rejected as a duplicate. This information would help to pinpoint where in the part's life cycle this is happening, shrinking the probable suspects to a more manageable number.
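For what it's worth, a call-stack tracer along the lines suggested could look roughly like this. It's a sketch only: the Harmony id is made up, and FlightGlobals.CheckPartpersistentId is used as the patch target purely because it was mentioned above - the real method name and overloads would need to be checked against the game assembly before patching:

using System;
using HarmonyLib;
using UnityEngine;

[KSPAddon(KSPAddon.Startup.Instantly, true)]
public class PersistentIdWriteTracer : MonoBehaviour
{
    private void Start()
    {
        // Apply every [HarmonyPatch] declared in this assembly.
        new Harmony("local.diagnostics.persistentid-tracer").PatchAll();
    }
}

// Target method name is an assumption taken from the discussion above; verify it before use.
[HarmonyPatch(typeof(FlightGlobals), "CheckPartpersistentId")]
internal static class CheckPartPersistentId_Tracer
{
    private static void Postfix()
    {
        // Dump the managed call stack so KSP.log shows who asked for the id to be (re)issued.
        Debug.Log("[IdTracer] persistentId check/assign called from:\n" + Environment.StackTrace);
    }
}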

Edited by Lisias
Better phrasing

11 hours ago, TruePikachu said:

Just to confirm, the Unity+KSP mod environment doesn't provide functionality to break on object variable write?

It does. You don't need to resort to native debuggers; it's possible to debug plugin managed code from within Visual Studio. Debugging the managed code from KSP itself is a bit more involved, but can be done from within VS with a bit of hacking around, or more easily with dnSpy, which has Unity debugging capabilities.
I can give you a quick rundown in PM if you want.

6 hours ago, Lisias said:

it's pretty tricky to do so on a multithread environment.

There is no multithreading going on in the KSP codebase, with the single exception of the integrator used for the maneuver planner added in 1.12.
While the Unity native side is multithreaded, the managed API isn't thread-safe and Unity actively prevents any call to it from any thread other than the main thread.
The Unity game loop is strictly single-threaded, so there is absolutely zero chance of things like race conditions happening unless a plugin explicitly implements some threaded code.


KSP supports coroutines, though, which while not strictly multithreading still require care to be taken in some cases. If memory serves, the vanilla ∆V calculation takes a bit of time to process, which implies it's being scheduled as a coroutine and awaited upon by the UI code to update the readout (or something similar).

My current suspicion is that a data structure that the vanilla ∆V calculation traverses is being modified after the calculation coroutine starts execution but before it finishes, resulting in it reaching some nodes multiple times and then complaining about it. Normally, this can happen when editing the ship, but that seems to be smart enough to cancel the coroutine.

It shouldn't be a stretch to say that differences in execution time will affect the actual ordering of coroutine execution, especially considering that coroutines that have yielded might even have the ability to work multiple times per frame, which would imply that a faster CPU could be less likely to hit the issue and a more complex craft (or slower coroutine because of decorations) could widen the window of opportunity.


A coroutine is just an extra execution point in the game loop, and the execution is just as deterministic as MB messages like Update or FixedUpdate.
There are a few corner cases that can indeed result in a non-predictable order of execution, like the fixed loop frequency versus the main loop frequency, or a coroutine set to yield based on time, but I'm quite confident that no such thing is involved in the interactions between ScrapYard and the delta-v stuff.
What happens is simply that on vessel/ship load, the stock delta-v code creates a cached data structure referencing the vessel/ship parts. That cache is then used later when the delta-v calcs are run, but by that time, in some specific cases, ScrapYard has changed the parts' persistentId. This results in the parts not being found in the cache, and consequently being added again. The log error happens much further down the execution stack, thanks to a safeguard detecting those duplicates.

I haven't done a ton of tests, but the errors have been perfectly reproducible and deterministic when I checked.
It won't always happen, simply because ScrapYard doesn't always change the parts' persistentId. For example, it will happen if you launch from the VAB, but not if you "revert to launch", because the "vessel rollout" event ScrapYard is listening to isn't fired in that case.
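Purely as an illustration of that failure mode (this is not the actual DeltaVStageInfo code, and the class and method names are made up), a cache keyed by persistentId breaks in exactly this way when someone rewrites the id after the cache was built:

using System.Collections.Generic;

public class PartDeltaVCacheSketch
{
    private readonly Dictionary<uint, Part> byPersistentId = new Dictionary<uint, Part>();

    // Built once, at vessel/ship load time: a snapshot of the ids as they were then.
    public void BuildCache(IEnumerable<Part> parts)
    {
        byPersistentId.Clear();
        foreach (Part p in parts)
            byPersistentId[p.persistentId] = p;
    }

    // Called later, when the delta-v calcs run. If a mod rewrote p.persistentId in the
    // meantime, the lookup misses and the very same Part gets registered a second time -
    // the duplicate that the FlowGraph safeguard eventually complains about.
    public Part Resolve(Part p)
    {
        Part cached;
        if (byPersistentId.TryGetValue(p.persistentId, out cached))
            return cached;

        byPersistentId[p.persistentId] = p;
        return p;
    }
}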


18 minutes ago, Gotmachine said:

What happens is simply that on vessel/ship load, the stock delta-v code creates a cached data structure referencing the vessel/ship parts.

My reproduction steps don't even need to touch flight mode, FWIW; I can repro from strictly inside the editor, though the current repro steps use SYD's part list. I don't know if this changes your analysis, but I felt I should mention it just in case you're running under the assumption it's something that only happens when the vessel is loaded into flight mode.

18 minutes ago, Gotmachine said:

I haven't done a ton of tests, but the errors have been perfectly reproducible and deterministic when I checked.

I've had this same personal experience, but clearly something not strictly deterministic is going on if some people can repro it but others can't using the same steps and basically the same code in play (we'll assume it's not a bug introduced by a library external to KSP's distribution)

18 minutes ago, Gotmachine said:

That cache is then used later when the delta-v calcs are run, but by that time, in some specific cases, ScrapYard has changed the parts' persistentId.

Assuming this is the problem, shouldn't SYD just wipe the cache when it changes the ID?

Edited by TruePikachu

32 minutes ago, TruePikachu said:

Assuming this is the problem, shouldn't SYD just wipe the cache when it changes the ID?

It shouldn't change the ID. This is a ScrapYard oversight: it is hijacking a stock variable for its own needs, and I am very doubtful there is any valid reason to do that.
As I mentioned, that ID is relied on by various pieces of the stock codebase, not only by the delta-v calcs, and by mods as well.
And while there is a mechanism that can cause it to change in the stock implementation, this is a narrow/early event which is largely ignored both in stock code and in mods using it (because it doesn't affect them), so even if ScrapYard was actually calling the id-changed event, this would still cause the same issues.

Edited by Gotmachine

8 hours ago, Gotmachine said:

A coroutine is just an extra execution point in the game loop, and the execution is just as deterministic as MB messages like Update or FixedUpdate.

A coroutine, from the running code's point of view, is a kind of concurrency. If you have a bunch of coroutines dealing with the same data structures, you will see the need for some kind of synchronisation all the same.

And you are still in a real multitasking environment: your running thread will be scheduled out regularly no matter what you do, and since you just don't have control over the scheduling, you can't guarantee that whatever fickle equilibrium you manage to reach now will keep working as time goes by, on new processors and operating systems with different scheduling algorithms.

 

7 hours ago, Gotmachine said:

It shouldn't change the ID. This is a ScrapYard oversight, it is hijacking a stock variable for its own needs, and I am very doubtful there is any valid reason to do that.

No. ScrapYard is trying hard to cope with how KSP works; it's trying to blend in.

It creates a new persistentId only when creating new parts for itself, code that we haven't even touched yet, as the code involved in the mess is the one in OnInitialize and OnStart. I find the need to do this twice in a row very interesting; it looks like the Author was trying to fight a race condition, by the way: sometimes the data wasn't there yet at OnInitialize, so he tried OnStart and it works for some cases but not for others, then he managed to pull off a kludge that worked at that time and now, obviously, is not working consistently anymore - I want to stress that I reached a situation in which SYD writing back the known persistentId coerced KSP into stopping the LogSpam, which may suggest that, hell, SYD is fixing something.

Again, it's interesting that in my last test session the LogSpam ceased once I caught ScrapYard writing back the persistentId. The LogSpam was being caused (in that test session) when the write-back was not happening in the same test run. So, at least in my last test session, the weird situation that suppresses the LogSpam is also the one that triggers SYD into writing back the data.

Further investigation is needed, however. I just learnt how to stop the LogSpam this week. Now I need to trigger it again in the same test session to see if the M.O. is consistent.

 

8 hours ago, TruePikachu said:

My reproduction steps don't even need to touch flight mode,

I managed to stop the LogSpam by having an identical craft on the Launch Pad, and then going back to the Editor and reloading the craft (even if it was loaded automatically).

In the logging I did, when SYD does not write back the persistentId, I get the LogSpam. When SYD does the write-back, the LogSpam doesn't happen!

This evidence suggests that the write-back has a reason to be. I'm not saying this is the best workaround, but it is a working workaround under certain circumstances. Further investigation is necessary.

 

8 hours ago, Gotmachine said:

I haven't done a ton of tests, but the errors have been perfectly reproducible and deterministic when I checked.

I did. Way more thoroughly than yours, by the way.

I managed to suppress the LogSpam, and the suppression is happening exactly when SYD writes back the persistentId.

I suggest you analyse my tests and take my results into consideration.

Edited by Lisias
Hit Save too soon.

7 hours ago, Lisias said:

If you have a bunch of coroutines dealing with the same data structures, you will see the need for some kind of synchronisation all the same.

No. You don't need any kind of synchronization on data access. There is no concurrent access to anything.
Everything runs from a single thread and all calls are sequential, no matter whether you use a coroutine or a MonoBehaviour message. If you have a coroutine with "yield return null", it will yield sequentially on the Update loop, like any MonoBehaviour.Update() method. If you have a coroutine with "yield return new WaitForFixedUpdate()", it will yield sequentially on the FixedUpdate loop, like any MonoBehaviour.FixedUpdate() method.

7 hours ago, Lisias said:

And you are still in a real multitasking environment: your running thread will be scheduled out regularly no matter what you do, and since you just don't have control over the scheduling, you can't guarantee that whatever fickle equilibrium you manage to reach now will keep working as time goes by, on new processors and operating systems with different scheduling algorithms.

The only thing that is affected by performance characteristics is how many FixedUpdate loops happen within an Update loop. This is indeed something one might have to consider if accessing the same data from both the FixedUpdate and Update loops, and it has nothing to do with thread scheduling, but simply with the fact that the FixedUpdate loop is called a varying number of times (including not at all) from within the Update loop, in order to be called roughly 50 times per second, whereas the Update loop call frequency is directly tied to the framerate.
While this is something to keep in mind, anything wrongly relying on a 1:1 ratio between FixedUpdate and Update calls is pretty much guaranteed to break, especially in KSP where the load on the FixedUpdate and Update loops has huge variability.
There are some other shenanigans with that in KSP; for example, PartModule.OnStartFinished() is called on the third Update loop after the part instantiation, and consequently an undetermined number of FixedUpdate() calls will have happened. There are a few other cases like that, and they do indeed impose some extra care in implementations, but these are just order-of-execution considerations which are almost entirely limited to initialization sequences - nothing in common with what one has to deal with in a concurrent/multitasking environment.

I suggest you take a careful look at the Unity documentation for all that: https://docs.unity3d.com/2019.4/Documentation/Manual/ExecutionOrder.html
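A minimal Unity sketch of the point being made here - everything below runs on the main thread, one call at a time; the coroutine simply gets its own slot in the Update or FixedUpdate phase depending on what it yields:

using System.Collections;
using UnityEngine;

public class LoopOrderDemo : MonoBehaviour
{
    private void Start()
    {
        StartCoroutine(Worker());
    }

    private IEnumerator Worker()
    {
        while (true)
        {
            // Resumes once per frame, in the Update phase, never concurrently with any Update().
            yield return null;
            Debug.Log("coroutine step (Update phase)");

            // Resumes in the FixedUpdate (physics) phase, which may run 0..n times per frame.
            yield return new WaitForFixedUpdate();
            Debug.Log("coroutine step (FixedUpdate phase)");
        }
    }

    private void Update()      { Debug.Log("Update"); }
    private void FixedUpdate() { Debug.Log("FixedUpdate"); }
}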

7 hours ago, Lisias said:

I suggest you analyse my tests and take my results into consideration.

Yep, it's entirely possible that some consequences of ScrapYard altering the persistentId aren't consistently visible, depending on order-of-execution shenanigans.
But I don't see why this matters at all. The root cause of those issues is ScrapYard overwriting the parts' persistentId - it seems we are all finally agreeing on that - so I don't see the point of losing time analyzing the failure modes instead of just fixing the root cause.
ScrapYard didn't rely at all on persistentId prior to the commit I linked. I can see why it would benefit from reading that id, but I don't see any valid reason to overwrite it.


39 minutes ago, Gotmachine said:

But I don't see why this matters at all. The root cause of those issues is ScrapYard overwriting the parts' persistentId - it seems we are all finally agreeing on that - so I don't see the point of losing time analyzing the failure modes instead of just fixing the root cause.
ScrapYard didn't rely at all on persistentId prior to the commit I linked. I can see why it would benefit from reading that id, but I don't see any valid reason to overwrite it.

It does matter.

What you are failing to understand is that the LogSpam is happening when the SYD write-back is harmless (i.e., when the value being written is already there, so there's no change).

The really interesting part is that when SYD writes back a value that ends up overwriting something different, the LogSpam does not happen. Essentially, given the current configuration of my test rig, the thing you are saying should not be there is, in essence, "fixing" the problem.

Now, it's beyond doubt that SYD is triggering something in KSP, and it may even be SYD's fault somehow - but the code you are pinpointing as the problem is, until this moment, harmless at worst (writing back the same value has literally no practical effect), or effectively preventing the problem at best.

Whatever SYD is triggering inside KSP is not related to the persistentId being (re)written by it. We need to look for the problem somewhere else.

 

Edited by Lisias
tyop! Surprised?

2 hours ago, Lisias said:

Whatever SYD is triggering inside KSP is not related to the persistentId being (re)written by it. We need to look for the problem somewhere else.

Okay. Do a simple thing: comment out this line: https://github.com/zer0Kerbal/ScrapYard/blob/2fdfcc0c0bdc72c8e777d34356ccb574b0b858d0/source/ScrapYard/Modules/ModuleSYPartTracker.cs#L31
Then recompile ScrapYard and use it; good luck trying to get the flowgraph error to happen now.


6 hours ago, Gotmachine said:

No. You don't need any kind of synchronization on data access. There is no concurrent access to anything.

Just to clarify this a little bit: while you don't need standard synchronization primitives to control objects shared between coroutines (assuming this is a standard single-thread implementation), there's still a requirement for synchronization at a higher level under specific circumstances. While you only have one task interacting with a data structure at a time, and (in the context of user code execution) task execution can only be interrupted at certain points in time, certain actions performed in one context can still break an invariant or other assumption imposed by what the other is currently doing. Using C++ as an example (mainly because I know many standard library requirements offhand), an `std::set<>::iterator` remains valid only while the original `std::set<>` is unmodified. If a task is iterating through the particular object and yields every, say, 200 iterations, but during a yield another task modifies the object, the iterator is no longer considered valid and access is undefined behavior.
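The same hazard expressed in C#/Unity terms (a hypothetical example, not KSP code): a coroutine that yields mid-iteration can have its enumerator invalidated if some other callback modifies the collection during the yield - no threads involved:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class YieldingScan : MonoBehaviour
{
    public readonly List<int> items = new List<int>();

    private IEnumerator ScanSlowly()
    {
        int i = 0;
        foreach (int item in items)
        {
            Debug.Log("visiting " + item);
            // If anything Add()s to or Remove()s from 'items' while we are suspended here,
            // the foreach enumerator throws InvalidOperationException on the next step.
            if (++i % 200 == 0)
                yield return null;
        }
    }

    private void Start()
    {
        StartCoroutine(ScanSlowly());
    }
}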

14 hours ago, Lisias said:

And you are still in a real multitasking environment: your running thread will be scheduled out regularly no matter what you do

This is irrelevant (except in matters of execution timing) unless you have an external process or something that's modifying your program structures behind your back; all bets are off if that's the case.

21 hours ago, Gotmachine said:

It shouldn't change the ID. This is a ScrapYard oversight, it is hijacking a stock variable for its own needs, and I am very doubtful there is any valid reason to do that.

I agree with this assessment; while SYD's changing of the ID to something already recorded seems well-intentioned (especially considering that mods that affect parts in such a way as to give them a sort of history might be using a non-PartModule mapping keyed by ID to maintain some state), actually doing the change "might" (read: if I wrote KSP, it probably would) break a class invariant. It's probably enough that the ID remains strictly constant for a part's singular journey through flight mode, without needing it to still be the same value once the part has been recovered and built into a new craft.

14 hours ago, Lisias said:

I managed to stop the LogSpam by having an identical craft on the Launch Pad, and then going back to the Editor and reloading the craft (even if it was loaded automatically).

Under my testing procedure, at least, log spam under the editor is triggered by instantiation of a part in SYD's part inventory. You made sure you didn't run out of stored parts, right?

Edited by TruePikachu

11 hours ago, TruePikachu said:

This is irrelevant (except in matters of execution timing) unless you have an external process or something that's modifying your program structures behind your back; all bets are off if that's the case.

Or some of your coroutines are yielding, waiting for some external event now and then, completely screwing up the order of events that you assumed while developing the code, where such events weren't active for some reason.

Unity makes heavy use of spinlocks while waiting on some events, by the way, to the point it's impairing overall performance - see the MONO_THREADS_PER_CPU=1 stunt… which is something I should have checked before, by the way. I will work on it.

Having initialization code in both OnInitialize and OnStart seriously suggests that the original author didn't manage to accomplish what he intended deterministically by using only one of them, which, again, suggests something of the sort was going on.

Also, I didn't manage to suppress the LogSpam on a test bed with way less logging happening in the KSP.log (as well as fewer KSP-Recall PartModules active). Now I need to redo the tests using only KSP 1.12.5, mimicking what 1.4.1 was running, to rule out some influence from KSP 1.4.1 internals and to determine whether the environment is being mangled by the logging or by the presence of additional PartModules on the affected parts - or whether it's just another ghost that I ended up chasing.

 

15 hours ago, Gotmachine said:

Okay. Do a simple thing, comment out that line : https://github.com/zer0Kerbal/ScrapYard/blob/2fdfcc0c0bdc72c8e777d34356ccb574b0b858d0/source/ScrapYard/Modules/ModuleSYPartTracker.cs#L31
Then recompile ScrapYard and use it, good luck trying to get the flowgraph error to happen now.

I did better. I'm logging in the KSP.log both when the value being written is the same as the current one and when it's different.
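// Context: this excerpt sits inside a property setter of the instrumented ScrapYard module,
// so `value` below is the incoming persistentId being assigned.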

if (value != this.part.persistentId)
    Logging.Log(
        string.Format("Part {0}:{1:X} has persistentId={2} and it's being changed to {3}. The ClassID is {4} and the CraftId (cid) is {5}."
            , this.name, this.part.GetInstanceID(), this.part.persistentId, value, this.part.ClassID, this.part.craftID
        )
        , Logging.LogType.WARNING
    );
else
    Logging.Log(
        string.Format("Part {0}:{1:X} has persistentId={2} and the value ({3}) is being written back. The ClassID is {4} and the CraftId (cid) is {5}."
            , this.name, this.part.GetInstanceID(), this.part.persistentId, value, this.part.ClassID, this.part.craftID
        )
        , Logging.LogType.WARNING
    );

part.persistentId = value;

If you are diagnosing a problem, checking only your hypothesis is… well, like being chased by Wisdom but managing to run faster.


(Users are not playing the game in your rig, you know?)

 

11 hours ago, TruePikachu said:

Under my testing procedure, at least, log spam under the editor is triggered by instantiation of a part in SYD's part inventory.

I'm not even looking at SYD yet; I'm trying to understand the persistentId behaviour, because even if SYD rolls back to the previous custom GUID system, it still needs to cope with how KSP works internally after the First Great Breakage at 1.4.0.

Right now, what I have on my logs is:

[WRN 17:42:19.236] [Part]: Cannot have ModuleCargoPart and ModuleInventoryPart on same Part [mk1pod.v2'. Removed ModuleCargoPart
[WRN 17:47:05.766] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and the value (1590504603) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:07.416] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and the value (1590504603) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:21.382] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and the value (1590504603) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:23.471] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and it's being changed to 3513538705. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:24.219] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=3513538705 and the value (3513538705) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[WRN 17:47:38.707] [ScrapYard] Part mk1pod.v2:FFF7EBB6 has persistentId=1590504603 and the value (1590504603) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:38.790] [ScrapYard] Part mk1pod.v2:FFF7EBB6 has persistentId=1590504603 and the value (1590504603) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:49.979] [ScrapYard] Part mk1pod.v2:FFF7ABE0 has persistentId=4043188831 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:49.997] [ScrapYard] Part mk1pod.v2:FFF7ABE0 has persistentId=1808313221 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:48:41.567] [ScrapYard] Part mk1pod.v2:FFF7A77E has persistentId=33424378 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:48:41.580] [ScrapYard] Part mk1pod.v2:FFF7A77E has persistentId=151739214 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.

You see, the persistentId is being both changed and left unchanged before the LogSpam, and also after it. This log is an excerpt of a test run in which:

  1. Compile and install the HEAD of the branches above on the target KSP
  2. Fire the damned thing
  3. Create a new savegame called test-scrapyard (delete any previously existent one)
  4. Create a craft called Untitled Space Craft in the VAB. Save it and then launch it to the Launch Pad
    1. I used a mk1pod and a Flea SFB.
  5. Go back to KSC, leaving the craft on the Launch Pad
  6. Enter VAB
  7. Load the Untitled Space Craft again.
  8. Do it again just for the lulz
  9. Exit the game session
  10. Close KSP

https://github.com/net-lisias-ksp/KSP-Recall/issues/67#issuecomment-1569102163

Note that I didn't click on the New button before loading the craft again. To tell you the truth, I never tried this by clicking New first; this is another test session that still needs to be executed.

You see, changing or not changing the persistentId is not directly linked to the LogSpam, period. There's something else happening, and this is proof that we need to look somewhere else.

 

18 hours ago, Gotmachine said:

ScrapYard didn't rely at all on persistentId prior to the commit I linked.

Before the commit you linked, SYD was running on KSP 1.3.1. That commit happened after the Great Breakage of KSP 1.4.0.

It's probably the reason the change was made, but only the original author can answer that (if they recall it, of course).

 

18 hours ago, Gotmachine said:

I can see why it would benefit from reading that id, but I don't see any valid reason to overwrite it.

Which doesn't mean there's none. SYD intends to enhance a part's reliability through "reuse", and so it needs to keep track of each part "incarnation"; the same part identifier needs to be trackable between different craft instances, because from SYD's point of view it's a part that was refurbished and used again - not to mention that firing up two identical crafts in sequence should not be counted as reusing the exact same part from the warehouse.

That said, I'm not telling you that this is the best way of accomplishing the task - it may even be the worst, but that is just not under discussion at this moment. It's out of the scope of this issue.

What is in dispute here is whether changing the persistentId is the trigger of the LogSpam, something that is being refuted again and again.

 

11 hours ago, TruePikachu said:
On 5/31/2023 at 6:36 PM, Gotmachine said:

It shouldn't change the ID.

I agree with this assessment;

We don't have evidence of that. It may even be true, but until this moment we DO NOT have evidence of this causing problems - au contraire, my last test session suggests it's harmless.

Correlation does not imply causation.

We need to start looking for the problem somewhere else.

 

11 hours ago, TruePikachu said:

You made sure you didn't run out of stored parts, right?

NOW that the persistentId is definitively out of the suspects list, I will start to look somewhere else.

Since you did some work in advance, and it made sense, I will see how to probe this new hypothesis.

Edited by Lisias
Brute force post merging

On 6/1/2023 at 10:32 PM, Lisias said:

Which doesn't mean there's none. SYD intends to enhance a part's reliability through "reuse", and so it needs to keep track of each part "incarnation"; the same part identifier needs to be trackable between different craft instances, because from SYD's point of view it's a part that was refurbished and used again - not to mention that firing up two identical crafts in sequence should not be counted as reusing the exact same part from the warehouse.

IMO this can be accomplished solely with the PartModule that exists on instantiated parts, in conjunction with the record keeping that's already being done in the mod's scenario(?) node. What purpose does maintaining the exact same persistent ID accomplish that just doing part module copies from recordkeeping doesn't? The ID is already being saved in SYD's PartModule, by the way, so write-hijacking the vanilla version of it shouldn't even be necessary, excepting a state where the PartModule is strictly unavailable.
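For illustration, tracking the "incarnation" entirely inside the mod's own module, without ever touching Part.persistentId, could look roughly like this (a sketch under assumptions, not current ScrapYard code; the class and field names are made up):

using System;

public class ModulePartIncarnationTracker : PartModule
{
    // Minted once and then carried around by the craft/save file; never written into Part.persistentId.
    [KSPField(isPersistant = true)]
    public string incarnationId = string.Empty;

    public override void OnStart(StartState state)
    {
        base.OnStart(state);
        if (string.IsNullOrEmpty(incarnationId))
            incarnationId = Guid.NewGuid().ToString();
    }
}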

Thinking about it, if the ID itself isn't the problem, maybe it changing part modules around could be? This is just a wild guess which I haven't tested in any manner (and probably can't, since I don't have a dev environment nor the API experience), but considering that the state of the fully-loaded craft on the pad depends both on the vanilla .craft file mechanics and the modded SYD mechanics (or, in the editor case, vanilla part instantiation with a defaulted module versus SYD logic with modified modules coming from the savegame)... when does restoration of the PartModule state happen, anyway?

 

Right now, I'd probably experiment with removing requirements to write to the vanilla ID field; if they're not needed, we might have improved compatibility with some other mods, but if they are needed, the reason would show itself in testing. But if the same exact issue happens when the writes are removed, the writes clearly aren't the entire problem (though they could still be part of the issue).

 

EDIT:

On 6/1/2023 at 10:32 PM, Lisias said:
[WRN 17:47:23.471] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and it's being changed to 3513538705. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:24.219] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=3513538705 and the value (3513538705) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
[ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548

This suggests that changing the ID is the issue; it gets changed to something new, the new value gets written back again (nop), and then spam on the part that had the ID changed.

 

On 6/1/2023 at 10:32 PM, Lisias said:

You see, changing or not changing the persistentId is not directly linked to the LogSpam, period. There's something else happening, and this is proof that we need to look somewhere else.

You can only say this if you provide an example where a part's ID doesn't get changed at all but it still contributes to the spam. Right now, for all we know, logic that the flowgraph depends on is still referring to the older ID because it never expected the ID to change.
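To illustrate the kind of failure I mean, here's a toy in plain C# - it has nothing to do with the actual FlowGraph implementation, which I haven't seen; it only shows how a registry that assumes stable, unique ids reacts when an id is rewritten from the outside:

using System;
using System.Collections.Generic;

// Toy model, not the real FlowGraph: a registry that assumes the id it is
// given at registration time stays unique for the lifetime of the item.
class ToyPart
{
    public uint Id;      // stands in for whatever id the graph keys on
    public string Name;
}

class ToyFlowGraph
{
    private readonly HashSet<uint> ids = new HashSet<uint>();

    public void Register(ToyPart p)
    {
        if (!ids.Add(p.Id))
            Console.WriteLine("[ERR] [FlowGraph]: Graph already contains item! Part " + p.Name + " with id " + p.Id);
    }
}

class Demo
{
    static void Main()
    {
        var graph = new ToyFlowGraph();
        var a = new ToyPart { Id = 1590504603, Name = "mk1pod.v2" };
        var b = new ToyPart { Id = 4043188831u, Name = "mk1pod.v2" };

        graph.Register(a);   // fine
        graph.Register(b);   // fine
        b.Id = a.Id;         // an external write makes the ids collide...
        graph.Register(b);   // ...and the next registration pass spams the log
    }
}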

Edited by TruePikachu
Manual post merge from desktop

A couple of comments:

  1. A lot of time and effort is being expended on researching a behaviour which, in many people's opinion, shouldn't be done in the first place.
  2. It might be better to first find all the places where this is being used, and then replace them all with a safe variable in the part module.

It would certainly be nice to understand what's going on, but the end goal is to fix the bug.  


  • 3 weeks later...
On 6/2/2023 at 12:42 PM, linuxgurugamer said:

A couple of comments:

  1. A lot of time and effort is being expended on researching a behaviour which, in many people's opinion, shouldn't be done in the first place.
  2. It might be better to first find all the places where this is being used, and then replace them all with a safe variable in the part module.

It would certainly be nice to understand what's going on, but the end goal is to fix the bug.  

The bug appears to be in KSP itself; it's been proven more than enough that the piece of code that was under scrutiny is not a factor in the problem: the problem is being triggered even when the value is not changed.

You can't fix a bug that you can't see.

Edited by Lisias
better phrasing.

On 6/23/2023 at 5:31 AM, Lisias said:

The bug appears to be in KSP itself;

I've only ever seen this log spam happen with SYD installed, which heavily implies SYD is doing something it shouldn't be doing. Note that even if the game is exposing a bug to the API, the mod using the API can still be considered to have a bug if it's not accounting for the game's bug in its handling of the particular API. Also note the thing SYD is doing doesn't appear to be a documented feature of the KSP API, technically making it unspecified behavior at best; it might have been convenient in older versions that it worked, but there was no guarantee it would continue to work in future versions.

 

On 6/23/2023 at 5:31 AM, Lisias said:

it's been proven more than enough that the piece of code that was under scrutiny is not a factor in the problem: the problem is being triggered even when the value is not changed.

Your own log indicates the value was changed and there was log spam after it was changed. The fact that the log spam didn't happen on the very next line after the change is inconsequential, as is the fact that the value appears to have been written twice. Note that the following example snippet of C (which is not from KSP itself, but rather only exists to highlight the flaw in your argument) exhibits the same events that your log implies are happening, but still indicates that manually changing the persistent ID is probably not a good idea.

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t you_dont_see_this = 1590504603u; /* persistentId is an unsigned 32-bit value */

    /* Later... */
    uint32_t persistentId = you_dont_see_this; /* Set the initial value */
    persistentId = 3513538705u; /* BUG! You shouldn't change the value */
    /* Output from KSP log when the above "happens":
    [WRN 17:47:23.471] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=1590504603 and it's being changed to 3513538705. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
    */
    uint32_t some_other_memory = persistentId; /* Copy it elsewhere */
    persistentId = some_other_memory; /* Copy it back, this is a nop */
    /* Output from KSP log when the above "happens":
    [WRN 17:47:24.219] [ScrapYard] Part mk1pod.v2:FFF8C676 has persistentId=3513538705 and the value (3513538705) is being written back. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
    */
    assert(persistentId == you_dont_see_this); /* Fails because of the BUG! line above */
    /* Output from KSP log when the above "happens":
    [ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
    [ERR 17:47:24.464] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
    [ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
    [ERR 17:47:24.522] [FlowGraph]: Graph already contains item! Part mk1pod.v2 with id 4294504548
    */
    return 0;
}

 

On 6/23/2023 at 5:31 AM, Lisias said:

You can't fix a bug that you can't see.

It looks like you're only looking for a bug in KSP so you can try to fix it in Recall, rather than considering that SYD might be the one with the bug and that KSP is working as best as it can after SYD changes something it shouldn't have.

Edited by TruePikachu
Typo fix in code

1 hour ago, TruePikachu said:

Your own log indicates the value was changed and there was log spam after it was changed.

THAT specific log. There are more to compare it with.

https://github.com/net-lisias-ksp/KSP-Recall/issues/67

Do you want me to fill this thread with tons and tons of logs so your laziness can be satisfied?

One way or another, here:

[WRN 17:47:49.979] [ScrapYard] Part mk1pod.v2:FFF7ABE0 has persistentId=4043188831 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:47:49.997] [ScrapYard] Part mk1pod.v2:FFF7ABE0 has persistentId=1808313221 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:48:41.567] [ScrapYard] Part mk1pod.v2:FFF7A77E has persistentId=33424378 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.
[WRN 17:48:41.580] [ScrapYard] Part mk1pod.v2:FFF7A77E has persistentId=151739214 and it's being changed to 1590504603. The ClassID is 49675735 and the CraftId (cid) is 4294504548.

FOUR times the persistentId was changed, and no LogSpam was triggered, in the same log.

 

1 hour ago, TruePikachu said:

It looks like you're only looking for a bug in KSP so you can try to fix it in Recall, versus considering that SYD might be the one with the bug and KSP is working as best as it can after SYD changes something it shouldn't have.

Interesting; from my point of view it looks like you are cherry picking the pieces of logs I post, and completely ignoring that I posted the logs where the LogSpam happened among a lot of logs where it didn't (and said so). Should I post them all here? Isn't it easier to just check the logs I posted on the issue tracker, looking for a flaw in my Test Sessions, instead of wasting everybody's time making assumptions with little to no evidence to support them?

If my test sessions are flawed (it happens), pinpoint the source of the flaw. Talk is cheap, show me the code.

Besides, as I had said and you conveniently ignored:

On 6/2/2023 at 2:32 AM, Lisias said:

That said, I'm not saying this is the best way of accomplishing the task - it may even be the worst - but that is just not under discussion at this moment. It's out of the scope of this issue.

There's no doubt that SYD is going to be changed. But without understanding why, you will just move the problem to another place and then someone will have to diagnose this problem again.

Until I'm convinced, with reproducible evidence, that changing the persistentId is really the source of the problem, I will keep advocating for looking for the problem somewhere else in SYD's code (or outside it). AND AGAIN, I'M NOT ADVOCATING FOR WORKING AROUND IT SOMEWHERE ELSE; I want to understand what's happening and why.

And I was candidly ignoring a detail about this problem: exactly what happens when the LogSpam is triggered? What are the real side effects on the game? Memory leaks? Performance issues? Savegame corruption?

Until this moment, I was focusing on the symptom, because I agree we should be aware about what is happening and why.

But if I create a log filter on SYD and just omit the LogSpam from being written to the KSP.log (which would be a <piiiiii> move, to make it clear, and I do not condone it), exactly what would be the consequences?

Because, you see, this may be another case of false alarm, in the same sense as that damned ADDON BINDER ERROR that happens when you merely probe whether an Assembly is loaded or not.
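(For the record, measuring how often the thing really fires should be trivial with a throwaway add-on like the C# sketch below, using Unity's Application.logMessageReceived; the class name is made up, and it only counts the messages - it observes the Unity log stream and can't suppress what goes into KSP.log.)

using UnityEngine;

// Throwaway instrumentation, not part of any mod: counts how many times the
// FlowGraph spam shows up during a flight scene. It only observes the Unity
// log stream; it does not filter anything out of KSP.log.
[KSPAddon(KSPAddon.Startup.Flight, false)]
public class FlowGraphSpamCounter : MonoBehaviour
{
    private int count;

    private void Awake()
    {
        Application.logMessageReceived += OnLogMessage;
    }

    private void OnDestroy()
    {
        Application.logMessageReceived -= OnLogMessage;
        Debug.Log("[FlowGraphSpamCounter] Saw the FlowGraph spam " + count + " time(s) this scene.");
    }

    private void OnLogMessage(string condition, string stackTrace, LogType type)
    {
        if (type == LogType.Error && condition.Contains("Graph already contains item"))
            count++;
    }
}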

 

Edited by Lisias
Kraken damned tyops
