foonix
Posted January 9 (edited)

Background

There has been a lot of debate on the internet about whether KSP 2 is just "KSP 1 but with new graphics," whether there is some common code heritage, and to what extent the developers might have been ordered to "not change KSP 1 code when developing KSP 2."

Personally, I find this debate moot. "Reusable code" is a goal and a cornerstone of many software design philosophies. And yet, with time, change of circumstance, change of technology, and change of goals, code needs to be modified to adapt. Even a from-scratch rewrite can result in substantial similarities, or even some identical code, because the rewrite is trying to solve the same set of problems. Re-use of code (or lack thereof) thus says nothing about code quality, developer "laziness," or "legacy baggage." It is possible, for example, for a developer to avoid wasting time modifying some old code because it happens to be working just fine.

But some cursory analysis and a little set theory can shed light on the extent to which code might have been used "as-is," at least.

This study compares the class/struct names and distinct behaviors of code to look for overlap. The assumption is that, as KSP2's development progressed, changing KSP1 code or adding new code would generally tend to reduce the number of identical behaviors. Holding over more code from KSP1 would tend to increase the number of identical lines.

Methodology

This study compares the KSP1 and KSP2 codebases by analyzing differences between Assembly-CSharp.dll in the two projects. The releases compared are the portable Windows versions of KSP 1.12.5 (sans expansions) and KSP2 0.2.2.

In a Unity game, Assembly-CSharp.dll is where most of the game code lives. The developers can use additional 3rd party DLLs, and can put some of their game code in other DLLs.
But both projects put the lion's share of the game-specific code in this DLL, so focusing on it gives a good indicator of similarities or differences between the two games while reducing the amount of 3rd party code that is considered. (3rd party code CAN go in Assembly-CSharp.dll, but that's not the norm for large 3rd party libraries.)

The code was analyzed with a tool that is commonly known among C# developers. The output of that tool is certain important textual analysis information about the DLL. The text output will tend to change depending on what the DLL does. I can't name it here, but if you ask a skilled C# developer to name two or three tools off the top of their head that they might use to compare managed DLLs, the tool used here would probably be one of them.

A few files (csproj and some automatically generated Unity files) were manually deleted from both outputs, as they are not relevant to what we are trying to analyze.

Class names were determined by the filenames output by the tool. Namespaces were ignored.

The text was then converted to Unix LF and run through a series of Unix-style text processing and munging steps:

- Empty lines were removed.
- Leading white space was stripped.
- Outputs were de-duplicated (the same behavior shared between multiple classes/methods is only counted once).
- The separate sets of text lines from each output were combined, then run through more processing to determine the set intersect and set union ("cat ksp1-uniq.txt ksp2-uniq.txt | sort | uniq > lines_set_union.txt" and again with "uniq -d" for the intersect).
- The lines in the resulting files were counted with "wc -l" to get the data for the tables below.

The resulting line and class counts are intended to represent a VERY rough (see "Study Limitations") gauge of similarity.

Study Limitations

I don't have a good way to address the "Ship of Theseus" problem with this kind of analysis.
It can't address, for example, whether a class was reworked enough that it should no longer be considered the "same code but with minor changes." I can only look for stuff that is exactly the same.

Similarly, I can't tell whether brand new code just happens to look the same as the old code. I wasn't standing behind the developers when they wrote it, so I can't rule out either that they were cribbing old code or that identical results are just a coincidence.

There are some cases where more than one class shares the same name but exists in a different namespace. For example, "Wheel" and "VehiclePhysics.Wheel". Because I'm blindly text munging, the total classes in a game (which include classes with the same name in different namespaces) won't add up exactly to the overlap totals (which do not count such duplicates separately). I'm just trying to get an approximate estimate here.

Although I'm using the same tools to analyze both projects, it's possible that the same source code can appear different due to differences in compiler versions. KSP 1 uses an older version of Unity, which could output different Intermediate Language for the same source code, and the Intermediate Language is what I'm actually analyzing. (This isn't straightforward to fix without access to the actual source code.)

There is definitely code in KSP2 that comes from KSP1 but is totally unused. No attempt is made to eliminate this code from the results. See, for example, KerbalFSM, KFSMState, KFSMEvent, and probably a bunch of other stuff.

This does not compare things like shader/compute code, which is not part of the DLL.
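The text-processing steps described under Methodology can be sketched in a few lines of Python. This is only an illustration of the set logic, not the tooling actually used (the post describes a shell pipeline built on sort, uniq, and wc -l). The function names normalize and compare_line_sets and the two toy inputs are hypothetical, and the order of the cleanup steps is an assumption.

```python
def normalize(text):
    """Convert CRLF to LF, strip leading whitespace, drop empty lines, de-duplicate."""
    lines = (line.lstrip() for line in text.replace("\r\n", "\n").split("\n"))
    # A set keeps each distinct line once, like `sort | uniq` on a single file.
    return {line for line in lines if line}


def compare_line_sets(first_text, second_text):
    """Return the sizes of the set intersect, set union, and one-sided remainders."""
    first = normalize(first_text)
    second = normalize(second_text)
    return {
        "intersect": len(first & second),   # lines that `uniq -d` would report
        "union": len(first | second),       # lines that `sort | uniq` would report
        "only_first": len(first - second),
        "only_second": len(second - first),
    }


# Toy stand-ins for the two tool outputs (hypothetical content).
first_dump = "class A\r\n{\r\n    int x = 1;\r\n    int x = 1;\r\n}\r\n"
second_dump = "class B\n{\n\tint x = 1;\n\tint y = 2;\n}\n"

stats = compare_line_sets(first_dump, second_dump)
print(stats)
# {'intersect': 3, 'union': 6, 'only_first': 1, 'only_second': 2}
```

Note that trivially common lines such as "{" and "}" land in the intersect, which is one reason the shared-line count is best read as an upper bound on exact reuse ("possibly shared").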
Results

Class set intersect (possibly shared between projects): 587 classes
Class set union (all unique class/struct names across both projects): 8,869 classes
Line set intersect (possibly shared between projects): 26,683 lines
Line set union: 434,801 lines

                                                  KSP 1     KSP 2
Total classes/structs in project                  3,298     6,193
Classes/structs not in the other project          2,711     5,606
Unique class percentage                             82%     90.5%
Total (deduplicated) lines of code in project   214,272   247,212
Lines of code not in the other project          187,589   220,529
Possibly unique percentage                        87.5%     89.2%

My own rambling analysis after working extensively on KSP2 and briefly thumbing through KSP1

Regardless of how it happened, KSP2's codebase is wildly different from KSP1's. Major systems have been significantly altered or radically replaced. Some systems have been converted to Burst jobs or even rewritten as compute kernels. The basic "spine" of the game is radically different. They are just wildly different animals that happen to live in the same cage (Unity) and eat the same food (parts/vessel/physics/orbit/terrain simulation).

The amount of engineering that went into KSP2 (regardless of what one might think of the results) was significant. A few bits here and there seem to have been held over. A few bits are still in the files but are vestigial. But I don't think suggestions that the two are significantly the same really hold up.

Edited January 10 by foonix
Fix some minor issues. Move a sentence to fit better. Reword some stuff to be more vague in an effort to enhance compliance with forum rules.
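As a cross-check on the Results figures in the post above, the derived rows can be recomputed from the totals and the intersect using basic set arithmetic (not-in-other = total - intersect, union = total1 + total2 - intersect). This is a minimal sketch; the helper name overlap_stats is hypothetical, and the "shared share of union" figure is an extra summary number that does not appear in the original post.

```python
def overlap_stats(total_1, total_2, intersect):
    """Derive one-sided counts, union size, and percentages from set sizes."""
    only_1 = total_1 - intersect
    only_2 = total_2 - intersect
    union = total_1 + total_2 - intersect  # inclusion-exclusion
    return {
        "only_1": only_1,
        "only_2": only_2,
        "union": union,
        "unique_pct_1": round(100 * only_1 / total_1, 1),
        "unique_pct_2": round(100 * only_2 / total_2, 1),
        # Extra figure, not in the original post: intersect as a share of the union.
        "shared_pct_of_union": round(100 * intersect / union, 1),
    }


# Inputs copied from the Results table: KSP 1 total, KSP 2 total, intersect.
lines = overlap_stats(214_272, 247_212, 26_683)
classes = overlap_stats(3_298, 6_193, 587)

print(lines)
print(classes)
```

The line figures reproduce the table exactly (187,589 and 220,529 lines not in the other project, a union of 434,801, and 87.5% / 89.2%). For classes, the one-sided counts and percentages also match, but inclusion-exclusion gives a union of 8,904 rather than the reported 8,869. That gap of 35 is what one would expect if the per-project totals count same-named classes in different namespaces separately, as noted under Study Limitations.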
The Space Peacock
Posted January 9

Excellent analysis, well done @foonix! Thanks for taking the time to finally shed some light on this topic.

Based on this, I personally do not believe the claim that KSP2 is built on KSP1 spaghetti code holds much water today. Either the KSP1 code was refactored so extensively that barely anything of the original is left, or the developers did indeed start from scratch at some point but reused a minor amount of code, either directly or as inspiration. Whatever the case, it does not seem to have played a significant role in the state of KSP2, at least in the version available to us today.

This does not definitively prove (I doubt anything can, save for a resumption of development) that KSP2 could have been developed to its full potential with the existing codebase, but it does make for a pretty convincing argument that it's at least not as simple as "it was never going to be finished because it runs on KSP1's code."
Tony Tony Chopper
Posted January 10

You can't imagine how much I appreciate your analysis. This gives me a better insight into development in general, which I can now translate to other software topics, at least to some degree. The copy-paste argument is basically an insult to the developers now.

Why did no one do this before? Was it difficult to do? I didn't really know this kind of analysis was possible, but there are so many people in the community stating that they are developers themselves that you'd think at least one of them would have done this a long time ago.

Thank you!
PDCWolf
Posted January 10 (edited)

15 hours ago, foonix said:
> -snip-

Just to add extra context, if you look at ShadowZone's video, the idea that KSP2 was to be KSP1.5 was only the initial pitch, with a second pitch coming a couple of years later moving them on from a remaster of KSP1 to KSP2 proper.

Also, considering you're admitting to doing something against the EULA and the forum guidelines, I don't see this thread staying up too long.

And that's not even going into how the limitations basically make the comparison meaningless as well. You're basically saying "I looked at it and it doesn't look similar," which of course skips most of the human factor of intent and "looking at your homework."

Finally, the assembly, as you point out, is a collection of what Unity outputs and has almost nothing to do with what things would look like as you set up scenes, scripts and game objects inside Unity.

4 hours ago, Tony Tony Chopper said:
> Why did no one do this before? Was it difficult to do so?

Against EULA to do, against forum guidelines to post.

Edited January 10 by PDCWolf
foonix (Author)
Posted January 10

7 minutes ago, PDCWolf said:
> And that's not even going into how the limitations basically make the comparison meaningless as well. You're basically saying "I looked at it and it doesn't look similar", which of course skips most of the human factor of intent and "looking at your homework."

For what I posted, I think that's actually kinda true in a way. "Lines of code" is a derided metric of developer work output exactly because it doesn't consider human factors or things like how important a change was. But two things:

- Making this kind of judgment is a "preponderance of evidence" type decision, not a "my paycheck depends on hitting specific numeric quotas" situation. We're just trying to judge what is more likely. So the normal objections against LoC don't really apply here if we're limiting our judgment to whether the devs did something or not. The goal here is to provide some kind of objective evidence to drive that subjective judgment, one way or the other.
- I know a lot of stuff about the codebase that I'm not sure I can share here in more detail. I definitely have some anecdotes from development of my KSP2 mod. I've gone back through some of the KSP2 performance issues I've addressed and checked the relevant code in KSP1. My "human factor" impressions from that experience are in line with the data above. Even some "re-used" code received significant refactoring. That's why I thought it relevant to include both the class names and the LoC in the data. If you want to know more, ping me on Discord I guess.

59 minutes ago, PDCWolf said:
> Finally, the assembly, as you point out, is a collection of what unity outputs and has almost nothing to do with how you set up scenes, scripts and game objects inside unity.

Code and data tend to be coupled. Changing scenes, object hierarchy, etc. will tend to result in changes to the code. If they didn't change these things, there would be no need to change the code. If they did, they'd probably have to. Unity hasn't fundamentally changed much in terms of basic handling of such things. You still use stuff like Instantiate() and GetComponent<T>() and the like to do the same things. So broadly speaking, I'd expect these things to be reflected in terms of code change.

But oddly enough, a lack of such coupling in a specific system is one of the performance issues I've tried to address by modding. But it's clear to me that the system was reworked, and that rework is reflected in the numbers. So, you know, it's complicated. vOv
foonix (Author)
Posted January 10

6 hours ago, Tony Tony Chopper said:
> You can't imagine how much I appreciate your analysis. This gives me a better insight into development in general, which I can now translate to other software topics, at least to some degree. The copy-paste argument is basically an insult to the developers now. Why did no one do this before? Was it difficult to do? I didn't really know this kind of analysis was possible, but there are so many people in the community stating that they are developers themselves that you'd think at least one of them would have done this a long time ago. Thank you!

Thanks, glad it helps. Software development is a balancing act that depends on technical, business, and human factors. I've found that being empathetic about that helps me deal with the irritation of hitting "game breaking bugs" or other problems in my favorite games.

As for why no one did it before, well, I don't know of a specific tool for this. Programmers are usually more concerned with questions like "Will it work?", "How does it work?", and "What will happen if I do things differently?" The programming tool used is intended to answer those kinds of questions.

The programming tool is half of the process. The text manipulation I used is the other half. The former is a tool only used by programmers, and the latter is better known in the domain of systems administration. I happen to have both skill sets. So the group of people who could do it this way and who would want to do it may be very small.
GluttonyReaper
Posted January 10

3 hours ago, PDCWolf said:
> Just to add extra context, if you look at ShadowZone's video the idea that KSP2 was to be KSP1.5 was only the initial pitch, with a second pitch coming a couple years later moving them on from a remaster of KSP1 to KSP2 proper.

It's been a while since I watched it, so I might be misremembering: I think the ShadowZone version of events also alleges that the KSP2 devs had the KSP1 code base, but were struggling to untangle it without any input from anyone who had worked on said code. If that's true (and that's a big if), it would also make sense that a lot of KSP2 code had to be more or less written from scratch regardless, especially post-repitching. (Plus there's the added complication that KSP1 kept updating even after KSP2 was supposed to have started development.)

Personally, I'd always assumed that the similarities between KSP1's and KSP2's behaviour (at least at a user level) were a result of A) intentional choices to recreate a lot of KSP1's quirks in KSP2, and B) a kind of convergent evolution: both games seem to have been built by non-physics-experts effectively independently, so it makes sense that they would run into the same issues if they were trying the same methods.
PDCWolf
Posted January 10

2 hours ago, GluttonyReaper said:
> It's been a while since I watched it, so I might be misremembering: I think the ShadowZone version of events also alleges that the KSP2 devs had the KSP1 code base, but were struggling to untangle it without any input from anyone who had worked on said code. If that's true (and that's a big if), it would also make sense that a lot of KSP2 code had to be more or less written from scratch regardless, especially post-repitching. (Plus there's the added complication that KSP1 kept updating even after KSP2 was supposed to have started development.)
> Personally, I'd always assumed that the similarities between KSP1's and KSP2's behaviour (at least at a user level) were a result of A) intentional choices to recreate a lot of KSP1's quirks in KSP2, and B) a kind of convergent evolution: both games seem to have been built by non-physics-experts effectively independently, so it makes sense that they would run into the same issues if they were trying the same methods.

I'm sure that, past the original pitch to remaster the game and once they were well into "KSP2 proper," there was still a ton of "let me look at your homework." Some aspects of how the game handles things are too much of a coincidence, whilst others, like aerodynamics, are radically different, at least on the user-facing side.

4 hours ago, foonix said:
> For what I posted, I think that's actually kinda true in a way. "Lines of code" is a derided metric of developer work output exactly because it doesn't consider human factors or things like how important a change was. But two things:
> - Making this kind of judgment is a "preponderance of evidence" type decision, not a "my paycheck depends on hitting specific numeric quotas" situation. We're just trying to judge what is more likely. So the normal objections against LoC don't really apply here if we're limiting our judgment to whether the devs did something or not. The goal here is to provide some kind of objective evidence to drive that subjective judgment, one way or the other.
> - I know a lot of stuff about the codebase that I'm not sure I can share here in more detail. I definitely have some anecdotes from development of my KSP2 mod. I've gone back through some of the KSP2 performance issues I've addressed and checked the relevant code in KSP1. My "human factor" impressions from that experience are in line with the data above. Even some "re-used" code received significant refactoring. That's why I thought it relevant to include both the class names and the LoC in the data. If you want to know more, ping me on Discord I guess.

At any rate, this would be better worded as "does KSP2 include copy-pasted code," making the presentation of the analysis more literal to what's being analyzed. We'd also need to dive down and see how many of those lines of code, unique classes and so on belong to things not present in KSP1 or, as you rightly say, to systems that required refactoring.

This is something Nate himself clarified long ago: KSP2 does not use copy-pasted code. However, reading between the lines, as one always had to do with Nate, he never said they didn't look at KSP1's code to see how a particular problem was solved. In fact, it would be very dumb if they hadn't: wasted time and resources when the solution, or at least -a- solution, had already been produced. It's also part of why they'd go with the same vehicle middleware... which would also mean at least part of the code that interacts with the middleware should be very similar.
Skorj
Posted January 10 (edited)

8 hours ago, PDCWolf said:
> Just to add extra context, if you look at ShadowZone's video the idea that KSP2 was to be KSP1.5 was only the initial pitch, with a second pitch coming a couple years later moving them on from a remaster of KSP1 to KSP2 proper.

Was going to say exactly this. What we expected was that KSP2 started as a baked-in mod for KSP1 with some fixes. That would have been the Star Theory days. We knew a lot of work got done after that, including a whole new system for planet textures, Blackrack's work, and a ton of new assets. Heck, they would have had to change a lot just to break things so badly. But it does seem safe to conclude they started with a copy of the KSP1 code base, before a couple hundred dev-years of new work was done.

The percentages foonix found don't surprise me at all: a big chunk of KSP1 in a mostly-new codebase. I suspect the amount of original code is effectively double or triple what the numbers suggest, given how simple refactoring can change thousands of lines of source code without changing the object code (the actual behavior) at all. That's expected when a new team is making sense of a legacy codebase and making it conform to the current coding standards and naming conventions.

Edited January 10 by Skorj