Reinforcement Learning in KSP (using kRPC, maybe?)


DunnoAnyThing

Recommended Posts

Sorry to ask an "Is mod X supported in game version Y?" sort of question, but I've run out of time, so... unfortunately, here goes.

My friend wants to practice his reinforcement learning techniques on this game (yes, I mean KSP).
The first mod that came to mind was kRPC, but I'm not sure whether it works well with the current KSP version (no time to check myself).
So, 1) the "does kRPC work on current KSP?" question.

2) I think I've come across a post on Red*** about a similar project, but I can't recall it. Any recommendations for this kind of project are welcome (especially from those who do RL).
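For reference, a minimal sketch of what the kRPC side of this could look like, using the published kRPC Python client API. The connection name, episode length, and the toy reward function are my own placeholders, and actually running the `__main__` part assumes a KSP instance with the kRPC server mod listening on its default port:

```python
def simple_reward(flight_time, crashed):
    """Toy shaping: longer flights score higher, crashes are penalized.
    (Placeholder values, not from the thread.)"""
    return flight_time - (100.0 if crashed else 0.0)

def fly_once(conn, duration=10.0):
    """One short episode: stage, hold full throttle, log altitude over time."""
    vessel = conn.space_center.active_vessel
    flight = vessel.flight()
    vessel.control.throttle = 1.0
    vessel.control.activate_next_stage()   # launch
    log = []
    start = conn.space_center.ut           # in-game universal time, seconds
    while conn.space_center.ut - start < duration:
        log.append((conn.space_center.ut - start, flight.mean_altitude))
    return log

if __name__ == "__main__":
    import krpc                            # requires the kRPC server mod in-game
    conn = krpc.connect(name="rl-probe")
    print(fly_once(conn)[-1])
```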


18 minutes ago, DunnoAnyThing said:

So, 1) the "does kRPC work on current KSP?" question.

Hmmm... <copy> <paste> <select "titles only"> <hit search>

Seems to be "officially" supported up to KSP 1.5, so I'd guess it will work with KSP 1.7, but you (or your friend) should test it. My guess is that the new features from Breaking Ground may give you problems.

Or your friend could stay with KSP 1.5.


  • 2 weeks later...

Another option is kOS, but of course you need to save weights and biases globally, since exploring the configuration space of possible weights and biases for a neural network controlling a rocket or plane is most definitely not safe and WILL destroy your vehicle many, many times.

 

I'm also a bit curious how you plan to train the network, seeing as dying can often be a consequence of something you did much earlier, so you can't reliably know where in a run your mistake was.


On 10/20/2019 at 11:38 AM, Pds314 said:

Another option is kOS, but of course you need to save weights and biases globally, since exploring the configuration space of possible weights and biases for a neural network controlling a rocket or plane is most definitely not safe and WILL destroy your vehicle many, many times.

 

I'm also a bit curious how you plan to train the network, seeing as dying can often be a consequence of something you did much earlier, so you can't reliably know where in a run your mistake was.

Dying is actually needed. I'll just give a high reward for flight time, though details may (and will) vary.

Plus, the 'delayed reward' problem is a classic one in RL, so that shouldn't be much of a problem. (I hope?)
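For what it's worth, the textbook way a delayed reward gets spread back over a trajectory is to discount it step by step toward the actions that led up to it; a minimal sketch (the discount factor here is arbitrary):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return at each step: G_t = r_t + gamma * G_{t+1}.
    A single terminal reward (e.g. total flight time at the crash)
    is thereby credited, attenuated, to every earlier step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

So with a per-step reward of zero and a single payoff at the end, early steps still receive a (discounted) share of the credit.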


9 hours ago, DunnoAnyThing said:

Dying is actually needed. I'll just give a high reward for flight time, though details may (and will) vary.

Plus, the 'delayed reward' problem is a classic one in RL, so that shouldn't be much of a problem. (I hope?)

It basically means in-flight rewards are hard to do. You get roughly one useful data point per launch, and simple backpropagation won't work well because you don't know the activation state of the network at the point mistakes were made, because you don't know when mistakes were made. You can either punish/reward it for everything it did in the whole flight by logging all of the inputs every frame, or you can use a non-backpropagation algorithm such as, say, genetic evolution. These do work, but are very experiment-hungry ways to train a network. And in the case of genetic evolution, storing 10 or 100 or 1000 slightly-altered copies of the network and testing all of them every generation makes for even slower progress.

 

Punishing it for whatever it did one frame before the crash will probably not be useful, though, as most crashes cannot be avoided from one frame away. The same goes for punishing actions more heavily the closer they were to the crash, since, again, the actual mistake could have come very early in the flight.
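A toy sketch of the evolutionary approach mentioned above: mutate a population of controllers, score each on a whole episode, keep the best. The fitness function here is a made-up stand-in for "flight time this controller achieved", and the two-parameter "policy", population size, and mutation scale are all arbitrary:

```python
import random

def evaluate(weights, target=(0.5, -0.2)):
    """Stand-in fitness: negative squared distance to a hidden 'good'
    controller. In practice this would be one full simulated flight."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(pop_size=20, generations=50, sigma=0.1, seed=0):
    """Simple (1+lambda) evolution: perturb the current best controller,
    evaluate every variant, keep the fittest (elitism)."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(2)]
    for _ in range(generations):
        pop = [[w + rng.gauss(0, sigma) for w in best] for _ in range(pop_size)]
        pop.append(best)                 # elitism: never lose the current best
        best = max(pop, key=evaluate)
    return best
```

Note the experiment-hunger the post describes: this spends pop_size full episodes per generation just to make one update.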


On 10/25/2019 at 12:56 AM, DunnoAnyThing said:

Quite right, I guess.
It seems like first creating a simplified 2-D model of KSP and then trying KSP RL will be a more feasible approach, since there are many problems like the ones you mentioned.

Yes. Especially since even a stupidly simple 3-D rocket sim where everything is one part would run at >10000x physics warp quite easily. It wouldn't need to be perfect. You could randomly vary the rocket/plane parameters, then port the results to KSP and do slower training once it has figured out the basics.
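A sketch of how simple such a sim could be: one part, 1-D, point mass, fixed timestep. All the constants (thrust, burn rate, gravity) are made up for illustration; the returned flight time plays the role of the reward discussed earlier in the thread:

```python
def simulate(throttle_fn, dt=0.02, max_t=120.0):
    """Minimal 1-D rocket: thrust up, gravity down, finite fuel.
    throttle_fn(t, alt, vel, fuel) -> commanded throttle in [0, 1].
    Returns flight time until impact (or max_t if still airborne)."""
    g, thrust_acc, burn_rate = 9.81, 25.0, 0.02   # made-up constants
    alt, vel, fuel, t = 0.0, 0.0, 1.0, 0.0
    while t < max_t:
        throttle = max(0.0, min(1.0, throttle_fn(t, alt, vel, fuel)))
        if fuel <= 0.0:
            throttle = 0.0                         # out of fuel, engine off
        vel += (throttle * thrust_acc - g) * dt
        alt += vel * dt
        fuel -= throttle * burn_rate * dt
        t += dt
        if alt <= 0.0 and vel < 0.0:
            break                                  # hit the ground
    return t
```

No rendering, no part physics, so millions of these episodes per minute are plausible, which is exactly what an experiment-hungry training loop needs before moving to real KSP.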

