Search the Community
Showing results for tags 'i have no idea what im doing'.
Found 1 result
Hi everybody! (UPDATE on page 3, it's now sort of working) Hope I'm in the right section, I don't come very often to the forums, so just tell me if I got this wrong! So here's the idea: for a few months now I've been thinking about hooking a machine learning program to a rocket and pressing 'Launch' to see how bad things could go. I've looked it up, but it seems not much people have * actually* done it successfully. So HUGE disclaimer: I have no idea where this is going to go. Maybe I'll stop in 2 weeks, maybe I'll get super on board with the idea. But I think there's something really cool to try here and that's all I need. The main idea I have for how to do this is: - Control KSP through the kRPC mod, enabling us to run Python, which means we can import any machine / deep learning librairies alongside it. - Use reinforcement learning to learn from each launch, and try to improve the performances of the rocket. Note that here I'm not trying to generate a rocket design, I just start from a particular rocket and try to teach it to fly it. To put a bit of context around this, I have a few months of practice on neural networks (using mostly Keras over TenserFlow, which will most probably be the library used for running the basic neural network that will be trained) I have barely used kRPC for a hours the last few days to say what was possible with it, I just made a successful program that reloads a quicksave when a rocket after a certain time and looks at the altitude it's gone, to see if it was possible. I've also been through some of the documentation. Oh, and I've never done anything related to reinforcement learning, so just like when I started playing KSP a few years a go, I don't really know what I'm doing, but I have theoretical knowledge, and think this could be fun, that's good enough for me to start! I thought opening a new post here on the KSP forum could be a good place for me to first of all take the time to lay down what I want to do, and get some feedback from people that might have more experience than me, maybe some people might even want to follow progress on this! So to make things a little more interesting, I'm gonna give my IA a name, because I don't like how we just refer all the time to "IA". I'm going to call my 'IA' Bertrand. So, here are the initial ideas I have (this will be a bit technical as the trickiest part of this is to get a neural network running and interacting with KSP, while minimising the RUD counter). - In order to make any reinforcement learning, we need to define a loss function, this is what the neural network is going to have to minimize. Basically what is Bertrands purpose in life? What I would like to try for the very first stage is to get the rocket to a certain altitude. Say 50km to start with. That sounds super easy: just full throttle and nothing else. Yeah, but you haven't seen how dumb a untrained neural network is. So basically the reward function would be something like a function of the altitude, say Altitude*10 and you get -500 if you blow up (this is just a rough idea first, I have to figure out the specifics later). I think taking into the account the amount of fuel left when reaching the altitude is also important, as that's what we want it to do: be the best rocket out of all the self-teaching rockets out there, and make the best ascend possible, using the least amount of fuel possible. - Second, the controls: Bertrand, my IA, needs to be able to control the rocket to make it blow up fly it. But the thing is, with KSP, controls aren't really something we're short of. Remember, Bertrand is going to be really, really dumb at first, we're talking cat-walking-on-a-keyboard-level dumb here. I thought having control over yaw, pitch and thrust would be a good start. Since most rockets are cylindrically shaped, roll doesn't have much of an effect, and I think keeping just 6 buttons (2 for each control) is already going to be hard enough. I'm also afraid that if we start letting roll in, it will heavily confuse Bertrand, because it will change what yaw and pitch do. If this thing turns out to be a good idea, I might add multiple stages to my rockets and thus decoupling as a control. (I'm also pretty curious to see how it starts just randonly decoupling at any time and things just blowing up even more). But I'm worried that at first decoupling will make the capsule hop and give an easy win, so Bertrand keeps doing it without learning anything else. But we'll save that for later, let's not spoil Bertrand too much for now. - Third, this is going to need to run a loooot of times before Bertrand starts doing something not stupid and actually understand it has to throttle up and not do anything else to lift off. This is where my little kRPC program comes into play. I can make it reload the game as soon as the ship blows up, or reaches a certain altitude, and store the final altitude at which the game is reloaded. That's what I managed to do in a few hours of tinkering on kRPC, but I'm also going to need to register all the keystrokes Bertrand does, those are going to be the inputs of the neural network. That's a bit more tricky, but I don't think that going to be very hard. The hard part is going to be feeding that to the neural network. - Fourth I only have a Xiaomi Mi Notebook Pro, and that thing only has an 8th gen i5, with 8Gb of Ram and most laughable of all, an mx150 for a GPU. This bad boy doesn't have any issue running KSP at beautiful max settings and 60fps no problemo. But we're gonna have to train Bertrand too. And Bertrand is going to require some serious hoursepower to get better. This means I'm probably going to run KSP at a stupidly low resolution, and potato graphical settings to just even *try* to train the neural network. This is why it's going to be very important to keep the neaural network architecture as simple as possible! If that turns out to be really too long, I'll see if it isn't possible to use a Kaggle Kernel to get access to the free GPU and connect it to kRPC. Otherwise I might have to pay an online GPU, but I'd rather keep that as a last resort, I'm just a poor lonesome student with a bit too much time on his hands, but surely no money to rent a GPU. So I'll simply take this as the "optimization"' success trophy in my Bertrand plays KSP game. And if hooking a Kaggle Kernel to kRPC doesn't work, I'm not even sure it would with any other paid service; but that's for later anyways. - The idea I have, from this article from people that previously seemed to have relatively done what I want to do would be to use OpenAI's Gym, as a support for the reinforcement learning. OpenAI launched a thing called Universe a while back that was supposed to support KSP, but that got cancelled before we had time to see Terminator-powered rockets take over. But these guys (the ones from the article) made their own environment compatible with the Gym, which I will very happily clone, and tweak according to what I will be trying to do (after I eventually understand all of it). I'm basically going to use what these guys have done to try to get something working. I do think I should spend some time learning how to use OpenAI's gym on easier examples first before doing that though. So Bertrand isn't even going to see the world inside KSP first! That's basically how far my reflexion has gone for now. I've spend a good amount of time going through the internet looking for people trying this, and as I mentioned earlier, didn't seem to find anybody that did achieve this, apart from the guys who's work I shared in this last point. I also saw that a few years ago a guy streamed his neural network training, but that's long been finished, so I can't see that anymore Just to conclude here is what I have already done: Using kRPC, I made a program that goes on for 5 epoch (or 5 times) and each time makes the rocket turn after epoch_number seconds, then waits 5 seconds, gets the altitude the rocket got at after those 5 seconds, and reload to the last quicksave. These altitudes are stored in a dicitonnary called Dict_alt that is displayed at the end. This was my initial test to see if it was simply possible to be able to run multiple launches one after the other without me coming in. And storing in a dictionnary will probably need to be changed to a dataframe using a cuddly panda. TL;DR: I have no idea what I'm really doing, but I want to train an IA to fly a rocket, and I have a lot of work ahead of me for this to even remotely work. Anyways, if this is interesting to follow, well take a seat and prepare to wait a long time, and if you have any good ideas on how I could do this, things I have forgotten even if I haven't even started or improvements, please let me know! As I said, I don't really know what I'm doing with this, so all and any suggestions would be greatly appreciated! Thanks for having taken the time to read this! I hope this turns out to be a real project, maybe!