Everything posted by Jirokoh

  1. That's the thing: what I would like to do is provide zero domain-specific knowledge to the algo. Asking it to check if the thrust-to-weight ratio is better at some stage is hard-coding what we want it to do. Again, in this context it would be easier to solve the problem of flying like that, but I'm not going for the easy approach (I mean, where's the fun in that? :P) What my algo would actually be able to learn, I'll only know once I get it working ^^ Which, for now, isn't the case, because I haven't really gone back to it. I'm working on other projects at the moment, but this one is still somewhere in the back of my mind, and I do plan on going back to it at some point! At the moment my rocket doesn't have enough fuel to go high enough for drag losses to be the most important ones; I'm only doing a 10-second or so flight for now. Hopefully that would be implemented in Bertrand 2.0. But for now, Bertrand 1.0 isn't even here!
  2. UPDATE: IT'S (sort of) WORKING!! First of all, because that's the coolest thing, here's the video: I've had a bit more time on my hands since my final exams finished. I still haven't completed my thesis yet, but I'm nearly done, so I can come back to KSP. So, what I've done is basically train the neural network not only on the time segment it is currently at, but on the previous ones as well. When it's trained and comes time to predict the action to take, Bertrand looks at the observations from the current time step as well as the previous one (so it looks at (t) and (t-1)) and makes a prediction based on that. As you can see, it's not perfect, but it's already doing (modestly) better than random! In the example seen above, it takes 9 actions in total: it throttles down for the first 3, but then for the following 6 it always goes throttle up! And the cool thing is, that was trained on only 30 examples, which is a really, really small dataset. I'm way too excited that this (kinda) worked, so I'm sharing it before trying to improve it with more training examples. The neural network itself can probably be improved with a bit of fine-tuning, so I still have a lot of work I can do. So, yeah, there you go: I made an AI that can (sorta) learn totally on its own to throttle up to go higher. (Well, technically it doesn't yet have time to go higher, since it always throttles down for the first 3 steps, which makes it lose thrust at the very beginning, and there aren't enough time steps after that to overcome the early loss. But what I mean is that it then just goes full throttle, which in the long term is what I want it to do.)
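For the curious, the (t) and (t-1) trick is just a matter of concatenating the two observations into one input vector. A minimal Keras sketch; the feature names and layer sizes below are placeholders, not my actual network:

```python
import numpy as np
from tensorflow import keras

N_FEATURES = 3  # placeholder: e.g. altitude, vertical speed, throttle

# The input is the current observation concatenated with the previous one,
# so the network sees both (t) and (t-1) when predicting an action.
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(2 * N_FEATURES,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(2, activation='softmax'),  # actions: throttle down / up
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

def stack_observations(obs_t, obs_t_minus_1):
    """Concatenate the (t) and (t-1) observations into a single input row."""
    return np.concatenate([obs_t, obs_t_minus_1]).reshape(1, -1)
```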
  3. Yep, I actually started with quickload, and in both cases I encounter the same issue. But when I have more time, there are a few tricks I'd like to test that might allow me to solve it, or at least work around it! We'll see!
  4. KSP. Apparently, if you revert to launch too many times, RAM starts getting saturated. That's just what I found online, but what I did see is that, indeed, as I revert to launch more and more, my frame rate goes down. And I'm already at the lowest settings and lowest resolution to make this run as smoothly as possible.
  5. Well, I've been thinking about coding a "simple" KSP sim, but I'd like to see if I can't solve the problem in KSP first. Because while this solution might work for this easy problem, it's still only the first problem I want to try to solve. If I can make this work, I'd like to go further and tackle more complicated problems. That would mean also making my simulation more complicated, to train on those more complicated issues. But, maybe, why not? I think I'll have to wait until I have more time on my hands, so maybe only in a few months! While I am getting 1 data point for every run, each run is only 12 to 13 seconds long, and during 1 run I actually get 6 or 7 measurement points. So it's not that bad. The real issue is the RAM leakage, meaning I can't scale my data scraping over long periods of time. If I could, I'd just leave my PC running during the night and get data like that. It would be slow, but it would work. Fixing this issue is what I'd like to do, because it would allow me to scale to more complicated problems using the same method, even if each run would be longer for those.
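For context, the sampling side is just a simple loop like this; the 2-second interval and the three recorded values match what I described, but the exact numbers are placeholders:

```python
import time
import krpc

conn = krpc.connect(name='Data collection')
vessel = conn.space_center.active_vessel
# Speed needs an explicit reference frame; the body's frame works for ascent.
flight = vessel.flight(vessel.orbit.body.reference_frame)

samples = []
for _ in range(7):  # a 12-13 second run gives 6-7 samples at this rate
    samples.append({
        'altitude': flight.mean_altitude,
        'speed': flight.speed,
        'throttle': vessel.control.throttle,
    })
    time.sleep(2.0)
```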
  6. Thanks a lot to those who have been following this, then! Hopefully in the next decade, er, weeks or months, I'll actually have something working! I'm not sure I get your idea here? The goal is to have something learn to fly in KSP, not in the real world (though if people have waaaay too much money and don't know what to do with it, my DMs are open, just sayin'). So I would rather have to make an environment (that is, code the laws of physics: gravitation, atmospheric drag, inertia, you name it) that mimics KSP but runs much, much faster, in order to speed the training process up, while still being close enough to then transfer the learning into KSP. No problem, all ideas are welcome :-) Well, my bad then!
  7. Thanks a lot! I'm very glad to see people have an interest in this! (And obviously no, having an AI is not cheating, as long as it's mine.) EDIT: damn, those other posts are impressive. I feel like I've got to raise the bar to match what the others are doing! Awesome to see so many different fun projects!
  8. Again, I think we're going off track here. And, to be honest, I have some doubts about us being instinctively afraid of lions. I'm no expert, so I don't want to go deeper into this sort of discussion; it's only going to drift further into things that probably neither of us really knows about. Thanks for the book recommendation though, I'll have a look! In other news, I'm still pretty busy at the moment, so I haven't gone back to improving anything or making any progress. I'll keep you guys posted!
  9. I'm not sure about that. From what I understood, it's rather that we have a hugely adaptable brain that can learn very efficiently and pick up correlations very easily. It feels like the brain is an awesome correlation-finding machine, which in turn allows us to learn how our environment works, rather than that being hard-coded. That's why blind or semi-paralyzed people, among others, can learn to interact with the world in a totally different way than you and I do; it's adaptation rather than rule-based. Just like good code: it's scalable to new problems. But we kind of digress here. (Still, it's a super interesting topic, which I know way too little about compared to people working in the field. So, as with the rest, don't trust what I write here.)
  10. Well, this analogy only works up to a certain point. Humans are stupidly efficient in the way they learn compared to even the best artificial intelligence we can make today. So, at the level where I am trying to make this, it's really super basic stuff. I think we should keep in mind these are analogies, which are good at explaining the big picture when first trying to understand these concepts, but which rapidly become irrelevant when trying to build or debug implementations of these algorithms. At the moment, the issue I have is more about getting enough data, and being sure that my model is actually learning what I want it to, so this is really outside the boundaries of such an analogy. But again, analogies are great for explaining in simple terms how machine learning works in general! We should just be very careful not to push them too far; that becomes anthropomorphism, which stops being relevant. There's still so much to do before we can even consider AIs to be really learning like humans. Even more so in my case: this is really a dumb program, basically trying to find correlations and a local minimum over 3 (for the moment) different parameters.
  11. Well, that's basically what I'm doing already: kRPC is controlling everything, and the only parameter is throttle for the moment. Pitch will come later, when I have something working for just thrust. This is more of a test bench to see if it can actually work. And it's not *really* an evolutionary algorithm, that works a bit differently, but the point about data is still pretty much the same. That is the biggest problem I have at the moment. Again, that's why I'm only trying to teach it to throttle, going straight up, for only 7-8 seconds. And I don't think I'll be able to transfer the learning just as easily, to be honest.
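To give an idea, the control side through kRPC really is just a few lines. A stripped-down sketch of what a short straight-up run looks like (predict_action is a stand-in for the model, random here, and the 1-second action interval is arbitrary):

```python
import random
import time
import krpc

def predict_action():
    # Stand-in for the neural network's prediction: 1 = full throttle, 0 = cut.
    return random.randint(0, 1)

conn = krpc.connect(name='Bertrand throttle test')
vessel = conn.space_center.active_vessel

vessel.control.throttle = 1.0
vessel.control.activate_next_stage()  # launch

# One throttle decision per second for a short straight-up hop.
for _ in range(8):
    vessel.control.throttle = 1.0 if predict_action() == 1 else 0.0
    time.sleep(1.0)
```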
  12. That did cross my mind at some point. There are multiple drawbacks to that: even if it's not *that* hard on paper, it's still going to take an awful lot of time for me to do, because in programming, just like in KSP, things don't quite go well on the first try. And it has to be very similar to the environment of KSP: the AI is going to learn how things go in the environment it trained in, but if the environment it then moves around in changes a bit, I'm afraid it would get totally lost. That's more related to the way machine learning models work; they are very sensitive to this kind of change, and usually we try to avoid it as much as possible. But since I'm only using altitude, speed and throttle at this point, it might well be possible. The big issue might be when trying to scale this up: basically I'm going to have to make KSP from scratch if I want to do this ^^ It could also be an interesting benchmark to test the machine learning model itself, not necessarily to train it for KSP, but just to see that in this environment it does learn properly. Because right now, I don't know if my problem actually is a lack of data. It might be, but it might also not be. But the big benefit is indeed that once it's working, it would speed up training by multiple orders of magnitude. (And not to mention it would probably be a great exercise for me to try to implement that.) The idea is quite tempting, at least for this easy first case. I'll have a more in-depth look at what it would take to do this, because on paper it is a good idea, and I'll let you guys know. (I just have a thesis to finish at the moment, so I'm not sure I'm going to have a lot of time to devote to this, unfortunately.)
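To give a sense of scale for the "easy" case, here's roughly what a bare-bones 1D version of such an environment could look like: constant gravity, lumped quadratic drag, mass decreasing as fuel burns. All the numbers are made up; this is a sketch, not a KSP-accurate sim:

```python
class ToyRocket1D:
    """Bare-bones 1D launch sim: altitude, velocity, mass. Made-up constants."""

    G = 9.81           # m/s^2, treated as constant
    THRUST = 200e3     # N at full throttle
    BURN_RATE = 60.0   # kg/s at full throttle
    DRAG_COEF = 0.3    # lumped quadratic drag coefficient

    def __init__(self):
        self.reset()

    def reset(self):
        """Put the rocket back on the pad and return the initial state."""
        self.altitude, self.velocity = 0.0, 0.0
        self.mass, self.fuel = 10e3, 4e3  # kg
        return (self.altitude, self.velocity, self.fuel)

    def step(self, throttle, dt=0.1):
        """Advance the sim by dt seconds for a throttle setting in [0, 1]."""
        if self.fuel <= 0.0:
            throttle = 0.0
        drag = self.DRAG_COEF * self.velocity * abs(self.velocity)
        accel = (throttle * self.THRUST - drag) / self.mass - self.G
        self.velocity += accel * dt
        self.altitude = max(0.0, self.altitude + self.velocity * dt)
        burned = min(self.fuel, throttle * self.BURN_RATE * dt)
        self.fuel -= burned
        self.mass -= burned
        return (self.altitude, self.velocity, self.fuel)
```

A training loop would then just call step(throttle) a few dozen times per simulated flight, thousands of flights per second, which is the whole appeal compared to real-time KSP runs.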
  13. Yeah, I've been following Code Bullet for a while! But, as you pointed out, KSP is quite different from Pac-Man or the other games he plays, because he usually makes the environment himself, whereas I'm using one that already exists: KSP. So I'm trying to work around the limitations of KSP, which clearly wasn't made with people trying to build AIs in mind. But it's an interesting project! I did the live stream, thanks to those who tuned in! I have a recording of it, but it's not on Twitch; I don't know if I can upload it now that the stream is over. (I streamed for the first time, so of course a lot of things didn't go as expected!) Anyways, the takeaway is that my overall program runs for the moment, but it's just not actually training the way I'd like it to. So there's going to be a lot of work to figure out what the issue is and fix it. I'll post a bit more later, I think.
  14. WOOHOO, I'm not dead (and I've actually been working a bit)! Just wanted to let you guys know, I think I have a little something that seems to be working. The end result is actually very bad, but I think that's mostly due to lack of data (and maybe to my skills too). That shouldn't be a problem for too long (I hope). Anyways, if you guys are interested, I'm going to be streaming what I found and explaining it tomorrow on Twitch: so, on May 12th, at 1:00pm UTC (I live in Taiwan at the moment, and a lot of the people I know that play KSP are in France, so that's the time that best fits everyone). That makes it 9am EDT and 6am PDT for the folks in the US. Hope you guys will be interested! Though I should warn you: don't get your hopes up, this is mostly a first test of linking KSP, Python and some machine learning together, it's faaaaaaaaaaaaar from perfect. (I promise I didn't harm any Kerba... Nah, nobody believes that, you know how Kerbal science is done.)
  15. Just saw your post now! Thanks for sharing. I didn't talk about that, but I'm most probably going to be using TensorBoard, sticking to what I know. It's a great tool for looking into models in TensorFlow / Keras. No news on this project for the moment; I have exams coming up, and I'm also spending a lot of time preparing for internship interviews. Hopefully when that's past I can go back to KSP.
  16. That's maybe going to be another step afterwards. Recording it really isn't the priority; that's why I'm thinking of live streaming, it's not much effort. I really want to focus on the machine learning itself.
  17. That's pretty much the plan: trying to have some idea of what I'm doing and where I'm heading, to try to make it even remotely interesting to watch. It's going to be a bit of trial and error though, since I've never done this before, but it sounds appealing. I'll let you guys know how it goes!
  18. I am totally aware of that; I've been training models for a few months now, and I know it's not the most interesting thing ever to watch. It's not really the learning itself I'd like to stream, because there's nothing to do or even really watch during that time. It would rather be the making of the program, and testing it to see if it correctly has access to all the controls, before actually training it. During learning, I'm not even at my computer; I just let it do its thing, because it mostly is computation. I think it could also be a good opportunity to explain to people what I'm doing, and maybe answer a few questions if people have any. While neural networks and AI are big buzzwords at the moment, I do think there's a lot of confusion around them, and this could be a good opportunity to explain the little I know to people who are interested. I just want to give this a try; maybe it's not going to work out, we'll see. Editing requires a whole different level of skill that I don't have, and a lot of time that I don't know if I want to spend doing that. That's why I was thinking about streaming.
  19. Question: would you guys be interested if I streamed some of the tests I am doing to try to get this AI to fly? These would obviously be a bit messy, with a lot of things going wrong, but I can promise explosions, at least. Not sure how this would go, as I've never done this before, but I'm thinking about it. What do you guys think? Let me know if you'd be interested, and also if you wouldn't be! I'd still keep updating this post each time I make some meaningful progress.
  20. A little update, since it's been a few days since I last posted here. I have been talking about this project with two friends of mine, one being @Dakitess (whom I've talked with about this for a few months already), whom some of you might be familiar with. He's kind of my KSP guru: whenever I have KSP-related questions, he's the one I go to. Basically, the conclusion I'm at for the moment is that this requires neural network features that I am not familiar with. I already knew I was entering uncharted seas as far as my knowledge of reinforcement learning goes, but I didn't think I'd have this many things to learn. I was wrong. I have started learning to use OpenAI's Gym, to get an understanding of how it works and what I can do with it. I've learned quite a lot with this, but it's not going to be enough for Bertrand. One of the easiest things you can do is implement a neural network that learns how to stabilize a pole on a sort of cart, which can only go left or right. Once trained, here's what it looks like: This might not look impressive, but the pole is in a very unstable situation; at the slightest move, it falls. Except here the neural network learned how to stabilize it, not perfectly, but well enough. Yeah, nothing impressive, but it's actually not that straightforward to code. Here's the best tutorial I found for this specific example (the only difference being that the guy uses TensorFlow in this article, and I went for Keras, which I find much simpler to use and work with. I also have more experience with Keras and feel much more confident using it. For this project, that should work, hopefully). Let me explain how this works (a condensed code sketch of these steps follows at the end of this post): Step 1: We tell the agent (that refers to the program that takes the actions, here either left or right) what actions it can take in the environment (here, the game) and how to calculate its score: you gain points for each second the pole is straight up, and if it tilts beyond 15 degrees, you lose. The aim is to maximize this score, aka keep the pole straight as long as possible. Step 2: Get some initial data to train our neural network on. This is pretty easy in this case: just ask the agent to take random actions until the pole falls. Since it takes random actions, it usually lasts one or two seconds before the pole tilts more than 15 degrees, but we don't really care. We just want to see what each action does. We keep track of every action the agent takes, and the resulting position & velocity of the cartpole. Step 3: Feed all that data into the neural network, telling it to figure out how to maximize the score (here, keeping the pole up as long as possible). That's what neural networks are good at: finding patterns to maximize (or minimize, depending on how you see it) a certain function. Here a very basic neural network is enough. At the end of this stage, the neural network is trained and ready to make predictions. Step 4: At each time step, take the position of the cart and the pole, give it to the neural network as input, and ask it to predict the best move to make (either left or right) to keep the pole balanced. We take that move and observe the result, which gives us another state. We feed that again to the neural network for a prediction of the best action to take this time. We do this for every time step, and voilà! (Step 5: ...? Step 6: Profit!) So there we have it, a neural network trained to steer a cart to keep a pole steady.
That took me about 2 evenings to figure out, well, shamelessly copy, while adding and testing new things along the way to see how different elements of the code worked, and understanding the ins and outs of it. So that's pretty cool (okay, maybe not; I find it cool at least, especially once you've worked on it for a few hours). But that's not really teaching a rocket to fly now, is it? Well, nope. The big difference here is that the cartpole's actions only depend on the state it is in right now, so it's actually pretty easy to figure out: we could take a picture of the cart and ask the neural network what to do, meaning nothing depends on the previous states the agent was in. That's not going to be as easy with a rocket, because each state of flight is different: the atmosphere's properties change as we go higher, the air resistance also changes with our speed and altitude, and the agent itself (here the rocket) changes mass as the flight goes on, burning fuel. So we need some sort of time dependency. And that's what I don't know yet, and where I need to leave this project aside for some time, while I go learn about those techniques, how they work, and even more importantly what we can do with them and how to use them. (For people interested, this seems to mostly mean LSTM layers; there's a minimal sketch of the idea at the end of this post. I'm also looking at the A3C algorithm, since it seems to be pretty much the state of the art, and what the guys that did this before me used. But that requires multiple instances of the game at once, so that's another challenge; even if it shouldn't be too complicated to run multiple instances of the game at once, I don't know if my little laptop will really be able to handle it.) I'm also working on other machine learning projects at the moment, and I'm looking for a 6-month internship too, so this project isn't getting my full attention. But I'm still super motivated to work on this, and I do believe it's feasible! I'll let you know if there is any other major step forward, but it's probably going to take me multiple weeks to get a full grasp of these time-dependency neural layers, and I'm also learning reinforcement learning as I go. I'm pretty new to this type of AI, but it's super interesting to learn so much, especially with a project like this to apply it to! Basically, take a seat and don't expect to see anything too soon! I still hope this is at least interesting to follow :)
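Since I went for Keras rather than TensorFlow, here's a condensed sketch of steps 2 to 4 above in that style. The episode count, score threshold and layer sizes are arbitrary placeholders, and this uses the older Gym API where step() returns 4 values; the tutorial linked above does all of this more carefully:

```python
import gym
import numpy as np
from tensorflow import keras

env = gym.make('CartPole-v1')

# Step 2: random play, keeping only the runs that happened to score well.
def collect_training_data(episodes=10000, min_score=50):
    X, y = [], []
    for _ in range(episodes):
        obs = env.reset()
        memory, score = [], 0
        for _ in range(200):
            action = env.action_space.sample()   # random left/right
            memory.append((obs, action))
            obs, reward, done, _ = env.step(action)
            score += reward
            if done:
                break
        if score >= min_score:                    # keep the lucky runs
            X.extend(o for o, _ in memory)
            y.extend(a for _, a in memory)
    return np.array(X), keras.utils.to_categorical(y, 2)

# Step 3: fit a small network mapping observation -> action taken in good runs.
X, y = collect_training_data()
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, y, epochs=5)

# Step 4: play, asking the model for the best move at every time step.
obs, done = env.reset(), False
while not done:
    action = int(np.argmax(model.predict(obs.reshape(1, -1), verbose=0)))
    obs, _, done, _ = env.step(action)
    env.render()
```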
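And since I mention LSTM layers just above: the rough idea, as far as I understand it so far, is to feed the network a short window of past states instead of a single snapshot, so predictions can depend on how the flight is evolving. A minimal Keras sketch with made-up sizes:

```python
from tensorflow import keras

TIMESTEPS, FEATURES = 10, 3  # made up: a 10-step window of 3 values per step

model = keras.Sequential([
    # The LSTM consumes the whole window, so the prediction can depend on
    # how the state has been evolving, not just on the current snapshot.
    keras.layers.LSTM(32, input_shape=(TIMESTEPS, FEATURES)),
    keras.layers.Dense(2, activation='softmax'),  # e.g. throttle down / up
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```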
  21. I did try it on my PC first, but using localhost only. Thanks for the Discord link, going there right away!
  22. 1st UPDATE: I'm stuck. That was fast. Seriously though, I'm trying to connect kRPC to a Google Colab notebook, to run all of the code online and keep all my local resources for running KSP. Colab gives you a free GPU to use, and that's really the reason why I'm trying this. So I've tried doing this:

    !pip install krpc
    import krpc
    conn = krpc.connect(name='Web testing', address=ip, rpc_port=50000, stream_port=50001)

with ip assigned my IP address just above that last line, and a websocket server launched in kRPC with that same IP address, RPC port and stream port. Basically nothing happens, and eventually the notebook just tells me 'Connection timed out'. (No issue installing kRPC with pip nor importing it; that works fine.) I think I'm missing something here; does someone have any idea what I could be doing wrong? I should mention I don't really know much about servers and protocols, I'm entering uncharted territory here.
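In case it helps anyone spot the problem, here's the client side cleaned up, with the two things I now suspect (not verified): the stock Python client speaks kRPC's protobuf-over-TCP protocol, so the in-game server probably needs to run in that mode rather than as a websocket server; and since Colab runs in Google's cloud, the address has to be a public IP with ports 50000/50001 forwarded to my machine, as a LAN address will just time out:

```python
!pip install krpc
import krpc

# Must be reachable *from Colab*: a public IP with ports 50000/50001
# forwarded to the machine running KSP. A LAN address will time out.
ip = 'my.public.ip.here'  # placeholder

conn = krpc.connect(name='Web testing', address=ip,
                    rpc_port=50000, stream_port=50001)
print(conn.krpc.get_status().version)  # quick sanity check of the link
```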
  23. That's what I have in mind if this doesn't work. But that feels a bit like cheating, to be honest ^^ And I do also want to blow it up a lot, just for fun. But if it never gets past the blowing-up phase, then I'll obviously have to try something else. I just don't want to discard the fully unsupervised learning without even trying it in the first place.
  24. Hence wanting to run multiple instances of the game at once. If I can outsource the code running Bertrand to Kaggle or Colab, my laptop only has to handle running KSP; that would be a good starting point, I think. I also need to make the restart conditions as efficient as possible: as soon as the rocket gets onto an unrecoverable trajectory, restart. That's going to be a bit tricky to code, but I don't think it's unfeasible.
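The check itself can stay dumb; something along these lines is what I have in mind, using kRPC's quicksave/quickload (the "unrecoverable" condition below, falling while under a cutoff altitude, is just a placeholder):

```python
import time
import krpc

conn = krpc.connect(name='Auto restart')
sc = conn.space_center
vessel = sc.active_vessel
flight = vessel.flight(vessel.orbit.body.reference_frame)

sc.quicksave()  # snapshot on the pad before the run starts

def unrecoverable():
    # Placeholder condition: clearly falling while below a cutoff altitude.
    return flight.vertical_speed < -1.0 and flight.mean_altitude < 500.0

while True:
    if unrecoverable():
        sc.quickload()              # back to the pad, start the next run
        vessel = sc.active_vessel   # re-grab handles after the reload
        flight = vessel.flight(vessel.orbit.body.reference_frame)
    time.sleep(0.5)
```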
  25. That's the stream I was talking about, but I can't seem to find it. If any of you guys knows where I can find it, I'd be more than happy to have a look! And I know this is going to be a complicated thing; I'm not even saying it's going to work. I'm just a little bit confident in the fact that I think it can work (that doesn't sound very confident, does it? ) Now, just because some have tried and not really succeeded doesn't mean it's not possible. Flying a rocket isn't that hard, I think, and that's also why I'm narrowing down the parameters Bertrand will be able to interact with. I've even been thinking about only giving him access to pitch and throttle. Does anybody know if there's a mod to lock the game in yaw and roll? That would basically make the game a 2D environment, which could be interesting too. About learning Go, that's not really what happened, and there are mostly two different algorithms: AlphaGo and AlphaGo Zero. AlphaGo used human games to learn from, a little bit like any human player would. But what AlphaGo Zero did was totally unsupervised learning: it played against itself, only, and got better from there. This means it actually came up with strategies that had never occurred to humans before, and it completely smashed the original AlphaGo. So no, AI is really not just brute forcing in that sense, otherwise it would be just any other program. What is commonly referred to as AI, and what I want to try here, is a little bit like how humans learn: trial and error, with feedback on your progress. In Bertrand's case, the feedback is the loss function I talked about in the original post. To come back to the AI taking hours, days, weeks or months to learn how to fly a rocket in the previous streamer's attempt: that's one thing I'm going to try to optimize, how long each epoch takes, because I'm going to need to go through hundreds or even thousands of them. If it turns out to be too complicated, I might need to record myself playing and feed that to Bertrand to decrease the learning time. But that's just not as fun. I'm also trying to figure out if my gaming beast of a computer would handle running two instances of KSP plus Bertrand. Of course I'll only try with one in the first place, but running 2 games at once could double the speed of data collection. The guys that made their neural network fly a rocket had 6 instances of KSP running on a 1080 Ti, and their algorithm on a dedicated server. I don't really have that, so we'll see what's possible. But then, they didn't say at what framerate, graphical settings or resolution they were playing. I do believe we could go pretty low before it actually becomes a problem. But we'll see that a bit later as well. For now I mostly need to get familiar with OpenAI's Gym, I think. Anyways, I do have my skepticism too, but I still believe this could be achievable, so I guess there's only one way to find out!
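(To be clear about what "feedback" means here: it can be as blunt as a toy score like the one below. This is purely illustrative, with invented weights; the actual function is the one defined in the original post.)

```python
def score_run(final_altitude, fuel_used):
    """Toy feedback signal: reward altitude gained, lightly penalize fuel.
    The weights are invented for illustration only."""
    return final_altitude - 0.1 * fuel_used
```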