Everything posted by Lisias

  1. Announce! TweakScale Companion for Firespitter 1.3.0.2 is in the wild. Fixes a screw-up in the distribution. Thanks to kmsheesh for the heads up! Download here or in the OP. TweakScale Companion for Frameworks 0.4.0.4 is in the wild. Correctly scales System Heat. Download here or in the OP. The ÜberPaket will be updated Soon™
  2. Please don't! I'm planning to use your metadata to double check what I'm doing - I don't wanna lose content due to some unexpected condition not handled by the stack! Cross checking is the key to guarantee that. Thank you!
  3. Your report is incomplete and alarmist. Yes, you found a problem. But you failed to report that other threads were fetched all right. I agree that this is still work in progress. I disagree that it's useless. It's just not ready yet. My guess is that @bizzehdee's crawler is failing to detect when the response returns an empty page under an http 200. I suggest checking if the response is valid and, if not, sleeping a few seconds and trying again - see the sketch after this post. It's what I was doing, by the way, when I accidentally fired the crawler without auto throttle and got a 1015 rate limit from Cloudflare... oh, well... I will do some KSP coding in the meantime. Which reminds me:
NEWS FROM THE FRONT
I gave the 1 finger salute to pywb's wombat. I'm doing the crawling using Scrapy and a customized script to detect the idiosyncrasies, and pywb is now set up as a recording proxy - something it really excels at. The only drawback is the need to set up a redis server for deduplication. On the bright side, the setup ended up not being too memory hungry; I'm absolutely sure I will be able to set up a Raspberry Pi 4 (or even a 3) to do this job! Setting up a public mirror, however, may need something more powerful (but I will try the RaspPi the same). For replaying, you need a dedicated CDX server in Java to be minimally responsive. And, yes, the thing is creating WARC files like a champ. This solution is 100% interoperable with Internet Archive and almost every other similar service I found. If I understood some of the most cryptic parts of the documentation, we can federate each other's mirrors on pywb itself, saving us from NGINX and DNS black magic. Note to my future self: don't fight the documentation, go for the Source!
=== == = POST EDIT = == ===
Oukey, the 1015 ban was lifted while I typed this post from my mobile. Back to @bizzehdee, here follows a thread that was fetched correctly:
https://github.com/bizzehdee/kspforumdata/blob/main/topics/1000-meta.json
https://github.com/bizzehdee/kspforumdata/blob/main/topics/1000-1-articles.json
Again, the crawler needs some work to work around Cloudflare's idiosyncrasies (the http 200 with an empty page being the most annoying), but the tool is working almost fine. And this parsed data will be very, very nice to feed a custom search engine!
=== == = POST EDIT² = == ===
I found an unexpected beneficial side effect of using a local, python based crawler - now it's feasible to distribute tasks! Once we establish a circle of trusted collaborators, we can divide the task into chunks and distribute them among the participants. This will lower the load on Forum, save bandwidth for each participant and accelerate the results. As soon as I consolidate the changes and fixes I did during the week on this repo, I will pursue this idea.
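(Something along these lines - a minimal, untested sketch using Python's requests library; the function name, parameters and the example URL are mine, not from @bizzehdee's crawler.)

#!/usr/bin/env python3
# Minimal sketch: fetch a Forum page, treating an HTTP 200 with an empty body
# as a failure and retrying after a growing delay.
import time
import requests

def fetch_with_retry(url, attempts=5, base_delay=5.0):
    for attempt in range(1, attempts + 1):
        response = requests.get(url, timeout=30)
        # Cloudflare sometimes answers 200 with an empty page; treat it as a miss.
        if response.status_code == 200 and response.text.strip():
            return response.text
        # Back off a little more on every failed attempt to avoid a 1015 rate limit.
        time.sleep(base_delay * attempt)
    raise RuntimeError(f"gave up on {url} after {attempts} attempts")

if __name__ == "__main__":
    html = fetch_with_retry("https://forum.kerbalspaceprogram.com/topic/1000-example/")
    print(len(html), "bytes fetched")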
  4. Fairly interesting approach. I will give this a peek during the night - you are essentially "competing" with WARC. From my side, I will bite the bullet and insist on pywb, though I'm not exactly happy with the multi-daemon solution they chose (external CDX indexer). The direct alternative, OpenWayback, is deprecated, and the Internet Archive tools are even less user friendly. I found an external crawler, by the way, that I can rely on to do the crawling instead of injecting wombat into the javascript land on the browser. The single binary solution had already gone through the window, anyway...
  5. It would not be retaliation, it would be damage control. Do you know the phrase "We don't negotiate with terrorists"? It's the same principle. If they budge to any harassment campaign, they will encourage more harassment campaigns in the future. It's as simple as that. As a matter of fact, it's probably what I would do - if this Forum became a magnet for toxic people, I would not want my name associated with it... They have other games and other Forums to care about. There're places for harsh measures against bad company policies in a Society - but this, definitely, is not one of them. You can't bully people into caring.
  6. Both @Lonelykermit and @Fizzlebop Smith had already addressed why this would be a terrible idea. My turn is to explain how to rework it in a way that could help us. First things first: they know they screwed the pooch, badly. They don't need us to remind them of it all the time. Whether they are guilty, responsible or victims of the problem, they are still humans, and people don't like to have their borks rubbed in their face all the time. So, what exactly do we want? Well, we want to keep Forum running. Who can do that? T2. Now, put yourself in their shoes: if YOU were the one responsible for all this mess, including Forum, how would you want people to reach you about it? Being called all day, as if you were a deadbeat being pursued by collectors? I don't know about you, but if something I'm paying for starts to cost me my money and my patience beyond the potential earnings I could get from it, I would shut down the damned thing. So, another line of action is needed. We need to reach a win-win situation - we keep the Forums, and they get something back (or avoid losing something). So this brings the question to the table: what are we willing to do to help them help us keep the Forum alive? I'm open to ideas. But one possible idea would be, well, writing a letter. A good, old, out-of-fashion, polite and supportive letter - handwritten and sent by post, paying for the stamp. Or something like that. People writing letters and paying for the stamp shows engagement and interest. Sending emails and electronic messages is low effort, but a properly written letter to T2's ombudsman or whoever is responsible there for handling customers is something else. Of course, I don't have the slightest idea if this would work or not, but at least it will not hurt either - which makes it a way better option than pestering them until someone decides that enough is enough and pulls the plug to cease the harassment. Just my 2 cents, anyway - perhaps doing nothing would be the best option.
  7. Ugh... You are right, I remember cursing this once but completely forgot about it... Well, I will not forget it again! https://github.com/net-lisias-ksp/DistantObject/issues/43
  8. They didn't. They aimed somewhere behind the Channel, hoping that some of the V2s would hit London! They managed to be right about 50% of the time!
  9. Yes. This has been happening since the first time that horrible perversity called PD-Launcher showed up. Some people just don't grasp the idea that you just can't launch a program from a different directory than the one it was meant to be launched from... Private Division didn't help either, as they were the first ones to change the CWD on KSP (which had been the same for more than 10 years). What add'on authors are doing (and what I'm going to do with DOE too on the next release) is to just sweep the dirt under the rug, doing some shenanigans to find the right place instead of trusting KSP to tell us the correct place (as it is being fooled by the problem itself), and call it a day - the general idea is sketched after this post. Not because this is a good fix, because it's not - it's terrible, because KSP will still misbehave, shoving files into the wrong place - but because no one is going to really fix the problem, and most people just blame the add'on author instead of understanding where the problem really is. Thank you very much, by the way, for being one of the people that really understand the problem instead of shooting the messenger! There's a pretty complete essay on : DOE is the result of the efforts of many people, and in the name of @Rubber Ducky, @MOARdV and @TheDarkBadger (the previous authors before me), I thank you. Cheers!
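(Just to illustrate the general idea - a generic Python sketch of the workaround, not DOE's actual code, which is C#: resolve paths from where the program really lives instead of trusting whatever CWD the launcher happened to set.)

#!/usr/bin/env python3
# Generic illustration: the CWD is whatever the launcher set it to, while the
# program's own location is stable. Resolve data paths from the latter.
import os
import sys

def install_root():
    # Where this program actually lives, no matter who launched it or from where.
    return os.path.dirname(os.path.abspath(sys.argv[0]))

if __name__ == "__main__":
    print("CWD (unreliable):     ", os.getcwd())
    print("Install root (stable):", install_root())
    print("Settings would go to: ", os.path.join(install_root(), "PluginData", "Settings.cfg"))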
  10. Oukey, so now I know what to do. This launcher stunt used to work in the past, but PDLauncher screwed even this trick. Currently, the least bad option is to use KSSL. Better solutions are possible, however. But I'm not seeing people willing to implement them, unfortunately.
  11. The <KSP_Root>/GameData/DistantObject/PluginData/Settings.cfg file is a template, it's used only when no working settings file is found. The real settings, once you change something and click on the Apply button, can be found in <KSP_Root>/PluginData/DistantObject/Settings.cfg. You not being able to find your KSP.log means that you are launching KSP using that unfortunate Steam launcher hack. Please don't do that, KSP itself doesn't behave correctly when you do that. Without the KSP.log, my hands are tied anyway. The MiniAVC.log is useless, I don't maintain MiniAVC. In truth, you should delete all instances of the MiniAVC.dll file in your GameData. There's just no need for it nowadays. Since I'm guessing you are using the PD-Launcher override (mis)stunt, I think you may find it inside the PDLauncher directory. Check it, please.
  12. This tool is no simple thing, either. I spent a lot of time just trying to set up the damned thing - but once you do the dirty work, it just works. There're tools to build WARC files from dumped files, but you lose what's most important - the metadata that guarantees the data wasn't tampered with. One way or another, once I manage to get this grimacing thing working, I will share everything (somehow), so you can have it if you want. Yes. That being the reason I decided to "go rogue" and do things the hard way. Good idea. The problem I have with archive.org is that I already detected that some pages are missing from the history, and every time I tried to add these pages to the crawler, I was greeted with an error message, to the point I started to think that TTI had issued a take down on them. Good to know I was wrong, but I still have the missing pages problem to cope with. But, still, it's a good idea - they are not mutually exclusive solutions. As a matter of fact, in theory I can add the waybackpack WARCs to be indexed and served by pywb the same (see the sketch after this post). Once I finish installing this Kraken damned tool (see below), I will pursue this venue too.
=== == = NEWS FROM THE FRONT = == ===
The tool is working (finally), except for crawling. There were no instructions about how to deploy some browser side dependencies, not to mention that I'm using Firefox, which has some javascript shenanigans that demanded some changes while deploying - so, yeah, once this thing is working, some pull requests will be made. Right now, I'm cursing the clouds because a browser side library was migrated to TypeScript, and I'm installing a node.js (blergh) environment to compile the damned thing into javascript and then deploy it. All this work will be available to whoever wants it; I will publish a package with batteries included to make the user's life easier - or less harsh: this tool is a "professional" archiver, it's way less friendly than httrack, for example.
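(For the record, feeding already downloaded WARCs into a pywb collection goes more or less like this - a sketch from memory of pywb's wb-manager; the collection name and file paths are placeholders.)

#!/usr/bin/env python3
# Sketch: create a pywb collection and add some already downloaded WARC files to it.
import glob
import subprocess

COLLECTION = "ksp-forum"  # placeholder collection name

subprocess.run(["wb-manager", "init", COLLECTION], check=True)
for warc in sorted(glob.glob("downloads/*.warc.gz")):
    subprocess.run(["wb-manager", "add", COLLECTION, warc], check=True)
# After this, running pywb's `wayback` server should replay the collection locally.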
  13. Finally answering your original question, yes. That material is hot. However... Only the application/http documents are saved in the WARC files. I would like to have the images hosted on Forum archived too, but whatever. This is easily fixable with the tool I chose, pywb. But there's a catch - the pywb tool apparently doesn't agree with Internet Archive about how to calculate the digests, and so I ended up wasting some time redownloading the damned thing thinking that the download was corrupted somehow (the dumb-ass typing this post only thought of using gzip --test after redownloading the freaking gzipballs). I'm currently reindexing the archive to see if it will ignore the digest, or if I will need to fix the ~65 GB of http dumps myself - Kraken save BTRFS with compression activated, it's saving a lot of I/O here. The way I'm inspecting the records is sketched after this post. I will come back to you as soon as I manage to import this data into my current pywb collection.
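(For whoever wants to poke at those gzipballs too, something like this does the trick - a sketch using the warcio Python library; the file name is a placeholder.)

#!/usr/bin/env python3
# Sketch: walk a (gzipped) WARC file and list the response records inside it.
# Handy to eyeball what was actually captured before fighting over digests.
from warcio.archiveiterator import ArchiveIterator

def list_responses(path):
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                uri = record.rec_headers.get_header("WARC-Target-URI")
                date = record.rec_headers.get_header("WARC-Date")
                print(date, uri)

if __name__ == "__main__":
    list_responses("forum.kerbalspaceprogram.com-00000.warc.gz")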
  14. Please take a peek at this post and this other one. Oh, and at this one too! There's a WARC "dump" up to 2023-05 already on WebArchive. It's about 8G of packed data, but it's already fetched from Forum and, so, there's no need to fetch it again. Currently, I'm working on downloading that thing, and then I will create a complementary WARC over the 2023-05 one, and then I will see how to feed these data files into the wild (probably by a torrent). With all these data files on hand, we will be able to do some interesting things - but please read my post above where I discuss the legalities.
=== BRUTE FORCE POST MERGE ===
To anyone willing to download the Internet Archive data, this dataset doesn't have a torrent, unfortunately. So I made this little script to download that huge basket of bytes using wget, with the option to resume the downloads if things go south in the process. Worst case scenario, you run the script again; no data loss.
#!/usr/bin/env bash
for f in \
    forum.kerbalspaceprogram.com-00000.warc.gz \
    forum.kerbalspaceprogram.com-00000.warc.os.cdx.gz \
    forum.kerbalspaceprogram.com-00001.warc.gz \
    forum.kerbalspaceprogram.com-00001.warc.os.cdx.gz \
    forum.kerbalspaceprogram.com-meta.warc.gz \
    forum.kerbalspaceprogram.com-meta.warc.os.cdx.gz \
    forum.kerbalspaceprogram.com_202305.cdx.gz \
    forum.kerbalspaceprogram.com_202305.cdx.idx \
    forum.kerbalspaceprogram.com_202305_files.xml \
    forum.kerbalspaceprogram.com_202305_meta.sqlite \
    forum.kerbalspaceprogram.com_202305_meta.xml \
; do
    wget --continue "https://archive.org/download/forum.kerbalspaceprogram.com_202305/$f"
done
  15. Hi, just to be sure... We had already tackled this on TweakScale's thread, right?
  16. You have a point. I will not try to sweeten the pill on the subject, so I will just address the following point: Yes, and providing tools and teaching people to do that is my goal. However, doing that indiscriminately will hurt Forum, prompting someone to take down the initiative - so I decided to go WARC on the thing, so we can share the archives among ourselves, saving Forum some bucks in bandwidth. Additionally, since anyone can do their own archive and compare the results, this will keep people (including me) honest. There're legally binding terms published on this Forum, and any change to some of them would be considered fraud - having more people with the same data is a safety measure for everybody involved, as we can support each other in the case of a dispute. I completely agree that plainly mirroring the site is a bad idea. In order to have a chance to survive, the Archives must try their best to be plausibly considered fair use in a Court, not to mention gathering people to support our case, prompting TTI (or anyone that ends up buying the lemon IP) to weigh any earnings from taking the thing down against the drawbacks in P/R, and to decide it's in their best interest not to intervene in a destructive way. However, we need to help them to help us (willingly or not). So we need to address some elephants in the room (and, yeah, you are really right on the money):
Impersonation: Dude, this is absolutely a no-go. Under no circumstances can one republish Forum's data in a way that may lead people to believe that you are them. So you just can't publish a mirror of the thing ipsis litteris using a different URL.
Plagiarism: Ditto! If you change the content in an attempt to prevent the Impersonation, you are... well... changing the IP and publishing a derivative!!! This is piracy, as simple as that.
Copyright: Our only hope of success is to rely on the Copyright loopholes that may allow us to legally do this stunt.
Given the above considerations, I concluded that going Internet Archive is the most viable solution. The Look and Feel makes absolutely sure you are not impersonating Forum or TTI, the content is preserved, preventing plagiarism, and since the Internet Archive managed to legally publish their archives, this is a precedent that we may use to do the same. TTI will always have the right to file a DMCA against anyone publishing such an Archive, however. To tell you the truth, they can do it even against our personal sites about the franchise (see Nintendo). So let's discuss what would prompt them to do that:
Risk 1: Risk of losing control of the IP
Risk 2: Devaluation of the IP
Risk 3: Loss of revenue (direct or indirect)
Risk 4: Someone on TTI waking up in a bad mood in the morning
Going Internet Archive style mitigates Risk 1 and Risk 2 - as a matter of fact, having this content preserved in case of the worst may even salvage some of the IP's value, as invaluable content to reboot the Community will still be available to anyone owning the Franchise in the future - it's notorious that even Nintendo had to rely on "backup sites" to be able to publish some of the ROMs they had sold in cartridges in the past! Risk 3 is something we don't have to worry about, as Forum doesn't generate direct revenue - and the indirect ones we have covered by mitigating Risks 1 and 2. About Risk 4, the only defense we have is P/R. They had a huge backlash from the KSP2 drama, and that hurts - right now I'm pretty sure there's someone there overseeing everything to prevent another one. Bad P/R costs them money, huge amounts of money.
And they are in the game (pun not intended) for the money. So, as long as we manage to help them to help us (willingly or not), we have a reasonable chance to score this stunt. (Ab)using Game Theory a bit, these are the possible outcomes (as long as we stick to the rules I'm trying to delineate):
We do the Archiving, the Forum survives: Content preserved.
We do the Archiving, the Forum dies: Content preserved.
We do not do the Archiving, the Forum survives: Content preserved.
We do not do the Archiving, the Forum dies: Content is lost.
Since our main (and only) goal is the survival of the Content (as nobody here is going to make any money, directly or indirectly, with it), where are the better chances of saving the Content? Well, doing the Archive ourselves. So the logical decision is doing the Archive. But, then, we risk being taken down by a DMCA, right? What are the possible outcomes?
Forum survives, TTI issues a take down on the Archives: Content preserved.
Forum survives, TTI ignores the Archives: Content preserved.
Forum dies, TTI ignores the Archives: Content preserved.
Forum dies, TTI issues a take down on the Archives: Content is lost.
Again, since our goal is the preservation of the Content, it's our best move to do the Archiving anyway. What does it matter if TTI takes the Archives down in the future, as long as Forum is alive? And if Forum ever dies, it's still in their best interest to preserve the content, as any future reboot of the franchise would benefit from it. Heck, I would not be surprised if someone on TTI ends up making a copy of our Archives for themselves.
=== == = POST EDIT = == ===
  17. She Caught the Katy - the Blues Brothers. “Katy” is the nickname for the old Missouri-Kansas-Texas Railroad (MKT). The singer's woman took a train on the MKT railroad, leaving him behind.
  18. Security will be a problem. Granted. I may be wrong, but I think that TTI cut the number of servers for Forum. And Cloudflare is not configured to cache html content (which would cause some esoteric misbehaviours on a Forum), so if enough people access the Forum at the same time, something breaks (probably the database) and we get a bad gateway due to timeout, or due to plain quota exhaustion.
  19. <n> (where n > 1) federated hosts registered on a central http server, which does a temporary redirect (http 307) in round robin to the registered mirrors. A daemon pings the known mirrors, removes them from the pool when they go offline, and puts them back when they come back online (a minimal sketch follows this post). I have done it in the past; really, a piece of cake. Again, we are not replacing the Forum, we are preserving a mirror of it in a Community effort for historical reference. We still have a single point of failure, the central http server, but this one is incredibly easy to replace. And, yeah, any source code will be available under an OSI license. Anyone will be able to have their own "Federation", I'm not centralizing anything on me. Well, what I have in mind is a static and stateless mirror for historical reference. Distributed by many hosts from volunteers, so if one goes down, there're many others to keep the thing online. Now, a full Forum replacement, that is another problem. First things first, however. Great.
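(A minimal sketch of that central redirector, using only Python's standard library; the mirror URLs are placeholders, and the health-check daemon is left out.)

#!/usr/bin/env python3
# Minimal sketch of the central redirector: answer every request with an
# HTTP 307 pointing to the next mirror, in round robin. Health checking and
# mirror registration/removal are left to the daemon mentioned above.
from http.server import BaseHTTPRequestHandler, HTTPServer
from itertools import cycle

# Placeholder mirrors; in the real thing this list would be maintained dynamically.
MIRRORS = cycle([
    "https://mirror-one.example.org",
    "https://mirror-two.example.org",
])

class RoundRobinRedirector(BaseHTTPRequestHandler):
    def do_GET(self):
        target = next(MIRRORS) + self.path
        self.send_response(307)  # Temporary Redirect preserves method and path
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RoundRobinRedirector).serve_forever()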
  20. This is starting to get interesting... [EDIT: Involuntary offtopic spoiled.]
  21. Some pages are missing, just checked. That Internet Archive tool doesn't work for what I intended (it was a CLI tool to query their database), but I found a fork of another Internet Archive tool that makes incremental updates if you give it a previous backup: https://pywb.readthedocs.io/en/latest/manual/configuring.html It does backups in the very same way the Archive does. I think this is going to save a lot of bandwidth in the long run - I still hope we are overreacting, but as I said before: hoping for the best, but expecting the worst. Now I'm trying to find a way to allow people to salvage their private messages. This is going to be tricky, because I will need to provide some kind of application bundle (fortunately, Python Freeze is our friend) so people could just download the thing and run it after providing their credentials. The second challenge is managing to be transparent and auditable, to give people confidence that it would be safe to give the tool their credentials without nasty consequences. A message suggesting changing the password before and after using the tool would not hurt either. @Vanamonde - is there any way to find what would be the best time to run the tool? Is there any chart in the management console telling the hours in which the site has the smallest load during the day?
  22. Gloster Meteor F8 Fighter, "Prone Fighter" variant. When they literally shoved a freaking cockpit in the place of the nose cone! Emphasis for: "cut hood for emergency rescue" - boy, ejecting from this death trap would be interesting... Source: https://planehistoria.com/gloster-meteor-f8/
  23. And USENET as backup. Oh, boy... Xyon is going to have my liver for dinner due to this...
--- -- - - -- ---
Now on the serious side. Apparently, Orbiter Forum is running at a cost of ~240 USD a month - or at least, this is the current goal for donations. Perhaps Forum could try a similar stunt?
  24. Once bitten, twice shy. Trust needs to be earnt - and once lost, it's twice as hard to earn it back. It's up to the one that lost that trust to balance the gains and losses of trying to earn trust back. The first step is, well, exercising empathy.