Lisias

Members
  • Posts

    7,364
Everything posted by Lisias

  1. Strange O.S. The only winning move is not to update. How about a nice boot on Linux? https://www.youtube.com/watch?v=MpmGXeAtWUw
  2. By chance. Had that MS engineer not been puzzled by those weird CPU peaks, the stunt could have worked. And, mind this, nobody detected the offending code until that dude investigated an unexpected misbehaviour. Source code is of little use if nobody is reading (and understanding) it - hence my decision to pursue the easiest programming language that is supported well enough to be useful, the subject of my original post!
  3. Crowdstrike solved the 502 Bad Gateway issues on Forum!
  4. These privacy concerns are no different from using Google to back up your WhatsApp chats. Making backups is OK, giving them to anyone else is not. But since there's nothing preventing the user from saving the pages directly from their browser (or from letting someone read them over their shoulder), I don't see how this could be a legal problem either. The expectation of privacy is exactly the same in all cases; it's up to the users themselves to respect the legislation, whether using such a tool or doing things by hand. You don't need a headless browser to fetch the cookies, you can do everything with wget or curl. Forum doesn't obfuscate content using JavaScript, so you don't need a JS runtime to decode things. Unless you find some browser plugin that would do the job for us - this would change everything, as long as we can trust the author! I think that Firefox could be of use here - there's no closed source version of Firefox, and the Mozilla Foundation is pretty careful about privacy - if we find a Firefox plugin that could do this job for us, it would take a lot of weight off our shoulders. You know? Nice idea. I will see if I find something along these lines. This is the reason I would prefer to have everything written in Python or something that doesn't need to be compiled. The easier the language, the more people can learn enough of it to tell if there's something fishy in the code or not. Don't forget the xz utils supply-chain attack - the injection vector was open and public on GitHub for anyone to see.
  5. That's the idea - providing a tool for people to back up their own private messages. The problem is not technological - you can log in and handle the cookies with curl if you want and then scrape whatever you want as long as your credentials have access. The hard part is to provide a tool that is: easy for the average Joe to use; safe (i.e., no credentials or messages leaking to anyone else); trustworthy (it's not enough to be honest, people need to believe - and verify - you are honest). This idea was originally considered in this post.
  6. C-suits came across the sea
     They brought us pain and misery
     They killed our vibes
     They killed our deeds
     They took our GAME for his own need
     We coded it hard
     We coded it well
     Out of their plans, we commit them hell
     But code reviews, make us retrocede
     Oh, will we ever be set free?

     Dashing through clouds from barren wastes
     Engines roaring above the plains
     Turning the gravity back to their well
     Gaming them at their own game
     RUDs for freedom, the push in the back
     Engineers and pilots and scientists, attack!!!

     Launch from the hills
     Launch for your snacks
     Launch from the hills
     Launch for your snacks
  7. It was the first thing I thought when I read the news about the AV update... Nothing is so bad that it could not get worse, no?
  8. I think Boeing would find success in the clothing industry... Selling underwear. https://www.theguardian.com/business/article/2024/jul/18/boeing-fresh-safety-questions-engine-fire-flight-scotland (Really... terrible times to own Boeing stock...)
  9. I'm pretty sure this IS NOT the proper way to add an afterburner to an airliner... On a side note... I think Boeing should expand their business into the clothing industry... selling underwear...
  10. NEWS FROM THE FRONT

     Scraping with pywb is, well, slow. I think the way they do deduplication performs terribly - as time passes and the redis database grows (and, boy, it grows!), the pages/minute ratio drops and drops again. This can't be fixed right now, so I'm just taking the hit. So I decided to prioritize archiving the content without any styling and images; this should greatly reduce the redis hits and prevent things from going even more slowly. Of course the constant Bad Gateways are not helping either - which prompts me to quote myself:

     Currently, I have archived 425,827 pages from Forum; the old WARCs from 2023 have 440,387 - so I'm near the end (hopefully). The Internet Archive's CDX tells me they have 2,164,265 URIs - but it includes imgur images and the older URI scheme too, so most Forum pages are duplicated there. I'm guessing I will conclude the html pages by tomorrow night and then I will focus on the images and styles.

     The current filesystem usage is:

         -rw-r--r-- 1 deck deck 43543215443 May  7  2023 forum.kerbalspaceprogram.com-00000.warc
         -rw-r--r-- 1 deck deck 20403921555 May  7  2023 forum.kerbalspaceprogram.com-00001.warc
         -rw-r--r-- 1 deck deck  1609457399 Jul 13 09:44 forum.kerbalspaceprogram.com-20240713061810129595.warc
         -rw-r--r-- 1 deck deck 10000118024 Jul 15 13:31 forum.kerbalspaceprogram.com-20240713124446675422.warc
         -rw-r--r-- 1 deck deck 10000038580 Jul 19 01:31 forum.kerbalspaceprogram.com-20240715163142475518.warc
         -rw-r--r-- 1 deck deck   625296777 Jul 19 05:27 forum.kerbalspaceprogram.com-20240719043104692502.warc
         -rw-r--r-- 1 deck deck    11524830 Jul 19 05:23 imgs.txt
         -rwxr-xr-x 1 deck deck         183 Jul 16 12:10 uri.sh
         -rw-r--r-- 1 deck deck    89496075 Jul 19 05:24 uri.txt

     Given the timestamps on the filenames, I'm managing to scrape about 5GB of text a day on weekends, and 2.5GB on working days. Again, my enemies are the Bad Gateways, due to the exponential backoffs (increasing delays on every error), and the increasingly slower deduplication.

     I'm not planning to archive imgur (or similar services) images, at least not at this point. If the worst happens, they will still be there and we can scrape them later, once the most pressing issues have been tackled. Currently, there are approximately 207,389 unique img srcs in my archived pages, 130,641 of them from imgur.

     === == = POST EDIT = == ===

     And, nope, I don't have (yet) the slightest idea about how to archive personal messages (as initially discussed here).
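The daily throughput quoted in that post can be estimated from the WARC file names alone. The snippet below is only a sketch under two assumptions that the post does not state: that the digits in each file name are the moment the segment was opened, and that a segment is closed at the moment the next one is opened. File names and sizes are copied from the listing; the helper names are made up for illustration.

```python
import re
from datetime import datetime

# (file name, size in bytes), copied from the listing in the post.
SEGMENTS = [
    ("forum.kerbalspaceprogram.com-20240713124446675422.warc", 10_000_118_024),
    ("forum.kerbalspaceprogram.com-20240715163142475518.warc", 10_000_038_580),
    ("forum.kerbalspaceprogram.com-20240719043104692502.warc", 625_296_777),
]


def segment_start(name: str) -> datetime:
    """Parse the first 14 digits (YYYYMMDDhhmmss) of the timestamp in a WARC name."""
    stamp = re.search(r"-(\d{14})\d*\.warc$", name).group(1)
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def gb_per_day(segments):
    """Yield (start, GB per day) for every segment that has a successor.

    Assumption: a segment ends when the next one starts.
    """
    for (name, size), (next_name, _) in zip(segments, segments[1:]):
        start, end = segment_start(name), segment_start(next_name)
        days = (end - start).total_seconds() / 86_400
        yield start, size / 1e9 / days


RATES = list(gb_per_day(SEGMENTS))
for start, rate in RATES:
    print(f"segment opened {start:%Y-%m-%d %H:%M}: {rate:.1f} GB/day")
```

The last segment is skipped because it was still being written. The first segment starts on a Saturday and the second on a Monday, so the two rates are at least consistent with the weekend/working day difference mentioned in the post.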
  11. The Space Shuttle would, probably, have had better boosters too!! https://www.astrodigital.org/space/stshorse.html
  12. By the krakens!!! As a matter of fact, it looks pretty much like some craft I see published around here... === == = POST EDIT = == === Bonus!!
  13. Hey, it's me! Mar... err... Lisias! Is @Ultimate Steve around?
  14. NEWS FROM THE FRONT

     Some unexpected events on Day Job© prevented me from writing proper Scraping KSP for Dummies instructions, so for now I'm just uploading the configuration files so someone could start a new dataset if desired. This is the hard part anyway, unless you have already done it before (obviously), because there are so many ways to set the thing up wrongly... The scraper is still monolithic; I gave some thought to the distributed effort but haven't coded anything yet. I haven't spent a sec on scraping MY personal messages either. Suggestions are welcome.

     https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project

     Dig your way through the files, there's good information in the scripts and code too.

     DO NOT try to accelerate the scraping. Currently the thing is doing 1 page/sec, with exponential backoff up to 5 minutes. This is the best compromise I reached to allow the thing to run 24x7 - from Friday night to Saturday midday you can get away with being a bit more aggressive, but I concluded the gains weren't enough to be worth reworking the scraper to behave differently in this time window. Trying to make things faster for you will make Forum slower for everybody, including you. Please mind the rest of the users.

     I did a small modification on pywb to allow it to create uncompressed WARC files while scraping, as I'm using BTRFS and, surprisingly, the compression ratio is even better using it. Fixes and updates to the documentation will be applied here before a pull request is made to upstream - any help would be welcome too.

     https://github.com/Lisias/pywb/tree/dev/lisias

     Currently, I have archived 284,447 responses (pages et al.) from Forum. For the sake of curiosity, IA has 2,164,265 - but it includes imgur images and the older URI scheme too, so the Forum pages are duplicated. There are instructions on my repo above to download the IA's CDX and extract this information from it.

         -rw-r--r-- 1 deck deck  41G May  7  2023 forum.kerbalspaceprogram.com-00000.warc
         -rw-r--r-- 1 deck deck  20G May  7  2023 forum.kerbalspaceprogram.com-00001.warc
         -rw-r--r-- 1 deck deck 1.5G Jul 13 09:44 forum.kerbalspaceprogram.com-20240713061810129595.warc
         -rw-r--r-- 1 deck deck 9.4G Jul 15 13:31 forum.kerbalspaceprogram.com-20240713124446675422.warc
         -rw-r--r-- 1 deck deck 4.0G Jul 17 02:27 forum.kerbalspaceprogram.com-20240715163142475518.warc
         -rw-r--r-- 1 deck deck 802M Jul 16 12:36 ia.forum-kerbalspaceprogram-com.cdx
         -rw-r--r-- 1 deck deck 277M Jul 16 12:50 ia.uri.txt
         -rwxr-xr-x 1 deck deck  183 Jul 16 12:10 uri.sh
         -rw-r--r-- 1 deck deck  32M Jul 17 03:57 uri.txt

     I'm using BTRFS with zstd:15 compression for this job (too high a compression level is not advisable for normal use!), and the results are at worst similar to the gzip solution, with the benefit that you will not need to recompress the WARC file after scraping, and grepping the WARC files is way more convenient:

         (130)(deck@steamdeck archive)$ sudo compsize *
         Processed 9 files, 615225 regular extents (615225 refs), 1 inline.
         Type       Perc     Disk Usage   Uncompressed Referenced
         TOTAL       12%         9.6G          75G          75G
         none       100%         531M         531M         531M
         zstd        12%         9.0G          74G          74G

     You can change the compression (and level) for a specific file on BTRFS as follows:

         (deck@steamdeck archive)$ btrfs property set forum.kerbalspaceprogram.com-20240715163142475518.warc compression zstd:15
         (deck@steamdeck archive)$ btrfs filesystem defrag -czstd forum.kerbalspaceprogram.com-20240715163142475518.warc

     But, usually, the best way is to remount the volume before doing the job, and then unmount and mount it again to reset the settings:

         sudo mount -o remount,compress=zstd:15 /run/media/deck/MIRROR0/

     And, yes, it's compress in /etc/fstab and compression in the btrfs properties. Go figure...

     The memory consumption of pywb and redis is pretty low, 97M and 170M at this moment. But the scrapy process is eating 2.2G from the host machine - I don't know if this is due to the machine having memory to spare and being currently dedicated to the job, but right now this is not a concern.

     And you can still play some games on the Deck while it does the scraping!!!
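For anyone wondering what "1 page/sec, with exponential backoff up to 5 minutes" can look like in code, here is a minimal sketch. It is not the scraper from the repository: the doubling factor, the reset to the base delay after a success, and the `fetch` callable (anything that returns True on success and False on an error such as a 502) are all assumptions made for illustration. Only the 1 second base and the 5 minute ceiling come from the post.

```python
import time

BASE_DELAY = 1.0    # seconds between pages: 1 page/sec (from the post)
MAX_DELAY = 300.0   # backoff ceiling: 5 minutes (from the post)


def next_delay(current: float, failed: bool) -> float:
    """Double the delay after a failure (capped); reset to the base after a success.

    The factor of 2 is an assumption, the post only says "exponential".
    """
    if failed:
        return min(current * 2, MAX_DELAY)
    return BASE_DELAY


def fetch_politely(urls, fetch, sleep=time.sleep):
    """Fetch every URL in order, retrying a failed one with a growing pause.

    `fetch` is a hypothetical callable: fetch(url) -> bool (True on success).
    `sleep` can be replaced to test the schedule without actually waiting.
    """
    delay = BASE_DELAY
    for url in urls:
        while True:
            ok = fetch(url)
            delay = next_delay(delay, failed=not ok)
            sleep(delay)
            if ok:
                break
```

With these numbers a run of consecutive errors waits 2, 4, 8 ... 256 and then 300 seconds, so a long outage costs the server one request every 5 minutes instead of one per second.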
  15. More than they already are now? I mean... Starliner stranded on the ISS, SpaceX blowing up payloads, Chinese surveillance bases in Cuba, some interesting events at the Gate of Tears, and some other yet more interesting ones in Eastern Europe... Am I missing something?
  16. There's no more convenient time to apply a fix for a critical bug than NOW. I lost another night of sleep due to the same error that bit me last time, one that has been fixed for almost a year on all the machines but one - which belongs to a client that has been problematic since the first day and, so, someone decided that a scheduled stop to update the damned machine would make us look bad. Well, I believe that we are already looking pretty bad now - for something that has not happened anywhere else on our stack for 11 months - and that is triggered by some pretty low standard service from a 3rd party we are tied to. Bugs are not a problem. Unfixed bugs are a problem. Fixed but undeployed bugs are a huge problem - it's very, very tricky to write a report about something that keeps happening regularly just because someone didn't approve scheduling a one hour pause on the service (that's the minimal billable time window; the update itself takes 45 seconds. On a bad day). I lost my sleep, I'm losing my temper, I'm running out of patience. I'll try to play something relaxing - going back to bed is out of the question by now.
  17. I come here essentially once a month - when I reboot the machine and have to log in again. The rest of the time I essentially leave the browser open on this site 24x7.
  18. Oh, by the Krakens, thank you very much for it!
  19. Working on it. Once I manage to publish the pywb archive, the next step will be a search engine. Interestingly enough, this last step will be the easiest - I already have an FTP search engine project for retro-computing working (on a bunch of Raspberry Pis!!), and if we dig enough, I'm absolutely sure we will find even better solutions nowadays (mine was a novelty 5 years ago). Discord was an experiment that went bad. Reddit is less bad, but the site's format is not the best for what we need. I agree, this Forum is the best format. Orbiter-Forum is running on 240 USD/month, if we accept the last round of donations as a source for this information. I think this Forum, right now, would need somewhat more due to the larger workload. What we would really need, assuming this Forum will be decommissioned, would be a Federated model with many volunteer servers running under some kind of distributed operating system. Boy, I miss the times in which Plan9 could have been something... This is where I think things would not be so bad. If T2 decides to complain, it would be because they want to do something with the IP - which means that Forum will be alive. We need to keep in focus that we are not working to replace the Forum, we are working to guarantee content preservation and to have a lifeboat available if the ship sinks. It's still perfectly possible that we are just overreacting, that nothing (even more) bad is going to happen, and that Forum will be available for a long time. IMHO, if we are going the extra mile and setting up a Forum to be used in the unfortunate (and, at this time, hypothetical) absence of this one, we should consider going Open Source as much as we can to keep the costs down. I agree that closed/licensed solutions are way more polished, but a non-profit community that will rely on donations (at best) and/or sponsoring (more probably) needs to keep the costs down. Voluntary work is cheaper than licensed Software. 
I think we need to look around and see what the current alternatives are - but of one thing we may be sure: it will not be exactly like this Forum. The problem I see is that the distributed model initially envisioned for the Internet was murdered and buried by commercial interests. The ideal solution would be distributed computing, with many, many, really many small servers volunteered by many, many, really many individual contributors. We are having this problem on Forum exactly due to the monolithic nature of the solution (which matches the commercial interest of the owner). This "business model" is unsuited for a non-profit community effort. Granted, I'm unaware of any other widely adopted alternative. I doubt we could go WERC on this one. Sponsorship, IMHO, is going to be the best chance. But how to gather sponsors for a project whose existence depends on the failure of this Forum? "Here, we are asking for some donations to keep this new Forum - but it will not be used, unless the main one goes down..." Companies sponsor things for a reason: they want some visibility in exchange, "look at us, we are sponsoring this!". They will not get this in return unless the thing goes live for good, aiming to replace this Forum - something that, to the best of my knowledge, is not the aim of all this effort.
  20. I have had the tool working since early Saturday, and the thing works. Setting it up is a bit of a pain in the SAS, but IMHO worth the pain. I'm gradually building up instructions here: https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project, and, unsurprisingly, I forked the pywb project to publish some small fixes I did (or am still doing) here: https://github.com/Lisias/pywb/tree/dev/lisias . There's an additional benefit to my approach (I went the hard way for a reason - the UNIX way!!!): the scraper script can be customized for distributed processing - there are about 450K pages around here, and if we set up a pool of a handful of trusted collaborators, we will be able to keep the mirror updated with relatively low effort on our side (2 people each doing half the job at the same time is way faster) and less load on Forum (as it would be way better than 2 people each doing the whole job for themselves). Disclaimer Monday night, after Working Hours, I will further update the github repository with the instructions to fire up the scraping infrastructure. Then, hopefully, we can start discussing how to go multiprocessing with the thing. === == = POST EDIT = == === I think that the best way to distribute the files will be by torrent. Torrent files can be updated, by the way, and so one single torrent will be enough for everybody. Since serving the files in the WARC.gz format will probably be the best option, the host serving the mirror can also help to distribute the files via torrent - but keep your quotas in check!
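One possible way to split the work among a pool of collaborators, sketched here only as an illustration (nothing like this exists in the repository as far as the post says): every collaborator gets the same URI list and a worker number, and a stable hash of each URI decides who fetches it. No coordination is needed while scraping, and no page is fetched twice. The function names and the example URIs are hypothetical.

```python
import hashlib


def shard_of(uri: str, workers: int) -> int:
    """Map a URI to a worker number in [0, workers) using a stable hash.

    sha256 is used instead of Python's built-in hash(), which is randomized
    per process and would give each collaborator a different answer.
    """
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % workers


def my_share(uris, worker_id: int, workers: int):
    """Return the URIs that the collaborator `worker_id` is responsible for."""
    return [u for u in uris if shard_of(u, workers) == worker_id]


if __name__ == "__main__":
    # Hypothetical URIs, only to show the split; a real run would read uri.txt.
    example = [f"https://forum.kerbalspaceprogram.com/topic/{i}-example/" for i in range(10)]
    for worker in range(2):
        print(worker, len(my_share(example, worker, 2)))
```

The drawback of this scheme is that changing the number of collaborators reshuffles most of the assignments, so the pool size should be agreed on before a pass starts.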
  21. Some pearl from the distant past...
  22. Bystanders? Yes, they do. But in the face? No way. Fake attempts aim at the chest and belly, where a good bulletproof vest would take the hit, or at the arms or legs, where the wound would not be fatal. This one was real. "Interesting" things are going to happen in the next months. Fasten your seat belts.
  23. Announce! New release 2024.07.13.0 of the TweakScale Companion ÜberPaket, with everything (and the kitchen sink) included for the lazy installers!! Updates the Companions: Firespitter to 1.3.0.2 and Frameworks to 0.4.0.4. See the project's main page for details. Your attention please: completely remove all the previous contents of the GameData/TweakScaleCompanion directory, or you will trigger some FATALities on TweakScale's Sanity Checks! This thingy needs TweakScale v2.4.7 or later to work. Download here or in the OP. Also available on CurseForge and SpaceDock.
  24. Added or updated:

     CTTP    Community Terrain Texture Pack
     DART    D.A.R.T. (Double Asteroid Redirection Test) Range Challenge
     GEP     Grannus Expansion Pack
     JFA     JebFarAway (?)
     JNSQ    JNSQ (Je Ne Sais Quoi) Planet Pack
     KSRSS   Kerbal Size Real Solar System
     SVE     Stock Visual Enhancements
     SVT     Stock Visual Terrain

     Thanks to @OhioBob and @D4RKN3R! https://github.com/TweakScale/Companion/blob/master/Database/Abbreviations.csv