-
Posts
7,364 -
Everything posted by Lisias
-
Yep, but most of them will be working, studying or commuting - so most of these awake people would not be browsing Forum on a working day either. Interestingly enough, and again on a working day, the least bad time to hit the Forum's servers would indeed be about 16:00 GMT-5 (EST), because (assuming my axiom that most people hit the services between 19:00 and 22:00 local time is correct) at that time the timezones in the "playing time" window would cover some of the least densely populated areas in the world. So Gargamel is right that the time the IA scrapers are hitting the Forum is the least bad one; he apparently just made a mistake when trying to explain why. Being awake is not enough: people need to have some time to burn in order to hit Forum's servers with significant load. I think you got it right about most people being awake, but I think you made a mistake in assuming that these people would be available to hit Forum while awake. They have a relatively tiny time window in their waking hours to hit here in volume.
-
Because the contractor's liability is limited to 100K! So if the contractor screws up the job, you will lose 900K USD because their liability is only 100K. That's the very purpose of the LLC! Now it's up to you to take the risk or not. I'm losing you. I found this: https://www.doola.com/blog/llc-liability-protection/ Can you explain to me if this link is correct? Because if it is, it's exactly how a Sociedade Limitada (LTDA) works here in Brazil - but with a catch: frauds and crimes (mostly fiscal) "break" the limitation and can reach your personal assets, and the link above appears to say the same for an LLC.
-
Limited Liability protects the owner's personal assets from the company, granted. But the company needs to have a thingy called share capital ("capital social" in pt-br), which is the maximum liability the LLC is responsible for. Hint: the company's owner needs to foot the share capital - so, if the worst happens, they just lose that money. Now, it's up to the clients to decide whether to do business with such an LLC, right? Would you contract an LLC for a job of 1M USD when the share capital of that company is 100K USD? Well, I would not - and neither would anyone who knows how to do business. So the LLC owners still have their skin in the game - they need to foot the share capital of the LLC, and they risk losing that money if the company goes kaput.
-
Taking the hint, and elaborating on it (I have some time to burn right now... ). Most people are students or workers, so they are not available for playing between 07:00 and 19:00 (I'm considering commuting) in their local time. Assuming most people enjoy sleeping from 22:00 to 06:00 local time, we have a time window for playing from 19:00 to 22:00 local time. So, assuming late afternoon as being 16:00 GMT-5 (in summer), that playing time window I mentioned, from 19 to 22 local time, would be at GMT-8 to GMT+11. Essentially the western parts of USA and Canada, and Alaska (magenta on the map below): https://www.timeanddate.com/time/map/ And these are not exactly the areas with the most population density in the world! 16:00 EST is around 12:00 in Greenwich (UK, Western Europe and Africa) and dawn to morning up to India and western China. So, yeah, it's exactly the opposite - most people in the world are awake, but studying or working (or preparing to) and, so, not hitting this Forum with their browsers.
-
Perhaps we could send them that torrent I'm building? That would save you guys some serious bandwidth... These <insert your favorite non-forum-compliant expletive here> AI companies are literally using our money to make money - for them.
-
I've been trying to explain that to some "authors" around here for months, but all I get is mockery and disdain. Trademarks are a thing. Some people here are going to learn this the hard way.
-
Thursday afternoon (GMT-3), I noticed Forum was getting even more 502 Bad Gateways than "normal". I also noticed that, the few times I managed to load the front page, the number of guests was about 6.7K - way more than usual, which is about 1.3K +/- 1.5k at peak times. That was almost a DDoS attack, almost 5 times the usual guests... Sooo, yeah... I think the problem is not exactly Forum, but the extra load of people trying to scrape Forum by themselves. T2 probably cut down some costs in their infrastructure, but given that right now Forum is all right with 1.2K guests, I think there was some slack in that infrastructure in the first place.
-
[1.4.3 <= KSP <= 1.12.5] KSP Recall - 0.5.0.2- 2024-0521
Lisias replied to Lisias's topic in KSP1 Mod Releases
I will leave this note here for future reference, in case someone searches for the problem. The problem happened exactly after MM lists the loaded DLLs, and before KSP starts to build the part database:

    [EXC 14:06:08.985] InvalidOperationException: Collection was modified; enumeration operation may not execute.
        System.ThrowHelper.ThrowInvalidOperationException (System.ExceptionResource resource) (at <9577ac7a62ef43179789031239ba8798>:0)
        System.Collections.Generic.List`1+Enumerator[T].MoveNextRare () (at <9577ac7a62ef43179789031239ba8798>:0)
        System.Collections.Generic.List`1+Enumerator[T].MoveNext () (at <9577ac7a62ef43179789031239ba8798>:0)
        KSPBurst.KSPBurst.FlushMessages () (at <c8b5fe2baff6415cb73ea658bdb44b4c>:0)
        KSPBurst.KSPBurst+<BurstCompile>d__20.MoveNext () (at <c8b5fe2baff6415cb73ea658bdb44b4c>:0)
        UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) (at <12e76cd50cc64cf
        UnityEngine.DebugLogHandler:LogException(Exception, Object)
        ModuleManager.UnityLogHandle.InterceptLogHandler:LogException(Exception, Object)
        UnityEngine.Debug:CallOverridenDebugHandler(Exception, Object)
    [LOG 14:06:09.164] PartLoader: Creating part database

It looks like something Burst Compiler does not know how to cope with, and not a bug in the US2 assembly. This should be reported to the Burst Compiler guys so they can find out if there's something they can do about it, such as adding US2 to an ignore list.

633 replies
-
Moderators note: This topic was split off from:

Companies care about money. Communities are important while they help secure the income, and it's really as simple as that. We are an asset, and we need to learn how to cope with it. This is not necessarily bad, although uncomfortable - being a sentient asset has some advantages that can be beneficial to us if we learn to play our cards right. Just remember: companies are not people, they are made of people. Some are good, some are evil, most of them are somewhere in the middle. We need to reach the good ones.

Standard INC procedure. You know, LLCs have the utmost interest in knowing exactly what went wrong when a big project fails, because the company's owners ultimately have their skin in the game (pun not intended) and there's no way out - they will pay for the failure directly from their pockets. PLC and INC companies work differently internally: the money's owners are an abstract mass of investors that are not directly dealing with the failure, only paying for it indirectly. So whoever is in charge has a special interest in hiding the problems that could tarnish their image, and tries to elect scapegoats instead - and screw the aftermath; who cares if this is going to be bad in the long run? These dudes only care about their image and how it affects their careers, and they have no problem pursuing such a career at the competition.
-
NEWS FROM THE FRONT

I just updated the scraping tool. Now it's scraping images and styles too, and exactly as I intended: html pages in one collection, images in a second one, anything related to CSS and styling in a third one. The anti-dupe code is also working fine, preventing the scrapy tool from visiting the same page twice in a session - exactly what had screwed me on July 13th, when I first tried it. I'm doing proper logging now, too.

pywb allows us to dynamically merge the collections and serve them on a single front-end, as if they were just one. Pretty convenient.

The rationale for this decision is simple, although not exactly straightforward: images almost never change, and neither do styles. Scraping them separately will save a bit of Forum's resources and scraping time while updating the collections, because we can just ignore them while refreshing the archive contents.

There's an additional benefit in keeping textual info separated from images and styles: whoever owns the IP owns the images and styles, but not the textual contents. Posts on Forum are almost unrestrictedly and perpetually licensed to the Forum's owner, but they still belong to the original poster. So whoever owns the IP, at least theoretically, has no legal grounds to take down this content - assuming the worst scenario, where this Forum goes titties up and a new owner decides to take down the Forum mirrors, they will be able to do so only for the material they own: images and styling. And these we can easily replace later, forging a new WARC file pretending to be that now lost content.

Ok, ok, in Real Life™ things don't work exactly like that. But it costs very little (if anything) to take some preventive measures, no?

Scrapy tells me that it

    INFO: Crawled 316615 pages (at 58 pages/min), scraped 31847497 items (at 6128 items/min)

at this moment, but the last time the WARC file was touched was Jul 20 20:19.
So, apparently, the sum of the older WARC files from 2023 and the new ones I'm building now has all the information since the last time I restarted the tool. Please note that the tool counts as a scraped item anything it crawls into, regardless of it being a dupe or not, or of it being fetched or ignored.

Right now, I'm trying to find my way around the Internet Archive to host the torrent file I will build with the material I already have. I hope it can be a mutable torrent, otherwise I will need to find some other way to host these damned huge files - I intend to update it at least once a month, which is the reason it needs to be a mutable torrent.

People not willing to host anything can also find this stunt useful, as there are tools to extract the WARC contents and build a dump on their hard disks, as you would get by using a 'dumb' crawler like HTTrack. So, really, there will be no need for everybody and the kitchen sink to hit Forum all the time to scrape it.

Finally, once I have the torrent hosted somewhere, I will start to cook up a way to scrape the site cooperatively, so many people can share the burden, making things way faster and saving Forum's resources - once people realize they don't need to scrape things themselves every time, I expect the load on Forum to be way lighter.

This is what I currently have:

    -r--r--r-- 1 deck deck 41G May  7  2023 forum.kerbalspaceprogram.com-00000.warc
    -r--r--r-- 1 deck deck 20G May  7  2023 forum.kerbalspaceprogram.com-00001.warc
    -r--r--r-- 1 deck deck 24G Jul 25 15:32 forum.kerbalspaceprogram.com-202407.warc

I expect future WARC files to be smaller and smaller.
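The routing and anti-dupe behaviour described above can be sketched roughly like this. It is only an illustration, not the actual tool: the extension lists, the collection names and the `SessionDedupe` class are assumptions made for the example, and a plain in-memory set stands in for whatever store the real spider uses.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical extension lists; the real tool's rules are not shown in the post.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico"}
STYLE_EXTS = {".css", ".woff", ".woff2", ".ttf", ".eot"}


def pick_collection(url: str) -> str:
    """Route a URL to 'images', 'styles' or 'html' by looking only at its path."""
    path = urlsplit(url).path.lower()
    ext = posixpath.splitext(path)[1]
    if ext in IMAGE_EXTS:
        return "images"
    if ext in STYLE_EXTS:
        return "styles"
    return "html"


class SessionDedupe:
    """Remember the URLs already visited in this session (in memory only)."""

    def __init__(self) -> None:
        self._seen = set()

    def first_visit(self, url: str) -> bool:
        # Drop the fragment: '#comment-1' and '#comment-2' are the same page.
        key = urlsplit(url)._replace(fragment="").geturl()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    dedupe = SessionDedupe()
    for link in (
        "https://forum.kerbalspaceprogram.com/topic/1-example/",
        "https://forum.kerbalspaceprogram.com/topic/1-example/#comment-2",
        "https://forum.kerbalspaceprogram.com/uploads/logo.png",
    ):
        if dedupe.first_visit(link):
            print(pick_collection(link), link)
```

Looking only at the path keeps the routing trivial, at the price of trusting that every URL says what it is in its path.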
Last, but not least:

=== == = POST EDIT = == ===

For the sake of curiosity, some stats I'm fetching from the archives I have at this moment:

    forum.kerbalspaceprogram.com-00000.warc
     318421 Content-Type: application/http; msgtype=request
     318421 Content-Type: application/http; msgtype=response
          6 Content-Type: application/json
          2 Content-Type: application/json
          8 Content-Type: application/json
          2 Content-Type: application/json;charset=utf-8
          2 Content-Type: application/x-www-form-urlencoded
     122909 Content-Type: ;charset=UTF-8
         24 Content-Type: text/html; charset=UTF-8
     195482 Content-Type: text/html;charset=UTF-8

    forum.kerbalspaceprogram.com-00001.warc
     121990 Content-Type: application/http; msgtype=request
     121990 Content-Type: application/http; msgtype=response
      40841 Content-Type: ;charset=UTF-8
      81149 Content-Type: text/html;charset=UTF-8

    forum.kerbalspaceprogram.com-202407.warc
     553096 Content-Type: application/http; msgtype=request
     553096 Content-Type: application/http; msgtype=response
         27 Content-Type: application/json;charset=UTF-8
          1 Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;charset=UTF-8
          5 Content-Type: application/x-unknown;charset=UTF-8
          8 Content-Type: application/zip;charset=UTF-8
     342190 Content-Type: ;charset=UTF-8
       1497 Content-Type: text/html
          7 Content-Type: text/html; charset=UTF-8
     208622 Content-Type: text/html;charset=UTF-8
          3 Content-Type: text/plain; charset=UTF-8
          2 Content-Type: text/plain;charset=UTF-8
         93 Content-Type: text/xml;charset=UTF-8
         28 Content-Type: video/mp4;charset=UTF-8
          3 Content-Type: video/quicktime;charset=UTF-8

As we can see, some videos leaked into the WARC files... I'm working on it.

=== == = POST2 EDIT = == ===

This one gave me a run for my money. Forum serves media (like movies) using an interface called /applications/core/interface/file/cfield.php where the file is sent in the query.
But the crawler was looking only at the path, taking advantage of the fact that Forum doesn't obfuscate the artefacts - which made my life simpler while routing html, image and style files to their respective collections. Until now. Since the crawler didn't parse the query, it thought it was a content file and stored the freaking videos together with the html files, screwing me because:

    These videos are Forum's IP, so shoving them together with the textual content would erode my chances in a hypothetical future copyright takedown attempt.
    It royally screwed the compression ratio of the WARC file!!

As a matter of fact, I thought it was weird that the 2023 WARC files were compressing at 44 to 1, while mine were compressing at "only" 22 to 1 - they should compress at similar ratios, as they would be similar content. This is the reason I made the stats above, already suspecting that some already compressed content had leaked into the stream - but I was thinking of some images or even a zip file, not a whole freaking movie file!

Anyway, I salvaged the files, moving the image/* content into the image collection and removing about 2G of binary data from the text stream. I'm double checking everything and recompressing the data files, and I will pursue torrenting them tomorrow. Cheers!

=== == = POST3 EDIT = == ===

Well, I blew it. I made a really, REALLY, REALLY stupid mistake in the spider, which ended up in a memory leak that kept growing and growing without me being aware of it. Then I finally noticed the problem and tried to salvage the situation (scrapy has a telnet interface, in which you can do whatever you want - including hot code changes!!) but didn't have enough memory available. So I tried terminating one of the proxies (the style one), as it had been idle since the start, to have enough memory to work on the system and try adding a swap file on the steam deck.
This would lift the pressure and allow me to work on salvaging the session, not to mention stop grinding my m2, which probably lost half of its lifespan on this stunt...

Problem... this dumbass that is typing you this post decided to create the swapfile in the dumbest way possible:

    dd if=/dev/zero of=/run/media/deck/MIRROR0/SWAP1 bs=1G count=8

I was lazy (and still sleepy) and decided I didn't want to grab a calculator to see how many 4K blocks I would need to reach 8G, so I told dd to write 8 blocks of 1G each and call it a day. But by doing that, dd tried to malloc a 1G buffer on a system that was fighting for 32KB on the current swapfile. So the kernel decided to kill something, elected the SSHD (probably) and the session was finished. However, SteamOS uses that terrible excuse of an INIT called SystemD, and this crap automatically kills all the processes owned by a user when they log off, which is essentially what happens when you lose an SSH session.

And then I had to restart scraping again this morning. I'm currently back at:

    2024-07-27 19:15:36 [scrapy.extensions.logstats] INFO: Crawled 32474 pages (at 80 pages/min), scraped 839693 items (at 2080 items/min)

There was no loss of data; the redis database is hosted on another computer (I did some things right, after all), so the current scraping is to be sure I fetched all the pages from Forum, without leaving things behind. Oh, well... Sheet happens!

I'm compressing what I have now (I will add a new WARC file with whatever is being scraped now later) and will proceed with the creation of the torrent.

=== == = POST4 EDIT = == ===

It's about 90 minutes since the log above, and now I have:

    2024-07-27 20:46:36 [scrapy.extensions.logstats] INFO: Crawled 39619 pages (at 73 pages/min), scraped 1025463 items (at 1924 items/min)

I.e.:

    185.770 items scraped, or about 2K items scraped per minute.
    7145 pages crawled, or about 79 pages per minute.
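As a cross-check, the deltas and rates can be recomputed straight from the two logstats lines. This is just a small sketch: the function names are made up for the example, and the total page count passed to `hours_left` is whatever estimate one has at hand.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def crawl_rates(t0, pages0, items0, t1, pages1, items1):
    """Return the deltas and per-minute rates between two logstats lines."""
    elapsed = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
    minutes = elapsed.total_seconds() / 60
    pages = pages1 - pages0
    items = items1 - items0
    return {
        "minutes": minutes,
        "pages": pages,
        "items": items,
        "pages_per_min": pages / minutes,
        "items_per_min": items / minutes,
    }


def hours_left(total_pages, pages_done, pages_per_min):
    """Hours still needed to reach total_pages at a constant crawl rate."""
    return (total_pages - pages_done) / pages_per_min / 60


# Values copied from the two log lines above.
rates = crawl_rates(
    "2024-07-27 19:15:36", 32474, 839693,
    "2024-07-27 20:46:36", 39619, 1025463,
)

if __name__ == "__main__":
    print(rates)
    # 400_000 is the rough page count reported by fellow scrapers.
    print(round(hours_left(400_000, 39619, rates["pages_per_min"]), 1))
```

The two timestamps are 91 minutes apart, which gives 7145 pages and 185770 items in that interval. With the roughly 400K pages mentioned just below, `hours_left` comes out at a bit over 76 hours.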
I know from fellow scrapers that we have about 400K pages, so unless things go faster, I still have about 76 hours to complete the task (84.38 hours for the full 400K at this rate). Oukey, this settles the matter. I will publish the torrent tomorrow, and update it later. [edit: I made the same mistake again - I went the Internet Archive way, it's way more pages!]

=== == = BRUTE FORCE POST MERGING = == ===

By all means, no apologies! This is a brainstorm, we were essentially throwing... "things" at the wall to see what sticks. Jumping the gun is only a problem when there's no one around to tell you that you are jumping the gun, so no harm, no foul. And it's good to know that someone is willing to go that extra mile if needed, and you got this message across successfully. So, thank you for the offer!

Mine too. I'm really hoping for the best - but still on alert, expecting the worst. Trying hard not to cause it! Cheers! (and sorry for answering this so late, I had a hellish week...)
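On the cfield.php problem from the POST2 EDIT: a minimal sketch of a check that looks at the query string as well as the path. This is not the fix that went into the spider; the extension list, the function name and the `file` parameter used in the example URL are all assumptions, since the post does not show the real query format.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit

# Hypothetical list of extensions that should stay out of the text collection.
MEDIA_EXTS = {".mp4", ".mov", ".webm", ".zip"}


def looks_like_media(url: str) -> bool:
    """True if the path OR any query value ends in a known media extension."""
    parts = urlsplit(url)
    # Check the path first, then every value carried in the query string.
    candidates = [parts.path] + [value for _, value in parse_qsl(parts.query)]
    return any(
        posixpath.splitext(candidate.lower())[1] in MEDIA_EXTS
        for candidate in candidates
    )


if __name__ == "__main__":
    base = "https://forum.kerbalspaceprogram.com"
    # 'file' is a made-up parameter name, used only for this example.
    print(looks_like_media(base + "/applications/core/interface/file/cfield.php?file=clip.mp4"))
    print(looks_like_media(base + "/topic/1-example/?page=2"))
```

Checking every query value is deliberately crude: it does not need to know the parameter name, but it can misfire on a search query that happens to end in ".mp4". Looking at the Content-Type of the response would be the more robust test.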
-
He didn't say anything, but I think he has said enough. There are still people very, very interested in keeping mouths shut. What this means is still to be seen.
-
totm march 2020 So what song is stuck in your head today?
Lisias replied to SmileyTRex's topic in The Lounge
Damn, you ninja'ed me!!! I was going to post this song too! The reason? https://www.youtube.com/live/WLROdWOKO_s?si=Y80vvkQXij5EVEeQ (a live stream of all episodes of Space:1999!!!)
-
2/10. You almost got me, T800! Cogito, ergo sum. 2+2=4
-
But not the identifier:

SpaceDock: https://spacedock.info/mod/3635/ColdJ Hot Air Balloon
NetKan: https://github.com/KSP-CKAN/NetKAN/blob/66fbdae6d2d2c2771a14359a82b68f2ffaa15fa8/NetKAN/HotAirBalloon.netkan#L2

    spec_version: v1.34
    identifier: HotAirBalloon
    $kref: '#/ckan/spacedock/3635'
    license: CC-BY-SA-4.0
    <...>

And this is what ColdJ is talking about. I think it's time to let @ColdJ decide what to do. We, clearly, aren't converging into a constructive discussion.
-
Yes, I have. Please don't project on me. ColdJ pushed to SpaceDock an add'on called ColdJ Hot Air Balloon. JOT decided to rename the identifier to HotAirBalloon, disregarding ColdJ's opinion on the subject. Apparently a possible adoption in the future was mentioned somewhere. ColdJ is protesting, because apparently they want their mod to be identified as ColdJHotAirBalloon - it's how it was named on SpaceDock, for starters. So you are implying that the author's standpoint is meaningless and can be ignored?
-
Should I understand that CKAN is too much work for the current staff, as the current members of this select group are not able to carry out their duties correctly? Have you considered asking for help from the Community, opening vacancies to be filled by members of the Community? Perhaps some of the Forum's Moderators can help - how about asking them for help?
-
I considered it. But it didn't get near 44 to 1. These files will be torrented and never recompressed again, so it makes sense to use the best compression possible. It will be done only once! So I made a quick test:

    -r--r--r-- 1 deck deck 43543215443 May  7  2023 forum.kerbalspaceprogram.com-00000.warc
    -r--r--r-- 1 deck deck  1427252342 May  7  2023 forum.kerbalspaceprogram.com-00000.warc.zst
    -r--r--r-- 1 deck deck   986003377 May  7  2023 forum.kerbalspaceprogram.com-00000.warc.lrz

The commands I used were:

    zstd -9 --keep forum.kerbalspaceprogram.com-00000.warc
    lrzip -z --best --keep forum.kerbalspaceprogram.com-00000.warc

The difference between the compression tools is 441.248.965 bytes, about 425M. For a file that will be downloaded, ideally, hundreds of times. For someone hosting it on AWS, which charges you about $0.09 per GB, the difference between zstd and lrz is about $0.03825 per download - for a file that is expected to be available for years. The costs pile up pretty quickly.

--- -- - POST EDIT - -- ---

I think I detected why it was so slow. I found two bottlenecks:

1. I didn't reindex the collection I was feeding, so when the proxy was fetching a page again, it was being fetched from Forum instead of from the archive.
    Reindexing the collections regularly should alleviate this problem.
    Note to myself: redis is for deduplication, where pywb decides if it will store the page or not - it does not prevent a live fetch, duh... cdx(j) is for fetching pages from the archive (or not). We need both, up to date, to do the job.
2. The scrapy tool was fetching the same pages all the time, as a lot of links target already crawled pages, wasting time hitting the proxy (and Forum).
    This one I detected after fixing the previous one.

Now that I'm thinking about it, these are pretty obvious mistakes - but they just occurred to me today... I'm fixing these problems right now; I will try the fixes and then update the github project.
https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project
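Going back to the compression test at the top of this post, the numbers can be rechecked with a few lines. It's only a sketch: the sizes and the $0.09 price come from the post, while treating a GB as 2**30 bytes and the helper names are assumptions of this example.

```python
# Sizes in bytes, copied from the listing in this post.
SIZES = {
    "warc": 43_543_215_443,
    "zst": 1_427_252_342,
    "lrz": 986_003_377,
}

PRICE_PER_GB = 0.09   # USD, the figure quoted in this post
GB = 2 ** 30          # assumption: binary gigabytes


def ratio(original: int, compressed: int) -> float:
    """Compression ratio, as in 'N to 1'."""
    return original / compressed


def extra_cost(bigger: int, smaller: int, downloads: int = 1) -> float:
    """Extra transfer cost in USD of serving the bigger file instead of the smaller."""
    return (bigger - smaller) / GB * PRICE_PER_GB * downloads


if __name__ == "__main__":
    print(f"zstd : {ratio(SIZES['warc'], SIZES['zst']):.1f} to 1")
    print(f"lrzip: {ratio(SIZES['warc'], SIZES['lrz']):.1f} to 1")
    print(f"difference: {SIZES['zst'] - SIZES['lrz']} bytes")
    print(f"extra cost per download: ${extra_cost(SIZES['zst'], SIZES['lrz']):.4f}")
    print(f"extra cost per 100 downloads: ${extra_cost(SIZES['zst'], SIZES['lrz'], 100):.2f}")
```

With binary gigabytes the extra cost lands slightly under 4 cents per download, in the same ballpark as the figure above; the exact value depends on how the hosting provider counts a GB.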
-
NEWS FROM THE FRONT

I currently fetched 513.889 pages, so now we are in uncharted territory (the 2023 dump has 440.387). I don't have a clue about how much time I still need, but logic suggests we are near the end of this first phase.

    -r--r--r-- 1 deck deck  41G May  7  2023 forum.kerbalspaceprogram.com-00000.warc
    -r--r--r-- 1 deck deck  20G May  7  2023 forum.kerbalspaceprogram.com-00001.warc
    -r--r--r-- 1 deck deck 1.5G Jul 13 09:44 forum.kerbalspaceprogram.com-20240713061810129595.warc
    -r--r--r-- 1 deck deck 9.4G Jul 15 13:31 forum.kerbalspaceprogram.com-20240713124446675422.warc
    -r--r--r-- 1 deck deck 9.4G Jul 19 01:31 forum.kerbalspaceprogram.com-20240715163142475518.warc

I also settled on the compression tool to feed the torrent: lrzip. This thing gave me a 44 to 1 compression ratio in the best cases, it's amazing - but, also, extremely slow. Really, really slow - hours and hours to compress these big beasts.

    -r--r--r-- 1 deck deck 941M May  7  2023 forum.kerbalspaceprogram.com-00000.warc.lrz
    -r--r--r-- 1 deck deck 413M May  7  2023 forum.kerbalspaceprogram.com-00001.warc.lrz
    -r--r--r-- 1 deck deck  56M Jul 13 09:44 forum.kerbalspaceprogram.com-20240713061810129595.warc.lrz
    -r--r--r-- 1 deck deck 687M Jul 15 13:31 forum.kerbalspaceprogram.com-20240713124446675422.warc.lrz
    -r--r--r-- 1 deck deck 536M Jul 19 01:31 forum.kerbalspaceprogram.com-20240715163142475518.warc.lrz

But, damn, it's a 44 to 1 compression ratio!!!

--- -- - POST EDIT - -- ---

I made a mistake! The 2023 WARC files were fetched using a custom crawler. I'm going the I.A. way, so I was comparing oranges with apples. The CDX I fetched from IA has 2.164.265 pages. This is the benchmark I should be using. So I'm apparently at 25% of the job.
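For the curious, the per-file ratios and the progress figure can be estimated from the rounded sizes in the two listings above. This is a rough sketch only: `parse_size` is a helper made up for this example, and the ls-style sizes are rounded, so the ratios are approximate.

```python
UNITS = {"K": 2 ** 10, "M": 2 ** 20, "G": 2 ** 30}


def parse_size(text: str) -> float:
    """Turn a human-readable size such as '9.4G' into (approximate) bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# (original, compressed) pairs, in the order of the two listings above.
PAIRS = [
    ("41G", "941M"),
    ("20G", "413M"),
    ("1.5G", "56M"),
    ("9.4G", "687M"),
    ("9.4G", "536M"),
]

ratios = [parse_size(raw) / parse_size(packed) for raw, packed in PAIRS]

# Pages fetched so far against the page count of the CDX from IA.
progress = 513_889 / 2_164_265

if __name__ == "__main__":
    for (raw, packed), value in zip(PAIRS, ratios):
        print(f"{raw:>5} -> {packed:>5}: about {value:.0f} to 1")
    print(f"progress: {progress:.1%}")
```

The two 2023 files come out at 44 to 1 or better, while the July 2024 files compress noticeably worse.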
-
totm march 2020 So what song is stuck in your head today?
Lisias replied to SmileyTRex's topic in The Lounge
This week it was this one: But not exactly due to the music, but because they are continuously broadcasting the whole series (and movie)!! https://www.youtube.com/live/dEfEhLh8jXE?si=yXXh7KNpScFn1v5b (could not embed the video due to a forbidden emoji)
-
C-suits came across the sea
They brought us pain and misery
They killed our vibes
They killed our deeds
They took our GAME for his own need
We coded it hard
We coded it well
Out of their plans, we commit them hell
But code reviews, make us retrocede
Oh, will we ever be set free?

Dashing through clouds from barren wastes
Engines roaring above the plains
Turning the gravity back to their well
Gaming them at their own game
RUDs for freedom, the push in the back
Engineers and pilots and scientists, attack!!!

Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks

Wild blue in the barren wastes
Patching and crashing their game
Clipping the parts and rerooting the crafts
The only good savegames are stock
Selling them mods and taking their posts
Updating their games and screwing the logs!!

Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks
Launch from the hills
Launch for your snacks