
Lisias

Members
  • Posts

    6,928
  • Joined

  • Last visited

Reputation

8,976 Excellent

Profile Information

  • About me
    Boldly crashing what no Kerbal has crashed before!
  • Location
    Universe ! Virgo ! MilkyWay ! OrionArm ! SolarSystem ! Earth ! America ! SouthAmerica ! Brazil ! SãoPaulo ! Capital ! Home ! LivingRoom ! MyChair
  • Interests
    I felt a great disturbance in the Force, as if millions of lines of code cried out in Null Reference Exceptions and were suddenly flooding the KSP.log...

Recent Profile Visitors

37,389 profile views
  1. Perhaps we could send them that torrent I'm building? That would save you guys some serious bandwidth... These <insert your favorite non-forum-compliant expletive here> AI companies are literally using our money to make money - for them.
  2. If a server crashes on the weekend and no user is connected, does it still throw a 502 Bad Gateway?
  3. I've been trying to explain this to some "authors" around here for months, but all I get is mockery and disdain. Trademarks are a thing. Some people here are going to learn this the hard way.
  4. Thursday afternoon (GMT-3), I noticed Forum was throwing even more 502 Bad Gateways than "normal". In the few times I managed to load the front page back then, the number of guests was about 6.7K, way more than usual (about 1.3K to 1.5K at peak times). That was almost a DDoS attack, almost 5 times the usual number of guests... Sooo, yeah... I think the problem is not exactly Forum, but the extra load of people trying to scrape Forum by themselves. T2 probably cut some costs in their infrastructure, but given that right now Forum is all right with 1.2K guests, I think there was some slack in that infrastructure in the first place.
  5. I'll leave this note here for future reference, in case someone searches for this problem. The problem happened right after MM lists the loaded DLLs, and before KSP starts to build the part database:
      [EXC 14:06:08.985] InvalidOperationException: Collection was modified; enumeration operation may not execute.
      System.ThrowHelper.ThrowInvalidOperationException (System.ExceptionResource resource) (at <9577ac7a62ef43179789031239ba8798>:0)
      System.Collections.Generic.List`1+Enumerator[T].MoveNextRare () (at <9577ac7a62ef43179789031239ba8798>:0)
      System.Collections.Generic.List`1+Enumerator[T].MoveNext () (at <9577ac7a62ef43179789031239ba8798>:0)
      KSPBurst.KSPBurst.FlushMessages () (at <c8b5fe2baff6415cb73ea658bdb44b4c>:0)
      KSPBurst.KSPBurst+<BurstCompile>d__20.MoveNext () (at <c8b5fe2baff6415cb73ea658bdb44b4c>:0)
      UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) (at <12e76cd50cc64cf
      UnityEngine.DebugLogHandler:LogException(Exception, Object)
      ModuleManager.UnityLogHandle.InterceptLogHandler:LogException(Exception, Object)
      UnityEngine.Debug:CallOverridenDebugHandler(Exception, Object)
      [LOG 14:06:09.164] PartLoader: Creating part database
     It looks like something the Burst Compiler doesn't know how to cope with, not a bug in US2's assembly. This should be reported to the Burst Compiler guys so they can find out whether there's something they can do about it, such as adding US2 to an ignore list.
  6. Companies care about money. Communities are important while they help secure the income, and it's really as simple as that. We are an asset, and we need to learn how to cope with it. This is not necessarily bad, although uncomfortable: being a sentient asset has some advantages that can benefit us if we learn to play our cards right. Just remember: Companies are not people, they are made of people. Some are good, some are evil, most of them are somewhere in the middle. We need to reach the good ones. Standard INC procedure. You know, LLCs have the utmost interest in knowing exactly what went wrong when a big project fails, because the company's owners ultimately have skin in the game (pun not intended). There's no way out: they will pay for the failure directly from their own pockets. PLCs and INCs work differently internally: the money's owners are an abstract mass of investors who are not directly dealing with the failure, only paying for it indirectly. So whoever is in charge has a special interest in hiding the problems that could tarnish their image, and tries to elect scapegoats instead. And screw the aftermath; who cares if this is going to be bad in the long run? These dudes only care about their image and how it affects their careers, and they have no problem pursuing such careers at the competition.
  7. NEWS FROM THE FRONT
I just updated the scraping tool. Now it's scraping images and styles too, exactly as I intended: HTML pages in one collection, images in a second one, and anything related to CSS and styling in a third one. The anti-dupe code is also working fine, preventing the Scrapy tool from visiting the same page twice in a session, which is exactly what had screwed me on July 13th, when I first tried it. I'm doing proper logging now, too. pywb allows us to dynamically merge the collections and serve them on a single front-end, as if they were just one. Pretty convenient. The rationale for this decision is simple, although not exactly straightforward: images almost never change, and neither do styles. Scraping them separately will save a bit of Forum's resources and scraping time while updating the collections, as images will rarely (if ever) change. Same for styles. So we can just skip them while refreshing the archive contents. There's an additional benefit in keeping textual info separated from images and styles: whoever owns the IP owns the images and styles, but not the textual content. Posts on Forum are almost unrestrictedly and perpetually licensed to the Forum's owner, but they still belong to the original poster. So whoever owns the IP, at least theoretically, has no legal grounds to take down this content. Assuming the worst scenario, where this Forum goes titties up and a new owner decides to take down the Forum mirrors, they will be able to do so only for the material they own: images and styling. And those we can easily replace later, forging a new WARC file standing in for that, now lost, content. Ok, ok, in Real Life™ things don't work exactly like that. But it costs very little (if anything) to take some preventive measures, no? At this moment Scrapy tells me "INFO: Crawled 316615 pages (at 58 pages/min), scraped 31847497 items (at 6128 items/min)", but the last time the WARC file was touched was Jul 20 20:19.
So, apparently, the older WARC files from 2023 plus the new ones I'm building now already hold all the information the tool has crawled since I last restarted it. Please note that the tool counts as a scraped item anything it crawls into, whether it's a dupe or not, and whether it was fetched or ignored. Right now, I'm trying to find my way around the Internet Archive to host the torrent file I will build with the material I already have. I hope it can be a mutable torrent; otherwise I will need to find some other way to host these damned huge files. I intend to update it at least once a month, which is why it needs to be a mutable torrent. People not willing to host anything can also find this stunt useful, as there are tools to extract the WARC contents and build a dump on their hard disks, like the one you would get by using a 'dumb' crawler such as HTTrack. So, really, there will be no need for everybody and the kitchen sink to hit Forum all the time to scrape it. Finally, once I have the torrent hosted somewhere, I will start cooking up a way to scrape the site cooperatively, so many people can share the burden, making things way faster and saving Forum's resources. Once people realize they don't need to scrape things themselves every time, I expect the load on Forum to be way lighter. This is what I currently have:
    -r--r--r-- 1 deck deck 41G May 7 2023 forum.kerbalspaceprogram.com-00000.warc
    -r--r--r-- 1 deck deck 20G May 7 2023 forum.kerbalspaceprogram.com-00001.warc
    -r--r--r-- 1 deck deck 24G Jul 25 15:32 forum.kerbalspaceprogram.com-202407.warc
I expect future WARC files to be smaller and smaller.
Last, but not least:
=== == = POST EDIT = == ===
For the sake of curiosity, here are some stats I'm fetching from the archives I have at this moment:
forum.kerbalspaceprogram.com-00000.warc
    318421 Content-Type: application/http; msgtype=request
    318421 Content-Type: application/http; msgtype=response
         6 Content-Type: application/json
         2 Content-Type: application/json
         8 Content-Type: application/json
         2 Content-Type: application/json;charset=utf-8
         2 Content-Type: application/x-www-form-urlencoded
    122909 Content-Type: ;charset=UTF-8
        24 Content-Type: text/html; charset=UTF-8
    195482 Content-Type: text/html;charset=UTF-8
forum.kerbalspaceprogram.com-00001.warc
    121990 Content-Type: application/http; msgtype=request
    121990 Content-Type: application/http; msgtype=response
     40841 Content-Type: ;charset=UTF-8
     81149 Content-Type: text/html;charset=UTF-8
forum.kerbalspaceprogram.com-202407.warc
    553096 Content-Type: application/http; msgtype=request
    553096 Content-Type: application/http; msgtype=response
        27 Content-Type: application/json;charset=UTF-8
         1 Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;charset=UTF-8
         5 Content-Type: application/x-unknown;charset=UTF-8
         8 Content-Type: application/zip;charset=UTF-8
    342190 Content-Type: ;charset=UTF-8
      1497 Content-Type: text/html
         7 Content-Type: text/html; charset=UTF-8
    208622 Content-Type: text/html;charset=UTF-8
         3 Content-Type: text/plain; charset=UTF-8
         2 Content-Type: text/plain;charset=UTF-8
        93 Content-Type: text/xml;charset=UTF-8
        28 Content-Type: video/mp4;charset=UTF-8
         3 Content-Type: video/quicktime;charset=UTF-8
As we can see, some videos leaked into the WARC files... I'm working on it.
=== == = POST2 EDIT = == ===
This one gave me a run for my money. Forum serves media (like movies) through an interface called /applications/core/interface/file/cfield.php, where the file is specified in the query string.
But the crawler was looking only at the path, taking advantage of the fact that Forum doesn't obfuscate the artefacts, which made my life simpler while routing HTML, image and style files to their respective collections. Until now. Since the crawler didn't parse the query, it thought those were content files and stored the freaking videos together with the HTML files, screwing me because:
    • These videos are Forum's IP, so shoving them in with the textual content would erode my chances against a hypothetical future copyright takedown attempt.
    • It royally screwed the compression ratio of the WARC file!! As a matter of fact, I thought it was weird that the 2023 WARC files were compressing at 44 to 1, while mine "only" at 22 to 1. They should compress at similar ratios, as they hold similar content.
This is the reason I made those stats above: I already suspected that some already-compressed content had leaked into the stream, but I was thinking of some images or even a zip file, not a whole freaking movie file! Anyway, I salvaged the files, moving the image/* content into the image collection and removing about 2G of binary data from the text stream. I'm double-checking everything and recompressing the data files; I will pursue torrenting them tomorrow. Cheers!
  8. He didn't say anything, but I think he said enough. There are still people very, very interested in keeping mouths shut. What this means is something yet to be seen.
  9. Damn, you ninja'ed me!!! I was going to post this music too! The reason? https://www.youtube.com/live/WLROdWOKO_s?si=Y80vvkQXij5EVEeQ (a live stream of all episodes of Space:1999!!!)
  10. 2/10. You almost got me, T800! Cogito, ergo sum. 2+2=4
  11. Yes, it was arbitrary, but no, it's not easily reversed as explained below:
  12. But not the identifier:
      SpaceDock: https://spacedock.info/mod/3635/ColdJ Hot Air Balloon
      NetKan: https://github.com/KSP-CKAN/NetKAN/blob/66fbdae6d2d2c2771a14359a82b68f2ffaa15fa8/NetKAN/HotAirBalloon.netkan#L2
          spec_version: v1.34
          identifier: HotAirBalloon
          $kref: '#/ckan/spacedock/3635'
          license: CC-BY-SA-4.0
          <...>
      And this is what ColdJ is talking about. I think it's time to let @ColdJ decide what to do. We clearly aren't converging on a constructive discussion.
  13. Yes, I have. Please don't project onto me. ColdJ published an add'on called ColdJ Hot Air Balloon on SpaceDock. JOT decided to rename the identifier to HotAirBalloon, disregarding ColdJ's opinion on the subject. Apparently a possible future adoption was mentioned somewhere. ColdJ is protesting because, apparently, they want their mod to be identified as ColdJHotAirBalloon, which is how it was named on SpaceDock, for starters. So are you implying that the author's standpoint is meaningless and can be ignored?
  14. Which is exactly why one should not change it without the author's consent!
  15. Should I understand that CKAN is too much work for the current staff, since the current members of this select group are not able to carry out their duties correctly? Have you considered asking the Community for help, opening vacancies to be filled by members of the Community? Perhaps some of the Forum's Moderators can help; how about asking them for help?
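The Content-Type stats in the scraping-tool post (item 7) appear to be a raw tally of every "Content-Type:" line found in each WARC file, with WARC record headers and HTTP headers counted together. The sketch below shows one way to produce such a tally in Python. It assumes uncompressed .warc files like the ones listed in that post. The function name count_content_types is hypothetical and is not part of the author's tool.

```python
from collections import Counter
from typing import BinaryIO


def count_content_types(stream: BinaryIO) -> Counter:
    """Tally every raw 'Content-Type:' line in an uncompressed WARC stream.

    Hypothetical helper: both WARC record headers and the HTTP headers
    inside the records start a line with 'Content-Type:', so both kinds
    are counted, much like a grep | sort | uniq -c pipeline would.
    """
    counts: Counter = Counter()
    for raw in stream:  # binary streams iterate line by line, split on b"\n"
        if raw.startswith(b"Content-Type:"):
            # Keep the value verbatim (only the line ending is dropped), so
            # 'text/html;charset=UTF-8' and 'text/html; charset=UTF-8' stay apart.
            # latin-1 maps every byte to a character, so decoding never fails.
            counts[raw.rstrip(b"\r\n").decode("latin-1")] += 1
    return counts
```

To print it in a uniq -c style layout, open a file with open(path, "rb") and print f"{n:>7} {line}" for each pair from count_content_types(f).most_common(). For a compressed .warc.gz, passing gzip.open(path, "rb") instead should work the same way, because Python's gzip module reads the multi-member files that WARC writers usually produce.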
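The POST2 EDIT in item 7 describes the routing bug. The crawler picked a collection from the URL path alone, so media served by cfield.php, whose real file is named in the query string, landed in the HTML collection. The sketch below shows path-plus-query routing in which the response Content-Type, when known, takes priority over the URL. Several parts of it are assumptions made for illustration:

- the extension sets;
- the collection names 'html', 'images' and 'styles';
- the function route();
- sending every binary (videos included) to the image collection.

The author's actual Scrapy code isn't shown. The names of cfield.php's query parameters are also not given, so the sketch scans every query value instead.

```python
import posixpath
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# Hypothetical extension sets: the real tool's routing rules are not shown in the post.
STYLE_EXTS = {".css", ".woff", ".woff2", ".ttf", ".eot"}
BINARY_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico",
               ".mp4", ".mov", ".webm", ".zip", ".xlsx"}

# Endpoint named in the post; the served file is identified by its query string.
FILE_ENDPOINT = "/applications/core/interface/file/cfield.php"


def _ext(name: str) -> str:
    """Lower-cased file extension of a path or file name ('' if none)."""
    return posixpath.splitext(name.lower())[1]


def _route_by_ext(ext: str) -> Optional[str]:
    if ext in STYLE_EXTS:
        return "styles"
    if ext in BINARY_EXTS:
        return "images"  # assumption: all Forum-owned binaries share one collection
    return None


def route(url: str, content_type: Optional[str] = None) -> str:
    """Pick a collection ('html', 'images' or 'styles') for a fetched URL."""
    # A media type from the response, when present, beats guessing from the URL.
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime.startswith(("image/", "video/", "application/zip")):
            return "images"
        if mime == "text/css":
            return "styles"
        if mime == "text/html":
            return "html"
    parts = urlsplit(url)
    if parts.path == FILE_ENDPOINT:
        # The path says nothing about the file, so inspect every query value.
        for _, value in parse_qsl(parts.query):
            collection = _route_by_ext(_ext(value))
            if collection:
                return collection
        return "images"  # unknown attachment: keep it out of the text stream
    return _route_by_ext(_ext(parts.path)) or "html"
```

Checking the Content-Type first also handles the many ";charset=UTF-8" responses with an empty media type seen in the stats: those fall through to the URL rules instead of being trusted blindly.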