calabus2 Posted September 4

Posting this from another reply I made to a different post:

I never understood the want or need for a forum backup. The only outcry from a few folks about the forum going away was the loss of mod content. Why? Every bit of that content is hosted or duplicated elsewhere, like GitHub or a few other mod sites. Reddit has subs dedicated to KSP modding, among other things. If modders want an outlet for their work, there are already options available to them. It doesn't need to be this or some other forum-like site. Reddit is free, and making a sub is less hassle than building a new website.

I looked at the active user count here and on the main KSP Reddit sub. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply use KSP resources elsewhere. I see no need for a new forum.
Lisias Posted September 5 (edited)

On 9/4/2024 at 6:20 PM, calabus2 said: I never understood the want or need for a forum backup. The only outcry from a few folks about the forum going away was the loss of mod content. Why? Every bit of that content is hosted or duplicated elsewhere, like GitHub or a few other mod sites. Reddit has subs dedicated to KSP modding, among other things.

Historical and legal purposes.

The code is only part of the solution. Why the code was written that way is even more important than the code itself, because this knowledge allows us to write better code without breaking whatever is already published.

And we have a lot of IP transfers happening here. Without legal proof that the IP was transferred to the new maintainer, you can bet your booster that half the add-ons published around here will be considered piracy.

End users only see 20% of the total effort spent on developing software. The Forum has the other 80%.

On 9/4/2024 at 6:20 PM, calabus2 said: I looked at the active user count here and on the main KSP Reddit sub. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply use KSP resources elsewhere. I see no need for a new forum.

If this forum collapses tomorrow, the Mod Scene will sink into a Nietzschean dystopia where the end users, as usual, will be on the receiving end. Make no mistake, the only thing preventing end users from being abused is the licenses we use around here - but these licenses demand legal evidence to be valid, and some of that evidence is here on the Forum, and not on Reddit et al.

=== == = POST EDIT = == ==

Of course, I didn't mention (and I should have) the importance of the Forum as a tool to gather people and maintain a Community. The long-running threads (like "What did you do today") are a nearly unique feature of this Forum. You need a past to acquire a sense of belonging.
People became friends here, and friendship is forged over memories.

Edited October 3 by Lisias Typos, tyops, tyops everywhere!!!
Lisias Posted September 9 (edited)

NEWS FROM THE FRONT

Scraping is progressing slowly, but consistently.

I didn't scrape the whole week; I spent most of the time working on progress reports and data management, but still did some new work. The reports I wrote tell me that:

- We have 1,456,167 URLs recorded: images, posts, redirects, you name it, as long as it's hosted by the Forum or its AWS bucket.
- We have 98.32420180964789% of the Forum's topics recorded at least once (using bizeehdee's dataset as reference).
- We have 36.66643980126591% of the Forum's profiles recorded at least once (ditto).

I consolidated the week's results into a "partial" WARC file and will update the IA torrent with it - if anything happens, at least the job already done is preserved. As usual, early next month the data will be properly consolidated and republished.

I will update this post when the IA torrent is ready.

Ready: https://archive.org/details/KSP-Forum-Preservation-Project

Edited September 12 by Lisias URLs from AWS are *not* being accounted.
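The coverage figures in the post above read as "share of the reference IDs that show up at least once among the recorded URLs". A minimal sketch of that arithmetic follows. It assumes topic URLs shaped like `/topic/<id>-<slug>/`; the function names, the sample URLs and the reference list are illustrative and are not the project's actual tooling or data.

```python
import re

# Assumed URL shape for topics: .../topic/<numeric id>-<slug>/...
TOPIC_ID = re.compile(r"/topic/(\d+)-")


def recorded_topic_ids(urls):
    """Collect the distinct topic IDs that appear in an iterable of URLs."""
    ids = set()
    for url in urls:
        match = TOPIC_ID.search(url)
        if match:
            ids.add(int(match.group(1)))
    return ids


def coverage_percent(recorded_ids, reference_ids):
    """Percentage of reference IDs that were recorded at least once."""
    reference = set(reference_ids)
    if not reference:
        return 0.0
    return 100.0 * len(reference & set(recorded_ids)) / len(reference)


# Illustrative data only: a real run would read the URL report and the
# reference dataset instead of these hard-coded samples.
sample_urls = [
    "https://forum.kerbalspaceprogram.com/topic/278-some-topic/",
    "https://forum.kerbalspaceprogram.com/topic/278-some-topic/page/2/",
    "https://forum.kerbalspaceprogram.com/topic/181547-another-topic/",
    "https://forum.kerbalspaceprogram.com/profile/42312-someone/",
]
reference_topics = [278, 181547, 209425, 226017]

seen = recorded_topic_ids(sample_urls)
print(sorted(seen), coverage_percent(seen, reference_topics))
```

Counting distinct IDs rather than URLs keeps extra pages and revisits of the same topic from inflating the percentage.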
Lisias Posted September 24

NEWS FROM THE FRONT

We are not dead yet. The torrent was updated.
Dakota (KSP2 Alumni) Posted September 29

If there's anything I can do to help this initiative, please let me know.
Lisias Posted October 3 (edited)

NEWS FROM THE FRONT

The release for 2024-09 is online; the IA torrent was updated.

- All partial 202409* WARC files were replaced by the proper, consolidated and sanitized ones.
  - ALL_URLS.csv was updated accordingly.
- I do not intend to do partials anymore.
  - It's 4 times the workload of a monthly update. And I have already coded the tools I need, and they are stable by now - no need to probe them all the time, not even as a testbed, as there's no need for so many tests anymore.
  - It expands my "attack surface". I have noticed that things get gradually harsher for me with every update of the IA archive. I don't think it's healthy to speculate about this.
- The Forum's Archive is more or less feature complete.
  - All known topics up to October 1st were scraped at least once, but it still misses about 60% of the profiles.
  - From now on, the efforts will focus on: scraping missing profiles; scraping new topics; updating previously scraped topics when needed (low priority).
- New "revisits" WARC kind, preserving the recorded… well… revisits.
  - Revisits are not needed for playback, but will play a role now that topics and profiles should be updated regularly.
- Revisits and redirects from 202407 and 202408 were reworked (and replaced).
  - Better sanitizing processes were applied to them.

https://archive.org/details/KSP-Forum-Preservation-Project

There's a missing file, movies, that I haven't uploaded yet. I don't know what in hell the Forum is hosting buried deep somewhere here (I didn't bother to find the referencing posts yet!), but there are 16 (SIXTEEN!!) gigabytes of videos in this file (what, for Kraken's sake, are you uploading here, guys??). This is going to hurt...

And it's the reason the torrent wasn't updated yesterday: I left the thing uploading and went to bed and... well... obviously Murphy noticed it and screwed up the upload, leaving 5 or 6% to finish. I'm uploading everything again now, except the movies.
I will update the torrent with the movies as soon as the current uploads are merged into the torrent (Internet Archive takes some time to do it). Please check the torrent link tonight - everything but those pesky movies is already online. [edit: It's there already - finally!!]

Additionally...

I'm not willing to beat a dead horse, but the problems I described earlier are starting to bite, and it's one of the reasons I haven't published (or coded), yet, some code for collaborative scraping. I need to find a secure way to distribute the jobs to whoever would be willing to help me with the scraping. See below.

Which leads me to the next problem: on every single release of the scrapings, something changes around here that makes my life harsher. It happened in 202407, it happened in 202408 and lately in 20240901* (the first September partial). And it happened less than 12 hours after publishing; you can easily correlate it with the graphs I'm keeping about the Forum's health during the process.

So now we have a new problem to tackle before distributing the work: protecting the supporters, preventing them from being hit the way I was (being on the bad side of CloudFlare will make your life significantly harsher on any other site handled by CloudFlare).

It would not bother me if things were improving around here, but they aren't, at least for me. So whatever was done that hindered my efforts on archiving the site (which would be done by now otherwise), it's not improving the current situation (kindly assuming this is the reason these measures were taken).

And, yes, @Dakota, this is the reason I haven't cooked up a way for collaborative archiving (yet). Whenever I published some source code or updated the torrent, something changed around here and made the task harder, and also caused me some trouble with my connections to CloudFlare-shielded sites. There was a day when I almost couldn't visit half the sites I usually visit...
Anyone helping me directly will suffer the same reprisals unless we keep the data private, but - by then - why do it this way in the first place?

One way or another, I managed to carry on the task until now:

- Essentially all topics up to 226017 were scraped at least once.
  - I'm working on reports to check if all pages from each topic were archived correctly.
- But only about 38% of the profiles were archived; I intend to focus on this in October.

Edited October 6 by Lisias Internet Archive torrent fully updated.
Lisias Posted October 14 (edited)

NEWS FROM THE FRONT

Internet Archive was hacked on the 9th, and some of their services are down at the moment. Adding insult to injury, they also suffered some DDoS attacks (two, as it appears) since then.

But some of their services are back, so they are not dead yet. I suggest that everybody who uses that service check https://haveibeenpwned.com/ .

Given the nature of the hack, it's not impossible (though improbable) that my assets there were compromised. Well, I signed those files for a reason. The first thing I will do as soon as I have access to my IA account again is to download and verify everything. Just in case.

Edited October 14 by Lisias Better link.
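Verifying the signatures mentioned above needs the signing tool and the public key, neither of which is described in this thread. As a simpler, complementary integrity check, a downloaded file can be hashed and compared against a digest noted before the upload. The sketch below assumes SHA-256 and uses a throwaway temporary file as a stand-in for a downloaded archive. It does not replace signature verification, since a digest only proves integrity if the digest itself was kept somewhere the attacker could not touch.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte archives need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA-256 equals the digest recorded before upload."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo: a temporary file plays the role of a downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

recorded = hashlib.sha256(b"example payload").hexdigest()  # noted before upload
intact = matches(demo_path, recorded)

with open(demo_path, "ab") as handle:
    handle.write(b"!")  # simulate corruption or tampering
tampered = matches(demo_path, recorded)

os.remove(demo_path)
print(intact, tampered)
```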
Lisias Posted October 29

NEWS FROM THE FRONT

The Forum was down from 2024-10-15 to 2024-10-28 (today).

I took a "sabbatical" during the first week of this month, so I had very little to publish, but I did it anyway. Check https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project for the download links - since Internet Archive is still in read-only mode, I couldn't update its torrent, but I mirrored the data on another file sharing service. Links are in the repo's README.
Lisias Posted November 7

Transferring the discussion from another thread:

13 hours ago, Bobbejans said: how many gigabytes/terabytes does all of the forum use? if you can say that

The Forum's raw data (in WARC files) is currently "costing" me about 160 GB of uncompressed data. Check the README of the preservation project's repo for links to download them - but keep in mind that it's still a work in progress.

Also keep in mind: it's just data hosted by this Forum (posts, threads, profiles, forum-hosted images, some files uploaded by privileged users). Absolutely no add-on downloads are present, nor are images hosted by third parties such as imgur or Discord.
Bobbejans Posted November 7

OK, thanks, that is all I wanted to know. Hope this doesn't shut down.
Lisias Posted November 20

Transferring this discussion from another thread:

1 hour ago, PDCWolf said: Yeah, if the new owner is some VCA who knows nothing about KSP, they could absolutely come here and say "sorry guys, forum is not in our plans, join us at Discord.gg/* see you there". So every day I'm leaning more and more towards the fact that we need a functional mirror of the forum, or to migrate to the subreddit. Sadly without the DB that mirror thing is not happening, and obviously we aren't getting the original DB ever, so all we can have from this forum in case someone decides to pull the plug, is what Lisias and others have managed to back up (mostly everything) in static, plain-ish text. And these are just the things we're slowly realizing as the franchise gets yet another new owner... I wonder what are the things going on behind the scenes about what to keep and what to throw away.

Even with the DB, it would be illegal to mirror the Forum without explicit authorization from the new owners. And not only due to the IP itself, but due to the copyrights of the posts themselves - the posters are still the owners and copyright holders of every post on this Forum, and they have granted a non-revocable, perpetual and transferable right to the Forum to do whatever it wants with the posts (like publishing a book).

Since it's unfeasible - to tell you the truth, plain impossible, since some posters deleted their accounts and their posts are now "abandonware" - to get permission from every poster of this forum, we would need to obtain such permission from the new owner and, at this point, why would they grant it? A Forum outside their control is a business liability.

Additionally, under the cold letter of the law, our best chance of content survival is to rely on Fair Use - which would be possible in this case if we go Internet Archive style, i.e., storing the HTTP requests themselves so as to prevent the creation of a derivative, where Fair Use is more strict.
Anyone (but the IP Owner) publishing a derivative, i.e., anything new using the content, will be in copyright infringement.

Anyway, a wall of text with this rationale is in:
PDCWolf Posted November 20

2 hours ago, Lisias said: Transferring this discussion from another thread. Even with the DB, it would be illegal to mirror the Forum without explicit authorization from the new owners. And not only due to the IP itself, but due to the copyrights of the posts themselves - the posters are still the owners and copyright holders of every post on this Forum, and they have granted a non-revocable, perpetual and transferable right to the Forum to do whatever it wants with the posts (like publishing a book). Since it's unfeasible - to tell you the truth, plain impossible, since some posters deleted their accounts and their posts are now "abandonware" - to get permission from every poster of this forum, we would need to obtain such permission from the new owner and, at this point, why would they grant it? A Forum outside their control is a business liability. Additionally, under the cold letter of the law, our best chance of content survival is to rely on Fair Use - which would be possible in this case if we go Internet Archive style, i.e., storing the HTTP requests themselves so as to prevent the creation of a derivative, where Fair Use is more strict. Anyone (but the IP Owner) publishing a derivative, i.e., anything new using the content, will be in copyright infringement. Anyway, a wall of text with this rationale is in:

I mean, I know the legal side perfectly... I was one of the first to mention, when this discussion came up, that I gave nobody but T2 the right to publicly host my posts from this forum.

Still, my post on that other thread was not so much about archival as it was about ensuring the continued existence of this forum. Archiving the knowledge is one thing; keeping it up in case the new owners decide to take it away is another.
Lisias Posted November 20 (edited)

1 hour ago, PDCWolf said: I mean, I know the legal side perfectly... I was one of the first to mention, when this discussion came up, that I gave nobody but T2 the right to publicly host my posts from this forum.

I forgot, sorry. But there are more people around here who may not be aware of that, which is why I am sometimes repetitive - to be sure that not only my direct interlocutor is aware of the situation.

1 hour ago, PDCWolf said: Still, my post on that other thread was not so much about archival as it was about ensuring the continued existence of this forum. Archiving the knowledge is one thing; keeping it up in case the new owners decide to take it away is another.

For personal archival, anything goes. You are allowed to keep personal backups of data, no matter what they say - the whole DMCA drama was created in the 2000s exactly because "they" (big copyright holders) didn't manage to overturn this law and so had to criminalize decrypting data to get an edge over personal backups.

My whole effort in doing it the "hard way" is exactly to allow publishing this data in case this Forum goes titties up. There are some thoughts about it here: https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project/issues/14#issuecomment-2445076588

The most interesting, IMHO, would be a service to be used in documentation and links on third-party sites (like SpaceDock). This service would automatically issue an HTTP 302 temporary redirect to the Archive (or one of the mirrors, I don't want a single point of failure) if the Forum is down, and otherwise would 302 to the Forum itself.

Edited November 20 by Lisias Entertaining grammars made slightely less entertaining...
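The redirect service idea at the end of the post above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the mirror base URLs are hypothetical placeholders, the health probe is a naive per-request HEAD check, and a real archive would most likely need its own mapping from Forum paths to playback URLs rather than plain concatenation.

```python
from http.server import BaseHTTPRequestHandler
from urllib.request import Request, urlopen

FORUM = "https://forum.kerbalspaceprogram.com"
# Hypothetical mirror base URLs, for illustration only.
MIRRORS = [
    "https://archive.example.org/ksp-forum",
    "https://mirror.example.net/ksp-forum",
]


def is_up(base_url, timeout=5.0):
    """Naive health probe. Any error, including HTTP error statuses, counts as down."""
    try:
        with urlopen(Request(base_url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False


def choose_target(path, forum_up, mirrors_up):
    """Pick the redirect URL: the Forum if it is up, else the first live mirror.

    mirrors_up is a list of (base_url, is_up) pairs; returns None if nothing is up.
    """
    if forum_up:
        return FORUM + path
    for base, up in mirrors_up:
        if up:
            return base + path
    return None


class Redirector(BaseHTTPRequestHandler):
    """Answers every GET with a 302 to whichever target is currently reachable."""

    def do_GET(self):
        target = choose_target(
            self.path, is_up(FORUM), [(m, is_up(m)) for m in MIRRORS]
        )
        if target is None:
            self.send_error(503, "Forum and mirrors unreachable")
            return
        self.send_response(302)  # temporary: clients should keep using this service
        self.send_header("Location", target)
        self.end_headers()
```

Keeping the decision in `choose_target`, separate from the network probe, makes the fallback order testable without touching the network. A production version would probably cache probe results instead of probing on every request. To try it locally, the handler can be served with `http.server.HTTPServer(("", 8080), Redirector).serve_forever()`.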
Lisias Posted November 24 (edited)

I found something weird in the Forum's content.

I found these two URLs in my "ALL" report this month (not meaning they weren't there before, I just noticed them today):

https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/128696-killashley/
https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/42312-alexsheff/

Note the "%7B___base_url___%7D" substring, which decoded gives us "{___base_url___}". Obviously, it's a typo somewhere in the code (believe me, I'm an expert on typos!). Almost surely it's a missing "$" before the opening curly brace.

Curious about the issue, and knowing that this kind of issue reproduces like rabbits, I coded a quick report of all the occurrences in the current (and WIP) WARCs, but there are too many to list here, so: https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project/issues/15

The earliest thread with the problem is 278, and the biggest ID is 209425. Some of them follow:

https://forum.kerbalspaceprogram.com/topic/278-this-is-a-topic-for-all-them-crazy-dutchbelgian-people/ (2011)
https://forum.kerbalspaceprogram.com/topic/181547-181-1-please-fork-me-kopernicus-kittopiatech/ (2016)
https://forum.kerbalspaceprogram.com/topic/209425-ksp1-computer-buildingbuying-megathread/ (2013!!)

Curiously, the thread with ID 209425 is way older than 181547.

NEWS FROM THE FRONT

Internet Archive is fully functional again! So...

The release for 2024-10 is finally online in the Internet Archive torrent.

https://archive.org/details/KSP-Forum-Preservation-Project

Edited November 24 by Lisias brute force post merge
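The decoding step mentioned above is easy to reproduce: percent-decoding turns "%7B" and "%7D" back into curly braces. A small sketch of how such URLs could be flagged in a list follows. The helper name is made up for illustration, and this is not the report the author actually coded.

```python
from urllib.parse import unquote

# The raw template placeholder that should never survive into a final URL.
PLACEHOLDER = "{___base_url___}"


def has_unexpanded_placeholder(url):
    """True if the percent-decoded URL still contains the raw placeholder."""
    return PLACEHOLDER in unquote(url)


urls = [
    "https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/128696-killashley/",
    "https://forum.kerbalspaceprogram.com/topic/278-this-is-a-topic-for-all-them-crazy-dutchbelgian-people/",
]

decoded = unquote("%7B___base_url___%7D")
broken = [u for u in urls if has_unexpanded_placeholder(u)]
print(decoded, len(broken))
```

Decoding before matching means the check catches both the encoded form found in recorded URLs and the literal form found in page source.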
Fizzlebop Smith Posted November 24

2 hours ago, Lisias said: I found something weird in the Forum's content. I found these two URLs in my "ALL" report this month (not meaning they weren't there before, I just noticed them today): https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/128696-killashley/ https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/42312-alexsheff/ Note the "%7B___base_url___%7D" substring, which decoded gives us "{___base_url___}". Obviously, it's a typo somewhere in the code (believe me, I'm an expert on typos!). Almost surely it's a missing "$" before the opening curly brace. Curious about the issue, and knowing that this kind of issue reproduces like rabbits, I coded a quick report of all the occurrences in the current (and WIP) WARCs, but there are too many to list here, so: https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project/issues/15 The earliest thread with the problem is 278, and the biggest ID is 209425. Some of them follow: https://forum.kerbalspaceprogram.com/topic/278-this-is-a-topic-for-all-them-crazy-dutchbelgian-people/ (2011) https://forum.kerbalspaceprogram.com/topic/181547-181-1-please-fork-me-kopernicus-kittopiatech/ (2016) https://forum.kerbalspaceprogram.com/topic/209425-ksp1-computer-buildingbuying-megathread/ (2013!!) Curiously, the thread with ID 209425 is way older than 181547. NEWS FROM THE FRONT Internet Archive is fully functional again! So... The release for 2024-10 is finally online in the Internet Archive torrent. https://archive.org/details/KSP-Forum-Preservation-Project

Apologies, but I am rather ignorant in most of this stuff. I still read along with this for posterity's sake.

Should anything untoward happen, I would like to be able to still troubleshoot issues with my modded install.

Could you explain the significance of the URL appearances?
I went to the indicated threads and am not technically knowledgeable enough to immediately spot something amiss.
Lisias Posted November 24 (edited)

11 hours ago, Fizzlebop Smith said: Apologies, but I am rather ignorant in most of this stuff. I still read along with this for posterity's sake.

Easily fixable. See below!

11 hours ago, Fizzlebop Smith said: Should anything untoward happen, I would like to be able to still troubleshoot issues with my modded install.

Unsure if anything SHOULD be done about it. It's possible that the root cause is already fixed, but only by inspecting the real date of every affected page (which will demand a bit more code on my side) can we be certain.

What I know for sure is that there are occurrences in 2011, 2013 and 2016, and that there's currently a maximum of 5450 hits (inaccurate: I didn't sanitize the November data, so there are some double hits, and it's not impossible that a page has more than one occurrence, and I'm only counting one per page).

All but 4 hrefs are links to a profile, which I think is the reason nobody detected it before. These 4 occurrences, on the other hand, are pretty ugly (really messed up HTML code) but are not related to topics or posts, so they are harmless for our needs.

In a universe of 1.78M URLs (so far), 5450 hits is about 0.3%, statistically less than a drop in the ocean. So, definitely, it's not something that MUST be done. But something CAN be done, if needed.

11 hours ago, Fizzlebop Smith said: Could you explain the significance of the URL appearances?

This is a simple HTML page:

    <html>
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <h1>Hello World!!</h1>
        <p>Hi Bob! <a href="https://for-all-mankind.fandom.com/wiki/Hi_Bob">Click Me!</a></p>
      </body>
    </html>

This will render a web page titled "Hello World" (the string that appears on the browser's tab): an otherwise blank page with "Hello World!!" in big letters and "Hi Bob!" in normal letters below, followed by "Click Me!", a link that, once clicked, opens the page on Fandom.
Now, I want to create a program that generates a page like that, but using different values for the salute and for the link. Instead of manually creating one page for each possible entry, I use a template engine with a... well... template for the page and a database with the values, and then feed these values to the template engine, which spits out the HTML code to NGINX, which then sends the page to the user's browser.

So, imagine a database with the following dataset:

    SALUTE                 URL
    Hi Bob!                https://for-all-mankind.fandom.com/wiki/Hi_Bob
    Hello there children!  https://www.urbandictionary.com/define.php?term=hello%20there%20children

And the following template:

    <html>
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <h1>Hello World!!</h1>
        <p>${SALUTE} <a href="${URL}">Click Me!</a></p>
      </body>
    </html>

And then I write a "Forum" that reads the lines of the database one by one and applies the data to the template, generating a different page by replacing ${SALUTE} and ${URL} with the values found in the database, selected by something in the user's browser address bar. And now I can change the page layout in the template, and all I need to do is rerun the template engine to regenerate the pages. It's, literally, the programmatic equivalent of a Word Processor's Find & Replace.

Now, what happens if I make a typo and, instead of typing "${URL}", I type just "{URL}"? Well...

    <html>
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <h1>Hello World!!</h1>
        <p>Hello there children! <a href="{URL}">Click Me!</a></p>
      </body>
    </html>

Because the template engine was looking for "${URL}", it didn't replace the text in the href, and then when the user clicks on the link on the rendered page, an error occurs.

What happened on the Forum is similar. Some template was (is?) wrong. Instead of '<a href="${___base_url___}/${user_profile}">${user_name}</a>', someone made a mistake and typed '<a href="{___base_url___}/${user_profile}">${user_name}</a>'.
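The "Hello World" template example above can be reproduced with Python's standard `string.Template`, which happens to use the same `${NAME}` placeholder syntax. This is only an illustration of the mechanism. The Forum's real template engine is not identified in this thread and may behave differently.

```python
from string import Template

# Correct template: both placeholders carry the leading "$".
good = Template('<p>${SALUTE} <a href="${URL}">Click Me!</a></p>')
# Typo: the "$" before "{URL}" is missing, so "{URL}" is plain text to the engine.
typo = Template('<p>${SALUTE} <a href="{URL}">Click Me!</a></p>')

rows = [
    {"SALUTE": "Hi Bob!",
     "URL": "https://for-all-mankind.fandom.com/wiki/Hi_Bob"},
    {"SALUTE": "Hello there children!",
     "URL": "https://www.urbandictionary.com/define.php?term=hello%20there%20children"},
]

for row in rows:
    print(good.substitute(row))  # href is filled in from the row
    print(typo.substitute(row))  # href stays as the literal "{URL}"
```

Note that the typo raises no error: the engine sees no placeholder there, so the broken link goes out silently, which fits with the problem going unnoticed for years.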
11 hours ago, Fizzlebop Smith said: I went to the indicated threads and am not technically knowledgeable enough to immediately spot something amiss.

Go to one of the indicated topics (or threads - I'm using the code's terminology here), and ask the browser to show the page's source. In the source, look for "___base_url___". You will notice that this weird string should have been replaced by "https://forum.kerbalspaceprogram.com" to make things work. It's a typo in the template used to generate the content: it's missing a "$" before the "{" in the source code, so instead of being replaced with the desired value, the placeholder is handled like content and spit out ipsis litteris into the HTML.

Edited November 24 by Lisias Clicked "Save" too soon,
Lisias Posted December 2

NEWS FROM THE FRONT

The release for 2024-11 is online.

https://archive.org/details/KSP-Forum-Preservation-Project