KSP Forums Archival Options.


Posting this from another reply I made to a different post:

I never understood the want or need for a forum backup. The only outcry from a few folks about the forum going away was the loss of mod content. Why? Every bit of that content is hosted or duplicated elsewhere, like GitHub or a few other mod sites. Reddit has subs dedicated to KSP modding among other things. If modders want an outlet for their work there are already options available to them. It doesn't need to be this or some other forum-like site. Reddit is free, and making a sub is less hassle than building a new website.

I looked at the active user count here and on the main KSP Reddit sub. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply utilize KSP resources elsewhere. I see no need for a new forum.


On 9/4/2024 at 6:20 PM, calabus2 said:

I never understood the want or need for a forum backup. The only outcry from a few folks about the forum going away was the loss of mod content. Why? Every bit of that content is hosted or duplicated elsewhere like Github or a few other mod sites. Reddit has subs dedicated to KSP modding among other things.

Historical and legal purposes.

The code is only part of the solution; why the code was written that way matters even more than the code itself, because that knowledge is what allows someone to write better code without breaking whatever is already published.

And we have a lot of IP transfer happening here. Without legal proof that the IP was transferred to the new maintainer, you can bet your booster that half the add-ons published around here will be considered piracy.

End users only see 20% of the total effort spent on developing software. The Forum has the other 80%.

 

On 9/4/2024 at 6:20 PM, calabus2 said:

I looked at the active user count here and on the main KSP Reddit sub. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply utilize KSP resources elsewhere. I see no need for a new forum.

If this forum collapses tomorrow, the mod scene will sink into a Nietzschean dystopia where the end users, as usual, will be on the receiving end.

Make no mistake, the only thing preventing end users from being abused is the licenses we use around here - but these licenses demand legal evidence to be valid, and some of that evidence is here on the Forum, and not on Reddit et al.

 

=== == = POST EDIT = == ==

Of course, I didn't mention (and I should have) the importance of the Forum as a tool to gather people and maintain a Community.

The long-running threads (like "What did you do today") are a nearly unique feature of this Forum.

You need a past to acquire the sense of belonging.

People became friends here, and friendship is forged over memories.

 

Edited by Lisias
Typos, tyops, tyops everywhere!!!

NEWS FROM THE FRONT

Scraping is progressing slowly, but consistently. I didn't scrape the whole week; I spent most of the time working on progress reports and data management, but still did some new work.

The reports I wrote tell me that:

  • We have 1,456,167 URLs recorded
    • images, posts, redirects, you name it.
      • as long as it is hosted by the Forum or its AWS bucket.
  • We have 98.32420180964789% of the Forum's topics recorded at least once (using the bizeehdee's dataset as reference)
  • We have 36.66643980126591% of the Forum's profiles recorded at least once (ditto)
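
For readers wondering how such coverage figures can be derived, here is a minimal sketch. It assumes the reference dataset and the recorded URLs can each be reduced to a list of numeric IDs; the function name and the toy data are hypothetical and are not taken from the project's actual reports.

```python
def coverage_percent(recorded_ids, reference_ids):
    """Percentage of reference IDs that were recorded at least once."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference dataset is empty")
    # Duplicated recordings and IDs unknown to the reference do not count.
    return 100.0 * len(reference & set(recorded_ids)) / len(reference)


# Hypothetical toy data: 4 topics in the reference dataset, 3 of them recorded.
reference_topics = [101, 102, 103, 104]
recorded_topics = [101, 101, 103, 104, 999]
print(coverage_percent(recorded_topics, reference_topics))  # 75.0
```

"Recorded at least once" is why the IDs are collapsed into sets before counting: a topic scraped five times still counts once.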

I have consolidated the week's results into a "partial" WARC file and I will update the IA's torrent with it - if anything happens, at least the work already done is preserved. As usual, early next month the data will be properly consolidated and republished.

I will update this post when the IA's torrent is ready.

Ready: https://archive.org/details/KSP-Forum-Preservation-Project

Edited by Lisias
URLs from AWS are *not* being accounted.

  • 3 weeks later...

NEWS FROM THE FRONT

The 2024-09 release is online, and the IA torrent was updated.

  • All partial WARC 202409* files were replaced by the proper, consolidated and sanitized ones. ALL_URLS.csv was updated accordingly.
    • I do not intend to do partials anymore.
      • It's 4 times the workload of a monthly update.
        • And I have already coded the tools I need, and they are stable by now - no need to exercise them all the time, not even as a testbed, as there's no need for so many tests anymore.
      • It expands my "attack surface".
        • I have noticed that things get gradually harsher for me with every update of the IA archive.
        • I don't think it's healthy to speculate about this.
  • The Forum's archive is more or less feature complete.
    • All known topics up to October 1st were scraped at least once, but it still misses about 60% of the profiles.
    • From now on, the efforts will focus on:
      • Scraping missing profiles;
      • Scraping new topics;
      • Updating previously scraped topics when needed (low priority).
  • New "revisits" WARC kind, preserving the recorded… well… revisits. :)
    • Revisits are not needed for playback, but will play a role now that topics and profiles should be updated regularly.
  • Revisits and redirects from 202407 and 202408 were reworked (and replaced).
    • Better sanitizing processes were applied to them.
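
As background on revisits: in the WARC format, a "revisit" record notes that a URL was fetched again and returned a payload identical to an earlier capture, so the body does not have to be stored twice. The sketch below only illustrates that idea by comparing payload digests; whether the project's own tooling decides it this way is an assumption, and the function names are hypothetical.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Digest in the 'sha1:<base32>' form commonly seen in WARC-Payload-Digest."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def classify(captures):
    """Yield (url, record_type, digest) for (url, payload) pairs in capture order.

    A capture becomes a 'revisit' when the same URL was already seen with the
    same payload digest; otherwise it is stored as a full 'response'.
    """
    seen = set()
    for url, payload in captures:
        digest = payload_digest(payload)
        key = (url, digest)
        if key in seen:
            yield url, "revisit", digest
        else:
            seen.add(key)
            yield url, "response", digest


# Hypothetical captures: the topic page is unchanged on the second visit,
# then changes on the third.
captures = [
    ("/topic/1", b"<html>v1</html>"),
    ("/topic/1", b"<html>v1</html>"),
    ("/topic/1", b"<html>v2</html>"),
]
for url, kind, digest in classify(captures):
    print(url, kind)
```

This is why revisits matter once topics and profiles are re-scraped regularly: an unchanged page costs a small record instead of a full copy.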


https://archive.org/details/KSP-Forum-Preservation-Project

 

There's a missing file, movies, that I haven't uploaded yet. I don't know what in hell the Forum is hosting buried deep somewhere here (I didn't bother to find the reference post yet!), but there are 16 (SIXTEEN!!) gigabytes of videos in this file (what, by Kraken's sake, are you uploading here, guys?? :D).

 

This is going to hurt... And it's the reason the torrent wasn't updated yesterday: I left the thing uploading and went to bed and... well... obviously Murphy noticed it and screwed up the upload, leaving 5 or 6% to finish. :sticktongue:

 

I'm uploading everything again now, except the movies. I will update the torrent with the movies as soon as the current uploads are merged into the torrent (Internet Archive takes some time to do it). Please check the torrent link tonight - everything but those pesky movies is already online.

[edit: It's there already - finally!!]

 

Additionally... I'm not willing to beat a dead horse, but the problems I depicted on

are starting to bite, and it's one of the reasons I haven't published (or coded), yet, some code for collaborative scraping. I need to find a secure way to distribute the jobs to whoever is willing to help me with the scraping. See below.

 

Which leads me to the next problem: on every single release of the scrapes, something changes around here that makes my life harsher. It happened in 202407, it happened in 202408 and lately in 20240901* (the first September partial). And it happened less than 12 hours after publishing; you can easily correlate it with the graphs I'm keeping about the Forum's health during the process.

 

So, now, we have a new problem to tackle before distributing the work: protecting the supporters, preventing them from being hit the way I was (being on the bad side of CloudFlare will make your life significantly harsher on any other site handled by CloudFlare).

 

It would not bother me if things were improving around here, but they aren't:

At least, for me.

 

So whatever was done that hindered my efforts at archiving the site (which would be done by this time otherwise), it's not improving the current situation (kindly assuming this is the reason these measures were taken).

 

And, yes, @Dakota, this is the reason I haven't cooked up a way for collaborative archiving (yet). Whenever I published some source code or updated the torrent, something changed around here and made the task harder, and also caused me some setbacks on my connections to CloudFlare-shielded sites. There was a day when I almost couldn't visit half the sites I usually visit...

 

Anyone helping me directly will suffer the same reprisals unless we keep the data private, but - by then - why do it this way in the first place?

 

One way or another, I managed to carry on the task until now:

 

  • Essentially, all topics up to 226017 were scraped at least once.
    • I'm working on reports to check if all pages from each topic were archived correctly.
  • But only about 38% of the profiles were archived; I intend to focus on this in October.

 

Keep Walking

 

Edited by Lisias
Internet Archive torrent fully updated.

  • 2 weeks later...

NEWS FROM THE FRONT

Internet Archive was hacked on the 9th, and some of their services are currently down. Adding insult to injury, they have also suffered some DDoS attacks (two, as it appears) since then.

But some of their services are back, so they are not dead yet.

I suggest everybody who uses that service check https://haveibeenpwned.com/ .

Given the nature of the hack, it's not impossible (though improbable) that my assets there were compromised. Well, I signed those files for a reason. The first thing I will do as soon as I have access to my IA account again is download and verify everything.

Just in case.
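
For anyone who wants to run a similar check on their own downloads, here is a minimal sketch. The thread does not say which signing tool was used, so this is not the project's procedure: as a stand-in it compares files against SHA-256 digests recorded beforehand, which detects modification but is not a cryptographic signature. The manifest format and function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_files(manifest, base_dir):
    """Return the names in `manifest` that are missing or whose digest differs.

    `manifest` is a hypothetical dict mapping file name -> expected hex digest,
    recorded before the files were uploaded.
    """
    bad = []
    for name, expected in manifest.items():
        path = Path(base_dir) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            bad.append(name)
    return bad


# Demo with a throwaway file standing in for a downloaded WARC.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.warc"
    sample.write_bytes(b"WARC/1.0\r\n")
    manifest = {"sample.warc": sha256_of(sample)}
    print(failed_files(manifest, tmp))  # [] while the file is intact
    sample.write_bytes(b"tampered")
    print(failed_files(manifest, tmp))  # ['sample.warc'] after modification
```

The digests only help if they were stored somewhere the attacker could not also change, which is the point of keeping signatures separate from the hosted files.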

Edited by Lisias
Better link.

  • 2 weeks later...

NEWS FROM THE FRONT

Forum was down from 2024-10-15 to 2024-10-28 (today).

I took a "sabbatical" during this month's first week, so I had very little to publish, but I did it anyway.

Check https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project for the download links - since Internet Archive is still in read-only mode, I couldn't update its torrent, but I mirrored the data on another file sharing service. The links are in the repo's README.

 


  • 2 weeks later...

Transferring the discussion from another thread:

13 hours ago, Bobbejans said:

how many gigabytes/terabytes does all of the forum use? if you can say that

The Forum's raw data (in WARC files) is currently "costing" me about 160G of uncompressed data. Check the README of the preservation project's repo for links to download them - but keep in mind that it's still a work in progress.

Keep in mind: it's just data hosted by this Forum (posts, threads, profiles, forum-hosted images, some files uploaded by privileged users). Absolutely no add-on downloads are present, nor are images hosted by 3rd parties such as imgur or discord.


  • 2 weeks later...

Transferring this discussion from another thread:

 

1 hour ago, PDCWolf said:

Yeah, if the new owner is some VCA who knows nothing about KSP, they could absolutely come here and say "sorry guys, forum is not in our plans, join us at Discord.gg/* see you there". So every day I'm leaning more and more towards the fact that we need a functional mirror of the forum, or to migrate to the subreddit. Sadly without the DB that mirror thing is not happening, and obviously we aren't getting the original DB ever, so all we can have from this forum in case someone decides to pull the plug, is what Lisias and others have managed to back up (mostly everything) in static, plain-ish text.

And these are just the things we're slowly realizing as the franchise gets yet another new owner... I wonder what are the things going on behind the scenes about what to keep and what to throw away.

Even with the DB, it would be illegal to mirror the Forum without explicit authorization from the new owners. And not only due to the IP itself, but due to the copyright on the posts themselves - the posters are still the owners and copyright holders of every post on this Forum, and they have granted a non-revocable, perpetual and transferable right to the Forum to do whatever it wants with the posts (like publishing a book).

Since it's unfeasible - to tell you the truth, plainly impossible, since some posters deleted their accounts and their posts are now "abandonware" - to get permission from every poster on this forum, we would need to obtain such permission from the new owner and, at that point, why would they give it? A Forum outside their control is a business liability.

Additionally, under the cold letter of the law, our best chance of content survival is to rely on Fair Use - which would be possible in this case if we go Internet Archive style, i.e., storing the HTTP requests themselves so as to prevent the creation of a derivative, where Fair Use is stricter.

Anyone (but the IP Owner) publishing a derivative, i.e., anything new using the content, will be committing copyright infringement.

Anyway, a wall of text with this rationale is in:

 


2 hours ago, Lisias said:

Transferring this discussion from another thread:

 

Even with the DB, it would be illegal to mirror the Forum without explicit authorization from the new owners. And not only due to the IP itself, but due to the copyright on the posts themselves - the posters are still the owners and copyright holders of every post on this Forum, and they have granted a non-revocable, perpetual and transferable right to the Forum to do whatever it wants with the posts (like publishing a book).

Since it's unfeasible - to tell you the truth, plainly impossible, since some posters deleted their accounts and their posts are now "abandonware" - to get permission from every poster on this forum, we would need to obtain such permission from the new owner and, at that point, why would they give it? A Forum outside their control is a business liability.

Additionally, under the cold letter of the law, our best chance of content survival is to rely on Fair Use - which would be possible in this case if we go Internet Archive style, i.e., storing the HTTP requests themselves so as to prevent the creation of a derivative, where Fair Use is stricter.

Anyone (but the IP Owner) publishing a derivative, i.e., anything new using the content, will be committing copyright infringement.

Anyway, a wall of text with this rationale is in:

 

I mean, I know the legal side perfectly... I was one of the first to mention it when this discussion came up that I gave nobody but t2 the right to publicly host my posts from this forum. Still, my post on that other thread was not about so much archival, as it was for ensured continued existence of this forum. Archiving the knowledge is one thing, keeping it up in case the new owners decide to take it away is another thing.


1 hour ago, PDCWolf said:

I mean, I know the legal side perfectly... I was one of the first to mention it when this discussion came up that I gave nobody but t2 the right to publicly host my posts from this forum.

I forgot, sorry.

But there are more people around here who may not be aware of that, which is the reason I am sometimes repetitive - to be sure that not only my direct interlocutor is aware of the situation.

 

1 hour ago, PDCWolf said:

Still, my post on that other thread was not about so much archival, as it was for ensured continued existence of this forum. Archiving the knowledge is one thing, keeping it up in case the new owners decide to take it away is another thing.

For personal archival, anything goes. You are allowed to keep personal backups of data, no matter what they say - the whole DMCA drama was created in the 2000s exactly because "they" (big copyright holders) didn't manage to overturn this law and, so, had to criminalize decrypting data so they could have an edge on personal backups.

My whole effort in doing it the "hard way" is exactly to allow publishing this data in case this Forum goes titties up.

There're some thoughts about it here: https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project/issues/14#issuecomment-2445076588

The most interesting, IMHO, would be a service to be used in documentation and links on 3rd-party sites (like SpaceDock); this service would automatically issue an HTTP 302 temporary redirect to the Archive (or one of the mirrors, I don't want a single point of failure) if the Forum is down, and otherwise would 302 to the Forum itself.
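
A rough sketch of how such a redirect service could look, using only the Python standard library. Everything concrete here is an assumption made for illustration: the base URLs are placeholders, the liveness probe is a plain HEAD request, and mirrors are assumed to serve the same paths as the Forum. It is not taken from the linked issue.

```python
from http.server import BaseHTTPRequestHandler
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Placeholder base URLs (hypothetical), tried in this order.
FORUM = "https://forum.example.invalid"
MIRRORS = [
    "https://mirror-a.example.invalid/forum",
    "https://mirror-b.example.invalid/forum",
]


def is_up(base_url, timeout=5.0):
    """Treat a site as up if it answers a HEAD request with a status below 500."""
    try:
        with urlopen(Request(base_url, method="HEAD"), timeout=timeout) as response:
            return response.status < 500
    except HTTPError as error:
        return error.code < 500   # 4xx still means the server is answering
    except (OSError, ValueError):
        return False              # DNS failure, refused connection, timeout, bad URL


def choose_location(path, probe=is_up, forum=FORUM, mirrors=MIRRORS):
    """Return the first reachable base URL plus `path`, or None if all are down."""
    for base in [forum, *mirrors]:
        if probe(base):
            return base + path
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = choose_location(self.path)
        if location is None:
            self.send_error(503, "Forum and all mirrors are unreachable")
            return
        self.send_response(302)   # temporary: the target may change once the Forum is back
        self.send_header("Location", location)
        self.end_headers()


# To try it locally (not run here):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()
```

Probing on every request is the simplest thing that works; a real deployment would probably cache the probe result for a short time so a slow or dead Forum does not delay every redirect.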

Edited by Lisias
Entertaining grammars made slightely less entertaining...