
KSP Forums Archival Options.



Posting this from another reply I made to a different post:

I never understood the desire or need for a forum backup. The only outcry from a few folks about the forum going away was over the loss of mod content. Why? All of that content is hosted or duplicated elsewhere, like GitHub or a few other mod sites. Reddit has subs dedicated to KSP modding, among other things. If modders want an outlet for their work, there are already options available to them. It doesn't need to be this or some other forum-like site. Reddit is free, and making a sub is less hassle than making a new website.

I looked at the active user count here and on the main KSP subreddit. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply use KSP resources elsewhere. I see no need for a new forum.


On 9/4/2024 at 6:20 PM, calabus2 said:

I never understood the want or need for a forum backup. The only outcry from a few folks about the forum going away was the loss of mod content. Why? Every bit of that content is hosted or duplicated elsewhere like Github or a few other mod sites. Reddit has subs dedicated to KSP modding among other things.

Historical and legal purposes.

The code is only part of the solution. Why the code was written that way is even more important than the code itself, because that knowledge allows us to write better code without breaking whatever is already published.

And we have a lot of IP transfer happening here. Without legal proof that the IP was transferred to the new maintainer, you can bet your booster that half the add-ons published around here will be considered piracy.

End users only see 20% of the total effort spent on developing software. The Forum holds the other 80%.

 

On 9/4/2024 at 6:20 PM, calabus2 said:

I looked at the active user count here and on the KSP main Reddit sub. The counts were pretty much equal. It certainly isn't a stretch to say that if this forum collapsed tomorrow, folks would simply utilize KSP resources elsewhere. I see no need for a new forum.

If this forum collapses tomorrow, the Mod Scene will sink into a Nietzschean dystopia where the end users, as usual, will be on the receiving end.

Make no mistake: the only thing preventing end users from being abused is the licenses we use around here, but those licenses require legal evidence to be enforceable, and some of that evidence is here on the Forum, not on Reddit et al.

 

=== == = POST EDIT = == ==

Of course, I didn't mention (and I should have) the importance of the Forum as a tool to gather people and maintain a Community.

Long-running threads (like "What did you do today?") are a nearly unique feature of this Forum.

You need a past to acquire a sense of belonging.

People became friends here, and friendship is forged over memories.

 

Edited by Lisias
Typos, tyops, tyops everywhere!!!

NEWS FROM THE FRONT

Scraping is progressing slowly but consistently. I didn't scrape the whole week; I spent most of the time working on progress reports and data management, but I still did some new work.

The reports I wrote tell me that:

  • We have 1,456,167 URLs recorded
    • images, posts, redirects, you name it.
      • as long as it's hosted by the Forum or its AWS bucket.
  • We have 98.32420180964789% of the Forum's topics recorded at least once (using bizeehdee's dataset as the reference)
  • We have 36.66643980126591% of the Forum's profiles recorded at least once (ditto)
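The post doesn't show how these percentages are computed. As a minimal sketch, assuming the Forum's Invision-style URL scheme (`/topic/<id>-<slug>/` and `/profile/<id>-<name>/`) and a reference dataset reduced to a set of numeric IDs, "recorded at least once" coverage could be measured like this. The helper names `extract_ids` and `coverage_percent` are hypothetical, not the project's actual tooling:

```python
import re
from collections.abc import Iterable

# Assumed URL shape: /topic/<id>-<slug>/ or /profile/<id>-<name>/ (optionally followed by /page/N/, ?query, #anchor)
_ID_PATTERN = re.compile(r"/(topic|profile)/(\d+)(?=[-/?#]|$)")


def extract_ids(urls: Iterable[str], kind: str) -> set[int]:
    """Collect the numeric IDs of one kind ("topic" or "profile") from recorded URLs."""
    ids: set[int] = set()
    for url in urls:
        match = _ID_PATTERN.search(url)
        if match and match.group(1) == kind:
            ids.add(int(match.group(2)))
    return ids


def coverage_percent(recorded: set[int], reference: set[int]) -> float:
    """Percentage of reference IDs that were recorded at least once."""
    if not reference:
        return 0.0
    return 100.0 * len(recorded & reference) / len(reference)
```

Using a set intersection means that IDs captured several times (every extra page of a topic, for instance), or IDs absent from the reference dataset, don't inflate the percentage.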

I have consolidated the week's results into a "partial" WARC file and will update the IA (Internet Archive) torrent with it, so if anything happens, at least the work already done is preserved. As usual, early next month the data will be properly consolidated and republished.

I will update this post when the IA torrent is ready.

Ready: https://archive.org/details/KSP-Forum-Preservation-Project

Edited by Lisias
URLs from AWS are *not* being counted.

  • 3 weeks later...

NEWS FROM THE FRONT

The 2024-09 release is online, and the IA torrent has been updated.

  • All partial 202409* WARC files were replaced by the proper, consolidated, and sanitized ones. ALL_URLS.csv was updated accordingly.
    • I do not intend to do partials anymore.
      • They are 4 times the workload of a monthly update
        • And I have already coded the tools I need, and they are stable by now: there's no need to exercise them all the time, not even as a testbed, since they no longer need that much testing.
      • They expand my "attack surface"
        • I have noticed that things get gradually harsher for me with every update to the IA archive.
        • I don't think it's healthy to speculate about this.
  • The Forum archive is more or less feature-complete.
    • All known topics up to October 1st were scraped at least once, but the archive is still missing about 60% of the profiles.
    • From now on, the efforts will focus on:
      • Scraping missing profiles;
      • Scraping new topics;
      • Updating previously scraped topics when needed (low priority).
  • A new "revisits" WARC kind, preserving the recorded… well… revisits. :)
    • Revisits are not needed for playback, but they will play a role now that topics and profiles are to be updated regularly.
  • Revisits and redirects from 202407 and 202408 were reworked (and replaced).
    • Better sanitizing processes were applied to them.
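For readers unfamiliar with WARC revisit records: when a URL is fetched again and its payload hasn't changed, a crawler can write a small "revisit" record pointing back at the earlier capture instead of storing the same bytes twice. A minimal sketch of that decision, assuming per-URI deduplication by payload digest (the SHA-1/base32 `sha1:` form commonly written to `WARC-Payload-Digest`); `payload_digest` and `classify_capture` are hypothetical names, not the project's code:

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """SHA-1 of the payload, base32-encoded, in the 'sha1:<digest>' form commonly used by WARC writers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def classify_capture(uri: str, payload: bytes, last_digest: dict[str, str]) -> str:
    """Return 'revisit' if the payload is unchanged since the last full capture of uri, else 'response'.

    last_digest maps each URI to the digest of its most recent full capture
    and is updated in place when a new full capture is stored.
    """
    digest = payload_digest(payload)
    if last_digest.get(uri) == digest:
        # Unchanged: a revisit record referencing the earlier capture is enough.
        return "revisit"
    # New or changed: the full payload must be stored as a response record.
    last_digest[uri] = digest
    return "response"
```

This is why revisits matter once topics and profiles are re-scraped regularly: an unchanged page costs a few header lines instead of another full copy.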


https://archive.org/details/KSP-Forum-Preservation-Project

 

There's one missing file, movies, that I haven't uploaded yet. I don't know what in hell the Forum is hosting buried deep somewhere here (I haven't bothered to find the post that references it yet!), but there are 16 (SIXTEEN!!) gigabytes of videos in this file (for Kraken's sake, what are you uploading here, guys?? :D).

 

This is going to hurt... And it's the reason the torrent wasn't updated yesterday: I left the thing uploading and went to bed and... well... obviously Murphy noticed and screwed up the upload with 5 or 6% left to finish. :sticktongue:

 

I'm uploading everything again now except the movies. I will update the torrent with the movies as soon as the current uploads are merged into the torrent (Internet Archive takes some time to do it). Please check the torrent link tonight; everything but those pesky movies is already online.

[edit: It's there already - finally!!]

 

Additionally... I don't want to beat a dead horse, but the problems I described in

are starting to bite, and they're one of the reasons I haven't yet published (or coded) anything for collaborative scraping. I need to find a secure way to distribute the jobs to whoever is willing to help me with the scraping. See below.

 

Which leads me to the next problem: with every single release of the scrapes, something changes around here that makes my life harder. It happened with 202407, it happened with 202408, and most recently with 20240901* (the first September partial). And each time it happened less than 12 hours after publishing; you can easily correlate it with the graphs I'm keeping of the Forum's health during the process.

 

So now we have a new problem to tackle before distributing the work: protecting supporters from being hit the way I was (being on Cloudflare's bad side will make your life significantly harder on any other site served through Cloudflare).

 

It would not bother me if things were improving around here, but they aren't:

At least, not for me.

 

So whatever was done that hindered my efforts to archive the site (which would otherwise be finished by now), it's not improving the current situation (kindly assuming that improving it is the reason these measures were taken).

 

And, yes, @Dakota, this is the reason I haven't cooked up a way to do collaborative archiving (yet). Every time I published some source code or updated the torrent, something changed around here that made the task harder, and it also caused me trouble connecting to Cloudflare-shielded sites. There was a day when I could barely visit half the sites I usually frequent...

 

Anyone helping me directly will suffer the same reprisals unless we keep the data private, but then why do it this way in the first place?

 

One way or another, I have managed to carry on with the task so far:

 

  • Essentially, all topics up to 226017 were scraped at least once.
    • I'm working on reports to check whether all pages of each topic were archived correctly
  • But only about 38% of the profiles were archived; I intend to focus on this in October.

 

Keep Walking

 

Edited by Lisias
Internet Archive torrent fully updated.

  • 2 weeks later...

NEWS FROM THE FRONT

Internet Archive was hacked on the 9th, and some of its services are currently down. Adding insult to injury, it has also suffered DDoS attacks (two, as it appears) since then.

But some of their services are back, so they are not dead yet.

I suggest that everybody who uses that service check https://haveibeenpwned.com/.

Given the nature of the hack, it's not impossible (though improbable) that my assets there were compromised. Well, I signed those files for a reason. The first thing I will do as soon as I regain access to my IA account is download and verify everything.

Just in case.
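The post doesn't say how the files were signed. One common arrangement is a checksum manifest (the format `sha256sum` writes) covered by a detached signature. As a minimal sketch, assuming such a `SHA256SUMS`-style manifest (a hypothetical file name), the checksum half of "download and verify everything" could look like this; checking the signature on the manifest itself (for example with GnuPG) is a separate step not shown here:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte WARC files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check every file listed in a sha256sum-style manifest ("<hex digest>  <file name>").

    File names are resolved relative to the manifest's directory;
    a missing file counts as a failure.
    """
    results: dict[str, bool] = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary-mode entries with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Any `False` entry points at a file that changed or disappeared. Matching checksums only prove the files agree with the manifest, which is why the manifest's own signature has to be checked against the author's key.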

Edited by Lisias
Better link.

  • 2 weeks later...

NEWS FROM THE FRONT

The Forum was down from 2024-10-15 to 2024-10-28 (today).

I took a "sabbatical" during the first week of this month, so I had very little to publish, but I published it anyway.

Check https://github.com/net-lisias-ksp/KSP-Forum-Preservation-Project for the download links. Since Internet Archive is still in read-only mode, I couldn't update its torrent, but I mirrored the data on another file-sharing service. The links are in the repo's README.

 


  • 2 weeks later...

Transferring the discussion from another thread:

13 hours ago, Bobbejans said:

how many gigabytes/terabytes does all of the forum use? if you can say that

The Forum's raw data (in WARC files) is currently "costing" me about 160 GB uncompressed. Check the README of the preservation project's repo for download links, but keep in mind that it's still a work in progress.

Also note: it's only data hosted by this Forum (posts, threads, profiles, Forum-hosted images, and some files uploaded by privileged users). Absolutely no add-on downloads are included, nor are images hosted by third parties such as Imgur or Discord.
