Kevin_kerman Posted Wednesday at 09:04 PM

17 minutes ago, Mr. Kerbin said: yeah sadly I kinda have to agree

Same, it is really sad tbh
Linkageless Posted Wednesday at 09:06 PM

Well, once again that was a worrying outage, but I had to remind myself that someone, somewhere was probably working to get Cloudflare up again. I reckoned it was just another one of those bumps in the road caused by the handover, and I'm glad it's out of the way now.

2 hours ago, Lisias said: Forum is alive, and that's what matters - and the day I realize that all I managed to accomplish with my backup is wasting time and money, believe me, it will be a very, very happy day in my life.

I'm with you, but it is definitely still a worthwhile endeavour.

2 hours ago, Lisias said: But if we manage to reach a consensus that it's possible to preemptively publish this material without risking any undesirable consequences in the long run, both for Forum and "my" Archives, I will do whatever I can to assist, including with hosting services (as long as I would not be the only one - no single points of failure, please).

IMHO it's definitely too soon. I'd like to see at least the chance of some dialogue with whoever is taking the reins first. That doesn't mean we shouldn't all be getting on top of your torrent releases, so we can share the burden as you suggest.
AlamoVampire Posted Wednesday at 09:24 PM

I'm going to be honest. Between the bad gateways and these outages, I'm fully expecting this forum to not be here much longer. That outage ended my optimism. I'm not leaving, of course, and will stay until the lights go out, but my hope of longevity is gone. 152401222025
Royale37 Posted Wednesday at 09:29 PM

4 minutes ago, AlamoVampire said: I'm going to be honest. Between the bad gateways and these outages, I'm fully expecting this forum to not be here much longer. That outage ended my optimism. I'm not leaving, of course, and will stay until the lights go out, but my hope of longevity is gone. 152401222025

I had that outage this morning, but I haven't had any bad gateways in the last 2+ weeks. Maybe a good sign?
Linkageless Posted Wednesday at 09:31 PM

It's true that every outage chips away at the likelihood people will return. However, I'm now wondering how many ways for the forum to break are left. (That's not an invitation to get morbidly creative!)
AlamoVampire Posted Wednesday at 09:43 PM

5 minutes ago, Royale37 said: I had that outage this morning, but I haven't had any bad gateways in the last 2 weeks+

I noticed the outage last night around 1915 local to me. As for bad gateways, I've had 4-5 in the last 2 weeks. The only good sign would be whoever holds the string to the Sword of Damocles getting on here and making a public statement about the future of KSP1, KSP2, and the Forum: either that all 3 are done and dusted, or that updates to KSP1 will resume (anything to stabilize it as tech gets better), that KSP2 will either see the current version completed as sold to us, with reasonable specs, or that a new KSP2 will rise, and that our forum will continue for another 10+ years. Anything short of that, leaving us to speculate, is as far as I'm concerned the former and not the latter. Am I bitter? Yes. Is this fatalistic? No. Pragmatic. I'm just going where the evidence seems to point. And right now it's pointing at total systems failure in 60-90 days at best. 154301222025
Lisias Posted Wednesday at 10:32 PM (edited)

1 hour ago, Linkageless said: It's true that every outage chips away at the likelihood people will return. However, I'm now wondering how many other ways the forum could possibly break are left. (That's not an invitation to get morbidly creative!)

I was going to make a joke, but then I considered that people are still recovering from a near-funeral event, and decided to laugh alone...

But... you have a point. I was decommissioning my Forum-related services when I noticed it was back (nice thing to have such tools: I have the exact moment Forum went down, and the exact moment it came back!), but I have kept them down since then. Handovers are problematic, and since we have concrete evidence that they are working to keep Forum alive, there's nothing to be lost if I stop the services until tomorrow, so I don't become a hindrance if anything else needs to be tackled.

With the new infra, I will need to carefully benchmark the services again to reach an acceptable equilibrium between archiving the pages quickly enough and not causing trouble (which would defeat the scraping anyway). If I go out of reach around here for some time, it's because I found the point at which Cloudflare kicks my balls - be patient, I will be back once the ban is lifted!

55 minutes ago, AlamoVampire said: Is this fatalistic? No. Pragmatic. I'm just going where the evidence seems to point. And right now it's pointing at total systems failure in 60-90 days at best.

Your reasoning is sound, but your axioms are insufficient to uphold it.

I want to tell you a(nother) story: YahooGroups. It's an interesting story because, you see, the same company involved in that mess is (indirectly) involved in this mess. What happened: Yahoo was failing, it was a matter of time until they would throw in the towel, and they managed to get a deal with Apollo Inc.
The new management sanitized the company, rationally closing any product or service that wasn't paying its costs, and screw the non-financial consequences. One of the deficit-running services was YahooGroups. They announced they would be closing it at the end of the month (or something like that), and whoever wanted their data should worry about it and extract it with a tool they provided. The tool wasn't perfect, and the load of everybody and the kitchen sink dumping their data made the target date unfeasible. Tremendous outcry; they extended the deadline to the end of the year IIRC. And then they pulled the server's plug, and that's it. I will not discuss the huge loss the Internet suffered due to this, it's out of the scope of this post - but it's necessary and sufficient to say that they did it to save some bucks, and they don't regret it.

And what does this YahooGroups thingy have to do with the subject at hand? Modus operandi. They didn't invest resources in YahooGroups at that time, other than the data dump tool. Things were breaking and not getting fixed; they just left the servers in God's hands, may the Lord provide and protect us. And then they pulled the plug, and the homepages of the dead services were redirected to mortuary placeholders. And this is how they shut down unwanted services.

What we had here on Forum, on the other hand, is something a company like them would under no circumstances be doing if they didn't have some kind of plan (even an unpleasant one) for it. See:

They renewed Invision's license, even knowing that letting the license expire would only result in a lack of support and updates;
Forum went down in September. It took 2 weeks, but they got it back;
Forum was getting a lot of http 5xx errors since before, and for some time after, the (first) outage. They took 2 months**, but they fixed it;
They screwed up the P.D. login and customers' downloads. They took 2 days**, but they fixed it.

You see...
On every event above, money and resources were spent - this would not have happened if they were intending to ditch Forum. These guys live and breathe for one thing: money. And they don't throw money at things they are going to trash.

So, when I found Forum unconscious, I promptly thought of some kind of unexpected outage. And since I had spent the day diagnosing weird problems on Day Job© services (clients not communicating, VPN outages, Bitbucket going down, Teams failing on login - but this last one is already routine, it doesn't count for much), I immediately suspected an outage on Cloudflare itself instead of something on Forum. Digging into the DNS, I diagnosed the problem as being Cloudflare-related for sure, but since apparently they handle subdomains in a way not exactly compatible with what I was taught a long time ago, I wrongly concluded it was a mishap on Cloudflare itself, suspected a widespread situation, and fired up our Doomsday Plan on Night Job© (being convinced of a possibly widespread outage, I decided to stay around to act promptly if needed). One way or another, until someone on Reddit called me out for a mistake about how Cloudflare works, I was 100% convinced - due to the reasons above - that this was something involuntary, and that Forum would be back eventually. As we can see, I was right.

I'm not trying to sugar the pill - sooner or later they will do something to maximize their gains that will liquidate us. But that day is not today.

Notes
** Rhetorical - I thought it would be nice to use "2" to match the other problems.

Edited Wednesday at 10:39 PM by Lisias: Hit "Save" too soon
Lisias Posted yesterday at 12:17 AM (edited)

I'm probably the only one able to post on Forum right now...

This is what's happening:

A few hours ago, I could reach Forum on my mobile network.
Now I can't on my cable-TV network.

All of this strongly suggests it's a DNS problem again.

So, let's dig into the problem:

<borked dig removed, I had just made a monstrous typo on this one! Jesus Christ...>

Below is the last "dig" from the last time Forum was working:

    > dig forum.kerbalspaceprogram.com

    ; <<>> DiG 9.20.3 <<>> forum.kerbalspaceprogram.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29233
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;forum.kerbalspaceprogram.com.  IN  A

    ;; ANSWER SECTION:
    forum.kerbalspaceprogram.com. 3515 IN CNAME forum.kerbalspaceprogram.com.cdn.cloudflare.net.
    forum.kerbalspaceprogram.com.cdn.cloudflare.net. 300 IN A 172.64.145.232
    forum.kerbalspaceprogram.com.cdn.cloudflare.net. 300 IN A 104.18.42.24

    ;; Query time: 16 msec
    ;; SERVER: 192.168.200.1#53(192.168.200.1) (UDP)
    ;; WHEN: Wed Jan 22 17:49:50 -03 2025
    ;; MSG SIZE rcvd: 150

It turns out I know the Cloudflare IPs serving Forum. So I edited my /etc/hosts file:

    ##
    # Host Database
    #
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    ##
    127.0.0.1       localhost
    255.255.255.255 broadcasthost
    ::1             localhost
    172.64.145.232  forum.kerbalspaceprogram.com

And here I am!!!

Not satisfied, I did another dig on a different network:

    > dig forum.kerbalspaceprogram.com.

    ; <<>> DiG 9.16.12 <<>> forum.kerbalspaceprogram.com.
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 52563
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;forum.kerbalspaceprogram.com.  IN  A

    ;; ANSWER SECTION:
    forum.kerbalspaceprogram.com. 300 IN CNAME sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com.

    ;; AUTHORITY SECTION:
    us-west-2.elb.amazonaws.com. 8 IN SOA ns-332.awsdns-41.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 60

    ;; Query time: 9 msec
    ;; SERVER: 10.0.0.2#53(10.0.0.2)
    ;; WHEN: Thu Jan 23 00:10:45 UTC 2025
    ;; MSG SIZE rcvd: 197

And voilà... The new owners are moving Forum to AWS. In a few hours, the new DNS entry will propagate and fix everything.

--- POST EDIT ---

Now forum.kerbalspaceprogram.com is being resolved on my cable-TV network too:

    > dig forum.kerbalspaceprogram.com

    ; <<>> DiG 9.20.3 <<>> forum.kerbalspaceprogram.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 15297
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;forum.kerbalspaceprogram.com.  IN  A

    ;; ANSWER SECTION:
    forum.kerbalspaceprogram.com. 3600 IN CNAME sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com.

    ;; AUTHORITY SECTION:
    us-west-2.elb.amazonaws.com. 60 IN SOA ns-332.awsdns-41.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 60

    ;; Query time: 124 msec
    ;; SERVER: 192.168.200.1#53(192.168.200.1) (UDP)
    ;; WHEN: Wed Jan 22 23:39:33 -03 2025
    ;; MSG SIZE rcvd: 197

But sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com still doesn't resolve - neither on my home network nor on my remote one - and this is the reason Forum is out of reach to the World.

    > dig sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com

    ; <<>> DiG 9.20.3 <<>> sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 52641
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 1280
    ;; QUESTION SECTION:
    ;sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com.  IN  A

    ;; Query time: 0 msec
    ;; SERVER: 192.168.200.1#53(192.168.200.1) (UDP)
    ;; WHEN: Wed Jan 22 23:39:55 -03 2025
    ;; MSG SIZE rcvd: 80

Oukey, let's wait a couple more days.

Edited 6 hours ago by Lisias: POST EDIT.
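The failure mode in the dig output above (a CNAME pointing at a name that does not exist) can be reproduced with a toy resolver. This is a minimal Python sketch, not how a real recursive resolver works: ZONE, Answer and resolve are hypothetical names, the "zone" is a plain dict holding only records copied from the dig snapshots (mixing the Cloudflare-era and AWS-era answers), and the ELB name is left out on purpose to mirror its NXDOMAIN answer.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for DNS: name -> list of (record type, value).
# Records are copied from the dig snapshots above. The ELB name is deliberately
# missing, mirroring the NXDOMAIN answer dig gave for it.
ZONE: dict[str, list[tuple[str, str]]] = {
    "forum.kerbalspaceprogram.com.": [
        ("CNAME", "sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com."),
    ],
    "forum.kerbalspaceprogram.com.cdn.cloudflare.net.": [
        ("A", "172.64.145.232"),
        ("A", "104.18.42.24"),
    ],
}


@dataclass
class Answer:
    status: str = "NOERROR"                          # or "NXDOMAIN"
    chain: list[str] = field(default_factory=list)   # CNAME targets followed
    addresses: list[str] = field(default_factory=list)


def resolve(name: str, zone: dict[str, list[tuple[str, str]]], max_hops: int = 8) -> Answer:
    """Follow CNAME records from name until A records are found or the chain breaks."""
    answer = Answer()
    for _ in range(max_hops):
        records = zone.get(name)
        if records is None:
            # The name has no records at all: a dangling CNAME ends in NXDOMAIN,
            # even though the CNAME itself was already returned in the answer.
            answer.status = "NXDOMAIN"
            return answer
        targets = [value for rtype, value in records if rtype == "CNAME"]
        if targets:
            answer.chain.append(targets[0])
            name = targets[0]
            continue
        answer.addresses = [value for rtype, value in records if rtype == "A"]
        return answer
    raise RuntimeError("CNAME chain too long (possible loop)")


if __name__ == "__main__":
    broken = resolve("forum.kerbalspaceprogram.com.", ZONE)
    print(broken.status, broken.chain, broken.addresses)

    # Creating the missing record (using 44.240.13.95, the address quoted later
    # in this thread) is what makes the whole chain resolve again.
    fixed = dict(ZONE)
    fixed["sp-forum-elb-2033387385.us-west-2.elb.amazonaws.com."] = [("A", "44.240.13.95")]
    print(resolve("forum.kerbalspaceprogram.com.", fixed).addresses)
```

The first call reports NXDOMAIN while still listing the ELB name in its chain, which is the same combination dig showed: "status: NXDOMAIN" together with "ANSWER: 1".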
CrazyMagicPickle Posted yesterday at 01:31 AM

Quoting Lisias: It ends up that I know the Cloudflare IPs serving Forum. So I edited my /etc/hosts file:

How did you happen to know the Cloudflare forum IP?
MarkedZero Posted 20 hours ago

Too bad, I only got my access to the forum back a MINUTE before this post.

On 1/23/2025 at 8:17 AM, Lisias said: I'm probably the only one able to post on Forum right now... [...] Oukey, lets wait a couple hours more.

Yes, it keeps on giving bad requests and other issues.
Kerbeloper Posted 12 hours ago

Thank you @Lisias for your investigation.

So the easy guide to accessing the KSP forum while bypassing the current DNS issues on Windows is:

1. Press Start and search for Notepad.
2. Right-click on it and run it as administrator.
3. File -> Open.
4. Paste this path into the File Explorer's address bar: C:\Windows\System32\drivers\etc
5. If you see an empty folder, use the file type selector at the bottom right and choose "All Files (*.*)".
6. Select the "hosts" file and click Open.
7. Paste this line at the bottom of the file (press Enter to add a new line if necessary): 172.64.145.232 forum.kerbalspaceprogram.com
8. Save the file.
9. Close and reopen the whole browser.
10. Now you can access the forum.

I hope this post can help non-tech people who will see this through secondary boards; for example, I've seen screenshots of @Lisias's post on Reddit.

Remember to remove the added line from your hosts file once the technical problems are fixed.

Good luck to the system admins who are working on the problem.
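The trick works because, on a default setup, the operating system consults the hosts file before it asks DNS, so a matching line short-circuits the broken lookup. The Python sketch below models that order on the file's text only (it never touches the real file). parse_hosts, lookup, add_override and remove_override are hypothetical helpers, the dns argument stands in for a real DNS query, and "hosts file first" is an assumption about the typical default configuration (the order can be changed, for example via /etc/nsswitch.conf on Linux).

```python
from typing import Callable, Optional

# The line from the guide above (Cloudflare IP taken from the earlier dig).
OVERRIDE = "172.64.145.232 forum.kerbalspaceprogram.com"


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname in hosts-file text to its IP; the first entry for a name wins."""
    table: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()            # fields may be separated by spaces or tabs
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def lookup(name: str, hosts_text: str, dns: Callable[[str], Optional[str]]) -> Optional[str]:
    """Consult the hosts table first and only fall back to DNS on a miss."""
    return parse_hosts(hosts_text).get(name.lower()) or dns(name)


def add_override(hosts_text: str, line: str = OVERRIDE) -> str:
    """Append the override line once, making sure it starts on its own line."""
    if line in (existing.strip() for existing in hosts_text.splitlines()):
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + line + "\n"


def remove_override(hosts_text: str, line: str = OVERRIDE) -> str:
    """Undo add_override once the real DNS records are fixed."""
    kept = [existing for existing in hosts_text.splitlines() if existing.strip() != line]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    def broken_dns(name: str) -> Optional[str]:
        return None  # simulates the NXDOMAIN answer during the outage

    hosts = "127.0.0.1 localhost\n"
    print(lookup("forum.kerbalspaceprogram.com", hosts, broken_dns))  # None
    hosts = add_override(hosts)
    print(lookup("forum.kerbalspaceprogram.com", hosts, broken_dns))  # 172.64.145.232
```

remove_override is the code equivalent of the last reminder in the guide: once DNS answers again, the pinned address should go, because a hard-coded IP silently stops working if the site moves (as it just did, to AWS).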
FTLparachute Posted 6 hours ago

Just got access now - back again
Eclipse 32 Posted 6 hours ago

just returned. if it is being moved to a different server, then the question is why? just flickered down for me
Mr. Kerbin Posted 6 hours ago

yaayyayyaayayayayayayayyay
Lisias Posted 6 hours ago

17 minutes ago, Eclipse 32 said: just returned. if it is being moved to a different server, then the question is why? just flickered down for me

Change of ownership, change of cost center, change of accounts.

Or perhaps... There's a chance that they just ditched Cloudflare in favor of Elastic Load Balancing, and that Forum was always hosted on AWS - that M.O. we had in the past, where we could double post on an http 5xx event, suggests that they had an outbound quota but not an inbound one, and this is exactly how AWS operates (AWS charges for data going out, but not for data coming in). Granted, not only them, but still...
Eclipse 32 Posted 5 hours ago

8 minutes ago, Lisias said: Change of ownership, change of cost center, change of accounts. [...]

in dumb person terms, please?
MinimumSky5 Posted 5 hours ago

5 minutes ago, Eclipse 32 said: in dumb person terms, please?

The new owners could simply use AWS for their own existing services, and so want to avoid maintaining both Cloudflare and AWS, or they might have just shopped around and found a better deal with AWS. There is nothing suspect about this move.
Eclipse 32 Posted 5 hours ago

in i-can't-code-a-flying-potato dumb person terms?
glibbo Posted 5 hours ago

Bless you for your efforts Lisias, much appreciated
Scarecrow71 Posted 5 hours ago

11 minutes ago, Eclipse 32 said: in i-can't-code-a-flying-potato dumb person terms?

They don't own the servers that the site was originally hosted on, so to save costs they moved it to servers they do own. Or at least have a financial stake in.
Lisias Posted 5 hours ago Share Posted 5 hours ago (edited) 47 minutes ago, Eclipse 32 said: in dumb person terms, please? Well... When your browsers calls a site, it uses a thingy called URL. In our specific case, this one: https://forum.kerbalspaceprogram.com/topic/226141-so-we-had-some-kind-of-technical-problem/page/10/#comment-4438600 "https" is the name of a protocol. It's a collection of rules and guidelines to allow two systems to understand each other, in this case your browsers and a HTTPd server hosted somewhere in the World. "forum.kerbalspaceprogram.com" is a server name. It must be something that should unequivocally identify a machine in the Whole Wide West, I mean, Web. Anything after (and including) the first dash ("/") is called a Resource Name, and it's a message your browsers send to the HTTPd server asking for this exactly page we are seeing now (the "#comment-443860" thingy is called "Anchor" and serves to identify a point in the document, in that example, my last post). Problem: the Internet doesn't know squat about http or https, it only understands a thingy called TCP (and other called UDP), and TCP doesn't know squat about server names, it only understands IP address (right now, 44.240.13.95 for Forum). So someone need to translate forum.kerbalspaceprogram.com into 44.240.13.95. The dude that does that is called DNS Resolver. So your browsers ask the DNS Resolver "Dude, who is forum.kerbalspaceprogram.com?", and the resolver answer back "44.240.13.95". And so the TCP layer can reach Forum, sending to the server an https message asking for this page. Problem: if people enough decides to send messages to the server at the same time, it will drown in excessive load and will misbehave. We had witnessed exactly that in the recent past, with that pesky and persistent "http 5xxx" errors. 
To prevent this problem, the DNS Resolver doesn't give you the IP address of the server, but the IP address of a thingy called a CDN (Content Delivery Network) or, as is being done right now, a thing called a Load Balancer.

The first is a dude that tries to keep a copy of everything on servers scattered around the World, and then decides which one is nearest to you so you are served quickly. When the CDN doesn't have a copy of the resource you are asking for, then it (and only it) will reach Forum asking for that resource, and then keep a copy for itself in case anyone else asks for it too. These copies have an "expiry date", so old copies are refreshed regularly to avoid serving you outdated content. A CDN can give you some additional services, such as blocking bad guys from trying to screw you. Or at least most of them.

Right now, I think that Forum is being served by a Load Balancer. A Load Balancer is a completely different beast. An LB keeps a pool (or list) of many servers, and when you reach the LB asking for something, it elects a server from that pool to serve the content you are asking for. If they have 10 servers, then all the load the LB gets from many users is shared among all of them. This arrangement is better for serving content that changes all the time, which would defeat the purpose of a cache in the first place.

The first time Forum borked, there were two DNS entries for Forum:

forum.kerbalspaceprogram.com resolving to forum.kerbalspaceprogram.com.cdn.cloudflare.net
forum.kerbalspaceprogram.com.cdn.cloudflare.net resolving to an IP address on the Cloudflare network

We do these tricks when we want someone else to resolve the final IP address, and only want to keep control of the main server name. Then someone deleted the "forum.kerbalspaceprogram.com.cdn.cloudflare.net" entry, and so "forum.kerbalspaceprogram.com" started to resolve to something that didn't exist anymore, and your browser could not reach the CDN servers anymore!
That's the reason for the first borkage. Restoring the second DNS entry fixed the problem for us for a while.

The second borkage happened when they reconfigured "forum.kerbalspaceprogram.com" to resolve into the new ELB (Elastic Load Balancer - it's what AWS calls their LB) server, but for a reason still unknown they did not create the DNS entry resolving the ELB name into AWS's IPs. And we ended up in the same situation as before, but with Forum pointing to a different CNAME this time (a CNAME record is how we say "this name is an alias for that other name"). Today they finally created that missing DNS entry, and all is working again.

32 minutes ago, glibbo said: Bless you for your efforts Lisias, much appreciated

I just didn't want people to get scared (again), as happened last time. Even I, at first, was about to throw in the towel. But then I reconsidered, because this is not how "they" usually handle things (see my post about YahooGroups above), and started to dig (pun really intended) into the matter.

Edited 5 hours ago by Lisias: brute force post merge
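The two approaches described in the post above, a CDN that keeps copies with an expiry date and a load balancer that spreads requests over a pool, can be contrasted in a short Python sketch. It is a deliberately simplified, hypothetical model: TTLCache and RoundRobinBalancer are illustrative names, the 300-second lifetime is arbitrary, the clock is injected so expiry can be shown without waiting, and round robin is just the simplest balancing policy to demonstrate. Real CDNs and AWS Elastic Load Balancing have their own, more elaborate rules.

```python
import itertools
from typing import Callable

Fetch = Callable[[str], str]  # takes a resource path, returns the page body


class TTLCache:
    """CDN-style: serve a stored copy until it expires, then ask the origin again."""

    def __init__(self, origin: Fetch, ttl: float, clock: Callable[[], float]):
        self.origin = origin    # the real server behind the CDN
        self.ttl = ttl          # seconds a copy is considered fresh
        self.clock = clock      # injected so expiry can be demonstrated deterministically
        self.store: dict[str, tuple[str, float]] = {}
        self.origin_hits = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and cached[1] > now:
            return cached[0]    # fresh copy: the origin is not contacted
        self.origin_hits += 1   # missing or expired: only the cache talks to the origin
        body = self.origin(path)
        self.store[path] = (body, now + self.ttl)
        return body


class RoundRobinBalancer:
    """LB-style: no copies at all, each request goes to the next server in the pool."""

    def __init__(self, servers: list[Fetch]):
        self._next_server = itertools.cycle(servers)

    def get(self, path: str) -> str:
        return next(self._next_server)(path)


if __name__ == "__main__":
    now = [0.0]
    cdn = TTLCache(origin=lambda p: f"page {p}", ttl=300, clock=lambda: now[0])
    for _ in range(3):
        cdn.get("/topic/226141")
    print(cdn.origin_hits)  # 1: the other two requests were served from the copy
    now[0] = 301.0
    cdn.get("/topic/226141")
    print(cdn.origin_hits)  # 2: the copy had expired and was refreshed

    pool = [lambda p, i=i: f"server {i}: {p}" for i in range(3)]
    lb = RoundRobinBalancer(pool)
    print([lb.get("/") for _ in range(4)])
    # ['server 0: /', 'server 1: /', 'server 2: /', 'server 0: /']
```

The cache only saves the origin work when the same resource is requested again before its copy expires, which is why it suits content that rarely changes. The balancer never reuses an answer, so every request still costs some server real work, but that work is spread across the pool, which fits pages like a forum thread that change with every new post.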
The Aziz Posted 5 hours ago (edited)

Oh look, it works. Now, on the other hand, the entire kerbalspaceprogram.com doesn't. It redirects to the Private Division website, which is also down: now it throws a 504, and yesterday it screamed about an expired certificate.

Edited 5 hours ago by The Aziz
Eclipse 32 Posted 5 hours ago

8 minutes ago, Lisias said: Problem: if enough people decide to send messages to the server at the same time, it will drown in excessive load and misbehave. We witnessed exactly that in the recent past, with those pesky and persistent "http 5xx" errors.

is that what caused the Great April Derp of 2013?
DAL59 Posted 5 hours ago

How much does it cost to run the forum per month? Could we crowdfund it?
Lisias Posted 4 hours ago

14 minutes ago, Eclipse 32 said: is that what caused the Great April Derp of 2013?

It was way before my time!

If I understood correctly what I read, a software update happened that day. Instead of changing networks or servers, they changed the program that makes Forum "be" the Forum.

It's yet another hop in that long chain of events above - when the HTTPd receives the https message, it must decide what to do with that thing, and someone told it "hey, ask this program here what to do", "this program" being the Invision Forum (Invision is the company that created, maintains and sells the software used by this Forum).

How the Invision Forum software operates is beyond me, but since I worked on similar products in the past, I infer that while migrating data from the old software to the new version ("migrating" is a type of copy, but changing bits because the target likes some things in a different format), something went wrong and data wasn't migrated, was corrupted in the process (becoming unusable), or was plain lost.

But... it was before my time, and there are other reasons that could explain what happened. I'm only explaining to you what I think most likely happened (assuming I had all the facts correct)!