Why aren't crash reports auto-submitted?

kfsone · November 14, 2015

[The original post asked the question sloppily so I rewrote it in hopes that communicating the question better might nix some of the confusion apparent in the thread. I'd posted this as a separate thread but they got merged as of page 3]

This is not about getting devs to fix bugs, this is not about trying to store each and every crash report, and it is certainly not about trying to violate your privacy and sneakily submit data behind your back. It's about producing stats on crashes (fingerprints) at their end, hooked in to the dialog you get the first time you run KSP ("mind if I send some progress data") and the dialog that pops up when you crash the game ("the game has crashed, please consider sending the crash report to the game developer [OK]").

If that sounds too magical or not useful enough, read through

ZOMG That's impossible because ... (otherwise skip to TL;DR)

I realize that the crash reports for KSP are polluted with module crashes, gfx driver crashes, and so forth. However the stability of the game itself would still rise significantly above such background noise, and in particular, around release time, Squad should want to see the *shape* of the crash reports to determine if they've unleashed something crazy that's broken everyone but them. For instance, if 30% of your players are crashing because of an AMD driver conflict, you might want to know.

To do this, you'd need the client to offer the user the chance to send a few data points:

- Client version (a 4-byte or 32-bit number),

- Client Platform (a 1-byte or 8-bit number),

- Exception type (a 4-byte or 32-bit number),

- Exception address (an 8-byte or 64-bit number; you only need 48 bits today and 52 bits by 2020, but let's not fuss about 16 bits).
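A rough sketch of how small that fingerprint is. The field order, the little-endian layout and the way 1.0.5 is encoded as a number are assumptions for illustration, not anything KSP or Unity actually does:

```python
import struct

# Hypothetical wire layout for the four fields listed above, little-endian
# with no padding: u32 version, u8 platform, u32 exception, u64 address.
FINGERPRINT = struct.Struct("<IBIQ")


def pack_fingerprint(version, platform, exception, address):
    """Pack one crash fingerprint into its fixed-size binary form."""
    return FINGERPRINT.pack(version, platform, exception, address)


def unpack_fingerprint(blob):
    """Inverse of pack_fingerprint: (version, platform, exception, address)."""
    return FINGERPRINT.unpack(blob)


# 0x01000500 is an assumed encoding of version 1.0.5; platform 0 is assumed to mean PC.
record = pack_fingerprint(0x01000500, 0, 0xC0000005, 0x005CF31C)
print(len(record))  # 17
```

That is the whole payload: 17 bytes per crash before any timestamp is added.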

Most users are just going to check the box to let these be submitted automatically, as long as they are shown a simple example of what gets sent, and so no, it doesn't have to become an extra thing to do every time you crash.

Or it could be: the default Unity crash handler already pops up when the game crashes and implores you to send your crash report to the developers. It could be tuned to display "You should post your crash report on the forums if it keeps occurring. Would you like to send a crash summary to Squad? [OK] [Cancel]". That's instead of the box you have to click OK in, not in addition to it.

It's a huge amount of data!

I'm not sure what you're thinking of. People can only crash so fast and there's only so many people playing the game. Let's say over the period of 30 days 25,000 players crash 14 times a day. That's 14 * 30 * 25000 * (17 + 8), where the +8 is to store a timestamp. That's 262,500,000 bytes. Ooo, that's a lot! Oh wait, that's 262MB. That's a quarter of a gig over a month. Your niece's website probably stores more data to log all the webcrawlers that visit it.

So yeah, you might be talking 3GB of data over a year, but after about 2-3 months you only need the daily or hourly counts, and I'm pretty sure that the above is a worst-case scenario. It's going to take you a really, really long time to reach 1/10th of a TB if you do this even moderately smartly.
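The multiplication above is easy to check. This sketch just reuses the factors from the post (14 * 30 * 25000 reports a month, 17 + 8 bytes each); the function and variable names are made up for the example:

```python
FINGERPRINT_BYTES = 17  # version + platform + exception type + address
TIMESTAMP_BYTES = 8
RECORD_BYTES = FINGERPRINT_BYTES + TIMESTAMP_BYTES


def raw_storage_bytes(reports, record_bytes=RECORD_BYTES):
    """Worst-case storage if every single report is kept as its own record."""
    return reports * record_bytes


monthly_reports = 14 * 30 * 25_000
monthly_bytes = raw_storage_bytes(monthly_reports)
yearly_bytes = 12 * monthly_bytes

print(monthly_bytes)       # 262500000
print(yearly_bytes / 1e9)  # 3.15
```

And that is before you throw anything away or collapse old records into counters.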

It's a trivial amount of data that gives a game developer a huge insight into the state and stability of their product and can easily be tied into ops-stream reporting to correlate things like "we put out a patch and everyone is crashing". And the non-module crashes will be distinct because they're going to be specific types of crashes with higher frequencies than the module-related crashes.

So there's good signal there.

It's not hard to do - you simply submit a "GET" request with a parameterized URL, REST style if you want, or a POST with parameters. Beyond that, it's pretty simple.
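For illustration, here is roughly what building that parameterized GET could look like on the client. The host name and the short parameter names are placeholders, not a real Squad endpoint:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; no real reporting URL is named anywhere in this thread.
REPORT_URL = "https://crash.example.com/report"


def build_report_url(platform, version, exception, address, base=REPORT_URL):
    """Build the parameterized GET URL for one crash fingerprint."""
    query = urlencode({"p": platform, "v": version, "e": exception, "a": address})
    return "{}?{}".format(base, query)


url = build_report_url(platform=0, version=0x01000500,
                       exception=0xC0000005, address=0x005CF31C)
print(url)
# https://crash.example.com/report?p=0&v=16778496&e=3221225477&a=6091548
```

Actually sending it is one more call (for example `urllib.request.urlopen(url, timeout=5)`), made only after the user has clicked OK.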

Processing all those crashes is a staggering amount of CPU ...

No, it isn't. It's a simple web-handler that logs five values to a database. The crash is already in a machine-readable form on the client; the client uses that to generate the human-readable format (output.log) you're thinking of. So we're not talking about building some super-AI that actually reads the crash report, *sigh*.

But maybe you're still thinking that web-servers to handle those crashes are expensive. Nope, probably 2-3 Amazon instances would handle it fine - the requests are incredibly lightweight and the workload is almost non-existent: write to a database.

That seems kind of crap and useless

I'm not here to teach you how to be a game developer, but this is the kind of insight that once you have you can't understand how you survived without. Find a game producer of a successful live-development game on Twitter or Facebook and ask them how they'd feel if you took away their automatic crash logging/stats?

Now you could also iterate this into something fancier, as a number of games do: when a particular crash has had 2-3 reports, you have the back-end ask the client for the full crash report. Again, the user controls whether that happens, but if it's a high-frequency crash, then sooner or later someone will say OK.

Once you have more than 2-3 crash reports, you can stop taking reports for that particular crash. But the thing is that now, if you have an interest in a particular crash, you also have some handy crash report instances without having to go to the forums/community.
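A sketch of that "ask for a few full reports, then stop asking" logic. The two thresholds and the class name are invented for the example; the real numbers would need tuning:

```python
from collections import Counter

ASK_AFTER_SEEN = 3       # assumed: only ask once a fingerprint has recurred this often
FULL_REPORTS_WANTED = 3  # assumed: stop asking once this many full reports are stored


class CrashSampler:
    """Counts crash summaries and decides when a full report is worth requesting."""

    def __init__(self):
        self.seen = Counter()          # summaries received per fingerprint
        self.full_reports = Counter()  # full reports stored per fingerprint

    def record(self, fingerprint):
        """Count one summary; return True if the client should be asked for more."""
        self.seen[fingerprint] += 1
        return (self.seen[fingerprint] >= ASK_AFTER_SEEN
                and self.full_reports[fingerprint] < FULL_REPORTS_WANTED)

    def store_full_report(self, fingerprint):
        """Note that a user agreed and a full report was saved."""
        self.full_reports[fingerprint] += 1
```

Every summary still bumps the counter; only the request for the detailed report is rationed.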

After a few weeks, you delete crash reports that didn't get any traction, keeping the ones that had bugs/tickets assigned to them or were used by developers.

It's not "hard" to do. It probably takes a few days to implement the whole thing, and maybe a week or so to tune and get the internal front-end (for viewing the stats) into a useful state, but you can use a lot of off-the-shelf and open-source stuff to implement the vast bulk of it. It's not business critical, so you don't care if some reports get missed; the stats don't have to be perfect, just representative.

TL;DR: The question

As someone who has worked on/with a number of crash reporting systems, front- and back-end, on a number of games, I'm curious why KSP doesn't auto-submit even the simplest crash-fingerprints for statistics at their end?

Is Squad relatively new and just unaware of the value of such a system, or are they just a really small team that doesn't have the resources to develop something so simple (or investigate just how simple it actually is; a lot of people will immediately go to the "that's a ....-ton of data, nobody can handle that" mindset, which is mind-blowingly ridiculous once you actually understand what's involved), or is it just that KSPers aren't going to allow any auto-submission of anything so that it's just not worth doing?

Edited November 17, 2015 by kfsone

Wallygator · November 14, 2015

If a tree falls and there is no one around, does it make a noise?

I like the idea, but have no way of knowing if the dev team is geared up to handle the volumes. The process and organisation required to support automated crash reporting is likely not in place and there are also probably no plans or strategy defined at the moment to implement any of it.

I wait in hope...

Mat2ch · November 14, 2015

Aaaaand I don't want a game to submit anything over the internet without my permission.

Wallygator · November 14, 2015

Aaaaand I don't want a game to submit anything over the internet without my permission.

If properly implemented, the user is allowed to opt out.

But, I still agree.

Edited November 14, 2015 by Wallygator
out not up ffs = jeeze learnt type already!

monophonic · November 14, 2015

If properly implemented, the user is allowed to opt out.
But, I still agree.

Opt out is not proper implementation. Opt in works in all relevant legislations.

Gargamel · November 14, 2015

Just because it is easy to do, doesn't mean you should. See above for reasons why.

pandaman · November 14, 2015

Opt out is not proper implementation. Opt in works in all relevant legislations.

How about...

Each time you 'fire up' KSP it checks to see if there is a crash log saved and then asks if you want to 'send all crash logs to Squad?'

Or, failing that, include it in the 'send anonymous game data' thing when first starting a new install.

EladDv · November 14, 2015

There are a few reasons not to do so.

1) Crashes happen for a multitude of reasons, from PC issues to OS-specific problems, modding, random luck, Unity issues, and RAM running out.

2) The sheer volume of reports that would come in would make it impossible to be useful.

3) The big servers they would need to sort them and store them.

And those are the ones I am thinking of right now.

kfsone · November 14, 2015

If a tree falls and there is no one around, does it make a noise?
I like the idea, but have no way of knowing if the dev team is geared up to handle the volumes. The process and organisation required to support automated crash reporting is likely not in place and there are also probably no plans or strategy defined at the moment to implement any of it.
I wait in hope...

Apples and oranges: you don't not go to the grocery store because they sell too many things.

Current situation: Dev team has no clue how many of any given crash there is, or what bugs they have introduced, because people don't write bug reports or submit their bugs very often. When a crash *is* identified, it might take a while for them to get good data on it.

Alternate situation: After any given period of time, the devs have lots of simple crash data (what type of crash, what memory address, no personally identifying data). So after a release goes out, they can quickly get a sense of "holy crap, everyone is crashing", vs "yeah, there's a few of these", and so forth. When a specific crash becomes a problem, they are highly likely to have a small number of detailed crash reports for it ready to be looked at.

This doesn't swamp a dev team, it liberates them.

You don't email these reports *to* the dev team, they don't *have* to read them, but not having the data ... is just mind blowing, to me.

- - - Updated - - -

There are a few reasons not to do so.
1) Crashes happen for a multitude of reasons, from PC issues to OS-specific problems, modding, random luck, Unity issues, and RAM running out.
2) The sheer volume of reports that would come in would make it impossible to be useful.
3) The big servers they would need to sort them and store them.
And those are the ones I am thinking of right now.

#1 that's why you want insight into them,

#2 no,

#3 no.

0/3. Did you even realize I was talking about the crashes that KSP's current crash-handler catches with the little "hey, send my crash log" messagebox?

- - - Updated - - -

How about...
Each time you 'fire up' KSP it checks to see if there is a crash log saved and then asks if you want to 'send all crash logs to Squad?'
Or, failing that, include it in the 'send anonymous game data' thing when first starting a new install.

Right now, when the game crashes, it writes everything to a file and then displays a little MessageBox telling you to send in the crash report.

So instead of displaying that, that's where you write a traditional "hey, I'd like to send this crash report for you, here's the data I'd send".

And in the majority of cases you're just going to send a few utterly unidentifiable data points - exception type, address, possibly a register dump, but probably not.

Just the sequence of:

<version>/<platform>/<crash type>/<address>

and a counter for that, so that at a glance you can see what types and numbers of crashes you're getting, can be incredibly helpful. If you're getting crashes all over the place then it's still useful data for someone working the producer role.

If you're not, then you have valuable data telling you what issues you have, and to what degree, to help you balance debugging vs developing.

- - - Updated - - -

Aaaaand I don't want a game to submit anything over the internet without my permission.

Taken as read, sorry for not stipulating.

Edited November 14, 2015 by kfsone

Wallygator · November 14, 2015

Opt out is not proper implementation. Opt in works in all relevant legislations.

You are correct, Sir!

I doff my hat to you!

kfsone · November 14, 2015

Opt out is not proper implementation. Opt in works in all relevant legislations.

It doesn't have to be automatic, which makes it not opt-in or opt-out but actionable. The data submitted would generally be trivial, just enough for them to track "degree of breakage" - how often are we getting crash reports, are they the same crash, etc.

Some folks will say "oh no, too many crash types" - no; that's not how it works. If you're getting a million of one crash or 100 of 10,000 crashes, that's incredibly useful data. If you're seeing a million different crashes, you have a stomp.

That's why you do either the http or smtp thing, as a filtering process; the server logic decides whether the simple crash submission was enough.

So you get prompted:

"Hey, I crashed, can I tell Squad this: <1.0.5>/<pc>/<divide by zero>/<0x7fffffffff>?"

And sometimes it will say "Great! Squad would like a little more info to help them when they look at this crash, and You've Been Selected. We'll try very hard not to include any personally identifiable data but if you'd just renamed Jeb to Miss Princesses' Little Bunnywabbit we can't guarantee that the text might not appear in the data.

Here's what we'd like to send:

<more detailed crash logs with some system info, carefully vetted>"

They probably only need 3-4 for a given, recurring crash, but then you also don't want it the first time a particular type/addr happens.

Edited November 14, 2015 by kfsone

monophonic · November 14, 2015

How about...
Each time you 'fire up' KSP it checks to see if there is a crash log saved and then asks if you want to 'send all crash logs to Squad?'
Or, failing that, include it in the 'send anonymous game data' thing when first starting a new install.

Both work, as long as you don't transmit anything you haven't asked and got permission for.

kfsone · November 14, 2015

Both work, as long as you don't transmit anything you haven't asked and got permission for.

So, we're not talking about ninja-privacy-violating automatic, we're talking about the system visibly and with user agreement sending the crash reports, and you're not going to lose your job because the corporate firewall catches you submitting a ksp crash dump that you didn't know about.

Wallygator · November 14, 2015

Apples and oranges: you don't not go to the grocery store because they sell too many things.
Current situation: Dev team has no clue how many of any given crash there is, or what bugs they have introduced, because people don't write bug reports or submit their bugs very often. When a crash *is* identified, it might take a while for them to get good data on it.
Alternate situation: After any given period of time, the devs have lots of simple crash data (what type of crash, what memory address, no personally identifying data). So after a release goes out, they can quickly get a sense of "holy crap, everyone is crashing", vs "yeah, there's a few of these", and so forth. When a specific crash becomes a problem, they are highly likely to have a small number of detailed crash reports for it ready to be looked at.
This doesn't swamp a dev team, it liberates them.
You don't email these reports *to* the dev team, they don't *have* to read them, but not having the data ... is just mind blowing, to me.
- - - Updated - - -
#1 that's why you want insight into them,
#2 no,
#3 no.
0/3.
- - - Updated - - -
Right now, when the game crashes, it writes everything to a file and then displays a little MessageBox telling you to send in the crash report.
So instead of displaying that, that's where you write a traditional "hey, I'd like to send this crash report for you, here's the data I'd send".
And in the majority of cases you're just going to send a few utterly unidentifiable data points - exception type, address, possibly a register dump, but probably not.
Just the sequence of:
<version>/<platform>/<crash type>/<address>
and a counter for that, so that at a glance you can see what types and numbers of crashes you're getting, can be incredibly helpful. If you're getting crashes all over the place then it's still useful data for someone working the producer role.
If you're not, then you have valuable data telling you what issues you have, and to what degree, to help you balance debugging vs developing.
- - - Updated - - -
Taken as read, sorry for not stipulating.

OK, I'm not going to attempt to educate the "masses" and please don't take this as personal since I have no idea of any forum member's (or your) level of experience pertaining to this particular topic.

I (perhaps wrongly) assume that folks contributing to this thread have a reasonable level of experience addressing core elements of IT and service support.

(Disclaimer completed)

Regardless of any preferred method of "reporting errors": I STILL FEEL UNCONVINCED that our beloved (and in my opinion still overstretched) dev team has the capacity, insight, strategy or process in place to effectively and efficiently comprehend and address the potential plethora of automated crash reports that will overwhelm them following implementation of such a capability.

"Liberation" assumes that the appropriate processes, governance and prioritisation capabilities are in place and validated well ahead of the "tsunami of crash reports" that will ultimately ensue. And if folks (you the reader) don't comprehend the necessary elements of human intervention that automated crash reporting implies, then I kindly suggest the community should start codifying it all now. Let me know how you get on.

I genuinely encourage anyone who dreams of automated crash reporting (that confers comprehension of deep problem management and mitigation) to wake up and join the present day IT profession.

TL;DR: This is not simple.

NOTE: Apologies to Squad if the above is already incorporated into their master plan.

kfsone · November 14, 2015

"Liberation" assumes that the appropriate processes, governance and prioritisation capabilities are in place and validated well ahead of the "tsunami of crash reports" that will ultimately ensue. And if folks (you the reader) don't comprehend the necessary elements of human intervention that automated crash reporting implies, then I kindly suggest the community should start codifying it all now. Let me know how you get on.
I genuinely encourage anyone who dreams of automated crash reporting (that confers comprehension of deep problem management and mitigation) to wake up and join the present day IT profession.

The fact that you imagine it to be harder than it is only speaks to the fact you've not had to deal with it yet. I'm pretty sure that if you were tasked with implementing it you'd quickly realize it's a lot less work than it sounds. You're basically adding mass to the problem and then complaining about the dV required to lift it off the ground.

So I'll give you a few starters.

- The worst-case RPS is clearly defined: count(players) * maxCrashesPerPlayerPerSecond.

- The logic is trivial.

-- UPDATE crashes SET count = count + 1 WHERE platform = ? AND version = ? AND exception = ? AND pc = ?;

-- SELECT IF(count BETWEEN ? AND ?, 302, 200) AS result FROM crashes WHERE platform = ? AND version = ? AND exception = ? AND pc = ?;

- The vast majority of the time you're just handling a GET of "/<platform>/<version>/<exception>/<pc>" based on the parameters in one of the error log files.
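To make the two statements above concrete, here is a minimal sketch against an in-memory SQLite database. The table layout, the `hits` column (standing in for `count`) and the 3-to-5 window are assumptions, and reading 302 as "please follow up with a full report" is one interpretation of the status codes:

```python
import sqlite3

WHERE = "platform = ? AND version = ? AND exception = ? AND pc = ?"


def open_db():
    """Create a throwaway database with one counter row per crash fingerprint."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE crashes ("
        " platform INTEGER, version INTEGER, exception INTEGER, pc INTEGER,"
        " hits INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (platform, version, exception, pc))"
    )
    return db


def record_crash(db, platform, version, exception, pc, lo=3, hi=5):
    """Bump the counter; return 302 while the count is inside [lo, hi], else 200."""
    key = (platform, version, exception, pc)
    cur = db.execute("UPDATE crashes SET hits = hits + 1 WHERE " + WHERE, key)
    if cur.rowcount == 0:
        # First time this fingerprint has been seen: create its row.
        db.execute("INSERT INTO crashes VALUES (?, ?, ?, ?, 1)", key)
    (hits,) = db.execute("SELECT hits FROM crashes WHERE " + WHERE, key).fetchone()
    return 302 if lo <= hits <= hi else 200


db = open_db()
print([record_crash(db, 0, 0x01000500, 0xC0000005, 0x005CF31C) for _ in range(6)])
# [200, 200, 302, 302, 302, 200]
```

The insert-if-missing step is the part the UPDATE alone glosses over; a stored procedure or an upsert would do the same job.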

But how likely is it that you'll experience the critical absolute worst-case scenario? Maybe after patch some large proportion of your players will all play at the same time, but it's rare that more than 10% of the players of a game be playing at the same time.

One guy earlier said "there are lots of types of crashes", but we're only interested in the types that produce an error.log, and then we use the well-defined, well-enumerated values that are turned into text in that file. They are: the exception type ("KSP.exe caused an Access Violation (0xc0000005)", where c..5 is the exception enumeration) and the program counter ("in module KSP.exe at 0023:005cf31c.", where a 64-bit number will handle all the address ranges you need, but you might want to distinguish 32 from 64 bit).
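If you only had the text file to work from, pulling those two values back out is a couple of regular expressions. This sketch assumes the two phrases quoted above appear in the log exactly in that shape; the rest of the file layout is not something I'm asserting:

```python
import re

# Patterns modelled only on the two phrases quoted above.
EXCEPTION_RE = re.compile(r"caused an? .+? \((0x[0-9a-fA-F]+)\)")
ADDRESS_RE = re.compile(r"in module (\S+) at ([0-9a-fA-F]{4}):([0-9a-fA-F]+)\.")


def parse_crash_header(text):
    """Return the exception code, module and program counter, or None if absent."""
    exc = EXCEPTION_RE.search(text)
    loc = ADDRESS_RE.search(text)
    if exc is None or loc is None:
        return None
    return {
        "exception": int(exc.group(1), 16),
        "module": loc.group(1),
        "pc": int(loc.group(3), 16),
    }


sample = ("KSP.exe caused an Access Violation (0xc0000005)\n"
          "  in module KSP.exe at 0023:005cf31c.\n")
print(parse_crash_header(sample))
# {'exception': 3221225477, 'module': 'KSP.exe', 'pc': 6091548}
```

In practice the crash handler already has these as numbers before it ever formats them, so the client would not need to parse anything.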

I can only speak to how easy it's been to implement at the places I've worked on games in the last 14 years out of my 27 years as a software developer.

What I will credit as being "hard" is getting someone on any given team to put together the thing that lets you see the current stats.

Edited November 14, 2015 by kfsone

Majorjim! · November 14, 2015

Because there are so many and they are so frequent that there is not enough storage on the entire Earth to house them.

Wallygator · November 14, 2015

The fact that you imagine it to be harder than it is only speaks to the fact you've not had to deal with it yet. I'm pretty sure that if you were tasked with implementing it you'd quickly realize it's a lot less work than it sounds. You're basically adding mass to the problem and then complaining about the dV required to lift it off the ground.
So I'll give you a few starters.
- The worst-case RPS is clearly defined: count(players) * maxCrashesPerPlayerPerSecond.
- The logic is trivial.
-- UPDATE crashes SET count = count + 1 WHERE platform = ? AND version = ? AND exception = ? AND pc = ?;
-- SELECT IF(count BETWEEN ? AND ?, 302, 200) AS result FROM crashes WHERE platform = ? AND version = ? AND exception = ? AND pc = ?;
- The vast majority of the time you're just handling a GET of "/<platform>/<version>/<exception>/<pc>" based on the parameters in one of the error log files.
But how likely is it that you'll experience the critical absolute worst-case scenario? Maybe after patch some large proportion of your players will all play at the same time, but it's rare that more than 10% of the players of a game be playing at the same time.
One guy earlier said "there are lots of types of crashes", but we're only interested in the types that produce an error.log, and then we use the well-defined, well-enumerated values that are turned into text in that file. They are: the exception type ("KSP.exe caused an Access Violation (0xc0000005)", where c..5 is the exception enumeration) and the program counter ("in module KSP.exe at 0023:005cf31c.", where a 64-bit number will handle all the address ranges you need, but you might want to distinguish 32 from 64 bit).
I can only speak to how easy it's been to implement at the places I've worked on games in the last 14 years out of my 27 years as a software developer.
What I will credit as being "hard" is getting someone on any given team to put together the thing that lets you see the current stats.

Collecting and having stats available is straightforward, yes. I cannot disagree with you. But code alone will not deliver a viable service capability.

You are also quite perceptive, I do not have decades of experience creating or managing digital game support systems - rather decades of experience creating and managing a wide variety of business systems and adjunct support processes including codified elements. Perhaps my fundamental mistake is assuming there are common principles. I apologise if this is an incorrect assumption. That said, I do not in any way discount anyone's experience on this particular subject.

To terminate my contribution to this thread in a constructive manner --> I believe everything you said sounds great. I await the moment where I'm convinced that THIS particular game and THIS particular dev team can do it.

Perhaps someone could volunteer time to make this all happen, seeing as it is so straightforward. That would be a really good (actually great) outcome. In good conscience, I would even purchase another game license to support the effort, knowing that such an effort was underway.

Over and out.

Edited November 14, 2015 by Wallygator

Temeter · November 14, 2015

Probably because it just doesn't make much sense; KSP gets countless crashes because of memory or mods. Sorting auto-reports would be a huge amount of work and mostly a waste of time.

How about...
Each time you 'fire up' KSP it checks to see if there is a crash log saved and then asks if you want to 'send all crash logs to Squad?'
Or, failing that, include it in the 'send anonymous game data' thing when first starting a new install.

That would be utterly annoying. Not only does the game crash quite easily (if not always by Squad's fault), but now you have to click away some silly message? I'm already bothered by the 'this is for 1.0.x' mod messages.

kfsone · November 15, 2015

Because there are so many and they are so frequent that there is not enough storage on the entire Earth to house them.

Must be taken up by the other games that do this, then? (EverQuest, DAoC, WWII Online, Blizzard, Eve, have been doing this for over a decade... It's really not hard, and it shouldn't have taken much reading of what I wrote to understand you don't have to store *everything*)

If you have a million crash reports, each with a platform id, version id, 64-bit address, 64-bit timestamp, 64-bit program counter and a 32-bit incident counter... that's 36 bytes per record, so even if you have a million crash reports for a particular build, <drum roll> that's a tiny amount of data by today's standards. 36 million bytes is about 34MB. I'm pretty sure that's not more storage than the entire Earth. If you had a billion crash reports for a single build, (a) you're doing it wrong, (b) that's 36GB. Still not more than the entire Earth, although it might be more than your PC.

Edited November 15, 2015 by kfsone

kfsone · November 15, 2015

Collecting and having stats available is straightforward, yes. I cannot disagree with you. But code alone will not deliver a viable service capability.
You are also quite perceptive, I do not have decades of experience creating or managing digital game support systems - rather decades of experience creating and managing a wide variety of business systems and adjunct support processes including codified elements. Perhaps my fundamental mistake is assuming there are common principles. I apologise if this is an incorrect assumption. That said, I do not in any way discount anyone's experience on this particular subject.

No, you're just trying to implement too complex a solution for the problem. The problem: no visibility of what kind of patterns of crash behavior the development team is causing. Solution: Count them.

It's not true that there's an infinite number of addresses that a given build can crash at - it'll only crash at so many locations, even with a stomp or a leak.

So - as I teased above, the number of counters you need for a given build is very small. One very, very large MMO I'm familiar with uses less than 50GB of storage for 11 years of many millions of players' automated crash reports. The trick is, they don't store the whole thing every time. They only sample a few so that they have *some* data, and for the mass of the rest they just bump a counter.

But in order for that to happen, the client has to have the ability to send the crash report.

The game already does everything but send the report, and they're using Unity so ... it's a previously solved problem (http://www.jenkinssoftware.com/raknet/manual/crashreporter.html just off the top of my head). Gathering the data is extremely trivial - remember, we're talking summary:

UPDATE crash_counts SET counter = counter + 1 WHERE platform = ? AND version = ? AND pcr = ?

or thereabouts.

Perhaps someone could volunteer time to make this all happen, seeing as it is so straightforward. That would be a really good (actually great) outcome. In good conscience, I would even purchase another game license to support the effort, knowing that such an effort was underway.

1. Write a stored procedure that takes platform#, version#, exception type, and program counter,

2. Have the SP create or increment the plat/ver/exc/pcr counter,

3. Select a cloud and/or web service to host a trivial service that can run a lightweight GET-based request handler,

4. On receiving a GET and doing basic validation of the request, call the SP.

Your choices are: simple, granular counters (e.g. roll counters at the hour) or simply record each crash report as a line item (it's never going to be a lot of data but I'm not getting paid to argue the case, so *shrug*)
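The first option (counters rolled at the hour) can be sketched like this. The class and method names are made up, and where the rolled buckets get written is left out:

```python
from collections import Counter, defaultdict


def hour_bucket(timestamp):
    """Round a Unix timestamp down to the start of its hour."""
    return int(timestamp) // 3600 * 3600


class HourlyCounters:
    """In-memory crash counters, grouped by hour, flushed once an hour is over."""

    def __init__(self):
        self.buckets = defaultdict(Counter)

    def add(self, fingerprint, timestamp):
        """Count one crash for the hour the timestamp falls in."""
        self.buckets[hour_bucket(timestamp)][fingerprint] += 1

    def roll(self, now):
        """Remove and return every bucket for an hour that has fully elapsed."""
        current = hour_bucket(now)
        done = {h: c for h, c in self.buckets.items() if h < current}
        for h in done:
            del self.buckets[h]
        return done
```

Whatever `roll()` returns is what you would write to the database: one row per fingerprint per hour instead of one row per crash.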

You could split it into a couple of components: an HTTP request handler and a counter/writer that services a zmq request channel... (Or just write it in Go).


# HTTP GET handler

from collections import defaultdict

import zmq

# Placeholders: substitute whatever your framework and release process provide.
MAX_PLATFORM = 2          # 0 = pc, 1 = mac, 2 = linux
MAX_VERSION = 0x01FFFFFF  # highest version number you have shipped


class HTTP500(Exception):
    """Stand-in for the web framework's 'respond with an error' exception."""


class Context(object):
    def __init__(self):
        self.ctx = zmq.Context()
        self.sock = self.ctx.socket(zmq.PUSH)
        # zmq takes host and port as a single endpoint string.
        self.sock.connect("tcp://counter-master:10234")  # needs timeout

    def send_count(self, plat, ver, exc, addr):
        self.sock.send_json({
            'cmd': 'count', 'p': plat, 'v': ver, 'e': exc, 'a': addr
        })
        return True


def http_get_handler(context, request, keys):
    # All four parameters must be present...
    try:
        platform  = keys['p']  # 0 = pc, 1 = mac, 2 = linux, etc...
        version   = keys['v']  # 32-bit numeric value
        exception = keys['e']  # exception code (32-bit value)
        addr      = keys['a']  # program counter (address of crash)
    except KeyError as e:
        raise HTTP500("Bugger off, hacker. " + str(e))

    # ...numeric...
    try:
        platform = int(platform)
        version = int(version)
        exception = int(exception)
        addr = int(addr)
    except ValueError as e:
        raise HTTP500("You bad hacker, you. " + str(e))

    # ...and within sane ranges.
    try:
        if platform < 0 or platform > MAX_PLATFORM:
            raise ValueError("Invalid platform")
        if version < 0x01000000 or version > MAX_VERSION:
            raise ValueError("Invalid version")
        if exception <= 0 or exception >= 1 << 32:
            raise ValueError("Invalid exception")
        if addr < 0 or addr >= 1 << 52:
            raise ValueError("Valid program counter - if it was 2025")
    except ValueError as e:
        raise HTTP500("You're just terribad at this hacking. " + str(e))

    if context.send_count(platform, version, exception, addr) is not True:
        raise HTTP500("Internal error. You win this time.")

    return 200, "OK"


class Service:
    def __init__(self):
        self.ctx = zmq.Context()
        self.sock = self.ctx.socket(zmq.PULL)
        self.sock.bind("tcp://*:10234")
        self.counters = {}  # count by version for various reasons

    def run(self):
        while True:
            msg = self.sock.recv_json()

            if msg['cmd'] == 'count':
                self.count(msg)
            elif msg['cmd'] == 'flush':
                self.flush(msg['version'])

    def count(self, msg):
        plat, ver, exc, addr = msg['p'], msg['v'], msg['e'], msg['a']
        versions = self.counters.get(ver, None)
        if versions is None:
            versions = self.counters[ver] = {}
        verPlats = versions.get(plat, None)
        if verPlats is None:
            verPlats = versions[plat] = defaultdict(int)
        verPlats['{}.{}'.format(exc, addr)] += 1

        self.update_database()

    def update_database(self):
        # you probably want to mark the record as dirty,
        # and perform the underlying database commit in a few seconds
        # rather than immediately, allowing you to batch writes.
        pass

    def flush(self, flushVersion):
        # optimistic approach: drop counters for flushVersion and older
        self.counters = {
            k: v for k, v in self.counters.items()
            if k > flushVersion
        }

I didn't spell check that but basically something along those lines.

- - - Updated - - -

Probably because it just doesn't make much sense; KSP gets countless crashes because of memory or mods. Sorting auto-reports would be a huge amount of work and mostly a waste of time.
That would be utterly annoying. Not only does the game crash quite easily (if not always by Squad's fault), but now you have to click away some silly message? I'm already bothered by the 'this is for 1.0.x' mod messages.

Just about every other game on the planet does this, and several game services provide the ability as a built-in.

I've not done a good job of addressing my question so that everyone is on the same page: that is, I'm not talking about getting bugs fixed, I'm not talking about sending each and every crash report automatically, nor to some person's mailbox.

Indeed, mostly what I'm talking about is using the crash reporter to collect stats. The fact that you could also then collect some small subset of each crash that happens more than N times would mean that the developers would have data on-hand when they need it.

As to the size of that data, it's tiny. Radically less than the web-logs for most major websites. And it's not the business end of the system so it doesn't matter if you don't keep every last octet of data.

But not having any of it is a problem. What we're talking about is when a new patch goes out.

"But mods" - no, you're wrong. Mods will change the crashes, but ultimately what Squad will care about are the crashes they introduce themselves, during testing and immediately after a release.

Some of you are thinking of this as a hard direct-to-bug-solving thing, but it's not, it's a "production" issue.

What we're talking about is submitting the four key values (platform, version, exception type and exception address) so that they can draw graphs, so that they can tell how stable the current build is, and so forth.

Data: In the order of gigabytes for most games after 10 years of collection,

Use : Monitoring, not bug fixing - although some games also use it for bug fixing when something spikes really hard, e.g. WoW,

Cost: Less than the staff coffee, over time.

Difficulty: Well, maybe because I've written something like this for two different games, as well as for two different gaming middleware products, I don't think it's "difficult", but then my wife doesn't think it's "difficult" to operate our feking toaster, and I swear she's a witch.

kfsone · November 16, 2015

This is not about getting devs to fix bugs, this is not about trying to store each and every crash report, and it is certainly not about trying to violate your privacy and sneakily submit data behind your back. It's about producing stats on crashes (fingerprints) at their end, hooked into the dialog you get the first time you run KSP ("mind if I send some progress data") and the dialog that pops up when you crash the game ("the game has crashed, please consider sending the crash report to the game developer [OK]").

If that sounds too magical or not useful enough, read through.

[B][I]ZOMG That's impossible because ... [/I][/B](otherwise skip to [I][B]TL;DR[/B][/I])

I realize that the crash reports for KSP are polluted with module crashes, gfx driver crashes, and so forth. However the stability of the game itself would still rise significantly above such background noise, and in particular, around release time, Squad should want to see the *shape* of the crash reports to determine if they've unleashed something crazy that's broken everyone but them. For instance, if 30% of your players are crashing because of an AMD driver conflict, you might want to know.

To do this, you'd need the client to offer the user the chance to send a few data points:

- Client version (a 4-byte or 32-bit number),
- Client Platform (a 1-byte or 8-bit number),
- Exception type (a 4-byte or 32-bit number),
- Exception address (an 8-byte or 64-bit number; you only need 48 bits today and 52 bits by 2020, but let's not fuss about 16 bits)
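To make the size concrete, here is a minimal sketch of packing those four values into one fixed-size record with Python's `struct` module. The field order, the little-endian layout, the function names and the sample values are all my own assumptions for illustration, not anything KSP or Unity actually does.

```python
import struct

# Assumed layout, little-endian with no padding:
# uint32 version, uint8 platform, uint32 exception type, uint64 address.
FINGERPRINT = struct.Struct("<IBIQ")


def pack_fingerprint(version, platform, exc_type, exc_addr):
    """Pack the four crash data points into a fixed-size binary record."""
    return FINGERPRINT.pack(version, platform, exc_type, exc_addr)


def unpack_fingerprint(blob):
    """Inverse of pack_fingerprint; returns a 4-tuple of integers."""
    return FINGERPRINT.unpack(blob)


# Made-up sample values, purely for illustration.
sample = pack_fingerprint(0x01000500, 2, 0xC0000005, 0x00007FF6A1B2C3D4)
print(len(sample))  # 4 + 1 + 4 + 8 bytes
```

That is 17 bytes per crash before you add a timestamp.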

Most users are just going to check the box to let these be submitted automatically, as long as they are shown a simple example of what gets sent, and so no, it doesn't have to become an extra thing to do every time you crash.

Or it could be: The default Unity crash handler already pops up when the game crashes and implores you to send your crash report to the developers. It could be tuned to display "You should post your crash report on the forums if it keeps occurring. Would you like to send a crash summary to Squad? [OK] [Cancel]". That's instead of the box you have to click OK in, not in addition to it.

[B][I]It's a huge amount of data![/I][/B]

I'm not sure what you're thinking of. People can only crash so fast and there are only so many people playing the game. Let's say that over a period of 14 days 25,000 players crash 30 times a day. That's 14 * 30 * 25,000 * (17 + 8) bytes, where the +8 is to store a timestamp. That's 262,500,000 bytes. Ooo, that's a lot! Oh wait, that's 262 MB. That's a quarter of a gig every two weeks. Your niece's website probably stores more data to log all the webcrawlers that visit it.

So yeah, you might be talking about 7 GB of data over a year, but after about 2-3 months you only need the daily or hourly counts, and I'm pretty sure that the above is a worst-case scenario. It's going to take you a really, really long time to reach 1/10th of a TB if you do this even moderately smartly.
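If you want to check the arithmetic yourself, here is a tiny sketch. It uses the 25,000-player figure that appears in the formula, and the helper name is just something I made up.

```python
RECORD_BYTES = 17 + 8  # 17-byte fingerprint plus an 8-byte timestamp


def storage_bytes(days, players, crashes_per_player_per_day):
    """Raw bytes needed if every single crash record is kept."""
    return days * players * crashes_per_player_per_day * RECORD_BYTES


two_weeks = storage_bytes(14, 25_000, 30)
print(two_weeks)        # 262500000
print(two_weeks / 1e6)  # 262.5 (megabytes)
```

Plug in your own guesses for player count and crash rate; even pessimistic numbers stay small.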

It's a trivial amount of data that gives a game developer a huge insight into the state and stability of their product and can easily be tied into ops-stream reporting to correlate things like "we put out a patch and everyone is crashing". And the non-module crashes will be distinct because they're going to be specific types of crashes with higher frequencies than the module-related crashes.

So there's good signal there.

It's not hard to do - you simply submit a "GET" request with a parameterized URL, REST style if you want, or a POST with parameters. Beyond that, it's pretty simple.
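Roughly what the client side of that could look like, as a sketch only. The endpoint is a placeholder I invented (there is no such Squad URL that I know of), and the short parameter names are simply the ones from the counter code earlier in the thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://crashstats.example.com/report"


def build_report_url(version, platform, exc_type, exc_addr):
    """Build the parameterized GET URL for one crash fingerprint."""
    query = urlencode({
        "v": version,
        "p": platform,
        "e": format(exc_type, "x"),  # hex is easier to eyeball in logs
        "a": format(exc_addr, "x"),
    })
    return ENDPOINT + "?" + query


url = build_report_url(0x01000500, 2, 0xC0000005, 0x7FF6A1B2C3D4)
print(url)
# Actually sending it would be one more call, e.g.
# urllib.request.urlopen(url, timeout=5), wrapped in a try/except
# so that a failed submission never bothers the player.
```

If the request fails, you just drop it; the stats only need to be representative.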

[B][I]Processing all those crashes is a staggering amount of CPU ...[/I][/B]

No, it isn't. It's a simple web-handler that logs five values (the four data points plus a timestamp) to a database. The crash is already in a machine-readable form on the client, which uses that to [I]generate[/I] the human-readable format (output.log) you're thinking of. So we're not talking about building some super-AI that actually reads the crash report, *sigh*.

But maybe you're still thinking that web-servers to handle those crashes are expensive. Nope, probably 2-3 amazon instances would handle it fine - the requests are incredibly light weight and the workload is almost non-existent - write to a database.
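To show how little work "write to a database" is, here is a sketch of the storage side using SQLite from the Python standard library. The table name, the column names and the choice of SQLite are assumptions on my part; a real deployment would pick its own database.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the crash table. Schema is an illustrative guess."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crashes ("
        " ts INTEGER, version INTEGER, platform INTEGER,"
        " exc_type INTEGER, exc_addr TEXT)"
    )
    return conn


def log_crash(conn, version, platform, exc_type, exc_addr, ts=None):
    """Store the five values: timestamp plus the four data points."""
    when = int(time.time() if ts is None else ts)
    # Address is stored as hex text because SQLite integers are signed 64-bit.
    conn.execute(
        "INSERT INTO crashes VALUES (?, ?, ?, ?, ?)",
        (when, version, platform, exc_type, format(exc_addr, "016x")),
    )
    conn.commit()


def top_crashes(conn, version, limit=10):
    """Most frequent fingerprints for one client version."""
    return conn.execute(
        "SELECT platform, exc_type, exc_addr, COUNT(*) AS n FROM crashes"
        " WHERE version = ? GROUP BY platform, exc_type, exc_addr"
        " ORDER BY n DESC LIMIT ?",
        (version, limit),
    ).fetchall()
```

The `top_crashes` query is the sort of thing the internal graphs would be drawn from.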

[B][I]That seems kind of crap and useless[/I][/B]

I'm not here to teach you how to be a game developer, but this is the kind of insight that, once you have it, you can't understand how you survived without. Find a game producer of a successful live-development game on Twitter or Facebook and ask them how they'd feel if you took away their automatic crash logging/stats.

Now you could also iterate this into something fancier, as a number of games do: when a particular crash has had 2-3 reports, you have the back-end ask the client for the full crash report. Again, the user controls whether that happens, but if it's a high-frequency crash, then sooner or later someone will say OK.

Once you have more than 2-3 crash reports, you can stop taking reports for that particular crash. But the thing is that now, if you have an interest in a particular crash, you also have some handy crash report instances without having to go to the forums/community.

After a few weeks, you delete crash reports that didn't get any traction, keeping the ones that had bugs/tickets assigned to them or were used by developers.
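The back-end bookkeeping for that is small. A sketch, where the class name and both thresholds (ask after 3 sightings, keep at most 3 full reports) are numbers I picked to match the "2-3" above, not values from any real system:

```python
from collections import defaultdict


class ReportCollector:
    """Decides when to ask a client for its full crash report."""

    def __init__(self, ask_after=3, keep=3):
        self.ask_after = ask_after      # sightings before we start asking
        self.keep = keep                # full reports to keep per crash
        self.seen = defaultdict(int)    # fingerprint -> times reported
        self.stored = defaultdict(int)  # fingerprint -> full reports held

    def on_fingerprint(self, fingerprint):
        """Count a crash; return True if we want the full report."""
        self.seen[fingerprint] += 1
        return (self.seen[fingerprint] >= self.ask_after
                and self.stored[fingerprint] < self.keep)

    def on_full_report(self, fingerprint):
        """Record that a user agreed and uploaded a full report."""
        self.stored[fingerprint] += 1
```

The client still asks the user before uploading anything; the server only says whether it is interested.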

It's not "hard" to do. It probably takes a few days to implement the whole thing, and maybe a week or so to tune and get the internal front-end (for viewing the stats) into a useful state, but you can use a lot of off-the-shelf and open-source stuff to implement the vast bulk of it. It's not business critical, so you don't care if some reports get missed; the stats don't have to be perfect, just representative.

[B][I]TL;DR: The question[/I][/B]

As someone who has worked on/with a number of crash reporting systems, front- and back-end, on a number of games, I'm curious why KSP doesn't auto-submit even the simplest crash-fingerprints for statistics at their end?

Is Squad relatively new and just unaware of the value of such a system? Are they just a really small team that doesn't have the resources to develop something so simple, or to investigate just how simple it actually is? (A lot of people will immediately go to the "that's a ....-ton of data, nobody can handle that" mindset, which is mind-blowingly ridiculous once you actually understand what's involved.) Or is it just that KSPers aren't going to allow any auto-submission of anything, so that it's just not worth doing?

sal_vager · November 16, 2015

Shame there isn't a tag for both a suggestion and a discussion, moved to development anyway.

The devs [I]do[/I] look here.

The crash report system is old; it goes back a long way and hasn't been changed, so I guess doing what you're asking for would be a lot of work when there already exists a bug tracker for players' issues.

Sure they could do more with this, but yes they are a small team, yes they are new to game development, and as someone who sees enough crash reports to choke a black hole I don't think they are as useful as you think they are.

Generally they show that the Unity3D implementation of Mono crashed with a write to an invalid memory location; there's not a lot anyone but Unity can do about that, however.

Yes I wish Squad handled the reports better as well, there is already a system to collate player data which might be appropriated for this, but sometimes the logs can be several hundred megabytes in size, so maybe not.

Also, merged threads.

Edited November 16, 2015 by sal_vager

klesh · November 16, 2015

Lots of info in that post and probably very well thought out and proper... which leads me to my question, is your name Oliver by any chance?

Pecan · November 16, 2015

Someone [URL="http://forum.kerbalspaceprogram.com/threads/139433-Why-aren-t-crash-reports-auto-submitted"]made a thread[/URL] about exactly the same thing just 2 days ago.
Oh, it was you. Is it really that important to you that it warrants a second thread as well?

Kerbart · November 16, 2015

[quote name='Temeter']That would be utterly annoying. Not only does the game crash quite easily (if not always through Squad's fault), but now you have to click away some silly message? I'm already bothered by the 'this is for 1.0.x' mod messages.[/QUOTE]

In your settings you'd be able to choose from three options:[list]
[*]Do not send crash reports
[*]When a crash report is generated, ask me if it should be sent
[*]Automatically send crash report[/list]

When the game is started for the first time (the flag resets with each update) the user is invited to review his choices and clicking the "review" button opens the settings dialog and shows the above options.

That's how this can be done with a minimum of interaction from the player.
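In code, the three settings and the once-per-update review could be sketched like this. The enum, the function names and the stored "reviewed version" value are all hypothetical, just to show how little logic is involved.

```python
from enum import Enum


class CrashReportPolicy(Enum):
    NEVER = "never"  # Do not send crash reports
    ASK = "ask"      # Ask me each time a report is generated
    AUTO = "auto"    # Automatically send crash report


def should_send(policy, ask_user):
    """Decide whether to submit; ask_user is a callable returning True/False."""
    if policy is CrashReportPolicy.AUTO:
        return True
    if policy is CrashReportPolicy.ASK:
        return bool(ask_user())
    return False


def needs_review(reviewed_version, current_version):
    """True when the player has not yet reviewed settings for this update."""
    return reviewed_version != current_version
```

Only the ASK setting ever puts a dialog in front of the player.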
