Statistics question - predicting class rank from grades

sevenperforce · January 21, 2020

Law school is all about class rank, not about absolute grades. However, my school only calculates class rank at the end of the year; the midterm grades are reported as a bloc without being tied to specific people.

This means I have the total grade spread from each of my classes, as well as my own personal grades. My guess is that you can create a bell curve that predicts where a given GPA puts a student in terms of class rank; I'm just not sure how.

Any ideas?

Edited January 21, 2020 by sevenperforce

kerbiloid · January 21, 2020

Maybe something like the Elo rating system used in chess? It predicts expected results based on past results.
https://en.wikipedia.org/wiki/Elo_rating_system

pincushionman · January 23, 2020

You say “total grade spread” but what statistics does the school actually provide you? If they present the median score (ideally; but if they give you the mean instead you might have to assume it’s close enough to the same thing) and the standard deviation (or the variance), you could assume a normal distribution (bell curve) and use that to estimate the rank of your score within the distribution.

sevenperforce · January 23, 2020

2 hours ago, pincushionman said:

You say “total grade spread” but what statistics does the school actually provide you? If they present the median score (ideally; but if they give you the mean instead you might have to assume it’s close enough to the same thing) and the standard deviation (or the variance), you could assume a normal distribution (bell curve) and use that to estimate the rank of your score within the distribution.

Sure, that would work, but I actually have much more data than that.

I know what I made in all four classes, and I know the actual discrete scores in each class. For example (these are not the real numbers because it would probably be some sort of crime to post them online):

Grade counts per class:

              Class 1   Class 2   Class 3   Class 4
As                  3         2         4         1
Bs                 12        16        13         5
Cs                 11         9        11        18
Ds                  2         1         0         3
Fs                  0         0         0         1
(my grades)         A         B         A         C

I could simply take the average grade and make a bell curve, but I think that having the actual grades above would make a placement/percentile distribution curve more accurate. As an added complication, Classes 1 and 3 are 6-hour classes while classes 2 and 4 are 4-hour classes, so they each count 50% more.

Dragon01 · January 24, 2020

Calculate the weighted arithmetic mean and the weighted standard deviation, then plug them into the equation for a Gaussian curve. That's the easiest way of doing this. You should also compute the mode and median as a sanity check. If they're the same as the mean (or very close to it), then the bell curve should work. If they're not, your distribution has nonzero skew, which usually complicates things.
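A rough sketch of that recipe in Python, using the made-up counts from the table earlier in the thread. The 4.0 grade-point scale, the choice to weight each grade by count times credit hours, and all names here are assumptions for illustration. It also treats the spread of individual grades as the spread of GPAs, which is only reasonable if students tend to score similarly across classes; if grades were independent between classes, the true GPA spread would be narrower.

```python
from statistics import NormalDist

# Assumed 4.0 scale; the thread does not state the school's actual scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Example (not real) counts from the table, with credit hours per class.
CLASSES = [
    {"hours": 6, "counts": {"A": 3, "B": 12, "C": 11, "D": 2, "F": 0}, "mine": "A"},
    {"hours": 4, "counts": {"A": 2, "B": 16, "C": 9, "D": 1, "F": 0}, "mine": "B"},
    {"hours": 6, "counts": {"A": 4, "B": 13, "C": 11, "D": 0, "F": 0}, "mine": "A"},
    {"hours": 4, "counts": {"A": 1, "B": 5, "C": 18, "D": 3, "F": 1}, "mine": "C"},
]


def pooled_weights(classes):
    """Total weight (student count x credit hours) for each grade-point value."""
    weights = {}
    for c in classes:
        for grade, n in c["counts"].items():
            x = GRADE_POINTS[grade]
            weights[x] = weights.get(x, 0) + n * c["hours"]
    return weights


def weighted_mean_sd(weights):
    """Weighted mean and (population) standard deviation."""
    total = sum(weights.values())
    mean = sum(x * w for x, w in weights.items()) / total
    var = sum(w * (x - mean) ** 2 for x, w in weights.items()) / total
    return mean, var ** 0.5


def weighted_median(weights):
    """Smallest value whose cumulative weight reaches half the total."""
    half = sum(weights.values()) / 2
    running = 0
    for x in sorted(weights):
        running += weights[x]
        if running >= half:
            return x


weights = pooled_weights(CLASSES)
mean, sd = weighted_mean_sd(weights)
median = weighted_median(weights)
mode = max(weights, key=weights.get)

# My own credit-weighted GPA.
hours = sum(c["hours"] for c in CLASSES)
my_gpa = sum(GRADE_POINTS[c["mine"]] * c["hours"] for c in CLASSES) / hours

# Fraction of the fitted bell curve lying below my GPA.
percentile = NormalDist(mean, sd).cdf(my_gpa)
print(f"mean={mean:.3f} sd={sd:.3f} median={median} mode={mode}")
print(f"my GPA={my_gpa:.2f}, estimated percentile={percentile:.1%}")
```

With only five possible grade values the median and mode checks are coarse, so a mismatch with the mean here is a weak signal rather than proof of skew.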

pincushionman · January 24, 2020

Oh, if you are given the actual counts of scores, then you already have your rank in each class, or at least a lower bound, if the real results are as granular as in the example. Your rank is literally just a count of how many scores are better than yours, or more pessimistically, how many are better or as good as yours. In this example you are (at worst) 3rd in class 1, 18th in class 2, 4th in 3, and 24th in 4. If the score categories were finer, we could get a better estimate.

As to an overall rank, I would take a mean weighted by the credit-hours of each class. If all classes are weighted equally, the estimate would be (3+18+4+24)/4=13th or better. I would hazard a guess that you are on the higher side of the large “B” and “C” sub-populations in classes 2 and 4, simply because of the presence of “A”s in the other two classes. But that’s just based on the assumption that people are consistent; the data can’t really support that.
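The counting described above can be sketched in a few lines of Python. The counts are the made-up example numbers from the table, the 6- and 4-hour weights come from the earlier post, and the function and variable names are hypothetical.

```python
GRADE_ORDER = "ABCDF"  # best to worst

# (credit hours, grade counts, my grade) for classes 1 to 4; example data only.
CLASSES = [
    (6, {"A": 3, "B": 12, "C": 11, "D": 2, "F": 0}, "A"),
    (4, {"A": 2, "B": 16, "C": 9, "D": 1, "F": 0}, "B"),
    (6, {"A": 4, "B": 13, "C": 11, "D": 0, "F": 0}, "A"),
    (4, {"A": 1, "B": 5, "C": 18, "D": 3, "F": 1}, "C"),
]


def rank_bounds(counts, mine):
    """Best and worst possible rank given only the grade counts."""
    better = sum(counts[g] for g in GRADE_ORDER[: GRADE_ORDER.index(mine)])
    best = better + 1               # top of my own grade bucket
    worst = better + counts[mine]   # bottom of my own grade bucket
    return best, worst


bounds = [rank_bounds(counts, mine) for _, counts, mine in CLASSES]
worst_ranks = [worst for _, worst in bounds]

# Equal weighting versus credit-hour weighting of the pessimistic ranks.
equal_mean = sum(worst_ranks) / len(worst_ranks)
total_hours = sum(h for h, _, _ in CLASSES)
weighted_mean = sum(h * w for (h, _, _), w in zip(CLASSES, worst_ranks)) / total_hours

print(bounds)
print(equal_mean, weighted_mean)
```

Averaging per-class ranks is only a stand-in for the real overall rank, which depends on how everyone else's grades combine across classes.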

As for predicting the final grades? Well…mmmphmyh. You’d have to look at historical trends there.

sevenperforce · January 24, 2020

14 hours ago, Dragon01 said:

Calculate the weighted arithmetic mean and the weighted standard deviation, then plug them into the equation for a Gaussian curve. That's the easiest way of doing this. You should also compute the mode and median as a sanity check. If they're the same as the mean (or very close to it), then the bell curve should work. If they're not, your distribution has nonzero skew, which usually complicates things.

That's the part I'm trying to figure out how to do.

13 hours ago, pincushionman said:

As for predicting the final grades? Well…mmmphmyh. You’d have to look at historical trends there.

Nah, no need to predict final grades. I am satisfied if I can predict my current rank.

sevenperforce · February 11, 2020

Revisiting this now that I have more data.

The ranks of each class are now known.

In Class A, I am ranked #2 (±1) of 40
In Class B, I am ranked #10 (±3) of 40
In Class C, I am ranked #14 (±2) of 39
In Class D, I am ranked #2 (±0) of 40

Based on statistics, and the assumption that most students have a similar grade distribution to my own, what is my estimated overall class rank?

mikegarrison · February 11, 2020

Exactly how are the rankings done, anyway? Are they based on A, B, C etc.? Are they based on numerical scores (e.g. A is anything from 94-100, but your actual score is 96.32)? Are all classes weighted evenly or are some classes 4 units, some 3 units, some 2 units, etc. and the final ranking is weighted by unit?

What if you have a 94 score and a #2 ranking in one class and an 86 score and a #16 ranking in another class? Is your combined ranking the average of your rankings (#9) or is it the average of your scores (90) compared to the average of everybody else's scores?

Does everybody take the same classes?

There are a huge number of ways this could be done, and the results you are interested in will be different depending on which methods are actually being used.

sevenperforce · February 11, 2020

3 hours ago, mikegarrison said:

Exactly how are the rankings done, anyway? Are they based on A, B, C etc.? Are they based on numerical scores (e.g. A is anything from 94-100, but your actual score is 96.32)? Are all classes weighted evenly or are some classes 4 units, some 3 units, some 2 units, etc. and the final ranking is weighted by unit?

What if you have a 94 score and a #2 ranking in one class and an 86 score and a #16 ranking in another class? Is your combined ranking the average of your rankings (#9) or is it the average of your scores (90) compared to the average of everybody else's scores?

Does everybody take the same classes?

There are a huge number of ways this could be done, and the results you are interested in will be different depending on which methods are actually being used.

The class rankings I've posted are derived from more complete data -- in some cases, the full known distribution of grades; in others, a bucketed distribution of grades. For example, I have one of only 3 As in Class A, with 40 total students, so my expected class ranking is 2 +/- 1 of 40. In class D, the professor simply told me outright that I am #2 in the class.

Since everyone takes the same classes, end-year class rankings are based on a raw total grade point score. Ranking in each individual class is a decent proxy because each class is curved to the same mean, so a #4 ranking in Class A should correspond to the same grade as a #4 ranking in Class B, and so on. The classes are weighted differently, but because of my grade distribution it will yield the same results if you weight them all equally.

For these purposes I think you can treat each rank as a discrete number of points. So #2 would be 39 points and #10 would be 31 points and so forth. My gross point total would be 135. I'm not sure how to estimate what percentile that corresponds to, though. There COULD be someone who is #1 in every class, but it is very unlikely. Most people probably have the same standard deviation (6.697) as me.

mikegarrison · February 11, 2020

If all the classes are weighted equally, and all the students take all the classes, and the way that the final ranking is determined is by 40 points for #1, 39 points for #2, etc. in every class, then I would just assume a normal distribution based around a mean of the number of classes times half the number of students. So the mean is 80, and you have 135.

The problem comes in estimating the standard deviation. We really only have one data point, your total. The distribution of your rankings for your individual classes doesn't really tell us a lot about the distribution of the students in your classes.

I would estimate maybe top 5. Can't be too much worse than that with those two #2s and nothing below #14.

sevenperforce · February 11, 2020

13 hours ago, mikegarrison said:

The problem comes in estimating the standard deviation. We really only have one data point, your total. The distribution of your rankings for your individual classes doesn't really tell us a lot about the distribution of the students in your classes.

If all students have the same standard deviation of 6.697 in their original scores (which, again, is only one data point but is all the data we have, and probably is pretty representative) then that should be able to tell us something about the shape of the curve.

mikegarrison · February 11, 2020

1 hour ago, sevenperforce said:

If all students have the same standard deviation of 6.697 in their original scores (which, again, is only one data point but is all the data we have, and probably is pretty representative) then that should be able to tell us something about the shape of the curve.

Not as much as you would think.

The statistics here are somewhat complicated because these rankings are constrained. You can't have everybody ranked #20. They have to fall in a #1-40 distribution.

Maybe the best way to think about it is that if you took an infinite number of classes, then your ranking in those classes would fall into a close-to Gaussian shape around some mean, and that mean would be your class rank (because infinite weighting from an infinite number of classes). So you can assume your class rank probability is currently described by a distribution (not a true normal distribution because the tails are limited -- can't be better than #1 or worse than #40) that surrounds your current average. Your current average is about #7 (135 points/4 = almost 34). Unfortunately for you, there is more room to get worse than to get better, so it's a little more likely that the true value is worse than #7 than better. Still, that's in the top 20%.

Or to put it another way, the obvious answer (your current average rank is your current class rank) is the most likely one. It's possible, of course, that the people who got #1 in each class also got #40 in another class, and so on, so that you are the only student significantly far away from the overall average of #20. But that's not very likely.

In practical terms, this is all pointless. Just do your best, and if it's good enough you'll end up with a high rank. (I am assuming your class ranking outcomes are more deterministic than random.)

Edited February 11, 2020 by mikegarrison

sevenperforce · February 11, 2020

Incidentally I must have added it up wrong -- the stdev is 6 and my score is 136. But nbd.
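A quick check of those two figures, as a sketch. It assumes the points convention from earlier in the thread (#1 earns 40 points, #2 earns 39, and so on) applied to all four classes, including the one with 39 students, and it assumes the quoted stdev is the sample standard deviation of the four ranks.

```python
from statistics import pstdev, stdev

ranks = {"A": 2, "B": 10, "C": 14, "D": 2}
TOP_POINTS = 40  # assumed: points awarded for rank #1 in every class

# Rank r earns TOP_POINTS + 1 - r points, so #2 -> 39 and #10 -> 31.
points = {name: TOP_POINTS + 1 - r for name, r in ranks.items()}
total = sum(points.values())

rank_values = list(ranks.values())
sample_sd = stdev(rank_values)       # divides by n - 1
population_sd = pstdev(rank_values)  # divides by n

print(points, total)
print(sample_sd, round(population_sd, 3))
```

The sample figure matches the stdev of 6 quoted above; the population version comes out a little lower, at roughly 5.2.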

Constrained rankings do pose a challenge. But maybe there's a way around it. Suppose an infinite number of students, each having a certain score S = A+B+C+D where each stdev is 6 and the mean of all S is 80. That would generate a Gaussian distribution. I could then find the percent of students with a total score greater than 136, which gives me my percentile and thus can be used to determine rank.
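Here is one way that Gaussian idea might be sketched, with several assumptions that go beyond the thread. Points in each class are treated as evenly spread over 1 to 40 (so the per-class mean is 20.5 rather than 20, and the total mean is 82 rather than 80), every class is assumed to have 40 students, and a hypothetical parameter `rho` describes how strongly a student's points in one class track their points in another. The curve needs the spread of totals between students; the stdev of 6 is a spread within one student's results, so the last case below converts it into a `rho` under the extra assumption that each class score is a stable personal level plus independent noise with stdev 6.

```python
from math import sqrt
from statistics import NormalDist


def fraction_above(total, n_students=40, n_classes=4, rho=0.0):
    """Normal-approximation share of students whose total exceeds `total`.

    Assumes per-class points are uniform on 1..n_students and that every
    pair of classes has the same correlation rho (illustrative only).
    """
    mean_one = (n_students + 1) / 2
    var_one = (n_students ** 2 - 1) / 12  # variance of a uniform 1..n
    mean_sum = n_classes * mean_one
    # Variance of a sum of equally correlated variables.
    var_sum = n_classes * var_one * (1 + (n_classes - 1) * rho)
    return 1 - NormalDist(mean_sum, sqrt(var_sum)).cdf(total)


MY_TOTAL = 136
WITHIN_SD = 6  # within-student stdev quoted in the thread

# rho implied by "stable level + noise": noise share of per-class variance.
var_one = (40 ** 2 - 1) / 12
rho_from_sd = 1 - WITHIN_SD ** 2 / var_one

independent = fraction_above(MY_TOTAL, rho=0.0)       # classes unrelated
implied = fraction_above(MY_TOTAL, rho=rho_from_sd)   # from the stdev of 6
identical = fraction_above(MY_TOTAL, rho=1.0)         # same rank everywhere

for label, frac in [("rho=0", independent), ("rho from sd", implied), ("rho=1", identical)]:
    print(f"{label}: {frac:.1%} above, about {frac * 40:.1f} students of 40")
```

The two extreme values of `rho` bracket the answer, and the result moves a lot between them, which illustrates the earlier point that the standard deviation of the totals is the hard part to pin down. The normal curve also ignores the hard limits at #1 and #40, so the tail figures are only approximate.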
