A Level Playing Field

Once a valid application has been submitted, a minimum of five expert reviewers will be assigned to score it. Those expert reviewers will offer both scores and comments against each of four distinct traits. Each trait will be scored on a 0-5 point scale, in increments of 0.1 (for example, 1.4 or 3.7). Those scores will combine to produce a total normalized score.

The most straightforward way to ensure that everyone is held to the same set of standards would be to have the same expert reviewers score every application; unfortunately, due to the number of applications that we may receive, that is not possible.

Since the same expert reviewers will not score every application, we have carefully crafted an approach to ensure that each application will be treated fairly. One expert reviewer may take a more critical view, giving every assigned application a score between 1.0 and 2.0, for example; meanwhile, another expert reviewer may be more generous and score every submission between 4.0 and 5.0.

For illustrative purposes, let’s look at the scores from two expert reviewers:

The first expert reviewer is a far more generous scorer than the second, who gives much lower scores. If your application were rated by the first expert reviewer, it would earn a much higher total score than if it were assigned to the second.

We have a way to address this issue. No matter which expert reviewers are assigned to you, each application will be treated fairly. To do this, we use a mathematical technique that relies on two measures of a distribution: the mean and the standard deviation. The mean takes all the scores assigned by an expert reviewer, adds them up, and divides the sum by the number of scores assigned, giving an average score.

Formally, we denote the mean like this:
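
In standard notation, and assuming expert reviewer $r$ assigns $n_r$ scores $x_{r,1}, \dots, x_{r,n_r}$ (symbols chosen here for illustration), the mean can be written as:

```latex
\mu_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{r,i}
```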

The standard deviation measures the “spread” of an expert reviewer's scores. As an example, imagine that two expert reviewers both give the same mean (average) score, but one gives many zeros and fives, while the other gives more ones and fours. It wouldn't be fair if we didn’t consider this difference.

Formally, we denote the standard deviation like this:
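
In standard notation, and assuming expert reviewer $r$ assigns $n_r$ scores $x_{r,i}$ with mean $\mu_r$ (symbols chosen here for illustration), the standard deviation can be written as follows. This is the population form, which divides by $n_r$; a sample form that divides by $n_r - 1$ is also common, and which one is used here is an assumption.

```latex
\sigma_r = \sqrt{\frac{1}{n_r} \sum_{i=1}^{n_r} \left( x_{r,i} - \mu_r \right)^2}
```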

To ensure that the judging process is fair, we rescale all the scores to match the judging population. In order to do this, we measure the mean and the standard deviation of all scores across all expert reviewers. Then, we change the mean score and the standard deviation of each expert reviewer to match.

We rescale the standard deviation like this:
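
One plausible way to write this step, assuming $\sigma_r$ is the standard deviation of expert reviewer $r$'s scores and $\sigma$ is the standard deviation of all scores across all expert reviewers (symbols chosen here for illustration), is to multiply each score $x_{r,i}$ by the ratio of the two:

```latex
x'_{r,i} = x_{r,i} \cdot \frac{\sigma}{\sigma_r}
```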

Then, we rescale the mean like this:
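
One plausible way to write this step is as a shift. Assume $x'_{r,i}$ is a score from expert reviewer $r$ after its spread has been rescaled, $\mu'_r$ is that reviewer's mean after the same rescaling, and $\mu$ is the mean of all scores across all expert reviewers (symbols chosen here for illustration):

```latex
x''_{r,i} = x'_{r,i} + \left( \mu - \mu'_r \right)
```

After this shift, every expert reviewer's scores have the same mean $\mu$ as the judging population as a whole.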

Basically, we find the difference between a single expert reviewer's mean and standard deviation and those of all the expert reviewers combined, then adjust each score so that no one is treated unfairly because of which expert reviewers they were assigned.

If we apply this rescaling process to the same two expert reviewers in the example above, we can see the outcome of the final resolved and normalized scores. They appear more similar, because they are now aligned with typical distributions across the total judging population.
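
As a rough illustration of this rescaling, here is a minimal Python sketch. The reviewer names and raw scores are invented for the example, and the choice of the population standard deviation is an assumption; it is not necessarily the exact calculation used in judging.

```python
from statistics import mean, pstdev

# Hypothetical raw scores, invented for illustration only.
raw_scores = {
    "reviewer_a": [4.0, 4.5, 5.0],  # a generous scorer
    "reviewer_b": [1.0, 1.5, 2.0],  # a critical scorer
}


def rescale(scores_by_reviewer):
    """Give every reviewer's scores the mean and spread of all scores combined."""
    all_scores = [s for scores in scores_by_reviewer.values() for s in scores]
    pop_mean = mean(all_scores)
    pop_std = pstdev(all_scores)  # assumption: population standard deviation

    rescaled = {}
    for reviewer, scores in scores_by_reviewer.items():
        r_mean = mean(scores)
        r_std = pstdev(scores)
        if r_std == 0:
            # Identical scores have no spread to rescale; place them at the overall mean.
            rescaled[reviewer] = [pop_mean for _ in scores]
            continue
        # Rescale the spread, then shift so the reviewer's mean matches the overall mean.
        rescaled[reviewer] = [
            pop_mean + (s - r_mean) * pop_std / r_std for s in scores
        ]
    return rescaled


normalized = rescale(raw_scores)
for reviewer, scores in normalized.items():
    print(reviewer, [round(s, 2) for s in scores])
```

With these invented inputs, both reviewers end up with the same set of rescaled scores, because each ranked their three applications with the same relative spacing and only the overall level differed.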

We are pleased to answer any questions you have about the scoring process. You can ask them on the discussion forums once you register and begin developing your application.
