posted: 2023-12-08

On Rating and Bias

I am not a perfect human with a perfect scoring system. I change my mind on a semi-regular basis, and I am constantly learning new things, experiencing new things, and deriving new conclusions as I go about life. I do not have a truly rigorous scoring system that I use to give a numerical score to games, and I find such systems somewhat reductive. A number-based rating system only communicates so much information. A lot of information tends to be "folded into the rating" and, as a result, some of the nuance behind a rating choice is lost. In my case, this is something I learned from my experience with [[Yume Nikki]], which I rate at about a 78. I really enjoy it as an art piece, but as a game it's not terribly complicated or interesting to play for a long period of time.

I rate games on a pseudo-continuous scale from 0 to 100. I could've gone from 0 to 10 and allowed decimals, but then I figured I could just multiply my scores by 10 to give the whole scale more meaning. A scale of 0 to 10 is too coarse. If I have many games rated 7/10, there's an illusion that I feel similarly about all of them, when in reality the scale just isn't fine enough to capture every nuance in the dataset. A scale from 0 to 1000 gets a little too fine. What game could I possibly justify rating 792/1000? Why 792 instead of just 790? See what I mean? It becomes an over-defined parameter, and in the process becomes even more arbitrary, in my opinion.
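To make the coarseness point concrete, here is a minimal sketch of what happens when 0 to 100 scores get squeezed onto a 0 to 10 scale. The titles and scores are made up for illustration, and the helper name is hypothetical.

```python
from collections import defaultdict

# Hypothetical 0-100 scores, invented purely for illustration.
scores = {"Game A": 66, "Game B": 70, "Game C": 74, "Game D": 88}


def bucket_by_ten_point_scale(scores_100):
    """Group titles by their score rounded to the nearest point out of 10."""
    buckets = defaultdict(list)
    for title, score in scores_100.items():
        buckets[round(score / 10)].append(title)
    return dict(buckets)


buckets = bucket_by_ten_point_scale(scores)
print(buckets)
```

Games A, B, and C all land on the same 7/10 even though their 0 to 100 scores are eight points apart, which is exactly the illusion of sameness described above.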

I'm not going to beat around the bush: my median rating is exactly 81.00 as of the time of writing (December 8, 2023). This means that about half of my game scores fall between 81 and 100. I suppose this leads into the question: "but how do you justify a rating?" The truth is I usually don't have a truly rigorous answer. Jacob Geller, in his video "The Best Ten Games of 2023", poses the question: "where do you start? Do you start playing a game with the assumption that it's perfect... and then subtract points for each of your criticisms? Or do you start at zero and award a game every time it does something that impresses or delights you?" The truth is, I don't have a starting point or a way to give out points. I suppose I try to remain fairly neutral going in and then attempt to quantify my positive, negative, or neutral feelings once I've reached some arbitrary threshold.
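For anyone unsure what a median of 81 implies, here is a rough sketch. The list of scores is made up (it is not my actual rating list), and the helper name is hypothetical.

```python
from statistics import median

# Hypothetical scores for illustration only; not the real rating list.
scores = [50, 61, 75, 75, 81, 82, 88, 88, 96]


def share_at_or_above(values, threshold):
    """Return the fraction of values greater than or equal to threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


med = median(scores)
share = share_at_or_above(scores, med)
print(f"median = {med}, share at or above median = {share:.2f}")
```

By definition, at least half of the scores sit at or above the median, so a median of 81 means the upper half of the list is packed into the 81 to 100 range.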

At some point, I started doing a more list-like comparison, where I would compare the current game against other games with the same rating while asking "does it compare?" There are therefore some anchor points: I know my five favorite games, and as such they take up the ratings 100, 99, 98, 97, and 96 respectively. Once we reach ratings of 95, however, we start to see some ties. This is how the system works: "Yes, I do think I like all of these games about the same, but they're not as well-liked as a game with a 96 rating". And how do I know that these are equivalent in some capacity? Strictly on vibes. I try my best to take into account some general sense of objectivity, but at the same time, what makes a game "Good"? This is a question I may return to at some point, but the fact is my favorite game of all time has a Metacritic Metascore of 80 and a user score of 8.6, so of course there are going to be areas where I disagree with the faceless masses. So some objectively near-"perfect" games will get scores in the high 80s, and that doesn't mean I don't think they're instant mega-classics. I just didn't like them as much as the games to which I gave higher scores.

The 0 to 100 scale is pretty easily understood by outside parties. It can be correlated to percentages and thus academic grades, for example. However, this analogy to letter grades, where a game with a score above 90 gets an "A" and a game below 60 gets an "F", is hardly accurate. As stated before, I have given scores in the mid-to-high 80s to games I had a really good time with that are objectively good games (see Okami and Super Metroid, both at 88). I've also given scores in the lower 60s to games that are still really well made but that I didn't enjoy as much (see Super Mario Sunshine, 61). If I finish a game in the first place, that's enough for me to give the game a score of 70-75 (I am doing my best to quantify "average positive experience" around these values), which, in the United States grading standard, is a "C". A "C" is a good and fine grade that will allow you to graduate, but will somewhat limit your opportunities beyond the academic institution on the basis of being "average". This soapbox rant about grades is better delivered elsewhere, but I would hardly consider games like New Super Mario Bros. or Pokemon Ultra Sun, both of which have a score of 75, to be games I didn't enjoy. They were pretty good, not my favorites by any stretch, but I wouldn't say I didn't enjoy them. It's when scores fall below 60 that it's possible to say I have a truly mixed-to-negative opinion of a game. Those are the games that I didn't quite finish but played enough to form an opinion of, the games that I played through out of perceived necessity or obligation, or the games that were clearly bad but that I didn't know any better about until reflecting upon them years later.

This adds another wrinkle to some of these ratings: many are done with the power of hindsight. I have often applied retroactive scores to games I have not played in a very long time, games like DK King of Swing, which I haven't played since Thanksgiving 2011. All I can go on are the memories I have in my head; there is a "Real Gap" in the fossil record here. I have no notes on my gameplay experiences with DK King of Swing, and so one might ask about the legitimacy or justification of its score of 50. It's difficult, but I can only make the fallacious argument that "if I liked it more, I would've come back to it as an adult". There is, of course, the seemingly forever-open window of opportunity to play DK King of Swing at some point down the road, but then again, why do that when I could play something that seems more interesting? It's a circular argument that doesn't go anywhere from an objective basis.

The other case to consider is that there are too many games to play in a person's life. As I previously noted, the fact that I finish a game in the first place is usually grounds enough to warrant a score of at least 70. I would love it if my distribution one day were such that a score of 90 sat more than one standard deviation above the mean rating, though I feel that in order to get there I would have to play games that I don't like, and let's be honest with ourselves, why play games you don't enjoy? This is a major source of bias in my distribution. I've been doing my best to include some of the more incidental experiences with games, one-time plays from the FPGA-tour channel in the Action Button Goblin Bunker, for example. There are definitely plenty of games I played once, before I started keeping track of everything, that would easily populate the lower levels of the entire chronology plot. But I've forgotten them over the course of time, with no notes or records ever kept. All I can do is my best to keep better track moving forward.
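The standard deviation wish is easy to check with a few lines. This is a minimal sketch, assuming made-up scores and a hypothetical helper name. It uses the population standard deviation, though the sample version (`statistics.stdev`) would work just as well.

```python
from statistics import mean, pstdev

# Hypothetical scores for illustration only; not the real rating list.
scores = [50, 61, 75, 75, 81, 82, 88, 88, 96]


def is_above_one_sd(values, score):
    """Return True if score lies more than one standard deviation above the mean."""
    return score > mean(values) + pstdev(values)


threshold = mean(scores) + pstdev(scores)
print(f"mean + 1 SD = {threshold:.2f}")
print("90 clears it:", is_above_one_sd(scores, 90))
```

With scores bunched in the 70s and 80s like these, the threshold lands a little above 90. More low scores would drag the mean down, but they would also widen the spread, so where the threshold ends up depends on the mix.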

The justifications for each game get more rigorous as the number of games I play increases, or at least that's what I tell myself. When I assign a number, I think about Stathead's "Approximate Value" for NFL players, their attempt to put a single number on the seasonal value of a player at any position from any year. Can I really give Sonic Mania and Rhythm Heaven Fever the same score (82), despite them being so radically different in terms of gameplay that it's almost like comparing the quality of an apple to the quality of a watermelon on the basis that they're both fruits? Yeah. Sure, I can. It's my list and I don't need to be completely objective. Just as the timing of when I played a game tells a story, so too does the score I give it. These scores are entirely biased and are not rigorous beyond a simple comparison based on "vibes", so take them with a grain of salt. I didn't enjoy Kingdom Hearts, but I know someone else did. Does that make my personal ranking any less valid? I should hope not. I can only hope the reasons behind the ranking it gets are "valid". Ultimately, a score is a reductive encapsulation of an experience, but at the same time, it's a quantitative way to explore how one experience generally compares to another. It is therefore up to me to explain my scores, to open the box and let out all the snakes, to shed a little more light on myself and my experiences.