The Problem with Beer Scores

Discussion in 'Article Comments' started by BeerAdvocate, Nov 14, 2017.

Thread Status:
Not open for further replies.
  1. HorseheadsHophead

    HorseheadsHophead Grand Pooh-Bah (3,732) Sep 15, 2014 Colorado
    Pooh-Bah Trader

    Personally, I've recently gotten so that I don't really like even giving a rating. Why should I give a beer I really like a 4.5 instead of a 4.75? Or should I give it a 5? Or--if I examine its flaws--should I give it a 4? I can't decide on these numbers. Numerical gradings are arbitrary. I just want to write reviews and describe what I like or what I don't like about a beer.
     
    #41 HorseheadsHophead, Nov 20, 2017
    Last edited: Nov 20, 2017
    mudbug, rgordon, Squire and 2 others like this.
  2. dbl_delta

    dbl_delta Grand Pooh-Bah (4,001) Sep 22, 2012 Pennsylvania
    Pooh-Bah Trader

    I like numerical ratings because they show me where a beer stands in MY ratings. Do I like this IPA better than the stout I had last week? If so, my rating should reflect that preference. Then I can sort MY beers by MY ratings and see a best-to-worst list. I couldn't do that if I only wrote reviews.
    That ability to sort also gives me some context for my numerical rating, so I can edit the review numbers to place it at the right spot in my list. (I generally don't have to do that, but occasionally I tweak a number or two to put a beer in its proper place.)
    Guess it depends on whether you're writing reviews for yourself or for others. I'm using my personal ratings as a reference for my preferences.
     
    bubseymour and HorseheadsHophead like this.
  3. bbtkd

    bbtkd Grand High Pooh-Bah (7,790) Sep 20, 2015 South Dakota
    BA4LYFE Society Pooh-Bah Trader

    The problem with beer scores is that humans are involved. At least I presume you guys are humans. You all aren't robots are you?

    This carbon unit (me) has a tendency to give the benefit of the doubt (handicap) when scoring. If I know a beer is highly acclaimed but I don't like it, I write it off to my inexperience and score it higher than I would otherwise. I don't generally like the bitterness of IPAs but I still try to review the more highly acclaimed examples, thinking maybe there is one I'll like. Some have been OKish, but most just aren't my thing - but why should I punish them for my dislike of a style I'm forcing myself to try? I very rarely give a score under 3.5 or over 4.25.

    The other night I even reviewed an IPA I knew I was allergic to. I love blackberries, but am allergic to them. My wife said I was stupid to consume something I knew I was allergic to, but I figured some itchiness would be worth it if the beer was good. It was pretty good.
     
    HorseheadsHophead likes this.
  4. invertalon

    invertalon Pooh-Bah (2,249) Jan 27, 2009 Ohio
    Pooh-Bah Trader

    Lately, I have stopped rating anything on Untappd, at least. I will still write detailed reviews on here, but it's not that common.

    The only time I will rate something anymore is if either (a) it is truly exceptional, a prime example of a particular style, which I rate towards, or (b) it is really poorly executed. Everything else I leave alone rating-wise. The reason is that, as a homebrewer, I get to try my beers on tap daily if I really want to. It’s crazy how different the same beer can taste day to day. The flavor changes based on what you have eaten/drunk or just what you are in the mood for. Having seen how much a beer changes day to day, from the same batch, served from the same keg, in the same glass (with no variance in temperature, serving style, etc…), it’s hard to be critical of any well made beer anymore. I may give something a 4 today that may be a 3 or 5 the next day. You just don’t know.

    Then, tastes change. This year, I have been loving Doppelbocks, for example, whereas in the past I may have rated them lower because I just wasn’t much of a fan at the time. Or a particular style of IPA, sour, porter, etc… My big focus right now is German lagers; I just can’t get enough of well made Helles, Dunkel, Festbier, Rauchbier, etc… Honestly, hand me a liter of a well made Dunkel or a can of fresh Julius, and I’d take the Dunkel. It’s far more difficult to find well made, clean lagers than hop slurries (which I love, don’t get me wrong)…

    I will still leave tasting notes these days, but have been leaving out the ratings. I buy what I have a taste for now, not what is rare/trending/expensive/etc… Been disappointed WAY too many times chasing beers, so I’m over all that.

    Ratings were great in the beginning so I could remember what beers I liked and didn’t like, especially style-wise as I was still learning. But now? Not so much, it’s more of an annoyance and takes away from just enjoying the beer!
     
    zid and TongoRad like this.
  5. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Or you could say that since there is no control for style bias that the two sets of numbers don't mean the same thing.
     
  6. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Since this is a known problem for rating scale systems which has been understood for years, there are ways to control for it and reduce or eliminate the size of the problem. It may never be as precise as some physical measurements but done properly it can be as accurate. (Also notice that in reading meters, etc. humans are involved both in settling upon what the meter assesses in the first place and in reading out the values on the meters. Lots of assumptions are made by humans in doing all this.)
     
  7. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    But what is going In, in this case, is not garbage; it just isn't processed in some of the ways it should be for the conclusions some folks want to draw from the Out.
     
    #47 drtth, Nov 21, 2017
    Last edited: Nov 21, 2017
    TongoRad likes this.
  8. TongoRad

    TongoRad Grand Pooh-Bah (3,884) Jun 3, 2004 New Jersey
    Society Pooh-Bah Trader

    It's not just style bias, though. I bet that the vast majority of reviews are being done by novices with a very limited perspective, and mostly being done as a means to keep tabs for their own use- so it's subjectivity within subjectivity.

    Like I said, you can maybe see some trends about what that crowd will enjoy but nothing too meaningful beyond that.
     
    drtth likes this.
  9. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    You can actually ignore most of those biases, since for any given style you'll have reviewers ranging from the conscientious to the casual, and with enough reviews those folks will balance each other out, giving us a pretty good number to work with on a single beer.

    As for the style bias the only thing to do there is not even try to compare numbers across styles and only think within styles. The numbers for RIS are simply on a different scale than the numbers for a Pils. Neither is "better" than the other. This bias could be compensated for in the number crunching, but since it is not...
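
One way the compensation mentioned above could be done in the number crunching, purely as an illustration: rescale each beer's average against the other beers of its own style, so an RIS is only measured against RIS and a Pils against Pils. The beer names and scores below are invented, and this is a sketch of a standard z-score, not a description of how this site computes anything.

```python
from statistics import mean, pstdev

# Hypothetical (beer, style, average score) rows; every value here is invented.
beers = [
    ("Stout A", "RIS", 4.6),
    ("Stout B", "RIS", 4.4),
    ("Stout C", "RIS", 4.2),
    ("Pils A", "Pils", 4.0),
    ("Pils B", "Pils", 3.8),
    ("Pils C", "Pils", 3.6),
]

def within_style_z(rows):
    """Return {beer: z-score of its average relative to its own style}."""
    # Group the raw averages by style.
    by_style = {}
    for _, style, score in rows:
        by_style.setdefault(style, []).append(score)
    # Mean and population standard deviation for each style.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_style.items()}
    result = {}
    for name, style, score in rows:
        mu, sigma = stats[style]
        # A style where every beer has the same score has no spread to scale by.
        result[name] = 0.0 if sigma == 0 else (score - mu) / sigma
    return result

z = within_style_z(beers)
for name, value in z.items():
    print(f"{name}: {value:+.2f}")
```

With these made-up numbers the top Pils and the top RIS land on the same standardized value even though their raw averages are 0.6 apart, which is the sense in which neither style is "better" than the other.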
     
    TongoRad likes this.
  10. TongoRad

    TongoRad Grand Pooh-Bah (3,884) Jun 3, 2004 New Jersey
    Society Pooh-Bah Trader

    Even if we stipulate that the rankings within style aren't half bad, that still reflects that a good proportion of the reviewers here are more experienced. Use that approach at Untappd and I bet the results will be different.

    Let's look at this another way. If you wanted some great Indian or Szechuan food for lunch would you trust the Yelp scores? Or would you dig deeper to see which reviewers knew their stuff? Personally, I've found credible and seriously good kathi rolls and mapo dofu in touristy areas, but only by digging through the text.

    At a certain level crowd sourcing works best as a conversation of sorts, and the focus on numbers tends to bypass that aspect too much.
     
    VABA likes this.
  11. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam


    Which is pretty much why both of us use the numbers only as signal posts, if at all, and focus on reading the reviews only of those whose palates we trust. :sunglasses:

    It's also why I personally ignore Untappd, etc. and look only at the info contained here where I know how to read the territory and understand the limitations and strengths of the numbers here.
     
    monkist likes this.
  12. monkist

    monkist Pooh-Bah (2,193) Dec 7, 2016 Hungary
    Pooh-Bah

    I agree, the numbers just aren't enough to describe a beer. As we, human beings (again, not robots) experience something when drinking beer, our description of it will always be made "under the influence" - but isn't it how it should be? I mean, I honestly appreciate the detailed numbering system at BA more than anything else (stamp rallies like Untappd will always be a joke for me) for being able to evaluate the dimensions of a beer from multiple angles and perspectives, but I just couldn't go without writing a few words about it.
    And then, somehow I happen not to be elaborating on those values but rambling on about something that inspires me in the beer - which might well be considered irrelevant by "professional" reviewers.
    But this is my way of commenting on a beer, and by this I would like to share my excitement about it and make it appealing for someone who is in a dilemma in front of the shelf, trying to choose the right beer to buy. I often find myself doing the same and getting my best advice from BA's valuable reviews.
     
    drtth likes this.
  13. monkist

    monkist Pooh-Bah (2,193) Dec 7, 2016 Hungary
    Pooh-Bah

    I found myself again checking a beer on BA and looking at the plain scores, as there were no comments left, trying to make sense of them without a written review. What made this person give it such a low score? - I wondered. Without any words written it wasn't possible to add value to the numbers.
    I believe we should leave some remarks there too, so as to connect the dots left by the points of evaluation, making it a full picture. If it was good, you could tell us why and vice versa. BA has the best detailed scoring system AND a place to elaborate on them - let alone the hardcore followers who, like nowhere else, are in for the greatness of the beer itself and not for some other interest.
     
  14. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    That’s why this site distinguishes between reviews with ratings and ratings only. As do you, many of us prefer to be able to look at the reviews as well as the numbers.
     
    monkist likes this.
  15. gkingus

    gkingus Aspirant (217) Jan 15, 2013 Rhode Island

    In an ideal beer rating world, all beer would be rated after having it on tap, before it gets pasteurized. That is where you experience the true/original flavor. I'm also not too keen on how most rating is done with all beer styles lumped together based on what people prefer across the board. In other words, you always see double IPAs and imperial stouts with the highest ratings because those are the most popular styles, e.g. you almost never see a lager with a rating above 4.1. Now I imagine some people do rate beer according to style, but the fact that some beers, like pale ales and lagers, get such low ratings is indicative of the majority of people not rating by style and just by what they like.
     
    monkist likes this.
  16. bubseymour

    bubseymour Grand Pooh-Bah (4,800) Oct 30, 2010 Maryland
    Pooh-Bah Trader

    BA is way better than Untappd for scores that more accurately reflect the quality of the beer (just IMO).

    One area of beer scores that is interesting as it relates to this point in time of our beer history is the following scenario (I'll remove beer and brewery names but I'm sure most of you can read between the lines):

    2010 - Beer X was considered best in style and let's say has a lofty 4.7 overall rating with 10,000 reviews, most reviews coming in the 2010-2013 timeframe and new ones trickling in at a much slower pace in recent years. In 2010-2013, very few brewers were doing anything comparable to what this beer/brewery was doing.

    In the more recent time frame of 2015-2017, many more brewers are making many beers with similar characteristics to this 2010 "best" groundbreaking beer, and in many people's honest opinion (forget hype), a lot of these new beers are equal to and/or better than the 2010 best beer with the lofty ratings.

    But because so many reviews were logged previously for the original best beer when there weren't any other brewers making beers to those specs and characteristics that people seem to love, the weighted avg. for the OG beer remains very high, yet there are now a lot more equal/superior beers that can't get ratings at that lofty level due to reviewers like us trying to compare/contrast so many outstanding beers against one another and creating a pecking order of our preferences. If that OG beer were released in 2016, the weighted avg. score would most likely be coming in at 4.3 and not at 4.7, as there are so many beers "splitting hairs" that are a little better in various rating categories and personal preferences. Forget hype train/social influence factors, just talking about the beers if done via blind tasting.

    Just something probably unique for our moment in time with regards to compiling historic rated averages.
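
To put rough numbers on the scenario above: the 10,000 reviews at 4.7 and the 4.3 "today" score come from the post, while the figure of 1,000 later reviews is purely an assumption for illustration. A minimal sketch of how little the long-run average moves:

```python
def running_average(old_count, old_avg, new_count, new_avg):
    """Combined mean of an older and a newer batch of reviews."""
    total = old_count * old_avg + new_count * new_avg
    return total / (old_count + new_count)

# 10,000 early reviews averaging 4.7 (from the post above);
# 1,000 later reviews averaging 4.3 (the count is an assumed figure).
combined = running_average(10_000, 4.7, 1_000, 4.3)
print(round(combined, 2))  # 4.66
```

Even if every newer reviewer scored the beer at 4.3, the displayed average would only slip from 4.7 to about 4.66, because the early reviews outnumber the later ones ten to one.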
     
  17. Squire

    Squire Grand Pooh-Bah (4,385) Jul 16, 2015 Mississippi
    BA4LYFE Society Pooh-Bah Trader

    Our opinions would be redundant.
     
    VABA likes this.
  18. monkist

    monkist Pooh-Bah (2,193) Dec 7, 2016 Hungary
    Pooh-Bah

    Let me pose then a slightly off-topic question:
    Are we rating beers for the beers per se or for ourselves?
    I would always love to believe that I contribute to something bigger with my ratings/reviews, doing them for the greater good but then again I come to think that I'm just inflating my ego...
     
  19. mudbug

    mudbug Pooh-Bah (1,762) Mar 27, 2009 Oregon
    Pooh-Bah

    As I see it there is no "problem" with the scores on this site. Where the problem rests is in the perception by some users that the scores reflect the larger world of beer drinkers. They don't. They reflect the opinions of a very small subset of beer drinkers that tend to only supply data on beers they enjoy. (Hello, the disparity of reviews across the board: for instance, Bud Light has only about 1400 reviews, while Heady Topper, from a very small brewery with very small distribution, has over 13,000.) Yet Bud sells magnitudes more beer than the Alchemist.
    This site uses the data supplied by that subset, therefore you end up with "Top Beers" lists that are greatly skewed by their opinions and geography.
     