Are beer ratings biased?

Discussion in 'Beer Talk' started by BeerPugz, Apr 21, 2017.

Thread Status:
Not open for further replies.
  1. CrimeDog

    CrimeDog Zealot (749) Dec 31, 2015 New York

    Great post/question....

    Earlier today, I bought a 4 pack of Putting Out Fires from Sand City which got me thinking about the same thing (I think)...

    "Fires" is an awesome session, so in my opinion it should be judged against other sessions - not big DIPAs etc...

    But at the same time I understand someone else's post about how much they enjoy the beer in front of them at that moment....damn...I'm rambling now...

    Have a great weekend everyone #LGM
     
  2. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Agree, "rating to style" on a site such as this one is basically flawed, if only because almost none of us have the training or experience to do it well and virtually nobody reviews blind. I'd say that's why both @TongoRad's comment and the instructions on this site about "keeping style in mind" when rating are so important.

    But I would suggest that there is indeed a standard for a Helles lager and a different standard for a Stout. But for each currently used/defined style there is not a single point on each dimension used in categorization that is the only target. The categorization dimensions are fuzzy dimensions, with each dimension having a range of values associated with or "typical" for each style. For example, there is a range of ABVs that are said to be typical for an APA, but there are examples of APAs that fit most or all of the other dimensions of categorization but that have an ABV that falls outside the normal range. Etc.
     
    TongoRad and zid like this.
  3. zid

    zid Grand Pooh-Bah (3,132) Feb 15, 2010 New York
    BA4LYFE Society Pooh-Bah Trader

    I mainly agree, but to a point :wink:. To take it a step further - if standards of beer styles are ranges rather than points, and one is rating to style, then any beer that falls within those ranges w/o flaws (flaws can even be subjective as well) would be a perfect "5." If the ranges are large, then there's little point in using that approach. Rebel IPA, Redhook Long Hammer, Harpoon IPA, Racer 5, and Two Hearted might all fit within the acceptable ranges for American IPAs (and if they don't, then perhaps there's an issue with the ranges rather than the beer), but should everyone rate them evenly?
     
    TongoRad likes this.
  4. rgordon

    rgordon Pooh-Bah (2,701) Apr 26, 2012 North Carolina
    Pooh-Bah

    If nothing else, various rating (systems) are an interesting yardstick for current trends. In the wine world, if, say, "Barnstormer" Zinfandel scored a 95 at the Cloverdale Fair in 1995, that 95 score redounds into eternity on bottle neckers, box graphics, and company propaganda. I take almost all ratings with a grain of salt. There are some voices that I do trust. Ratings are indeed biased.
     
    dennis3951 likes this.
  5. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Well, first let's abandon the idea of "rating to style," which is different from rating "with style in mind," and I think we can agree that the former will never be the norm on sites like this, if only because there is no way to force people to do blind ratings.

    Next let's factor in the fact that people themselves have different patterns of behavior when it comes to using rating scales, regardless of what is being rated. There are some folks who almost never use the extreme scores and there are others who seldom use the values in between. So a "perfect" 5 across all raters is simply not going to happen, and is an unrealistic expectation about human behavior.

    Sure, if the ranges are wide rather than narrow there's likely to be a lot more "slop" in the system but that doesn't really mean they are not useful or that there can not be multiple beers falling into the "equivalence category" created by the range and still display differences within that range. Further, that doesn't mean their scores/ratings, etc. are not useful.

    Rather it means people have to live with variability. Which is actually something we do regularly and quite well in many circumstances. To use a loose analogy, we often speak about people who are "redheads" and are well understood when we say that, but there is actually a wide range of shades that fall into that category. To allow anything else would put us in the position of having to have a different category name for every individual shade, and the location of a line would still be difficult to decide.

    So our choices boil down to either a few potentially imperfect categories with fuzzy boundaries and ranges that may be of different widths, or an almost totally unworkable multiplication of descriptors, each referring to a different category. In addition we'd still have a problem determining which criterion of difference-sameness to use. For the example of hair color, would it be human ratings, limited by the sensitivity of the human eye and differences in terminology, or would it be the unambiguous wavelength of light, which can be assessed very precisely but contains many subdivisions not detectable by the human eye?

    All of which is a long, overly detailed way of getting to "No, we should not expect every individual to rate exactly as does every other individual."
     
    #65 drtth, Apr 21, 2017
    Last edited: Apr 21, 2017
    SFACRKnight likes this.
  6. Junior

    Junior Pooh-Bah (1,883) May 23, 2015 Michigan
    Pooh-Bah Trader

    Not really bias, but I find it interesting that the average rating on this site is somewhere near 3.75. Shouldn't it be closer to 2.5 or 3?
     
    SFACRKnight likes this.
  7. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Only if there is a normal distribution of ratings centered around one of those numbers. (3 since there is no 0 score to be assigned.) But that requires assumptions and/or procedures not in place.
     
  8. Ninjakillzu

    Ninjakillzu Initiate (0) Oct 5, 2015 Washington

    Bias will always exist. My ratings are almost always skewed high, because if I like a beer's aroma/flavor, the rating for those will never go below 4.5. It's not often that I dislike a beer. My ratings are usually a combination of flavor and style correctness.
     
  9. SFACRKnight

    SFACRKnight Grand Pooh-Bah (3,348) Jan 20, 2012 Colorado
    Pooh-Bah Trader

    That's funny, I find rare beers never live up to the hype and I usually rate them lower. KBS was one of those beers. I had just built it up in my head and when I finally tried it I was let down. Same with Red Poppy. Same with a bunch of others too.
     
    cavedave likes this.
  10. 1ale_man

    1ale_man Initiate (0) Apr 25, 2015 Texas

    For what it's worth, I really "listen" to what I read in the ratings on this site! All of you have way more experience than I. These ratings may be somewhat skewed, but I still "listen". We all have our favorites, and no matter how hard we try, we will lean toward them! So please!!! Don't lead me too far astray! By the way. I don't rate porters and stouts! Not my favorite! Y'all keep on rating!!!!

    Cheers:slight_smile::grinning:
     
    cjgiant and zid like this.
  11. StoutElk_92

    StoutElk_92 Grand Pooh-Bah (4,045) Oct 30, 2015 Massachusetts
    Pooh-Bah

    The American IPA is nothing like the original IPAs. Why does a beer need to look traditional to be a 5? If it looks good it looks good, that's about it. What if the modern interpretation of a style looks better than the original traditional interpretation? Do we rate it lower because it's not how it is supposed to look?
     
    Squire and cavedave like this.
  12. andy712

    andy712 Initiate (0) Jul 23, 2016 Oregon

    Well, I guess that's what I am trying to get at. Is there any objective measure of appearance or is it entirely subjective? E.g., I like NE IPAs; this looks like a NE IPA; I therefore rate it a 5 in appearance. If we don't "rate it lower because it's not how it is supposed to look," what's the point of rating appearance? I suppose it's a matter of deciding what the criteria are. Maybe NE IPAs are an anomaly, or at least a substandard with its own criteria for appearance. In the end, it's a pretty minor component of what makes a good beer, I guess.
     
  13. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    It is, as you say, a matter of deciding what the criteria are. In the case of the NE IPA, does one use the long-standing tradition of clarity or allow murky appearance as a good thing? That really won't be resolved until the larger debate over whether NE IPA is a new style or not has been resolved. The statement of using different criteria should, however, be made by the reviewer if they choose to ignore the long-standing criteria of clarity and a fluffy white head that leaves lots of lacing. These did not come about arbitrarily, if only because a fail on either or both might be an indication of a brewing flaw.

    However, this appearance business is not such a minor matter that we should dismiss it out of hand, the way some folks do as being irrelevant to their enjoyment of the beer. (And which you are clearly not doing in this thread.)

    Appearance can have an effect. Not only can it indicate a brewing flaw, it can be shown to have an effect on people's expectations of flavors and their enjoyment, etc. This is one reason food production companies actually worry about such things as clarity in some beverages.

    http://www.foodbusinessnews.net/articles/news_home/Beverage_News/2014/01/The_value_of_beverage_clarity.aspx?ID={1DB752F4-D4F8-466E-B250-4FD4AEE47479}

    Until the debate is resolved about whether the NE IPA is "a new style with its own criteria or still part of an older style that violates one or more widely accepted criteria," my own approach is to be quite clear in my review about what my reasons are for assigning the numerical value that turns my subjective experience into an objective number. That way anyone who happens to read the review can make their own judgment about what I've done and why.
     
    #73 drtth, Apr 22, 2017
    Last edited: Apr 22, 2017
    cjgiant, papposilenus and utopiajane like this.
  14. TongoRad

    TongoRad Grand Pooh-Bah (3,884) Jun 3, 2004 New Jersey
    Society Pooh-Bah Trader

    I look at it more like multiple dimensions rather than larger ranges. Some beers are delicate, some are bold, some are inventive, some are classic, and on and on. All can be seen within the construct of keeping style in mind.

    So, although I don't think all of the beers you listed are on the same level, I do allow for multiple beers being on the super elite (not "perfect") 5 level.
     
    utopiajane likes this.
  15. drtth

    drtth Initiate (0) Nov 25, 2007 Pennsylvania
    In Memoriam

    Hmm, but if you allow for multiple dimensions rather than ranges, where is your either/or dividing line between delicate and bold? :slight_smile:

    I think you've clarified part of what I was hoping to say as well. There is both the problem of deciding which dimensions to use and the problem of how broad the dividing line (i.e., the range) is to be.
     
    TongoRad likes this.
  16. TongoRad

    TongoRad Grand Pooh-Bah (3,884) Jun 3, 2004 New Jersey
    Society Pooh-Bah Trader

    To really simplify it, the question is "what are they going for, and how well did they do?" In reality, though, the question isn't all that simple to answer, and it can take a few times with a new beer to wrap one's head around it.
     
    cavedave and drtth like this.
  17. dlcarst

    dlcarst Zealot (733) Aug 21, 2015 Missouri
    Trader

    Absolutely, although I've found myself becoming more balanced lately. I used to give 4.5+ ratings to double IPAs and coffee or barrel-aged stouts; now I've found myself giving (only!) a 4 for highly rated/hyped beers and 4.5 for a superb Irish stout or Vienna lager. However, I still can't seem to break out of rating almost every single beer I drink 3.5-4.5. Part of it is that I know how to avoid mediocre beer, but when I look at my ratings I often can't believe that I rated two beers so close to each other when I clearly enjoyed one much more.
     
    Squire and utopiajane like this.
  18. Giantspace

    Giantspace Grand Pooh-Bah (3,043) Dec 22, 2011 Pennsylvania
    Pooh-Bah

    I rate by how I like what I am drinking. If drinking a Hefeweizen I think of the Hefeweizens I have had before and compare. If it's a style I don't know about I will still rate to my taste, correct to style or not. I try to read reviews before using the scores to decide if I would buy.

    I use a 3.0 score for PBR as my base for all beers.

    Enjoy
     
    Squire likes this.
  19. StoutElk_92

    StoutElk_92 Grand Pooh-Bah (4,045) Oct 30, 2015 Massachusetts
    Pooh-Bah

    What I'm trying to get at is I think we should just rate based on how good it is to us, in this case looks. I've had hazy unfiltered IPAs that don't look nearly as good as, say, most Trillium or Tree House beers. If you like it you like it, never mind what it's supposed to be like. If it's good it's good. That's how I feel about rating.
     
  20. Beer_Economicus

    Beer_Economicus Pooh-Bah (2,698) Apr 8, 2017 Ohio
    Pooh-Bah Trader

    I have read through all these posts, and I feel like everyone is providing solid feedback. Here's my take (with some overlap to previous posts).

    There will inherently be bias for myriad reasons, most notably:
    1) Not everyone is starting at the same reference point. Some people come into a review with 50 beers, some with 250. This will dramatically alter your perspective.
    2) Everyone has different tastes. I do not mean that some people like IPAs and some don't, I mean literally that taste is different. Someone that says "I have rarely had a beer that I cannot drink" will likely have higher avg scores than someone who is significantly more selective.
    3) Regarding #2, if you only drink "highly regarded" offerings, your avg score will also be different, as you have more high end beers to compare to (even if still comparing to high end beers).
    4) People are likely not only comparing within "style", but also within class. I can see a lot of people comparing KBS largely to other "well regarded BA stouts" rather than just other "BA stouts." This will make a big difference.
    5) There is no guide for people to use for rating, and as a result most of the scale does not get used. One person may use 1 as "the worst beer ever", while someone else might say that their worst beer ever was still drinkable, and therefore a 3. Same silly logic you see in graduate school grades where you see clumping at 90%. Tells you very little about where everyone falls compared to the others in that group - instead it just says "competent."

    There are a few others, but these seem to carry most of the weight (for me).
     
    Phoodcritic, cjgiant and StoutElk_92 like this.