My recent column regarding the 100-point scoring system for summarizing wine evaluations prompted many reactions and responses, some of them quite thoughtful. This follow-up addresses a couple of the most interesting points, but if you didn’t read the original column, that might be the best place to start: Memo from the Dark Side: In Defense of the 100-Point Scale.
The following comments come from WRO reader Kent Benson, a Certified Wine Educator based in St. Cloud, Minnesota. They constitute roughly the first half of his email message to me, which I reprint unedited, with Kent’s permission:
I enjoyed your thought-provoking piece defending the 100-point scoring system. While I’m not a big fan of any scoring system, 100-point or otherwise, I think they serve the useful purpose of conveying the general quality level of a wine.
I think more reviewers should do a better job defining their use of the 100-point system. It’s funny that you should say that it is a misunderstanding to think of the 100-point system as a sum of things. For its originator, Robert Parker, that is precisely what it is. On his Web site he has a very detailed explanation of how he creates a score, with so many possible points allotted for each of several components. To me, this approach seems to require much more discipline and allows for greater consistency than simply assigning a score based upon one’s “overall admiration.” I’ve never seen anything akin to it in any other publication.
It seems to me, if you are going to assign a score and claim a great degree of precision, you should provide a detailed description of how such precision is achieved. Without such an explanation, how is the reader to know if a reviewer is emulating Parker’s system, has created a different system, or has no system at all, other than assigning a number based upon overall admiration?
In response to Kent’s point, I’d first say that it is perfectly reasonable in the abstract, and also that he’s right in part to credit Parker for being explicit about his scores. However, I would assert that, as a practical matter, virtually every reviewer (including Parker himself) assigns points precisely on the basis of overall admiration, and that virtually no reviewer mechanically “builds” a score based on a formula (e.g., 5 points for color; 10 points for aroma, etc.).
To be clear, it is a very different thing to write out a general formula indicating the rough significance accorded to certain attributes like color or aroma (which is what I believe Parker has done) than it is to score wines mechanically according to a predetermined categorical checklist (which I believe virtually nobody does).
I find it laudable that Parker has spelled out the general importance that he accords to the particular attributes of wines, but I would bet anything that he doesn’t score on a daily basis according to predetermined category values.
On one hand, writing out a general evaluative formula is laudable because it helps a reader follow the primary piece of practical advice emphasized in my column, namely, know thy reviewer. But on the other hand, I’ve never known anyone who reviews lots of wines for a major publication who works routinely with a categorical system attributing points to various aspects of wines in a mechanical manner.
Of course, I could be wrong about this, and I’m prepared to eat my words if there’s a reviewer out there who will state that she or he uses a checklist routinely and scores on the 100-point scale in this way. On three occasions, I’ve tasted more than 100 wines with Robert Parker, and he certainly didn’t seem to be using a checklist or building review scores categorically. To be clear, I don’t intend this as any sort of exposé of Parker, but rather as a clarification of how things really work in practice.
At least one major international wine competition, Concours Mondial, tries to get judges to build scores in this way, but every judge with whom I spoke about the practice at the competition assigned an overall score first, and then attributed points to the various categories. I don’t doubt that a few of the many judges followed the prescribed system, but I doubt that many did, and for good reason: The end result of numerical scoring is a single number, and one can do a better job of getting that number right by assigning it directly, based on the totality of a wine’s character, rather than by wrestling inside the straitjacket imposed by categories.
An example may help to clarify why reviewers resist the restraints imposed by predetermined categories. If our hypothetical formula attributed 15 points to a wine’s aroma, there’s a risk that Gewurztraminer (which is inherently pungent) would hold a continual advantage over Pinot Blanc, which is much more subtle. This isn’t an insurmountable problem in itself, since there’s no reason why categorical scoring would mandate that I give more aroma points to the Gewurz than to the Pinot Blanc; if I wanted to downgrade the Gewurz because it seemed overwhelming to me, or upgrade the Pinot Blanc because I admired its delicacy, I’d be free to do so. The more important point is this: In either case, it makes no sense to evaluate aroma in isolation from the wine’s other attributes.
If the Gewurztraminer backs up all of that flowery aroma with enough extract and fruit and acidity to counterbalance it, then the wine’s aroma won’t seem overwhelming but rather appealingly expressive, and it will deserve a relatively high score. But if I get a big aromatic blast of rose petals followed up by simple flavors and a short finish, the very same aromas will have thrown the wine out of balance and detracted from its overall appeal. Judging the aroma mechanically wouldn’t result in precision, but rather in distortion.
Similarly, if the Pinot Blanc’s subtle aromas are delicate but not mute, and are followed by symmetrically delicate flavors and acidic structure and minerality, then the wine’s aromas won’t seem inexpressive, but rather appropriately restrained. However, if I smell next to nothing from the glass but then get lots of fruit flavor and an assertive finish, the wine should be downgraded because the aromas are dumb, failing to announce the ensuing sensory signals in a way that contributes to the overall impression of a complete, harmonious wine. Again, a more accurate score on the 100-point scale results from evaluating the wine as a whole, rather than scoring aroma categorically–which is to say out of context.
* * *
Kent goes on to make another point that I’d like to address, one that was also raised by Rhett Gadke, Wine Director of Bounty Hunter Rare Wines in Napa, California. Since Rhett has been waiting patiently on the sidelines, I should get his remarks (reprinted with his permission) up first:
I read your recent article and was impressed by your reasoning and logic. While I would definitely include myself in the anti-100-point camp, I came away with a much better sense of why it works for the consumer. Frankly, if everybody used it, there would at least be consistency.
You thoughtfully addressed nearly all of my standard objections to the rating system with one exception: What about the wines that never (literally) get "perfect" scores because of their nature? It seems like low 90s for a rosé or Sauv Blanc is about the equivalent of "100" for a Cabernet. Does that mean there is no such thing as a "perfect" rosé or are certain styles considered too lowbrow — sort of the movie critic’s equivalent of an action flick — to ever merit that standing?
As you’ll see from the following, Kent Benson raises a very similar issue:
I have always disliked the fact that Parker reserves a portion of each score for the ability of a wine to improve with age. For years I wondered why Sauvignon Blancs never scored over 94 points. This method of scoring seems to presume that the best aged wines are inherently of higher quality than any wine in its youth. That seems awfully arbitrary to me.
Which brings up another aspect you touched upon. A score provides a relative quality level in comparison to other wines. It seems that virtually all reviewers assign their scores relative to all other wines, instead of relative to wines of the same grape(s). This seems to be an impossible and useless objective.
I couldn’t care less whether or not a reviewer thinks this Chardonnay is better than that Sauvignon Blanc. It would make a lot more sense to me to score all wines relative to their peers. In this way, a 100 point Pinot Grigio would be a possibility. By assigning such a score, the reviewer would be proclaiming it to be as good as Pinot Grigio gets. As it is, the best Pinot Grigio scores 94 points, leaving most consumers wondering why.
I have no difficulty understanding why Rhett and Kent raise these objections, though I believe that the 100-point scale only fulfills its function of providing a summary evaluation accurately if all wines are scored against one another. There are also some serious practical problems that would arise from applying the scale differently to different grapes. Finally, I think that much of the helpfulness of the scale–especially for novice wine consumers–would be lost if the scale were disaggregated and applied separately for every grape or wine type.
First, I grant that using the scale to score all wines against all other wines will tend to “tier” the results, with the very best Rieslings scoring roughly in the upper 90s, the best Sauvignon Blancs in the lower 90s, and the best bottlings of Sylvaner in the upper 80s. But there is good reason for this. By my sensory evaluation (and not only mine), the very best Rieslings from Alsace or Germany’s Mosel Valley are–by comparison to the best Sauvignon Blancs from the Loire or Marlborough–more complex in aroma and flavor, more intricate in structure, more expressive of the sites in which the grapes were planted, and more individuated by comparison to one another. They are also (sorry, Kent) better able to develop positive characteristics with bottle age, which wouldn’t necessarily be a huge advantage except that they are also delicious while young.
I would also contend that the same advantages could be cited for Sauvignon Blanc relative to Sylvaner. A really good Sylvaner from the likes of Trimbach in Alsace can be very tasty, but it simply isn’t in the same league as a top Sancerre or Pouilly-Fumé in terms of complexity, intricacy, site specificity, individuation, or capacity to improve.
For these reasons, it seems important to me that these distinctions in overall quality and “completeness” be reflected in summary point scores rather than being obscured by giving the very best Sauvignon Blancs and Sylvaners 100 points each, and grading down from there.
I’ll note a practical problem involved in doing that in a moment, but first I should acknowledge that it could still be the case that you, as an individual taster, simply like Sauvignon Blanc better than Riesling. If that is true, it might bug you that I give 97 points to a great Mosel Riesling but only 94 to a fantastic Sancerre, but that won’t prevent those scores from being useful to you. If I’ve done my job carefully and well, you’ll find that the 94-point Sancerre is notably more complex, intricate, etc., than another Sancerre that I score at 91 points. Moreover, nothing is preventing you from buying bottles of Sancerre that I score at 92 rather than Rieslings that I score at 95. It is your preferences that should determine your buying decisions; my points reflect my own preferences as faithfully as I can make them. And as I’ve argued repeatedly, the prime maxim for using point scores effectively is to know thy reviewer.
Second, there are two practical problems that are involved with scoring grape varieties individually with their own 100-point potentials. One is that this could be accomplished only by granting different grapes differing allotments of what–for lack of a better term–we might call “affirmative action” points. If I’m correct that there is a natural pecking order in the inherent nobility of Riesling, Sauvignon Blanc and Sylvaner, then it is also true that there could only be a 100-point rendition of the latter two by giving them a boost, and Sylvaner is going to need a bigger boost than Sauvignon Blanc. In my view, this whole “boosting” business will involve more arbitrariness than judging every grape on a single scale.
It would also be confusing as hell. How are you, as a reader, supposed to know how big a boost I give to Chasselas or Gamay Noir? And how about the nearly 2,000 distinct varieties in Italy alone?
I recognize that my point here depends on the correctness of my premise that “there is a natural pecking order in the inherent nobility of Riesling, Sauvignon Blanc and Sylvaner.” In fact, I’m quite confident that I’m right about that, and one of my sources of confidence is precisely the observation made by both Rhett and Kent: Professional reviewers almost never award scores of 95 or above for Pinot Grigio or Sauvignon Blanc. There’s almost certainly a reality of some sort other than mere “groupthink” underlying that sort of score clustering. After all, reviewers who are breaking into the ranks always need to make a name for themselves, and what better way to get noticed than to champion a purportedly undervalued grape? Yet nobody can score a Sylvaner at 97 points without becoming a laughingstock, for the simple reason that there’s no Sylvaner that can plausibly support that score.
(Parenthetically, I have other reasons for being confident that I’m right about there being a natural pecking order in the upper-end capacity of different varieties. I love both Riesling and Sauvignon Blanc, and some days [or with some foods], I’d much rather drink a Sauvignon. But nevertheless, doesn’t it mean something that nobody ever oaks Riesling, whereas Sauvignon Blanc is often fermented and/or aged in barrels? Likewise, doesn’t it seem to say something important that Riesling is virtually never blended with anything, whereas Sauvignon is often blended with Semillon or Muscadelle or something else?)
There’s another, equally daunting problem involved in setting up separate scales for different grape varieties. It stems from the fact that wine style and quality are not merely a matter of grape variety, but also of growing location and production technique. For this reason, a grape like Chardonnay shows an importantly different style when grown in a cool climate like Chablis in Burgundy than when grown in Australia’s Hunter Valley. If we start creating separate 100-point scales for different grapes, where will it end? Why not create separate scales for cool- and warm-climate Chardonnays, so that the best wine from the Hunter can earn a perfect score? And let’s not forget the factor of production technique: Should this lead us to set up separate scales for wooded and unwooded Chardonnays? Should we break off unwooded Chards from cool climates and score them separately from unwooded ones from warm regions?
And while we’re at it, what about all those grapes in Italy? Shall we add to them all the varieties that exist elsewhere, and multiply that figure by all the production techniques, and multiply that total by the number of all the growing locations around the world, and create separate scales for all of them?
You get the idea.
Of course, no one would suggest such an outcome, and I don’t mean to foist it upon anyone as though they had suggested it. I’m simply intent upon thinking the problem through to its logical conclusion to show that a single scale makes much more sense than a multiplicity of scales.
One last point and I’m done. Rhett and Kent are both wine professionals, and I’m sure that they’d be perfectly capable of interpreting my scores and using them regardless of whether I was employing a single 100-point scale for all wines, or separate scales for different types. But what about the novice buyer? How is the impoverished, inexperienced graduate student who is just falling in love with wine supposed to know how to spend her money if I go by “affirmative action” with separate scales, and give 93 points to an exceptionally good Beaujolais and the same 93 points to a very fine but not stellar Premier Cru Vosne-Romanée?
The first wine costs $15, but the second costs $60. A student with very little money is going to be strongly tempted to assume that only a chump would buy the Vosne-Romanée rather than four bottles of the Beaujolais. That is likely even if I try to explain that Pinot Noir from the Côte d’Or in Burgundy is inherently more noble than Gamay from Beaujolais, and that I’m using different scales to keep from imposing relatively low scores on Beaujolais. That would be hard to explain, but easy to overlook or forget. It would be better, in my view, to score that Beaujolais at 90 and the Vosne-Romanée at 93, indicating accurately that the latter is a more complex and complete wine, and letting our grad student decide whether the quality differential is worth the cost difference.
She might decide that she doesn’t have enough money to try the better wine. Or she might decide that it is worthwhile to splurge to find out for herself how much better the supposedly better wine really is. That must be her choice in any case, and I think I can inform her decision much more effectively by emphasizing the quality difference between the wines on a single scale rather than by blurring it with two different scales that don’t stand in a clear relation to one another.
* * *
Thanks to Kent and Rhett for their messages. If others among you would care to join the fray, please write to me at [email protected]