Judging Gymnastics Is Basically Like Being In A Creative Writing Workshop But With Blue Blazers

A Q&A with Hardy Fink: Part Two

In Monday’s newsletter, which was for subscribers only, I published the first half of my Q&A with Hardy Fink, a former Canadian gymnast and judge, and the man who helped create the open-ended scoring system that is now used to judge international elite gymnastics. Fink is one of the smartest and most knowledgeable people I’ve ever had the privilege of speaking to about the sport. To read what he said in Part One, please subscribe. Until the end of this year, the “beginner’s luck” price will be $5/month or $50/year.

A little preview of what I have planned for the end of this year going into the next—an interview with everyone’s favorite non-Olly Hogben commentator, Kathy Johnson Clarke, about the upcoming women’s college gymnastics season; a Q&A with Deanna Hong, the sport’s best videographer; a look at gymnastics in art because gymnastics is art and sport. And more! Thanks, as always, for supporting the weird things I write.

Also, ‘tis the season to give your favorite gymnerd a subscription to this newsletter. If you wish to gift one to the gymnastics obsessed person in your life, please click below.

Give a gift subscription


Back again with Hardy Fink. As I mentioned in the first newsletter, Hardy sent back such detailed and comprehensive answers that I decided to divide them into two posts. As with the first part in this series, today’s ventures deep into the gymnastics weeds. I will do my best along the way to interject with notes offering additional explanation. If you are still confused, please feel free to ask questions in the comments and I will do my best to answer them.

The first part of my interview with Hardy was a little bit of an origin story—when and how he first noticed the problems with the Perfect 10, what the process was like to try to shift to a new scoring paradigm, and all of the ways that the change fell short of what he hoped it would be. The International Gymnastics Federation (FIG) only jumped halfway across the stream and still hasn’t made it to the other bank. (It probably never will.)

Part Two isn’t quite as focused as Part One on any one particular topic. This newsletter, as you will see, hops around a bit. But the part of the conversation that interested me the most was Hardy’s parsing of the psychology of the judges and why they give the execution scores that they do. For him, it’s partly due to confidence, or the lack thereof. What he said reminded me of what it was like to be a graduate student in a creative writing workshop. You’re so insecure and so desperate to prove yourself as a writer and a critical thinker that sometimes you go overboard with the criticism.

For those of you who have never taken a creative writing workshop, it goes like this: every week a few people have to present their work for the class to critique. Usually, you send it out a few days before the workshop so people have time to read it, process, and have comments ready. During the workshop itself—at least the one I’m about to talk about—the writer has to read a small part of their work aloud and then stay silent as everyone in the class gets a chance to weigh in on what they thought of the piece. It is excruciating. On the days in grad school when it was my turn to be critiqued, I couldn’t eat a thing. 

In most of the workshops I took in college and grad school, the instructor usually waited until everyone had their say before weighing in. This way you spoke more freely and didn’t change your critique based on what the teacher said. In one workshop, after the entire group spent the better part of an hour tearing a very good essay to pieces over very minor faults, the instructor, when it was finally his turn to speak, said simply, “I like it. I think it’s really good.” He then went on to say that sometimes when a piece is 80 or 85 percent there, you just have to leave it alone because when you start tinkering in order to get it to perfect, you end up unraveling the whole thing. Also, he added, if we had read this piece published in a newspaper or magazine, we probably wouldn’t have seen the same faults. We would’ve decided whether we liked it or not, agreed with it or not, but we wouldn’t have been looking for ways to make it better. There’s something about a piece being in an unpublished state that makes you feel like it can be endlessly tinkered with even if it’s already really good.

It would’ve been hard for most of us to come right out and say, “This is great!” when it was our turn to comment. We lacked confidence in our critical abilities. (I know I did.) We saw it as our job to find fault, which it kind of was. But I think we also wanted to demonstrate to our peers that we were smart and perceptive. How smart would we appear if all we had to say was, “This is good”? Not very. 

Our teacher, however, had nothing to prove to a bunch of aspiring writers. He could say the essay was good if that’s what he believed it to be. (I just want to add that it is definitely possible that some people weren’t chewing out the writer out of insecurity but because they thought the piece was bad.) And helping someone become a better writer doesn’t just mean pointing out their mistakes; you also have to let them know what they did well so they don’t ruin the good parts in their attempt to fix the problems. 

While this is not perfectly analogous to gymnastics judging, I think there are definitely similarities (which Hardy will get into in greater detail in one of his answers below). It takes confidence and a belief in your own competency to say that the routine you were just judging had no mistakes or just very minor ones when you know that your fellow judges are probably not going to do the same. And when there are penalties if your score doesn’t hew closely to the scores that the other judges give. Besides, as a judge on the execution panel, it’s your job to find deductions. 

The last judge to say “no deductions” in world and Olympic competition was Canadian Chris Grabowecky. He did this after Xiao Qin, an exceptional Chinese gymnast, performed on pommel horse during qualifications at the 2003 world championships. When I interviewed him for my book, he said he was nervous when he submitted his “no deductions” because he knew that he would most likely be the only one who had done that. (He was right. No one else had.) But he did it anyway because he believed it to be correct. You see how confidence and belief in your own competency play a role in being able to toss out high scores.

Anyway, enough about my thoughts on this topic. Let’s get to what Hardy thinks about judges’ competence and confidence (and more). Hardy’s answers have been lightly edited for clarity.


Dvora Meyers: During worlds, I noticed how gymnasts were getting within a few tenths of Simone Biles on vault in terms of their execution scores, which is ludicrous because her vault execution is so far ahead of everyone else's, even when she has landing deductions. It seems like the E scores kind of clump together at times. What, in your opinion, contributes to the clumping of E scores? What has to be done in order to address that?

Hardy Fink: Echoes of McKayla Maroney at 2011 Worlds. Some things don't change. I have written often about what I call the lack of the 5 Cs for evaluation—lack of competence, lack of conscience, lack of confidence, lack of Code, and lack of control. Long story, but these five factors interact in interesting and complicated ways and their interaction is easily disrupted by changes in the control mechanisms in place.

[Ed. note: When I spoke with Hardy at the 2014 world championships in Nanning, China, he brought up Maroney’s 2011 vault in team finals as an example of a skill or routine that had been lowballed on the execution score.] 

The judges are currently under incredible duress to agree on some "expert score" or the average. The FIG has imposed a culture of punishment with a demonstrably non-valid judges' evaluation program and a readiness to give written warnings or worse based on that program. This, combined with the 1-3-5 deductions that can quickly separate the scores of two judges or separate their scores from the criteria score, forces judges to fear giving extreme deductions and also to fear giving high scores.

The combined result is that judges are not rewarded for judging; only for agreeing. In men, for example, a great routine will score 9.3 and with that score you will never be wrong; a good routine will get 8.7 and with that you are never wrong. It is those scores that provide the greatest safety for judges in the permitted deviations. The situation is similar for women but with even lower scores. In a real sense, judges no longer judge, because self-preservation has become more important. The frequent extreme corruption of decades ago has been replaced by extreme external coercion. 
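To make the "1-3-5" point concrete, here is a minimal sketch of the arithmetic. It is not anything the FIG publishes. It assumes the deduction sizes are 0.1, 0.3, and 0.5 and that an execution score starts from 10.0. The two judges and the errors they record are hypothetical. Scores are kept in tenths of a point so the subtraction stays exact.

```python
# Hypothetical illustration only: all numbers are in tenths of a point.
SMALL, MEDIUM, LARGE = 1, 3, 5  # the "1-3-5" deduction sizes (0.1, 0.3, 0.5)


def e_score_tenths(deductions):
    """Return an execution score in tenths, starting from 10.0 (100 tenths)."""
    return 100 - sum(deductions)


# Two made-up judges see the same three errors but size each one a step apart.
judge_a = [SMALL, SMALL, SMALL]
judge_b = [MEDIUM, MEDIUM, MEDIUM]

score_a = e_score_tenths(judge_a)
score_b = e_score_tenths(judge_b)
gap = score_a - score_b

print(score_a / 10, score_b / 10, gap / 10)  # prints: 9.7 9.1 0.6
```

Under these assumptions, one step of disagreement on each of three errors already puts the two judges 0.6 apart, which is the kind of separation Hardy says judges have learned to fear.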

It has to be a concern to everyone in the sport that the best women's routines ever done in the history of this universe are lucky to score an 8.6 or 8.7. Except for vault, there was not a single score of 9.0 in Stuttgart. How is that logical? How is that promoting our sport? How was this allowed to happen? Or more correctly, how was this forced to happen? It is insane. And when I hear that we have lost the "magic 10" because of the D/E [difficulty/execution] separation, then I have to ask "Why was it really lost?" Don't blame the D/E separation.

That aside, the occurrence of a central tendency of the scores has been known for decades; I wrote about this already in 1974. Among the reasons for this is that it is easier to find a mistake in a great routine and to miss some of the mistakes in a lesser routine. Moreover, it is more comfortable for a judge to avoid giving extreme scores.

And, perhaps, in the case of Simone, there is an undercurrent (conscious and subconscious) of not permitting someone to be so dominant for many unsavory reasons that I won't bother to enumerate. 

DM: One of the things that made headlines during worlds was the rating of Simone Biles' dismount. Do you think that the women's technical committee (WTC) rated it correctly? Why do you think the skill was evaluated the way it was?

[Ed. note: In gymnastics, each skill is given a letter value. As are the easiest, Bs a bit harder, and so on and so forth.]

HF: The problem is maybe not so much the value of the dismount as the WTC's failure to follow a consistent pattern of upgrading elements and of the differences between the same element on floor and beam. Many have asked me about this. Here is what my answer was to the FIG General Secretary who asked for my opinion after the FIG received protests:

Over history, the values of elements with an additional twist have been most frequently increased by one value. This dismount with one twist is a G (even in the harder piked position). 

The problem, however, from a certain perspective, is that beam dismounts could reasonably be expected to have one value higher than the same element on floor. The WTC did not follow a constant pattern for this. The full twisting [double back] on floor is an E; on beam it jumps two values to G. The double twisting [double somersault] on floor is an H which is three values higher than the full twist on floor and exactly the same value of H on beam. Personally, I think they went too high on floor by jumping two categories and then that makes the beam value look unfair.

A more reasonable progression would have been for full twist E on floor; F on beam; double twist F on floor (a triple twist is already being done by Simone Biles so where will that go beyond H?); and a double twist could then be G on beam. The WTC have blocked themselves in by not having many places to go except to run up the alphabet.

So again, here is the sequence on floor: Double back tuck or pike = D; double back tuck or pike with 1/1 = E; double back tuck 2/1 = H (why not F?)

And on beam: Double back tuck = D; double back pike = E; double back tuck or pike 1/1 = G (why not F?); double back tuck 2/1 = H

So, the 1/1 twists on beam and floor are two values apart; the double twist has the same value on floor and beam. Not very logical, but the main problem is not the beam value; it is the floor values (I think), and then that beam dismounts should be 0.1 higher (floor: D-E-F; beam: E-F-G; no Hs for this group of elements, so that a triple twist can be G on floor).
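The letter arithmetic is easier to follow when written out. What follows is only a sketch: it takes the values from Hardy's answer to the General Secretary (full twist E on floor and G on beam, double twist H on both) and his suggested progression (E and F on floor, F and G on beam). The function and table names are made up for illustration, and one letter step is treated as one value, or 0.1.

```python
# Illustrative sketch: letter values as ordinals, A=1, B=2, and so on.
def value(letter):
    """Convert a difficulty letter to its position in the alphabet."""
    return ord(letter) - ord("A") + 1


# (floor, beam) values as stated in Fink's answer above.
current = {"full twist": ("E", "G"), "double twist": ("H", "H")}
# (floor, beam) values in the progression Fink suggests instead.
proposed = {"full twist": ("E", "F"), "double twist": ("F", "G")}


def beam_minus_floor(table):
    """Return how many values higher the beam dismount is than the floor skill."""
    return {
        skill: value(beam) - value(floor)
        for skill, (floor, beam) in table.items()
    }


current_gaps = beam_minus_floor(current)
proposed_gaps = beam_minus_floor(proposed)

print(current_gaps)   # {'full twist': 2, 'double twist': 0}
print(proposed_gaps)  # {'full twist': 1, 'double twist': 1}
```

With the stated values, the beam premium is two values for the full twist and zero for the double twist. In the suggested progression it is a consistent one value for both.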

That being said, there have often been additional considerations for giving values—higher value because the next 1/1 twist increment is much more difficult; lower value to reduce danger, risk or overuse; lower value because the element is not desirable in the sense that the TC [technical committee] does not want to see it done frequently; lower value because everyone is doing it; higher value to promote the element; lower or higher value to balance things like potential D-score on one apparatus vs another, to promote or balance artistic element values over acro elements, and on and on.

Personally, I think the WTC has too quickly run up the alphabet for acro elements. For example, this element is an E for men; a triple salto is a G. By doing so, they have possibly overvalued acro elements and left themselves no place to go except I-parts, J-parts, K-parts (Biles with triple twist; maybe a triple salto in the future). I believe that the distribution of all gymnastics elements including future elements and the most basic "age-group style" elements should be able to fit within 10 difficulty categories (J-parts).

DM: There's precedent for undervaluing skills in order to discourage them for reasons of safety. It was clearly ridiculous for the WTC to cite that as the reason for the evaluation of Biles' dismount since she looked safer doing a double double off the beam than many gymnasts look when they're doing relatively easy double pikes. But the Produnova vault was recently downgraded after several attempts ended up in event finals despite very dangerous and scary falls. And back in 1995, Liu Xuan's one-armed giant was given a lowly C. Do you think it's appropriate to undervalue certain skills in the interest of safety? Under what conditions? How do the technical committees balance the need to reward athletes for their innovation and daring with the need to keep them safe?

HF: I think I addressed this question above in the second-to-last paragraph of my answer to the FIG General Secretary.

Personally, I think this changing of values, in some cases every cycle, should be unnecessary except in the case of extreme mistakes in first attributing a value.

Men used A-B-C long before women (medium-superior—remember that pre-1980?). So, if I can use men's examples from the '60s—these elements and many others have had a C-value for over 60 years—Russian giant, double back, double twist, Diamidov, back toss, felge, stutz kehr, inverted cross, cross pull out, planche on rings, giant swing on rings. Many other A and B parts have also never changed. When they started to run up the alphabet after 1984, they did not fix the values and kept adjusting. In my opinion it is possible and necessary to set permanent values. They did not take the trouble to take a step back for a full overview of where the sport is and where it will go. But some new values were, by necessity, also maintained—double straight, double back with 1/1, etc.

DM: I've been looking at the sports that have been added to the Games in recent years and the majority of them tend to be value judged sports. Why do you think the IOC is bringing more judged sports into the Olympics? 

HF: I can't pretend to speak for the IOC, but it is an open secret that Thomas Bach [president of the IOC] wants to incorporate more sports that are attractive to the young. The "objectively evaluated" speed, height, distance and target sports, and popular ball games/sports and similar sports (badminton, hockey) have, more or less, been exhausted. The sports popular for a young audience are those for which there are no 100% "objective" evaluation criteria. They all have a difficulty and WOW component and then must devise a complicated and often problematic way of evaluating the quality of the performance.


Between the first and second of the Hardy Fink newsletters, he announced that he is leaving his position as director of education and academy programs for FIG at the end of February 2020.