6

Several hours ago, I received an e-mail from Gengo's team stating that, from now on, translators will be evaluated not only with the scorecard but also with a "consistency score".

They explain that this "consistency score" is meant to evaluate how consistent a translator's quality is. For instance, if your last 3 GoCheck scores are "7.0", "7.0", "7.0", your average score is "7.0". Meanwhile, if your last 3 GoCheck scores are "10.0", "10.0", "1.0", your average score is still "7.0", but your "consistency score" will be significantly lower than in the previous case (at least, that is how I understand Gengo's explanation).
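
To make the example concrete, here is a minimal sketch comparing the two score histories. The e-mail as summarized in this post does not say how the spread is measured, so the population standard deviation is used here purely as an illustrative assumption.

```python
from statistics import mean, pstdev

# The two score histories from the example above.
steady = [7.0, 7.0, 7.0]
erratic = [10.0, 10.0, 1.0]

for name, scores in (("steady", steady), ("erratic", erratic)):
    # Assumption: spread is measured with the population standard deviation.
    print(f"{name}: mean={mean(scores):.2f}, spread={pstdev(scores):.2f}")

# steady: mean=7.00, spread=0.00
# erratic: mean=7.00, spread=4.24
```

Both histories share the same average, so only a spread-aware measure can tell them apart.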

Well, the concept is understandable. Any customer should be able to expect consistent quality of translation, of course.

But seriously, why did the Gengo team make such an important change without any prior notice? They say that this "consistency score" will be used to evaluate translators.
I have searched Twitter, blog posts, support pages, etc., but found no information whatsoever about this "consistency score".

Besides, it makes it very hard for translators to keep their status as translators.

Along with the "consistency score", a new "performance dashboard" was revealed. Surprisingly, the scores of (probably) every Gengo translator can now be checked from it. I have checked the scores of several translators.

To my surprise, many translators with above 8.0 (or sometimes 9.0) on their scorecard receive below 7.0 for their "consistency score". These translators received a low score (below 6.0) a few times, which makes their consistency score significantly lower than their "scorecard".

As far as I have heard, if your "consistency score" falls below 7.0, your status as a Gengo translator is at risk. In short, even when your scorecard shows above 8.0 or 9.0, if you make even one bad mistake in any of your jobs, you are in trouble from now on.

Anyway, I would like to ask Gengo's team for a specific explanation of this "consistency score": how it is calculated, exactly how it will affect my translator status, etc.
I am currently in contact with Gengo's support team about it, but I thought it would be a good idea to share this information with you all.

70 comments

  • -2
    Lara Fernandez

    @Nelson - Thank you so much for your detailed and elaborate feedback. You bring up very valid points, and I would like to get back to you once I have had the chance to share everybody's feedback with the team for further discussion and evaluation. I'm compiling everybody's feedback together in a document to share with the team at the beginning of the week, so if there's anything else that you'd like to bring up, please do, and I'll make sure that your voice is heard.

    One quick thing that I would like to inform you of, regarding point 1 of your first comment, is that a translator's consistency score is not available for customers to see. Your consistency score is between you and Gengo. If you head over to your translator profile, you'll see that, for the time being, we still display your quality score according to the old formula there. Eventually, we would like to replace this with something else that's easily understandable, such as a Top (percentage) scorer, or something along those lines.

  • 17
    Nelson Bras

    Dear Lara and fellow translators,

     

    Thank you for reading my post and taking it into consideration.

    As a matter of fact, I have just found a different post on the same topic elsewhere (https://support.gengo.com/hc/en-us/community/posts/360020796593-A-bit-of-feedback-on-translation-scores-and-MT?page=1#community_comment_360002687294), where the translator says:

    • I wholeheartedly subscribe to the fact that a translator with a string of 7s is more reliable and sounder from a business standpoint than one with several 10s and a few 3s thrown in the mix, as exemplified by your "how it works" scoring page. But to hammer it on: you then have to make really sure that the 3s really are worth 3s. And by that I'm not necessarily saying "it should be 10 or 3", but rather: "3 obviously implies mistakes, but do those mistakes really account for such a plummet in ratings?". If not, you run the risk of having otherwise pretty decent translators working with a sword of Damocles above their heads every time they submit a translation, this is both unfair and detrimental to the business and the overall quality of the work. It's a lose/lose/lose situation for Gengo, the customers, and the translators. I get that Gengo is a business, I get that running it implies handling both carrot and stick with your contractors, but there's simply too much stick here, or rather, to be more precise: handling of said stick is carried on too haphazardly.

     

    Let's focus on the "but do those mistakes really account for such a plummet in ratings?"

    Here's a real-life example of how unfair the "lifetime consistency" system can be:

     

    Check score: 2.34/10.00

     

    Source: Open your {1}store shipping settings.{/1}

    Translation: Abra a sua {1}store shipping settings.{/1}

     

    Senior translator comment: "store shipping settings" should have been translated.

     

    Damn! (Excuse my French). The senior translator is absolutely right! Anyone could see that something was missing, right?

    How is it possible that a seasoned translator like me could do something so blatantly stupid? (I am a Virgo, maybe this is why I am really severe when it comes to self-criticism and ratings below 10/10)

    The thing is… the senior translator and I… we were not playing on a level field. This is how the same segment is shown on the translator's platform:

     

    Source: Open your <a href="{STORE_URL}/wp-admin/admin.php?page=wc-settings&tab=shipping" target="_blank">store shipping settings.</a>

     

    Translation: Abra a sua <a href="{STORE_URL}/wp-admin/admin.php?page=wc-settings&tab=shipping" target="_blank">store shipping settings.</a>

     

    If I had been using the same platform as the senior translator, I am pretty sure that the mistake would have been absolutely clear to me as well…

    As a side note: considering the fact that Gengo is not a technically specialized platform, that most translators lack the knowledge to foresee the final result in cases like the one above, and that a lot of us wake up in the middle of the night, or put other tasks on hold, to help Gengo deliver a short translation and gain another happy customer for the future, I think it would be useful for everybody if the platform were adapted to simplify our work, by showing a clean text, exactly like the one we can see when looking at the GoCheck platform. The implementation of such a system is not that hard. If this happened to me, a seasoned translator who has already localized thousands of apps for every major player in the industry… think how easy it is for a less experienced translator to fall into the same trap.

    Nevertheless, I totally agree that I should be penalized for such a huge mistake. 2.34/10.00 is pretty fair if we consider it objectively. This is what I don't think is fair:

    In relative terms, considering the fact that the large majority of my GoChecks produced a 10/10 and that, within a 4-year period, I have only 2 cases where I received 4/5 feedback from clients instead of 5/5 (for no special reason)… is the situation above so important that I should bear its weight on my lifetime consistency score "till death do us part"? Does it really say anything about the overall quality of my work?

     

    Now, let's focus on the "you run the risk of having otherwise pretty decent translators working with a sword of Damocles above their heads every time they submit a translation."

     

    Well… I couldn't agree more.

    The objectivity of the GoCheck system is so restrictive nowadays that accepting certain risks for certain amounts is just… silly.

    Back in the day, being a seasoned translator, I was that kind of guy who never chose or declined a job… "let them come" was my motto. I was really fast due to my experience with large texts (not careless fast!) and my average scores were never far from 10/10. It was easy for me to put the Minamata Convention on hold for a couple of seconds to translate a simple 0.4 USD job and go back to what I was doing before.

    Nowadays, I always have to think twice before accepting a 200 USD job, just because a single mistake or a different opinion from the senior translator can have a gigantic impact on my ratings.

    Why am I not afraid to work for major companies, governments and a lot of different organizations… but always feel the presence of the "Damocles sword" when I look at Gengo's dashboard?

    1 - The number of words + type of error system is too restrictive. The GoCheck objectivity does not take into consideration that there are several degrees of low/medium/critical severity mistakes, from almost irrelevant to "me Tarzan, you Jane" speech. My suggestion: more severity levels.

    2 - A single huge mistake can affect your score for life (as explained above). My suggestion: abolish the "lifetime consistency" thing as a whole or, at least, consider a specific time frame and implement the word count variable as explained in point 5.

    3 - Things that should be considered "improvement suggestions" and, as such, not be marked as errors, are now reason enough for you to lose your qualifications. (What happened to the "Unlike many other translation agencies, Gengo does not promise to deliver a perfectly polished translation (see our Quality Policy)" and so on???) My suggestion: establish a clear distinction between improvement suggestions and severe mistakes. By the way, as a professional proofreader myself, I never flag an error without adding a proper reference, in order to help the translator understand the mistake, improve his/her work for the future and be certain that the flag is not there just on a "because I think so" basis.

    4 - Unlike other platforms, we have no chance to change anything after submitting a job by mistake. My suggestion: establish a 10-minute tolerance threshold.

    5 - The weight of the word count only refers to that "damn, bad luck" segment (again, excuse my French), not taking into consideration whether you have already translated 5,000 words since your last GoCheck or just 50. My suggestion: I really think that the weight of the error should be measured relative to the number of words translated since the last GoCheck, in order to best reflect your quality and consistency, as if you were working on a single larger document. Here's a useful consistency variable!

    6 - I still see (through the editing jobs) a lot of situations where the translator has no clue of what he/she is doing. Yes, I am talking about really, really, really bad translations. They are still there… so… the GoCheck system is not producing the most important expected results. My suggestion: the only way to detect and solve this kind of situation would be… by assessing the translator's consistency! However, I am not talking about his/her "lifetime consistency". What I am talking about is having a look at different segments within the same collection, in order to establish whether the error derives from a one-time distraction/bad luck moment… or if this is exactly the kind of guy who should be kicked out immediately for faking the admission test! And here's another useful consistency variable.

    7 - Etc. (please, check my second comment above).

     

    Maybe this is why I can see two 6-7 USD editing jobs hanging on my dashboard for… I think… 3 days!

    It doesn't make any sense to take the risks, as mentioned above, rewrite the whole thing (508 units) for 7 USD (6,17 EUR!!!) and allow the original translator to get away with a huge collection of mistakes + a nice looting.

     

    Thank you for your patience.

     

    Best regards,

     

    Nelson Brás

     

     

     

    Edited by Nelson Bras
  • 10
    Xavier

    So now I am a Top Scorer, Pro Translator, and a Gengo Wordsmith with 86 GoChecks and I can only access short low-paid jobs because of some bad scores I got 2 years ago... Way to go Gengo!

    Since there is no real reward apart from a T-shirt for long-term translators, and the clients I used to work for (Oodji, BuzzFeed) are now gone, I even considered starting from scratch and creating a new profile, but my language pair is already full of translators.

    And because I can only access short low paid jobs, I don't really do many, which means that it will take months before I get enough reviews to climb back up to a decent score. I raised that point before, when I got asked to evaluate some changes Gengo wanted to make on the dashboard (including allowing only translators with a score of 8 or above to access bigger jobs): once your score falls below that, since you get fewer jobs, it is harder to get reviewed and you might get stuck indefinitely.

     

    Like Nelson said, some of us used to accept pretty much any job before, and just do our best. Since the only bad thing that could happen was getting one low score, as long as you worked well the rest of the time, it was OK. Ultimately it is not good for Gengo, I agree, and now even one low score can make you lose your qualification, so I don't do it anymore. I actually decline a lot of jobs nowadays and only do the ones I am sure I can do right. But I am still getting punished for my past attitude and inexperience. Gengo changed the rules, but the retroactive weight of the low scores from before is unfair!

    Edited by Xavier
  • 1
    t0m_mcn

    Hello,

    I have had a similar experience in that after a year of having a 10/10 score for Portuguese to English translations, I have now had my qualifications revoked and cannot retake the test, because the PT>EN test is no longer being offered. This seems to be due to my score on a translation of a children's story. This was a job that needed a "second pass." Yet after it was approved, I never saw any feedback from the reviewer. I have no idea what they think I did wrong in my translation, and I have to say that I stand by my work, especially since it is a children's story and therefore more of a transcreation in many ways. I "translated culturally" and not literally, as we are supposed to according to Gengo. If the reviewer really does have a good reason for disapproving my work, then that's fine, but I'd like to hear their reasoning.

  • 10
    Nelson Bras

     

     

    Dear Lara,

    "Maybe this is why I can see two 6-7 USD editing jobs hanging on my dashboard for… I think… 3 days!

    It doesn't make any sense to take the risks, as mentioned above, rewrite the whole thing (508 units) for 7 USD (6,17 EUR!!!) and allow the original translator to get away with a huge collection of mistakes + a nice looting."

    Update: 5 days now! No smart translator will take these.

    Forget about lifetime consistency. If consistency is what you want, have a look at these tasks and try to assess the consistency of the far too many mistakes contained in a single job! Not just one segment, but the whole thing. If you only pick one segment, maybe that guy is lucky enough to be assessed based on a couple of segments with no issues... and receives a perfect 10/10!!!

    This is what Gengo should be worried about! This is what really hurts Gengo's image. This is what good translators here are competing against. This is the real issue!

    I believe that, by implementing my suggestions, the problem would be solved, once and for all.

    Meanwhile... several much better translators are being deprived of the opportunity to do much better work... just because the system has turned into a casino game.

    Best regards,

    NB

  • -2
    Lara Fernandez

    Hi all,

    Thank you very much for all your comments and feedback. The change in the scoring is a major overhaul of the system that we’ve been working on and testing for quite some time. We failed to properly communicate these important changes to you and for that we’d like to deeply apologize.  

    We’ve heard a lot of concerns about the sudden drop of your translator score, at times as much as several points. We know our translators pride themselves on consistently delivering only the best translations, and it feels like the new score fails to reflect your hard work. This is why we would like to really emphasize that not only has your score changed, but our understanding of what good scores are has also changed. We want you to be proud of showcasing your hard work, and this is why the translator profiles are still displaying your previous translator scores while we work on improving the way we publicly present your score.  

    Even though our customers value consistently high quality, the previous overall translator scores focused only on whether quality was good on average. The new score builds on top of that to add quality consistency into the calculation.

    The previous version of the scorecard was based on a weighted average of the last 10 GoCheck scores. This gives a general idea of overall quality but fails to take into account fluctuations or inconsistencies. The new version adds a buffer around this average of 10 GoChecks based on the standard deviation of past GoCheck scores. This new score (average of the last 10 GoChecks minus the standard deviation) represents the lower threshold of quality that a translator delivers consistently. In other words, most of the jobs you deliver should score above your new score.
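
A minimal sketch of the calculation described in the paragraph above may help. Several details are assumptions rather than anything stated in this thread: the function name `consistency_score` is hypothetical, the average is unweighted (the actual weights are not given), the population standard deviation is used, and the deviation is taken over the whole history.

```python
from statistics import mean, pstdev

def consistency_score(scores, window=10):
    """Mean of the last `window` scores minus the std dev of all scores.

    Assumptions (not confirmed in the thread): unweighted mean,
    population standard deviation, full history used for the deviation.
    """
    recent = scores[-window:]
    return mean(recent) - pstdev(scores)

# Nine perfect GoChecks and a single 1.0.
history = [10.0] * 9 + [1.0]
print(round(consistency_score(history), 2))  # 6.4
```

Under these assumptions, one very low score among nine perfect ones is enough to pull the result below 7, even though the plain average is 9.1.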

    The translator scorecard has previously displayed that a 7 was a good score. In reality, this was the bare minimum expectation for all translators and an overall score of below 7 could cause qualifications to be revoked. With an average calculation, a score of 7 means that half of the scores could be below 7 and a translator would still retain their qualifications. What is considered a good score is redefined with the new calculation, with a 7 being a great score: it means that most delivered jobs will be consistently above this score.

    Because the score consists of two parts, the average of your most recent 10 GoChecks and a fluctuation calculation, the timing of when you receive a low score will impact your overall score differently. For a translator who has fluctuated in the past but has improved, their low scores will only slightly affect their fluctuation calculation, whereas their recent jobs will equal a strong average. For a translator who is recently fluctuating, their score will be doubly affected, both in the fluctuation component and in the recent average. The score still leans towards emphasizing recent performance.
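
The effect of timing can be sketched the same way. As before, this is only an illustration under assumptions not confirmed in the thread: an unweighted mean of the last 10 scores, a population standard deviation over the full history, and a hypothetical function name.

```python
from statistics import mean, pstdev

def consistency_score(scores, window=10):
    # Assumed formula: recent average minus spread of the whole history.
    return mean(scores[-window:]) - pstdev(scores)

# Twenty GoChecks with a single 1.0, either long ago or just now.
old_slip = [1.0] + [10.0] * 19
recent_slip = [10.0] * 19 + [1.0]

print(round(consistency_score(old_slip), 2))     # 8.04
print(round(consistency_score(recent_slip), 2))  # 7.14
```

The old slip only affects the fluctuation term, while the recent slip lowers both the fluctuation term and the recent average.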

    If you are a long-standing translator consistently delivering good translations, that will be showcased in your overall score.

    We truly appreciate the thought and time everybody has given to providing feedback and suggestions. We plan to continue to improve our service and will be considering some of the excellent ideas we’ve seen here, like limiting the number of GoChecks that are considered for the fluctuation calculation based on number or a certain time period.

    Once again, we’d like to sincerely apologize for our failure to communicate adequately on this issue. We hope that the explanations above shed some light on how your new score is calculated and how it affects you. We would like for this to remain an open and constructive discussion, and are ready to welcome any further feedback.

    Thanks,

    Lara

  • 7
    Sara

    Lara, thank you for your kind and detailed response. I do, though, have a question about GoChecks. I have recently requested a re-review of 3 recent translations in the DE>EN(UK) pair. My score improved upon re-review in two of the three (haven't heard back yet about the third). In each case, the same reviewer was re-reviewing his or her own work. Is that typical? I thought another reviewer performed re-reviews instead of the original reviewer. Has there also been an overhaul of the reviewing teams behind the scenes? Does my pair only have one reviewer now?

    To the original reviewer's credit, he or she did remove some marked errors and my overall score has increased. But that makes me wonder how much better my score might be if I had requested a re-review of every questionable review since I started translating in 2016. As I said before, I do understand that the job of a reviewer is not easy. But as the errors they mark can lead to us having a poor consistency score and losing our qualifications, it is certainly important that translations only be marked for true errors and not just stylistic preferences. Is there a way that reviewers can give us feedback when they feel their suggestion is a better option, rather than marking our stylistic choice as an error?

  • -1
    Lara Fernandez

    @Sara - There are 2 LS/Reviewers in your language pair. For the case you mention, where the same LS/Reviewer re-reviewed their own work, this could have been because of the type of review you requested. The current re-review request form lists 4 different cases, and depending on your choice, the request will be sent either to the same LS (especially if you require further clarification of the errors marked) or to a different LS. Would you happen to remember which option you chose for your requests? If you do, I'd appreciate it if you could email me the job ID # and the options you chose at lara.fernandez@gengo.com, so that I can follow up and check whether they were handled correctly.

    As for your last question, stylistic choices that are otherwise correct should not be marked as errors. Whenever you feel like this has happened, I'd encourage you to fill out a re-review request. If you feel like this is happening too often, let's further discuss by email so that I can forward to our Quality Team.

    Edited by Lara Fernandez
  • 5
    tashioh

    Hi Lara and fellow translators,

    Lara, thank you for your clarification that "stylistic choices that are otherwise correct should not be marked as errors"! However, my stylistic choices have been counted as minor errors numerous times as well, and I discovered recently that I was not alone in the Japanese-speaking community. It was one of the main topics even before the implementation of this new consistency scoring system. Filling out a re-review request takes a chunk of my time that could otherwise have been spent translating, so I didn't bother doing it each time I found an unfair assessment, and now I regret it too. I wonder how many more I'll end up filling out now that we have a new punishment system. : (

     

  • 2
    Sara

    Lara and fellow translators: I misunderstood - it turns out, another reviewer was re-reviewing my work after all! But I still wish I had requested more re-reviews all along since I started at Gengo due to the current scoring system. But, oh well! I've sent Lara an email and I am glad that we all can have a dialogue here in the forums about the scoring situation.

  • 8
    Xavier

     

    Thank you, Lara, for your answer! I would like to add one more piece of feedback based on what you said and on the way Gengo works overall:

    "This is why we would like to really emphasize that not only has your score changed, but our understanding of what good scores are has also changed. [...] Even though our customers value consistently high quality, the previous overall translator scores focused only on whether quality was good on average. The new score builds on top of that to add quality consistency into the calculation."

    It is true that everything we translate will be used by someone who is paying for a service, and that we should always aim to deliver the best translation possible. It sounds normal then for Gengo to emphasize the need for better quality and the strictness of the reviews.

    But in my humble opinion, the way Gengo works overall has a flaw: those "reviews" are done after the job/collection has been sent to the client. The GoCheck system is a punitive way of checking how translators perform, and as many people have already stated, the LS are also human beings. No matter how professional and experienced they are, they can make mistakes too. And though recent changes sound like Gengo is trying to get rid of under-performing translators, I am not sure that the general level of quality will increase, since its low rates attract mostly inexperienced translators.

    Gengo was my first job as a translator (needless to say, I didn't always deliver top-quality work at the beginning), but I have since worked for different translation companies, and that has made me realize some of the things Gengo is doing really wrong. And the GoCheck for me is maybe the biggest one. Instead of that, I think Gengo should hire a lot more LS or editors, and proofread (I don't want to talk about reviews anymore) as many jobs as possible before sending them back to the clients. That way, the client would get a flawless translation, Gengo's reputation would improve, and the translators could get meaningful feedback from the editors that would help them work better in the future (instead of focusing on the grade like a student and ignoring all the feedback). I have gotten much better by working for a company that uses this system, because I received constant positive feedback that helped me grow. Here, every time I receive an email saying that my job has been reviewed, all I feel is stress and fear. That is not a good way to work ... and since this new system has been put in place, it's even worse!

    I know you are going to tell me that Gengo is "cheap" for clients, that it can't afford to pay so many editors, and that clients already have the option to get the job reviewed before receiving it (by another translator, not necessarily qualified to edit...), but I truly believe it is the only way Gengo can become a serious translation company and not just an OK translation platform.

     

  • -1
    Lara Fernandez

    @tashioh - I've been following that thread (I do read Japanese), thanks for bringing it up! I also saw that Chikara (Shimizu LS) has explained about "unnatural/awkward sounding translation" versus "stylistic choices", and the fact that these two are not really interchangeable. Of course, I am just paraphrasing and speaking in general, I haven't taken a look at any of your translations, and it's not my place to judge. In any case, please don't hesitate to fill out a re-review request in the future when you feel it's necessary.

    @Sara - Got your email, thanks! I'll try to reply today :)

    @Xavier - As always, thanks for the feedback, I've passed it along to the team!

    Edited by Lara Fernandez
  • 6
    AlexF

    Dear @Lara,

    Dear fellow translators,

    Dare I suggest that Gengo look at another measure of consistency than the standard deviation parameter?

    Standard deviation works well on a "normal" distribution of scores, a Gaussian curve where the average value is more or less at the center of a symmetric bell-shaped curve, with more or less as many values above as below (median close to average).

    In the case of our scores at Gengo, this bell-shaped curve is pushed against an upper limit of 10 (because we all try to reach it consistently, of course :-)), and the occasional "accident" has a much bigger impact on the standard deviation than an exceptionally good score, because there is not much room to the right of the median compared to the left. This means that compensating for an accident is particularly difficult (distances to the mean are squared before being added, but maybe that's going into too much detail).

    In the current case, with very asymmetric distribution patterns, it appears much more relevant to look at decile or quartile values.

    For example, the value above which 90% of the scores lie (the 1st decile).

    (note: Gengo's current formula, if it were a perfect normal distribution, is the value for which 84.1% of values would be above, so 90% is even more stringent...)

    In the case of a bad day (or of a contested GoCheck review) that would lower that 1st decile value, the translator would need 9 higher scores to get it out of the picture.

    It also means that if you have 50 or 100 GoCheck scores under your belt, the 5 or 10 lowest of those will be ignored and the 1st decile value will be your next worst score.
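
The decile idea can be sketched as follows. The helper `first_decile` is hypothetical: it simply ignores the lowest tenth of the scores (rounded down) and returns the next worst one, which is one reading of the proposal above, not a standard quantile estimator. The comparison formula (plain mean minus population standard deviation over the same scores) is likewise an assumption about Gengo's calculation.

```python
from statistics import NormalDist, mean, pstdev

def first_decile(scores):
    """Next worst score after ignoring the lowest 10% (rounded down)."""
    ordered = sorted(scores)
    return ordered[len(ordered) // 10]

# Share of a normal distribution lying above (mean - 1 standard deviation).
print(round(NormalDist().cdf(1.0), 3))  # 0.841

# Nine perfect scores and one "accident".
scores = [10.0] * 9 + [1.0]
print(first_decile(scores))                     # 10.0
print(round(mean(scores) - pstdev(scores), 2))  # 6.4
```

With these ten scores, the single accident is ignored by the decile measure but costs 3.6 points under the mean-minus-deviation formula.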

    (However, I also vote for a time limit on the scores that are taken into account, considering how my skills have improved over the last three years..)

     

    As I obviously don't have access to real-life translator datasets of Gengo scores, I would like to ask my fellow translators to try it out on their scores and give me their feedback. 

    (BTW, it would be nice to have access to all our scores in our Gengo account, instead of having to extract them all out from emails dating back a few years...)

     

    Just a suggestion but I would love to have any feedback on this, good or bad!

    (If you don't want to share your results, just vote my post up or down based on what it means for your consistency score. Does it work better for you? Does it have more meaning? Would it be encouraging? or the opposite?)

    Hope this helps

    Alex

     

     

  • 8
    masanpra

    If Gengo applied the consistency score to the reviewers, how many of them would keep their qualifications?

  • 2

    To all,

    This is the OP. It is quite interesting to see that so many translators are now writing in the community threads, mostly to state their opinion of the new evaluation system.

    As some of you might know, this "new evaluation system" actually started 2 months ago (and I made this thread somewhere around that time), or even before that (see the thread made by Xavier). When Gengo was trying to launch their new "Performance Dashboard" around the beginning of October, it already showed the consistency score of every single translator (because it made the scores of every translator open to everyone, it was closed shortly afterward).

    I exchanged several comments with Lara at that time. She blamed me for making an argument without enough facts. Now there are so many facts everywhere, and she seems to listen to everybody. Haha.

    Anyway, that is not the main point.

    Many translators are now suggesting limiting the number of scores reflected in their consistency score. It would certainly make the situation better for many of them. Nobody wishes to be haunted by their past deeds, of course.

    However, what would happen if someone, like me, received a very low score quite recently, within the last year, for instance?
    Since the introduction of the consistency score, my score has dropped from 8.2 to 6.2. I have been working at Gengo for two years, and unfortunately, I received quite a low score only 3 months ago.
    If Gengo listened to the translators' suggestions and limited the amount of data reflected in the consistency score, my score would certainly go even below 6.2, and according to the new explanation given on the support page, if a score goes below 5.0, that translator's qualification will automatically be revoked.

    Well, I would hate to see my qualification revoked without my having done anything. As I said, with the old scoring method my score is still 8.2, and even with the current system it is still above 6.2 (barely enough to maintain my qualification). If I am to lose my qualification through a further change in Gengo's qualification system, I seriously cannot stand it.

  • 4
    Xavier

     

    Hi 亮!

    I am personally in favor of the new system using only the scores since the date of the implementation (about 1-2 weeks ago, was it?) to calculate consistency because, as we discussed at great length, it is not fair that scores we got when the rules were different interfere with the current score that determines whether or not we get access to more jobs. Like I said before, there are many jobs that I wouldn't have taken if I had known that I wasn't only risking one low score, but a long-term effect on my potential income as well (just before Christmas by the way, thank you Gengo!)

    And in the future, for that consistency to use scores dating only a few months top, not all our history.

    What do you think guys?

     

  • 11
    Avatar
    Nelson Bras

    Dear all,

     

    Let me start by thanking you all for your inputs, your presence and your interest. It seems like it is paying off and our voice is being heard.

    Lara… what a difficult job you have! We do understand your position and thank you for your efforts.

    Let's get to the point:

    Changing the consistency system in order to reflect only a more recent time frame is not a solution per se.

    By doing that alone, the benefit would go only to translators who had lower scores at the beginning (not my case. I am actually benefiting from the current system, compared with other translators, because my scores at the beginning rarely went below 10). Not only that, but the benefit for the translators in question wouldn't last long, since the remaining factors are still there, lurking in the shadows, waiting for a distraction to eat you alive. I don't think it's a good idea to try to stop a huge bleeding with a small Minion's bandage bought at the supermarket.

    So… if my score is still 8.8 (great score, according to Gengo), what am I advocating here? A way to have my score pushed down??? Of course not! I am just looking at the whole picture (as I hope you all are), considering everything beyond my personal immediate interests, thinking about the translators who have been negatively impacted by the recent changes and imagining the possibility to be in the same position one day… because the system, as is, tends to grow more and more unfair by the day.

    By using a 1-10 scale, and by applying it to very small jobs, the weight of a single mistake (fairly or unfairly pointed out…) is too high! Just consider my example above. That, per se, is enough for any good translator to have a huge impact on his/her score, overlooking the fact that he/she, for instance, made a mistake of 6 words after perfectly translating 10000! When you look at a 2.34/10 score, you only get the immediate impression that the translator is one of those who fail to correctly translate 77% of the source text… period. You have no idea if you are talking about someone who has already translated almost a million words in Gengo alone… and you can't see that this individual was having 1 bad day out of 1440, trying to do his best at 4 AM, with a nasty conjunctivitis, just to be there for the clients when they need him (that's one way of gaining and keeping the preferred translator status), and earn enough to pay his rent… cent by cent, by accepting small jobs.

    Statistics are interesting but can also be very unrealistic if used alone. As every economist knows, if I have a rice bowl and you have none… both of us have an average of half a rice bowl… but you won't be able to eat yours!!!

    You just have to do the math to realize how this can produce a huge impact on every translator's work - e.g. in my case, the extrapolation would be: 978,000 words / 90 GoChecks - the system is assuming that 10,866 words have been reviewed per GoCheck (the reality is faaaaaar), so when I receive that one and only 2.36/10, the system treats it as if it was referring to 10,866 words, instead of half a dozen!
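
    The arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a sketch of the commenter's reasoning, not of anything Gengo actually computes: the function names are hypothetical, and the six-word segment is an assumption standing in for "half a dozen".

```python
def words_per_gocheck(total_words: int, gocheck_count: int) -> int:
    """Average number of translated words 'represented' by one GoCheck."""
    return total_words // gocheck_count


def overstatement_factor(total_words: int, gocheck_count: int,
                         reviewed_words: int) -> float:
    """How many times larger the implied sample is than the reviewed one."""
    return (total_words / gocheck_count) / reviewed_words


# Figures quoted in the comment: 978,000 words and 90 GoChecks.
print(words_per_gocheck(978_000, 90))               # 10866
# Assumed: the low-scoring review covered about 6 words.
print(round(overstatement_factor(978_000, 90, 6)))  # 1811
```

    Under these assumptions, a single six-word review carries as much weight in the history as roughly 1,800 times that many words.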

    On the other hand, you have this Joe who, most certainly, faked his admission test (e.g. by asking a friend to do it for him), who fails to correctly translate 90% of a large collection (not statistically, but in real terms) and who can easily deceive the system for some time when a single well-translated segment of 10 words that has been reviewed is extrapolated as if it was referring to 10,000 words! Add that to the fact that this large collection will be edited by a decent translator… and no one will ever notice!

    I believe a mix between my suggestions and Alex F's suggestions (great statistical work, by the way!) since they basically cover every input from everybody else (thank you all), would be the solution to our problem, not only for now but for years to come!

    If you approach the problem from the "consistency" point of view alone… it will never work. You need to reinforce both columns (consistency + GoCheck)… otherwise, the building is doomed to collapse!

    A lot more could be said… however… it's 6 AM, I haven't slept yet and I still need to finish a 23,000-word job… because this is what I do for a living. It's time for me to leave the management and stats issues to those who earn the big bucks.

    Best regards,

    Nelson Brás

  • 3
    Avatar
    AlexF

    Hello again,

    I would also like to add that all our statistics should be shown on our Dashboard.

    Had you left our rolling average untouched, and simply added a new indicator called consistency (explaining how it was calculated/used, and replacing the totally useless "hours translating" indicator for example), I doubt you would have created such an upheaval.

     

    A positive side-effect of all this upheaval is a pretty vibrant forum! I sincerely hope we can keep it up on some other community-oriented topics!

     

     

     

  • 10
    Avatar
    JY-LEE

    Hi everyone, I really don't like standing out, but I also really wanted to say something about the recent changes in Gengo's evaluation system. So please allow me to talk about it.

    As a student of economics, I'd like to point out that the rewards and penalties resulting from an incentive mechanism will make individuals act in a certain way. This certain way may not be the one which was intended in the first place, especially when the incentive mechanism is not designed properly.

    - With the recent changes in the evaluation system, shorter jobs impose a heavier risk on translators than before. Since the full text is very short in these jobs, one minor error can take up a large portion of the whole job and can do critical damage to the review score.
    This is more threatening to experienced translators, since they are now more vulnerable to lower review scores than newer translators. Eventually, experienced translators will avoid these shorter jobs (I have seen shorter jobs being declined many times after the introduction of this new evaluation method in my language pair. I also started to avoid accepting shorter jobs recently; $0.2 is not worth the risk) and relatively inexperienced translators will accept these jobs.

    The problem is that these shorter jobs can act as a threshold to some customers. These customers may order shorter jobs first to figure out what Gengo's capable of. They won't come back with another job if they think the translation quality of those shorter jobs is not satisfying. To sum up, experienced translators will now avoid taking shorter jobs, and if the quality of these jobs drops, it will make the customers unhappy and they may not use Gengo next time. This can cause serious damage to this translation platform in the long run.

    - Also, the new evaluation system can be punitive and harsh to translators. This can be justified only when the review system is fair and professional. But I'm not sure that this system is working well. The reasons are as follows:

    1. Conflict of interest - According to this post, LSs are allowed to work on the same jobs as regular translators. So, LSs are our competitor and reviewer at the same time. I have no doubt that the LSs are fair and conscientious people, but the system should not allow this awkward situation. Any LS can abuse his/her power to rule out competitors, and there is no way to prevent it. Since the evaluation scores have become more important than before, this matter should be taken into account seriously. I suggest that LS should be only dedicated to reviewing, not competing for the same jobs with regular translators.

    2. A matter of professionalism - I'm not questioning LSs' professionalism in the field of language and translation here; I do not have doubt about the reviewers' expertise and professionalism and I truly respect the LSs. This is more about their role as interpersonal service providers. LSs are expected to be fair and consistent, and they should communicate to the translators with a cooperative attitude, not in a high-handed manner, since their job is to assess the translators' performance precisely for Gengo, not to punish or discourage the translators.

    Before the new evaluation system was introduced, I requested some re-reviews, since some parts of my translation were marked as errors for reasons that I could not agree with. The requests were made in a courteous, non-aggressive way (I thanked the LS in advance, and politely asked the LS to teach me and provide some guidance if I was really wrong) and I presented sufficient explanations about my translation.

    But the score of the re-review was about 4 points lower than the first review, and it came with somewhat upset comments by the LS (I think the re-review was done by the same LS who did the first review, since there is no reason for the LS to be upset if it was done by another LS). The re-reviewer did accept some of my feedback, but he/she marked a lot of other parts of my translations as "wrong terms" (some of them were not marked as errors in the initial review, and I think it is a matter of writing style which does not affect the accuracy of the translation). I know that my translation isn't perfect and I do make errors sometimes. That was the reason I asked for a re-review: to know what I had done wrong and grow as a translator, in order to do better next time. But I guess the reviewer misunderstood my request for a re-review as a challenge to his/her authority. So I plucked up my courage and contacted the Gengo support team regarding that issue.

    But the response from the support was quite disappointing. I'll just copy and paste some part of it below;

    "The best thing is to just accept the initial review by the reviewer and keep the feedback in mind for future translations. Requesting a re-review because you disagree with the score doesn't mean you will get a higher score the second time. "

    "You should only use the re-review form if there is clearly a mistake in the language specialist's feedback. If it is a matter of style or preference, you may be better off just accepting the score and improving on your future translations. "

    "... it is best to accept the score given by the LS and to always keep the customer in mind."

    I have requested a re-review about 3 times in different jobs but never received a score higher than the one from the first review, because the LSs marked new errors that were not present in the first review. To me, some were acceptable but some were not. At some point, I gave up and stopped requesting re-reviews because I realized it is a waste of time and will not help me in any way.

    But under the new evaluation system, many translators will request re-reviews since the review scores have become more important. To handle these kinds of issues smoothly, the LSs should be more aware of their role. I think this cannot be done on a personal level. Gengo should provide guidelines to their LSs regarding consistency, fairness, how to communicate properly with translators, etc., since the purpose of this reviewing system is to encourage the translators to work hard, not to demand absolute obedience to their reviewers.

    - Gengo is a decent translation platform, and I want to keep working with Gengo. But I'm a little worried, since I don't think the recent changes will bring about their intended outcome: the prosperity of this translation platform. I hope Gengo is aware of the fact that a fair performance evaluation system is very important, and that reward and penalty according to this fair evaluation are critical to the motivation of us translators.

     

    Whew, it is always exhausting to write in a language other than my native language. I hope I have expressed myself clearly.

    Best regards,

    JY

    Edited by JY-LEE
  • 3
    Avatar
    Nelson Bras

    A senior translator considers that the word 'x' is more often used than the word 'y', and this is enough to be recorded as a mistake with a huge impact on your overall score...

    A senior translator considers a minor mistake as a major mistake just because the option is there...

    And another one bites the dust!

    According to Gengo, the reviewer has no access to the final score upon finishing his/her work... Well... maybe they should, in order to have a clear view on the result of their actions and... perhaps... go back and try to be less severe in certain instances.

    Just a thought.

    Have a nice week everyone!

     

  • 4
    Avatar
    Tony

    My mind is boggling at this drastic change. Really, it smacks of blanket decision-making by senior management, of someone in the upper echelons who's had enough of "just enough". I hope I'm wrong, but if not, so be it.

    The reason I am finding this change so unfathomable is, under the new changes, I am wondering whether there is a single Gengo translator with a 10 (or even 9+) score. Is a 10 score even realistically possible now??

    If, as a customer, I find out that there's this crowdsourced translation firm called Gengo with a pool of slightly above-average translators (based on their 6-ish, 7-ish, 8-ish average scores across all translators), I'll look for another. Thanks, but no thanks.

    Notice how your public profile is still showing your older, higher score. No doubt that'll be changing eventually, I think I read from someone, sometime back in the past.

    I'm all for high competency and quality at all times, something I continuously strive to aim for (I can't aim for a 10 any more), but this is quite a heavy burden for me to shoulder (I have to aim for just surviving), others too it seems. What sort of company am I working for? The type of company that's ok with sevens and eights?

    Peace.

  • 2
    Avatar
    Gyuri

    Hi all,
    I am going to keep my comment short.
    AlexF has described the evaluation issue in correct mathematical terms: the average minus the standard deviation is the wrong calculation. If we add to it that only the most recent 10 evaluations are considered, this makes it worse.
    My average score was 9.2 before the recalculation and now it is 6.9. I have translated 73,902 units up till now in recent years (I simply cannot find on my account page how many years).
    I would go for lifetime scoring, which really shows lifetime performance. It is not good for a beginner who wants to learn at Gengo how to make consistent translations.
    One vote for Nelson's comments; I would only be repeating most of his arguments.
    After reading the whole thread I don't remember who suggested it, but scores should be weighted by the size of the job. A bad score for a 6-word job should count differently from one for a large translation.
    P.S.: I got here only after submitting a ticket about the scoring and receiving a link to this discussion. Putting my five cents on the table has also required some effort from me...
    Kind regards to everyone out there
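
    The job-size weighting mentioned above could look something like the following minimal sketch. It is only an illustration of the suggestion, not a formula Gengo has announced; the function names and the sample scores and word counts are hypothetical.

```python
def plain_average(scores):
    """Unweighted mean: every review counts the same, whatever its size."""
    return sum(scores) / len(scores)


def word_weighted_average(reviews):
    """Mean of review scores weighted by the number of words reviewed.

    `reviews` is a list of (score, word_count) pairs.
    """
    total_words = sum(words for _, words in reviews)
    if total_words == 0:
        raise ValueError("no reviewed words to weight by")
    return sum(score * words for score, words in reviews) / total_words


# Hypothetical history: two large jobs scored 10 and one 6-word job scored 2.
reviews = [(10.0, 2000), (10.0, 1500), (2.0, 6)]
print(round(plain_average([score for score, _ in reviews]), 2))  # 7.33
print(round(word_weighted_average(reviews), 2))                  # 9.99
```

    With this weighting, the 6-word job barely moves the result, while the plain average treats it as one third of the translator's work.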

  • 6
    Avatar
    gunnarbu

    I have a suggestion for improvement of the system which would probably increase the score for many translators, while still maintaining the intention of measuring consistency as one of the important quality parameters. 

    As in many other contexts, e.g. the scoring of performance in sports, one could remove the single lowest score (or the lowest score for each whole year, or a similar model) from the track record of each translator, to smooth the curve. In this way, a top-scoring translator with a long, consistent track record of top scores, but one single very low score from a bad day, will not have this hanging over him or her forever, which is what happens now.

    I am in a similar situation myself, and I feel that it is very demotivating to know that this one single very low score will haunt me forever, making it very hard to improve my score significantly.

     

    Just a suggestion.

     

  • 12
    Avatar
    sesztak.zsolt

    Freelance translator speaking here with 11 years of experience.

    This scoring system is a major mistake and Gengo already pulled out, at least partially, and soon, silently, cowardly, they will pull back completely. But we know how corporate PR works: they will never acknowledge that they made a huge mistake, but there will be the "we are listening to feedback and we are working on the solution" bullshit. The truth is that this new system overpenalizes translators for a single bad score. Add to the mix that their proofreaders are also translators and many of them look at other translators as competitors and give unjust low scores in the hope they will have more work in the future by eliminating competitors. I have seen this malicious and unprofessional behaviour many times during my whole career and on several occasions I even started legal actions against such "proofreaders".

    But if Gengo treats translators as a commodity and does not care about destroying one, believing that they are infinitely replaceable, that's OK. There are many clients for me, Gengo is not the only one. If this company decides to shoot itself in the foot and trashes the translators who made it grow, that will be their problem, not mine.

    BTW all the major brands (eBay, Amazon, Tripadvisor, etc.) use simple average for evaluation/feedback/review score. No need to reinvent the wheel. End of story.

    This comment will be soon removed by Gengo staff, so if you are reading this, consider yourself lucky.

    Edited by sesztak.zsolt
  • 1
    Avatar
    gunnarbu

    Dear Lara,

     

    Can you enlighten us about the status? Will there be any changes to the consistency scoring system based on all the translator feedback, and if so what and when?

     

    Gunnar

  • 1
    Avatar
    Lara Fernandez

    @gunnarbu — There will be changes indeed. As I said, we’ve been listening to everybody’s feedback and experimenting with a variety of modifications to the current formula. Our Quality Team is currently in the process of narrowing down the details and getting them ready for implementation. However, I am not able to give you the details or a date quite yet. I’ll definitely update you all here as soon as I have all relevant information :)

  • 3
    Avatar
    Xavier

    That is good news! Let's hope that the changes will be real and not just for the show...

    Since the consistency score has been implemented and I can't access higher-paying jobs, I refuse to work for Gengo and earn my money on other sites. And considering that I have recently received 4 or 5 emails about collections/jobs that need to be done urgently and offering an extra incentive (as opposed to one a month before for S [customer name has been removed - Lara], which nobody likes), it looks like I am not the only one :)

    Not to mention the TM jobs which we are discussing on another thread as well, that pay even less than the already low rates Gengo offers, make us waste more time than anything, and carry a significant risk of getting a low score since we have to work both as a translator and an editor (which with the current system is too serious a risk to take)...

    I don't know if Gengo earned more money in 2018, but as for translator satisfaction, I personally went from an 8-8.5 to a 3!

  • 2
    Avatar
    Lara Fernandez

    Hi all,

    Thank you very much for your continued discussion, thoughtful feedback, and suggestions.

    We've followed the discussion closely and have been researching different modifications to the current formula in order to further fine-tune it. One concern we saw being brought up repeatedly is that, with the initial formula changes in November, you felt that one bad day could negatively affect a score that would otherwise be very consistent.

    Taking this into account, and after several iterations, we have modified the formula to remove the lowest GoCheck score in your history when calculating the overall translator score. This change will ensure that a single low score does not outweigh an otherwise consistently good performance, while also giving newer translators a fair chance to prove their skills when they are just starting out with Gengo.

    Please let us take this opportunity to remind you that your current quality score is calculated based on a weighted average of your last 10 GoChecks, from which the standard deviation is deducted in order to add a consistency factor. As a result, your score is not merely a reflection of your delivered quality, but a prediction of quality for future jobs based on the consistency of previous scores. For further details, please review this Support article.
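
    For readers who prefer to see the description above as arithmetic, here is a minimal sketch. It is not Gengo's actual implementation: the weights are not published in this thread, so equal weights are assumed by default; the use of the population standard deviation, and dropping the lowest score before taking the last 10, are also assumptions. The function name `quality_score` is hypothetical.

```python
from statistics import pstdev


def quality_score(history, window=10, weights=None):
    """Weighted mean of recent GoCheck scores minus their standard deviation.

    `history` lists GoCheck scores, oldest first. One occurrence of the
    lowest score is dropped first (assumed order of operations).
    """
    scores = list(history)
    if not scores:
        raise ValueError("no GoCheck scores")
    if len(scores) > 1:
        scores.remove(min(scores))      # drop the single lowest score
    recent = scores[-window:]           # keep at most the last `window`
    if weights is None:
        weights = [1.0] * len(recent)   # assumption: equal weights
    if len(weights) != len(recent):
        raise ValueError("need one weight per recent score")
    weighted_mean = sum(w * s for w, s in zip(weights, recent)) / sum(weights)
    return weighted_mean - pstdev(recent)


print(quality_score([7.0, 7.0, 7.0]))                   # 7.0
print(quality_score([10.0, 10.0, 1.0]))                 # 10.0
print(round(quality_score([10.0, 10.0, 1.0, 1.0]), 2))  # 2.76
```

    Under these assumptions a single bad score no longer lowers the result, but a second one still does, because only one low score is removed.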

    Keep up the great work!
    Lara

  • 0
    Avatar
    gunnarbu

    Hello Lara,

    Very good :-)

    Will the consistency score still be 'internal' as opposed to the 'public' score displayed on our public translator profiles?

    Gunnar

  • 0
    Avatar
    Lara Fernandez

    Hi @gunnarbu!

    I don't think we have any plans to make the consistency score public as is. As explained previously, in this thread or in a related thread, when we do adjust the Translator Profile to reflect the consistency score, it won't likely display the numbers that you can see. Instead, we're thinking about other ways to express your score, such as "Top XX% scorer", etc. Changes to this will still take a while and are undecided, though, so please don't quote me on that!

    Thanks,

    Lara
