I was going to attach these comments to my test transcription, but since we can only take this test once, and you might disqualify me for delivering anything but a bare-bones, unannotated transcription, I've decided to ask here instead. Partly also so that you might clarify your Guidelines on these points before transcription jobs really start coming in.


1. In the testing rules it says that omitting a phrase is a major "Accuracy" error and leads to immediate disqualification (and omitting single words is a minor error). And not following the "clean verbatim" style described in the Transcription Guidelines is a major "Formatting" error and leads to immediate disqualification. But the Transcription Guidelines state that we should omit unfinished "false start" phrases, repeated words, and signs of active listening. So which rule takes precedence here? (And, by the way, in the very colloquial audio sample for the test, it's not always obvious when the speakers just trailed off without finishing a thought--which we are supposed to keep in--and when they're just reformulating a sentence midway or repeating themselves because the other one interrupted them. And a lot of the time there's a "yeah" from the other party that's not an answer to a direct question, but which is also more than active listening. Rather, people just throw it in to agree with something the other said, often quite emphatically, so it seems wrong to edit it out. But even so, there are still MANY phrases and words in the test audio sample that should be omitted if you want a readable, flowing text. The example transcription linked in the Guidelines is no help for comparison, because that's a polished scene from a movie, not natural speech. Also, in the test case, several phrases genuinely cannot be heard because the speakers keep talking over each other all the time, or because the audience is too loud. I've tried for several hours, and also with a second version of that interview with a somewhat different audio quality. So it's hardly fair to disqualify would-be transcriptionists for not getting every single word.)


2. The Guidelines state that we should produce a flowing text with as little interruption as possible, and that we should not transcribe background noises if they don't affect the speakers. But the test case is a comedic interview filmed in front of a live audience precisely because the show producers want the viewer at home to know how other people reacted to the jokes. (Laughter is infectious.) Presumably they'd also want hearing-impaired viewers to know when the audience laughed or applauded, if this test case were a real job for the production of subtitles. So I feel like the audience reactions are part of the content in this special case, even if the speakers mostly ignore them, other than interrupting their speech flow due to the noise level. And the Guidelines state that applause and laughter may be mentioned when transcribing a speech at a conference or something like that. But on the other hand, there is so much audience noise in the test case that it certainly would help the readability of the transcription if I left out all the "Audience: [laughter]" lines. So what do you want us to do?


3. I couldn't tell from the Transcription Guidelines whether to edit out "like" when it's used as just a discourse marker or colloquial quotative. On the one hand, the former use is clearly filler, just like "umm". But then, so is starting a sentence with "Well," and you kept that in the example. So should only inarticulate filler sounds be edited out, but not real words? Besides, it's kind of necessary to transcribe all the "likes" if one wants to keep the colloquial style of the interview and show that the speaker is using a mix of British slang and the Californian Valleyspeak sociolect. Your Guidelines state that we're not supposed to alter a speaker's dialect, but does that include the Valleyspeak-typical peppering of the sentences with random "likes"?


4. We're supposed to edit out repeated words for the sake of clarity and flow. Okay, but what if it seems like the speaker intentionally repeated themselves for emphasis? And what about phrases like a rapid "no, no, no"? That's not stammering, that's a personal eccentricity, so I feel like it should stay in.


5. None of the examples for crosstalk or interrupted sentences in the Guidelines show how you want the formatting to look for the continuation on another line. Should every new line of dialogue start with a capitalized word, even if it's technically not the start of a new sentence? I feel like readability would be improved if we didn't capitalize it in this instance, since that would make it clearer that the same sentence is being continued and the speaker is just ignoring the interruption.

For example:

Speaker 1: When we got to the house, there was a man on the roof,-

Speaker 2: Oh, my God!

Speaker 1: looking right down at us, and I thought he was going to jump.


6. The Guidelines contain examples for what to do with unfinished sentences. But what about unfinished words? Keep as-is for the sake of an accurate reproduction of the speech pattern, or complete them in the transcription for the sake of readability? (And so the reviewer won't think I've made a grammar mistake.)


Speaker 1: What the Hell are you do- No, wait, I don't want to know.


7. If the names of the speakers aren't given in the audio sample, but they are easy enough to recognize or research from context, should we use them? Or stick to "Interviewer" and "Actress"?



    Lara Fernandez

    Hi Antje,

    My apologies for the late reply. I have checked with our Head of Quality to make sure I had the correct answers to this. However, please do note that some of the examples you bring up are really situations in which we'd much rather have our transcriptionists use their best judgement. Without further ado, let me get on to your questions :)

    1) For an "omission" to become an "accuracy" error, it would have to negatively affect the content of the transcription, meaning that relevant information that was actually mentioned would be missing from the text.

    2) This is the perfect example of when we'd need you to use your best judgement. In the comedic example you mention, while it may not be necessary to insert every laughter interruption if it's continuous, it might be relevant at certain points of the interview depending on the content. 

    3) As you mention, our Guidelines state not to alter the speaker's dialect, so if a certain repetition (e.g. "like") is part of this dialect and a characteristic of the person's speech, you should leave it as is.

    4) Again, if the speaker is intentionally repeating the words for emphasis, leave them in.

    5) Either option you mentioned is okay. You wouldn't have your test rejected based on this.

    6) Keep as is.

    7) Yes, if you can find the names of the speakers based on research, please do use them. Otherwise, you can stick to generic labels.

    Hope this helps!

