
There was a job posted in a PDF file format. It's impossible to change the text within the PDF itself and the images and format are all very complex. What's the best thing to do here?

a. Convert the PDF to a Word doc and risk messing up the format?

b. Write a Word doc with the equivalent text and highlight where in the PDF the text belongs?

c. Translate the PDF text within PDF commentaries right next to it?

I'm a new translator and have never come across this issue before. I would greatly appreciate any tips here, as I can't find anything specific enough in the style guides. Thanks!

21 comments

  • KevanSF

    This is an interesting question for which I would also appreciate any tips and/or clarification.

    I recently declined a job because the original document was, as you describe, a PDF document with different images and a lot of different, complex formatting styles.

    I have learned that I can't simply change existing text in a PDF without purchasing software I do not wish to buy. Free software lets me read PDFs but, as far as I know, there are no freeware solutions for editing them (please correct me if I am wrong). As you say, I could switch it all to a Word doc and translate it, but it would look nothing like the original. I didn't want to take the chance of the customer rejecting the translation because it didn't match the original format, even though the instructions did say that a doc file was an acceptable way of submitting the translation.

    Thanks to anyone who is able to share their experience in this matter!

     

  • Alexander

    We're supposed to translate the text. Everything else is the responsibility of the customer.

    The few times I had to deal with a PDF document, I just translated the text and added instructions on which text goes where (natalia.pastorpearce's option b). I never got any complaints.

    Example:

    [[[page 1 starts here]]]

    [[[text below top left image starts here]]]
    Xxx xxx xxxx xxx xxx xxxx
    [[[text below top left image ends here]]]

    [[[text below top right image starts here]]]
    Xxx xxx xxxx xxx xxx xxxx
    [[[text below top right image ends here]]]

    [[[left column starts here]]]
    Xxx xxx xxxx xxx xxx xxxx
    [[[left column ends here]]]

    [[[right column starts here]]]
    Xxx xxx xxxx xxx xxx xxxx
    [[[right column ends here]]]

    [[[page 1 ends here]]]


    [[[page 2 starts here]]]

    ....
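A marker layout like the one above could also be generated rather than typed by hand. The following is a minimal sketch, assuming the translated text is already collected per page as (location, text) pairs; the function name `render_with_markers` and the input structure are hypothetical and chosen only for illustration.

```python
# Sketch only: builds a plain-text translation with [[[...]]] placement
# markers in the style shown in the example above. The input structure
# (a list of pages, each a list of (location, text) pairs) is an assumption.

def render_with_markers(pages):
    lines = []
    for page_number, blocks in enumerate(pages, start=1):
        lines.append(f"[[[page {page_number} starts here]]]")
        lines.append("")
        for location, text in blocks:
            # Wrap each translated block in start/end markers naming its place.
            lines.append(f"[[[{location} starts here]]]")
            lines.append(text)
            lines.append(f"[[[{location} ends here]]]")
            lines.append("")
        lines.append(f"[[[page {page_number} ends here]]]")
        lines.append("")
    # Drop trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    document = [
        [("text below top left image", "Xxx xxx xxxx"),
         ("left column", "Xxx xxx xxxx")],
    ]
    print(render_with_markers(document))
```

Keeping the location labels as free text means they can describe whatever the page actually contains, which matters when no two PDFs are laid out alike.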

  • KevanSF

    Thank you for that info, Alexander!

    The next time such a translation job presents itself, I'll go ahead and do it that way!

  • mirko

    Happy New Year everyone :)

    Just wanted to say that I agree with Alexander. DTP definitely isn't part of our job.

    Also, here's the "official word" from Gengo about this:

    "PDF (.pdf)

    • If possible, use a PDF editor such as Adobe Acrobat to edit files. There are online tools such as PDFescape that allow you to do so.
    • If you don’t have a PDF editor, you can convert the file into another file format by opening the file with that program. Alternatively, you can copy and paste the text into a different file format, or use a file converter.

    • If you complete your translation in another file format, you can upload your translation in that file format—you don’t have to convert it back to the PDF format before uploading.
    • Use your best judgment in choosing the right format (uploading an Excel file for a presentation wouldn't make much sense). If in doubt, ask the customer which format they prefer." Source: http://support.gengo.com/entries/23716668-What-file-formats-will-I-be-working-in-

    However, in my opinion customers should be expressly asked to upload simple translatable/editable source files (not PDFs, nor complex presentations, for instance). That would make things easier for translators, ensure the customer is actually able to link source sentences to their translations (as opposed to having a potentially complex source PDF/presentation page translated as plain text and without formatting), and also avoid possible misunderstandings and doubts on both sides concerning the final output.

  • Megan Waters

    Hi all,

    We all hate file jobs and know the headaches they cause our translators :( Our engineering team is working to build an integration whereby the translator never actually sees a file, but the text is removed from the file and you will work on it as a normal text job in the workbench. This is planned for this year, so I will keep you updated once I hear anything more. In the meantime, our sales teams are working to convince customers, where they can, not to upload file jobs of any type, so you should be seeing fewer of these in the near future.

  • kvstegemann

    Good morning to all. When you say "we all hate file jobs" I beg to differ a little. Obviously some of us do, that's for sure. Yesterday I picked up a job that had previously been rejected by two fellow translators. It was a PowerPoint presentation filled with a lot of graphics and rather few text elements, so the word count was rather low. For the translation I not only had to find all the text elements (distinguishing them from graphic elements that also had text in them, which was obviously untranslatable), I also had to resize some of the text elements a little in order to accommodate the translated text (text in my native German is usually longer than the corresponding text in English). I notified the customer of these changes, of course.

    Now, I did not really mind this kind of work. I am comfortable with most file formats, and having the full file gives me a better feeling for the context than just working with text fragments. So I do like file jobs. But it should be obvious that in such a case there is more work involved than just translating a number of words, and this effort should be marketed and priced accordingly. So why not simply charge a little extra for file jobs and give the translator a little extra for it? Then nobody would hate file jobs anymore. The customers should be able to see the difference it makes for them as well and appreciate it accordingly.

    Or, if you want to keep it even simpler: Do not accept file jobs on standard level any more. If all file jobs were automatically elevated to pro level, the price would probably cover the effort (if you combine this with the minimum fee I suggested in the other posting). This should be easy to communicate to the customer since anyone can see that file jobs do not integrate so well with any automation process.

  • Alexander

    Hi kvstegemann,

    Combining the suggestions you made, I would say the price should depend on the time needed to do the job, which is not simply proportional to the number of words. About a year ago, we had a heated debate about just how much time is needed for a job (https://support.gengo.com/entries/59519754-New-allotted-times-data-gathering), and everyone seems to be happy with the outcome, so we may assume the current table/formula for time allocation is quite realistic.

    So why not make the price proportional to the allotted time, and perhaps refine the time allocation for extra requirements of the customer like file type?

  • mirko

    @Megan - Just like kvstegemann, I don't "hate" file based jobs either... On the contrary, I actually prefer to work on files rather than on the "workbench" (since I am not tied to Gengo's web interface and its glitches/restrictions and I can use my CAT tool of choice). The point here is just that of having a "simple" translatable file (txt, rtf, docx, doc, odt, xls, xlsx, ods, etc.) versus graphic based files (pdf, ppt, pptx, etc.) and the extra (and unpaid) work involved in handling them. If you just removed the possibility for the customer to upload those files, and clearly told them they can only request the translation of text (and not expect transcription, DTP, "reconstruction", etc.), then problem solved in 2 minutes, easy as pie.

    I also see a problem with this: "Our engineering team is working to build an integration whereby the translator never actually sees a file, but the text is removed from the file and you will work on it as a normal text job in the workbench". Files may very well contain comments and additional elements (such as pictures) as reference (spreadsheet files are perfect for this), so, if we can't see the original file, that could be an additional problem...

    At any rate, I see Gengo is still dead set on forcing us to work exclusively on the website and not "offline"...  (as you had already mentioned here: https://support.gengo.com/entries/61191144-Export-contents-of-group-jobs-to-files ). I frankly don't understand your stance and all of the (often unexplained) changes you're making to the system... That also seems to be a sad departure from the approach Gengo adopted in the past, which was based on discussion, collaboration (even "confrontation") with translators and on listening to their voice, ideas and reasons. That's a real pity...

  • ikoeriha

    @Megan: I don't "hate" file-based jobs either. And it seems the Gengo project team has often used document files when ordering jobs for Preferred Translators. Maybe you should ask them why they order file-based jobs.

  • kvstegemann

    @Alexander: Thank you for pointing me to that thread. I'm still new to Gengo, so I did not witness what went on at that time. When I found and joined Gengo I thought how great it would have been if I had done this years before, but on the other hand I did not have to go through the stuff you seemingly had to :) Today the interface seems to be fine, I have no problems working with it, and the deadlines are short compared to other jobs, but manageable.

    Since I am not familiar with the process of allotting the time to a job, I could not say if your idea is feasible. However, it seems to me that Gengo is focusing on streamlining the whole translation process, so I don't think they would like to put more effort into evaluating the complexity of a job. The risk of complexity will in the end always be on the side of the translator, and this will probably be a permanent source of some conflict. A minimum fee and some extra reward for file jobs would be easy to implement without any management overhead; it's just a marketing thing. And while we are doing the translation work, the marketing work is up to Gengo.

  • Alexander

    @kvstegemann - Perhaps I should clarify my previous post. By time allocation I mean setting a deadline, which is done by Gengo. E.g., a job with fewer than 67 words should be completed within 1 hour, and a 100-word job should be completed within 1 hour and 10 minutes. For 200 words, the allotted time is 1 hour and 40 minutes.

    My final line "why not make the price proportional to the allotted time" was addressed not to you but to Gengo. I think if Gengo made this change (don't hold your breath), that would come close to your suggestions. In particular, all tiny jobs would be priced at the current rate for 67 words, which is $2 at the Standard level and $5.36 at Pro, not much different from the $3/$5 you suggested in https://support.gengo.com/entries/98591138-Minimum-fee-.
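To make the arithmetic concrete: the three deadlines quoted above (under 67 words: 1 hour; 100 words: 1 hour 10 minutes; 200 words: 1 hour 40 minutes) happen to fit a straight line of 40 minutes plus 0.3 minutes per word, with a one-hour floor. The sketch below assumes that fit, which is inferred from these three data points only and is not a published Gengo formula. The `price_per_hour` parameter is likewise hypothetical, since the thread does not say how a time-based price would be scaled for larger jobs.

```python
# Sketch only: the linear fit is an assumption inferred from the three
# deadlines quoted in the thread, not an official Gengo formula.

FLOOR_MINUTES = 60   # jobs under 67 words get one hour
BASE_MINUTES = 40    # intercept of the line through (100, 70) and (200, 100)


def allotted_minutes(words: int) -> float:
    """Allotted time in minutes: 40 + 0.3 * words, never below one hour."""
    return max(FLOOR_MINUTES, BASE_MINUTES + (3 * words) / 10)


def price_by_time(words: int, price_per_hour: float) -> float:
    """Price proportional to allotted time (price_per_hour is hypothetical)."""
    return round(price_per_hour * allotted_minutes(words) / 60, 2)


if __name__ == "__main__":
    for words in (10, 66, 100, 200):
        print(words, allotted_minutes(words), price_by_time(words, 2.00))
```

With `price_per_hour` set to 2.00, every job below the 67-word threshold comes out at the same $2, which is the flat minimum described in the post.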

    To my taste, tiny jobs pay way too little, so most often I set RumpelstiltskinRSS to only alert me for jobs with at least 100 words. (Yet depending on my mood I sometimes pick smaller jobs as well. For me, translating is a hobby which happens to bring in some money, not my main source of income.)

    I joined Gengo in the course of 2014. From what I read since then on the forums, I understand there have been many debates in the past about suggested improvements, and most of them are still going on today. Gengo seems to be slow at implementing those improvements, even if they acknowledge the value. That's not to say they don't listen to the translators at all. In the past 1.5 years, I have definitely seen progress in several respects.

  • kvstegemann

    @Alexander: Ah, I see. That makes sense indeed and would not increase administrative overhead. You have my vote :)

    Many tiny jobs take a disproportionate amount of effort, but not all. I do pick them up when they appear. But of course that could not work if I had to make a living by translating alone. I cannot imagine that any translator earns enough at Gengo to live on, at least not in a first world country. Correct me if I'm wrong, anyone. To make a living from this, not only the rates would have to be sustainable but also the flow of incoming jobs. But it seems to me that the flow of jobs is unpredictable, and with the rat race and grab-what-you-get principle it is far too unreliable to make a living from. I would like it very much if it were.

  • Masami

    Dear Megan, I kindly ask that you do not get rid of file-based jobs just like that. I certainly don't hate them, and quite often they are easier to work with as they provide a lot more context. Yes, of course I avoid PDF files because they are a pain to work with as they are not editable, but anything else like doc, xls or ppt files is absolutely fine. I would say that translating a PowerPoint presentation without being able to see the original file would be almost impossible. And even for spreadsheets with multiple columns, how will you ensure that we know which bits of text are related to one another? Maybe Gengo could integrate into its workflow a reliable tool that converts PDFs to Word documents? Or you could simply pass the task back to the customer by not accepting PDF files. I know we are now getting off-topic from the original discussion of PDFs, but please don't convert all file types into plain text as this would make our job extremely difficult, and even impossible!

    Masami

  • Megan Waters

    I apologize, but “hate” was probably too strong a description for file jobs. However, we do consistently hear from translators that file jobs are problematic for them for a number of reasons. They are also consistently picked up much more slowly by translators and regularly cause many technical difficulties for us on the back end of our system, which in turn cause other bugs.

    At this stage, we are only thinking of possible solutions to help both the translator and customer. Nothing has been decided right now, and even if we do decide to extract the text from files to display in the Gengo workbench, we will still provide a means to download the original file to provide context.

    The ideas you have suggested here are all good options, and I will pass them on to our engineering team to make sure they consider them all. As always, please continue to leave your feedback.

  • Masami

    Hi Megan, thanks for your response. I think it was the "whereby the translator never actually sees a file" that gave me cause for alarm. As long as we can see the original somehow then that'd be great! M.

  • Nuno

    @Megan

    I'd hate to see any kind of files go, including PDF files. For those of us who use CAT tools like SDL Trados, for example, they cause no technical or formatting difficulties at all, and greatly improve the workflow for larger jobs.

  • mirko

    @Nuno - In my experience, every single time I tried to directly translate a PDF using a CAT tool (such as MemoQ or Studio), I have encountered one or more "showstopper" issues, ranging from an ocean of tags to very bad segmentation (since text in PDFs is arranged in "floating" text boxes - sometimes A LOT of them - placed on the page), to file protection settings preventing any action, to a horrible output file.

    You might get lucky if you're dealing with an extremely simple PDF containing mostly (well arranged and not "messed up") text in a single text box per page, but add things such as "non-sequential" text, multiple text columns, images, a lot of different formatting/fonts/styles/etc. and both the imported text (segmentation, tags, etc.) and the exported file will most probably be a complete mess. 

    Here is a thread about this on ProZ (there are A LOT about this topic, and most say the same things and/or suggest workarounds).

  • Evan

    Personally I find it a little inconvenient when file jobs are nothing more than just a few lines of text. Downloading and uploading small job files isn't particularly grueling or anything, but it feels like an unnecessary extra step when the text could have just been uploaded onto the workbench as is. Also, it seems like word/character count issues are much more prevalent in file jobs as well. So with that in mind, I appreciate Gengo's effort to consolidate and streamline things a bit.

  • mirko

    @Evan - True, short (say, <100 words) jobs are well suited to be carried out online (on Gengo's "workbench"), but think about a 5,000-word source, possibly split into 400-500 separate segments. Dealing with that on Gengo's platform isn't an optimal solution either. At the moment you can't search for all of the occurrences of the same term in the source, and you can't do a concordance search (not sure if this will be an option with the TM...). If you need to check both the glossary (on the right) and the comments (on the left) for each segment, the translation space becomes a tiny strip in the middle of the screen. You have to manually click on each empty target segment and wait for it to become "active" in order to be able to edit it, there's no auto-propagation (again, maybe addressed by the TM?), etc.

    Even now, when longer texts are posted as a "single segment", I copy and translate it outside of the platform, then paste it back once done (even translating a single big chunk of text online is not such a good idea), and I always lamented the fact I couldn't do the same with (big) job groups/"collections", but if, in the future, everything is going to be online (and split into segments to leverage TMs on top of that), well, that's definitely something I'm not looking forward to...

  • Alexander

    I think this has been suggested before, but it's worth repeating: there should be a possibility to export the source text as plain text (XML, JSON or whatever) and import the translation back into the workbench.

    Thus, those who wish to use some software of their own choice could easily do so (as they are probably doing already, using tedious manual copying and pasting; exporting and importing would speed things up and reduce the risk of errors), while others still have the possibility to stick to the workbench. Also, after the import, the translator could still benefit from all features of the workbench like spellchecking and checking the translation against the glossary. (Ideally the relevant glossary terms should be incorporated in the export, but I don't see a convenient way to implement that.)
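What such an export/import round trip might look like can be sketched in a few lines. This is purely illustrative: Gengo offers no such feature according to this thread, and the JSON layout (`id`, `source`, `target`) and both function names are hypothetical. The import step checks that every segment came back with a translation, which is the kind of error that manual copying and pasting makes easy to miss.

```python
import json

# Sketch only: a hypothetical export/import format, not an existing Gengo API.


def export_segments(segments):
    """Serialize source segments to JSON, each with an empty target slot."""
    rows = [{"id": i, "source": text, "target": ""}
            for i, text in enumerate(segments)]
    return json.dumps(rows, ensure_ascii=False, indent=2)


def import_translations(translated_json, segment_count):
    """Read translations back, in order, refusing incomplete files."""
    rows = json.loads(translated_json)
    by_id = {row["id"]: row.get("target", "") for row in rows}
    missing = [i for i in range(segment_count)
               if not by_id.get(i, "").strip()]
    if missing:
        raise ValueError(f"segments without a translation: {missing}")
    return [by_id[i] for i in range(segment_count)]


if __name__ == "__main__":
    exported = export_segments(["Hello", "Goodbye"])
    # The translator would fill in the targets offline, e.g. in a CAT tool.
    rows = json.loads(exported)
    rows[0]["target"] = "Hallo"
    rows[1]["target"] = "Auf Wiedersehen"
    print(import_translations(json.dumps(rows, ensure_ascii=False), 2))
```

Keying segments by `id` rather than by position means the offline tool may reorder the rows without breaking the import.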

  • Zac

    I'm generally fine with file jobs and light formatting in office formats (ppt, xls, doc, etc), but I would love to have some more formal guidance from Gengo on how we're supposed to handle PDFs.  Having to recreate PDF formatting in a word doc or attempt to edit the PDF ourselves seems out of the scope of the way jobs are set up here, and if the client would like that done, there ought to be an additional fee to compensate for the additional time it takes us to handle that sort of thing.  Without this kind of guidance from either Gengo or the client, it doesn't make much sense for me to take on a PDF job and then just guess about whether it's ok to just translate the text or if the client is also expecting me to recreate the formatting.

    I don't really have much of an opinion in terms of whether it's better to extract the text or not out of these files, but it seems really strange that we don't usually have access to the source files from which the text was extracted, as that can go a long way in helping translators to understand the context of the text.  As far as I know, most other translation agencies tend to provide the source files as reference, so I'm really surprised that it doesn't seem to be the norm here at Gengo.
