Huge project - excluding repetitions - project management question
Thread poster: Sebastijan Pilko
Sebastijan Pilko  Identity Verified
Slovenia
Local time: 01:57
English to Slovenian
+ ...
Dec 12, 2007

Hi all!

I have a question. I have a project of 300 xml files, 5 million words, 2.5 million 100% matches and 1 million repetitions. The files will be distributed among 15 translators.

The problem is that I have to be *careful*, since the analysis of the files taken separately adds up to more than the analysis of all the files together, and the translation agency can lose a lot of money because the client pays only for the word count based on the combined analysis.

I have made an analysis for all the files and then exported the segments to rtf using "Export unknown segments" with the setting 99% or lower match value.

I now have an RTF file.

And now my question:

Is it *safe* to split the big RTF file into smaller ones, distribute them to the translators (working in TagEditor) and, when they are finished, run a final fuzzy translation over the original files (xml/ttx)?

Any pointers on how to do this efficiently?
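
To be concrete, the kind of split I have in mind is roughly this. It is only a sketch: it assumes the segments have first been pulled out of the RTF export into a plain text file, one segment per line, and all the file names ("unknown_segments.txt", "batch_01.txt", ...) are placeholders.

# Sketch only: split a plain-text list of exported segments into 15
# roughly equal, contiguous batches.
def split_into_batches(segments, n_batches):
    size = -(-len(segments) // n_batches)  # ceiling division
    return [segments[i:i + size] for i in range(0, len(segments), size)]

with open("unknown_segments.txt", encoding="utf-8") as f:
    segments = [line.rstrip("\n") for line in f if line.strip()]

for i, batch in enumerate(split_into_batches(segments, 15), start=1):
    with open(f"batch_{i:02d}.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(batch) + "\n")

The actual cutting of the RTF itself I would still do in Word, so that the Trados styles stay intact; the script above is only meant to show how the batches would be sized.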

Kind regards,
S.

[Subject edited by staff or moderator 2007-12-12 11:53]



Fabio Descalzi  Identity Verified
Uruguay
Local time: 21:57
Member (2004)
German to Spanish
+ ...
A task for a good editor Dec 12, 2007

Hi Picow

You are facing a challenge. I have never faced anything similar before, but let's give it a try, based on my own experience. My contribution is not so much "technical support" as a conceptual aid.

I did process a huge XLS file some time ago. As I couldn't work on it in TagEditor, I first had to cut it into smaller files and then process each one with TagEditor (for practical purposes, let's call them "the junior files").
I was working alone, and the junior files were not "single units" in themselves, but rather a complex mix of different sorts of text. Junior file #1 had things in common with junior file #6, and so on. So the wisest thing I could do was NOT to work each file from beginning to end, but to work a bit on this file, a bit on that file, then on the next one, etc. - that is, to collect common translation material by translating parts of several junior files. The final result was satisfactory, and the client keeps praising me.

So, let's get down to your own task.
PICOW wrote:
I have a question. I have a project of 300 xml files, 5 million words, 2.5 million 100% matches and 1 million repetitions. The files will be distributed among 15 translators.

The problem is that I have to be *careful*, since the analysis of the files taken separately adds up to more than the analysis of all the files together, and the translation agency can lose a lot of money because the client pays only for the word count based on the combined analysis.

I have made an analysis for all the files and then exported the segments to rtf using "Export unknown segments" with the setting 99% or lower match value.

So on average every translator will be processing some 333,000 words, of which (once again on average) about 100,000 will be new words. Every translator will be working for about two months at least.
Then there is the "file to be careful with" which you are hoping to split. To ensure a "fair share" among the translators, split the file among them - and give each of them a full copy as well, with the instruction to translate only their assigned part.

And then there is your editing task.
You must ensure that the translators share common material.
From time to time you must ask them to send in their TMs and then redistribute those partial TMs.

Wish you good luck!


Sebastijan Pilko  Identity Verified
Slovenia
Local time: 01:57
English to Slovenian
+ ...
TOPIC STARTER
Common files, yes .. Dec 12, 2007

Hi Fabio!

Thank you for your answer!

The thing is, I will not be splitting the XML files (I have 300 of them). I have made an analysis of all the files and then exported them to .rtf (with the 99% or lower option).

Now I have an .rtf file with segments from all 300 XML files, minus repetitions and 100% matches.

What I plan to do now is cut this RTF file into smaller .rtf files and run an analysis on each file before sending them to the translators.

Every translator will translate their cut .rtf file in TagEditor, and when all the files are translated I will perform a fuzzy translation of all the TTX files, based on the updated TM.

All translators work on the SAME TM (remote).

Am I far off?

Kind regards,
S.


Haiyang Ai  Identity Verified
United States
Local time: 18:57
English to Chinese
+ ...
Consistency Dec 12, 2007

I'm not quite sure how efficient your big operation will be, but I just want to say that term consistency might be a problem, especially with 15 translators working simultaneously. Are you planning to use MultiTerm? Very often people translate a term differently, and it's your job to keep the big project consistent. Maybe you can set up an issue tracker or something, to give the translators a way to communicate and stick to what has been agreed on. Just my 2 cents.

Kind regards,
Haiyang


Sebastijan Pilko  Identity Verified
Slovenia
Local time: 01:57
English to Slovenian
+ ...
TOPIC STARTER
MT in full swing ... Dec 12, 2007

Haiyang Ai wrote:

I'm not quite sure how efficient your big operation will be, but I just want to say that term consistency might be a problem, especially with 15 translators working simultaneously. Are you planning to use MultiTerm? Very often people translate a term differently, and it's your job to keep the big project consistent. Maybe you can set up an issue tracker or something, to give the translators a way to communicate and stick to what has been agreed on. Just my 2 cents.

Kind regards,
Haiyang


Hi Haiyang!

We already have a MultiTerm termbase (with term recognition enabled) containing a few thousand terms, and we have a shared Excel file where new terms are registered, resolved and verified, and then imported into MultiTerm.
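
In case it helps, this is roughly how the verified rows could be pulled out of the shared file before each import - a sketch only, assuming the Excel file is saved out as CSV and that the columns are called "Source", "Target" and "Status" (those names, like the file names, are just examples).

import csv

# Sketch: keep only the rows marked "verified" in the shared glossary
# (saved as CSV) and write them to a tab-delimited file that can then be
# converted and imported into MultiTerm.
with open("shared_terms.csv", newline="", encoding="utf-8") as src, \
     open("verified_terms.txt", "w", encoding="utf-8") as dst:
    dst.write("English\tSlovenian\n")
    for row in csv.DictReader(src):
        if row.get("Status", "").strip().lower() == "verified":
            dst.write(f"{row['Source'].strip()}\t{row['Target'].strip()}\n")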

My main issue is still the .rtf file split and the correct translation of segments with tags.

Kind regards,
S.



ViktoriaG  Identity Verified
Canada
Local time: 19:57
English to French
+ ...
Why analyse all files together? Dec 12, 2007

Analysing a bunch of files together before splitting them among fifteen translators makes no sense to me, especially when more than half of the project is made up of 100% matches and repetitions. Each translator will have only roughly 1/15 of the text, so some of the repetitions and 100% matches in the analysis will simply not be in their files (some repetitions may become no matches), which means that some of them will actually work on more no match segments than agreed.

When splitting large projects, the analysis should be run on each separate batch. Otherwise, you would be denying payment for work done...


Sebastijan Pilko  Identity Verified
Slovenia
Local time: 01:57
English to Slovenian
+ ...
TOPIC STARTER
I agree, but .. Dec 12, 2007

Viktoria Gimbe wrote:

Analysing a bunch of files together before splitting them among fifteen translators makes no sense to me, especially when more than half of the project is made up of 100% matches and repetitions. Each translator will have only roughly 1/15 of the text, so some of the repetitions and 100% matches in the analysis will simply not be in their files (some repetitions may become no matches), which means that some of them will actually work on more no match segments than agreed.

When splitting large projects, the analysis should be run on each separate batch. Otherwise, you would be denying payment for work done...


Hi Viktoria!

You are right about that. But think about it.

On a previous project I ran 2 separate analyses. I had 90 files and 600,000 no-match words.

When I analysed all the files together, I got 600,000 no-match words. When I analysed each file separately, I got over 1 million no-match words because of the repetitions.

I repeat: ALL translators work on the SAME TM real-time.

So, let me clear things up.

I have 2 files. Each file has 10 sentences of 20 words each, i.e. 200 words per file = 400 words in both files.

Out of the 10 sentences, 3 are IDENTICAL in both files. So 60 words are the same.

If I analyse both files together, I get 400 words in total, 60 repetitions, and the rest is no match.

If I perform the analysis separately for each file, I get 400 words, no repetitions and 400 no-match words.

Ok?
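
If it helps, here are the same numbers as a quick script. The sentences are invented, and 20 words per sentence is only there to keep the arithmetic of the example above.

# Toy numbers mirroring the example: 2 files, 10 sentences each,
# 20 words per sentence, 3 sentences shared between the two files.
WORDS_PER_SENTENCE = 20
file_a = [f"A{i}" for i in range(7)] + ["shared1", "shared2", "shared3"]
file_b = [f"B{i}" for i in range(7)] + ["shared1", "shared2", "shared3"]

def analyse(files):
    # First occurrence of a sentence counts as no match, later ones as repetitions.
    seen, no_match, repetitions = set(), 0, 0
    for sentences in files:
        for s in sentences:
            if s in seen:
                repetitions += WORDS_PER_SENTENCE
            else:
                seen.add(s)
                no_match += WORDS_PER_SENTENCE
    return no_match, repetitions

print(analyse([file_a, file_b]))             # both files together: (340, 60)
print(analyse([file_a]), analyse([file_b]))  # separately: (200, 0) twice = 400 no match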

Here is the problem. Both translators are using the same TM. So, when translator #1 translates those 3 sentences and closes the segments, those 3 sentences are in the TM.

When translator #2 reaches the exact same 3 sentences, he will get a 100% match proposal coming from translator #1 and will NOT translate those 3 sentences - yet with your solution he would still get paid for them.

This is not fair to translator #1, and the agency pays for work not done (the client pays for the analysis based on all the files together!).

Now to my initial question.

If I run the analysis on both files together and extract the segments with a 99% or lower match, those 3 sentences will appear ONLY ONCE.

Please bear in mind that one translator working on 1 TM with repetitions is not the same situation as 15 translators sharing it.

My question remains: I have a Word .rtf file with no repetitions, and I want to split it into 15 files, run 15 separate analyses and distribute the files to the translators. Only this way does everyone get paid for their work.

Kind regards,
S.



ViktoriaG  Identity Verified
Canada
Local time: 19:57
English to French
+ ...
That's a different story then Dec 13, 2007

That sounds much better. You actually sound like you are leveraging your tools the way a good project manager would (I have yet to work on a project where a server is used for the TM). In that case, it does make sense to analyse the entire batch in one run.

Daniel García
English to Spanish
+ ...
Pre-translate repetitions? Dec 13, 2007

You say the project has 1 million repetitions.

Is it one sentence repeated a million times, or a million sentences that each appear twice? These are exaggerations, of course, but it would help you to have an idea, and I will show you how to get one.

I had a similar project once and we did the following:

a) Analyse all the files against the TM and export the repetitions in the Trados TXT format. The TM must have the "Usage count" field. This is important because the repetitions you export will then carry a usage counter showing the number of times each one is repeated.

b) Using a customised macro (I can't send you the one we used because I no longer work there and it belongs to that company), convert the exported TXT file into a Word table with each sentence in one column and the number of times it appears in the next column.

Now you have a list of the repeated sentences and the number of times they appear.

You can now sort by the number of occurrences to prioritise the translation of the repetitions up front.
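
I obviously cannot share the macro, but just to illustrate, the counting part is something a few lines of script can also do. This is purely a sketch: it assumes the source sentences have been dumped to plain text files, one sentence per line, and the folder and file names ("source_txt", "repeated_sentences.txt") are placeholders.

from collections import Counter
from pathlib import Path

# Count how often each sentence occurs across all source files and list
# the repeated ones, most frequent first, as "count<TAB>sentence".
counts = Counter()
for path in Path("source_txt").glob("*.txt"):
    with open(path, encoding="utf-8") as f:
        counts.update(line.strip() for line in f if line.strip())

with open("repeated_sentences.txt", "w", encoding="utf-8") as out:
    for sentence, n in counts.most_common():
        if n > 1:
            out.write(f"{n}\t{sentence}\n")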

You can send the list of repeated sentences to a translator, together with the XML files where they appear "for reference".

Translating isolated sentences in this way is a bit more time consuming, because the translator has to check where the sentences appear in order to provide a meaningful translation.

Once the translation of the repeated sentences is finalised, you can launch the translation of the XML files. The repetitions will now appear as 100% matches to all translators, who can review them and make sure they are fine in their contexts. Of course, translators should be paid for reviewing 100% matches.

Now, this will work if the customer pays something for repetitions and 100% matches.

If they don't pay for reviewing repetitions and 100% matches, you can just pre-translate the XML files and instruct the translators not to review the 100% matches.

Tell your customer that, because he is paying only once for each repeated sentence, the repetitions will not be reviewed every time they appear in the files, and that this can jeopardise the quality of the translation.

If he is happy to assume the risk, then that's fine.

In our case, our customer knew Trados very well, understood how the tool works, and agreed with our procedure and to pay accordingly for reviewing 100% matches in their context.

If your customer is clever enough to demand to pay based on the total analysis, they should be able to understand this procedure.

I hope this helps!

Daniel


Sebastijan Pilko  Identity Verified
Slovenia
Local time: 01:57
English to Slovenian
+ ...
TOPIC STARTER
Daniel, great advice.. Dec 13, 2007

Thank you for your valued reply!

What do you think about the second option:

I have exported all segments with a 99% or lower match into one RTF file.

My plan is to cut this file into, let's say, 30 files and distribute them among the translators.

I have only 2 concerns:

a) keeping the tags intact in TagEditor and in the translated .ttx files

b) getting my hands on the original PDFs, so the translators have a reference to check where each sentence appears.

Kind regards,
S.

[Edited at 2007-12-13 14:52]


Daniel García
English to Spanish
+ ...
Distribution plan Dec 13, 2007

PICOW wrote:

Thank you for your valued reply!

What do you think about the second option:

I have exported all segments with a 99% or lower match into one RTF file.

My plan is to cut this file into, let's say, 30 files and distribute them among the translators.



Yes, that should work as well. My concern would be the same as with the translation of the repetitions: the translators will be translating a lot of disconnected sentences, not just the repetitions.

This might not be too bad for the quality of the text, depending on the type of project, but keep in mind two things:

a) Translating disconnected sentences takes more time than translating whole documents.

b) The person(s) reviewing the translated XML files may have a tough job editing the final text. A sentence that was perfectly translated in isolation might still need changes when reviewed in its context.

This does not mean that your approach is wrong; I am only highlighting possible risks that you might have to deal with at a later stage.




I have only 2 concerns:

a) keeping the tags intact in TagEditor and in the translated .ttx files


Are there any internal tags in your RTF file with the unknown segments? If not, it is likely that there weren't any in the TTX files either.

Nevertheless, you can test the procedure by "translating" the RTF file with Trados using the "segment unknown segments" option. Then you can clean up the "translated" RTF file into the memory and test the fuzzy translation on the TTX files.



b) getting my hands on the original PDFs, so the translators have a reference to check where each sentence appears. [Edited at 2007-12-13 14:52]


Yes, that would of course be excellent, but keep in mind that, if the translators have to go to the PDFs to find where each sentence appears, it is going to take them time, which should be compensated.

You might also want to make sure that everyone has some tool to search several PDF files simultaneously.
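
Even a small script would do for that. The following is just a sketch: it assumes the pypdf package is installed and that all the PDFs sit in one folder; the folder name and the search term are made up.

from pathlib import Path
from pypdf import PdfReader  # assumes the pypdf package is installed

def find_in_pdfs(folder, needle):
    # Print the PDF name and page number of every page containing the term.
    needle = needle.lower()
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for page_no, page in enumerate(PdfReader(pdf_path).pages, start=1):
            if needle in (page.extract_text() or "").lower():
                print(f"{pdf_path.name}, page {page_no}")

find_in_pdfs("reference_pdfs", "spindle housing")  # placeholder folder and term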

Good luck with the project!

Daniel



ViktoriaG  Identity Verified
Canada
Local time: 19:57
English to French
+ ...
Daniel's solution looks good Dec 14, 2007

I just have something to add to Daniel's solution that might help translators pick up some speed.

After exporting the repetitions and having them translated, maybe it is a good idea to start again from the entire documents (not just the RTF that doesn't contain the repetitions) and pre-translate them using the TM that was created for the repetitions. Then you can give these documents to the translation team. The pre-translated 100% matches will appear in the documents, so the translators will have the full context and only need to skip the 100% matches. There is no need to look at the PDFs anymore, except to see which sentence goes where, if that is an issue.

Maybe this could help save some time for the translators - and a little money for the client.

