Producing a glossary with no repetitions from a file with many repetitions
Thread poster: Huw Watkins

Huw Watkins  Identity Verified
United Kingdom
Local time: 10:33
Member (2005)
Italian to English
Jul 26, 2013

Hi Guys,

Some help with how SDL Trados Studio (2009) works under the hood would be very helpful here. I have a very large Excel file that has now been fully translated. The issue I now have is that the agency has requested a glossary to be compiled from this file, but with no repetitions.

The original file was 300,000 words with 60,000 no match and a number of fuzzies. There are over 200,000 repetitions and I am using a fresh TM, so no TM matches.

Given the complexity of the task at hand, the agency has kindly agreed that I compile the glossary on a segment by segment basis (not word by word - or the next 10 years of my life would be written off(!!) or I'd have to buy some sort of term extraction tool, which is not going to happen).

My plan of action is this:

1) Recreate the project with another fresh TM.
2) Select the Export unknown segments option in Analyze Files settings
and possibly:
3) Select the Export frequent Segments option in Analyze Files settings.
4) Process the project as normal and use the export files from 2 and possibly 3 for my glossary.

My doubt comes in step 3. Thus far my experiment has involved me doing steps 1 and 2 and producing an unknown segments file that contains solely the no match words. It doesn't contain the fuzzies however (this is based on looking at the analysis of the file exported during the batch processing).

If I repeat the process but include step 3, will there be any duplication with the no match words? Do the no match words actually count the first occurrence of a segment that is repeated numerous times throughout a file? Is this the same for fuzzies?

Am I running the risk of having repetitions in the final glossary if I use both the unknown segments file and the frequent segments file? (Bear in mind that I want the fuzzies to appear, but not the reps. There are no 100% matches, which makes things easier...)

My next question is this: I am finding that I am not able to export the unknown segments file to Excel (the format of the original file). Does anyone know how to solve this? I have attempted a good old-fashioned copy and paste into Excel with all the target segments, and that seems to work, thankfully(!!!), but I'm curious to know whether I can save the target file in Excel or not.

Any other tips on how I should approach this?


[Edited at 2013-07-26 14:31 GMT]


 

Mark
Local time: 11:33
Italian to English
Does MultiTerm count? Jul 26, 2013

"I'd have to buy some sort of term extraction tool, which is not going to happen"
I believe an extraction tool is part of the package.


 

Huw Watkins  Identity Verified
United Kingdom
Local time: 10:33
Member (2005)
Italian to English
TOPIC STARTER
MultiTerm is not something I considered - perhaps it's feasible? Jul 26, 2013

When converting from Excel to a termbase, is it possible to eliminate reps?

Does anyone know if this is feasible? That is, organise the entire translation into two columns, reps and all, then convert it to a termbase while eliminating the reps? It's not a feature I am aware of in MultiTerm, but perhaps it exists?

I know there's an app available now to convert termbases back to Excel too.
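
Independently of MultiTerm, the two-column idea can be sketched outside any CAT tool. The following is only a minimal illustration, assuming the source and target segments have been saved as a two-column CSV file; the function names are made up for this example and are not part of Studio or MultiTerm. It keeps the first occurrence of each source segment and drops later repeats.

```python
import csv


def dedupe_pairs(rows):
    """Keep the first occurrence of each source segment, preserving order."""
    seen = set()
    unique = []
    for source, target in rows:
        key = source.strip()
        if not key or key in seen:
            continue  # skip blank rows and repeated source segments
        seen.add(key)
        unique.append((key, target.strip()))
    return unique


def dedupe_csv(in_file, out_file):
    """Read source/target pairs from one CSV stream, write unique pairs to another."""
    reader = csv.reader(in_file)
    # Only the first two columns are used; shorter rows are ignored.
    pairs = [(row[0], row[1]) for row in reader if len(row) >= 2]
    csv.writer(out_file).writerows(dedupe_pairs(pairs))
```

To use it, open the exported CSV and an output file with `newline=""` and pass both to `dedupe_csv`. Matching here is exact after trimming whitespace, so near-identical segments (the fuzzies) are all kept.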


 

Mark
Local time: 11:33
Italian to English
Gosh, I should hope so. Jul 26, 2013

The component would be "MultiTerm Extract (year)" I think: I expect the official blurb would give you a good idea. As far as I'm aware, it's meant to automate the whole process, within reason. (I think it makes proposals for acceptance/rejection.)

As for repetitions, any such system would have to be fairly lousy to duplicate entries, since identifying repeated term combinations is the whole idea.


 

Charlotte Farrell  Identity Verified
United Kingdom
Local time: 10:33
Member (2013)
German to English
Is there not an option not to allow multiple translations Jul 26, 2013

of the same segment? I think this is doable with MemoQ but I've never specifically tried it. Maybe get the free trial and have a go?

 

Huw Watkins  Identity Verified
United Kingdom
Local time: 10:33
Member (2005)
Italian to English
TOPIC STARTER
This is how I went about it in the end, for anyone interested: Jul 29, 2013

I:

1) Recreated the project with another fresh TM.
2) Selected the Export unknown segments option in Analyze Files settings.
3) Prepared the project.
4) Added the unknown segments sdlxliff file to a fresh project, using the full TM, then prepared and pretranslated it.
5) Tidied up the resulting file, which was more or less entirely pretranslated (a couple of -1%ers for formatting penalties - easily sorted).

Not being able to export the file to Excel was a bit of a shame, but a simple copy and paste of all source and target segments did the job fine.
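
For anyone who would rather script this than copy and paste, here is a rough sketch. It assumes the bilingual file follows the standard XLIFF 1.2 layout (trans-unit elements containing source and target), which sdlxliff is based on. It does not handle Studio's own segmentation markup inside a trans-unit, so treat it as a starting point rather than a tested converter. The function names are invented for this example.

```python
import csv
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def extract_pairs(xliff_text):
    """Return (source, target) text pairs from the trans-units of an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # skip units that have no translation
        # itertext() flattens inline tags and keeps only the readable text
        pairs.append(("".join(source.itertext()).strip(),
                      "".join(target.itertext()).strip()))
    return pairs


def write_csv(pairs, path):
    """Write the pairs to a CSV file; the BOM helps Excel detect UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        csv.writer(handle).writerows(pairs)
```

The resulting CSV file opens directly in Excel with one source/target pair per row.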

I am still unsure of the 'under-the-hood' workings of unknown segments vs frequent segments. However, I got 20,000 more segments in the unknown file than in the frequent one, and these all turned out to be repetitions of the remaining 40,000 words in the unknown file anyway. In other words, the unknown segments matched the word count for the no match words. By unticking the "Report Internal fuzzy match leverage" option, I also eliminated the fuzzy matches, which are then counted among the no match words.
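
The arithmetic behind this can be illustrated with a small sketch. It is not how Studio computes its analysis (that remains unknown to me); it only shows the assumption that each distinct segment is counted once, on its first occurrence, and every later occurrence is counted as a repetition. The function name is made up for this example.

```python
from collections import Counter


def repetition_stats(segments):
    """Count total, distinct, and repeated segments in a list of source segments."""
    counts = Counter(s.strip() for s in segments if s.strip())
    total = sum(counts.values())
    unique = len(counts)  # first occurrences only
    return {"total": total, "unique": unique, "repetitions": total - unique}
```

Under that assumption, a glossary built from the distinct segments alone contains no repetitions.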

MultiTerm Extract may, I think, have been included as a trial in Studio 2009, but it's not inherently part of the MultiTerm application, and a word-by-word glossary was not necessary in this case anyway, so I didn't explore MultiTerm further. I also didn't try the MemoQ option, as I think I have already used up my trial user rights by now. I would be interested to hear how it performs with this, though, if anyone has a similar situation.

The initial glossary request by segments rather than words makes sense, as the original Excel file was essentially a product list and brief description. It's all done now anyway.

[Edited at 2013-07-29 10:57 GMT]

[Edited at 2013-07-29 11:01 GMT]


 

