Studio skipping most TUs when importing TMX, no error, no warning, no log?
Thread poster: Gergely Vandor

Gergely Vandor
Hungary
English to Hungarian
Jan 31, 2012

Hello,

I have a TMX file which I wanted to import into Studio (I tried 2009, while our customer tried 2011).

From some 45k TUs, Studio imports 16k. As far as I can tell, there's no error or warning, no import report, and no log file saying anything about the skipped TUs.

Can I assume that the skipped TUs were duplicates? Or is there anything I can do?

In fact, the client says that on their system, Studio 2011 imported only 50 TUs from the same TMX. That's another story, though: it looks like some sort of error, while my 16k import could be "normal". (Still, I'd really like to know what Studio did with my data.)

Thanks,
Gergely


 

merinber
Germany
German to English
Skipped TUs Jan 31, 2012

My first reaction is that those were indeed repetitions, or that you have a filter active. You may also want to look at the import options. For example, if the TMX file has unknown fields, those TUs may be skipped if the corresponding option is active. Lastly, activate the Export invalid TUs option to see if you get any feedback at all. Hope that helps.

 

Grzegorz Gryc  Identity Verified
French to Polish
Upgrade TM wizard... Jan 31, 2012

Gergely Vandor wrote:

I have a TMX file which I wanted to import into Studio (I tried 2009, while our customer tried 2011).

From some 45k TUs, Studio imports 16k. As far as I can tell, there's no error or warning, no import report, and no log file saying anything about the skipped TUs.

Can I assume that the skipped TUs were duplicates? Or is there anything I can do?

In fact, the client says that on their system, Studio 2011 imported only 50 TUs from the same TMX, but that's another story, that one looks like some sort of error, while my 16k import could be "normal". (Although I'd really like to have information about what Studio did with my data.)


In the TMX import wizard, there is an option to write the invalid segments to a separate file.
However, it doesn't work when, e.g., the TMX languages are incompatible with the TM: the TUs are simply skipped.
In fact, it's completely useless for advanced troubleshooting.
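Since TUs with incompatible languages are skipped without a trace, one quick check before importing is to count which language codes the TUs in the TMX actually declare (e.g. `en-GB` vs. `en-US`) and compare them with the TM's languages. A minimal standard-library Python sketch; `tuv_lang` and `language_profile` are hypothetical names, and it assumes TMX 1.4 `xml:lang` attributes, falling back to the older `lang` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_lang(tuv):
    """Return the language code of a <tuv>, lower-cased for comparison."""
    # TMX 1.4 uses xml:lang; older TMX 1.1 files used a plain "lang" attribute
    return (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()


def language_profile(tmx):
    """Count TUs by the (sorted) set of languages their <tuv> elements declare.

    tmx: raw TMX content as bytes or str.
    """
    root = ET.fromstring(tmx)
    profile = Counter()
    for tu in root.iter("tu"):
        langs = tuple(sorted({tuv_lang(tuv) for tuv in tu.iter("tuv")}))
        profile[langs] += 1
    return profile
```

Usage: `language_profile(open("memory.tmx", "rb").read()).most_common()`. Passing bytes rather than decoded text lets the parser use the encoding declared in the file itself.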

Try using the Upgrade TM wizard instead.
It generates a more informative log.

Catspeed
GG

[Edited at 2012-01-31 16:50 GMT]


 

Gergely Vandor
Hungary
English to Hungarian
TOPIC STARTER
none of the above helps Jan 31, 2012

Thanks everybody, but none of the suggestions helped.

-I didn't have any filters active
-I didn't choose to skip segments based on custom fields
-I did choose to have an export of invalid TUs, but nothing was created. (I don't expect there to be any invalid TUs in the TMX either.)

I think the conclusion is that these were most probably duplicates, although Studio doesn't say so. The very same behaviour is reproducible with a TMX file that contains two identical TUs: Studio says "1 imported out of 2", and nothing more.

The problem is the uncertainty. How can I be certain that all the many TUs it skipped are really dupes?
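One way to reduce the uncertainty is to count, before importing, how many TUs repeat an identical source/target pair, the simplest kind of duplicate. A minimal standard-library Python sketch; `duplicate_report` and its result keys are hypothetical, it assumes TMX 1.4 `xml:lang` attributes, and comparing the full `<seg>` text (inline tag content included) is a simplification of whatever Studio actually compares.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tuv):
    """Full text of the <seg>, including inline tag content (a simplification)."""
    seg = tuv.find("seg")
    return "".join(seg.itertext()) if seg is not None else ""


def duplicate_report(tmx, src_lang, trg_lang):
    """Count TUs, TUs lacking the language pair, and exact source/target duplicates."""
    src_lang, trg_lang = src_lang.lower(), trg_lang.lower()
    pairs = Counter()
    total = missing = 0
    for tu in ET.fromstring(tmx).iter("tu"):
        total += 1
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            texts.setdefault(lang, seg_text(tuv))  # first <tuv> per language wins
        if src_lang in texts and trg_lang in texts:
            pairs[(texts[src_lang], texts[trg_lang])] += 1
        else:
            missing += 1
    return {
        "total_tus": total,
        "missing_language_pair": missing,
        "unique_pairs": len(pairs),
        "exact_duplicates": sum(pairs.values()) - len(pairs),
    }
```

If `total_tus - missing_language_pair - exact_duplicates` matches the number Studio reports as imported, exact duplicates explain the gap; a remaining difference points to other causes, such as TUs that differ only in placeables.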

Gergely

[Edited at 2012-01-31 14:51 GMT]


 
Send it to SDL support and ask them to check it! Jan 31, 2012

If you believe there may be a problem with the import, send the file to SDL support, explain your situation and ask them to confirm. If you feel you aren't given enough information to be sure, ask them to include the information you need in the error reports or logs.

 

Gergely Vandor
Hungary
English to Hungarian
TOPIC STARTER
interoperability Jan 31, 2012

In case it wasn't clear, I work for a competitor of SDL and Trados, and this issue originated in a support case from one of our users.

To be fair, our tool memoQ has the same problem (in a TM created with default options), but it doesn't even tell you that it skipped anything. On the other hand, in memoQ you can allow multiple translations in the TM, in which case it imports everything from the TMX. memoQ always creates an import log listing the number of entries skipped for various reasons, but this log is silent about duplicates.

TMs are very important resources, and dupes are pretty common. I think this is worth looking at for both parties, for better interoperability between tools. I'm quite sure somebody from SDL will read this, or already has. :) I'll raise this within Kilgray too.

Gergely


 

SDL Community  Identity Verified
United Kingdom
English
Duplicates Feb 1, 2012

Hi Gergely,

The only log generated, as you have seen, is one reporting invalid units that were omitted. It's worth reviewing what Studio considers a duplicate:

- where source and target segments are equal

- where the only difference between two or more TUs is because of auto-substitutable placeables
--- so numbers, tags and formatting, measurements, variables, and date and time expressions
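The second rule means TUs can count as duplicates even when their text differs. A rough way to estimate this before import is to replace placeables with fixed tokens and then compare. A Python sketch under loose assumptions: it only handles digit runs (which also covers the digits in dates and measurements) and the TMX inline tags `bpt`, `ept`, `ph`, `it` and `ut`, whereas Studio's own language-aware recognizers are certainly more elaborate. All names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
INLINE_TAGS = {"bpt", "ept", "ph", "it", "ut"}  # TMX inline markup elements
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # crude: 3, 3.5, 1,000, date parts


def normalized_seg(seg):
    """Segment text with inline tags replaced by {TAG} and numbers by {NUM}."""
    parts = [seg.text or ""]
    for child in seg:
        # Inline tags become one token; other children (e.g. <hi>) keep their text
        parts.append("{TAG}" if child.tag in INLINE_TAGS else "".join(child.itertext()))
        parts.append(child.tail or "")
    return NUMBER.sub("{NUM}", "".join(parts))


def placeable_key(tu, src_lang, trg_lang):
    """(source, target) key after normalization; None for a missing language."""
    segs = {}
    for tuv in tu.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        if seg is not None:
            segs.setdefault(lang, normalized_seg(seg))
    return segs.get(src_lang.lower()), segs.get(trg_lang.lower())


def placeable_duplicates(tmx, src_lang, trg_lang):
    """How many TUs collapse into an earlier TU once placeables are ignored."""
    keys = Counter()
    for tu in ET.fromstring(tmx).iter("tu"):
        key = placeable_key(tu, src_lang, trg_lang)
        if None not in key:
            keys[key] += 1
    return sum(keys.values()) - len(keys)
```

With this normalization, "Give me 3 cats" and "Give me 5 cats" (with matching targets) map to the same key, so a file containing both counts one placeable-level duplicate.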

There is at least one other complication (just one example): TMs that come from SDLX, for example, may actually expand without any report, because SDLX puts multiple translations into a single TMX TU... we separate them out.
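The expansion effect can be estimated the same way. A minimal Python sketch, assuming (the post does not say exactly) that the alternatives are stored as several `<tuv>` elements with the same target language inside one `<tu>`; `expand_tu` and `expanded_count` are hypothetical names.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def expand_tu(tu, src_lang, trg_lang):
    """Split one <tu> into one (source, target) pair per target-language <tuv>.

    Assumes alternative translations are stored as several <tuv> elements
    with the same target language inside a single <tu>.
    """
    src_lang, trg_lang = src_lang.lower(), trg_lang.lower()
    sources, targets = [], []
    for tuv in tu.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        if lang == src_lang:
            sources.append(text)
        elif lang == trg_lang:
            targets.append(text)
    if not sources:
        return []
    # Pair the (first) source with every alternative translation
    return [(sources[0], target) for target in targets]


def expanded_count(tmx, src_lang, trg_lang):
    """Return (raw <tu> count, number of (source, target) pairs after expansion)."""
    tus = list(ET.fromstring(tmx).iter("tu"))
    pairs = sum(len(expand_tu(tu, src_lang, trg_lang)) for tu in tus)
    return len(tus), pairs
```

When the second number exceeds the first, the TM can end up with more entries than the TMX has TUs, which can partly mask the number of duplicates dropped.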

So basically, we don't care (technically speaking... please don't respond saying SDL don't care..!) whether duplicates, or other correctly handled TUs, change the number of TUs in Studio compared to the original TMX. The imported TMX is there for use in Studio. So if you need to check the differences, you will have to consider other tools that allow a comparison... or something that lets you analyse the contents of a TMX against a set of rules and report on it before you import. It might make an interesting OpenExchange application, but I don't expect to see this sort of functionality added to the core product.

I guess this could mean extra work for another tool that requires separate TUs for things we don't, and there will be a corresponding loss of leverage in the other tool. So translating "Give me 3 cats" in one segment and "Give me 5 cats" in another will result in one TU in Studio, and then a 100% match plus a 95% (or so) match in another CAT tool if the TMX exported from Studio were used to translate the same segments... plus associated sore eyes and sneezing for handling too many cats.

I think interoperability is one thing, and generally TMX does a good job, but the differences in the way tools handle the translations and store the information leads to potentially more complex scenarios for CAT Hoppers.

Regards

Paul

[Edited at 2012-02-01 15:28 GMT]


 

Gergely Vandor
Hungary
English to Hungarian
TOPIC STARTER
technical vs business point of view Feb 1, 2012

Hi Paul,

The technical explanation is interesting and I'm OK with it technically, but I don't think you should look at this from a technical point of view. :) Imagine you are running a translation business, and moving translations into Studio, and you see that out of your precious 3 million entries, only 2.5 million are imported. In my opinion, the typical reaction is that "Nooo, I am losing my translations and I am losing money. One of these tools must be doing something stupid!" Finding out that this is actually about duplicates requires detective work and/or technical knowledge that might not be available.

Studio could probably quite easily count how many TUs it skipped because of duplicates and give some sort of report, even if the numbers don't add up exactly. Or if it isn't trivial to count them for some technical reason, it could at least say that *some* TUs were skipped because they were duplicates. We are simply going to add a line to our TMX import log about TUs skipped because they were dupes.
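A sketch of what such a log line could look like, assuming a toy TM that is just a set of (source, target) pairs; `ImportLog`, `import_pairs` and the reason labels are hypothetical, and neither Studio's nor memoQ's actual import code is implied.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ImportLog:
    """Hypothetical import log that records a reason for every skipped TU."""
    imported: int = 0
    skipped: Counter = field(default_factory=Counter)

    def summary(self):
        total = self.imported + sum(self.skipped.values())
        lines = [f"{self.imported} imported out of {total}"]
        for reason, count in sorted(self.skipped.items()):
            lines.append(f"  skipped, {reason}: {count}")
        return "\n".join(lines)


def import_pairs(pairs, tm=None):
    """Add (source, target) pairs to a set-based TM, logging why anything is skipped.

    None in `pairs` stands for a TU that had no usable source/target languages.
    """
    tm = set() if tm is None else tm
    log = ImportLog()
    for pair in pairs:
        if pair is None:
            log.skipped["language mismatch"] += 1
        elif not pair[0].strip() or not pair[1].strip():
            log.skipped["invalid (empty segment)"] += 1
        elif pair in tm:
            log.skipped["duplicate"] += 1
        else:
            tm.add(pair)
            log.imported += 1
    return tm, log
```

For the two-identical-TU file mentioned earlier in the thread, `import_pairs([("a", "b"), ("a", "b")])` would yield a summary of "1 imported out of 2" followed by "skipped, duplicate: 1", instead of the first line alone.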

BR,
Gergely

