Studio skipping most TUs when importing TMX, no error, no warning, no log?
Gergely Vandor Hungary Local time: 04:03 English to Hungarian
Jan 31
Hello,
I have a TMX file which I wanted to import into Studio (I tried 2009, while our customer tried 2011).
From some 45k TUs, Studio imports 16k. As far as I can tell, there's no error or warning, no import report, and no log file to say anything about the skipped TUs.
Can I assume that the skipped TUs were duplicates? Or is there anything I can do?
In fact, the client says that on their system, Studio 2011 imported only 50 TUs from the same TMX, but that's another story, that one looks like some sort of error, while my 16k import could be "normal". (Although I'd really like to have information about what Studio did with my data.)
Thanks,
Gergely
merinber Germany Local time: 04:03 German to English
Skipped TUs
Jan 31
My first reaction is to say that those were indeed repetitions, or that you have a filter active. You may also want to look at the import options. For example, if the TMX file has unknown fields, TUs containing them may be skipped if that option is active. Lastly, activate the "Export invalid TUs" option to see if you get any feedback at all. Hope that helps.
Grzegorz Gryc Poland Local time: 04:03 French to Polish + ...
Upgrade TM wizard...
Jan 31
Gergely Vandor wrote:
I have a TMX file which I wanted to import into Studio (I tried 2009, while our customer tried 2011).
From some 45k TUs, Studio imports 16k. As far as I can tell, there's no error or warning, no import report, and no log file to say anything about the skipped TUs.
Can I assume that the skipped TUs were duplicates? Or is there anything I can do?
In fact, the client says that on their system, Studio 2011 imported only 50 TUs from the same TMX, but that's another story, that one looks like some sort of error, while my 16k import could be "normal". (Although I'd really like to have information about what Studio did with my data.)
In the TMX import wizard, there is an option that lets you write the invalid segments to a separate file.
Nonetheless, it doesn't work when, e.g., the TMX languages are incompatible with the TM; the TUs are simply skipped.
In fact, it's completely useless for advanced troubleshooting.
Try using the Upgrade TM wizard instead.
It generates a more useful log.
Catspeed
GG
[Edited at 2012-01-31 16:50 GMT]
Gergely Vandor Hungary Local time: 04:03 English to Hungarian
TOPIC STARTER
none of the above helps
Jan 31
Thanks everybody, but none of the suggestions helped.
-I didn't have any filters active
-I didn't choose to skip segments based on custom fields
-I did choose to have an export of invalid TUs, but nothing was created. (I don't expect invalid TUs either in the TMX.)
I think the end result is that it was most probably duplicates, although Studio doesn't say anything about this. The very same behaviour is reproducible with a TMX file that contains two identical TUs. Studio will say "1 imported out of 2", and nothing more.
The problem is the uncertainty. How can I be certain that all the many TUs it skipped are really dupes?
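One way to reduce that uncertainty, independently of either tool, would be to count the exact duplicates in the TMX itself before importing. The following is only a minimal sketch: the function names are hypothetical, it compares plain segment text (inline tags are ignored), and it treats two TUs as duplicates only when every language variant matches exactly, which is stricter than what a CAT tool may do.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tuv):
    """Return the plain text of a <tuv>'s <seg>, ignoring inline tags."""
    seg = tuv.find("seg")
    return "".join(seg.itertext()).strip() if seg is not None else ""


def count_exact_duplicates(tmx_source):
    """Return (total TUs, distinct TUs, exact duplicates) for a TMX file.

    tmx_source may be a file path or a binary file object.
    """
    root = ET.parse(tmx_source).getroot()
    seen = Counter()
    for tu in root.iter("tu"):
        # A TU is identified by its sorted (language, text) pairs.
        key = tuple(sorted(
            ((tuv.get(XML_LANG) or tuv.get("lang") or "").lower(), seg_text(tuv))
            for tuv in tu.findall("tuv")
        ))
        seen[key] += 1
    total = sum(seen.values())
    return total, len(seen), total - len(seen)
```

If the duplicate count returned here is far smaller than the number of TUs the import skipped, exact duplicates alone do not explain the difference.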
Gergely
[Edited at 2012-01-31 14:51 GMT]
If you believe there is a possible problem with the import, then send the file to SDL support, explain your situation, and ask them to confirm. If you feel that you aren't given enough information to be sure, then ask for the information you need to be included in the error reports or logs.
Gergely Vandor Hungary Local time: 04:03 English to Hungarian
TOPIC STARTER
interoperability
Jan 31
If it wasn't clear, I work for a competitor to SDL and Trados, and the origin of this issue is a support case from one of our users.
To be fair, our tool memoQ has the same problem (in a TM created with default options), but it doesn't even tell you that it skipped anything. On the other hand, in memoQ you have the ability to allow multiple translations for the TM, in which case it will import everything from the TMX. memoQ always creates an import log, and lists the number of entries skipped for various reasons, but this log remains silent about duplicates.
TMs are very important resources, and dupes are pretty common. I think this is worth looking at for both parties for better interoperability between tools. I'm quite sure somebody from SDL will read this, or has already. I'll raise this within Kilgray too.
Gergely
SDL Support United Kingdom Local time: 03:03 English
Duplicates
Feb 1
Hi Gergely,
The only log generated, as you have seen, is one reporting invalid units that were omitted. Duplicates, and it's worth reviewing what a duplicate is for Studio, are as follows:
- where source and target segments are equal
- where the only difference between two or more TUs is because of auto-substitutable placeables
--- so numbers, tags and formatting, measurements, variables, and date and time expressions
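To illustrate the second kind of duplicate, here is a rough sketch that groups translation pairs after replacing numbers with a placeholder. It is an assumption-laden approximation, not Studio's actual logic: only numbers are handled (not tags, dates, measurements or variables), and the regular expression and the function names are made up for this example.

```python
import re

# Hypothetical, simplified notion of a "number" placeable:
# digits, optionally with . or , separators inside (e.g. 1,000.50).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def placeable_key(source, target):
    """Return the pair with every number replaced by a {NUM} placeholder."""
    return NUMBER.sub("{NUM}", source), NUMBER.sub("{NUM}", target)


def group_by_placeable_key(pairs):
    """Group (source, target) pairs that differ only in their numbers."""
    groups = {}
    for source, target in pairs:
        groups.setdefault(placeable_key(source, target), []).append((source, target))
    return groups


pairs = [
    ("Chapter 3", "3. fejezet"),
    ("Chapter 5", "5. fejezet"),
    ("Appendix", "Függelék"),
]
groups = group_by_placeable_key(pairs)
# Three pairs collapse into two groups; the two "Chapter" pairs share one key.
print(len(pairs), "pairs ->", len(groups), "groups")
```

Under this simplified rule, a tool that merges such pairs would keep one entry per group, so the imported count would be lower than the TU count in the TMX without anything being invalid.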
There is at least one other complexity (just one example): TMs that come from SDLX may actually expand on import without this being reported, because SDLX puts multiple translations into a single TMX TU... we separate them out.
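The expansion case can be pictured with a small sketch. It assumes, based only on the description above, that such a TU carries one source `<tuv>` and several target `<tuv>` elements; the function name and the language-prefix matching are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_tu(tu, source_lang):
    """Expand one <tu> into a list of (source, target) pairs.

    The first <tuv> whose language starts with source_lang is the source;
    every other <tuv> is treated as a separate translation.
    """
    source = None
    targets = []
    for tuv in tu.findall("tuv"):
        lang = (tuv.get(XML_LANG) or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        if source is None and lang.startswith(source_lang.lower()):
            source = text
        else:
            targets.append(text)
    if source is None:
        return []
    return [(source, target) for target in targets]


tu = ET.fromstring(
    '<tu>'
    '<tuv xml:lang="en-GB"><seg>Save</seg></tuv>'
    '<tuv xml:lang="de-DE"><seg>Speichern</seg></tuv>'
    '<tuv xml:lang="de-DE"><seg>Sichern</seg></tuv>'
    '</tu>'
)
# One TU in the file becomes two entries after the split.
print(split_tu(tu, "en"))
```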
So basically, we don't care (technically speaking... please don't respond saying SDL don't care..!) whether the duplicates, or other correctly handled TUs, change the number of TUs that now appear in Studio compared to the original TMX. The TMX that has been imported is there for use in Studio. So if you need to check the differences then you will have to consider other tools that allow a comparison... or something that allows you to identify the contents of a TMX based on a set of rules and report on it before you import. Might make an interesting OpenExchange application, but I don't expect to see this sort of functionality added to the core product.
I guess this could mean work for another tool that requires separate TUs for things we don't, and there will be a loss of leverage in the other tool accordingly. So translating "Give me 3 cats" in one segment, and "Give me 5 cats" in another, will result in one TU in Studio, and then a 100% match plus a 95% (or something) in another CAT tool if the exported TMX from Studio was used to try and translate the same segments... plus associated sore eyes and sneezing for handling too many cats.
I think interoperability is one thing, and generally TMX does a good job, but the differences in the way tools handle the translations and store the information leads to potentially more complex scenarios for CAT Hoppers.
Regards
Paul
[Edited at 2012-02-01 15:28 GMT]
Gergely Vandor Hungary Local time: 04:03 English to Hungarian
TOPIC STARTER
technical vs business point of view
Feb 1
Hi Paul,
The technical explanation is interesting and I'm OK with it technically, but I don't think you should look at this from a technical point of view. Imagine you are running a translation business, and moving translations into Studio, and you see that out of your precious 3 million entries, only 2.5 million are imported. In my opinion, the typical reaction is that "Nooo, I am losing my translations and I am losing money. One of these tools must be doing something stupid!" Finding out that this is actually about duplicates requires detective work and/or technical knowledge that might not be available.
Studio could probably quite easily count how many TUs it skipped because of duplicates, and give some sort of report, even if the numbers don't add up exactly. Or if it isn't trivial to count them for some technical reason, it could at least say that *some* TUs were skipped because they were duplicates. We are simply going to add a line in our TMX import log about TUs skipped because they were dupes.
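Just to show how little is needed, here is a toy sketch of an import loop that counts skipped entries per reason. It does not reflect how Studio or memoQ actually import; the TM is modelled as a plain set of (source, target) pairs and all names are invented for the example.

```python
from collections import Counter


def import_with_report(pairs, tm):
    """Add (source, target) pairs to tm (a set) and count what happened."""
    report = Counter()
    for source, target in pairs:
        report["read"] += 1
        if not source.strip() or not target.strip():
            report["skipped_empty"] += 1
        elif (source, target) in tm:
            report["skipped_duplicate"] += 1
        else:
            tm.add((source, target))
            report["imported"] += 1
    return report


def format_report(report):
    """Render the counters as a short, human-readable import log."""
    return "\n".join([
        f"TUs read: {report['read']}",
        f"TUs imported: {report['imported']}",
        f"TUs skipped (duplicate): {report['skipped_duplicate']}",
        f"TUs skipped (empty segment): {report['skipped_empty']}",
    ])


tm = set()
report = import_with_report([("Hello", "Szia"), ("Hello", "Szia")], tm)
print(format_report(report))
```

For the two-identical-TUs case this reports one imported and one skipped as duplicate, which is exactly the piece of information missing from "1 imported out of 2".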
BR,
Gergely