Text-file parsing -- tutorials, training or tips?
Thread poster: Marketing-Lang.

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
May 27, 2009

Dear all,
I work with Trados 2007 and would finally like to learn how to import my customer's own exotic text files without too much fiddling around. I've long played with the idea of using the import/export functions in TagEditor, but I find the standard documentation incomprehensible. I am no programmer, although I do have rudimentary (if somewhat outdated) knowledge of Basic programming.

It just can't be that hard... Does anybody have any tips? Web sites, training courses, or any sources whatsoever?

The files I have to deal with are .inc files relating to PHP. The customer develops their own CMS, so there are no standard filters out there.

With thanks,

-Mike-


 

Jorge Aguilar Juarez  Identity Verified
Australia
Local time: 17:17
German to Spanish
according to SDL May 27, 2009

You can work with inc files using the HTML filter.

http://www.translationzone.com/en/products/sdl-trados-freelance/languagessupported/

Good luck!
Jorge


 

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
TOPIC STARTER
Thanks, but... May 27, 2009

Unfortunately the customer has his own "version" and the results with the HTML filter are unusable.

Thanks for the tip anyway, Jorge,

-Mike-


 

Harry Bornemann  Identity Verified
Mexico
English to German
+ ...
Perl... May 27, 2009

Perl with its "regular expressions" is the most powerful/specialised language to parse texts, but it may take you some months to learn it.

See for yourself at:
http://de.selfhtml.org/perl/index.htm
http://de.selfhtml.org/perl/sprache/regexpr.htm

(You are lucky to speak German, because other Perl documentation is not as clear as this.)


 

xxxOlaf
Local time: 09:17
English to German
Can you post a code example? May 27, 2009

Without knowing exactly what your customer's .inc files look like, it's hard to provide useful feedback. If they're basically .html files, you could use TagEditor to create a custom Tag Settings .ini file that specifically defines which tags and/or tag attributes need to be translated.
If you have excellent Visual Basic 6.0 or VBA skills and translate a lot of these .inc files, you could try using the Trados SDK to write your own macros or VB programs.

Otherwise you could try using regular expressions to either separate translatable strings from the code or somehow mark them as translatable.

Olaf

[Edited at 2009-05-27 23:56 GMT]


 

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
TOPIC STARTER
Happy to post some code... May 28, 2009

Thanks for the feedback, everybody. @Olaf - here is a sample:
===
<?php

require_once( "applications/user_portal_frontend/application_definition.inc" );

$currentLocalisingLanguage = "DE";

$languageLookup = array();


//////////////////////////////////////////////////////////////////////////////


$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "DE" ] = "Hilfe/FAQ";
$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "EN" ] = "Help/FAQ";

$languageLookup[ "NAVIGATION_INTRODUCTION_START" ][ "DE" ] = "Start";
$languageLookup[ "NAVIGATION_INTRODUCTION_START" ][ "EN" ] = "Start";

...etc...

===

I have pages of this stuff to do by next Tuesday... gulp!

Cheers,
-Mike-

[Edited at 2009-05-28 08:25 GMT]



 

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
TOPIC STARTER
Or does anybody offer this as a service? May 28, 2009

nt

 

xxxOlaf
Local time: 09:17
English to German
There's no easy solution May 28, 2009

I'm sure that there's a more elegant way, but if I had to translate files like that I'd use the following quick and dirty solution, which will convert the .inc file into a tab delimited file that can be further manipulated in MS Excel:

1. Download Notepad++ (freeware)
http://sourceforge.net/projects/notepad-plus/
(or any other Unicode editor with regular expression support)
2. Open the .inc file with it
3. Press CTRL+H to display the Replace dialog box
4. Select the Extended search mode in the lower left corner.
5. Enter " in the Find what box and \t in the Replace with box. Click Replace all. (This will replace all double quotation marks with tabs.)
6. Press CTRL+A to select everything and then CTRL+C to copy the text to the clipboard.
7. Open MS Excel and paste the text into a new spreadsheet.
You'll find the translatable strings in the F column.
8. Copy the column to a Word document and translate it as usual.
9. Copy the translated strings back to the spreadsheet.
10. Select all text in the spreadsheet and copy the text to Notepad++.
11. Replace all tabs with quotation marks.
12. Search for all occurrences of "" and replace them with nothing then save the text as an .inc file.

Of course this approach only makes sense if the .inc files are rather large.
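
For anyone comfortable running a small script, the same round trip (pull the strings out, translate them, put them back) can be sketched in Python. This is only an illustration based on the sample posted above, not a tested converter: the function names `extract` and `merge` are made up for this example, and it assumes one assignment per line, double-quoted strings, and no escaped quotation marks inside the text.

```python
import re

# Matches lines such as: $languageLookup[ "KEY" ][ "DE" ] = "text";
# Assumptions: one assignment per line, no escaped quotes inside the text.
ENTRY = re.compile(
    r'^(?P<head>\$languageLookup\[\s*"(?P<key>[^"]+)"\s*\]\s*\[\s*'
    r'"(?P<lang>[A-Z]{2})"\s*\]\s*=\s*")'
    r'(?P<text>[^"]*)'
    r'(?P<tail>";\s*)$'
)

def extract(lines, lang):
    """Return (key, text) pairs for the given language code."""
    pairs = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group("lang") == lang:
            pairs.append((m.group("key"), m.group("text")))
    return pairs

def merge(lines, lang, translations):
    """Return the lines with each matching string replaced by its translation.

    Lines that do not match, or whose key has no translation, pass through unchanged.
    """
    out = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group("lang") == lang and m.group("key") in translations:
            line = m.group("head") + translations[m.group("key")] + m.group("tail")
        out.append(line)
    return out

sample = [
    '$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "DE" ] = "Hilfe/FAQ";',
    '$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "EN" ] = "Help/FAQ";',
]
pairs = extract(sample, "DE")
merged = merge(sample, "DE", {"NAVIGATION_INTRODUCTION_HELP": "Help and FAQ"})
```

The extracted pairs could be written to a tab-delimited file for translation, and `merge` leaves every other line of the .inc file untouched, which avoids the clean-up of stray quotation marks in step 12.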

HTH,
Olaf




[Edited at 2009-05-28 19:19 GMT]


 

Jaroslaw Michalak  Identity Verified
Poland
Local time: 09:17
Member (2004)
English to Polish
The Trados way May 28, 2009


  1. Open "Filter Settings" from the Trados program group.
  2. Select "Regular expression text files".
  3. In the "Supported file formats" insert the extension. As .inc is probably recognized by another filter, I would recommend that you rename the extensions of your files to something else, e.g. ".trn" and put it in this field.
  4. Select "External patterns".
  5. Remove all patterns with "Delete" button.
  6. Input the following expression in the "Opening pattern": DE\" ] = \"
  7. Input the following expression in the "Closing pattern": \";
  8. Press "Add" and then "Save".
  9. Restart TagEditor and select your file with "Generic text files .trn" filter.
  10. All text should be greyed out with the exception of "Hilfe/FAQ" and "Start" strings.
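
To see what the two patterns from steps 6 and 7 pick out before touching the filter settings, they can be tried in Python. This is a rough sketch: it assumes the filter treats everything between an opening match and the next closing match on the same line as translatable, which is a guess at the filter's behaviour, and `translatable_spans` is a name invented for this example.

```python
import re

OPENING = r'DE\" ] = \"'  # opening pattern from step 6
CLOSING = r'\";'          # closing pattern from step 7

def translatable_spans(text):
    """Return the text found between each opening pattern and the next closing pattern."""
    pattern = re.compile(OPENING + r'(.*?)' + CLOSING)
    return pattern.findall(text)

sample = (
    '$currentLocalisingLanguage = "DE";\n'
    '$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "DE" ] = "Hilfe/FAQ";\n'
    '$languageLookup[ "NAVIGATION_INTRODUCTION_HELP" ][ "EN" ] = "Help/FAQ";\n'
    '$languageLookup[ "NAVIGATION_INTRODUCTION_START" ][ "DE" ] = "Start";\n'
    '$languageLookup[ "NAVIGATION_INTRODUCTION_START" ][ "EN" ] = "Start";\n'
)

spans = translatable_spans(sample)
print(spans)  # → ['Hilfe/FAQ', 'Start']
```

Note that the opening pattern depends on the exact spacing in `" ] = "`, so a file with different spacing around the brackets or the equals sign would need an adjusted pattern.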


Let me know how it goes...


 

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
TOPIC STARTER
Two fantastic posts... May 29, 2009

Thanks, guys - I'll be trying your tips out over the weekend!
-Mike-


 

Marketing-Lang.  Identity Verified
Germany
Local time: 09:17
English to German
+ ...
TOPIC STARTER
RE: The TRADOS way May 29, 2009

Hi Jabberwock,
what a neat little tool - at least in principle. If I follow your instructions and try to open a file, TagEditor displays "Error when using plug-in filters: Unexpected failure in Codepage property for COM plug-in component... "

If I try to open the file anyway, it says

"(80003): TagEditor is unable to open this document because the file type is not recognised."

Any ideas v. welcome...

-Mike-


 

Jaroslaw Michalak  Identity Verified
Poland
Local time: 09:17
Member (2004)
English to Polish
Hmm... May 29, 2009

Well, I tried it with the code example you posted, pasted into a plain .txt file, and it worked...

I suppose it might be an issue of encoding, which is not detected properly. In the Filter Settings program, on the main page of "Regular Expression text files" you can set source and target encodings - you might try to test the encoding you know your files have.
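
If you are not sure which encoding the files have, a few lines of Python can give a first hint. This is a minimal sketch, not a reliable detector: `guess_encoding` is a made-up helper, it only looks for a byte order mark and then checks whether the bytes are valid UTF-8, and it does not consider UTF-32.

```python
import codecs

# Byte order marks checked by this sketch (UTF-32 is deliberately left out).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes) -> str:
    """Return a rough guess at the encoding of the raw bytes of a file."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        # Pure ASCII also decodes as UTF-8, so it is reported as "utf-8".
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Probably a single-byte code page such as Windows-1252; not determined here.
        return "unknown-8bit"

print(guess_encoding("Übersicht".encode("utf-8")))   # → utf-8
print(guess_encoding("Übersicht".encode("cp1252")))  # → unknown-8bit
```

For a real file, read it in binary mode, for example `guess_encoding(open("strings.trn", "rb").read())`, where `strings.trn` stands in for your own file name, and then try the matching source and target encodings in Filter Settings.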

Alternatively, you could send me one of the files and I'll have a look...


 

