Eliminating certain lines when translating TXT files
Thread poster: lingoneer

lingoneer  Identity Verified
Local time: 19:49
English to Finnish
+ ...
Jan 26, 2010

Dear all,

I am faced with the following dilemma.

Using SDLX 2007 Professional, I need to translate a big bunch of plain text (.TXT) files which will be converted into ITD files (one ITD per one TXT file).

The TXTs must be converted into ITDs. Due to the nature of the project, no other file conversion is possible. (The TXTs cannot be converted into Word or other format before import into ITDs).

All of the TXT files contain text that should not be translated (code) in odd lines (line numbers 1, 3, 5...) and translatable text in even lines (line numbers 2, 4, 6...). All odd lines begin with the same code (##).

Now, is there any way, based on the even/odd division or the ## beginning the code line, to exclude the odd lines from being imported into ITDs and import even lines only, so that I cannot see, in the ITDs, the lines containing code at all?

Help is much appreciated.

Kind regards,
Tuomas


 

Heinrich Pesch  Identity Verified
Finland
Local time: 19:49
Member (2003)
Finnish to German
+ ...
I don't think you can Jan 26, 2010

In Word you could format the lines as hidden, but when you save the file as txt, no formatting "sticks".
But it does not matter. When you open the itd in Editor all lines are copied to the right and you only translate every second line. Then you save the translation and all is well.

Regards
Heinrich


 

Jaroslaw Michalak  Identity Verified
Poland
Local time: 18:49
Member (2004)
English to Polish
No conversion Jan 26, 2010

Just to clarify: you cannot convert the txt files before import, because you need to deliver itd files to your client, is that right? I.e. you cannot deliver just translated txt files?

 

FarkasAndras
Local time: 18:49
English to Hungarian
+ ...
Can you delete them from the txt Jan 26, 2010

I'm not sure what you are trying to achieve here.
If deleting the lines with the ## (or just their content) is the goal, that's fairly easy to do. I would use sed. The rest of the file would be left identical, byte for byte.
Something like sed -e "s/^#.*$//" file1.txt > file2.txt should empty every line that starts with #, and if you want, you can follow it up with sed -e "/^$/d" file2.txt > file3.txt to remove all empty lines.
Sed for Windows here: http://gnuwin32.sourceforge.net/packages/sed.htm
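
For anyone who would rather not use sed, the same filtering can be sketched in Python. This is only a rough equivalent under a few assumptions: the function names and folder arguments are made up for illustration, the files are assumed to be UTF-8, and it drops lines starting with ## outright instead of emptying them first. Unlike the second sed command, it leaves lines that were already empty in place. It writes filtered copies to a separate folder and never touches the source files.

```python
from pathlib import Path

CODE_PREFIX = "##"  # marker described in the original post


def drop_code_lines(text: str, prefix: str = CODE_PREFIX) -> str:
    """Return text without the lines that start with prefix."""
    kept = [
        line
        for line in text.splitlines(keepends=True)  # keep original line endings
        if not line.startswith(prefix)
    ]
    return "".join(kept)


def convert_folder(src_dir: str, dst_dir: str, encoding: str = "utf-8") -> int:
    """Write a filtered copy of every .txt file in src_dir to dst_dir.

    The encoding default is an assumption; adjust it to the real files.
    Returns the number of files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        # newline="" switches off newline translation on read and write
        with open(path, encoding=encoding, newline="") as handle:
            text = handle.read()
        with open(dst / path.name, "w", encoding=encoding, newline="") as handle:
            handle.write(drop_code_lines(text))
        count += 1
    return count
```

Calling convert_folder("source_txt", "filtered_txt") with two hypothetical folder names would process a whole batch of files in one go.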

Of course, you could do the same in any number of ways, including opening the txt in MS Word and using its own search and replace before resaving as txt. Perhaps you are looking for some method that wouldn't change the source txt, and would just flag the lines as non-translatable during the import? I don't know SDLX well enough to be able to tell if such a feature exists, although I think it does.


 

lingoneer  Identity Verified
Local time: 19:49
English to Finnish
+ ...
TOPIC STARTER
Problem solved Jan 27, 2010

Dear all,

Thank you for your kind effort to help me out.

Heinrich, one way to solve the issue would be as you suggest, to leave the ## lines in the ITDs and then just skip them while translating. This, however, leaves margin for human error, which I'd like to eliminate.
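
If the ## lines do stay in the files, one way to narrow that margin is to compare the code lines of the source and the translated TXT afterwards. The following is a minimal sketch, not a feature of SDLX or Trados: the function name is hypothetical, and it assumes the translated file keeps the same line order as the source.

```python
def changed_code_lines(source: str, target: str, prefix: str = "##") -> list[int]:
    """Return 1-based numbers of code lines that differ in the target text."""
    src_lines = source.splitlines()
    tgt_lines = target.splitlines()
    problems = []
    for number, src in enumerate(src_lines, start=1):
        if not src.startswith(prefix):
            continue  # translatable line, expected to change
        # A code line missing from the target also counts as a problem.
        tgt = tgt_lines[number - 1] if number <= len(tgt_lines) else None
        if src != tgt:
            problems.append(number)
    return problems
```

An empty list means every code line survived untouched; any number in the list points at a line to recheck by hand.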

Jabberwock, I can just deliver translated TXT files, but would want to use a CAT tool to minimise the chance of human error.

FarkasAndras, I would rather not delete the ## lines from the TXTs, as there are a thousand files each with hundreds of lines to be processed and I am not too familiar with the sed command you mention.

However, I was able to solve the issue using Trados/TagEditor (not SDLX).

In Filter Settings, Regular Expression text files, I added *.txt in the Supported file formats field. In External Patterns, I included "##0" in Opening patterns and "##" in Closing patterns. Now, I can open the TXTs in TagEditor, and all lines beginning with ##, or ##0 rather, and ending in ## are protected and I cannot edit them.
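
To illustrate the idea behind the opening and closing patterns, here is a small Python sketch. How TagEditor evaluates these patterns internally is not described in this thread, so treat this purely as an illustration: the regular expression and the function name are assumptions built from the "##0" and "##" values above.

```python
import re

# Opening "##0" and closing "##", as in the post. The non-greedy .*? makes
# the match stop at the first closing marker.
PROTECTED = re.compile(r"##0.*?##")


def split_segments(line: str) -> list[tuple[str, bool]]:
    """Split a line into (text, is_protected) pairs."""
    segments = []
    pos = 0
    for match in PROTECTED.finditer(line):
        if match.start() > pos:
            # Text before the protected span stays editable.
            segments.append((line[pos:match.start()], False))
        segments.append((match.group(), True))
        pos = match.end()
    if pos < len(line):
        segments.append((line[pos:], False))
    return segments
```

A line made up entirely of code comes back as one protected segment, while an ordinary text line comes back as one editable segment.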


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 18:49
Member (2005)
English to Polish
+ ...
Possible in SDLX, too Jan 27, 2010

There is a setting in the SDLX Switchboard to ignore 'number-only' segments.

HTH

Piotr


 

Stefan de Boeck  Identity Verified
Belgium
Local time: 18:49
English to Dutch
+ ...
a note Jan 27, 2010

lingoneer wrote:
In Filter Settings, Regular Expression text files, I added *.txt in the Supported file formats field.

Good… But,
especially if you're not working on your own machine,
remember to undo this after you're done. Some DTP formats,
e.g. Ventura* and Pagemaker*, also use the .txt extension,
and they may (in my experience will) be opened as plaintext.

And there will be howls.

* Tagged


 

