ProZ.com global directory of translation services
Thread poster: Nurzhan Nagashbekov
Counting words in a txt file within quotation marks

Nurzhan Nagashbekov  Identity Verified
Kazakhstan
Local time: 08:24
Member (2008)
English to Kazakh
Feb 9

Hello fellow translators,

I have a txt file with software strings in it to be localized. It looks like:

#command some text "text to be localized" // comment

I want to count the words within quotation marks. Is there any way to do it, except manual counting?

I tried importing the txt file into MS Excel, but it seems the file is not consistently delimited, so the words I need may end up in different columns.

Any help will be much appreciated.



Philip Lees  Identity Verified
Greece
Local time: 05:24
Member (2008)
Greek to English
A job for Perl Feb 9

Give the file to somebody you know who uses the Perl programming language, and ask them to run this:

perl -i.bak -pe "s/^.+?\"//; s/\".+$//" yourfilename

That will remove everything from your file except the parts in quotes (the original file will be renamed as yourfilename.bak). You can then count the words in the new file.

This assumes that all the lines have the same format.
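
For anyone without Perl at hand, the same idea (keep only the quoted part, then count its words) can be sketched in Python. This is a minimal sketch, assuming each line holds at most one quoted string and no escaped quotes; the helper name `count_quoted_words` and the sample lines are made up for illustration.

```python
def count_quoted_words(lines):
    """Count words in the first "..." span of each line (hypothetical helper)."""
    total = 0
    for line in lines:
        start = line.find('"')
        if start == -1:
            continue  # no quotation mark: nothing to count on this line
        end = line.find('"', start + 1)
        if end == -1:
            continue  # unbalanced quote: skip rather than guess
        quoted = line[start + 1:end]
        total += len(quoted.split())  # whitespace-separated words
    return total


sample = [
    '#command some text "text to be localized" // comment',
    '// a comment-only line',
    '#command other "second string"',
]
print(count_quoted_words(sample))  # prints 6
```

Unlike the in-place edit, this leaves the original file untouched and only reports a total.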



Nurzhan Nagashbekov  Identity Verified
Kazakhstan
Local time: 08:24
Member (2008)
English to Kazakh
TOPIC STARTER
Some strings are different Feb 9

Thanks Philip,

Unfortunately, some lines contain only comments and there are lines that contain #command... and "text" but no comments.

I was able to count the number of quotation marks in Excel using the COUNTIF function, but that was useless, since some lines have whole sentences in quotation marks.



Amit Evron  Identity Verified
Brazil
Local time: 23:24
Partial member (2011)
Spanish to English
Send it over Feb 9

If it's not confidential and if the file isn't too big, feel free to send it over and I'll write a quick perl script. Shouldn't take more than 5 minutes. Just send me a message through Proz and I'll reply with my e-mail address.


Tony M  Identity Verified
France
Local time: 04:24
Member
French to English
Paste into Word Feb 9

Haven't tested it, but why not try this:

Select all your text and paste it into Word (etc.)

Do a 'replace all' on the " (careful to get the right character!), replacing with (say) Tab

Select all and convert text to table, using the character you replaced above (e.g. Tab) as the delimiter.

This should enable you to get a column that just has your text to be translated in, and you can take it from there

If you have any lines with no " " at all, they should just appear all in the first column.

Theoretically at least, you ought to be able to reverse the process at the end...

One proviso: one has to assume that each line does end with a Return character or similar; if necessary, you might need to go through and replace whatever the end-of-line delimiter is with something that will work in Word for the conversion to table.
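
The column-splitting idea can also be sketched outside Word. The Python sketch below splits each line on the quote character, much as the text-to-table conversion would, and keeps the fields that fall between a pair of quotes. It assumes plain `"` characters and no escaped quotes; the function name `quoted_columns` is hypothetical.

```python
def quoted_columns(line, quote='"'):
    """Split a line on the quote character and return the fields
    that sit between an opening and a closing quote."""
    fields = line.rstrip("\r\n").split(quote)
    # fields[0] is before the first quote, fields[1] is inside the first
    # pair, fields[2] is between pairs, and so on.
    inside = fields[1::2]
    if len(fields) % 2 == 0:
        # An even field count means an odd number of quotes, so the last
        # "inside" field was never closed; drop it.
        inside = inside[:-1]
    return inside


print(quoted_columns('#command some text "text to be localized" // comment'))
print(quoted_columns('// a line with no quotes at all'))
```

A line with no quotes yields an empty list, which matches the case where everything lands in the first column.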



Philip Lees  Identity Verified
Greece
Local time: 05:24
Member (2008)
Greek to English
Should still work Feb 9


Nurzhan Nagashbekov wrote:
Unfortunately, some lines contain only comments and there are lines that contain #command... and "text" but no comments.


I think my script should still work with a small modification (for the comment only lines), but as Amit has kindly offered to take it on I'm happy to hand over to him.



Nurzhan Nagashbekov  Identity Verified
Kazakhstan
Local time: 08:24
Member (2008)
English to Kazakh
TOPIC STARTER
Thanks for suggestions! Feb 9


Amit Evron wrote:

If it's not confidential and if the file isn't too big, feel free to send it over and I'll write a quick perl script. Shouldn't take more than 5 minutes. Just send me a message through Proz and I'll reply with my e-mail address.


It is confidential



Nurzhan Nagashbekov  Identity Verified
Kazakhstan
Local time: 08:24
Member (2008)
English to Kazakh
TOPIC STARTER
This may work... Feb 9


Tony M wrote:

Haven't tested it, but why not try this:

Select all your text and paste it into Word (etc.)

Do a 'replace all' on the " ....



Thanks Tony, I will try your method.



Jabberwock  Identity Verified
Poland
Local time: 04:24
Member (2004)
English to Polish
Okapi Rainbow Feb 9

I think the best option would be to use Okapi Rainbow, especially if you expect more such work from the client. Basically, it would allow you to extract the text you require (using regular expressions) and then calculate the word count.

Trados 2007 also has an option to import text based on regular expressions. You have to use a separate application, Filter Settings, for this. After the import you just analyze the resulting ttx file as usual.

I realize that having to learn regular expressions might seem daunting, but if you plan to translate such texts it will be a sensible investment of your time...


FarkasAndras
Hungary
Local time: 04:24
English to Hungarian
CAT Feb 9

I fervently hope that you'll be using a CAT for this job. The localization of SW strings requires strict formatting consistency and there are a lot of repetitions etc., so it's really not the job you'd want to do by typing over the original.
Now, if you do use a CAT, just do the word count there.
Studio has the required capabilities (i.e. you can specify regex rules that separate the translatable text from the rest), and the Studio package also comes with a specialized software localization tool (Passolo). Of course there are lots of other tools that'll work, too.

The more interesting question is: who is in charge of this project? Isn't there a PM/client who sorts these things out before you get involved?



Philip Lees  Identity Verified
Greece
Local time: 05:24
Member (2008)
Greek to English
Try this Feb 9

I had a few minutes to spare, so I set up this:

http://quote.writewords.eu/

If you paste your text in the box and click Submit, it should return you only the stuff that's between quotes.


FarkasAndras
Hungary
Local time: 04:24
English to Hungarian
perl regex Feb 9


Philip Lees wrote:

Give the file to somebody you know who uses the Perl programming language, and ask them to run this:

perl -i.bak -pe "s/^.+?\"//; s/\".+$//" yourfilename

That will remove everything from your file except the parts in quotes (the original file will be renamed as yourfilename.bak). You can then count the words in the new file.

This assumes that all the lines have the same format.


It also assumes that there is only one pair of quotes in one line and that there are no escaped quotes inside quoted strings. It'll fail with lines like this:
StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse"
And it doesn't skip lines that have no translatable content at all.

Also, .+? is better written as .* and the " may very well be the last character on the line so .+$// should be .*$//.

So, I'd rewrite your one-liner as:
perl -i.bak -pe "s/^.*\"(.*)\".*$/$1/" yourfilename

...but this still doesn't handle the problem cases I mentioned above.
You could do this (untested) to delete lines that don't contain any quoted string:

perl -i.bak -ne "print if s/^.*\"(.*)\".*$/$1/" yourfilename

... but the bottom line is, it's still only usable if the input file is "simple". You could add negative lookahead/lookbehind to cater for escaped quotes inside the quoted strings etc. to make it work and then somehow adapt it for multiple strings per line, but it starts to get tricky there, and you need to see the input file (or know its spec) to take a reasonable stab at solving the problem.
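
To make the escaped-quote and multiple-strings cases concrete, here is a minimal Python sketch. It assumes quotes inside strings are escaped with a single backslash, which is only a guess about the real file format; the names `QUOTED` and `extract_strings` are made up for this example.

```python
import re

# A quoted string: an opening ", then any run of backslash-escaped
# characters or characters that are neither " nor \, then a closing ".
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')


def extract_strings(line):
    """Return every quoted string on the line, turning backslash-quote
    pairs back into plain quotes."""
    return [m.replace('\\"', '"') for m in QUOTED.findall(line)]


line = r'StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse"'
strings = extract_strings(line)
print(strings)
print(sum(len(s.split()) for s in strings))  # prints 9
```

Lines without any quoted string simply yield an empty list, so they add nothing to the count.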

[Edited at 2012-02-09 10:54 GMT]



Nurzhan Nagashbekov  Identity Verified
Kazakhstan
Local time: 08:24
Member (2008)
English to Kazakh
TOPIC STARTER
Initial stage of the project Feb 9

I am at the very beginning of the project and just wanted to know what the word count is for now. I will definitely try regex. Thanks!


Philip Lees  Identity Verified
Greece
Local time: 05:24
Member (2008)
Greek to English
Nobody's perfect Feb 9


FarkasAndras wrote:

It also assumes that there is only one pair of quotes in one line and that there are no escaped quotes inside quoted strings. It'll fail with lines like this:
StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse"



Oh, sure, it breaks in lots of cases, as does the simpler match I used on the web version:

/"(.+?)"/

I am well aware of the pitfalls of text parsing, which is why I added the caveat about all lines having the same format as the example provided.
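
A short Python sketch shows where the simple match holds and where it breaks. The sample lines are illustrative only, and the pattern is the same non-greedy expression written with Python's `re` module.

```python
import re

SIMPLE = re.compile(r'"(.+?)"')  # the non-greedy pattern quoted above

plain = '#command some text "text to be localized" // comment'
escaped = r'text:"Press the \"Browse\" button to pick a file"'

# On a plain line the match is the whole quoted string.
print(SIMPLE.search(plain).group(1))    # prints: text to be localized

# With an escaped quote inside the string, the match stops too early.
print(SIMPLE.search(escaped).group(1))  # prints: Press the \ (cut off at the escaped quote)
```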

As this is not a Perl or a regex forum, I'll leave it at that.



Ambrose Li  Identity Verified
Canada
Local time: 22:24
Member (2011)
Chinese to English
Simpler Perl code Feb 9

I think this single line of perl should suffice:

perl -nle 'print $1 if /#command\s+"([^"]*)"/'

This assumes that double quotation marks can’t occur inside the pair of double quotation marks that delimits the string to be translated. Usually this is not the case and (assuming that " is escaped with a single backslash) the Perl needed will more likely be

perl -nle 'print $1 if /#command\s+"((?:\\"|[^"])*)"/'

Of course, if escaping of quotation marks occurs but is not signalled by backslashes then the perl code needed will be different.

ETA: The above assumes that continuations don’t occur. If continuations do occur the above won’t work and one-liner solutions might not be sufficient…
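
For a word count rather than a printout, the second pattern can be carried over to Python roughly as follows. This is a sketch under the same assumptions (backslash-escaped quotes, no continuation lines). Note that the pattern as written expects the quoted string to follow `#command` and whitespace directly. The names `COMMAND_STRING` and `word_count` and the sample lines are invented for illustration.

```python
import re

# Python spelling of the second one-liner's pattern: "#command", whitespace,
# then a quoted string in which \" counts as part of the text.
COMMAND_STRING = re.compile(r'#command\s+"((?:\\"|[^"])*)"')


def word_count(lines):
    """Sum whitespace-separated words over every matching line."""
    total = 0
    for line in lines:
        match = COMMAND_STRING.search(line)
        if match:
            total += len(match.group(1).split())
    return total


sample = [
    '#command "Open file"',
    r'#command "Press \"OK\" to continue" // note',
    '// comment only',
]
print(word_count(sample))  # prints 6
```

If the real lines carry other tokens between `#command` and the opening quote, the pattern would need to allow for them.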

[Edited at 2012-02-09 19:03 GMT]

