Counting lines
Thread poster: Serge Driamov

Serge Driamov  Identity Verified
Belarus
Local time: 21:51
Member (2008)
English to Russian
+ ...
Aug 18, 2008

A client wanted me to charge a price per line, and I agreed. Now I am puzzled by the task of counting the lines: the document is a brochure with the text in landscape orientation. It has various figures with titles, legends and explanations, tables, etc. The text is in German.

[Edited at 2008-08-19 11:34]



KSL Berlin  Identity Verified
Portugal
Local time: 19:51
Member (2003)
German to English
+ ...
What's the problem? Aug 18, 2008

Did you agree on the standard definition of a line? Although most people I know use 55 characters including spaces, I've seen other variations such as 60 characters incl. spaces (Thieme), 50 characters without spaces and 50 characters with spaces.

If your text is mixed up in body text, chart objects, etc. and a bit complex to count, you can save yourself some effort by making a PDF then simply saving the text out of it and counting that. This is also a good way to check whether your counting by other methods is accurate (though you may have to remove repetition from the header and footer on each page of the PDF).
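For anyone who wants to script the clean-up step, here is a minimal sketch of the idea, not a finished tool. It assumes the text saved from the PDF marks page breaks with form-feed characters (some exporters do, others do not), and it treats any line that recurs on at least 80% of pages as a header or footer. The function names and the 80% threshold are my own illustrative choices.

```python
from collections import Counter


def strip_repeated_lines(text: str, min_share: float = 0.8) -> str:
    """Drop lines that recur on most pages (likely headers or footers).

    Assumption: pages are separated by form-feed characters. Adjust the
    split below if your PDF export marks page breaks differently.
    """
    pages = [p for p in text.split("\x0c") if p.strip()]
    if len(pages) < 2:
        return text  # a single page gives nothing to compare against
    # Count on how many pages each distinct non-empty line appears.
    seen_on = Counter()
    for page in pages:
        seen_on.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    repeated = {ln for ln, n in seen_on.items() if n / len(pages) >= min_share}
    kept = [
        ln
        for page in pages
        for ln in page.splitlines()
        if ln.strip() and ln.strip() not in repeated
    ]
    return "\n".join(kept)


def count_characters(text: str) -> int:
    """Count characters including spaces, ignoring line-break characters."""
    return sum(len(ln) for ln in text.splitlines())
```

Headers or footers that change from page to page (page numbers, for instance) will not be caught by this and would still need a manual check.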



Serge Driamov  Identity Verified
Belarus
Local time: 21:51
Member (2008)
English to Russian
+ ...
TOPIC STARTER
making PDF? Aug 18, 2008

Thank you, Kevin. I don't see, however, how a PDF can help here. What does it change if I convert the file into PDF and back? Actually, it came as a PDF initially. The tables and figures with their text pieces will remain. The header and the footer do not bother me anyway.
It's a pity I did not agree with the client on the number of characters in a line. It may be a good idea to do so now to clarify this point.



KSL Berlin  Identity Verified
Portugal
Local time: 19:51
Member (2003)
German to English
+ ...
PDF Aug 18, 2008

OK, I didn't know what your formats were, nor is it clear to me whether you intend to charge for source text lines or target text lines. If the latter, you'll have to make a PDF of your translation for my method.

The reason that I do this is that *assuming there is no bitmap text*, by saving a text file from the PDF all the text is captured in a way that is easily counted. If, on the other hand, you have a bunch of embedded Excel tables, PowerPoint objects, Visio drawings, charts or whatnot in your document, it is damned hard to get an accurate count any other way. I have seen many projects where the PMs screwed their own agencies by missing elements like this and based their offers to the end customer on too low a word or line count. Making a PDF and saving out the text file (or running an OCR on a bitmap document) is one way of doing a "sanity check" and making sure that you do not cheat yourself by overlooking text in your count.

[Edited at 2008-08-18 22:05]



Peter Linton  Identity Verified
Local time: 19:51
Member (2002)
Swedish to English
+ ...
PDF software essential Aug 18, 2008

Kevin Lossner's PDF suggestions, particularly for wordcounts, are excellent advice, and PDF software should be a basic tool in every translator's toolbox. Such software has certainly saved me much time and effort.


Astrid Elke Witte  Identity Verified
Germany
Local time: 20:51
Member (2002)
German to English
+ ...
You use PractiCount to count the lines Aug 18, 2008

It is normal to charge for source lines of 55 characters with spaces.

You have to have a Word document first, and it does not matter what it contains (such as tables). You let PractiCount count the lines. It calls them "Custom lines". It also counts the actual lines, so you have to make sure that you look at the figure for "custom lines". Actually, you have to go into the settings first and define your line as 55 characters with spaces.



Richard Bartholomew  Identity Verified
Germany
Local time: 20:51
Member (2007)
German to English
No. characters / (No. characters / line) Aug 19, 2008

You can get the number of characters in a Word document, with or without spaces, by left-clicking on the word count field in the lower left-hand corner of the window. I divide this number by the number of characters per line, which I've already published on my web site (55 with spaces). For example, 35,252 characters corresponds to 35,252 characters / (55 characters / line) = 641 lines, rounded to the nearest whole line. Finally, I multiply this number by the per-line rate, which I also publish on my web site, and I have a number for the invoice.
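The arithmetic can be sketched in a few lines of Python. This is only an illustration: the rate of 1.20 per line in the usage note is a made-up figure, and rounding to the nearest whole line is an assumption chosen to match the 641 in the example. Some agreements round to tenths of a line or do not round at all.

```python
from decimal import Decimal, ROUND_HALF_UP


def standard_lines(char_count: int, chars_per_line: int = 55) -> Decimal:
    """Convert a character count into standard lines.

    Assumption: the result is rounded to the nearest whole line.
    """
    lines = Decimal(char_count) / Decimal(chars_per_line)
    return lines.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def invoice_amount(
    char_count: int, rate_per_line: Decimal, chars_per_line: int = 55
) -> Decimal:
    """Multiply the standard-line count by the per-line rate (two decimals)."""
    lines = standard_lines(char_count, chars_per_line)
    return (lines * rate_per_line).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With these definitions, `standard_lines(35252)` gives 641, and `invoice_amount(35252, Decimal("1.20"))` gives 769.20 at the hypothetical rate.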

I find that if I publish details like how many characters per line, with or without spaces, per line charge, and so on, then I can use them as defaults for any details missing from my agreement with the agency. There shouldn't be any missing details, of course, but sometimes there are.



KSL Berlin  Identity Verified
Portugal
Local time: 19:51
Member (2003)
German to English
+ ...
PractiCount Aug 19, 2008

Astrid Elke Johnson wrote:
You have to have a Word document first, and it does not matter what it contains (such as tables).


I wasn't talking about ordinary tables in MS Word. Are you certain that PractiCount will include text from all the types of embedded objects one encounters in Word documents these days? There are often serious problems with counts I see from agencies, because these are skipped by the various methods they use.



Lawyer-Linguist  Identity Verified
Portugal
Local time: 19:51
Dutch to English
+ ...
Not everywhere ... Aug 19, 2008

Astrid Elke Johnson wrote:

It is normal to charge for source lines of 55 characters with spaces.



That may be the norm in Germany, Astrid, but it's not a universal standard. If it were, word/line count programs like PractiCount and Total Assistant wouldn't need to have various settings for lines.

Many Belgian agencies - for instance - work on 60 characters with spaces. I even have one agency client - incidentally in Germany - that works on 50 characters with spaces.

Before anything else, Dr_Serge needs to clarify the definition of a standard line with his client.
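To see how much the definition matters, here is a rough sketch that counts the same text under the four conventions mentioned in this thread. The dictionary of definitions and the function names are illustrative only, and rounding to one decimal place is an assumption.

```python
# Line definitions mentioned in this thread: (characters per line, count spaces?)
LINE_DEFINITIONS = {
    "55 incl. spaces": (55, True),
    "60 incl. spaces": (60, True),
    "50 incl. spaces": (50, True),
    "50 excl. spaces": (50, False),
}


def char_count(text: str, include_spaces: bool) -> int:
    """Count characters, ignoring line breaks; optionally ignore all whitespace."""
    if include_spaces:
        return sum(len(ln) for ln in text.splitlines())
    return sum(1 for ch in text if not ch.isspace())


def lines_by_definition(text: str) -> dict[str, float]:
    """Return the standard-line count of the text under each definition."""
    result = {}
    for label, (width, with_spaces) in LINE_DEFINITIONS.items():
        result[label] = round(char_count(text, with_spaces) / width, 1)
    return result
```

For a text of 3,300 characters including spaces (2,200 without), this gives 60, 55, 66 and 44 lines respectively, so the invoice can swing considerably depending on which convention the client has in mind.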

Have a good day everyone
Debs

[Edited at 2008-08-19 08:06]



Serge Driamov  Identity Verified
Belarus
Local time: 21:51
Member (2008)
English to Russian
+ ...
TOPIC STARTER
OCR and bitmap Aug 19, 2008

Many thanks to everyone taking part in the discussion. I have already started approaching my client to clarify the size of a line.
Special thanks to Kevin for his thorough way of counting. I am not sure, however, that I understand everything. Obviously, I am behind in software. What is OCR? How does a bitmap differ from other formats when it comes to counting characters, words and lines?

[Edited at 2008-08-19 09:50]



Astrid Elke Witte  Identity Verified
Germany
Local time: 20:51
Member (2002)
German to English
+ ...
Conversion software Aug 19, 2008

Dr_Serge wrote:

What is OCR?


You use it to convert .pdf documents into Word documents. ABBYY FineReader is the best known.

"OCR" stands for "optical character recognition".

[Edited at 2008-08-19 10:46]



Serge Driamov  Identity Verified
Belarus
Local time: 21:51
Member (2008)
English to Russian
+ ...
TOPIC STARTER
OCR Aug 19, 2008

Thanks Astrid. So, OCR stands for Fine Reader and its relatives?:)

FHvastija
Slovenia
Local time: 20:51
English to Slovenian
+ ...
Technically yes Aug 19, 2008

OCR is a process where an image is analyzed by an application and text is extracted from it. It is commonly used on locked PDFs that don't allow you to copy text, on screenshots, etc.

OCR apps became popular with the advent of home scanners, when everyone scanned documents into .tiff format and wanted to extract the text from them to avoid retyping it.

Right now I'd expect there are dozens of professional apps as well as shareware that can get the job done.



KSL Berlin  Identity Verified
Portugal
Local time: 19:51
Member (2003)
German to English
+ ...
FineReader etc. Aug 19, 2008

Dr_Serge wrote:
So, OCR stands for Fine Reader and its relatives?:)


OCR = Optical Character Recognition. FineReader, OmniPage and others are common applications that do this. If you get into the subject (very useful if you deal with PDF files and scans much), you might want to take a look at the "How To" tab on my profile and download the document called "Post-processing of OCR text files". It's a few years old and needs to be updated for a newer version of ABBYY FineReader (if I ever bother to upgrade from v7), but it gives an overview of things you can do to make your converted texts easier to work with if you use CAT tools. Many agencies do a miserable job in this regard, so I usually insist on doing these conversions myself or I charge them a fat premium for working with their defective documents.

But that is all far off the original topic. A good OCR tool like FineReader is very valuable, because it also allows you to do very fast counts of PDFs of ANY kind (except handwritten, of course), without the need to worry about embedded structures and text that might be skipped. Some of my customers use these tools just to do text counts to bid jobs for end customers and can't be bothered with the details of producing a good text for translation by OCR, which may take a lot of skill in some cases.

Peter Linton does nice presentations on this and other PDF subjects at ProZ events - if you ever get a chance to attend one where he is speaking, you won't regret spending time in that session. He's also a good person to ask about specific tools for odd cases with PDF. His experience in that regard is much broader than mine and a lot more up to date.



Victor Dewsbery  Identity Verified
Germany
Local time: 20:51
German to English
+ ...
Where PractiCount is blind Aug 19, 2008

Kevin Lossner wrote:
I wasn't talking about ordinary tables in MS Word. Are you certain that PractiCount will include text from all the types of embedded objects one encounters in Word documents these days? There are often serious problems with counts I see from agencies, because these are skipped by the various methods they use.


I just had an interesting instance in which the character count in PractiCount was way over the top. It was a PowerPoint file with embedded Excel tables, and each table was just a part of an underlying Excel file (it seems you can embed just a bit of the table, and leave the rest in there but unseen). Only the visible bits needed to be translated; the hidden bits were to be left untouched. But PractiCount does not "see" which bits are visible and which bits are hidden, so it counts the whole caboodle. In this case the PractiCount result was more than twice as much as it should have been (even though it missed a bit of text in a couple of graphics), and the PDF approach would have been far better.

BTW, this particular job fell through for other reasons, so the question is just academic for me at the moment. But the PDF hint (thanks Kevin and Peter) is now filed away in my brain for future use.

