smartCAT: match analysis
Thread poster: Chiara Foppa Pedretti

Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 01:30
English to Italian
Jun 13

Hello everyone,
I've just started using smartCAT (https://www.smartcat.ai/) and I'm pretty happy with its basic functions.
However, I can't find a way to run a match analysis on the texts I've uploaded.
Is it just me or is this function actually missing?

Thanks in advance!



Mikhail Zavidin
Ukraine
Local time: 02:30
English to Russian
Statistics tab Jun 13

You can get the analysis by clicking the Statistics tab in your project window.

Hope this helps.



Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 01:30
English to Italian
TOPIC STARTER
Yes! Jun 13

Thank you so much, Mikhail, that's it!
I'm afraid it's not totally reliable, since it counts at least 300 fewer words than Word does, but that's another story...
Thanks again!



Pavel Doronin
Local time: 03:30
Word count differences Jun 28

Dear Chiara,
Every tool has its own counting algorithm. Usually, the statistics are calculated in these steps:
1. Text extraction. Different tools may or may not extract the text from footers, headers, tables of contents and embedded objects. This affects the total number of words or characters. For example:
  • MS Word ignores header text, whereas SDL Trados and Smartcat include it in the word count.
  • MS Word does not include automatically generated page numbers in its statistics, while SDL Trados does.
  • MS Word counts the words in the table of contents as separate words, while SDL Trados and Smartcat do not (we believe this makes sense, since the table of contents is generated automatically from the titles and subtitles, which will be translated anyway; once the translation is completed, you just need to update the table of contents).
2. Text segmentation (splitting the document into sentences). This does not apply to MS Word. Here the approach may differ depending on:
  • What is considered a “segment”. For example, a line that contains only spaces is not treated as a segment by either Smartcat or Trados, so the spaces are not counted as characters. In MS Word, however, they are considered characters and included in the statistics.
  • Which characters (or combinations of characters, line breaks) are treated as segment delimiters. This may also affect the number of TM matches (when a Trados TM is used in a Smartcat document or a Smartcat TM is used in a Trados document).
3. Splitting segments into words. This can also work differently in different tools, and even in different versions of the same tool, as each uses its own algorithm. The differences may include:
  • MS Word does not treat apostrophes or slashes as word delimiters, while Trados and Smartcat do (so “Student’s Book” counts as 3 words there).
  • Trados 2011 does not count digits-only segments as containing any words, while Trados 2007 and MS Word do.
  • Dashes are treated as delimiters in Trados 2007, but not in the other tools.
  • MS Word counts the numbers in numbered lists as separate words, while Trados and Smartcat ignore them.
  • Character sequences such as ________ or ***** are treated as words in MS Word, but not by Trados and Smartcat.
  • PowerPoint statistics are a total mess.
  • And the list goes on.
4. Matches and repetitions. If two lines are almost identical and the only difference between them is a number, a tag or a certain kind of character, they are treated as repetitions. TM matches work in a similar way.
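
To make a couple of the points above concrete, here is a minimal Python sketch of how the choice of word delimiters and the treatment of numbers can change the statistics. It is only an illustration under simplified assumptions, not the actual algorithm of MS Word, Trados or Smartcat; the function names (count_words, repetition_key) and their parameters are hypothetical.

```python
import re


def count_words(text, extra_delimiters="", require_alnum=False):
    """Count words in text under configurable (hypothetical) rules.

    extra_delimiters: characters that split words in addition to whitespace.
    require_alnum: if True, tokens without any letter or digit are ignored.
    """
    # Build a character class made of whitespace plus the extra delimiters.
    pattern = r"[\s" + re.escape(extra_delimiters) + r"]+"
    tokens = [t for t in re.split(pattern, text) if t]
    if require_alnum:
        # Drop filler tokens such as "*****" or "________".
        tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]
    return len(tokens)


def repetition_key(segment):
    """Replace every run of digits with a placeholder, so that segments
    differing only in numbers produce the same key (a simplification)."""
    return re.sub(r"\d+", "<num>", segment).strip()


title = "Student’s Book"
# Whitespace only: the apostrophe stays inside the word.
print(count_words(title))
# Apostrophes and slashes also act as delimiters.
print(count_words(title, extra_delimiters="'’/"))

filler = "***** Notes ________"
print(count_words(filler))                      # filler runs counted as words
print(count_words(filler, require_alnum=True))  # filler runs ignored

# Two segments that differ only in their numbers share one key.
print(repetition_key("Page 3 of 10") == repetition_key("Page 4 of 12"))
```

With whitespace-only splitting, “Student’s Book” comes out as 2 words; once apostrophes are treated as delimiters it becomes 3. A few rules like these, applied across a long document, can easily add up to a difference of several hundred words.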


[Edited at 2017-06-28 18:00 GMT]



Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 01:30
English to Italian
TOPIC STARTER
Interesting Jun 29

Wow Pavel, thank you for this thorough explanation.
I've never needed to use CAT tools before, so that's all new to me.
However, after a couple of weeks, I can really confirm I love working with smartCAT, maybe even more after reading your post.



Iris Schmerda  Identity Verified
France
Local time: 01:30
Member (2016)
French to German
Can't find the statistics tab Jul 6

Hello,

Is it possible for me to see this match analysis if I am not the person who uploaded the texts?
The agency uploaded them, and I would like to check the statistics, but somehow I can't find them.

Thank you very much.



