smartCAT: match analysis
Thread poster: Chiara Foppa Pedretti
I've just started using smartCAT (https://www.smartcat.ai/) and I'm pretty happy with its basic functions.
However, I can't find a way to run a match analysis on the texts I've uploaded.
Is it just me or is this function actually missing?
Thanks in advance!
You can try clicking the Statistics tab in your project window to get the analysis.
Hope this helps.
Thank you so much, Mikhail, that's it!
I'm afraid it's not totally reliable, since it counts at least 300 fewer words than Word does, but that's another story...
Word count differences (Jun 28)
Every tool has its own algorithm. Statistics are usually calculated in the following steps:
Text extraction. Different tools may or may not extract text from footers, headers, tables of contents and embedded objects. This affects the total number of words or characters. For example:
MS Word ignores header text; however, SDL Trados and Smartcat include it in the word count.
MS Word does not include automatically generated page numbers in its statistics, while SDL Trados does.
MS Word counts the words in the table of contents as separate words, while SDL Trados and Smartcat do not. (We believe this makes sense: the table of contents is generated automatically from the titles and subtitles, which will be translated anyway, so once the translation is complete you just need to update it.)
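To illustrate how the extraction step alone can shift the totals, here is a minimal sketch. The document parts and the two "profiles" are hypothetical and only mirror the examples above (headers, automatic page numbers, table of contents); they are not the actual extraction rules of MS Word or SDL Trados.

```python
# Minimal sketch: the same document gives different word counts depending on
# which parts are extracted. Part names and profiles are hypothetical
# illustrations of the examples in the post, not real tool rules.

DOCUMENT = {
    "body": "Welcome to the course. Open your Student Book.",
    "header": "Course Handbook",
    "page_numbers": "1 2 3",
    "toc": "Introduction Lesson One",
}

PROFILES = {
    # As described above: MS Word skips headers and automatic page numbers
    # but counts the table of contents.
    "word_like": {"body", "toc"},
    # As described above: SDL Trados counts headers and page numbers
    # but not the table of contents.
    "trados_like": {"body", "header", "page_numbers"},
}


def count_words(document: dict[str, str], parts: set[str]) -> int:
    """Count whitespace-separated words in the selected parts only."""
    return sum(len(document[p].split()) for p in parts if p in document)


for name, parts in PROFILES.items():
    print(name, count_words(DOCUMENT, parts))
```

Even with an identical word-splitting rule, the two profiles disagree by two words here, purely because of what was extracted.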
Text segmentation (splitting the document into sentences). This step does not apply to MS Word. Approaches here differ depending on:
What is considered a “segment”: for example, neither Smartcat nor Trados treats a line containing only spaces as a segment, so those spaces are not counted as characters. In MS Word, however, they are counted as characters and included in the statistics.
Which characters (or combinations of characters, line breaks) are treated as segment delimiters: this may also affect the number of TM matches when a Trados TM is used in a Smartcat document or a Smartcat TM is used in a Trados document.
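Here is a rough sketch of the whitespace-only-line effect. It assumes a simple segmentation rule (a segment ends at a line break or after ., ! or ? followed by whitespace); real CAT tools use their own configurable rules. `chars_in_document` is only a crude stand-in for a Word-style "characters with spaces" count, not Word's actual algorithm.

```python
import re

# Assumed rule for illustration: split after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segments(text: str) -> list[str]:
    """Split text into segments; whitespace-only pieces are not segments."""
    result = []
    for line in text.split("\n"):  # line breaks always end a segment here
        for piece in SENTENCE_END.split(line):
            if piece.strip():  # a line of spaces produces no segment
                result.append(piece)
    return result


def chars_in_segments(text: str) -> int:
    """CAT-style character count: only characters inside segments."""
    return sum(len(s) for s in segments(text))


def chars_in_document(text: str) -> int:
    """Crude Word-style count: every character except line breaks."""
    return len(text.replace("\n", ""))


sample = "Open the file.\n     \nClose it."
print(segments(sample))           # ['Open the file.', 'Close it.']
print(chars_in_segments(sample))  # 23
print(chars_in_document(sample))  # 28
```

The five spaces on the blank line account for the whole difference. Changing which characters act as delimiters would change the segment list too, which is why TM matches can differ between tools.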
Splitting segments into words can also work differently in different tools, and even in different versions of the same tool, as each uses its own algorithm. The differences may include:
Apostrophes and slashes are not treated as word delimiters in MS Word, but they are in Trados and Smartcat (so “Student’s Book” counts as 3 words there).
Trados 2011 does not count any words in digit-only segments, while Trados 2007 and MS Word do.
Dashes are treated as delimiters in Trados 2007, but not in the other software.
MS Word counts numbers in numbered lists as separate words, while Trados and Smartcat ignore them.
Character sequences such as ________ or ***** are counted as words in MS Word, but not in Trados and Smartcat.
PowerPoint statistics are a total mess.
And the list goes on.
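A small sketch of two of the word-splitting differences listed above (apostrophes/slashes as delimiters, and symbol runs not counted). Both functions are hypothetical simplifications; they do not model digit-only segments, dashes or list numbering, and they are not the real algorithms of MS Word, Trados or Smartcat.

```python
import re


def word_like_count(segment: str) -> int:
    """Word-style sketch: every whitespace-separated token counts,
    including symbol runs such as '*****'."""
    return len(segment.split())


def cat_like_count(segment: str) -> int:
    """CAT-style sketch: apostrophes and slashes also split words, and
    tokens with no letters or digits (e.g. '________') are ignored."""
    tokens = re.split(r"[\s'’/]+", segment)
    return sum(1 for t in tokens if any(ch.isalnum() for ch in t))


for s in ["Student’s Book", "input/output", "Name: ________", "*****"]:
    print(f"{s!r}: word_like={word_like_count(s)}, cat_like={cat_like_count(s)}")
```

Over a long document, small per-segment differences like these add up, which helps explain gaps such as the 300 words mentioned earlier in the thread.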
Matches and repetitions: if two lines are almost identical and differ only by a number, a tag or a certain kind of character, they are treated as repetitions. TM matches work in a similar way.
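One way to picture repetition detection is to normalize segments before comparing them. The sketch below assumes that numbers and inline tags (written here as `<b>`, `</b>` or `{1}`) are replaced with placeholders. The normalization rules are hypothetical and do not cover the "certain kind of character" case mentioned above.

```python
import re
from collections import Counter

# Hypothetical placeholders: inline tags first, then numbers.
TAG = re.compile(r"<[^>]+>|\{\d+\}")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalise(segment: str) -> str:
    """Replace tags and numbers with placeholders and collapse whitespace."""
    segment = TAG.sub("<tag>", segment)
    segment = NUMBER.sub("<num>", segment)
    return " ".join(segment.split())


def count_repetitions(segs: list[str]) -> int:
    """Number of segments that repeat an earlier one after normalisation."""
    counts = Counter(normalise(s) for s in segs)
    return sum(n - 1 for n in counts.values())


segs = [
    "See page 12.",
    "See page 47.",          # differs only by a number
    "Press <b>Save</b>.",
    "Press <i>Save</i>.",    # differs only by tags
    "Close the file.",
]
print(count_repetitions(segs))  # 2
```

The same normalized key could also be used when looking up a TM, which is roughly what "works in a similar way" suggests.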
[Edited at 2017-06-28 18:00 GMT]
Wow, Pavel, thank you for this thorough explanation.
I've never needed to use CAT tools before, so that's all new to me.
However, after a couple of weeks, I can really confirm I love working with smartCAT, maybe even more after reading your post.
Iris Schmerda
French to German
Can't find the statistics tab (Jul 6)
Is it possible for me to see this match analysis if I am not the person who uploaded the texts?
The agency uploaded them, and I would like to check the statistics, but somehow I can't find them.
Thank you very much.