Nightly build: Prioritising of Terms by Properties
Thread poster: xxxwilhelm_zwo
xxxwilhelm_zwo
Netherlands
Local time: 23:42
German to Dutch
Aug 12, 2013

It is now possible to use properties (subject field, source, client etc.) to prioritise target terms in auto-assembling:

http://cafetran.wikidot.com/using-advanced-glossary-features#properties



Selcuk Akyuz  Identity Verified
Turkey
Local time: 01:42
Member (2006)
English to Turkish
Algorithms Aug 13, 2013

Assigning priorities may need a fine tuning, e.g.

1. Terms from the same project, client, subject
2. Terms from the same client or
2. Terms from the same subject
3. Terms from the same client but different project or
3. Terms from the same subject but different project

(I am not sure what 'source' means in the wiki, so it is not included in the above list.)

Which one is important for CafeTran, same client or same subject?
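The tiers above could be expressed roughly as follows. This is a minimal Python sketch, not CafeTran's actual algorithm: the `Term` fields and the `tier`/`rank` helpers are hypothetical, tier 2 is read as "same project" and tier 3 as "different project", and same client and same subject are deliberately weighted equally, since that is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class Term:
    target: str
    project: str = ""  # hypothetical property fields
    client: str = ""
    subject: str = ""


def tier(term, project, client, subject):
    """Lower tier = higher priority (one possible reading of the list above)."""
    same_project = term.project == project
    same_client = term.client == client
    same_subject = term.subject == subject
    if same_project and same_client and same_subject:
        return 1
    if same_client or same_subject:
        # client and subject count equally; only the project decides 2 vs 3
        return 2 if same_project else 3
    return 4  # neither client nor subject matches


def rank(terms, project, client, subject):
    # sorted() is stable, so terms in the same tier keep their glossary order
    return sorted(terms, key=lambda t: tier(t, project, client, subject))


candidates = [
    Term("A", project="P2", client="X", subject="law"),
    Term("B", project="P1", client="Y", subject="med"),
    Term("C", project="P1", client="X", subject="law"),
    Term("D", project="P1", client="X", subject="med"),
]
print([t.target for t in rank(candidates, "P1", "X", "law")])  # ['C', 'D', 'A', 'B']
```

Deciding whether client or subject should weigh more would only change the `tier` function; the ranking itself stays a plain sort.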

Is there a built-in subjects list, or can we store and reuse such a list? Same for clients: can we store our client list?



Selcuk Akyuz  Identity Verified
Turkey
Local time: 01:42
Member (2006)
English to Turkish
Glossary or Database Aug 13, 2013

Many terminology-related features have been added in recent builds. You can store terms in TMX (memory for terms), in a text file as a glossary, in a text file as a dictionary, and in an external database (H2, MySQL, Oracle 10g, HSQLDB 2.0, MS Access and Derby).

Many alternatives with different features! It is flexible but confusing. I prefer databases but would like to edit them easily as well.

Do we really need all those alternatives (memory for terms, glossary, dictionary, external database)?

Selcuk



Meta Arkadia
Local time: 05:42
English to Indonesian
Philosophy Aug 13, 2013

Selcuk Akyuz wrote:
Do we really need all those alternatives (memory for terms, glossary, dictionary, external database)?

No, we don't, Selçuk. We only need one, maybe two of those categories, I think. For people who know what they are doing (people who have a CAT tool "philosophy", I'd call it), what CafeTran has to offer is brilliant. You prefer "real" databases, as you do? Great, use real databases. Want TMX or tab-delimited text files? Go ahead, it's all there. But I'm pretty sure most users, both CAT tool newbies and long-time CAT users, don't really have a philosophy. They just choose what they are used to, or what the CAT tool manual suggests they use. And the latter happens to be one of CafeTran's problems. The only problem?

I've been a CAT tool user since 1997. Yes, Déjà Vu. A few years later, there was a kind of consensus among DV3 users on how to deal with databases: the MDB for segments, the TDB for general terms and phrases, and the Lexicon for project/client/subject-specific phrases. Good. That's clear.

I've been using CafeTran for almost exactly three years this month. It's a terrific tool, arguably the best tool around. But I'm sure I use less than half of the features CafeTran boasts, and I don't even understand a fair part of those I don't use. Now what if you're a CT newbie? A CAT tool newbie?

Cheers,

Hans

[Edited at 2013-08-13 03:44 GMT]


xxxwilhelm_zwo
Netherlands
Local time: 23:42
German to Dutch
TOPIC STARTER
Priorities Aug 13, 2013

Selcuk Akyuz wrote:

Assigning priorities may need a fine tuning, e.g.

1. Terms from the same project, client, subject
2. Terms from the same client or
2. Terms from the same subject
3. Terms from the same client but different project or
3. Terms from the same subject but different project



I have no idea, I'm afraid ...

Which one is important for CafeTran, same client or same subject?


It's just not such a good article (yet). The name of the property 'Source' is a little misleading, since there is also a field called 'Source term'.

Is there a built-in subjects list, or can we store and reuse such a list? Same for clients: can we store our client list?


There isn't. But it's a good idea!


xxxwilhelm_zwo
Netherlands
Local time: 23:42
German to Dutch
TOPIC STARTER
Pick what you like Aug 13, 2013

Selcuk Akyuz wrote:

Many terminology-related features have been added in recent builds. You can store terms in TMX (memory for terms), in a text file as a glossary, in a text file as a dictionary, and in an external database (H2, MySQL, Oracle 10g, HSQLDB 2.0, MS Access and Derby).


Not sure whether dictionaries belong here ...


Many alternatives with different features! It is flexible but confusing.


Yes, that is often the case in the real world.

I prefer databases but would like to edit them easily as well.


What tasks do you (want to) execute with databases, that you cannot perform with tab-delimited glossaries?

Do we really need all those alternatives (memory for terms, glossary, dictionary, external database)?


I'm not sure yet. Perhaps beyond a certain level of complexity (e.g. in situations like the ones described in your previous post), databases will have advantages over tab-delimited text glossaries. I'll investigate it, but it's very unlikely that I'll start using them for my daily work. I never used them in Déjà Vu either ...



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 22:42
Member (2009)
Dutch to English
built-in subjects/clients list for storing/reusing/keeping track of subjects/clients Aug 13, 2013

Selcuk Akyuz wrote:

Is there a built-in subjects list, or can we store and reuse such a list? Same for clients: can we store our client list?


Hi Selcuk,

I just sent an RFE to Igor about this. Good idea!

Michael


xxxwilhelm_zwo
Netherlands
Local time: 23:42
German to Dutch
TOPIC STARTER
Client codes Aug 13, 2013

The client code has to be linked to the client record in the Billing component, of course.
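As a rough illustration of such a link, here is a minimal Python sketch of one client list shared by glossary properties and billing. The `Client` and `ClientList` names, and the idea of rejecting unknown codes, are assumptions for illustration, not existing CafeTran features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    code: str  # short code stored as a glossary property (hypothetical)
    name: str  # full name as used on invoices (hypothetical)


class ClientList:
    """Hypothetical single client register, used by glossaries and billing alike."""

    def __init__(self):
        self._by_code = {}

    def add(self, client):
        self._by_code[client.code] = client

    def resolve(self, code):
        # Reject codes without a billing record, so a typo such as "ACEM"
        # cannot silently become a new "client" in the glossary.
        if code not in self._by_code:
            raise KeyError(f"unknown client code: {code}")
        return self._by_code[code]


clients = ClientList()
clients.add(Client("ACME", "Acme Translations Ltd."))
print(clients.resolve("ACME").name)  # Acme Translations Ltd.
```

Keeping one register means a glossary entry only stores the code, and the invoice data is looked up from the same place.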


Selcuk Akyuz  Identity Verified
Turkey
Local time: 01:42
Member (2006)
English to Turkish
lists Aug 13, 2013

Michael Beijer wrote:

Selcuk Akyuz wrote:

Is there a built-in subjects list, or can we store and reuse such a list? Same for clients: can we store our client list?


Hi Selcuk,

I just sent an RFE to Igor about this. Good idea!

Michael


Good idea for the clients list, but I am not sure about a built-in subjects list. In Déjà Vu X it is hierarchical, e.g.:

3 Social Sciences
  314 Demography
  316 Sociology
    3161 Object and scope of sociology

5 Natural Sciences
  57 Biological sciences in general
    572 Anthropology

6 Technology
  697 Heating, ventilation and air conditioning of buildings

It may not be a good list, and some translators delete it and create their own hierarchical lists, but the logic is: if the project's subject is 316 Sociology, then a term with subject code 314 will have priority over one with code 697.
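One simple way to express that logic, assuming subject codes are digit strings in which each extra digit narrows the field (as in the list above), is to rank candidates by how many leading digits their code shares with the project's code. This is a Python sketch; the helper names are hypothetical and it is not necessarily how Déjà Vu X implements it.

```python
def shared_depth(code_a: str, code_b: str) -> int:
    """Count how many leading digits two subject codes have in common."""
    depth = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        depth += 1
    return depth


def best_term(project_subject, candidates):
    """Pick the candidate whose subject code is closest to the project's.

    candidates: list of (target_term, subject_code) pairs. On a tie, max()
    returns the first candidate encountered, i.e. the glossary order wins.
    """
    term, _code = max(candidates, key=lambda c: shared_depth(project_subject, c[1]))
    return term


# Project subject 316 (Sociology): 314 shares "31", 697 shares nothing.
candidates = [("term tagged 697", "697"), ("term tagged 314", "314")]
print(best_term("316", candidates))  # term tagged 314
```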

Déjà Vu X uses client and subject metadata in both TBs and TMs. CafeTran currently uses them only in "Glossaries". So it is a good start, but copying only part of a feature is not sufficient.

Moreover, the algorithms for prioritizing terms (and TM matches) are important, as mentioned above. Nightly builds are good, but the different terminology resources are still confusing. This should be simplified by dropping "memories for terms" and "dictionaries". Glossaries in TXT format and external databases (only if supported by a built-in editor) should be sufficient.


wilhelm_zwo wrote:
What tasks do you (want to) execute with databases, that you cannot perform with tab-delimited glossaries?


Ease of find & replace operations in "columns". I know it is possible in text files as well, at least through Excel. Perhaps we only need the Glossaries.
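Column-restricted find & replace is also easy to script for a tab-delimited glossary, without Excel. A minimal Python sketch, assuming a plain UTF-8 file with one entry per line and columns separated by tabs; which column holds the target term, and the `replace_in_glossary_file` helper, are assumptions about your own setup.

```python
from pathlib import Path


def replace_in_column(tsv_text: str, column: int, old: str, new: str) -> str:
    """Replace `old` with `new` only in one 0-based column of tab-delimited text.

    Other columns are left untouched. Lines are split on newline characters
    and rejoined the same way, so a trailing newline is preserved.
    """
    out_lines = []
    for line in tsv_text.split("\n"):
        fields = line.split("\t")
        if column < len(fields):
            fields[column] = fields[column].replace(old, new)
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines)


def replace_in_glossary_file(path: str, column: int, old: str, new: str) -> None:
    """Hypothetical helper: rewrite a UTF-8 glossary file in place."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(replace_in_column(text, column, old, new), encoding="utf-8")


# Only the target column (index 1) changes; the source column keeps "client".
sample = "client\tclient\nclient code\tclientcode\n"
print(repr(replace_in_column(sample, 1, "client", "klant")))
```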


wilhelm_zwo wrote:
The client code has to be linked to the client record in the Billing component, of course.


Agreed.

Selcuk



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 22:42
Member (2009)
Dutch to English
overzichtelijkheid* Aug 13, 2013

Selcuk Akyuz wrote:

Good idea for the clients list, but I am not sure about a built-in subjects list. In Déjà Vu X it is hierarchical, e.g.:

3 Social Sciences
  314 Demography
  316 Sociology
    3161 Object and scope of sociology
      31619 Crazy abstract features of layered rhizomes

5 Natural Sciences
  57 Biological sciences in general
    572 Anthropology
      5728 Study of upside-down thought processes in blind mole rats

6 Technology
  697 Heating, ventilation and air conditioning of buildings
    6973 Something even more specific and useless

It may not be a good list, and some translators delete it and create their own hierarchical lists, but the logic is: if the project's subject is 316 Sociology, then a term with subject code 314 will have priority over one with code 697.


DVX's approach borders on the pathological, if you ask me. Who the hell actually uses the built-in list of subjects in DVX? In my opinion, hierarchical lists are for people with way too much time on their hands who have to touch doorknobs 4 times while blinking 8 times every time they enter or exit their living room.


Déjà Vu X uses client and subject metadata in both TBs and TMs. CafeTran currently uses them only in "Glossaries". So it is a good start, but copying only part of a feature is not sufficient.


Yes, the next step would be metadata-based prioritisation in TMs as well.

Nightly builds are good, but the different terminology resources are still confusing. This should be simplified by dropping "memories for terms" and "dictionaries". Glossaries in TXT format and external databases (only if supported by a built-in editor) should be sufficient.


I wouldn’t really want to remove something that certain users find useful, but I would probably also be in favour of dropping the ‘memories for terms’ (M4Ts). I also find the fact that there are 1. memories for terms, 2. glossaries, 3. external databases and 4. dictionaries rather confusing. It’s also not exactly overzichtelijk (http://en.wiktionary.org/wiki/overzichtelijk) to newbies. However, I would definitely, 100%, be in favour of getting rid of ‘dictionaries’.

One thing about the ‘memories for terms’, though, is that they have various functions that would be lost if they were removed.

E.g.,

1. There are currently two shortcuts to quickly add terms on the fly (to a glossary and to an M4T, respectively). If this were changed to two different shortcuts that send terms to two different glossaries, I might not care if the M4Ts were retired from service. Until then, I use an open M4T to send longer fragments to on the fly, while I send everything else (shorter terms) to my main glossary, adding subject/client info along the way.

2. I think Hans mentioned something, somewhere, about terminology QA that can presently only be done with M4Ts and not with Glossaries.

etc.

Michael

----------------------------
*overzichtelijkheid, de (v.)

clear / convenient arrangement, clear / convenient organization
surveyability

usage examples:
ter wille van de overzichtelijkheid = for easy reference, for convenience of comparison, for purposes of review
de overzichtelijkheid laat veel te wensen over = the arrangement / organization (of the material) leaves much to be desired; the arrangement / organization (of the material) is poor
----------------------------



Selcuk Akyuz  Identity Verified
Turkey
Local time: 01:42
Member (2006)
English to Turkish
continued Aug 13, 2013

Michael Beijer wrote:

DVX's approach borders on the pathological, if you ask me. Who the hell actually uses the built-in list of subjects in DVX? In my opinion, hierarchical lists are for people with way too much time on their hands who have to touch doorknobs 4 times while blinking 8 times every time they enter or exit their living room.



As I wrote, it may not be a good list, and some translators delete it and create their own hierarchical lists. I personally use 7 or 8 of these subjects.



Yes, the next step would be metadata-based prioritisation in TMs as well.


Another nightly build?

I wouldn’t really want to remove something that certain users find useful, but I would probably also be in favour of dropping the ‘memories for terms’ (M4Ts). I also find the fact that there are 1. memories for terms, 2. glossaries, 3. external databases and 4. dictionaries rather confusing. It’s also not exactly overzichtelijk (http://en.wiktionary.org/wiki/overzichtelijk) to newbies. However, I would definitely, 100%, be in favour of getting rid of ‘dictionaries’.


Such major changes should be made in new major versions, not in nightly builds, so that one can continue using version 2013 or upgrade to 2014. Recently, CafeTran has been copying features from other CAT tools (nothing wrong with that), but most of these new features are requested by new users of CafeTran, and I am not sure all users are happy with them.

Some want a DV3-like CT, others want a memoQ-like CT. All are welcome, but the GUI, menus, terminology resources etc. need major changes, which should not be made in nightly builds.

A new version should have much better file filters as well. Are there currently any import options for Word, Excel or PowerPoint files? No!


xxxwilhelm_zwo
Netherlands
Local time: 23:42
German to Dutch
TOPIC STARTER
No Copy CAT! Aug 13, 2013

Selcuk Akyuz wrote:

Recently, CafeTran has been copying features from other CAT tools (nothing wrong with that), but most of these new features are requested by new users of CafeTran, and I am not sure all users are happy with them.



Well, Igor surely isn't copying any CAT tool (he never looks at other CAT tools, which is a good thing IMO). When a new feature is implemented, it's at a user's request. It is quite possible (even very likely) that the feature has already been implemented elsewhere. Nothing new under the sun ...

About improvements in nightly builds: I think they are very productive but still minor, not justifying a new version number. If you are happy with your current version of CafeTran, by all means stick to it. If you want to profit from the latest improvements, ask for updates. Have a look at http://cafetran.wikidot.com/pre-release-version now and then. If there is something there that you need, ask Igor for an update. Otherwise, wait for the next official release.



Selcuk Akyuz  Identity Verified
Turkey
Local time: 01:42
Member (2006)
English to Turkish
no copycat Aug 13, 2013

I know what Igor said in the Google group: he does not know the features of other CAT tools. But it is us, the new users, who know the features of other CAT tools and ask for similar ones.


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 22:42
Member (2009)
Dutch to English
Progress relies on both copying & imagination. Aug 13, 2013

I have said this elsewhere and I will say it again.

Question: How do you create the best CAT tool ever (in the shortest time)?

Answer: You need to perform two different types of thinking simultaneously/synthetically, etc.:

1. Look at every other CAT tool on the market and try to identify all of their useful features. If you find a feature that your CAT is lacking, copy it into your own CAT tool, modifying/improving it wherever necessary. You can of course also copy pieces of features from different tools and combine them as you see fit.

2. Try to take a step back and look at your CAT tool with a fresh pair of eyes. This can happen either because you simply don’t know about any other CAT tools, or through imagination.

I think one of the reasons CT is so good is that Igor is good at 2., and his users are good at 1. and 2. The interplay between Igor and his users, and between these two modes of thought, has created a particularly fertile soil for the CafeTran plant to grow and evolve in.

Michael

[Edited at 2013-08-13 17:18 GMT]

