How to segment 15th to 17th century Arabic manuscripts for CAT use
Thread poster: Haytham Abulela

Haytham Abulela  Identity Verified
Canada
Local time: 18:43
Member (2008)
Arabic to English
+ ...
Mar 11, 2019

I have been translating Arabic alchemical manuscripts into English for a few years, and I want to use CAT tools to build a translation memory and glossary so that I can reuse my previous translations when similar text occurs. Many manuscripts contain quotations or relatively long allegories that may recur in other manuscripts, though not verbatim. The challenge is that these manuscripts have little punctuation, which makes segmentation hard. If I impose my own interpretation of segment length, I end up with either big chunks of text or small segments, and in both cases the benefit of a translation memory is lost if similar texts were segmented differently. In addition, subject-specific glossaries are rare and I have to consult several sources and thesauri to find equivalents, so I wish to compile my own glossary to ensure quality and consistency.

You may check this printed book from the Asiatic Society of Bengal, which compiles three of Ibn Umail's treatises and shows what an Arabic manuscript of the same period looks like: pahar.in/mountains/Journals/Asiatic%20Society%20of%20Bengal%201788-1921/Memoirs%20of%20Asiatic%20Society%20of%20Bengal/1933%20Memoirs%20of%20Asiatic%20Society%20of%20Bengal%20Vol%2012%20s.pdf It is a 228-page PDF file, but a quick look at pages 7-17 should be enough.
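
One way to make reuse less dependent on how each text was segmented is to search earlier source text with a sliding word window instead of comparing whole segments. The sketch below only illustrates that idea: `best_window` and its `slack` parameter are hypothetical names, it relies on Python's standard `difflib`, and it is not how any particular CAT tool computes fuzzy matches. The sample strings are placeholder tokens, not manuscript text.

```python
from difflib import SequenceMatcher


def best_window(query: str, corpus: str, slack: int = 2):
    """Return (score, text) for the corpus word window most similar to query.

    Windows of len(query) +/- slack words are compared word by word, so the
    result does not depend on where segment boundaries were placed earlier.
    """
    q = query.split()
    c = corpus.split()
    best = (0.0, "")
    for size in range(max(1, len(q) - slack), len(q) + slack + 1):
        # Slide a window of `size` words across the stored text.
        for start in range(0, max(1, len(c) - size + 1)):
            window = c[start:start + size]
            score = SequenceMatcher(None, q, window).ratio()
            if score > best[0]:
                best = (score, " ".join(window))
    return best


if __name__ == "__main__":
    # A non-verbatim "quotation": one word differs from the stored text.
    print(best_window("c d x f", "a b c d e f g h"))  # (0.75, 'c d e f')
```

This brute-force scan is slow on long texts; it is meant to show the principle, not to replace a translation memory.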


 

Suzanne Chabot
Local time: 20:43
Arabic to French
Split segments Sep 16, 2019

Hello

I am translating from Arabic into French and I have the same problems.

The solution I found is to work out the i'rab (grammatical parsing) of the sentence in my head, and when I understand that the 'wa' only serves to connect two sentences, I split the paragraph at that point, using the 'Split segment' functionality in SDL Trados Studio. I know many CAT tools enable you to do so.

This cannot be done automatically, because 'wa' has many meanings in Arabic and is also often attached to the first word of the sentence.
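
Although the decision itself has to stay manual, a script can at least list the places where a word begins with waw so the translator can review them one by one. The following is a minimal sketch under that assumption: `wa_candidates` and `min_len` are hypothetical names, and the function makes no attempt to tell a conjunction from a root letter.

```python
import re

# Harakat (U+064B-U+0652), dagger alef (U+0670) and tatweel (U+0640).
MARKS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
WAW = "\u0648"


def wa_candidates(text: str, min_len: int = 3):
    """Return (word_index, word) for each word that starts with waw.

    These are only candidates for a manual split: the waw may be a
    conjunction, an oath particle, or simply the first root letter.
    """
    hits = []
    for i, word in enumerate(text.split()):
        bare = MARKS.sub("", word)
        # The first word cannot be a split point, so it is skipped.
        if i > 0 and bare.startswith(WAW) and len(bare) >= min_len:
            hits.append((i, word))
    return hits


if __name__ == "__main__":
    # "water / and-fire / time": the last word is a false candidate,
    # because its waw belongs to the root.
    for index, word in wa_candidates("الماء والنار وقت"):
        print(index, word)
```

The false candidate in the example is exactly why the final choice has to be made by reading the sentence.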

Concerning term recognition, I have found no solution other than adding each term with all the different 'damîr' (pronoun suffixes) attached to it, the broken plural forms, and so on. These CAT tools are very poor at recognizing Arabic terms; I hope they will fix this problem one day. The problem becomes worse when there are 'harakat' (vowel marks) attached to a word, which is very frequent when you translate classical Arabic documents.
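
Outside the CAT tool, part of this can be approximated by stripping the harakat and peeling off one common prefix and one pronoun suffix before looking a word up. The sketch below is an illustration only: `lookup_keys`, `find_term`, the prefix and suffix lists, and the two-entry glossary are all assumptions made for the example, and broken plurals are not handled at all.

```python
import re

# Harakat (U+064B-U+0652), dagger alef (U+0670) and tatweel (U+0640).
HARAKAT = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Small, incomplete lists chosen for illustration.
PREFIXES = ("وال", "بال", "فال", "كال", "لل", "ال", "و", "ف", "ب", "ل", "ك")
SUFFIXES = ("هما", "كما", "هم", "هن", "كم", "نا", "ها", "ه", "ك", "ي")


def lookup_keys(word: str, min_stem: int = 3):
    """Return the bare word plus variants with one prefix and/or suffix removed."""
    bare = HARAKAT.sub("", word)
    stems = {bare}
    for prefix in PREFIXES:
        if bare.startswith(prefix) and len(bare) - len(prefix) >= min_stem:
            stems.add(bare[len(prefix):])
    for stem in list(stems):
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stems.add(stem[:-len(suffix)])
    return stems


def find_term(word: str, glossary: dict):
    """Try the longest candidate key first; return None when nothing matches."""
    for key in sorted(lookup_keys(word), key=lambda k: (-len(k), k)):
        if key in glossary:
            return glossary[key]
    return None


if __name__ == "__main__":
    glossary = {"كبريت": "sulphur", "ملح": "salt"}  # made-up sample entries
    print(find_term("والكبريت", glossary))
    print(find_term("ملحها", glossary))
```

Blind stripping also produces wrong stems when the first or last letters belong to the root, so any hit should be treated as a suggestion.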

Wish you all the best. Baraka Allahu fik.


 

Haytham Abulela  Identity Verified
Canada
Local time: 18:43
Member (2008)
Arabic to English
+ ...
TOPIC STARTER
Solution found Oct 22

I have been trying to find a way out of this dilemma, and fortunately I found one.

Since I type the manuscripts into an MS Word file to avoid relying on scans that are at times blurry or blotted with stains, the digital text is available for use. I thought of highlighting words in MS Word to get a visual indication that a word is already in my glossary, which is kept in an MS Excel spreadsheet. Designing a macro to highlight words in MS Word is easy, but macro code does not handle non-Latin characters well. This makes adding Arabic words to the macro itself impractical, since editing or updating them there becomes impossible or requires extra steps that complicate the process beyond practical use. So I concluded that the only viable option was a macro that highlights words in MS Word files but reads the words from an MS Excel spreadsheet. I posted the task on Freelancer.com, hoping that such code was possible. Fortunately, I found someone who wrote an MS Excel macro that reads the words from the glossary itself, by turning it into a macro-enabled spreadsheet with an action button. Clicking this button opens an "Open file" dialogue box where the user chooses the MS Word file to process; the macro then opens that file and highlights words until the list in the spreadsheet is exhausted. After that you have the highlights you want and can save the MS Word file. Because it is better for the matching to ignore diacritics, hamzas, and final yaa, and not to insist on whole-word matches, the result contains false positives, so a manual review is needed to remove wrong highlights.
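
For readers who want to try the matching idea without Word or Excel, here is a rough plain-text sketch in Python. It is not the macro described above: `normalize_with_map` and `find_glossary_hits` are hypothetical names, the folding table is one possible reading of "ignore diacritics, hamzas, final yaa", and reading the spreadsheet or writing highlights back into a Word file is left out.

```python
import re

# Characters ignored entirely: harakat, dagger alef, tatweel.
SKIP = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Letter variants folded to one form before comparison.
FOLD = str.maketrans({
    "\u0622": "\u0627",  # alef with madda  -> alef
    "\u0623": "\u0627",  # alef hamza above -> alef
    "\u0625": "\u0627",  # alef hamza below -> alef
    "\u0624": "\u0648",  # waw with hamza   -> waw
    "\u0626": "\u064A",  # yaa with hamza   -> yaa
    "\u0649": "\u064A",  # alef maqsura     -> yaa
})


def normalize_with_map(text: str):
    """Return (normalized, index_map); index_map[i] is the original offset."""
    out, index_map = [], []
    for pos, ch in enumerate(text):
        if SKIP.match(ch):
            continue
        out.append(ch.translate(FOLD))
        index_map.append(pos)
    return "".join(out), index_map


def find_glossary_hits(text: str, terms):
    """Return sorted (start, end, term) spans in the ORIGINAL text.

    Matching is by substring, not whole word, so attached prefixes are
    tolerated and false positives are expected.
    """
    norm_text, index_map = normalize_with_map(text)
    hits = []
    for term in terms:
        norm_term, _ = normalize_with_map(term)
        if not norm_term:
            continue
        start = norm_text.find(norm_term)
        while start != -1:
            last = start + len(norm_term) - 1
            hits.append((index_map[start], index_map[last] + 1, term))
            start = norm_text.find(norm_term, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    sample = "الكِبْرِيتُ والملح"
    for start, end, term in find_glossary_hits(sample, ["كبريت", "ملح"]):
        print(start, end, sample[start:end], "<-", term)
```

The returned offsets point into the original, fully vocalized text, so they could be handed to whatever does the actual highlighting. A short term will also match inside longer unrelated words, which is the kind of wrong highlight the manual review has to catch.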

I hope you find this workaround useful.


 

