How to convert streaming text in Word file into columned text
Thread poster: ahmadwadan.com

ahmadwadan.com  Identity Verified
Kuwait
Local time: 22:06
English to Arabic
+ ...
Oct 20, 2008

Peace be upon you,

Let's say that you have a Word file formatted as follows:

Term1 and definition of Term 1 in the same line, then Paragraph Character.
Term2 and definition of Term 2 in the same line, then Paragraph Character.

And so on. The question is: how can I convert this text into a two-column table where the term is on the left and the definition is on the right (as if it were an Excel file)?

Notes:

- There are no blank lines between terms.
- Inserting special characters manually is possible.
- Some terms take more than 1 line without containing special characters.
- Many terms consist of more than 1 word.

Thank you



[Edited at 2008-10-20 09:22]



Samuel Murray  Identity Verified
Netherlands
Local time: 21:06
Member (2006)
English to Afrikaans
+ ...
You have little choice, I think Oct 20, 2008

Ahmad Wadan wrote:
- There are no blank lines between terms.


That is actually a good thing.

- Inserting special characters manually is possible.


I'm afraid that is what you're going to have to do.

- Some terms take more than 1 line without containing special characters.


You'll have to ensure that no term+definition stretches over more than one paragraph (in other words, there must not be a paragraph mark in the middle of such a "line").

- Many terms consist of more than 1 word.


Yes, that is why manual work is your fate.

==

Essentially you're going to use your mouse and the TAB key. Click at the end of the term and press TAB (preferably press TAB twice, just to make sure). Then use Find/Replace to turn double tabs into single tabs and to remove spaces on either side of tabs. Then select the entire text, go to Table -> Convert -> Text to Table, and select tab as the delimiter.
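
For anyone who would rather do the clean-up step outside Word, here is a minimal Python sketch of the same idea. It assumes the text has been saved as plain text with the tabs already inserted by hand; the function name and the sample lines are made up for illustration. It collapses repeated tabs, drops the spaces around them, and splits each line into a term and a definition.

```python
import re

def tabbed_lines_to_rows(text):
    """Turn tab-marked lines into (term, definition) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Collapse any run of tabs, plus the spaces around it, into one tab.
        line = re.sub(r"[ ]*\t[ \t]*", "\t", line)
        # Split at the first tab: left is the term, right is the definition.
        term, _, definition = line.partition("\t")
        rows.append((term.strip(), definition.strip()))
    return rows

# Hypothetical sample: double tabs and stray spaces, as typed by hand.
sample = "hard disk \t\t a storage device\nCPU\t\tcentral processing unit\n"
rows = tabbed_lines_to_rows(sample)
```

Each pair in `rows` corresponds to one row of the two-column table that Text to Table would produce.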



Renée van Bijsterveld  Identity Verified
Netherlands
Local time: 21:06
Member (2007)
English to Dutch
+ ...
convert to table Oct 20, 2008

Replace the space (or other character, like a period) between each term and its definition with a tab.
Select all text.
Choose Table > Convert > Text to Table.
Select the number of columns.
Check Tabs as the separator.
Choose OK.



ahmadwadan.com  Identity Verified
Kuwait
Local time: 22:06
English to Arabic
+ ...
TOPIC STARTER
Thank you Oct 20, 2008

Thank you so much Samuel & ReneevB.

That's brilliant, and it agrees with my previous experience. However, I wish there were a way to avoid the donkey work, since I'll have to insert more than 3000 tabs (or whatever) manually!

Kind regards



Antonín Otáhal
Local time: 21:06
Member (2005)
English to Czech
+ ...
It depends Oct 20, 2008

Is the term in bold and the definition not in bold?
If yes, there is some hope for automated processing. Or is there any other "sign" by which you recognise the end of the term and beginning of the definition?

Antonin


xxxmediamatrix
Local time: 15:06
Spanish to English
+ ...
Exploit the statistics of the source file - reduce the time and effort Oct 20, 2008

The tedium of the job can be reduced significantly by first inspecting the data and deciding the most-common number of words in a term (1, 2, 3 ... probably not more than 4 or 5); call this number n.

- Convert the entire text to a table, splitting the columns at the nth space only. Note: the standard Word text-to-table tool won't do that - you'll have to write a macro (or, better still IMHO, transfer the data to Access which makes this sort of thing soooooo much easier).

- Add a single extra column to the table.
- Go through the table, typing an 'x' in the new column against all the terms/definitions that were split correctly.
- Sort the table on the 'x' field.
- Select all rows with 'x', copy, paste to a new table.
- Delete those same rows from your 'working table'.

If the statistics are in your favour - and you correctly assessed the most-common term length - you have probably extracted well over half the terms.

- Repeat the above process with the second most-common term length.

Depending on the structure of your data you may quickly get to a stage where the law of diminishing returns kicks in and it's quicker to split them manually.
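
The "split at the nth space" step could be sketched as follows in Python, rather than in a Word macro or Access. This is only an illustration: the function name, the sample lines and the value of n are assumptions, and the results still need the manual check described above.

```python
def split_at_nth_space(line, n):
    """Split a line into (term, definition) at the n-th space.

    Returns None when the line has too few words to leave a definition.
    """
    words = line.split(" ")
    if len(words) <= n:
        return None
    return " ".join(words[:n]), " ".join(words[n:])

# Hypothetical sample lines; n is the assumed most-common term length.
lines = ["hard disk a storage device", "CPU central processing unit"]
n = 2
candidates = [split_at_nth_space(line, n) for line in lines]
```

With n = 2 the first line is split correctly and the second is not ("CPU central" is not the term), which is exactly the kind of row that stays in the working table for the next pass.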

And if you are likely to have to repeat this sort of exercise very often, it may be worth writing a Word macro to automate the entire process (except checking off the correctly-split terms, of course). I always handle this kind of problem in Access, but that's only because it provides easy-to-use tools to analyse and split your text lines; however, the same result can be achieved using a Word macro, or in Excel.

Final thought: If you need to restore a specific term-order after this processing (other than ordinary alphabetical order of the terms), then you should first add a column with serial ID numbers (an 'autonumber' field in Access). When the text manipulation is finished you can then sort the final table using this number to restore the original order. Again, this is very easily done in Access but requires some macro skills in Word.
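
A small Python sketch of the serial-ID idea, with made-up sample rows: number the rows before any processing, then sort on that number at the end to get the original order back.

```python
# Hypothetical (term, definition) rows in their original order.
rows = [("zebra", "an animal"), ("apple", "a fruit"), ("mango", "a fruit")]

# Add a serial ID column before doing anything else.
numbered = [(i, term, definition)
            for i, (term, definition) in enumerate(rows, start=1)]

# Any processing that reorders the rows, here an alphabetical sort by term.
shuffled = sorted(numbered, key=lambda row: row[1])

# Sorting on the serial ID restores the original order.
restored = sorted(shuffled, key=lambda row: row[0])
```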

MediaMatrix



ahmadwadan.com  Identity Verified
Kuwait
Local time: 22:06
English to Arabic
+ ...
TOPIC STARTER
Yes Oct 20, 2008

Antonín Otáhal wrote:

Is the term in bold and the definition not in bold?
If yes, there is some hope for automated processing. Or is there any other "sign" by which you recognise the end of the term and beginning of the definition?

Antonin


Yes, the term is bold and the definition is not bold. However, the bold term contains more than 1 bold word.

What do you think?

Thank you

[Edited at 2008-10-20 13:17]



Antonín Otáhal
Local time: 21:06
Member (2005)
English to Czech
+ ...
OK then Oct 20, 2008

Go to Edit > Replace, tick "Use wildcards", and leave out the quotation marks below:

In the first round, put in the find field: "(<*>)" with bold font formatting, and in the replace field: "\1^t"


In the second round, put in the find field: "(<*>)^t (<*>)" with bold font formatting, and in the replace field: "\1 \2"
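
The same bold/not-bold boundary can also be used outside Word. The Python sketch below is not the wildcard replace itself but a different way of getting the same result. It assumes each paragraph has already been read into a list of (text, is_bold) runs, for example with a library such as python-docx; the function name and the sample paragraph are made up. The term is everything before the first non-bold run that contains real text.

```python
def split_bold_term(runs):
    """Split one paragraph, given as (text, is_bold) runs, into (term, definition)."""
    split_at = len(runs)
    for index, (text, is_bold) in enumerate(runs):
        # The definition starts at the first non-bold run with visible text;
        # non-bold spaces between two bold words are skipped.
        if not is_bold and text.strip():
            split_at = index
            break
    term = "".join(text for text, _ in runs[:split_at]).strip()
    definition = "".join(text for text, _ in runs[split_at:]).strip()
    return term, definition

# Hypothetical paragraph: a two-word bold term followed by its definition.
paragraph = [("hard", True), (" ", False), ("disk", True),
             (" a storage device", False)]
term, definition = split_bold_term(paragraph)
```

Because the split point is the first non-bold text, terms of any number of bold words are handled in one pass.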



Antonín Otáhal
Local time: 21:06
Member (2005)
English to Czech
+ ...
correction Oct 20, 2008

Put the closing parenthesis instead of the smiley face in my previous post
Antonin



Antonín Otáhal
Local time: 21:06
Member (2005)
English to Czech
+ ...
Showing code on this forum is a bit difficult Oct 20, 2008

So, you can see both replace dialog boxes here:

ftp://public.otahal.biz:anonym@public.otahal.biz/081020_proz.jpg

Antonin



ahmadwadan.com  Identity Verified
Kuwait
Local time: 22:06
English to Arabic
+ ...
TOPIC STARTER
Thank you Oct 20, 2008

Antonín Otáhal wrote:

So, you can see both replace dialog boxes here:

ftp://public.otahal.biz:anonym@public.otahal.biz/081020_proz.jpg

Antonin


Dear Antonín Otáhal,

Your input is of great help!

Thank you so much



