S&R regex help please
Thread poster: Bilingualduo
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
Apr 18, 2005

I would be very grateful if someone who knows how to use S&R strings could help with the following....

SOURCE TERM space TARGET TERM^p
Definition etc etc etc.^p

(There is a variable in this, but I don't know if anything can be done about it: the SOURCE TERM in ALL CAPS can be a varying number of words, anywhere from 1 to 3 or 4, as can the TARGET TERM.)

I would like to be able to:

i) find the space between last SOURCE TERM and FIRST TARGET word in upper case and replace it with a tab. (I presume I have to do this on a one-by-one basis given the variable number of words?)

ii)Find paragraph mark at end of line in UPPER CASE ONLY (i.e not the ^p in rest of document) and replace with tab

iii) move the next line (which begins with a Capital letter) up so that it follows on after the inserted tab at the end of UPPER CASE line.

The aim is to end up with a tab-delimited .txt glossary with source term, target term and definition.

Also, any suggestions on how to delete all the ^p EXCEPT where preceded by a full stop?

e.g. XXX XXX.^p
Xxx xx xxx^p xxx xxx xxx.^P
In this example, I would only want the ^p in the middle of second line replaced.
Hope I've been clear enough. Many thanks,
Bambi


 
Aleksandr Okunev (X)
Aleksandr Okunev (X)
Local time: 21:23
English to Russian
My try Apr 18, 2005

i) find the space between last SOURCE TERM and FIRST TARGET word in upper case and replace it with a tab.

Spell check the document so that languages are correct.
Replace all source language spaces with Blah-Baba-Dada.
Make all source terms bold. You will have messy bold source entries.
Search for bold letter and a space at the end of the word:
(?>)^32 (wildcards on) and replace it with the same letter and a tab \1^9

Running away...
Will come online in the afternoon, sorry


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
partial solution found but.. Apr 18, 2005

Aleksandr Okunev wrote:
Running away...

lol
Thank you so far.
The following seems to work as far as finding the space between the end of one CAP word and the beginning of another CAP word (though given the unknown number of words in each source/target, I have to check individually which one is the right one):
[A-Z] [A-Z]
I can even get it to be highlighted.
What I don't know is how to replace the highlighted end letter-space-beginning letter with a tab *while keeping* the existing letters. Everything I'm trying at the moment replaces the end/beginning letters with [A-Z, lol
Sigh
Thanks as always,
Bambi


 
Hynek Palatin
Hynek Palatin  Identity Verified
Czech Republic
Local time: 20:23
Member (2003)
English to Czech
+ ...
S&R Apr 18, 2005

i) find the space between last SOURCE TERM and FIRST TARGET word in upper case and replace it with a tab. (I presume I have to do this on a one-by-one basis given the variable number of words?)


If I understand it correctly, there is no way of finding the space between source and target term automatically, because both source and target are in upper case and they contain a variable number of spaces. You will have to insert the tab manually. You could save time by not deleting the space manually, but replacing [space][tab] with [tab] at the end.

ii)Find paragraph mark at end of line in UPPER CASE ONLY (i.e not the ^p in rest of document) and replace with tab

iii) move the next line (which begins with a Capital letter) up so that it follows on after the inserted tab at the end of UPPER CASE line.


Provided that each definition ends with a full stop:

Replace ".^p" with yabbadabba.
Replace ^p with tab.
Replace yabbadabba with ".^p".
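
For anyone doing the same clean-up outside Word, the three passes above can be sketched in Python. This is only an illustration of the protect/replace/restore idea, assuming the glossary is plain text with "\n" standing in for ^p; the placeholder string and the function name are made up for the example.

```python
# Hypothetical placeholder; it must not occur anywhere in the real text.
PLACEHOLDER = "\x00KEEP\x00"


def join_lines_except_after_full_stop(text: str) -> str:
    """Turn every line break into a tab, except breaks that follow a full stop."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text")
    # Pass 1: protect the breaks we want to keep (".^p" -> placeholder).
    protected = text.replace(".\n", PLACEHOLDER)
    # Pass 2: every remaining break becomes a tab (^p -> tab).
    joined = protected.replace("\n", "\t")
    # Pass 3: restore the protected breaks (placeholder -> ".^p").
    return joined.replace(PLACEHOLDER, ".\n")


sample = (
    "HEAT EXCHANGER SCAMBIATORE DI CALORE\n"
    "A device that transfers heat.\n"
    "PUMP POMPA\n"
    "A device that moves fluid.\n"
)
print(join_lines_except_after_full_stop(sample))
```

The same caveat applies as in Word: it only works if every definition really ends with a full stop.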

I hope this helps.


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
No of words are variable, not spaces in between Apr 18, 2005

Hynek Palatin wrote:

i) find the space between last SOURCE TERM and FIRST TARGET word in upper case and replace it with a tab. (I presume I have to do this on a one-by-one basis given the variable number of words?)


If I understand it correctly, there is no way of finding the space between source and target term automatically, because both source and target are in upper case and they contain a variable number of spaces.


In fact the space between source and target is always 1. It is the number of source words and target words which is variable... So as I wrote in my second post, I think I can solve that by (manually) highlighting the relevant end/start where I want to insert the tab. What I don't know how to do is replace the space I need and have highlighted (last letter of the ALL CAP source, space, first letter of the ALL CAP target) with a tab WITHOUT CHANGING the end/first letter.

ii)Find paragraph mark at end of line in UPPER CASE ONLY (i.e not the ^p in rest of document) and replace with tab
iii) move the next line (which begins with a Capital letter) up so that it follows on after the inserted tab at the end of UPPER CASE line.

Provided that each definition ends with a full stop:
Replace ".^p" with yabbadabba.
Replace ^p with tab.
Replace yabbadabba with ".^p".
Now to try it and I'll let you know.
Thanks so much. Report follows...:-)
Bambi


 
Aleksandr Okunev (X)
Aleksandr Okunev (X)
Local time: 21:23
English to Russian
I was wrong Apr 18, 2005

Bilingualduo wrote: Thank you so far.


I cannot make it work here, even on English and Cyrillic (easy to distinguish). I think it is possible but very hard.

I would do the following:

= go to the space between the source and the target
= start recording the macro
= do everything you need to
= save macro and assign a shortcut to it
= navigate to all such spaces and hit the shortcut which will run your macro and do the work for you.

My 2 copecks...

HTH
Alex


 
Hynek Palatin
Hynek Palatin  Identity Verified
Czech Republic
Local time: 20:23
Member (2003)
English to Czech
+ ...
Spaces Apr 18, 2005

In fact the space between source and target is always 1. It is the number of source words and target words which is variable.


And the source and target words are also separated by spaces, right? So the (total) number of spaces is variable. That's what I was trying to say.


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
Can [A-Z] [A-Z] find also be used to replace? Apr 18, 2005

Hynek Palatin wrote:
In fact the space between source and target is always 1. It is the number of source words and target words which is variable.


And the source and target words are also separated by spaces, right? So the (total) number of spaces is variable. That's what I was trying to say.


Yes, that is so. I hadn't understood that's what you meant...:)

But what I don't understand/know is: if I can search and successfully find 'the' space I need, which is between the last letter of the source word and the first letter of the target word (which I can), is there not then a way to replace the space which has been found with a tab, but *not* altering the last source letter and the first target letter?
I have this feeling that it can be done- I just don't know how
What I've been doing so far:
1/ find: " ^p" (space before ^p)
replace: "^p" (no space)
2/ find: [A-Z] [A-Z]
wildcard
replace: (nothing)
Format: highlight

When it highlights the 'wrong' space, I have to "find next". What I would love to be able to do when I find the right space, is replace it with a tab, without changing the letters.
Can this be done, do you know?

Meanwhile, *thank you very much* for your previous suggestion which worked great in dramatically cutting down on manual replacement...
Bambi


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
Thanks for the idea Apr 18, 2005

Aleksandr Okunev wrote:
I cannot make it work here, even on English and Cyrillic (easy to distinguish). I think it is possible but very hard.
I would do the following:

Glad you didn't run too far away Aleksandr,
Thanks for the suggestion and verification that your first didn't quite...ahem...work, lol
A macro is an option, but it won't really cut down the manual labour I don't think as I would still have to find each relevant space. As I just wrote in my other post, I can S&R to find what I want, I am now trying to see if there's a way of replacing the space I need with a tab.
Thanks for your help and as always, your humour
Cheers
Bambi


 
Hynek Palatin
Hynek Palatin  Identity Verified
Czech Republic
Local time: 20:23
Member (2003)
English to Czech
+ ...
Spaces Apr 18, 2005

But what I don't understand/know is: if I can search and successfully find 'the' space I need, which is between the last letter of the source word and the first letter of the target word (which I can), is there not then a way to replace the space which has been found with a tab, but *not* altering the last source letter and the first target letter?


I'm still not sure if I understand. We agreed that you have to find "the" space manually. (Maybe just by searching for one space and hitting Ctrl+PgDn to find next.) So, when "the" space is finally highlighted by the search function, why don't you just hit Tab to replace it with the tab character? No need for regular expressions...


 
Rossana Triaca
Rossana Triaca  Identity Verified
Uruguay
Local time: 15:23
English to Spanish
:) Apr 18, 2005

Search:^([A-Z]^) ^([A-Z]^)
Replace:^1^t^2

The ^(whatever^) preserves the match, so you can invoke it later by using ^1, ^2, etc.

I've just tried it and it works, obviously matching & preserving the case.
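
For comparison, here is a rough Python sketch of the same "capture both letters and put them back" idea. It is not the Word or editor syntax from this post: Python writes the groups as plain parentheses and the back-references as \1 and \2, and the function names are invented for the example.

```python
import re

# A capital letter, one space, a capital letter; both letters are captured.
CANDIDATE = re.compile(r"([A-Z]) ([A-Z])")


def candidate_spaces(line: str) -> list[int]:
    """Return the index of every space that sits between two capital letters."""
    return [match.start() + 1 for match in CANDIDATE.finditer(line)]


def tab_first_candidate(line: str) -> str:
    """Replace only the first candidate space, keeping the letters around it."""
    return CANDIDATE.sub(r"\1\t\2", line, count=1)


def tab_at(line: str, space_index: int) -> str:
    """Replace the space at a chosen index once the right one has been picked by eye."""
    if line[space_index] != " ":
        raise ValueError("no space at that position")
    return line[:space_index] + "\t" + line[space_index + 1:]


line = "HEAT EXCHANGER SCAMBIATORE DI CALORE"
print(candidate_spaces(line))
print(repr(tab_at(line, candidate_spaces(line)[1])))
```

As in the thread, the pattern only narrows things down to spaces between capitals; choosing which candidate separates source from target is still a human decision.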

You'll still have to skip a lot of spaces, but no doubt you'll hit that button a lot less

Good luck!


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
It Works! Apr 18, 2005

Rossana Triaca wrote:
Search:^([A-Z]^) ^([A-Z]^)
Replace:^1^t^2
The ^(whatever^) preserves the match, so you can invoke it later by using ^1, ^2, etc.
I've just tried it and it works,
...but no doubt you'll hit that button a lot less
Good luck!

I don't need luck ....I needed the help I found here because as you wrote, IT WORKS! It does *exactly* what I was looking for, and I am most certainly going to be hitting that button a lot, a lot less.
I just wish I understood better what all the 'bits' mean, so I could bother people less....:(
Thank you indeed! And again, thanks to everyone who lent a hand in helping me sort through this.
Bambi

[Edited at 2005-04-18 23:08]


 
Aleksandr Okunev (X)
Aleksandr Okunev (X)
Local time: 21:23
English to Russian
HOW 8-0 Apr 19, 2005

Bilingualduo wrote: I can S&R to find what I want, I am now trying to see if there's a way of replacing the space I need with a tab.


Do you mean you can find THE space?

I cannot. The problem is all spaces in a Russian-English document here are either Russian or English only, therefore I do not have an uninterrupted string of English letters and spaces and cannot search by language ID. Your situation is even worse: your character sets overlap, which is different in Cyrillic - it has an entirely different set, very advantageous.

The rest of F/R is child's play to me, but I cannot figure out how you managed to find the space. 8-0

Cheers
Alex


 
Bilingualduo
Bilingualduo  Identity Verified
Italy
Local time: 20:23
English to Italian
+ ...
TOPIC STARTER
Can't find THE space but... Apr 19, 2005


Bilingualduo wrote: I can S&R to find what I want, I am now trying to see if there's a way of replacing the space I need with a tab.

Do you mean you can find THE space?


Hi Alex,

Not THE space, but since in this case, all my CAP Source and Target lines are on one line (for each entry, granted), Rossana's solution has *dramatically* cut down the number of CTRL+Pg Down I have to hit as it specifically finds spaces between ALL CAP words only. I have to CTRL+Pg Down until reaching THE space, but it's still way easier and very time saving.
Wish S&R were child's play to me though,
Cheers
Bambi


 
Aleksandr Okunev (X)
Aleksandr Okunev (X)
Local time: 21:23
English to Russian
Please, try this Apr 19, 2005

Aleksandr Okunev wrote: I cannot.


Now I can!
(the forum engine eats some chars)

Here's how to do everything you need in several FR passes:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
I got a bit angry seeing that manual labour is still present in the FR and this yielded some results, I hope it is not too late or you will use them in the future. The secret here is that I know that Word uses the lazy principle in search: once the nearest match is found, it considers the search successful.
Once again, you should activate automatic language recognition and spell check your glossary so that each term has its proper language ID. All CAPITAL words should not be skipped in the speller settings.

Match wildcard active throughout the search. My notes are separated with 3 dashes: "---".

1.)
Find the beginning of every target term and add some rubbish in front of it:
F (<?) --- (Specify target language in More-Format-Language)
R +RuSmArK+\1
You will see a scary text, be patient.

2.)
Find the string between the 1-st target language marker (the rubbish) and the end of the line (paragraph mark) and add a tab before it (it will separate the terms in the pair)
F (+RuSmArK+[!^13]@^13) --- clear formatting here
R ^t\1
Word will find the ***first*** marker in the string and will start looking for the nearest paragraph mark, which confines the search to the target term plus gibberish.

3.)
Protect the paragraph marks after definitions from deletion by bolding them.
F .^13
R --- nothing here, but go to More-Format-Font and select 'bold'

4.)
Delete the paragraph marks after the target term, they are not bold
F ^13 --- go to More-Format-Font and select 'not bold'
R nothing = no string, no formatting
(I keep spotting errors here; if you stumble, please check whether my comments coincide with my FR strings and settings. The comments will prevail.)

5.)
Remove the rubbish
F +RuSmArK+
R --- nothing = no string, clear formatting

6.)
Rub your chest, giggle and delete THE space, which is still there, before the tab, looking at you beggarly.
F ^32^9
R ^9

I did test this and it works here: I get SOURCE-tab-TARGET-tab-Some definition in Russian-period-newparagraph
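
A rough Python sketch of passes 2, 5 and 6 on a single term line, for readers who want to see the logic outside Word. It assumes pass 1 has already been done, i.e. every target-language word is already prefixed with the marker; plain Python has no equivalent of Word's per-word language ID, so that part is not reproduced. The function name and the sample line are invented for the example.

```python
MARK = "+RuSmArK+"  # the marker string used in pass 1 above


def split_source_and_target(marked_line: str) -> str:
    """Insert a tab before the first marked (target) word, then clean up."""
    # Pass 2: a tab goes in front of the FIRST marker only.
    line = marked_line.replace(MARK, "\t" + MARK, 1)
    # Pass 5: remove every marker.
    line = line.replace(MARK, "")
    # Pass 6: delete the space left over just before the tab.
    return line.replace(" \t", "\t")


marked = "HEAT EXCHANGER +RuSmArK+SCAMBIATORE +RuSmArK+DI +RuSmArK+CALORE"
print(repr(split_source_and_target(marked)))
```

The count of 1 in the first replace plays the role of Word stopping at the nearest match: only the first marker on the line gets a tab.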

Stay well!
Alex


[Edited at 2005-04-19 08:04]

[Edited at 2005-04-19 09:21]

[Edited at 2005-04-19 09:42]

[Edited at 2005-04-19 09:44]


 