testing a regex code
Thread poster: Lenart

Lenart  Identity Verified
Luxembourg
Local time: 20:23
Jul 15

Hello everybody,

Could someone please explain to me why the regex „sa\s*(?!\w* question)“ matches the word „sa“ in the phrase „par sa question“?

As I see it, the condition of the negative lookahead is fulfilled, so I would expect the word „sa“ not to match in this case.

Thank you,

L


 

NeoAtlas
Spain
Local time: 20:23
English to Spanish
regex engine Jul 15

First, the regex engine matches “sa”.

Then the engine tries to match whitespace, as much as possible (\s* is greedy). It matches one space and tries with that first (if that failed, it would backtrack and try with no spaces at all). With one space matched, note that the current position is right after the space of “sa ”, just before “question”.

Then the engine processes the negative lookahead. From that position, the pattern inside the lookahead, “\w* question”, can't match: \w* can consume “question”, but no space followed by “question” comes after it. Because the lookahead is negative, it succeeds, and the engine reports a match of “sa ” (including the space). The engine doesn't try the other possibility (zero spaces) once the overall match has succeeded; backtracking happens only on failure.

Note that if the string were “par sa question question”, the pattern inside the lookahead would match: \w* would consume the first “question” and “ question” would match the second one. Because it's a negative lookahead, the attempt with one space would fail, and so would the attempt with zero spaces, so the same regex wouldn't match anything.
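The steps above can be checked with a short script. This is only a sketch using Python's re module, which is an assumption (the thread does not say which tool or regex engine is in use); the variable names and the zero-space variant NO_SPACES are made up for illustration.

```python
import re

# The regex from the question.
PATTERN = re.compile(r"sa\s*(?!\w* question)")

# One space is consumed by \s*, the lookahead pattern cannot match from
# there, so the negative lookahead succeeds and "sa " is reported.
m1 = PATTERN.search("par sa question")
print(repr(m1.group()) if m1 else None)   # 'sa '

# With a second "question", the lookahead pattern matches both with one
# space and with zero spaces consumed, so there is no match at all.
m2 = PATTERN.search("par sa question question")
print(m2)                                 # None

# Hypothetical variant without \s*: this is the "zero spaces" branch the
# engine never needed to try. Here the lookahead pattern matches right
# after "sa", so the negative lookahead fails.
NO_SPACES = re.compile(r"sa(?!\w* question)")
m3 = NO_SPACES.search("par sa question")
print(m3)                                 # None
```

The third case shows why the original expectation was reasonable: without the greedy \s* in front of the lookahead, “sa” in “par sa question” is rejected.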

I hope this helps,

… Jesús Prieto …


 

Lenart  Identity Verified
Luxembourg
Local time: 20:23
TOPIC STARTER
thank you Jul 16

thank you Jesús, this was very helpful!

 

Lenart  Identity Verified
Luxembourg
Local time: 20:23
TOPIC STARTER
another related question Jul 16

I wrote this regex: (?<!EU:)EU:(?!EU:)

I would like a regex that identifies segments where „EU:“ appears only once. However, in the following segment „EU:“ appears three times and the regex still matches, so something is wrong. Can somebody please explain to me why the criteria of this regex are fulfilled in the following case?

source segment: „(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, EU:C:2014:330, points 28 et 29 ; du 21 juillet 2016, Orleans e.a., C-387/15 et C-388/15, EU:C:2016:583, point 48, ainsi que du 26 avril 2017, Commission/Allemagne, C-142/16, EU:C:2017:301, points 34 et 71).“

Thank you,

L

[Edited at 2018-07-16 12:41 GMT]

[Edited at 2018-07-16 12:42 GMT]

[Edited at 2018-07-16 12:43 GMT]


 

NeoAtlas
Spain
Local time: 20:23
English to Spanish
Your lookahead doesn't work… Jul 17

Your lookahead doesn't work for the same reason that
EU:(?=EU:)
doesn't match
EU:XXXEU:
A lookahead doesn't mean “anywhere ahead”, only “immediately ahead” of the current position.

You'd need this regex:
EU:(?=.*?EU:)
to match:
EU:XXXEU:
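A quick way to see the difference between the two lookaheads, sketched here with Python's re module (an assumption; the thread does not name the engine, and the variable names are illustrative):

```python
import re

TEXT = "EU:XXXEU:"

# The lookahead must match immediately after "EU:", but "XXX" (or the end
# of the string, for the second "EU:") comes next, so nothing matches.
adjacent = re.search(r"EU:(?=EU:)", TEXT)
print(adjacent)                    # None

# Adding .*? lets the lookahead skip over "XXX" before it finds "EU:".
anywhere = re.search(r"EU:(?=.*?EU:)", TEXT)
print(anywhere.span())             # (0, 3)
```

Only the first “EU:” is part of the match in the second case; the text inspected by the lookahead is not consumed.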

The same applies to lookbehinds.

With that explained, this regex may work:
(?<!EU:.*?)EU:(?!.*?EU:)
to match “EU:” only when it appears once.
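One caveat when trying this outside the original tool: the suggested regex, (?<!EU:.*?)EU:(?!.*?EU:), uses a variable-length lookbehind, which some engines (for example .NET) accept, while Python's re module rejects it with a "look-behind requires fixed-width pattern" error. The sketch below is therefore not the regex from this post. It keeps the lookahead half as a regex and replaces the lookbehind half with a plain string check. The helper name find_single_eu and the re.DOTALL flag are assumptions added for illustration.

```python
import re

SEGMENT = (
    "(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, "
    "EU:C:2014:330, points 28 et 29 ; du 21 juillet 2016, Orleans e.a., "
    "C-387/15 et C-388/15, EU:C:2016:583, point 48, ainsi que du 26 avril "
    "2017, Commission/Allemagne, C-142/16, EU:C:2017:301, points 34 et 71)."
)

# The first attempt from the thread: it only rejects "EU:" that is directly
# adjacent to another "EU:", so every occurrence in SEGMENT still matches.
ORIGINAL = re.compile(r"(?<!EU:)EU:(?!EU:)")
original_hits = ORIGINAL.findall(SEGMENT)
print(len(original_hits))          # 3

# Lookahead half of the suggested regex: "EU:" with no "EU:" anywhere after.
# re.DOTALL (an addition) lets "." cross line breaks inside a segment.
LAST_EU = re.compile(r"EU:(?!.*?EU:)", re.DOTALL)

def find_single_eu(segment):
    """Return the match for "EU:" if it occurs exactly once, else None."""
    m = LAST_EU.search(segment)
    # Stand-in for the variable-length lookbehind: no "EU:" before the match.
    if m is None or "EU:" in segment[:m.start()]:
        return None
    return m

print(find_single_eu(SEGMENT))     # None
print(find_single_eu("Briels e.a., C-521/12, EU:C:2014:330").group())  # EU:
```

If the engine in use does support variable-length lookbehind, the single regex from the post should be enough and the helper is unnecessary.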

You may need to change it, but it's a good starting point.

Please let us know whether it works; if it doesn't, I'm curious to know your final regex.

… Jesús Prieto …


 

Lenart  Identity Verified
Luxembourg
Local time: 20:23
TOPIC STARTER
thank you! Jul 27

Jesús, your regex seems to be working great and I don't think I'll change anything in it.

But I am not sure I understand all of it. For example, if I focus on the last part of the regex, EU:(?!.*?EU:)

Could you please tell me what the second question mark stands for?

I think what I am trying to ask is what the difference between .*? and .* is.


 

NeoAtlas
Spain
Local time: 20:23
English to Spanish
Greedy versus Lazy quantifiers Jul 28

.* is greedy, which is the default behaviour of quantifiers in regex: it matches as many characters as possible while still letting the rest of the pattern match.
.*? is lazy: same as above, but it matches as few characters as possible.
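A small illustration of the difference, again sketched with Python's re module (an assumption; the sample text and variable names are made up):

```python
import re

TEXT = "EU:aaa EU:bbb EU:ccc"

# Greedy: .* runs as far as it can, so the match ends at the LAST "EU:".
greedy = re.search(r"EU:.*EU:", TEXT).group()
print(greedy)                      # EU:aaa EU:bbb EU:

# Lazy: .*? stops as early as it can, so the match ends at the NEXT "EU:".
lazy = re.search(r"EU:.*?EU:", TEXT).group()
print(lazy)                        # EU:aaa EU:

# Inside a lookahead nothing is consumed, so greedy and lazy give the same
# yes/no answer; both versions find the last "EU:" at the same position.
last_greedy = re.search(r"EU:(?!.*EU:)", TEXT).start()
last_lazy = re.search(r"EU:(?!.*?EU:)", TEXT).start()
print(last_greedy, last_lazy)      # 14 14
```

So in EU:(?!.*?EU:) the second question mark does not change which segments match; it can only affect how much work the engine does to reach the answer.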

Glad the regex worked for you!


 

