testing a regex code
Thread poster: Lenart

Lenart  Identity Verified
Luxembourg
Local time: 03:15
Jul 15, 2018

Hello everybody,

Could someone please explain to me why the code „sa\s*(?!\w* question)“ matches the word „sa“ in the phrase „par sa question“?

The condition of the negative lookahead is fulfilled („ question“ does follow „sa“), so I would think that the word „sa“ shouldn't match in this case.

Thank you,

L


 

NeoAtlas
Spain
Local time: 03:15
English to Spanish
regex engine Jul 15, 2018

First, the regex engine matches “sa”.

Then the engine tries to match as many spaces as possible. It matches 1 space and tries that first (if it failed, it would try again with no spaces at all). So, trying with 1 space, note that the current position is after the space in “sa ”.

Then the engine processes the negative lookahead. Inside the lookahead it can't match “\w* question” (remember where the current position is: “question” starts right there, and “\w* question” needs a space before “question”), so the negative lookahead succeeds and the engine matches “sa ” (including the space). Once the engine has found a successful match, it doesn't try more possibilities (I mean, zero spaces); the regex engine always stops at the first success.
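To watch this backtracking happen, here is a small TypeScript sketch. It assumes JavaScript's built-in regex engine (your CAT tool may use another engine, but this part behaves the same in the usual backtracking engines); the variable names and the zero-space variant sa(?!\w* question) are made up just for the comparison.

```typescript
// The pattern from the question: "sa", any spaces, then a negative lookahead.
const original: RegExp = /sa\s*(?!\w* question)/;

// Hypothetical variant without \s*, so the lookahead starts right after "sa".
const noSpaces: RegExp = /sa(?!\w* question)/;

const phrase: string = "par sa question";

// \s* consumed the space, so the lookahead starts at "question", where
// "\w* question" (which needs a space before "question") cannot match.
const m: RegExpExecArray | null = original.exec(phrase);
console.log(JSON.stringify(m?.[0])); // "sa " (note the trailing space)

// With no space consumed, the lookahead sees " question" and rejects "sa".
console.log(noSpaces.exec(phrase)); // null
```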

Note that if the string were “par sa question question”, the lookahead would match “ question” (I mean, the second “question”), and because it’s a negative lookahead, the same regex wouldn’t match anything.
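The “par sa question question” case can be checked the same way (again a TypeScript sketch assuming JavaScript's regex engine):

```typescript
// Same pattern as in the question.
const pattern: RegExp = /sa\s*(?!\w* question)/;

// With one space: \w* takes the first "question" and " question" (the second
// one) follows, so the negative lookahead fails. With zero spaces it fails
// too, so no position in the string produces a match.
const result: RegExpExecArray | null = pattern.exec("par sa question question");
console.log(result); // null
```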

I hope this helps,

… Jesús Prieto …


 

Lenart  Identity Verified
Luxembourg
Local time: 03:15
TOPIC STARTER
thank you Jul 16, 2018

Thank you, Jesús, this was very helpful!

 

Lenart  Identity Verified
Luxembourg
Local time: 03:15
TOPIC STARTER
another related question Jul 16, 2018

I wrote this code: (?<!EU:)EU:(?!EU:)

I would like a code that identifies segments where „EU:“ appears only once. However, in the following segment „EU:“ appears three times and the code still matches, so something is wrong. Can somebody please explain to me why the criteria of this code are met in the following case?

source segment: „(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, EU:C:2014:330, points 28 et 29 ; du 21 juillet 2016, Orleans e.a., C-387/15 et C-388/15, EU:C:2016:583, point 48, ainsi que du 26 avril 2017, Commission/Allemagne, C-142/16, EU:C:2017:301, points 34 et 71).“

Thank you,

L

[Edited at 2018-07-16 12:41 GMT]

[Edited at 2018-07-16 12:42 GMT]

[Edited at 2018-07-16 12:43 GMT]


 

NeoAtlas
Spain
Local time: 03:15
English to Spanish
Your lookahead doesn't work… Jul 17, 2018

Your lookahead doesn't work for the same reason that
EU:(?=EU:)
doesn't match
EU:XXXEU:
Lookaheads don't mean “anywhere ahead”, only “just ahead”.

You'd need this regex:
EU:(?=.*?EU:)
to match:
EU:XXXEU:
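A quick TypeScript check of both patterns against the sample text above, assuming JavaScript's regex engine (variable names are illustrative):

```typescript
const text: string = "EU:XXXEU:";

// The lookahead is tested only right after the first "EU:", where "XXX"
// follows, so it fails; after the second "EU:" nothing follows at all.
const justAhead: RegExp = /EU:(?=EU:)/;
console.log(justAhead.exec(text)); // null

// ".*?" lets the lookahead skip over "XXX" before looking for "EU:".
const anywhereAhead: RegExp = /EU:(?=.*?EU:)/;
const m: RegExpExecArray | null = anywhereAhead.exec(text);
console.log(m?.index, m?.[0]); // 0 EU:
```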

The same goes for lookbehinds: they only look just behind the current position, not anywhere behind it.
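Lenart's original regex fails for this reason. The sketch below (TypeScript, JavaScript's regex engine assumed, with the “<” that the forum swallowed put back) runs it on the segment from the question:

```typescript
// The regex from the question, with the forum-stripped "<" restored.
const firstTry: RegExp = /(?<!EU:)EU:(?!EU:)/g;

const segment: string =
  "(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, EU:C:2014:330, " +
  "points 28 et 29 ; du 21 juillet 2016, Orleans e.a., C-387/15 et C-388/15, " +
  "EU:C:2016:583, point 48, ainsi que du 26 avril 2017, Commission/Allemagne, " +
  "C-142/16, EU:C:2017:301, points 34 et 71).";

// Each "EU:" is preceded by ", " and followed by "C:", so neither lookaround
// ever sees another "EU:" right next to it, and all three occurrences match.
const hits: string[] = segment.match(firstTry) ?? [];
console.log(hits.length); // 3
```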

With that explained, this regex may work:
(?<!EU:.*?)EU:(?!.*?EU:)
to match “EU:” appearing only once.
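A sketch of the suggested regex in TypeScript, again with the “<” restored. Note that (?<!EU:.*?) is a variable-length lookbehind: JavaScript and .NET accept it, but not every engine does (Python's built-in re module, for example, rejects it). The two sample strings are shortened, made-up excerpts in the style of the segment above.

```typescript
// The suggested regex: no "EU:" anywhere before, none anywhere after.
const onlyOnce: RegExp = /(?<!EU:.*?)EU:(?!.*?EU:)/;

// Hypothetical samples: three ECLI references versus a single one.
const threeTimes: string = "EU:C:2014:330, EU:C:2016:583, EU:C:2017:301";
const once: string = "Briels e.a., C-521/12, EU:C:2014:330, points 28 et 29";

// First "EU:" fails the lookahead; the later ones fail the lookbehind.
console.log(onlyOnce.test(threeTimes)); // false
// The single "EU:" has nothing before or after it, so it matches.
console.log(onlyOnce.test(once)); // true
```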

You may need to change it, but it's a good starting point.

Please let us know whether it works; if you end up changing it, I'm curious to see your final regex.

… Jesús Prieto …


 

Lenart  Identity Verified
Luxembourg
Local time: 03:15
TOPIC STARTER
thank you! Jul 27, 2018

Jesús, your code seems to be working great, and I don't think I'll change anything in it.

But I am not sure I understand all of it. For example, let's focus on the last part of the code: EU:(?!.*?EU:)

Could you please tell me what the second question mark stands for?

I think I am trying to ask what the difference is between .*? and .* (without the question mark).


 

NeoAtlas
Spain
Local time: 03:15
English to Spanish
Greedy versus Lazy quantifiers Jul 28, 2018

.* is greedy, the default behaviour in regex: it matches as many characters as possible.
.*? is lazy: the extra ? makes the * match as few characters as possible.
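A TypeScript sketch of the difference, using a made-up sample string and JavaScript's regex engine. It also shows that inside a lookahead such as (?!.*?EU:) the extra ? doesn't change which segments match, because the lookahead only asks whether “EU:” appears somewhere ahead:

```typescript
const s: string = "EU:a EU:b EU:c";

// Greedy: .* first grabs everything, then gives characters back until the
// final "EU:" can match, so the match runs to the LAST "EU:".
console.log(s.match(/EU:.*EU:/)?.[0]); // EU:a EU:b EU:

// Lazy: .*? grabs as little as possible, so the match stops at the NEXT "EU:".
console.log(s.match(/EU:.*?EU:/)?.[0]); // EU:a EU:

// Inside a lookahead only "is there a match?" matters, so both give the same answer.
console.log(/EU:(?!.*EU:)/.test(s) === /EU:(?!.*?EU:)/.test(s)); // true
```

In the lookahead, the lazy form may simply let the engine stop at the next “EU:” instead of first running to the end of the segment and backtracking.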

Glad the regex worked for you!


 

Lenart  Identity Verified
Luxembourg
Local time: 03:15
TOPIC STARTER
same same but different Oct 5, 2018

I wrote this code: (?

 

