Bug in Studio 2014 regular expressions engine
Thread poster: Piotr Bienkowski

Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
Feb 14, 2015

I entered this regex on the review tab:

^\p{Lu}+\s\d+\:?$

Which means start with the beginning of a string, match as many uppercase letters as you can, then match a space, then as many digits as you can, and if there is a colon at the end, match it too, and then match the end of string.
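The reading above can be spelled out piece by piece in a minimal Python sketch. Python's standard `re` module does not support `\p{Lu}`, so the ASCII class `[A-Z]` stands in for it here; that substitution is an assumption made for illustration and is narrower than the Unicode category. The helper name `matches` is hypothetical.

```python
import re

# ASCII stand-in for the pattern in the post: [A-Z] replaces \p{Lu},
# because Python's standard re module has no \p{...} support.
PATTERN = re.compile(
    r"""
    ^          # start of the string
    [A-Z]+     # one or more uppercase letters (stand-in for the Lu category)
    \s         # exactly one whitespace character
    \d+        # one or more digits
    :?         # an optional colon
    $          # end of the string
    """,
    re.VERBOSE,
)


def matches(text: str) -> bool:
    """Return True if the whole string fits the pattern (case-sensitive)."""
    return PATTERN.search(text) is not None
```

With case sensitivity left on, as it is by default in Python, `matches("CHAPTER 12:")` succeeds and `matches("August 2012")` fails, which is the behaviour the pattern was written for.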

It matched what I wanted just fine, but it also matched the following:

August 2012.

I tested my regex in RegexBuddy and my regex is correct. This is a bug in Studio 2014.



Dan Lucas  Identity Verified
United Kingdom
Local time: 09:39
Member (2014)
Japanese to English
Can't immediately reproduce that Feb 14, 2015

Piotr Bienkowski wrote:
I tested my regex in RegexBuddy and my regex is correct. This is a bug in Studio 2014.

"August 2012" does match for me in RegexBuddy - which I suppose is right, since the colon is optional - using the .NET flavour of regexes but "August 2012." does not.

I had a try but cannot get Studio 2014 SP2 to find "August 2012." as per your example. Can you give us some more context or example text?

Dan




Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
TOPIC STARTER
It must not match lowercase Feb 14, 2015

^\p{Lu}+\s\d+\:?$

is designed to match only uppercase letters followed by a single space, some digits and, optionally, a colon. It can't match lowercase letters, because there is no symbol for that in it.

It does not match August 2012 for me in RegexBuddy, and I am using the .NET flavor of regexes, too. I can't see why it should match for you, but it does match in Studio when entered in the text box on the Review tab. I did not check the Find function, because what I'm interested in is filtering segments.



NeoAtlas
Spain
Local time: 10:39
English to Spanish
+ ...
Not sure if it's a bug or a feature… Feb 14, 2015

…of the SDL regex engine, but your regex pattern works fine if you tick the "Case sensitive" option in the filters or in the Find&Replace dialog box.

Anyway, these 2 modified patterns work the way you wish:

(?-i)^\p{Lu}+\s\d+\:?$
(?-i:^\p{Lu}+\s\d+\:?$)


where (?-i) or (?-i:XXXXXXXXXXXX) have been added to force case sensitivity.

I think that the first option is easier and faster.
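As a rough illustration of the scoped form, here is a Python sketch. It assumes that leaving "Case sensitive" unticked amounts to compiling the pattern with a case-insensitive flag, which is a guess about Studio's behaviour rather than something documented here. `[A-Z]` again stands in for `\p{Lu}`, which Python's `re` lacks. As far as I know, Python's `re` accepts only the scoped `(?-i:...)` form (Python 3.6 or later) and not the global `(?-i)`, so only the scoped one is shown.

```python
import re

# ASCII stand-in: [A-Z] replaces \p{Lu}, which Python's re module lacks.
BASE = r"^[A-Z]+\s\d+:?$"

# Assumption: an unticked "Case sensitive" box behaves like compiling
# the whole pattern with a case-insensitive flag.
insensitive = re.compile(BASE, re.IGNORECASE)

# Same flag, but a scoped (?-i:...) group switches it off inside the group.
scoped = re.compile(r"(?-i:" + BASE + ")", re.IGNORECASE)

for text in ("AUGUST 2012", "August 2012"):
    print(text, bool(insensitive.search(text)), bool(scoped.search(text)))
```

The flag-only version accepts both sample strings, while the scoped version accepts only the all-uppercase one, even though the outer flag is still set.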

Regards,

... Jesús Prieto ...



Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
TOPIC STARTER
Thanks NeoAtlas Feb 14, 2015

You're right. So I have to tell Studio twice what I want.

NeoAtlas wrote:

…of the SDL regex engine, but your regex pattern works fine if you tick the "Case sensitive" option in the filters or in the Find&Replace dialog box.

Anyway, these 2 modified patterns work the way you wish:

(?-i)^\p{Lu}+\s\d+\:?$
(?-i:^\p{Lu}+\s\d+\:?$)


where (?-i) or (?-i:XXXXXXXXXXXX) have been added to force case sensitivity.

I think that the first option is easier and faster.

Regards,

... Jesús Prieto ...



Dan Lucas  Identity Verified
United Kingdom
Local time: 09:39
Member (2014)
Japanese to English
You're quite right Feb 14, 2015

Piotr Bienkowski wrote:
It can't match lowercase letters, because there is no symbol for that in it.

Must not comment on technical issues before I'm fully awake!

Dan



SDL Community  Identity Verified
United Kingdom
Local time: 10:39
English
I don't understand the problem here at all... Feb 15, 2015

...the exact same thing can happen in regex buddy if you check the case insensitive option. In Studio it's the same thing... this is there to make it easier for anyone less adept with regular expressions.

Maybe also worth noting this one:

^\p{Ll}+\s\d+\:?$

The lowercase version (I think you mentioned there wasn't one)?

Regards

Paul



Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
TOPIC STARTER
Problem: Regex wasn't doing what it should Feb 15, 2015

SDL Support wrote:

Maybe also worth noting this one:

^\p{Ll}+\s\d+\:?$

The lowercase version (I think you mentioned there wasn't one)?


No, I said that my regex should not match lowercase. After all \p{Lu} is for matching uppercase, isn't it?

I am aware of \p{Ll} and many other such \p{}s



SDL Community  Identity Verified
United Kingdom
Local time: 10:39
English
It is... Feb 15, 2015

... unless you turn off the case sensitivity. Exactly the same as regex buddy.

Apologies on the lowercase bit... I misread this part "It can't match lowercase letters, because there is no symbol for that in it." I see what you meant now.

Regards

Paul


FarkasAndras
Local time: 10:39
English to Hungarian
+ ...
I'm sorry, but... Feb 16, 2015

Piotr Bienkowski wrote:

SDL Support wrote:

Maybe also worth noting this one:

^\p{Ll}+\s\d+\:?$

The lowercase version (I think you mentioned there wasn't one)?


No, I said that my regex should not match lowercase. After all \p{Lu} is for matching uppercase, isn't it?

I am aware of \p{Ll} and many other such \p{}s


The regex engine was doing exactly what it was designed to do. You didn't know that the match case checkbox affects regex searches as well, so you were caught by surprise, but that's not SDL's fault. The regex engine works fine.
As an example, [[:upper:]] matches upper-case letters in perl.
/[[:upper:]]+/ only matches APRIL, but /[[:upper:]]+/i also matches April. The /i is analogous to the checkbox in Studio.
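The same effect can be sketched in Python. Python's `re` has no POSIX `[[:upper:]]` class, so `[A-Z]` is used instead (an assumption for illustration). The helper `whole_word` is hypothetical and tests the entire string with `fullmatch`, because an unanchored search would still find the lone "A" in "April" even without the flag.

```python
import re

# [A-Z] is an ASCII stand-in for Perl's [[:upper:]].
strict = re.compile(r"[A-Z]+")
relaxed = re.compile(r"[A-Z]+", re.IGNORECASE)  # analogous to Perl's /i


def whole_word(pattern: re.Pattern, text: str) -> bool:
    """True only if the entire string consists of matching characters."""
    return pattern.fullmatch(text) is not None
```

Here `whole_word(strict, "April")` is false and `whole_word(relaxed, "April")` is true, mirroring the difference between the two Perl patterns.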

Also, I have no idea where you got the idea that regex buddy is the gold standard for regex engines and anything that works in any way differently from regex buddy is buggy. There is no gold standard, every regex engine works slightly differently. You have to RTFM to know exactly how a specific regex engine is supposed to behave in specific edge cases. In this case, it appears that you weren't even right about Studio behaving differently from regex buddy, though.

[Edited at 2015-02-16 07:49 GMT]



Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
TOPIC STARTER
Regex Buddy not gold standard Feb 16, 2015

but a convenient ‘sandbox’ for testing various flavors of regular expressions. Point me to another tool that will let me do this, preferably one that I won't have to pay anything for (I have already paid for regexbuddy) and I will be happy to use it.

BTW I have learned my lesson. Even though I use a symbol that specifically asks for uppercase, it is overridden, because unless I tick "case sensitive", a switch is active to the opposite effect, that I was not aware of.



Meta Arkadia
Local time: 16:39
English to Indonesian
+ ...
RegExr Feb 16, 2015

Piotr Bienkowski wrote:
Point me to another tool that will let me do this, preferably one that I won't have to pay anything for (I have already paid for regexbuddy) and I will be happy to use it.


http://regexr.com

There's also a downloadable tool:



And http://www.regular-expressions.info can serve as Andras' RTFM.

Cheers,

Hans



Piotr Bienkowski  Identity Verified
Poland
Local time: 10:39
Member (2005)
English to Polish
+ ...
TOPIC STARTER
Thanks Feb 16, 2015

Meta Arkadia wrote:

Piotr Bienkowski wrote:
Point me to another tool that will let me do this, preferably one that I won't have to pay anything for (I have already paid for regexbuddy) and I will be happy to use it.


http://regexr.com

There's also a downloadable tool:

# image snipped #

And http://www.regular-expressions.info can serve as Andras' RTFM.

Cheers,

Hans



I have been interested in regular expressions for more than 10 years, have read the whole "owl book", and have a bookmark for the other link you mentioned. BTW, the author of that site is also the author of RegexBuddy.


Dmitry Pakidov
Russian Federation
Local time: 12:39
Another bug in the regex engine… Feb 17, 2015

…which is not even directly related to using regular expressions, is caused by the “incorrect regular expression” highlights in the Find and Replace window which were introduced in T2014 SP2. When “Regular Expressions” are selected in the dropdown menu near “Use” (notwithstanding the presence or absence of the checkmark near “Use”!), regex formatting rules are automatically enforced in respect of both “Find what” AND “Replace with” fields: i.imgur.com/Nr9t73E.png. The latter is especially infuriating, as it makes such simple auto-replacements as shown in the screenshot above impossible without additional replacements to remove the escape symbol from target segments.

The only way to circumvent this behaviour is to check “Use”, switch “Regular expressions” back to “Wildcards”, uncheck “Use” and reopen the Find and Replace window, which is rather annoying and shouldn’t be needed in the first place.



Selcuk Akyuz  Identity Verified
Turkey
Local time: 12:39
Member (2006)
English to Turkish
+ ...
downloadable tool? Feb 17, 2015

Meta Arkadia wrote:

Piotr Bienkowski wrote:
Point me to another tool that will let me do this, preferably one that I won't have to pay anything for (I have already paid for regexbuddy) and I will be happy to use it.


http://regexr.com

There's also a downloadable tool:



Hi Hans,

Is it for Mac only? I could not find where to download it. https://github.com/gskinner/regexr/ Many files here, hope there is a compiled version.

Selcuk

