regex query to change US dates into European formatting?
Thread poster: Jan Sundström

Jan Sundström  Identity Verified
Sweden
Local time: 11:05
English to Swedish
May 24, 2007

Hi all,

I know you guys (Vito, Jerzy, Alexej, Samuel) are brilliant with advanced regex searches, so I'm shamelessly relying on your help.

I have a bunch of html files containing dates in US formatting:
...text... 9/11/2001 ...text....
...text... 12/31/2006 ...text...

I'm trying to apply Vito's method, posted here:
http://www.proz.com/post/540222#540222

In plain text, it would be something like:
Search for nn[1]/nn[2]/nnnn[3]
Replace with [3]/[1]/[2], where the numbers are placeholders, calling the parameters in the order they were found in the search.
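In regex terms, these placeholders are capture groups (the parts in parentheses) and backreferences (\1, \2, \3 in the replacement). Python is not used in this thread, but a minimal Python sketch shows the idea; `US_DATE` and `reorder_dates` are hypothetical names chosen for illustration:

```python
import re

# Each (...) group captures one of the placeholders [1], [2], [3];
# \3, \1 and \2 in the replacement recall them by number.
US_DATE = re.compile(r"(\d+)/(\d+)/(\d+)")

def reorder_dates(text: str) -> str:
    """Swap every m/d/yyyy-shaped run into yyyy/m/d order."""
    return US_DATE.sub(r"\3/\1/\2", text)

print(reorder_dates("...text... 9/11/2001 ...text..."))
print(reorder_dates("...text... 12/31/2006 ...text..."))
```

Note that this follows the [3]/[1]/[2] order asked for above; a different target order only means shuffling the backreferences.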

I'm just trying to figure out how to write the query; this is where I need your help.

Thanks a lot in advance for your assistance. I promise to reward it somehow!

/Jan



Jabberwock  Identity Verified
Poland
Local time: 11:05
Member (2004)
English to Polish
Here it goes... May 24, 2007

I wasn't called upon, but I'll try nevertheless.

I assume you want it done in Word? (Other editors have somewhat different flavors of regexp.)

It should go like this:

Find:
([0-9]@)/([0-9]@)/([0-9]@)>

Replace:
\3/\1/\2

I have checked it with several expressions, but let me know how it works...
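For comparison, a rough Python translation of the Word pattern (a sketch, not Word itself): `[0-9]@` ("one or more digits") becomes `\d+`, and Word's end-of-word marker `>` is approximated here with `\b`, a word boundary. The boundary keeps something like "1/2/3rd" from being treated as a date:

```python
import re

# ([0-9]@)/([0-9]@)/([0-9]@)>  ->  (\d+)/(\d+)/(\d+)\b  (approximate translation)
WORD_LIKE = re.compile(r"(\d+)/(\d+)/(\d+)\b")
# The same pattern without the end-of-word anchor, for contrast.
NO_BOUNDARY = re.compile(r"(\d+)/(\d+)/(\d+)")

text = "Dates: 12/31/2006, but also the 1/2/3rd item."
print(WORD_LIKE.sub(r"\3/\1/\2", text))    # "1/2/3rd" is left alone
print(NO_BOUNDARY.sub(r"\3/\1/\2", text))  # "1/2/3rd" gets swapped too
```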



Robert Tucker
United Kingdom
Local time: 10:05
German to English
Perl May 24, 2007

Using Perl, I think it is:

perl -pi.bak -e 's/([0-9]+)\/([0-9]+)\/([0-9]+)/$3\/$1\/$2/g' filename.txt
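A note on the quantifier: `[0-9]*` also accepts zero digits, so a pattern built from it can fire on any text containing two slashes, and HTML files are full of URLs. The Python sketch below (Python's `re` treats `*` and `+` the same way Perl does) runs the pattern with each quantifier on a made-up URL:

```python
import re

# Same search/replace, once with [0-9]* and once with [0-9]+.
STAR = re.compile(r"([0-9]*)/([0-9]*)/([0-9]*)")
PLUS = re.compile(r"([0-9]+)/([0-9]+)/([0-9]+)")

url = "http://example.com/2007/05/"
print(STAR.sub(r"\3/\1/\2", url))  # * allows empty groups, so the URL is mangled
print(PLUS.sub(r"\3/\1/\2", url))  # + needs digits in all three fields: URL untouched
```

With `+` a URL path that really contains three numeric segments (e.g. /2007/05/12) would still be rewritten, so it is worth a glance at the output either way.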



Jan Sundström  Identity Verified
Sweden
Local time: 11:05
English to Swedish
TOPIC STARTER
Text editor suggestions?! May 24, 2007

Hey Jabberwock, Robert,

You guys are great!

Perl is very powerful, but I'm afraid I've hardly ever used it, so it's a non-starter until I learn more.

This time, I'd hope to do it in a text editor. Any suggestions?
I've been using TextPad out of old habit, but I'm getting fed up with the lack of Unicode support (copyright marks and other extended chars show up as black boxes).

Would any text editor use the same expressions as Word?

/Jan



Jabberwock  Identity Verified
Poland
Local time: 11:05
Member (2004)
English to Polish
Haven't found one... May 24, 2007

I haven't found a text editor that exactly suits my tastes and needs... I am trying out one after another, but something is still missing...

For now I have settled for PSPad, but I am not sure I would recommend it to someone. The interface is somewhat muddled and the search feature is somewhat inconvenient... The codepage changes, on the other hand, are a breeze!

For PSPad the relevant expressions are:

([0-9]+)/([0-9]+)/([0-9]+)

and:

$3/$1/$2

For TextPad:

\([0-9]+\)/\([0-9]+\)/\([0-9]+\)

and:

\3/\1/\2


With other editors there might be a problem with greediness (so that the search catches only the first digit of the year). Also, check that the replace function moves on to the next match, because the newly replaced date still has the n/n/n shape and will naturally be caught by the search again. You can get around this by specifying the number of digits, e.g.

([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})

but I do not think this is really necessary...
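Both pitfalls can be reproduced in a short Python sketch; the helper `swap` and the three pattern names are hypothetical, and running the replacement twice is only a stand-in for an editor that re-finds the text it just inserted:

```python
import re

def swap(pattern: str, text: str) -> str:
    """Apply the \\3/\\1/\\2 replacement using the given pattern."""
    return re.sub(pattern, r"\3/\1/\2", text)

LAZY = r"([0-9]+?)/([0-9]+?)/([0-9]+?)"          # non-greedy: stops as early as possible
GREEDY = r"([0-9]+)/([0-9]+)/([0-9]+)"           # greedy: takes all the digits
TIGHT = r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})"  # fixed digit counts

date = "9/11/2001"
print(swap(LAZY, date))    # the last group grabs only the first digit of the year
print(swap(GREEDY, date))  # correct swap

# Second pass over already-converted text:
once = swap(GREEDY, date)
print(swap(GREEDY, once))               # the swapped date matches again and is scrambled
print(swap(TIGHT, swap(TIGHT, date)))   # a 4-digit first field no longer fits, so it is stable
```

So the digit counts mainly buy safety against re-matching; a greedy `+` already avoids the first problem.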



Vito Smolej
Germany
Local time: 11:05
Member (2004)
English to Slovenian
few hints... May 24, 2007

Hi Jan:

Note that the slash has its own meta meaning in Word regex, so what I do in such cases borders on the obscene: I replace it in the relevant contexts with... er, hm... § (for instance; any character you will not miss or, worse, hit unintentionally). Then it's easier to write the search and replace entries...

The Tortoise Tagger is a good help for this kind of stuff. The subject itself is hard, but the documentation is good. In essence you write a batch file to do all this: "first globally search and replace / by $, then hide anything but ..." etc. etc.

PS: awk IS supposed to do this kind of transliteration, but try doing that on a Word file...

[Edited at 2007-05-24 20:32]



Robert Tucker
United Kingdom
Local time: 10:05
German to English
Text editors v. command line May 24, 2007

The thing about using a command-line tool like Perl is that it can do the search and replace on all the files in a folder in one go.

If you were to change to the folder containing all the HTML files and just issue:

perl -pi.bak -e 's/([0-9]+)\/([0-9]+)\/([0-9]+)/$3\/$1\/$2/g' *.html

they would all be done.
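Python isn't mentioned in this thread, but for anyone who has it installed, here is a rough sketch of the same folder-wide run. Assumptions: the files are UTF-8, `convert_folder` is a hypothetical helper name, and it keeps a `.bak` copy of every file it changes. Like the one-liner, it rewrites anything shaped like n/n/n, including such runs inside URLs:

```python
import re
import shutil
from pathlib import Path

US_DATE = re.compile(r"([0-9]+)/([0-9]+)/([0-9]+)")

def convert_folder(folder: str, glob: str = "*.html") -> int:
    """Swap m/d/yyyy into yyyy/m/d in every matching file.

    Returns the number of files changed. Assumes UTF-8 text; a .bak copy
    of each changed file is written next to it before overwriting.
    """
    changed = 0
    for path in sorted(Path(folder).glob(glob)):
        # newline="" keeps the file's own line endings untouched.
        with path.open(encoding="utf-8", newline="") as f:
            original = f.read()
        updated = US_DATE.sub(r"\3/\1/\2", original)
        if updated != original:
            shutil.copyfile(path, path.with_name(path.name + ".bak"))
            with path.open("w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed += 1
    return changed
```

Calling `convert_folder("path/to/folder")` handles one folder; passing `glob="**/*.html"` also walks its subfolders, since pathlib's `**` matches any depth.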

If you use a text editor, I suspect you will need to open each HTML file individually, edit it and close it again, unless you can write some batch script. Text editors also tend to be slower than command-line tools on a single file, though you might need a file of a few hundred lines to notice much difference.

That said (and having just looked into how "grep", much used on Unix/Linux systems, can be used on Windows), I found Windows Grep, which you might like to look at.

Essentially, though, you might find it quicker in the long run to look into the command line than to try to do it all with a text editor.

Regarding awk, I'm not sure it does backreferences:
Grouping support is present in Perl together with various backreference mechanisms. Grouping is also supported by all awk variants, but backreferences are not. GNU grep, egrep and sed support both grouping and backreferences.

http://snow.nl/dist/htmlc/ch13.html



justin C
United States
Local time: 05:05
English
The vim solution May 24, 2007

Hi Jan,

I use the vim text editor (usually found on *nix systems) quite a bit. There is a nice Windows equivalent called GVim -- http://www.vim.org/download.php#pc -- ftp://ftp.vim.org/pub/vim/pc/gvim71.exe

You can use this global search and replace regex in vim to do the date switch you are trying to do (the % range applies it to every line in the file; a plain :s only touches the current line):
:%s/\(\d\+\)\/\(\d\+\)\/\(\d\+\)/\3\/\1\/\2/g

That should do it!

Best regards,
Justin



Jan Sundström  Identity Verified
Sweden
Local time: 11:05
English to Swedish
TOPIC STARTER
Thanks a million! May 25, 2007

Jabberwock wrote:
For TextPad:

\([0-9]+\)/\([0-9]+\)/\([0-9]+\)

and:

\3/\1/\2



This time I chose to go with Jabberwock's solution, because I'm familiar with TextPad's interface and search dialog. It works like a charm!

TextPad sort of handles batch processing (it can work on all open files), but AFAIK you can't point it at a path or directory structure.

At my previous workplace, we had a Perl guru who could do anything and everything through the command line, and vim was of course also used.

So for future jobs, I'll definitely look into the other options mentioned.

I owe all of you a big one!!!

/Jan



Jabberwock  Identity Verified
Poland
Local time: 11:05
Member (2004)
English to Polish
One more tool... May 25, 2007

Glad it works!

The mention of Windows Grep (which costs money) made me remember another tool, which is freeware:

http://www.orbit.org/replace/

I think it is a nice intermediate step between editor search and advanced S&R languages, as it is both quite capable and yet easy to use.

Of course, if you can't live a day without search and replace, you might need a much bigger tool: TextPipe. Unfortunately, it costs a lot of money, so make sure you really need it first...

