Looking for assistance in creating a regex filter
Thread poster: Omer Shani

Omer Shani
Local time: 20:40
Member (2012)
Feb 27

Hi memoQ engineering experts,
I'm looking for someone who can assist in creating a regex filter or a txt filter for a txt file that was exported from a WordPress website. The file contains CDATA sections and other elements, and the filter needs to distinguish the source text (Hebrew) from the tags.

Omer Shani

Local time: 19:40
Regex filter Mar 1

I am preparing a book that discusses such a problem. Unfortunately, I cannot include the figures, but it may help.

18.4 Use of regex text filters
This example shows how regex text filters can be defined for a tagged file. The tagged file has the form: <tag>text, where “<tag>” should not be translated (see 21.3.1 Tagged file example).

Click “Import With Options…” to import the document.

Figure 18-4.1: Import With Options…
The “Document import options” screen opens.

Figure 18-4.2: Filter → Regex text filter
New: Filter → Regex text filter
Click “Change filter and configuration” to edit the filter settings.

Or select an appropriate existing “Configuration”.
Click “OK” to initiate the import.

Figure 18-4.3: Document import settings
General → Custom regex
Specify the regexes that define the “Paragraph end” and “Paragraph start”, as appropriate.
Our example file needs only a “Paragraph start” regex: \s*?<
This searches for a < that may be preceded by zero or more whitespace characters.
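
The effect of this “Paragraph start” regex can be tried outside memoQ. The following is only a minimal sketch in Python: the sample content and the helper name split_paragraphs are assumptions for illustration, and memoQ's own regex engine may differ in details from Python's re module.

```python
import re

# "Paragraph start" regex quoted above: a "<" optionally preceded by whitespace.
PARAGRAPH_START = re.compile(r"\s*?<")

def split_paragraphs(text):
    """Split text at every position where the paragraph-start regex matches.

    Hypothetical helper, not memoQ's implementation. Any text before the
    first match is ignored.
    """
    starts = [m.start() for m in PARAGRAPH_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]

# Assumed sample content; the real example file is not shown in this post.
sample = "<title>Welcome\n  <body>Some text to translate\n<footer>Contact us"
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

Each printed line corresponds to one paragraph that the filter would pass on to the paragraph rules.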

An optional reference file can be added with which the settings can be checked later (see “Preview”).

Figure 18-4.4: Add reference file
If required, add reference file(s), and click “Paragraph” to continue the filter definition.
The paragraph rules define the regex that parses the source file.

Figure 18-4.5: Define paragraph rule
Enter the appropriate regex rule to parse the file, in this case:

This regex rule includes the “Paragraph start” from the previous screen (\s*?<), then matches the tag name up to the closing >-character (\w+>), and finally captures the remaining text as a capturing group (as there is only one capturing group, it is $1).

Click “Add” to add the paragraph rule.
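
Since the rule itself is not reproduced above, the following Python sketch uses an assumed rule, \s*?<\w+>(.*), built from the parts just described (paragraph start, tag name up to >, remaining text as the only capturing group). It may not be identical to the rule used in the book, and the function name is hypothetical.

```python
import re

# Assumed paragraph rule (not taken from the original text):
# optional whitespace, "<", tag name, ">", then the text to translate as group 1.
PARAGRAPH_RULE = re.compile(r"\s*?<\w+>(.*)")

def translatable_text(paragraph):
    """Return the text captured as group 1 ($1), or None if the rule does not match."""
    m = PARAGRAPH_RULE.match(paragraph)
    return m.group(1) if m else None

print(translatable_text("  <title>Welcome to our site"))
```

Only the captured group would be offered for translation; the tag in front of it stays protected.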

Figure 18-4.6: Specify the effect of the selected rule
Click the paragraph to add as “Effect of selected rule”. If required, the maximum text length can be limited. The “Context” and “Comment” fields are only documentary.

To check the rule(s), click “Preview”, otherwise “OK” to close the definitions.

The “Preview” uses the specified reference file to show how the paragraph rules act.

Figure 18-4.7: Preview effect of the rule(s)
Click the hospital icon to save the filter configuration, “OK” to return to the file import, or one of the tabs (General, …) to edit the settings.

To allow the filter configuration to be reused, it can be saved globally in the memoQ environment; otherwise, the configuration is stored as part of the project.

Figure 18-4.8: Save document import settings
Click the hospital icon to open a screen in which you can enter the name of the file where the configuration is to be saved (this file can later be selected as “Configuration file”, rather than “Custom”). Because the usual Windows handling is involved, the procedure is not described further.

Figure 18-4.9: Create new filter configuration
Click “OK” to return to the previous screen, from where the file import can be initiated.

Once the document import options have been configured (or a predefined filter configuration selected), the specified file can be imported from the “Document import options” screen.

Figure 18-4.10: Document import options
Click “OK” to import the selected file with the specified regex text filter (see “Configuration”).

After the file has been “imported with options”, the usual memoQ translation screen opens.

Figure 18-4.11: File imported with options
The document (file) can then be translated as usual; the defined tags are protected.

Figure 18-4.12: Document after being translated
The export removes the tags from the translated document, i.e. it has the original format.

Figure 18-4.13: Exported (translated) document

18.5 Use of Regex Tagger
This example shows how the Regex Tagger can be used to define regexes that describe protected terms in a file and so convert them to tags.

The edit screen shows a file with product names (two uppercase letters followed by three or more digits) opened in the editor. Such product names must not be changed and must be present appropriately in the target (translated) document.

The Regex Tagger is invoked from Preparation → Regex Tagger

Figure 18-5.1: Edit screen with source document containing product names

Figure 18-5.2: Regex Tagger configuration screen
Enter the appropriate regex rule to match (identify) the product name, in this case, [A-Z]{2}\d{3,}.

Click “Add” to add the rule. The “Result” pane shows the form of the “tagged” document. If necessary, correct the rule and click “Change”. Once the rule is correct, click “OK” to return to the edit screen.
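
A rule of this kind can be tested before it is entered in the Regex Tagger. The sketch below assumes that “two uppercase letters followed by three or more digits” is written as [A-Z]{2}\d{3,}; the {tag:...} placeholder format, the function name, and the sample sentence are made up for illustration and are not how memoQ displays tags.

```python
import re

# Two uppercase letters followed by three or more digits.
PRODUCT_NAME = re.compile(r"[A-Z]{2}\d{3,}")

def tag_product_names(text):
    """Wrap each product name in a hypothetical placeholder tag."""
    return PRODUCT_NAME.sub(lambda m: "{tag:" + m.group(0) + "}", text)

print(tag_product_names("Order the AB1234 pump or the XY999 valve, not model A12."))
```

Names such as A12 are left untouched because they do not have two leading uppercase letters and at least three digits.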

Figure 18-5.3: Edit screen with tagged document
Make the necessary translations as usual (note: the tags are protected).

Figure 18-5.4: Edit screen with translated tagged document
If required, click “Review” to perform a QA (quality assurance).

Figure 18-5.5: QA showing an error detected because of a missing tag
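
The idea behind such a check can be illustrated with a few lines of Python. This is not memoQ's QA implementation, only a sketch under the assumption that a protected product name matches [A-Z]{2}\d{3,} and must occur in the target as often as in the source; the function name and sample sentences are hypothetical.

```python
import re
from collections import Counter

# Two uppercase letters followed by three or more digits.
PRODUCT_NAME = re.compile(r"[A-Z]{2}\d{3,}")

def missing_product_names(source, target):
    """Return product names that occur more often in the source than in the target."""
    in_source = Counter(PRODUCT_NAME.findall(source))
    in_target = Counter(PRODUCT_NAME.findall(target))
    # Counter subtraction keeps only names with a positive remaining count.
    return sorted((in_source - in_target).elements())

print(missing_product_names("Use AB1234 with XY999.", "Verwenden Sie AB1234."))
```

An empty list means that every product name from the source segment is also present in the target segment.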
Once the translated document is correct, save the target document (the tags are converted back to the original text when the document is saved).

Figure 18-5.6: Translated document
The handling is very similar to that for regex text filters, see 18.4 Use of regex text filters.
