The Amazing Power Of Regex Search And Replace

Over the past few weeks I've been helping a friend investigate the contents of a PC we're working on, in order to troubleshoot some weird problems.  It's actually quite fun to do some in-depth examinations of the contents of a hard disk, but you soon discover that such tasks require some specific software tools.

A couple of days ago, for example, I needed to look through the contents of some binary files which mostly contained garbage characters, but which also held occasional occurrences of readable words and sentences.  The files were actually the hibernation files that Windows uses when it goes to sleep.  Before nodding off, the PC dumps the contents of its memory to a hibernation file, and I needed to know what was in that file.

To separate the garbage characters from the readable ones, I needed to delete every character in the file that wasn't a letter or a digit.  With 26 letters, 2 cases, and 10 digits, that's 62 characters to look for.  Or to put it another way, 62 characters not to delete from the file, out of a repertoire of 250-odd.  This wasn't something I intended to do with a standard search and replace facility.

Luckily, my mind was cast back to a really neat text editor that I wrote about in this column back in August.  EmEditor (http://www.techsupportalert.com/content/fastest-text-editor-ive-ever-see...) supports a feature in its search/replace facility called Regular Expressions.  Such things, commonly known as regex, are widely used within the IT industry for specifying patterns of characters in order to do searches or filters.  For example, when you type a credit card number or an email address into a web form and the submission is rejected because what you typed doesn't look like a credit card number or email address, the form is probably using a regex to check whether what you typed conforms to the expected pattern.  In the case of an email address, for example, that pattern would be roughly 2 runs of characters, without spaces, separated by an "@" symbol.

The syntax of regex is horribly complicated.  If you want to learn it, just type "regex" into Google and follow the endless tutorials.  But as an example, the expression [^a-zA-Z] means "any character which isn't alphabetic"  (the caret symbol at the start means "not"), and [^a-zA-Z0-9] extends that to "any character which isn't a letter or a digit".  So if, say, you search for such an expression within EmEditor, and tick the "use regular expressions" box, and then choose to replace all matches with a space or even nothing at all, you quickly end up with a file that now contains no garbage characters at all.  Which is just what I wanted.
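If you'd rather try the same idea outside a text editor, here is a minimal sketch in Python.  It isn't what EmEditor does internally, just the same kind of pattern applied with Python's standard re module.  The function name strip_garbage and the sample bytes are made up for illustration, and the pattern includes 0-9 because the aim was to keep digits as well as letters.

```python
import re

# Matches every byte that is NOT an ASCII letter or digit.
# The leading caret inside the brackets means "not".
NON_ALNUM = re.compile(rb"[^a-zA-Z0-9]")


def strip_garbage(data: bytes, replacement: bytes = b" ") -> bytes:
    """Replace each non-alphanumeric byte with `replacement` (hypothetical helper)."""
    return NON_ALNUM.sub(replacement, data)


# Made-up sample: readable text buried among unprintable bytes.
sample = b"\x00\x13pass\xffword\x01\x02 2024\x7f"

print(strip_garbage(sample))       # garbage replaced by spaces
print(strip_garbage(sample, b""))  # garbage removed: b'password2024'
```

Replacing with a space rather than nothing keeps neighbouring words from running together, which tends to make the result easier to read.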

If a search and replace feature with regex support isn't something that you need right now, keep it in mind.  One day, you might just need to do it, and knowing about such things can save you a load of time.

 

 


Comments

Email regex? Not that simple, maybe...

See: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Or a "simple", common or garden version...

^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$

Which reads as:

The start of something (e.g. a line or string)
followed by
one or more combinations of letters, numerals, dots, underscores, plus signs or hyphens
followed by
@
followed by
one or more combinations of letters, numerals, or hyphens
followed by
a dot
followed by
one or more combinations of letters, numerals, hyphens or dots
followed by
the end of something.
...except that this allows some invalid emails (e.g. successive dots).
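To see that behaviour for yourself, here is a small sketch in Python using the "simple" expression quoted above, unchanged.  The function name looks_like_email and the test addresses are invented for the example.

```python
import re

# The "simple" pattern from this comment, copied as-is.
EMAIL = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def looks_like_email(text: str) -> bool:
    """Return True if `text` fits the simple pattern (hypothetical helper)."""
    return EMAIL.match(text) is not None


# Made-up addresses: one ordinary, one with no "@", one with successive dots.
for candidate in ["rob@example.com", "no-at-sign.example.com", "a..b@example.com"]:
    print(candidate, looks_like_email(candidate))
```

The first and third candidates pass and the second fails, so the successive-dots address slips through just as described.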

Sometimes with this stuff, you just wing it... and slide with the %ages.

Or you can just use the Windows port of the unix/linux "strings" command by Russinovich (part of his Sysinternals suite).

http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx
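For anyone curious about the general idea behind a "strings"-style tool, here is a rough sketch in Python.  It is not how the Sysinternals program is implemented and it only looks at plain ASCII.  The function name printable_runs and the minimum length of 4 are assumptions chosen for the example.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least `min_len` bytes long (hypothetical helper)."""
    # \x20-\x7e covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


# Made-up sample: two readable runs and one that is too short to keep.
sample = b"\x00\x01hello world\xff\xfeab\x00regex!\x02"

print(printable_runs(sample))  # ['hello world', 'regex!']
```

Unlike deleting every non-alphanumeric character, this keeps spaces and punctuation inside each run, and it drops short accidental matches such as "ab".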

You can use regular expressions in Notepad++.

Rob, thanks for the tip. I downloaded and downgraded the program to the free version. A couple of things I found out:

First, EmEditor inserts itself into the StartUp folder without asking. If you do a Custom Install, I think you can prevent that from happening if you unselect that part of the install. I couldn't find a setting from within the program after installing it, to not have it start up automatically. I disabled the auto-start feature from within WinPatrol.

Second, the program took over the .txt file type association without asking. WinPatrol alerted me to this, so I was able to deny the change. Every 5 minutes, however, the message kept popping up. EmEditor won't take "no" for an answer. Unacceptable. I can't find a setting within the program to make it stop doing this, so I will be uninstalling EmEditor. I use a different text editor which I want to continue to be my default editor for .txt files.

Too bad. I wanted to have the regex character-replacement feature available. But free or not, I won't use a program that won't behave.

So this editor is an annual subscription model?

As stated in the article referenced, EmEditor is available as a Pro and a Lite version. The Pro version costs $39. If you only want the Lite one, which is free, download the main installer and run the editor. Then, once the editor is running, press Ctrl-Q to bring up the list of commands and search for "downgrade". This will downgrade your 14-day trial copy of the paid-for version to the free-forever version. Which is still just as fast, and has pretty much all the features you'll probably ever need. MC - Site Manager.