TUAW Tip: Regular Expressions for Beginners

Sometimes I think Regular Expressions are like the tax code: if someone professes to know everything about them, they're probably not telling the truth. At heart, though, Regular Expressions (or RegEx) are a syntax for building very precise search patterns, so you can find and replace bits of text in a variety of applications.

In applications like Coda, BBEdit, and TextMate, you can search for a "string" -- meaning just any old collection of letters next to each other -- using a Regular Expression. For example, I could search for the string "laugh" and it would show up in laughter, slaughter, and (as long as the search ignores case) Laughlin.
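If you'd rather see that in code than in an editor's search box, here is a minimal sketch in Python (my choice for illustration; the editors above have their own search dialogs). The word list and variable names are made up for the example, and "daughter" is thrown in as a near miss.

```python
import re

# Illustrative word list; "daughter" contains "augh" but not "laugh".
words = ["laughter", "slaughter", "Laughlin", "daughter"]

# A plain string is already a valid pattern: it matches itself anywhere in the text.
case_sensitive = [w for w in words if re.search(r"laugh", w)]

# Ignoring case is what lets the capital L in "Laughlin" match.
case_insensitive = [w for w in words if re.search(r"laugh", w, re.IGNORECASE)]

print(case_sensitive)
print(case_insensitive)
```

Most editors have a checkbox for case sensitivity that plays the same role as the `re.IGNORECASE` flag here.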

While I can't show you everything about Regular Expressions, I can at least start you off. Keep reading for more about how you can integrate Regular Expressions into your workflow.

Let's pretend I have a list of items. They happen to be domain names, in this case:

  1. tuaw.com

  2. apple.com

  3. last.fm

  4. navy.mil

  5. google.com

  6. code.google.com

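Personally, I think the most handy search term for me is .+. It works like a wildcard: the . matches any single character, and the + means "one or more of whatever came just before," so together they match any run of characters on a line. In our list, if I searched for .+.com it would show hits on lines 1, 2, 5 and 6. (Strictly speaking, the second . in that search is a wildcard too. To match only a literal period, put a backslash in front of it: .+\.com.)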

Of course, I could just search for .com and it would hit on the same lines. The difference is that with the RegEx in place, the match stretches from the start of the line through .com, so text editors will frequently highlight the entire line, making it easy to find and replace things. For example, if I wanted to blank out the lines that end in .com, I would search for .+.com and replace it with an empty string. (The line break itself stays put, so you end up with empty lines rather than fewer lines.)

(Commenter Eric notes I can make the expression .+.com$ to ensure that it's not catching something like www.commons.org. You can read more about the $ character in a little bit. Thanks, Eric!)
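Here is a rough sketch of Eric's point in Python, assuming the dot is escaped as `\.` so it only matches a literal period. The `domains` list is the sample list from above plus www.commons.org, and the variable names are just for illustration.

```python
import re

# Sample list from the article, plus the tricky case Eric mentions.
domains = ["tuaw.com", "apple.com", "last.fm", "navy.mil",
           "google.com", "code.google.com", "www.commons.org"]

# .+  = one or more of any character
# \.  = a literal period
# $   = end of the line
loose = re.compile(r".+\.com")
anchored = re.compile(r".+\.com$")

loose_hits = [d for d in domains if loose.search(d)]
anchored_hits = [d for d in domains if anchored.search(d)]

print(loose_hits)     # also catches www.commons.org (via ".com" in ".commons")
print(anchored_hits)  # only names that actually end in .com
```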

I can also search for something like g.+g, and get the string "goog" on lines 5 and 6.
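One thing worth knowing about .+ is that it is "greedy": it grabs as much as it can, so g.+g runs from the first g to the last g on the line. A small Python sketch shows this; the word "giggling" is my own extra example, not part of the list above.

```python
import re

# Two lines from the list, plus an illustrative word with several g's.
samples = ["google.com", "code.google.com", "giggling"]

# Map each sample to the text that g.+g actually matched.
matches = {text: re.search(r"g.+g", text).group() for text in samples}

for text, hit in matches.items():
    print(text, "->", hit)
```

In the two domain names the first and last g both sit inside "google", which is why the match is "goog". In "giggling" the match stretches across the whole word.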

Next is the pipe character: |, which means "or." If I search for fm|mil, it will hit on both lines 3 and 4. I can highlight the entire line if I search for .+fm|.+mil.
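As a quick sketch of the difference between those two searches, here they are in Python against the same list. The line numbers come from `enumerate`, and the variable names are only illustrative.

```python
import re

domains = ["tuaw.com", "apple.com", "last.fm", "navy.mil",
           "google.com", "code.google.com"]

short = re.compile(r"fm|mil")        # matches just "fm" or "mil"
whole = re.compile(r".+fm|.+mil")    # matches everything up through "fm" or "mil"

hits = []
for number, domain in enumerate(domains, start=1):
    short_match = short.search(domain)
    whole_match = whole.search(domain)
    if short_match and whole_match:
        hits.append((number, short_match.group(), whole_match.group()))

for number, short_text, whole_text in hits:
    print(number, short_text, whole_text)
```

The short pattern finds only the two or three letters themselves, while the longer one covers the whole line, which is what makes it handy for find and replace.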

You can also use Regular Expressions to add text in a repeatable sort of way. For example, if I wanted to add "http://" to the beginning of every line, I would search for ^ (that's shift + 6), and type "http://" in the replace box. After clicking "Replace All," I'd get a list that looked like this:

  1. http://tuaw.com

  2. http://apple.com

  3. http://last.fm

  4. http://navy.mil

  5. http://google.com

  6. http://code.google.com

You can do the same thing for adding text to the ends of lines by searching for $. Just as ^ is the way to find the beginning of a line, $ is the way to find the end of a line.
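The same two replacements can be sketched in Python. One assumption to flag: Python only treats ^ and $ as "start and end of every line" when you pass the `re.MULTILINE` flag, whereas the editors' search dialogs usually work line by line already. The trailing "/" is a made-up suffix for the example.

```python
import re

# Three lines of the list, with no trailing line break at the end.
text = "tuaw.com\napple.com\nlast.fm"

# ^ matches the empty spot at the start of each line.
with_prefix = re.sub(r"^", "http://", text, flags=re.MULTILINE)

# $ matches the empty spot at the end of each line.
with_suffix = re.sub(r"$", "/", text, flags=re.MULTILINE)

print(with_prefix)
print(with_suffix)
```

Neither ^ nor $ matches any actual characters. They match positions, which is why the replacement adds text without removing anything.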

This is just the very tippy top of the massive iceberg that is Regular Expressions. For example, you could find most email addresses in a text document by running a case-insensitive search for \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b. Scary! But there are plenty of sites to help you learn about RegEx, and even help you build search queries.

  • regular-expressions.info is a great resource for learning about RegEx, including an excellent tutorial. (This is also where I got that giant email search query.)

  • You can download a fantastic RegEx cheat sheet from Added Bytes.

  • RegExr is an excellent web-based utility that helps you construct a RegEx query by showing you results in real time. Hits are highlighted as you write your expression.
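If you want to poke at that giant email pattern yourself, here is a minimal Python sketch. It assumes the dot before the final part of the address is escaped (`\.`) and that the search ignores case, since the pattern only lists capital letters. The sample sentence and addresses are invented, and a pattern this simple will not cover every valid address (for instance, endings longer than four letters).

```python
import re

# The email pattern from the article, compiled as a case-insensitive search.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

# Made-up sample text; "@nobody" has nothing before the @, so it should not match.
sample = "Write to tips@example.com or press@example.org, not to @nobody."

found = EMAIL.findall(sample)
print(found)
```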

If you have a favorite RegEx tip aimed at beginners, feel free to share it in the comments!