Mac 101: Use Automator to extract text from PDFs

More Mac 101, tips and tricks for novice (and expert) Mac users.

Have you ever tried to copy and paste text from a PDF into a word processor document like Pages or Microsoft Word? Most of the time the text loses all its formatting from the PDF, which can be a real pain.

Too often I've spent a frustrating amount of time putting text back into a coherent order after copying it from a PDF, while wondering why there isn't a simpler way of doing this on my Mac. Thankfully, a friend (who learned the trick from Macworld) showed me how to do it with Automator. And provided the text in the PDF is formatted correctly (and you're not trying to extract text that is actually an image), it's foolproof as well as free!

Here's how. On your Mac, open Automator from Applications. Automator will ask you to select a type for your document. Select Workflow, then hit return. In the far-left column of Automator, click on Files and Folders. In the second column, select Ask for Finder Items and drag and drop it into the far-right space which reads "Drag actions or files here to build your workflow." This becomes your first action.

Now click on PDFs in the far-left column and select Extract PDF Text from the second column. Drag and drop Extract PDF Text into the space to the right, below the Ask for Finder Items action you added earlier. You'll now see that Automator has created a workflow, with one action following another.

You're almost there. In the Extract PDF Text bubble of the workflow, select Rich Text instead of Plain Text (next to Output -- this will retain formatting like italics and bold) and choose where you want Automator to place your extracted text files from Save Output To.

To finish, simply title and save your work, but make sure you save it as an application and not a workflow. Now open your new Automator application and select the PDF you want to grab the text from. A new Rich Text document will be created. From there, simply open this document and copy and paste the text into your preferred word processor.
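
If you're comfortable with a little scripting, the same two steps (pick a PDF, then extract its text to a folder of your choice) can be roughly sketched in Python. This is only an illustration, not what Automator does under the hood. It assumes the third-party pypdf library is installed, the function names are made up for this example, and the naming of the output file (the PDF's name with a new extension) is an assumption as well. Unlike the Rich Text option in Automator, this sketch produces plain text only, so bold and italics are lost.

```python
from pathlib import Path


def output_path_for(pdf_path, output_dir, rich_text=True):
    """Return where the extracted text would be saved.

    Mirrors the two choices in the Extract PDF Text action: the output
    format (rich or plain text) and the Save Output To folder.
    The file naming here is an assumption made for this sketch.
    """
    suffix = ".rtf" if rich_text else ".txt"
    return Path(output_dir) / (Path(pdf_path).stem + suffix)


def extract_pdf_text(pdf_path, output_dir):
    """Write the text of every page in pdf_path to a plain-text file."""
    # Imported here so the path helper above works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() gives plain text only; formatting is not preserved.
    pages = [page.extract_text() or "" for page in reader.pages]

    target = output_path_for(pdf_path, output_dir, rich_text=False)
    target.write_text("\n\n".join(pages), encoding="utf-8")
    return target
```

Calling extract_pdf_text with the path to a PDF and an existing folder would write a .txt file into that folder and return its location. As with the Automator workflow, it only helps when the PDF contains real text rather than a scanned image.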