Extract text from PDF and reuse it later using Workflows

By Nishanth Asokan | Automation


Digital documents are rapidly replacing paper, and many of them arrive as PDF files. PDFs such as contracts, legal documents, or digital books may contain hundreds or thousands of pages. When automating the processing of such documents, you might want to copy text from a specific area of a PDF and reuse that text at a later stage of processing. The PDF4me Workflows actions cater to this kind of document logic.

Use the Extract Text action from PDF4me Workflows to automate the process of extracting text data from PDF documents. Moreover, use additional steps to reuse this data partially or completely at a later stage. Let us look at a sample Workflow where we extract text from a PDF document and use it later to rename the file.

How to extract text from PDF for reuse?

With no additional integration, you can configure a Workflow to automatically extract text from a PDF. Let us look at a sample Workflow that automates PDF text extraction and renaming.

Add a trigger to start your Workflow

Add a trigger to kick-start your automation. Currently, Workflows provide two triggers: Dropbox and Google Drive. For example, let us create a Dropbox trigger.

Configure the connection and choose the folder where the input files are expected.

Dropbox trigger for Extract text action

Add Extract Text action

Add the Extract Text action and enable it. The action extracts the full text from the PDF. To extract text from each page separately, add a Split PDF action before the Extract Text action, and place the Extract Text action inside the For each control.

Add and enable Extract Text action

Add a Save to action

The output files need to be saved to cloud storage. In our use case, let us configure a Save to Dropbox action. You can use a regular expression to pick out a particular piece of text from the output of the ‘Extract Text’ action. Copy the expression given below into the Output File Name parameter and replace the condition with a pattern that matches the required text.

${file.pages[0].PageText.match(<condition>)}.pdf

Save dropbox action with Regular expression

The expression takes the text that matches the condition from the PDF and passes it to the output file name parameter, so that the files are renamed based on the extracted text.
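To see what the match call in such an expression hands back, here is a minimal sketch in plain JavaScript that you can run locally. It assumes the expression field evaluates standard JavaScript string methods; the page text and the 'ORDER REF' pattern are made up for illustration and do not come from a real PDF.

```javascript
// Hypothetical page text; in a Workflow this would come from file.pages[0].PageText.
const pageText = "ACME Corp\nORDER REF: AB-1042\nTotal: 99.00";

// With a non-global pattern, match() returns an array:
// index 0 holds the whole match, index 1 the first capture group.
// It returns null when nothing matches.
const hit = pageText.match('ORDER REF: (.*)');
const miss = pageText.match('CUSTOMER ID: (.*)');

console.log(hit[0]); // "ORDER REF: AB-1042"
console.log(hit[1]); // "AB-1042"
console.log(miss);   // null
```

Because the dot does not match line breaks by default, the capture group stops at the end of the line. Indexing the result with [1] therefore yields only the value that follows the label.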

A sample to try

Let us look at a workflow to extract text from a sample Invoice PDF and use a specific part of the text - invoice number - to rename the PDF file before saving it to the cloud.

Sample invoice PDF for extracting text

Let us briefly look at the steps -

  1. Add and configure the trigger of your choice
  2. Add the Extract Text action and enable it.
  3. Upload the sample PDF Invoice in the source folder of the trigger - Download sample file
  4. Add the storage to which you want to save the file and, in the Output File Name parameter, pass the following regular expression -

${file.pages[0].PageText.match('INVOICE #(.*)')[1].trim()}.pdf

Sample Workflow for extracting text and renaming PDF

The above workflow extracts the text from the PDF, trims the matched invoice number, and renames the file with it before saving it to the storage.
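If a page does not contain the expected label, match returns null and indexing it with [1] fails. The sketch below shows one way to test the renaming logic locally with a fallback name. The file object, the sample text, and the invoiceFileName helper are hypothetical stand-ins and not part of PDF4me. Whether the Output File Name parameter accepts more than a single expression is not confirmed here.

```javascript
// Hypothetical stand-in for the object the Workflow exposes to the expression.
const file = {
  pages: [{ PageText: "PDF4me Sample\nINVOICE # 2024-0017 \nDate: 01.02.2024" }],
};

// Build the output file name; fall back to a default when no invoice number is found.
function invoiceFileName(file, fallback = "unnamed") {
  const text = file.pages[0]?.PageText ?? "";
  const found = text.match(/INVOICE #(.*)/);
  const raw = found ? found[1].trim() : "";
  // Replace characters that are commonly disallowed in file names.
  const safe = raw.replace(/[\\/:*?"<>|]/g, "_");
  return `${safe || fallback}.pdf`;
}

console.log(invoiceFileName(file));          // "2024-0017.pdf"
console.log(invoiceFileName({ pages: [] })); // "unnamed.pdf"
```

Trimming removes the spaces around the captured number, just as in the expression above. Sanitizing the result guards against invoice numbers that contain slashes or other characters a storage provider may reject.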

To access Workflows, you need a PDF4me Subscription. You can also get a Daypass and try out Workflows to see how they can help automate your document jobs.
