Scan and Split PDF by specific text contained in them

By Nishanth Asokan | Automation

We automate a lot of document processes nowadays. Want to recognize specific text in a PDF? Want to split the file at the pages containing that text? Want to rename the split files using the text that was used for splitting? Well, we have the perfect solution for you.

The PDF4me Workflows Split By Text action caters to all such document logic. Workflows focus on delivering the best automation solution for your document processes. The action can also store the text in the clipboard, so that the files can be renamed with it when they are saved to storage. Let us look at how to set up this action with a sample Workflow.

How to Scan and Split PDF by specific text?

In the following example, we create a Workflow that splits a PDF file using specific text contained in it and uses that text to rename the split files.

Start by launching the PDF4me Dashboard.

  • Select the Create Workflow button.
Create PDF4me Workflow interface

Add a trigger to start your Workflow

Add a trigger to kick-start your automation.

  • Currently, Workflows provide two triggers: Dropbox and Google Drive. For example, let us create a Dropbox trigger.

Configure the connection and choose the folder where the input files are expected.

Dropbox trigger for Workflow

To test the exact flow, you can use this sample PDF - Download sample file

Add the Split By Text action

Add and configure a Split By Text action to separate the file pages using the required text. Here we use a regular expression to detect the unique text.

Serial#:(.*)

The regex finds the text value starting with ‘Serial#:’, and the action splits the file at the pages where it matches.
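To see what this pattern captures, here is a minimal Python sketch that applies the same regex to some page text. It only illustrates the matching step; the helper name `find_serial` and the sample text are made up for this example, and PDF4me's own regex engine may treat whitespace or line breaks differently.

```python
import re

# Same pattern as in the Split By Text action above.
SERIAL_PATTERN = re.compile(r"Serial#:(.*)")


def find_serial(page_text):
    """Return the text captured after 'Serial#:' on a page, or None if absent.

    Hypothetical helper for illustration; not part of PDF4me.
    """
    match = SERIAL_PATTERN.search(page_text)
    if match is None:
        return None
    # '.' does not cross line breaks here, so only the rest of the line is captured.
    return match.group(1).strip()


sample_page = "Delivery note\nSerial#: AB-123\nQuantity: 4"
print(find_serial(sample_page))
```

A page without the marker returns `None`, which is how this sketch tells split pages apart from continuation pages.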

Split By Text action configuration

Add a For Each Document control

Since the Split By Text action generates multiple documents, a For Each Document control is necessary to handle the output files one by one. The rest of the actions should be included inside this control.
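The idea of one input producing several documents that are then handled one by one can be sketched in Python. This is an assumption-laden illustration, not PDF4me's implementation: it assumes a new document starts at every page where the regex matches, and `split_pages` and the sample page texts are hypothetical.

```python
import re

SPLIT_PATTERN = re.compile(r"Serial#:(.*)")


def split_pages(page_texts):
    """Group page indices into documents.

    Assumption for this sketch: a new document starts at each page whose
    text matches the pattern. Pages before the first match form their own
    document with an empty text.
    """
    documents = []
    for index, text in enumerate(page_texts):
        match = SPLIT_PATTERN.search(text)
        if match or not documents:
            found = match.group(1).strip() if match else ""
            documents.append({"text": found, "pages": []})
        documents[-1]["pages"].append(index)
    return documents


pages = ["Serial#: A100\nfirst item", "continued", "Serial#: B200\nsecond item"]

# Equivalent of the For Each Document control: handle each output in turn.
for document in split_pages(pages):
    print(document["text"], document["pages"])
```

Each dictionary stands in for one split file, and the loop body is where the remaining actions would run.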

For each control for controlling multiple output

Add a Save to action

The output files need to be saved to cloud storage. In our use case, let us configure a Save to Dropbox action. In the above image, you can see an expression for getting the text from the ‘Split By Text’ action. You can use the expression given below in the Output File Name parameter to rename the files.

${file.pages[0].PageText}.pdf

Save to Dropbox after renaming

The expression passes the text from the Split By Text action to the Output File Name parameter, so that the files are renamed based on the text that was read.
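As a rough Python analogue of this renaming step, the sketch below builds a `.pdf` file name from the text that was read. The clean-up of characters that are awkward in file names is an addition made for this example; whether PDF4me or Dropbox applies any such clean-up is not stated here. The function name `output_file_name` and the fallback value are hypothetical.

```python
import re


def output_file_name(page_text, fallback="document"):
    """Build a file name from the text read by the split step.

    Assumption for this sketch: whitespace and characters that are commonly
    rejected in file names are replaced by underscores.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", page_text).strip("_")
    # Fall back to a generic name if nothing usable is left.
    return f"{cleaned or fallback}.pdf"


print(output_file_name(" AB-123 "))
print(output_file_name("A/B:1"))
```

If the text that was read contains only spaces or is empty, the sketch falls back to `document.pdf` rather than producing a bare `.pdf`.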

To get access to Workflows, you need a PDF4me Subscription. You can also get a Daypass and try out Workflows to see how they can help automate your document jobs.
