Adobe launches PDF extraction, generation APIs

Adobe deconstructs the PDF creation and content extraction processes with the release of APIs for developers, available on the AWS Marketplace.

Today, Adobe released PDF extraction and document generation APIs for developers, and made them available on the AWS Marketplace. The company enters an already crowded sector with competing tools -- but as the inventor of the PDF format and the Acrobat application, Adobe holds distinct advantages.

The new Adobe PDF APIs use Liquid Mode, a PDF feature Adobe released last year that applies machine learning to analyze PDF structure and recognize graphic elements such as tables and images. Liquid Mode is a display tool that reflows PDFs to make them more readable on mobile devices, but in the new APIs it helps tag and add structure to content going into and out of PDFs.

Recognizing and tagging different elements of a PDF for the purposes of automation has been a vexing technical problem over the years for both Adobe and third-party vendors of PDF tools, said Vibhor Kapoor, senior director of marketing for Adobe Document Cloud.

The new APIs join existing Adobe PDF APIs that perform such tasks as applying digital signatures to documents or in applications, creating a PDF from Microsoft Word documents, combining PDFs, compressing large PDF files, performing optical character recognition (OCR) on a PDF, and rotating and deleting pages. Previous Adobe extraction and document generation APIs existed -- but didn't employ Liquid Mode.

Adobe's enterprise play

Duff Johnson, CEO of the PDF Association standards and vendor group, said the functions the Adobe PDF APIs perform have been available for some time through third-party vendors. Companies such as Abbyy that specialize in OCR also employ AI tools to accomplish PDF structure tagging in order to digitize documents.

But the fact that Adobe -- which originated the PDF format in the 1990s -- released its own versions and will put them on the AWS Marketplace is significant. Kapoor said users of the APIs typically fall into three main groups: enterprise customers, systems integrators and third-party software vendors that will incorporate the APIs into their own products.

Adobe will also connect the Document Generation API to Microsoft Power Automate, which enables the quick creation of document templates and workflows for invoices, contracts and other common document types. Using the APIs in this setting, along with templates that create branded documents in Word and export them to static PDFs, shows the market Adobe is going after, Johnson said.

"It's another enterprise play for people who are deeply invested in Microsoft technology," Johnson said. "Adobe's not the first -- but they don't need to be the first, from their point of view."

Document services market grows, still

In 1993, Adobe promoted open development of PDF tools after it made the file specification public, and it eventually handed the specification over to ISO in the 2000s. Many third-party tools automate PDF creation, editing, content extraction and mass processing. Through the years those tools evolved from desktop to server and cloud apps, and now are morphing into web services and APIs.

Johnson said it's likely the third-party vendors of PDF tools will welcome Adobe's entry into the market. Rather than cannibalize the market, Adobe's entry will draw attention to all the vendors as technology buyers shop for document tools. The market is only expanding; Adobe claims users create 2.5 trillion PDFs each year, and says developer signups rose 80% and integrations with Adobe Document Services rose 50% between the first and second quarters of this year.

"The world doesn't yet use this stuff," Johnson said. "[The Adobe PDF APIs] expand people's consciousness about what you can do with PDF, and that's good for everybody in my industry. So if Adobe makes this play, then all the people who start thinking about this will also wonder what else they can get."

The Adobe PDF APIs are generally available and priced with a consumption model.

Don Fluckinger covers enterprise content management, CRM, marketing automation, e-commerce, customer service and enabling technologies for TechTarget.
