How to Extract Text From a PDF (Even Scanned Ones)

Some PDFs let you select and copy text with no trouble. Others fight you at every turn: you drag to highlight a paragraph and nothing happens, because the page is really just a picture. This guide shows you how to extract text from any PDF, including the scanned, image-only kind, so you can edit, search, and reuse the contents instead of retyping them.

Two kinds of PDF, and why it matters

Understanding which type you have explains everything about how extraction will go.

Digital (text-based) PDFs

These are created from a word processor, a web page, or a design tool. The characters are stored as real text under the hood, so copying is instant and usually exact. A straightforward PDF to text extraction pulls the words out cleanly, with no recognition step needed.

Scanned (image-only) PDFs

These are made by scanning paper or photographing pages. Each page is just an image, with no text layer at all. Highlighting does nothing because there are no characters to select. To get text out of these, you need OCR (optical character recognition), the same technology covered in our overview of what OCR is.

The challenge is that both kinds can look identical on screen. A good tool detects which is which automatically and applies OCR only where it's actually needed.
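One way a tool might tell the two kinds apart is to try reading each page's text layer and see how much comes back. The sketch below is only an illustration of that idea, not how any particular tool works. It assumes the per-page text has already been pulled out by a PDF library of your choice, and the function names and the 20-character threshold are hypothetical.

```python
from typing import List

# Hypothetical threshold: pages with fewer visible characters than this
# are treated as image-only. Tune it for your own documents.
MIN_CHARS = 20


def page_needs_ocr(text_layer: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True when a page's text layer is missing or nearly empty."""
    # Drop all whitespace so a page of blank lines doesn't count as text.
    visible = "".join(text_layer.split())
    return len(visible) < min_chars


def classify_pdf(page_texts: List[str]) -> str:
    """Label a document 'digital', 'scanned', 'mixed', or 'empty'."""
    if not page_texts:
        return "empty"
    flags = [page_needs_ocr(text) for text in page_texts]
    if all(flags):
        return "scanned"
    if not any(flags):
        return "digital"
    return "mixed"


# Stand-in text layers for a three-page file: one real page, two blank ones.
pages = ["Quarterly report for the board of directors.", "", "   \n"]
print(classify_pdf(pages))  # mixed
```

A character count is a crude signal. Some scanned PDFs carry a hidden text layer from an earlier OCR pass, and a check like this would treat them as digital.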

How to extract text from a PDF

1. Open the converter. Go to the free PDF to text tool. It runs in your browser with nothing to install.
2. Upload your PDF. Drag it in or browse to it. Multi-page documents are fine.
3. Let it detect the type. The tool reads any existing text layer directly and falls back to OCR on pages that are image-only, so a mixed PDF is handled in one go.
4. Review the text. The extracted words appear, page by page. On scanned pages, proofread for the usual OCR slips around look-alike characters.
5. Copy or download. Save the text, or if you want a fully editable, formatted document, use PDF to Word to get a .docx instead.
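If you are curious what "read the text layer, fall back to OCR" could look like in code, here is a minimal sketch. It is not the tool's actual implementation. The two callables, read_text_layer and run_ocr, are hypothetical placeholders for whatever PDF and OCR libraries you use, and the dictionaries stand in for a real file.

```python
from typing import Callable, List


def extract_all_pages(
    page_count: int,
    read_text_layer: Callable[[int], str],
    run_ocr: Callable[[int], str],
) -> List[str]:
    """Return one string per page, using OCR only where the text layer is empty."""
    results = []
    for page_index in range(page_count):
        text = read_text_layer(page_index)
        if not text.strip():
            # No usable text layer: treat the page as an image and OCR it.
            text = run_ocr(page_index)
        results.append(text)
    return results


# Stand-in data for a three-page PDF whose second page is a scan.
fake_text_layer = {0: "Page one text.", 1: "", 2: "Page three text."}
fake_ocr_output = {1: "Page two, recognized by OCR."}

pages = extract_all_pages(
    3,
    read_text_layer=lambda i: fake_text_layer[i],
    run_ocr=lambda i: fake_ocr_output[i],
)
for number, text in enumerate(pages, start=1):
    print(f"--- page {number} ---")
    print(text)
```

Deciding page by page, rather than once per file, is what lets a mixed PDF come through in a single pass.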

Getting the best results from scanned PDFs

When OCR is doing the work, the quality of the original scan drives everything.

Start from a clean scan

A straight, well-lit, high-resolution scan reads far better than a dim, crooked one. If you're scanning paper yourself, aim for around 300 DPI or higher, with good contrast. Our accuracy guide goes into the details.
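To check whether an existing scan is sharp enough, divide its pixel dimensions by the paper size in inches. The sketch below does that arithmetic. The 300 DPI target is a common rule of thumb for OCR rather than a hard requirement, and the function names are made up for this example.

```python
# Common rule-of-thumb target for OCR; an assumption, not a hard limit.
TARGET_DPI = 300


def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical dots per inch."""
    return min(width_px / width_in, height_px / height_in)


def sharp_enough(width_px: int, height_px: int,
                 width_in: float, height_in: float) -> bool:
    """Return True when the scan meets the target resolution."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= TARGET_DPI


# A US Letter page is 8.5 x 11 inches.
print(effective_dpi(2550, 3300, 8.5, 11.0))  # 300.0
print(effective_dpi(1275, 1650, 8.5, 11.0))  # 150.0
print(sharp_enough(1275, 1650, 8.5, 11.0))   # False
```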

Watch out for multi-column layouts

Newspapers, journals, and forms with columns can confuse reading order. Skim the output to make sure paragraphs didn't get interleaved, and reorder if needed.
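If your OCR step reports a position for each block of text, a simple two-column page can be reordered automatically. The sketch below assumes a hypothetical input of (x, y, text) tuples, with y growing downward, and splits the page at its midpoint. Treat it as a starting point: full-width headings, three-column pages, and forms will need more care.

```python
from typing import List, Tuple

# (x, y, text): left edge and top edge of a text block, y growing downward.
Block = Tuple[float, float, str]


def reorder_two_columns(blocks: List[Block], page_width: float) -> List[str]:
    """Return block texts in reading order: left column first, then right."""
    middle = page_width / 2
    left = [b for b in blocks if b[0] < middle]
    right = [b for b in blocks if b[0] >= middle]
    # Within each column, read from top to bottom.
    left.sort(key=lambda b: b[1])
    right.sort(key=lambda b: b[1])
    return [b[2] for b in left + right]


# Blocks as a row-by-row reader would see them: interleaved across columns.
blocks = [
    (50, 100, "L1"), (320, 100, "R1"),
    (50, 200, "L2"), (320, 200, "R2"),
]
print(reorder_two_columns(blocks, page_width=600))
# ['L1', 'L2', 'R1', 'R2']
```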

Mind the numbers

In scanned tables and invoices, digits are the most error-prone characters. Double-check totals and reference numbers. If the PDF is mostly tabular, the PDF to Excel tool extracts rows and columns rather than a flat text block.
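A quick script can point you at the numbers most worth checking. The sketch below flags number-like tokens that contain letters OCR often confuses with digits, such as O for 0 or l for 1. The look-alike list is illustrative and incomplete, and the suggested fix is a guess to verify against the original page, not a correction to apply blindly.

```python
import re
from typing import Iterator, Tuple

# Letters OCR commonly confuses with digits (illustrative, not exhaustive).
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

# A token made only of digits, look-alike letters, and number punctuation.
NUMERIC_LIKE = re.compile(r"^[0-9OolISB.,\-]+$")


def suspicious_numbers(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token, suggested_fix) for number-like tokens with look-alike letters."""
    for token in text.split():
        has_digit = any(c.isdigit() for c in token)
        has_lookalike = any(c in LOOKALIKES for c in token)
        if has_digit and has_lookalike and NUMERIC_LIKE.match(token):
            fixed = "".join(LOOKALIKES.get(c, c) for c in token)
            yield token, fixed


line = "Invoice 2O24-0l7 total 1,2S0.00 due SOON"
print(list(suspicious_numbers(line)))
# [('2O24-0l7', '2024-017'), ('1,2S0.00', '1,250.00')]
```

Requiring at least one real digit keeps ordinary words like "SOON" from being flagged.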

When the PDF is actually a stack of photos

Sometimes a PDF is just camera photos of pages bundled together. These behave exactly like photographed images, so the same capture advice applies. Our guides on extracting text from a photo and from a screenshot explain how the source image quality shapes the result.

What you can do with the extracted text

Once a PDF's contents are editable text, you can search across it, copy quotes with citations, translate it, or paste it into a new document. For documents you intend to keep editing, PDF to Word gives you a formatted, editable file rather than raw text, which is usually the better starting point for contracts, reports, and letters.

Frequently asked questions

Why can't I select text in my PDF?

Almost certainly because it's a scanned, image-only PDF. The page is a picture with no underlying text layer, so there's nothing to highlight. Running it through the PDF to text tool applies OCR and gives you selectable, editable text.

Does the tool know whether my PDF is scanned or digital?

Yes. The PDF to text tool reads any existing text layer directly and falls back to OCR on image-only pages automatically, so you don't have to figure out the type yourself.

Can I get a Word document instead of plain text?

You can. Use PDF to Word to convert the PDF, including scanned ones, into an editable .docx with formatting preserved, which is ideal when you need to keep editing the document.

How accurate is OCR on a scanned PDF?

On a clean, high-resolution scan of printed text, very accurate. Accuracy falls with low-resolution, skewed, or faded scans and with handwriting. Starting from a better scan and following our accuracy tips makes a noticeable difference.

Need the words out of a PDF, scanned or not? Open the free PDF to text tool and get editable text in seconds.