How to do Bulk OCR – text extraction

You will find the Bulk OCR tool in the Folder Tools sidebar. This feature is currently available to anyone as an add-on.

The Bulk OCR tool has a few options you need to be aware of:

Folder ID – the ID of the Google Drive folder where the files are located.

Language – the language of the documents. If you don’t see your language, leave it set to English.

Extraction options:

  • Extract all files in a Sheet: the extracted text of each file is added to a cell in a single Google Sheet, on a row with the file name and a link to the original file.
  • Extract all files in a Doc: the extracted text is added to a single Google Doc as pages.
  • Extract each file in a Google Sheet: each file is extracted into its own Google Sheet.
  • Extract each file in a Google Doc: each file is extracted into its own Google Doc.

Treat as table: if your files contain tables (for example invoices or receipts), this option uses a different engine to extract the data. Some languages might not be supported.

How does it work?

To start a bulk OCR task, add your files to a Google Drive folder. The tool supports JPG, PNG and PDF files.

  1. Copy the folder ID and paste it into the input box.
  2. Optionally, select the language, the extraction option, and whether to treat the files as tables.
  3. Click the Start OCR button. A confirmation will appear that the process has started. When the process is done, an email will be sent with a link to the index file.
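
If you only have the folder's link, the folder ID for step 1 is the last path segment of a Google Drive folder URL (the part after /folders/). The snippet below is a minimal sketch of pulling it out; the function name extract_folder_id is hypothetical and not part of the tool, and the character set accepted for a bare ID is an assumption.

```python
import re
from typing import Optional

# Matches the ID segment in URLs such as
# https://drive.google.com/drive/folders/<FOLDER_ID>?usp=sharing
_FOLDER_URL = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(url_or_id: str) -> Optional[str]:
    """Return the folder ID from a Drive folder URL.

    If the input already looks like a bare ID it is returned unchanged;
    anything else yields None.
    """
    text = url_or_id.strip()
    match = _FOLDER_URL.search(text)
    if match:
        return match.group(1)
    # Assumption: a bare ID contains only letters, digits, '-' and '_'.
    if re.fullmatch(r"[A-Za-z0-9_-]+", text):
        return text
    return None
```

Pass it either the full link or an ID you already copied; the returned value is what goes into the Folder ID input box.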

Depending on the number and size of the files in your Google Drive folder, the bulk OCR process can take a few minutes. A file will be created, named after the source folder and the date. The results will look something like this:

[Screenshot: Bulk OCR results]

Limitations:

Files must be no bigger than 3 MB.

File types: PNG, JPG, PDF.

2,000 files per month. The quota resets on the first day of each month.
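
To catch files that would break these limits before you upload them, you could run a quick check on a local copy of the folder. The sketch below is not part of the tool: the preflight function and its used_this_month parameter are hypothetical, 3 MB is assumed to mean 3 × 1024 × 1024 bytes, and accepting the .jpeg extension alongside .jpg is also an assumption.

```python
from pathlib import Path
from typing import Dict, List

MAX_BYTES = 3 * 1024 * 1024                  # assumption: "3 MB" read as 3 MiB
ALLOWED = {".png", ".jpg", ".jpeg", ".pdf"}  # ".jpeg" is an assumption
MONTHLY_QUOTA = 2000                         # files per month, from the limits above


def preflight(folder: str, used_this_month: int = 0) -> Dict[str, List[str]]:
    """Sort the files in a local folder into ok / rejected buckets."""
    report: Dict[str, List[str]] = {
        "ok": [], "too_large": [], "unsupported": [], "over_quota": [],
    }
    remaining = max(MONTHLY_QUOTA - used_this_month, 0)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() not in ALLOWED:
            report["unsupported"].append(path.name)
        elif path.stat().st_size > MAX_BYTES:
            report["too_large"].append(path.name)
        elif len(report["ok"]) >= remaining:
            # Acceptable file, but the monthly allowance is already used up.
            report["over_quota"].append(path.name)
        else:
            report["ok"].append(path.name)
    return report
```

For example, preflight("invoices", used_this_month=1500) lists which files in a local invoices folder fit within the remaining allowance and which would need to be resized, converted, or left for next month.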

This feature is currently in Beta, so OCR results might differ from case to case. We are still testing different options and extraction engines to improve the results.

If you have feedback or want to suggest an improvement, please contact us.