Bulk OCR – text extraction

Please enable the paid features in the settings tab before using this feature.

You will find the Bulk OCR tool in the Plus Tools sidebar. This feature is currently available to Business+ users and as an add-on.

The OCR text extraction engine is designed mainly for documents. We make no guarantee about the quality of the extraction, as it varies considerably with the input files.

The Bulk OCR tool has a few options you need to be aware of:

Folder ID – the ID of the Google Drive folder where the files are located.

Language – the language of the documents. If you don’t see your language, leave it set to English.

Extraction options:

  • Extract all files in a Google Sheet: the extracted text will be added to a Google Sheet cell on a row with the file name and link to the original file.
  • Extract all files in a Google Doc: the extracted text will be added to a single Google Doc as pages.
  • Extract each file in a Google Sheet: each file will be extracted in its own Google Sheet.
  • Extract each file in a Google Doc: each file will be extracted in its own Google Doc.

Treat as table: if your files contain tables (such as product rows on invoices or receipts), this option uses a different engine to extract the data. Some languages might not be supported.

How does it work?

To start a bulk OCR task, add your files to a Google Drive folder. The tool supports JPG, PNG, and PDF files.

  1. Copy the folder ID and paste it in the input box.
  2. Optionally, you can select the language, extraction type, or treat as table.
  3. Click the Start OCR button, and a confirmation will appear that the process has started. An email will be sent when the process is done with a link to the index file.
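
For step 1, the folder ID is normally the last path segment of the folder's address in the browser. The Python sketch below pulls it out of a pasted link. It assumes the usual Google Drive URL shape (https://drive.google.com/drive/folders/<ID>); the helper name folder_id_from_url and the sample ID are made up for illustration and are not part of the tool.

```python
import re

# Assumption: the folder URL looks like
# https://drive.google.com/drive/folders/<FOLDER_ID>, optionally followed
# by a query string such as ?usp=sharing.
_FOLDER_ID_PATTERN = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(url: str) -> str:
    """Return the folder ID embedded in a Google Drive folder URL."""
    match = _FOLDER_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No folder ID found in: {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Made-up ID, for illustration only.
    sample = "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
    print(folder_id_from_url(sample))
```

The value it returns is what goes into the Folder ID input box.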

Depending on the number and size of the files in your Google Drive folder, the bulk OCR process can take a few minutes. A file named after the source folder and the date will be created. The results will look something like this:

[Image: Bulk OCR results]

Limitations:

  • File size: 100 MB.
  • PDF page limit: 999.
  • File types: PNG, JPG, PDF.
  • 2,000 files per month; the quota resets on the first day of each month.
  • Gmail accounts might hit a timeout error sooner than Workspace accounts. To prevent this, use folders with 100 files or fewer.
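
Before uploading, it can help to check a batch of files against these limits. The Python sketch below is one possible pre-flight check, not part of the tool: the function names are hypothetical, 100 MB is assumed to mean 100 × 1024 × 1024 bytes, and only the .png, .jpg, and .pdf extensions are accepted because those are the types listed above. It does not count PDF pages.

```python
from pathlib import PurePath

# Limits taken from the list above. Treating 1 MB as 1024 * 1024 bytes
# is an assumption; the tool may count differently.
MAX_FILE_BYTES = 100 * 1024 * 1024
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}
MAX_FILES_PER_FOLDER = 100  # suggested folder size to avoid timeouts


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems for one file (empty if it looks fine)."""
    problems = []
    extension = PurePath(name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 100 MB")
    return problems


def split_into_folders(names: list[str],
                       per_folder: int = MAX_FILES_PER_FOLDER) -> list[list[str]]:
    """Group file names into chunks of at most `per_folder` items."""
    return [names[i:i + per_folder] for i in range(0, len(names), per_folder)]


if __name__ == "__main__":
    print(check_file("invoice.pdf", 2_000_000))
    print(check_file("notes.docx", 2_000_000))
    batches = split_into_folders([f"scan_{n}.png" for n in range(250)])
    print([len(batch) for batch in batches])
```

Each chunk returned by split_into_folders corresponds to one Google Drive folder to process in a separate run.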

This feature is currently in Beta. OCR results might differ from case to case, and we are still testing different options and extraction engines to improve them.

If you have feedback or want to suggest an improvement, please contact us.