--words

Extracts the text and builds the hidden text layer at the word level: each word is stored as a separate text chunk.

NOTE: DjVu® documents are primarily images, but they can carry text as a hidden text layer. The pdftodjvu command line utility extracts this text layer from the PDF automatically, without OCR. Some PDF files use uncommon text encodings that cause extraction problems; these can be fixed by running OCR on the document. Kanji, Chinese and Korean text is not supported. With either the --words or the --lines option, all words in the text layer are searchable.

Example

pdftodjvu --words document.pdf [document.djvu]
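
To convert several files in one pass, the command above can be wrapped in a shell loop. The following is a minimal sketch, not part of the utility itself: it assumes a POSIX shell, assumes pdftodjvu is on the PATH, and uses a hypothetical helper named djvu_name to build each output name. The output file is passed explicitly, although the example above marks it as optional.

```shell
# Hypothetical helper (not part of pdftodjvu): derive the output name
# by replacing a trailing .pdf with .djvu.
djvu_name() {
    printf '%s\n' "${1%.pdf}.djvu"
}

# Convert every PDF in the current directory, one text chunk per word.
for pdf in ./*.pdf; do
    [ -e "$pdf" ] || continue   # no PDF matched; skip the literal pattern
    pdftodjvu --words "$pdf" "$(djvu_name "$pdf")"
done
```

The loop uses only the invocation shown in the example; replace --words with --lines if line-level chunks are preferred.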