Document lifecycle

When you upload a document to the Sikoia platform against an entity (e.g. a case or person), Sikoia performs an elaborate flow of pre-processing, document validation, classification, multiple levels of data extraction, and data normalisation, while checking for tampering and irregularities throughout the process.

We call this the document lifecycle. The main steps you need to understand are summarised below:

Classification
Tampering checks
Integrity checks
Standard extraction
Deep extraction

As soon as you upload a document, its source_id has the status Pending, reflecting the processing that is about to take place.

If a document is supported for Standard extraction only, it will move from Pending to Provided, and this will be its final status.

If the document is also supported for Deep extraction, it will move from Pending to Completed.

This is an important distinction to keep in mind when building your flow.
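To make the distinction concrete, here is a minimal Python sketch of how a client might decide which status counts as "done" for a given document. The status names come from this page; the `DocumentStatus` enum, the function names, and the `deep_extraction_supported` flag are hypothetical helpers for illustration and are not part of the Sikoia API. Failed is the error status this page lists alongside Provided and Completed. Treating Provided as not yet final for a Deep extraction document is an assumption based on the description above.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Status values named on this page (hypothetical client-side enum)."""
    PENDING = "Pending"
    PROVIDED = "Provided"
    COMPLETED = "Completed"
    FAILED = "Failed"


def expected_success_status(deep_extraction_supported: bool) -> DocumentStatus:
    """Return the final success status to expect for a document.

    Standard extraction only -> Provided; Deep extraction -> Completed.
    """
    if deep_extraction_supported:
        return DocumentStatus.COMPLETED
    return DocumentStatus.PROVIDED


def is_finished(status: DocumentStatus, deep_extraction_supported: bool) -> bool:
    """True once the document has failed or reached its expected final status.

    Assumption: a Deep extraction document is only finished at Completed.
    """
    if status is DocumentStatus.FAILED:
        return True
    return status is expected_success_status(deep_extraction_supported)
```

In a flow that handles both kinds of document, a check like `is_finished(status, deep_extraction_supported)` avoids waiting for a Completed status that a Standard extraction document will never reach.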

You can use our webhooks to receive a notification when a document finishes processing and reaches the Provided, Completed, or Failed status; read more here. Alternatively, you can poll for the status at any time; see here.
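If you choose polling, the loop can be sketched roughly as follows. This is an illustration under stated assumptions rather than Sikoia client code: `fetch_status` is a hypothetical callable that you would implement to call the status endpoint and return the status string, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses this page describes as the end of processing.
TERMINAL_STATUSES = {"Provided", "Completed", "Failed"}


def wait_for_terminal_status(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document leaves Pending, or raise TimeoutError.

    fetch_status is a hypothetical function supplied by the caller; it
    should return the current status string for one source_id.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document still {status!r} after {timeout_seconds}s"
            )
        # Wait before asking again to avoid hammering the API.
        sleep(interval_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. For production flows, webhooks avoid the repeated requests entirely.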

Classification

The document is classified into one of the supported document types. You can read more about this here.

Sikoia can detect files that contain multiple documents. Each file you upload has a single document_id (or source_id), and every individual document identified within it is assigned a document_source_id. You can read more here.

📘 document_id and source_id can be used interchangeably.
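The one-file-to-many-documents relationship can be pictured with a small Python sketch. The `UploadedFile` class and `group_by_file` function are hypothetical illustrations, not Sikoia response types; only the identifier names `source_id`, `document_id`, and `document_source_id` are taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class UploadedFile:
    """One uploaded file and the individual documents found inside it."""
    source_id: str  # interchangeable with document_id
    document_source_ids: list[str] = field(default_factory=list)


def group_by_file(pairs: list[tuple[str, str]]) -> dict[str, UploadedFile]:
    """Group (source_id, document_source_id) pairs by their parent file."""
    files: dict[str, UploadedFile] = {}
    for source_id, document_source_id in pairs:
        # Create the file entry on first sight, then attach the document.
        uploaded = files.setdefault(source_id, UploadedFile(source_id))
        uploaded.document_source_ids.append(document_source_id)
    return files
```

For example, a single PDF containing two separate documents would appear as one `UploadedFile` whose `document_source_ids` list has two entries.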

Tampering checks

Tampering checks are part of the Sikoia Integrity checks endpoint, which you can review here. They examine the entire file to detect signs of tampering, structural irregularities, and inconsistencies. Read more here.

Integrity checks

Integrity checks is an umbrella term covering both file-level integrity checks (document_id/source_id tampering checks) and checks on the individual documents inside the file (data_source_ids).

data_source_id integrity checks go one level deeper than the file as a whole, detecting irregularities in the underlying document itself, whether it originated as a scanned physical document or was generated digitally. You can read more here.

Standard extraction

You can find the documents supported for each level of extraction here.

For documents where we don't support deep extraction, you can still retrieve a subset of data fields, for example the document type, recipient, date, and the providers specified on the document. For a full list, please contact our team.

Deep extraction

Deep extraction is an advanced level of extraction that Sikoia supports for certain documents with near-perfect accuracy. It goes beyond simple data field extraction: the system understands the contextual intricacies of a document and can extract and infer the key data points that enable customers to make financial decisions. You can read more here.