API

Supported Extensions

The following file extensions are supported:

  • .pdf

  • .tif

  • .tiff

  • .png

  • .jpeg

  • .jpg

Important

To add support for a specific format, please contact us at support@docsquality.com.

Endpoints

POST /engine/quality

Description

API Endpoint

POST /engine/quality

This endpoint predicts the quality of a document. The request must include a document file. If the document is password-protected, the endpoint returns information about the protection and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value. Optional request parameters:

  • ocr_index enables evaluation of the OCR index of the input document.

  • doc_category enables prediction of the document category (for the list of recognized categories, see the description of POST /engine/document_category).

  • page_category enables prediction of the category of each page separately.

  • signature enables detection of a signature on the document (PDF files only).

  • first_page sets the number of the first page to be predicted. Only pages with numbers between first_page and last_page are predicted.

  • last_page sets the number of the last page to be predicted.

Request body

Important

The request body should be of type multipart/form-data and contain the following fields:

  • file (File, required): The document file whose quality is to be predicted.

  • password (string, optional): The password for the protected file.

  • ocr_index (string, optional): Flag enabling estimation of the OCR index. Set ocr_index='true' to include the OCR index in the API response.

  • doc_category (string, optional): Flag enabling categorization of the entire document. Set doc_category='true' to include the document category in the API response.

  • page_category (string, optional): Flag enabling categorization of each page in the document separately. Set page_category='true' to include the category of each page in the API response.

  • signature (string, optional): Flag enabling detection of a signature on the document. Set signature='true' to include the signature detection result in the API response.

  • first_page (int, optional): The first page to be predicted. Defaults to 1.

  • last_page (int, optional): The last page to be predicted. Defaults to the last page of the document.
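As a rough illustration of the fields above, the sketch below assembles the form data for a POST /engine/quality request in Python. The helper names (build_quality_fields, post_quality), the base_url argument, the client-side page range check, and the use of the third-party requests package are assumptions made for this example and are not part of the DocsQuality API. Authentication, if any is required, is not shown.

```python
from typing import Dict, Optional


def build_quality_fields(
    ocr_index: bool = False,
    doc_category: bool = False,
    page_category: bool = False,
    signature: bool = False,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build the non-file form fields for POST /engine/quality."""
    # Client-side sanity checks (an assumption, not documented server behavior).
    if first_page is not None and first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if first_page is not None and last_page is not None and last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")

    fields: Dict[str, str] = {}
    flags = {
        "ocr_index": ocr_index,
        "doc_category": doc_category,
        "page_category": page_category,
        "signature": signature,
    }
    # Flags are sent as the string 'true'; disabled flags are simply omitted.
    for name, enabled in flags.items():
        if enabled:
            fields[name] = "true"
    if first_page is not None:
        fields["first_page"] = str(first_page)
    if last_page is not None:
        fields["last_page"] = str(last_page)
    if password is not None:
        fields["password"] = password
    return fields


def post_quality(base_url: str, path: str, **options) -> dict:
    """Send the document as multipart/form-data and return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above has no dependencies

    with open(path, "rb") as handle:
        response = requests.post(
            f"{base_url}/engine/quality",
            files={"file": handle},
            data=build_quality_fields(**options),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

A call such as post_quality(base_url, "scan.pdf", ocr_index=True, first_page=1, last_page=3) would then request the OCR index for the first three pages, where base_url stands for the address of your DocsQuality deployment.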

Example Response

{
    "prediction": {
        "attribute": "READABILITY",
        "date": "2024-02-01 13:00:00",
        "execution_time": 1.0,
        "message": "The document file is of poor quality due to low readability",
        "ocr_index": 38.16,
        "quality": 45.0,
        "signed": false,
        "version": "1.1.3"
    },
    "qualities": [
        {
            "attribute": "READABILITY",
            "details": [
                {
                    "attribute": "SHARPNESS",
                    "detailed_attributes": [
                        {
                            "attribute": "GENERAL_SHARPNESS",
                            "quality": 100.0
                        }
                    ],
                    "quality": 100.0
                },
                {
                    "attribute": "READABILITY",
                    "detailed_attributes": [
                        {
                            "attribute": "UNREADABLE_HANDWRITING",
                            "quality": 45.0
                        },
                        {
                            "attribute": "UNREADABLE_PRINTED_TEXT",
                            "quality": 100.0
                        }
                    ],
                    "quality": 45.0
                },
                {
                    "attribute": "EXPOSURE",
                    "detailed_attributes": [
                        {
                            "attribute": "OVEREXPOSED",
                            "quality": 100.0
                        },
                        {
                            "attribute": "LOW_BRIGHTNESS",
                            "quality": 100.0
                        }
                    ],
                    "quality": 100.0
                },
                {
                    "attribute": "SHAPE",
                    "detailed_attributes": [
                        {
                            "attribute": "CONTOURS",
                            "quality": 100.0
                        },
                        {
                            "attribute": "PERSPECTIVE",
                            "quality": 100.0
                        }
                    ],
                    "quality": 100.0
                }
            ],
            "message": "The document file is of poor quality due to low readability",
            "ocr_index": 38.16,
            "page": 1,
            "quality": 45.0
        }
    ]
}
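To show how the nested structure of this response can be read, the following sketch walks the qualities list and reports the lowest-scoring detailed attribute on each page. The function name weakest_attributes is hypothetical, and the sample is a shortened copy of the example response above.

```python
from typing import Dict, Tuple


def weakest_attributes(response: dict) -> Dict[int, Tuple[str, float]]:
    """Map each page number to its lowest-scoring detailed attribute."""
    result: Dict[int, Tuple[str, float]] = {}
    for page in response["qualities"]:
        # Flatten the detailed attributes of every high-level group on the page.
        detailed = [
            (item["attribute"], item["quality"])
            for group in page["details"]
            for item in group["detailed_attributes"]
        ]
        if detailed:
            result[page["page"]] = min(detailed, key=lambda pair: pair[1])
    return result


sample = {
    "qualities": [
        {
            "page": 1,
            "quality": 45.0,
            "details": [
                {
                    "attribute": "SHARPNESS",
                    "quality": 100.0,
                    "detailed_attributes": [
                        {"attribute": "GENERAL_SHARPNESS", "quality": 100.0}
                    ],
                },
                {
                    "attribute": "READABILITY",
                    "quality": 45.0,
                    "detailed_attributes": [
                        {"attribute": "UNREADABLE_HANDWRITING", "quality": 45.0},
                        {"attribute": "UNREADABLE_PRINTED_TEXT", "quality": 100.0},
                    ],
                },
            ],
        }
    ]
}

print(weakest_attributes(sample))  # {1: ('UNREADABLE_HANDWRITING', 45.0)}
```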

Error Responses

  • 400 Bad Request: If the request is malformed or missing required parameters.

  • 413 Request Entity Too Large: If the request file exceeds the limit.

  • 500 Internal Server Error: If an unexpected error occurs.
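One way a client might turn these status codes into readable errors is sketched below. The exception class DocsQualityError and the function check_status are illustrative names, not part of the API, and the messages paraphrase the list above. Any other non-2xx status is reported as undocumented.

```python
class DocsQualityError(Exception):
    """Client-side error carrying the HTTP status code of a failed request."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


# Messages paraphrased from the documented error responses.
_DOCUMENTED_ERRORS = {
    400: "the request is malformed or missing required parameters",
    413: "the request file exceeds the limit",
    500: "an unexpected server error occurred",
}


def check_status(status_code: int) -> None:
    """Raise DocsQualityError unless the status code is in the 2xx range."""
    if 200 <= status_code < 300:
        return
    message = _DOCUMENTED_ERRORS.get(status_code, "undocumented error status")
    raise DocsQualityError(status_code, message)
```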

POST /engine/ocr_index

Description

API Endpoint

POST /engine/ocr_index

This endpoint predicts the OCR index of a document. The request must include a document file. If the document is password-protected, the endpoint returns information about the protection and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value.

Request body

Important

The request body should be of type multipart/form-data and contain the following fields:

  • file (File, required): The document file whose OCR index is to be predicted.

  • password (string, optional): The password for the protected file.

Example Response

{
    "date": "2024-02-01 10:08:57.692068",
    "execution_time": 1.87,
    "ocr_index": 61.81,
    "pages": [
        {
            "ocr_index": 61.81,
            "page": 1
        }
    ],
    "version": "1.2.0"
}
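A response like the one above can be scanned for pages with a low OCR index, as in the short sketch below. The function name pages_below and the threshold of 70.0 are arbitrary choices for illustration; this documentation does not define what counts as an acceptable OCR index.

```python
from typing import List


def pages_below(response: dict, threshold: float) -> List[int]:
    """Return the numbers of pages whose OCR index is below the threshold."""
    return [
        page["page"]
        for page in response["pages"]
        if page["ocr_index"] < threshold
    ]


sample = {
    "date": "2024-02-01 10:08:57.692068",
    "execution_time": 1.87,
    "ocr_index": 61.81,
    "pages": [{"ocr_index": 61.81, "page": 1}],
    "version": "1.2.0",
}

print(pages_below(sample, 70.0))  # [1]
```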

Error Responses

  • 400 Bad Request: If the request is malformed or missing required parameters.

  • 413 Request Entity Too Large: If the request file exceeds the limit.

  • 500 Internal Server Error: If an unexpected error occurs.

POST /engine/document_category

Description

API Endpoint

POST /engine/document_category

This endpoint categorizes a document. The request must include a document file. If the document is password-protected, the endpoint returns information about the protection and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value.

DocsQuality recognizes the following document categories:

Category

invoice

CMR consignment note

delivery note

payment confirmation

agreement

insurance policy

Request body

Important

The request body should be of type multipart/form-data and contain the following fields:

  • file (File, required): The document file to categorize.

  • password (string, optional): The password for the protected file.

  • page_category (string, optional): Flag enabling categorization of each page in the document separately (default is 'false'). Set page_category='true' to include the category of each page in the API response.

Example Response

Response without page_category:

{
    "date": "2024-03-15 11:30:11.045844",
    "document_category": "CMR",
    "execution_time": 5.22,
    "version": "1.2.1"
}

Response with page_category='true':

{
    "date": "2024-03-15 11:43:52.176217",
    "document_category": "INVOICE",
    "execution_time": 9.83,
    "pages": [
        {
            "page": 1,
            "category": "INVOICE"
        },
        {
            "page": 2,
            "category": "INVOICE"
        },
        {
            "page": 3,
            "category": "INVOICE"
        },
        {
            "page": 4,
            "type": "UNKNOWN"
        }
    ],
    "version": "1.2.1"
}
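The per-page categories can be collected into a simple mapping, as sketched below. The function name page_categories is hypothetical. Because the example above labels page 4 with the key "type" rather than "category", the sketch checks both keys; whether the live API behaves this way is not confirmed here.

```python
from typing import Dict, Optional


def page_categories(response: dict) -> Dict[int, Optional[str]]:
    """Map page numbers to category labels; empty if no pages were returned."""
    pages = response.get("pages", [])
    # Fall back to "type" because the documented example uses it for one page.
    return {page["page"]: page.get("category", page.get("type")) for page in pages}


sample = {
    "document_category": "INVOICE",
    "pages": [
        {"page": 1, "category": "INVOICE"},
        {"page": 2, "category": "INVOICE"},
        {"page": 3, "category": "INVOICE"},
        {"page": 4, "type": "UNKNOWN"},
    ],
}

print(page_categories(sample))
# {1: 'INVOICE', 2: 'INVOICE', 3: 'INVOICE', 4: 'UNKNOWN'}
```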

Error Responses

  • 400 Bad Request: If the request is malformed or missing required parameters.

  • 413 Request Entity Too Large: If the request file exceeds the limit.

  • 500 Internal Server Error: If an unexpected error occurs.

POST /engine/embed_text

Description

API Endpoint

POST /engine/embed_text

This endpoint recognizes text and embeds it in PDF files. The request must include a document file; only PDF files are accepted. If the document is password-protected, the endpoint returns information about the protection and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value. The response contains the content of the resulting PDF file.

Request body

Important

The request body should be of type multipart/form-data and contain the following fields:

  • file (File, required): The PDF file in which text is to be recognized and embedded.

  • password (string, optional): The password for the protected file.
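Since the endpoint returns the content of a PDF file, a client will usually want to write the response to disk. The sketch below assumes the response body is the raw PDF bytes (not, for example, a base64 string inside JSON), which this documentation does not state explicitly. The function name save_pdf is hypothetical.

```python
import tempfile
from pathlib import Path


def save_pdf(content: bytes, destination: str) -> Path:
    """Write the returned bytes to disk after a minimal PDF header check."""
    # Every PDF file starts with the marker "%PDF-".
    if not content.startswith(b"%PDF-"):
        raise ValueError("response body does not look like a PDF file")
    target = Path(destination)
    target.write_bytes(content)
    return target


# Demonstration with placeholder bytes instead of a real API response.
with tempfile.TemporaryDirectory() as folder:
    saved = save_pdf(b"%PDF-1.7\n", f"{folder}/searchable.pdf")
    print(saved.name)  # searchable.pdf
```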

Error Responses

  • 400 Bad Request: If the request is malformed or missing required parameters.

  • 413 Request Entity Too Large: If the request file exceeds the limit.

  • 500 Internal Server Error: If an unexpected error occurs.

POST /engine/signature

Description

API Endpoint

POST /engine/signature

This endpoint detects a signature on a document. The request must include a document file; only PDF files are accepted.

Request body

Important

The request body should be of type multipart/form-data and contain the following fields:

  • file (File, required): The PDF file to check for a signature.

  • password (string, optional): The password for the protected file.

Example Response

true

false
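The two example responses are bare JSON booleans, so they can be parsed with the standard json module. The sketch below assumes the body contains nothing but the literal true or false, as in the examples; the function name parse_signature_response is hypothetical.

```python
import json


def parse_signature_response(body: str) -> bool:
    """Parse the response body and insist that it is a JSON boolean."""
    value = json.loads(body)
    if not isinstance(value, bool):
        raise ValueError(f"expected true or false, got {value!r}")
    return value


print(parse_signature_response("true"))   # True
print(parse_signature_response("false"))  # False
```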

Error Responses

  • 400 Bad Request: If the request is malformed or missing required parameters.

  • 413 Request Entity Too Large: If the request file exceeds the limit.

  • 500 Internal Server Error: If an unexpected error occurs.

Main functions and models

server.quality_document()

Returns the quality of the document included in the request

Raises

server.PredictionException – Raised if prediction validation failed.

Returns

Predicted document quality JSON object

Return type

DocumentQuality

exception server.PredictionException(message)

Raised if prediction validation failed

class data.models.document_quality.DocumentQuality(prediction: Prediction, qualities: List[Quality])

Represents the document quality complex response generated by the engine.

Parameters
  • prediction (Prediction) – Overall document prediction results.

  • qualities (List[Quality]) – A list of qualities per document page.

class data.models.ocr_index.OcrIndex(date, ocr_index, execution_time, version, qualities: List[Quality])

Represents OCR index of the document.

Parameters
  • date (datetime.datetime) – The date of the prediction.

  • ocr_index (float) – Overall document OCR index as a number (0-100).

  • execution_time (float) – The execution time of the prediction in seconds.

  • version (str) – The current version of the DocsQuality prediction engine.

  • qualities (List[Quality]) – A list of qualities per document page.

class data.models.prediction.Prediction(date: datetime, attribute: str, message: str, quality: float, ocr_index: float, document_category: Category, signed: bool, execution_time: float, version: str)

Represents overall document prediction results.

Parameters
  • date (datetime.datetime) – The date of the prediction.

  • attribute (str) – High-level classified attribute for the document.

  • message (str) – Output message for the overall prediction.

  • quality (float) – Overall document quality as a number (0-100).

  • ocr_index (float) – Overall OCR index of the document.

  • document_category (Category) – Category of the document.

  • signed (bool) – Whether the document is digitally signed.

  • execution_time (float) – The execution time of the prediction in seconds.

  • version (str) – The current version of the DocsQuality prediction engine.

class data.models.quality.Quality(page: int, attribute: str, quality: float, message: str, ocr_index: float, page_category: Category, details: List[QualityAttribute], lang='en')

Represents document quality with low-level details.

Parameters
  • page (int) – Document page number.

  • attribute (str) – High-level classified attribute for the page.

  • quality (float) – Document page quality as a number (0-100).

  • message (str) – Output message for the specific page.

  • ocr_index (float) – OCR index of the page.

  • page_category (Category) – Category of the single page.

  • details (List[QualityAttribute]) – List of document page quality details.

class data.models.quality.QualityAttribute(attribute: HighLevelAttribute, quality: float, detailed_attributes: List[DetailedAttribute], additional_remarks: Optional[List] = None, lang: str = 'en', internationalize: bool = False)

Represents document quality high-level attribute

Parameters
  • attribute (HighLevelAttribute) – Name of high-level attribute.

  • quality (float) – Document page quality for specified high-level attribute (0-100).

  • detailed_attributes (List[DetailedAttribute]) – List of low-level quality details for specific high-level attribute.

class data.models.quality.DetailedAttribute(attribute: LowLevelAttribute, quality: float)

Represents document quality low-level attribute

Parameters
  • attribute (LowLevelAttribute) – Name of low-level attribute.

  • quality (float) – Document page quality for specified low-level attribute (0-100).
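The classes above are server-side models. A client that only sees the JSON response could mirror the two attribute levels roughly as follows. This is a simplified sketch: the dataclasses reuse the documented names for readability but are not the server classes, they keep attribute names as plain strings, and parse_quality_attribute is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetailedAttribute:
    attribute: str  # low-level attribute name, e.g. "OVEREXPOSED"
    quality: float  # 0-100


@dataclass
class QualityAttribute:
    attribute: str  # high-level attribute name, e.g. "EXPOSURE"
    quality: float  # 0-100
    detailed_attributes: List[DetailedAttribute]


def parse_quality_attribute(raw: dict) -> QualityAttribute:
    """Convert one entry of the "details" list into typed objects."""
    return QualityAttribute(
        attribute=raw["attribute"],
        quality=raw["quality"],
        detailed_attributes=[
            DetailedAttribute(item["attribute"], item["quality"])
            for item in raw["detailed_attributes"]
        ],
    )


raw = {
    "attribute": "EXPOSURE",
    "detailed_attributes": [
        {"attribute": "OVEREXPOSED", "quality": 100.0},
        {"attribute": "LOW_BRIGHTNESS", "quality": 100.0},
    ],
    "quality": 100.0,
}
parsed = parse_quality_attribute(raw)
print(parsed.detailed_attributes[0].attribute)  # OVEREXPOSED
```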

class data.models.quality.HighLevelAttribute(value)

Represents high-level document quality attributes

  • SHARPNESS

  • EXPOSURE

  • READABILITY

  • SHAPE

  • GOOD_QUALITY

class data.models.quality.LowLevelAttribute(value)

Represents low-level document quality attributes

  • GENERAL_SHARPNESS

  • LOW_BRIGHTNESS

  • OVEREXPOSED

  • PERSPECTIVE

  • CONTOURS

  • UNREADABLE_HANDWRITING

  • UNREADABLE_PRINTED_TEXT

class data.models.document_category.Category(value)

Represents labels for document categories

  • CMR

  • INVOICE

  • DELIVERY_NOTE

  • OCP_OCS

  • PAYMENT_CONFIRMATION

  • AGREEMENT

  • UNKNOWN

  • UNABLE_TO_CLASSIFY
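On the client side, the category labels can be mirrored with a small enumeration, as sketched below. It assumes that each label appears in API responses as the plain string of the same name (as "CMR", "INVOICE", and "UNKNOWN" do in the example responses). Mapping unrecognized strings to UNKNOWN is a choice made for this example, not documented behavior, and to_category is a hypothetical helper.

```python
from enum import Enum


class Category(Enum):
    """Client-side mirror of the documented category labels."""

    CMR = "CMR"
    INVOICE = "INVOICE"
    DELIVERY_NOTE = "DELIVERY_NOTE"
    OCP_OCS = "OCP_OCS"
    PAYMENT_CONFIRMATION = "PAYMENT_CONFIRMATION"
    AGREEMENT = "AGREEMENT"
    UNKNOWN = "UNKNOWN"
    UNABLE_TO_CLASSIFY = "UNABLE_TO_CLASSIFY"


def to_category(label: str) -> Category:
    """Look up a label, falling back to UNKNOWN for unrecognized strings."""
    try:
        return Category(label)
    except ValueError:
        return Category.UNKNOWN


print(to_category("INVOICE"))  # Category.INVOICE
```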