API
Supported Extensions
The following file extensions are supported:
Extension |
---|
.tif |
.tiff |
.png |
.jpeg |
.jpg |
Important
To add support for a specific format, please contact us at support@docsquality.com.
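A client can check the file extension before uploading. The following is a minimal sketch, not part of the DocsQuality API: the helper name has_supported_extension is hypothetical, and the case-insensitive comparison is an assumption. The set covers only the table above; PDF handling is described per endpoint further down.

```python
from pathlib import Path

# Extensions copied from the table above.
# Comparing them case-insensitively is an assumption of this sketch.
SUPPORTED_EXTENSIONS = frozenset({".tif", ".tiff", ".png", ".jpeg", ".jpg"})


def has_supported_extension(filename: str, allowed=SUPPORTED_EXTENSIONS) -> bool:
    """Return True if the file name ends with one of the allowed extensions."""
    return Path(filename).suffix.lower() in allowed
```

Passing a different set through the allowed argument lets the same check cover endpoints that accept other formats.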
Endpoints
POST /engine/quality
Description
API Endpoint
POST /engine/quality
The endpoint predicts the quality of a document. The request must include a document file. If the document is password-protected, the endpoint returns information about this and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value. Optional request parameters:
ocr_index enables evaluation of the OCR index of the input document
doc_category enables prediction of the document category (for the list of recognized categories, see the description of POST /engine/document_category)
page_category enables prediction of the category of each page separately
signature enables detection of a signature on the document (PDF files only)
first_page sets the number of the first page to be predicted. Only pages numbered from first_page to last_page are predicted.
last_page sets the number of the last page to be predicted.
Request body
Important
The request body should be of type multipart/form-data and contain the following fields:
file (File, required): The document file whose quality is to be predicted.
password (string, optional): The password for the protected file.
ocr_index (string, optional): The flag enabling the estimation of the OCR index. Set ocr_index='true' to include the OCR index in the API response.
doc_category (string, optional): The flag enabling categorization of the entire document. Set doc_category='true' to include the document category in the API response.
page_category (string, optional): The flag enabling categorization of each page in the document separately. Set page_category='true' to include the category of each page in the API response.
signature (string, optional): The flag enabling detection of a signature on the document. Set signature='true' to include signature detection in the API response.
first_page (int, optional): The first page to be predicted. Defaults to 1.
last_page (int, optional): The last page to be predicted. Defaults to the last page of the document.
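The fields above can be assembled into a multipart/form-data body using only the Python standard library. This is a sketch under stated assumptions: the helper names build_quality_fields and encode_multipart are hypothetical, disabled flags are simply omitted (the documentation only describes the 'true' value), and the page-range validation is a client-side convenience, not documented server behaviour.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_quality_fields(password: Optional[str] = None, ocr_index: bool = False,
                         doc_category: bool = False, page_category: bool = False,
                         signature: bool = False, first_page: Optional[int] = None,
                         last_page: Optional[int] = None) -> Dict[str, str]:
    """Build the optional text fields for POST /engine/quality."""
    fields: Dict[str, str] = {}
    if password is not None:
        fields["password"] = password
    flags = {"ocr_index": ocr_index, "doc_category": doc_category,
             "page_category": page_category, "signature": signature}
    for name, enabled in flags.items():
        if enabled:
            fields[name] = "true"  # flags are sent as the string 'true'
    for name, page in (("first_page", first_page), ("last_page", last_page)):
        if page is not None and page < 1:
            raise ValueError(f"{name} must be at least 1")
    if first_page is not None and last_page is not None and last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    if first_page is not None:
        fields["first_page"] = str(first_page)
    if last_page is not None:
        fields["last_page"] = str(last_page)
    return fields


def encode_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus the 'file' part; return (body, Content-Type header)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The resulting body and header can be passed to urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST"). The base URL and any authentication headers are not given in this section and depend on your deployment.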
Example Response
{
"prediction": {
"attribute": "READABILITY",
"date": "2024-02-01 13:00:00",
"execution_time": 1.0,
"message": "The document file is of poor quality due to low readability",
"ocr_index": 38.16,
"quality": 45.0,
"signed": false,
"version": "1.1.3"
},
"qualities": [
{
"attribute": "READABILITY",
"details": [
{
"attribute": "SHARPNESS",
"detailed_attributes": [
{
"attribute": "GENERAL_SHARPNESS",
"quality": 100.0
}
],
"quality": 100.0
},
{
"attribute": "READABILITY",
"detailed_attributes": [
{
"attribute": "UNREADABLE_HANDWRITING",
"quality": 45.0
},
{
"attribute": "UNREADABLE_PRINTED_TEXT",
"quality": 100.0
}
],
"quality": 45.0
},
{
"attribute": "EXPOSURE",
"detailed_attributes": [
{
"attribute": "OVEREXPOSED",
"quality": 100.0
},
{
"attribute": "LOW_BRIGHTNESS",
"quality": 100.0
}
],
"quality": 100.0
},
{
"attribute": "SHAPE",
"detailed_attributes": [
{
"attribute": "CONTOURS",
"quality": 100.0
},
{
"attribute": "PERSPECTIVE",
"quality": 100.0
}
],
"quality": 100.0
}
],
"message": "The document file is of poor quality due to low readability",
"ocr_index": 38.16,
"page": 1,
"quality": 45.0
}
]
}
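The nested structure of the response above (pages, then high-level attributes, then detailed attributes) can be walked to find what lowers the score. This is a minimal sketch: the function name weakest_attributes and the default threshold of 100.0 are assumptions, and the sample is a trimmed copy of the example response.

```python
from typing import List, Tuple


def weakest_attributes(response: dict, threshold: float = 100.0) -> List[Tuple[int, str, str, float]]:
    """Return (page, high_level, low_level, quality) for scores below threshold, worst first."""
    findings = []
    for page in response.get("qualities", []):
        for detail in page.get("details", []):
            for low in detail.get("detailed_attributes", []):
                if low["quality"] < threshold:
                    findings.append(
                        (page["page"], detail["attribute"], low["attribute"], low["quality"])
                    )
    return sorted(findings, key=lambda item: item[3])


# Trimmed copy of the example response, kept to two high-level attributes.
sample = {
    "prediction": {"attribute": "READABILITY", "quality": 45.0},
    "qualities": [{
        "page": 1,
        "quality": 45.0,
        "details": [
            {"attribute": "SHARPNESS", "quality": 100.0,
             "detailed_attributes": [{"attribute": "GENERAL_SHARPNESS", "quality": 100.0}]},
            {"attribute": "READABILITY", "quality": 45.0,
             "detailed_attributes": [
                 {"attribute": "UNREADABLE_HANDWRITING", "quality": 45.0},
                 {"attribute": "UNREADABLE_PRINTED_TEXT", "quality": 100.0}]},
        ],
    }],
}

print(weakest_attributes(sample))
# [(1, 'READABILITY', 'UNREADABLE_HANDWRITING', 45.0)]
```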
Error Responses
400 Bad Request: If the request is malformed or missing required parameters.
413 Request Entity Too Large: If the request file exceeds the limit.
500 Internal Server Error: If an unexpected error occurs.
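A client might map these status codes to follow-up actions. The sketch below is illustrative only: the function name and the returned action labels are hypothetical, and whether a 500 response is worth retrying is an assumption rather than documented behaviour.

```python
def next_action(status_code: int) -> str:
    """Map an HTTP status code from the engine to a suggested client action."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 400:
        # Malformed request or missing required parameter, e.g. no 'file' field.
        return "fix_request"
    if status_code == 413:
        # The uploaded file exceeds the size limit.
        return "reduce_file_size"
    if status_code == 500:
        # Unexpected server error; retrying later is an assumption of this sketch.
        return "retry_or_report"
    return "unexpected"
```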
POST /engine/ocr_index
Description
API Endpoint
POST /engine/ocr_index
The endpoint predicts the OCR index of a document. The request must include a document file. If the document is password-protected, the endpoint returns information about this and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value.
Request body
Important
The request body should be of type multipart/form-data and contain the following fields:
file (File, required): The document file whose OCR index is to be predicted.
password (string, optional): The password for the protected file.
Example Response
{
"date": "2024-02-01 10:08:57.692068",
"execution_time": 1.87,
"ocr_index": 61.81,
"pages": [
{
"ocr_index": 61.81,
"page": 1
}
],
"version": "1.2.0"
}
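The response above can be reduced to the overall index plus the pages that fall below a chosen cutoff. This is a sketch under assumptions: the function name summarize_ocr_index is hypothetical, and the default cutoff of 50.0 is an arbitrary illustration, not a value recommended by DocsQuality.

```python
from datetime import datetime
from typing import List, Tuple


def summarize_ocr_index(response: dict, min_index: float = 50.0) -> Tuple[datetime, float, List[int]]:
    """Return (prediction date, overall OCR index, page numbers below min_index)."""
    # The date format in the example is accepted by datetime.fromisoformat.
    date = datetime.fromisoformat(response["date"])
    low_pages = [
        page["page"]
        for page in response.get("pages", [])
        if page["ocr_index"] < min_index
    ]
    return date, response["ocr_index"], low_pages


# Copy of the example response.
sample = {
    "date": "2024-02-01 10:08:57.692068",
    "execution_time": 1.87,
    "ocr_index": 61.81,
    "pages": [{"ocr_index": 61.81, "page": 1}],
    "version": "1.2.0",
}
```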
Error Responses
400 Bad Request: If the request is malformed or missing required parameters.
413 Request Entity Too Large: If the request file exceeds the limit.
500 Internal Server Error: If an unexpected error occurs.
POST /engine/document_category
Description
API Endpoint
POST /engine/document_category
The endpoint categorizes a document. The request must include a document file. If the document is password-protected, the endpoint returns information about this and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value.
DocsQuality recognizes the following document categories:
Category |
---|
invoice |
CMR consignment note |
delivery note |
payment confirmation |
agreement |
insurance policy |
Request body
Important
The request body should be of type multipart/form-data and contain the following fields:
file (File, required): The document file to categorize.
password (string, optional): The password for the protected file.
page_category (string, optional): The flag enabling categorization of each page in the document separately (default is 'false'). Set page_category='true' to include the category of each page in the API response.
Example Response
{
"date": "2024-03-15 11:30:11.045844",
"document_category": "CMR",
"execution_time": 5.22,
"version": "1.2.1"
}
With page_category='true', the response also lists the category of each page:
{
"date": "2024-03-15 11:43:52.176217",
"document_category": "INVOICE",
"execution_time": 9.83,
"pages": [
{
"page": 1,
"category": "INVOICE"
},
{
"page": 2,
"category": "INVOICE"
},
{
"page": 3,
"category": "INVOICE"
},
{
"page": 4,
"type": "UNKNOWN"
}
],
"version": "1.2.1"
}
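The per-page results can be collected into a mapping from page number to category. This is a minimal sketch with a hypothetical function name. Note that in the example response the last page carries its label under "type" rather than "category"; the sketch accepts either key, which is a defensive assumption and not documented behaviour.

```python
from typing import Dict, Optional


def page_categories(response: dict) -> Dict[int, Optional[str]]:
    """Map each page number to its category label, if the response lists pages."""
    result: Dict[int, Optional[str]] = {}
    for entry in response.get("pages", []):
        # Accept both keys seen in the example response ("category" and "type").
        result[entry["page"]] = entry.get("category", entry.get("type"))
    return result


# Shortened copy of the second example response.
sample = {
    "document_category": "INVOICE",
    "pages": [
        {"page": 1, "category": "INVOICE"},
        {"page": 4, "type": "UNKNOWN"},
    ],
}

print(page_categories(sample))
# {1: 'INVOICE', 4: 'UNKNOWN'}
```

A response requested without page_category has no pages list, so the function returns an empty mapping.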
Error Responses
400 Bad Request: If the request is malformed or missing required parameters.
413 Request Entity Too Large: If the request file exceeds the limit.
500 Internal Server Error: If an unexpected error occurs.
POST /engine/embed_text
Description
API Endpoint
POST /engine/embed_text
The endpoint recognizes text in a PDF file and embeds it in the file. The request must include a document file; only PDF files are accepted. If the document is password-protected, the endpoint returns information about this and asks for the password, which must then be sent as a request parameter. You can also grant permission to store the uploaded document in our database by setting the save_access boolean value. The response contains the content of the PDF file.
Request body
Important
The request body should be of type multipart/form-data and contain the following fields:
file (File, required): The PDF file in which text is to be recognized and embedded.
password (string, optional): The password for the protected file.
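Because this endpoint returns PDF content rather than JSON, a client may want to verify the body before writing it to disk. The sketch below assumes the response body is the raw PDF bytes (the encoding is not specified in this section); the helper names are hypothetical.

```python
from pathlib import Path

# Every PDF file starts with this header.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(content: bytes) -> bool:
    """Return True if the bytes start with the PDF file header."""
    return content.startswith(PDF_MAGIC)


def save_embedded_pdf(content: bytes, destination: str) -> int:
    """Write the response body to destination; return the number of bytes written."""
    if not looks_like_pdf(content):
        # For example, an error payload was returned instead of a PDF.
        raise ValueError("response body does not start with the PDF header")
    return Path(destination).write_bytes(content)
```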
Error Responses
400 Bad Request: If the request is malformed or missing required parameters.
413 Request Entity Too Large: If the request file exceeds the limit.
500 Internal Server Error: If an unexpected error occurs.
POST /engine/signature
Description
API Endpoint
POST /engine/signature
The endpoint detects a signature on the document. The request must include a document file; only PDF files are accepted.
Request body
Important
The request body should be of type multipart/form-data and contain the following fields:
file (File, required): The PDF file to check for a signature.
password (string, optional): The password for the protected file.
Example Response
The response is a single boolean value: true if a signature is detected, false otherwise.
true
false
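Since the response is a bare boolean, it can be parsed strictly so that any other payload is rejected. This sketch assumes the body is a JSON literal (true or false); the function name is hypothetical.

```python
import json


def parse_signature_response(body: str) -> bool:
    """Parse the response body of POST /engine/signature into a Python bool."""
    value = json.loads(body)
    # bool is checked explicitly so that numbers such as 1 are not accepted.
    if not isinstance(value, bool):
        raise ValueError(f"expected true or false, got {value!r}")
    return value
```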
Error Responses
400 Bad Request: If the request is malformed or missing required parameters.
413 Request Entity Too Large: If the request file exceeds the limit.
500 Internal Server Error: If an unexpected error occurs.
Main functions and models
- server.quality_document()
Returns the quality of the document included in the request
- Raises
server.PredictionException – Raised if prediction validation failed.
- Returns
Predicted document quality JSON object
- Return type
- exception server.PredictionException(message)
Raised if prediction validation failed
- class data.models.document_quality.DocumentQuality(prediction: Prediction, qualities: List[Quality])
Represents the document quality complex response generated by the engine.
- Parameters
prediction (Prediction) – Overall document prediction results.
qualities (List[Quality]) – A list of qualities per document page.
- class data.models.ocr_index.OcrIndex(date, ocr_index, execution_time, version, qualities: List[Quality])
Represents OCR index of the document.
- Parameters
date (datetime.datetime) – The date of the prediction.
ocr_index (float) – Overall document OCR index as a number (0-100).
execution_time (float) – The execution time of the prediction in seconds.
version (str) – The current version of the DocsQuality prediction engine.
qualities (List[Quality]) – A list of qualities per document page.
- class data.models.prediction.Prediction(date: datetime, attribute: str, message: str, quality: float, ocr_index: float, document_category: Category, signed: bool, execution_time: float, version: str)
Represents overall document prediction results.
- Parameters
date (datetime.datetime) – The date of the prediction.
attribute (str) – High-level classified attribute for the document.
message (str) – Output message for the overall prediction.
quality (float) – Overall document quality as a number (0-100).
ocr_index (float) – Overall OCR index of the document.
document_category (Category) – Category of the document.
signed (bool) – Whether the document is digitally signed.
execution_time (float) – The execution time of the prediction in seconds.
version (str) – The current version of the DocsQuality prediction engine.
- class data.models.quality.Quality(page: int, attribute: str, quality: float, message: str, ocr_index: float, page_category: Category, details: List[QualityAttribute], lang='en')
Represents document quality with low-level details.
- Parameters
page (int) – Document page number.
attribute (str) – High-level classified attribute for the page.
quality (float) – Document page quality as a number (0-100).
message (str) – Output message for the specific page.
ocr_index (float) – OCR index of the page.
page_category (Category) – Category of the single page.
details (List[QualityAttribute]) – List of document page quality details.
- class data.models.quality.QualityAttribute(attribute: HighLevelAttribute, quality: float, detailed_attributes: List[DetailedAttribute], additional_remarks: Optional[List] = None, lang: str = 'en', internationalize: bool = False)
Represents document quality high-level attribute
- Parameters
attribute (HighLevelAttribute) – Name of high-level attribute.
quality (float) – Document page quality for specified high-level attribute (0-100).
detailed_attributes (List[DetailedAttribute]) – List of low-level quality details for specific high-level attribute.
- class data.models.quality.DetailedAttribute(attribute: LowLevelAttribute, quality: float)
Represents document quality low-level attribute
- Parameters
attribute (LowLevelAttribute) – Name of low-level attribute.
quality (float) – Document page quality for specified low-level attribute (0-100).
- class data.models.quality.HighLevelAttribute(value)
Represents high-level document quality attributes
SHARPNESS
EXPOSURE
READABILITY
SHAPE
GOOD_QUALITY
- class data.models.quality.LowLevelAttribute(value)
Represents low-level document quality attributes
GENERAL_SHARPNESS
LOW_BRIGHTNESS
OVEREXPOSED
PERSPECTIVE
CONTOURS
UNREADABLE_HANDWRITING
UNREADABLE_PRINTED_TEXT
- class data.models.document_category.Category(value)
Represents labels for document categories
CMR
INVOICE
DELIVERY_NOTE
INSURANCE_POLICY
PAYMENT_CONFIRMATION
AGREEMENT
UNKNOWN
UNABLE_TO_CLASSIFY
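On the client side, the category labels can be mirrored in an enumeration so that responses are validated on parsing. This is a sketch, not the engine's own class: the parse helper is hypothetical, and mapping unrecognized labels to UNKNOWN is a client-side choice made here for illustration.

```python
from enum import Enum


class Category(str, Enum):
    """Client-side mirror of the document category labels listed above."""
    CMR = "CMR"
    INVOICE = "INVOICE"
    DELIVERY_NOTE = "DELIVERY_NOTE"
    INSURANCE_POLICY = "INSURANCE_POLICY"
    PAYMENT_CONFIRMATION = "PAYMENT_CONFIRMATION"
    AGREEMENT = "AGREEMENT"
    UNKNOWN = "UNKNOWN"
    UNABLE_TO_CLASSIFY = "UNABLE_TO_CLASSIFY"

    @classmethod
    def parse(cls, label: str) -> "Category":
        """Convert a label from an API response; unrecognized labels become UNKNOWN."""
        try:
            return cls(label)
        except ValueError:
            return cls.UNKNOWN
```

For example, Category.parse("INVOICE") yields Category.INVOICE, while a label that is not in the list falls back to Category.UNKNOWN instead of raising.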