Changelog

Version 1.2.4 (2024-05-17)

Added

added binary builds for Python 3.11 and 3.12.

Changed

updated core dependencies,
removed binary builds for Python 3.8 (security support ends in October 2024).

Version 1.2.3 (2024-05-07)

Added

new feature - checking whether the document is digitally signed
added the ability to select pages to predict via request parameters
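
As an illustration of page selection through a request parameter, here is a minimal sketch of how a page specification could be parsed. The changelog does not document the actual parameter, so the syntax ("1,3-5") and the helper name parse_pages are assumptions, not the DocsQuality API.

```python
from __future__ import annotations


def parse_pages(spec: str, page_count: int) -> list[int]:
    """Turn a string such as "1,3-5" into sorted, unique 1-based page numbers.

    Hypothetical helper: the real request parameter may use another syntax.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Rejecting out-of-range pages up front, rather than silently ignoring them, keeps a mistyped request from producing a prediction for fewer pages than the caller expected.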

Version 1.2.2 (2024-04-10)

Added

new feature - recognizing text and embedding it in PDF files

Version 1.2.1 (2024-03-22)

Responding to the needs of our partners, we have introduced another innovation to DocsQuality. In addition to evaluating quality, readability, and the OCRIndex, the program now also categorizes the analyzed documents. The feature relies on two components: visual document inspection and Natural Language Processing (NLP).

Added

document categorization feature

Changed

improved detection of document contours

Version 1.2.0 (2024-02-01)

The latest version of DocsQuality has been released! We’ve introduced a new feature: the OCRIndex. It enables users to verify whether a file will be accurately processed by the Optical Character Recognition (OCR) engine and determine the proportion of text that may be considered unreadable.

Added

evaluation of the OCRIndex of the input document
improved recognition of vector documents

Version 1.1.9 (2023-12-19)

Changed

updated model for evaluation of quality of printed text

Version 1.1.8 (2023-11-15)

Changed

updated method for calculating attribute GENERAL_SHARPNESS

Version 1.1.7 (2023-10-23)

Changed

improved models for handwriting classification
updated algorithm for calculation of final handwritten text quality
modified image preprocessing before handwriting detection and classification

Version 1.1.6 (2023-09-08)

Changed

improved models for handwriting detection and classification
filtering out irrelevant handwriting detections located at the corners of documents
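
The corner filtering mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the box format (x1, y1, x2, y2) in pixels, the function name drop_corner_detections, and the 10% corner size are hypothetical and are not taken from the DocsQuality engine.

```python
def drop_corner_detections(boxes, width, height, corner_fraction=0.1):
    """Keep only boxes whose centre lies outside the four corner regions.

    boxes: iterable of (x1, y1, x2, y2) tuples in pixels (assumed format).
    corner_fraction: size of each corner region relative to the page
    dimensions (assumed value, not the engine's setting).
    """
    x_margin = width * corner_fraction
    y_margin = height * corner_fraction
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx = (x1 + x2) / 2
        cy = (y1 + y2) / 2
        near_left_or_right = cx < x_margin or cx > width - x_margin
        near_top_or_bottom = cy < y_margin or cy > height - y_margin
        # A box is in a corner only if it is close to both a vertical
        # and a horizontal page edge.
        if not (near_left_or_right and near_top_or_bottom):
            kept.append((x1, y1, x2, y2))
    return kept
```

Testing the box centre rather than any overlap keeps handwriting that merely extends toward a corner, such as a long signature at the bottom of a page.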

Version 1.1.5 (2023-07-19)

Added

added detection of cropped edges of documents

Version 1.1.4 (2023-05-29)

Added

added new attribute: UNREADABLE_PRINTED_TEXT

Version 1.1.3 (2023-05-10)

Changed

changed the processing of PDF files depending on the data they contain (vector or raster)

Version 1.1.2 (2023-05-05)

Changed

new version of YOLO model for handwriting detection
RCNN classifier replaced by EfficientNet classification model

Version 1.1.1 (2023-04-26)

Changed

changed list of quality attributes

Version 1.1.0 (2023-04-13)

Added

added unit tests to check the algorithm methods in the docsquality engine

Version 1.0.9 (2023-04-05)

Changed

changed formats of two YOLO models (model for contour detection and model for handwriting detection) - PyTorch replaced with ONNX.

Version 1.0.8 (2023-03-28)

Added

added additional DocsQuality engine service with GPU acceleration.

Changed

changed engine base image.

Version 1.0.7 (2023-03-21)

Changed

implemented new method to calculate SMALL_CONTOURS - YOLOv5 network is used
implemented new method to calculate PERSPECTIVE_DEVIATION - based on HoughLines, instead of GrabCut

Version 1.0.6 (2023-03-02)

Changed

implemented new algorithm based on neural network for overexposure prediction.

Version 1.0.5 (2023-02-02)

Changed

engine performance optimized by changing the preprocessing of input files: operations on temporary files with separate pages replaced with computations on numpy.ndarray,
memory leak fixed by deleting all temporary files and modifying tensorflow.model inference.

Version 1.0.4 (2022-10-14)

Added

support for possible resolution for receipts in the output metric of small contours.

Changed

implemented new algorithm for finding document contours,
quality metrics calculation improvements.

Version 1.0.3 (2022-10-04)

Changed

backend split into a prediction server and a separate server for database and frontend use.

Version 1.0.2 (2022-08-31)

Added

support for password-protected files,
added OpenAPI support via Flask-RESTX version 0.5.1.

Version 1.0.1 (2022-07-25)

Added

YOLOv5 model for receipt detection.

Changed

updated Python to 3.8.10.