Changelog
Version 1.2.4 (2024-05-17)
Added
added binary builds for Python 3.11 and 3.12
Changed
updated core dependencies
removed binary builds for Python 3.8 (security support ends in October 2024)
Version 1.2.3 (2024-05-07)
Added
new feature: checking whether the document is digitally signed
added the ability to select which pages to predict via request parameters
Version 1.2.2 (2024-04-10)
Added
new feature: recognizing text in PDF files and embedding it in them
Version 1.2.1 (2024-03-22)
In response to our partners' needs, we have added another feature to DocsQuality. In addition to evaluating quality, readability, and the OCRIndex, the program now also categorizes the analyzed documents. Categorization relies on two components: visual document inspection and Natural Language Processing (NLP).
Added
document categorization feature
Changed
improved detection of document contours
Version 1.2.0 (2024-02-01)
Version 1.2.0 of DocsQuality has been released! We’ve introduced a new feature: the OCRIndex. It enables users to verify whether a file will be accurately processed by the Optical Character Recognition (OCR) engine and to determine the proportion of text that may be considered unreadable.
Added
evaluation of the OCRIndex of the input document
improved recognition of vector documents
Version 1.1.9 (2023-12-19)
Changed
updated model for evaluating the quality of printed text
Version 1.1.8 (2023-11-15)
Changed
updated method for calculating the GENERAL SHARPNESS attribute
Version 1.1.7 (2023-10-23)
Changed
improved models for handwriting classification
updated algorithm for calculating the final handwritten text quality
modified image preprocessing before handwriting detection and classification
Version 1.1.6 (2023-09-08)
Changed
improved models for handwriting detection and classification
filtering out irrelevant handwriting detections located at the corners of documents
Version 1.1.5 (2023-07-19)
Added
added detection of cropped edges of documents
Version 1.1.4 (2023-05-29)
Added
added new attribute: UNREADABLE_PRINTED_TEXT
Version 1.1.3 (2023-05-10)
Changed
changed the processing of PDF files depending on whether they contain vector or raster data
Version 1.1.2 (2023-05-05)
Changed
new version of YOLO model for handwriting detection
RCNN classifier replaced by EfficientNet classification model
Version 1.1.1 (2023-04-26)
Changed
changed list of quality attributes
Version 1.1.0 (2023-04-13)
Added
added unit tests for the algorithm methods in the DocsQuality engine
Version 1.0.9 (2023-04-05)
Changed
changed the format of two YOLO models (contour detection and handwriting detection) from PyTorch to ONNX
Version 1.0.8 (2023-03-28)
Added
added a second DocsQuality engine service with GPU acceleration
Changed
changed the engine base image
Version 1.0.7 (2023-03-21)
Changed
implemented a new method for calculating SMALL_CONTOURS, based on a YOLOv5 network
implemented a new method for calculating PERSPECTIVE_DEVIATION, based on HoughLines instead of GrabCut
Version 1.0.6 (2023-03-02)
Changed
implemented a new neural-network-based algorithm for overexposure prediction
Version 1.0.5 (2023-02-02)
Changed
optimized engine performance by changing input file preprocessing: operations on temporary per-page files were replaced with computations on numpy.ndarray
fixed a memory leak by deleting all temporary files and modifying tensorflow.model inference
Version 1.0.4 (2022-10-14)
Added
support for possible receipt resolutions in the small contours output metric
Changed
implemented a new algorithm for finding document contours
improved calculation of quality metrics
Version 1.0.3 (2022-10-04)
Changed
split the backend into a prediction server and a separate server for database and frontend use
Version 1.0.2 (2022-08-31)
Added
support for password-protected files
added Flask-RESTX 0.5.1 (OpenAPI)
Version 1.0.1 (2022-07-25)
Added
YOLOv5 model for receipt detection
Changed
updated Python to 3.8.10