Changelog

Version 1.2.4 (2024-05-17)

Added

  • added binary builds for Python 3.11 and 3.12.

Changed

  • updated core dependencies,

  • removed binary builds for Python 3.8 (security support ends in October 2024).

Version 1.2.3 (2024-05-07)

Added

  • new feature - checking whether the document is digitally signed

  • added the ability to select pages to predict via request parameters
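
As an illustration only, a page selection passed in a request parameter could be parsed as sketched below. The spec format ("1,3-5", 1-based) and the function name parse_pages are assumptions made for this example; they are not taken from the DocsQuality API.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based page spec such as "1,3-5" into sorted 0-based indices.

    Hypothetical helper: the real request parameter format may differ.
    """
    selected: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        selected.update(range(start - 1, end))
    return sorted(selected)
```

Rejecting out-of-range pages with an error, rather than silently dropping them, is a design choice of this sketch.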

Version 1.2.2 (2024-04-10)

Added

  • new feature - text recognition and embedding of the recognized text in PDF files

Version 1.2.1 (2024-03-22)

Responding to the needs of our partners, we have introduced another innovation to DocsQuality. In addition to evaluating quality, readability, and the OCRIndex, the program now also categorizes the analyzed documents. The feature relies on two components: visual document inspection and Natural Language Processing (NLP).

Added

  • document categorization feature

Changed

  • improved detection of document contours

Version 1.2.0 (2024-02-01)

The latest version of DocsQuality has been released! We’ve introduced a new feature: the OCRIndex. It enables users to verify whether a file will be accurately processed by the Optical Character Recognition (OCR) engine and determine the proportion of text that may be considered unreadable.

Added

  • evaluation of the OCRIndex of the input document

  • improved recognition of vector documents

Version 1.1.9 (2023-12-19)

Changed

  • updated model for evaluation of quality of printed text

Version 1.1.8 (2023-11-15)

Changed

  • updated method for calculating attribute GENERAL SHARPNESS

Version 1.1.7 (2023-10-23)

Changed

  • improved models for handwriting classification

  • updated algorithm for calculation of final handwritten text quality

  • modified image preprocessing before handwriting detection and classification

Version 1.1.6 (2023-09-08)

Changed

  • improved models for handwriting detection and classification

  • filtering out irrelevant handwriting detections located at the corners of documents
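
A minimal sketch of what filtering corner detections could look like is given below. The Box type, the function names, and the 10% margin are all assumptions for illustration; the criterion actually used by the engine is not described in this changelog.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def in_corner(box: Box, width: float, height: float, margin: float = 0.1) -> bool:
    """Return True if the box centre lies inside one of the four corner zones.

    A corner zone is the region within `margin` (fraction of page size) of
    both a vertical and a horizontal page edge. The default is an assumption.
    """
    cx = (box.x_min + box.x_max) / 2
    cy = (box.y_min + box.y_max) / 2
    near_x_edge = cx < margin * width or cx > (1 - margin) * width
    near_y_edge = cy < margin * height or cy > (1 - margin) * height
    return near_x_edge and near_y_edge


def drop_corner_detections(boxes, width, height, margin=0.1):
    """Keep only the detections whose centre is outside every corner zone."""
    return [b for b in boxes if not in_corner(b, width, height, margin)]
```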

Version 1.1.5 (2023-07-19)

Added

  • added detection of cropped edges of documents

Version 1.1.4 (2023-05-29)

Added

  • added new attribute: UNREADABLE_PRINTED_TEXT

Version 1.1.3 (2023-05-10)

Changed

  • changed the processing of PDF files depending on the data they contain (vector or raster)

Version 1.1.2 (2023-05-05)

Changed

  • new version of YOLO model for handwriting detection

  • RCNN classifier replaced by EfficientNet classification model

Version 1.1.1 (2023-04-26)

Changed

  • changed list of quality attributes

Version 1.1.0 (2023-04-13)

Added

  • added unit tests to check the algorithm methods in the docsquality engine

Version 1.0.9 (2023-04-05)

Changed

  • changed formats of two YOLO models (model for contour detection and model for handwriting detection) - PyTorch replaced with ONNX.

Version 1.0.8 (2023-03-28)

Added

  • added additional DocsQuality engine service with GPU acceleration.

Changed

  • changed engine base image.

Version 1.0.7 (2023-03-21)

Changed

  • implemented new method to calculate SMALL_CONTOURS - YOLOv5 network is used

  • implemented new method to calculate PERSPECTIVE_DEVIATION - based on HoughLines instead of GrabCut
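
To give a rough idea of how line angles can be turned into a deviation score, here is a sketch that works on line segments given as (x1, y1, x2, y2) tuples, such as those a Hough transform produces. How DocsQuality actually aggregates detected lines into PERSPECTIVE_DEVIATION is not stated in this changelog; the use of the median and both function names are assumptions.

```python
import math
from statistics import median


def axis_deviation_deg(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle in degrees between a segment and the nearest horizontal or vertical axis."""
    # Fold the segment direction into [0, 90) so that horizontal and vertical
    # lines both map to 0, then take the distance to the closer axis.
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 90.0
    return min(angle, 90.0 - angle)


def perspective_deviation(segments) -> float:
    """Median axis deviation over all segments (assumed aggregation); 0.0 if none."""
    if not segments:
        return 0.0
    return median(axis_deviation_deg(*s) for s in segments)
```

The median is used here because it is less sensitive than the mean to a few stray lines, for example from diagonal graphics on the page.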

Version 1.0.6 (2023-03-02)

Changed

  • implemented new algorithm based on neural network for overexposure prediction.

Version 1.0.5 (2023-02-02)

Changed

  • engine performance optimized by changing the preprocessing of input files: operations on temporary files with separate pages were replaced with computations on numpy.ndarray,

  • memory leak fixed by deleting all temporary files and modifying TensorFlow model inference.

Version 1.0.4 (2022-10-14)

Added

  • support for possible resolution for receipts in the output metric of small contours.

Changed

  • implemented new algorithm for finding document contours,

  • quality metrics calculation improvements.

Version 1.0.3 (2022-10-04)

Changed

  • backends divided into a server for prediction and a separate one for db and frontend use.

Version 1.0.2 (2022-08-31)

Added

  • support for password-protected files,

  • added OpenAPI support through Flask-RESTX version 0.5.1.

Version 1.0.1 (2022-07-25)

Added

  • YOLOv5 model for receipt detection.

Changed

  • Updated Python to 3.8.10.