Document processing

From Wikitia

Document processing is a field of study and a set of industrial techniques aimed at converting an analogue document into a digital one. It is concerned not only with photographing or scanning a document to obtain a digital image, but also with making the document machine-readable. This involves extracting the document's structure or layout and then its content, which may take the form of text or images. The work can be carried out with traditional computer vision techniques, convolutional neural networks, or human labour. The problems addressed include semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR), and transcription more generally, whether automated or manual. Besides the scanning phase, the term can also cover the phase of interpreting the document, for example with natural language processing (NLP) or image classification technologies. Document processing is used in a wide range of industrial and scientific fields to streamline administrative procedures, process mail, and digitize analogue archives and historical documents.
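As a minimal illustration of the layout-extraction step using a traditional computer vision technique, the Python sketch below splits a binarized page into text lines with a horizontal projection profile: a row belongs to a text line if it contains at least one ink pixel. The function name segment_text_lines, the 0/1 pixel encoding, and the toy page are assumptions made for illustration only; real systems typically combine such heuristics with learned models and an OCR engine.

```python
from typing import List, Tuple


def segment_text_lines(page: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines.

    `page` is a list of pixel rows, where 1 means ink and 0 means background
    (an assumed encoding). Returns (start_row, end_row) pairs with the end
    row exclusive, one pair per run of consecutive rows that contain ink.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the text line currently being scanned
    for row_index, row in enumerate(page):
        has_ink = any(pixel == 1 for pixel in row)
        if has_ink and start is None:
            start = row_index                 # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, row_index))  # a blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(page)))      # line runs to the page bottom
    return lines


# Toy 7x6 "scan": two text lines separated by blank rows (hypothetical data).
PAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(segment_text_lines(PAGE))
```

Each returned row range could then be cropped and passed to an OCR or HTR step to recover the content of that line.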

Document processing was originally, and to some extent still is, a kind of assembly-line labour concerned with handling documents such as letters and packages, with the goal of sorting them or extracting data from them in bulk. This work can be done in-house or through business process outsourcing. Indeed, document processing may involve some form of externalised manual labour, such as that provided by Mechanical Turk.

As a relatively recent example of manual document processing, in 2007 the handling of "millions of visa and citizenship applications" relied on "about 1,000 contract employees" to "handle the mail room and data input."