Data extraction

From Wikitia

Data extraction is the act or process of retrieving information from data sources, which are often unstructured or poorly organised, so that it can be processed further or stored (data migration). The import into the intermediate extracting system is usually followed by data transformation and possibly the addition of metadata, after which the data is exported to the next stage in the data workflow.
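The sequence described above (extraction, transformation, addition of metadata, export) can be illustrated with a minimal Python sketch. The sample text, the regular expression, the function names and the source name `orders.txt` are all hypothetical and chosen only for illustration; real pipelines vary widely.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical poorly organised source text (illustration only).
RAW_TEXT = """
Order 1001 shipped to Alice, total: 25.50 EUR
note: customer called twice
Order 1002 shipped to Bob, total: 7.00 EUR
"""

# Assumed line format; lines that do not match are simply skipped.
ORDER_PATTERN = re.compile(
    r"Order (?P<order_id>\d+) shipped to (?P<customer>\w+), "
    r"total: (?P<total>[\d.]+) (?P<currency>[A-Z]{3})"
)


def extract(text):
    """Extraction: pull the matching fields out of the raw text as strings."""
    return [m.groupdict() for m in ORDER_PATTERN.finditer(text)]


def transform(records):
    """Transformation: convert string fields to numeric types."""
    return [
        {**r, "order_id": int(r["order_id"]), "total": float(r["total"])}
        for r in records
    ]


def add_metadata(records, source_name):
    """Attach metadata recording where and when the data was extracted."""
    extracted_at = datetime.now(timezone.utc).isoformat()
    return [
        {**r, "source": source_name, "extracted_at": extracted_at}
        for r in records
    ]


def export(records):
    """Export: serialise the records for the next stage of the workflow."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    rows = add_metadata(transform(extract(RAW_TEXT)), source_name="orders.txt")
    print(export(rows))
```

Keeping each stage as a separate function mirrors the workflow in the text: the extraction step only locates the data, while type conversion and metadata are applied afterwards.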

The term "data extraction" is also often applied to the initial import of data from primary sources, such as measuring or recording devices, into a computer. Modern electronic devices usually provide an electrical connector (such as a USB port) through which 'raw data' can be streamed from the device into a personal computer.
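As a rough illustration of reading streamed 'raw data', the following Python sketch decodes fixed-size binary records from a byte stream. The record layout (a little-endian 32-bit timestamp followed by a 16-bit signed reading) is an assumption made for this example, not a format used by any particular device, and an in-memory `io.BytesIO` buffer stands in for the device connection.

```python
import io
import struct

# Hypothetical record layout: little-endian uint32 timestamp (ms)
# followed by an int16 raw reading, 6 bytes in total.
RECORD = struct.Struct("<Ih")


def read_records(stream):
    """Yield (timestamp_ms, reading) tuples until no full record remains."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of stream, or a trailing partial record
        yield RECORD.unpack(chunk)


# Simulated device output; a real device would be read from its port instead.
fake_device = io.BytesIO(RECORD.pack(0, 512) + RECORD.pack(100, -17))

for timestamp_ms, reading in read_records(fake_device):
    print(timestamp_ms, reading)
```

With a live connection a single read may return fewer bytes than requested even before the stream ends, so a production reader would normally buffer incoming bytes until a full record is available.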

Unstructured data sources used for sales or marketing leads include web pages, emails, documents, PDFs, scanned text, mainframe reports, spool files, classified advertising, and similar sources. Extracting data from such sources has become a significant technical challenge: whereas data extraction historically had to cope with changes in physical hardware formats, most current data extraction deals with unstructured sources and with a variety of different software formats. The increasingly common practice of extracting information from the web is referred to as "Web data extraction" or "Web scraping".
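A minimal sketch of Web data extraction, using only Python's standard `html.parser` module, is given below. The `LinkExtractor` class, the `extract_links` helper and the sample page are hypothetical names and data introduced for this example; real scrapers typically also fetch pages over HTTP and must cope with malformed markup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently open, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content (illustration only).
SAMPLE_HTML = """
<html><body>
  <h1>Contacts</h1>
  <p>Write to <a href="mailto:sales@example.com">our sales team</a> or see the
  <a href="/pricing">pricing <b>page</b></a>.</p>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

The parser turns loosely structured markup into a list of tuples, which is the kind of structured output that later stages of a data workflow can process.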