Metadata often relies on human users' understanding of natural language to convey its meaning, or semantics. If metadata is data that describes data, semantic metadata is data that describes the meaning of that data or content. In other words, it makes the meaning of the metadata explicit so that machines, and not just humans, can infer or interpret information from it.
All metadata has semantics, but much or all of it is implicit and relies on human understanding of natural language. For example, HTML is a well-known language for descriptive metadata: it describes the elements of web documents so that browsers can display them consistently and reliably. The <address> tag marks up contact information for the author or owner of a document, but its meaning depends on a shared understanding of what constitutes an 'address' or contact information, and of the conventions for writing them. The only thing a browser typically does with the tag is render its contents in italics.
The best-known and most widely used languages for semantic metadata are the Resource Description Framework (RDF) and the Web Ontology Language (OWL), though there are many others and related formats such as JSON-LD. RDF is the W3C standard for expressing semantic metadata on the web, an effort known as the Semantic Web. OWL, which builds on RDF, draws its formal foundations from description logics.
The meaning of the data in semantic metadata comes from deep interlinking and contextualization. The subject–predicate–object format of RDF statements naturally provides these traits: the subject of one RDF statement can be the object of another, so the statements together form a labelled, directed graph. For example, some of the statements depicted in the illustration could be represented informally as Bob–knows–Alice, Alice–knows–Bob, Chess–members–Alice, Chess–members–Bob, Alice–is_member–Chess, Bob–is_member–Chess, Chess–Type–Group, and so on. This interlinking creates the context from which both machines and humans can infer meaning. The representation sits at a higher level of abstraction, closer to how humans understand and use the data that the semantic metadata describes.
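To make the graph structure concrete, the informal statements from the example can be stored as triples and indexed by subject. This is a minimal sketch in plain Python, not an RDF library: bare strings stand in for the IRIs that real RDF uses, and the function names are illustrative only. Nodes that occur both as a subject and as an object are the points where statements interlink.

```python
from collections import defaultdict

# The informal statements from the example, as (subject, predicate, object) triples.
# Plain strings stand in for the IRIs a real RDF graph would use.
TRIPLES = [
    ("Bob", "knows", "Alice"),
    ("Alice", "knows", "Bob"),
    ("Chess", "members", "Alice"),
    ("Chess", "members", "Bob"),
    ("Alice", "is_member", "Chess"),
    ("Bob", "is_member", "Chess"),
    ("Chess", "Type", "Group"),
]


def build_graph(triples):
    """Index triples as a labelled, directed graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def linked_nodes(triples):
    """Nodes used both as a subject and as an object, i.e. where statements interlink."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    return subjects & objects


def follow(graph, node, predicate):
    """Return the objects reached from `node` along edges labelled `predicate`."""
    return [obj for pred, obj in graph.get(node, []) if pred == predicate]


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(sorted(linked_nodes(TRIPLES)))
    print(follow(graph, "Chess", "members"))
```

Here Alice, Bob and Chess are the interlinked nodes, and following the `members` edges from Chess yields Alice and Bob. Group appears only as an object, so it has no outgoing edges in this fragment.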
When the graph also includes a schema or ontology defined by RDF statements, that is, a general definition of the domain's entities, categories, properties and relationships, it further enables an unambiguous logical interpretation of the data. This has led to knowledge graphs and graph databases that combine traversal algorithms, inference engines and natural language processing pipelines to produce 'smart data', a marketing phrase for semantic and AI-driven applications.
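The kind of inference a schema enables can be illustrated with a small forward-chaining loop. This is a simplified sketch, not a conforming RDFS or OWL reasoner: it applies only two rules modelled on `owl:inverseOf` and `rdfs:subClassOf`, writes the example's Type predicate as `rdf:type`, and adds a hypothetical `Organization` superclass. Only one direction of Chess membership is stated as data; the other direction is derived from the schema.

```python
# Instance data: only one direction of Chess membership is stated.
DATA = {
    ("Chess", "members", "Alice"),
    ("Chess", "members", "Bob"),
    ("Chess", "rdf:type", "Group"),
}

# Schema statements are triples too ("Organization" is a hypothetical superclass).
SCHEMA = {
    ("is_member", "owl:inverseOf", "members"),
    ("Group", "rdfs:subClassOf", "Organization"),
}


def infer(triples):
    """Apply two schema rules repeatedly until no new triples can be derived."""
    facts = set(triples)
    while True:
        inverses = {(s, o) for s, p, o in facts if p == "owl:inverseOf"}
        subclasses = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in facts:
            # Inverse rule: if P owl:inverseOf Q, then (x P y) implies (y Q x), and vice versa.
            for prop, inverse in inverses:
                if p == prop:
                    new.add((o, inverse, s))
                elif p == inverse:
                    new.add((o, prop, s))
            # Subclass rule: (x rdf:type C) and (C rdfs:subClassOf D) imply (x rdf:type D).
            if p == "rdf:type":
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, "rdf:type", sup))
        if new <= facts:  # fixpoint: this round derived nothing new
            return facts
        facts |= new


if __name__ == "__main__":
    for triple in sorted(infer(DATA | SCHEMA) - DATA - SCHEMA):
        print(triple)
```

The derived triples are that Alice and Bob are each `is_member` of Chess and that Chess is an `Organization`. Production systems rely on RDF stores and reasoners that implement the full RDFS and OWL semantics; the point here is only that schema triples let new facts be derived mechanically.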
Uses for semantic metadata
Improving search
As content volumes grew, improving search became the use case that drove the uptake of semantic metadata: search algorithms no longer had to depend on keyword matching alone and could use semantic metadata to understand the user's search intent. For example, a Google search for 'the wives of Paul McCartney' can use semantic metadata to recognize that the user is looking for Nancy Shevell, Heather Mills and Linda McCartney. Search that uses semantic metadata also offers several further benefits:
- Personalization. Metadata associated with the user's profile, location and search history is added to the context of the search to provide a personalized experience.
- Disambiguation. Poor results caused by homographs are avoided. Semantic metadata lets search distinguish between words that are spelled the same but mean different things, such as the fruit 'apple', the technology company 'Apple' and the Beatles' record label 'Apple'.
- Richer results. Because of the interlinking and semantic context it provides, semantic metadata enables search to return relevant auxiliary information, such as the knowledge panel shown alongside Google Search results.
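Context-based disambiguation can be sketched as choosing, among the entities that share a label, the one whose graph context overlaps most with the rest of the query. The entity identifiers and context terms below are hypothetical, and word overlap is a deliberately crude stand-in for the richer signals (graph structure, user profile, popularity) a production search engine would use.

```python
import re

# Hypothetical mini knowledge graph: entities sharing the label "apple",
# each with context terms that would normally come from neighbouring nodes.
ENTITIES = {
    "ex:Apple_fruit": {"label": "apple",
                       "context": {"fruit", "tree", "orchard", "pie", "juice"}},
    "ex:Apple_Inc": {"label": "apple",
                     "context": {"iphone", "mac", "stock", "technology", "cupertino"}},
    "ex:Apple_Records": {"label": "apple",
                         "context": {"beatles", "record", "label", "album", "1968"}},
}


def disambiguate(query, term="apple"):
    """Pick the entity labelled `term` whose context shares the most words with the query."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    candidates = [eid for eid, e in ENTITIES.items() if e["label"] == term]
    # max() keeps the first candidate on ties, so dictionary order acts as a default.
    return max(candidates, key=lambda eid: len(ENTITIES[eid]["context"] & words))


if __name__ == "__main__":
    for q in ["apple pie recipe", "apple stock price", "Beatles album on Apple"]:
        print(q, "->", disambiguate(q))
```

With these toy entities, "apple pie recipe" resolves to the fruit, "apple stock price" to the company and "Beatles album on Apple" to the record label.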
Between 2010 and 2013, Google Search moved away from optimizing for keywords towards relying on semantic metadata to return search results. Large technology companies make wide use of semantic metadata, both internally and in public-facing products. Google recommends marking up pages with the Schema.org vocabulary, expressed in JSON-LD, RDFa or Microdata, and prefers JSON-LD. As a result, search engine optimization is often cited as a benefit of adopting semantic metadata.
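As an illustration of Schema.org markup in JSON-LD, the sketch below builds a `Person` with a `PostalAddress` and wraps it in the `<script type="application/ld+json">` element used to embed JSON-LD in a web page. Unlike the HTML `<address>` tag, each field is labelled with a vocabulary term a machine can interpret. The person and address are placeholder values and the helper functions are hypothetical; Schema.org and Google's structured data documentation list which properties a given search feature actually uses.

```python
import json


def person_jsonld(name, email, street, locality, postal_code, country):
    """Build Schema.org JSON-LD describing a person's contact details."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "email": email,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postal_code,
            "addressCountry": country,
        },
    }


def as_script_tag(data):
    """Wrap JSON-LD in the <script> element used to embed it in an HTML page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    doc = person_jsonld("Jane Doe", "jane.doe@example.com",
                        "1 Example Street", "Springfield", "12345", "US")
    print(as_script_tag(doc))
```

Keeping the data as a Python dictionary and serializing it with `json.dumps` guarantees well-formed JSON; the same structure could equally be expressed as RDFa or Microdata attributes on the visible HTML.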
Data integration
When integrating data from multiple sources, both internal and external, semantic metadata helps the data align in a meaningful and useful way regardless of the administrative or structural metadata at each source. When the semantic metadata and its schema are already in place, the time spent on extract, transform and load (ETL) work can be significantly reduced, leading to more accurate data and more timely decisions based on it.
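A minimal sketch of semantic alignment during integration: two hypothetical sources name the same fields differently, and a mapping ties each source field to a shared term (Schema.org-style property names here) before records are merged on a common identifier. The source field names, records and mappings are invented for illustration; in practice mappings are often declared with standards such as W3C R2RML rather than hand-written dictionaries.

```python
# Hypothetical records from two systems describing the same customer.
CRM_ROWS = [
    {"cust_id": "42", "full_name": "Ada Lovelace", "mail": "ada@example.com"},
]
BILLING_ROWS = [
    {"CustomerNo": "42", "Name": "Ada Lovelace", "City": "London"},
]

# Semantic mappings: each source field is tied to a term in a shared vocabulary.
MAPPINGS = {
    "crm": {"cust_id": "identifier", "full_name": "name", "mail": "email"},
    "billing": {"CustomerNo": "identifier", "Name": "name", "City": "addressLocality"},
}


def to_shared(rows, mapping):
    """Rename source fields to shared terms, dropping fields that have no mapping."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def integrate(*sources):
    """Merge records from several sources on the shared 'identifier' term.

    On conflicting values, the source passed later wins.
    """
    merged = {}
    for rows in sources:
        for record in rows:
            merged.setdefault(record["identifier"], {}).update(record)
    return merged


if __name__ == "__main__":
    print(integrate(to_shared(CRM_ROWS, MAPPINGS["crm"]),
                    to_shared(BILLING_ROWS, MAPPINGS["billing"])))
```

Once both sources speak the shared vocabulary, the merge step no longer needs to know anything about `cust_id` or `CustomerNo`; adding a third source means writing one more mapping rather than another pairwise transformation.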
Improving text analytics
Unstructured data vastly outweighs structured data; a figure of 80% is often quoted. Without semantic metadata, text analytics at this scale would be impractical, if not impossible. Content producers increasingly use knowledge graphs in text analytics pipelines to enrich unstructured data. A knowledge graph can use the deep interlinking and context of identified terms to produce more accurate results. In turn, text analysis can identify new concepts and relationships, which can be added to the knowledge graph, further improving the pipeline's performance.
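The feedback loop between a knowledge graph and text analysis can be sketched as follows: labels from the graph act as a dictionary for tagging entities in text, and entities that co-occur are proposed as candidate new links. The `ex:` identifiers and the `ex:mentionedWith` predicate are hypothetical, and co-occurrence is a crude stand-in for real relation extraction; candidates would normally be reviewed before being added to the graph.

```python
import re
from itertools import combinations

# Hypothetical knowledge graph: entity id -> label, plus existing triples.
KG_LABELS = {
    "ex:Metaweb": "Metaweb",
    "ex:Google": "Google",
    "ex:Freebase": "Freebase",
}
KG_TRIPLES = {("ex:Metaweb", "ex:developed", "ex:Freebase")}


def tag_entities(text):
    """Return ids of known entities whose label occurs as a whole word in the text."""
    return [eid for eid, label in KG_LABELS.items()
            if re.search(r"\b" + re.escape(label) + r"\b", text)]


def enrich(text, triples):
    """Tag entities, then propose co-occurring pairs as candidate new triples."""
    entities = tag_entities(text)
    candidates = {(a, "ex:mentionedWith", b)
                  for a, b in combinations(sorted(entities), 2)}
    # Only statements not already in the graph are returned as new.
    return entities, candidates - triples


if __name__ == "__main__":
    entities, new = enrich("Google acquired Metaweb in 2010.", KG_TRIPLES)
    print(entities, new)
```

Tagging "Google acquired Metaweb in 2010." finds Google and Metaweb and proposes one new link between them; once that link is added to the graph, running the same text again proposes nothing new.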
This article "Semantic Metadata" is from Wikipedia. The list of its authors can be seen in its history. Articles taken from the Draft namespace on Wikipedia can be accessed in Wikipedia's Draft namespace.