Data quality refers to the quality of data. Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J.M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.
Definitions
1. Data quality refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena. (GIS Glossary)
2. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)
3. The totality of features and characteristics of data that bears on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data. (Glossary of Quality Assurance Terms)
History
Before the rise of the inexpensive server, massive mainframe computers were used to maintain name and address data so that the mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars compared to manually correcting customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available.
Companies with an emphasis on marketing often focus their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found in the enterprise. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) improving the understanding of vendor purchases to negotiate volume discounts; and 3) avoiding logistics costs in stocking and shipping parts across a large organization. While name and address data has a clear standard as defined by local postal authorities, other types of data have few recognized standards. There is a movement in the industry today to standardize certain non-address data. The non-profit group GS1 is among the groups spearheading this movement.
For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of the data, cross tabulation, modeling and outlier detection, verifying data integrity, etc.
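Two of the checks named above, bounds checking and outlier detection, can be sketched in a few lines. This is only a minimal illustration, not a prescribed method: the function names, the sample readings, the permitted range and the 3.5 cut-off on the modified z-score (based on the median absolute deviation) are all assumptions made for the example.

```python
from statistics import median


def out_of_bounds(values, low, high):
    """Return the values that fall outside the closed interval [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute
    deviation (MAD), which are less distorted by extreme values than
    the mean and standard deviation. The 3.5 default is an assumed,
    commonly quoted rule of thumb.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing can be scored.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical temperature readings in degrees Celsius.
readings = [20.1, 19.8, 20.4, 20.0, 85.0, 19.9, -40.0]

# Bounds check against an assumed valid sensor range of -30 to 60.
print(out_of_bounds(readings, -30, 60))

# Outlier check that needs no predefined range.
print(mad_outliers(readings))
```

Bounds checking needs a known valid range, whereas the outlier test infers what is unusual from the data itself; in practice the two are complementary.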
Overview
There are a number of theoretical frameworks for understanding data quality. One framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al., 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996).
A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. These lists commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified and there is little agreement on their nature (are these concepts, goals or criteria?), their definitions or their measures (Wang et al., 1993). Software engineers may recognise this as a similar problem to the so-called "ilities".
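Because the definitions and measures of these dimensions are not agreed upon, any concrete metric is only one possible reading of a dimension. The sketch below assumes, purely for illustration, that completeness is the share of records with a non-empty value in a field and that currency is the share of records updated within a chosen number of days; the field names and sample records are hypothetical.

```python
from datetime import date


def completeness(records, field):
    """Fraction of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def currency(records, field, as_of, max_age_days):
    """Fraction of records whose date in `field` is at most `max_age_days` old."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(records)


# Hypothetical customer records.
customers = [
    {"email": "a@example.com", "updated": date(2024, 1, 10)},
    {"email": "", "updated": date(2023, 1, 1)},
    {"email": None, "updated": None},
    {"email": "b@example.com", "updated": date(2024, 1, 1)},
]

print(completeness(customers, "email"))
print(currency(customers, "updated", as_of=date(2024, 1, 31), max_age_days=30))
```

Other dimensions, such as accuracy or relevance, generally cannot be computed from the data alone, since they require a reference source or knowledge of the intended use.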
The Massachusetts Institute of Technology (MIT) has a Total Data Quality Management program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field.
In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the US economy of data quality problems at over US$600 billion per annum (Eckerson, 2002). In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organisations, this data governance function has been established as part of a larger regulatory compliance function, a recognition of the importance of data and information quality to organisations.
Problems with data quality do not arise only from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency.
The market is going some way to providing data quality assurance. A number of vendors make tools for analysing and repairing poor-quality data in situ; service providers can clean the data on a contract basis; and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. Most data quality products offer a set of tools for improving data, which may include some or all of the following:
- Data profiling - initially assessing the data to understand its quality challenges
- Data standardization - a business rules engine that ensures that data conforms to quality rules
- Geocoding - for name and address data. Corrects data to US and worldwide postal standards.
- Matching or Linking - a way to compare data so that similar, but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. It often recognizes that 'Bob' and 'Robert' may be the same individual. It might be able to manage 'householding', or finding links between husband and wife at the same address, for example. Finally, it often can build a 'best of breed' record, taking the best components from multiple data sources and building a single super-record.
- Monitoring - keeping track of data quality over time and reporting variations in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
- Batch and Real time - Once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
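The matching step in the list above can be sketched as follows. This is a rough illustration rather than how any particular product works: the nickname table, the record fields, the equal weighting of the three fields and both thresholds are assumptions, and the similarity score comes from `difflib.SequenceMatcher` in the Python standard library rather than from a true fuzzy-logic engine.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real tools ship far larger dictionaries.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def normalise(record):
    """Lower-case the fields, expand known nicknames and tidy the address."""
    first = record["first"].strip().lower()
    first = NICKNAMES.get(first, first)
    last = record["last"].strip().lower()
    address = " ".join(record["address"].lower().replace(".", "").split())
    return first, last, address


def similarity(a, b):
    """Similarity of two strings as a ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(rec_a, rec_b, threshold=0.85):
    """Treat two records as the same person if their average field similarity is high."""
    fa, la, aa = normalise(rec_a)
    fb, lb, ab = normalise(rec_b)
    score = (similarity(fa, fb) + similarity(la, lb) + similarity(aa, ab)) / 3
    return score >= threshold


def same_household(rec_a, rec_b, threshold=0.9):
    """Treat two records as one household if surname and address agree."""
    _, la, aa = normalise(rec_a)
    _, lb, ab = normalise(rec_b)
    return la == lb and similarity(aa, ab) >= threshold


bob = {"first": "Bob", "last": "Smith", "address": "12 Main St."}
robert = {"first": "Robert", "last": "Smyth", "address": "12 Main St"}
mary = {"first": "Mary", "last": "Smith", "address": "12 main st"}

print(is_match(bob, robert))      # likely duplicates despite the spelling differences
print(is_match(bob, mary))        # different people
print(same_household(bob, mary))  # but plausibly the same household
```

Comparing every pair of records does not scale to large files, so real matching tools usually compare only records that share a coarse key such as postal code.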
There are several well-known authors and self-styled experts, with Larry English perhaps the most popular guru. In addition, the International Association for Information and Data Quality (IAIDQ) was established in 2004 to provide a focal point for professionals and researchers in this field.
References
- Eckerson, W. (2002) "Data Warehousing Special Report: Data quality and the bottom line".
- Kahn, B., Strong, D., Wang, R. (2002) "Information Quality Benchmarks: Product and Service Performance," Communications of the ACM, April 2002. pp. 184-192.
- Price, R. and Shanks, G. (2004) "A Semiotic Information Quality Framework," Proc. IFIP International Conference on Decision Support Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato.
- Redman, T. C. (2004) "Data: An Unfolding Quality Disaster".
- Wand, Y. and Wang, R. (1996) "Anchoring Data Quality Dimensions in Ontological Foundations," Communications of the ACM, November 1996. pp. 86-95.
- Wang, R., Kon, H. & Madnick, S. (1993) "Data Quality Requirements Analysis and Modelling," Ninth International Conference of Data Engineering, Vienna, Austria.
- Fournel Michel, Accroitre la qualité et la valeur des données de vos clients, éditions Publibook, 2007. ISBN 978-2748338478.
See also
- Information quality
- Data profiling
- Data cleansing
- Master Data Management
- Noisy text analytics
- Data governance