FACTOID # 103: The ten most generous countries are all in Europe.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Data warehouse

A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems... Corporate Memory (CM) can be defined as the total body of data, information and knowledge required to deliver the strategic aims and objectives of an organization. ... Decision support systems are a class of computer-based information systems including knowledge based systems that support decision making activities. ... Data mining is the principle of sorting through large amounts of data and picking out relevant information. ... In Data Warehousing, an Operational System is the Software System which uses a Database for doing daily operations of an organization. ...


Bill Inmon, an early and influential practitioner, has formally defined a data warehouse in the following terms; Bill Inmon (William Harvey Inmon) (b. ...

Subject-oriented 
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Time-variant 
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile 
Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting; and
Integrated 
The database contains data from most or all of an organization's operational applications, and that this data is made consistent.

A data warehouse might be used to find the day of the week on which a company sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001–2005. Look up widget in Wiktionary, the free dictionary. ...


While operational systems are optimized for simplicity and speed of modification (see OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Frequently data in data warehouses are heavily denormalised, summarised or stored in a dimension-based model. However, this is not always required to achieve acceptable query response times. OLTP (Online Transaction Processing) is a form of transaction processing conducted via computer network. ... Database normalization is a design technique for structuring relational database tables. ... Entity-relationship model is a data modeling method used to model a system and its requirements in a top-down approach. ... Online Analytical Processing, or OLAP (IPA: ), is an approach to quickly provide answers to analytical queries that are multidimensional in nature. ... Denormalization is the process of attempting to optimize the performance of a database by adding redundant data. ... The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single fact table with a compound primary key, with one segment for each dimension and with additional columns of additive, numeric facts. ...

Contents

History

Data Warehouses are a distinct type of computer database that were first developed during the late 1980s and early 1990s. They were developed to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons: This article is about the machine. ... This article is about computing. ...

  • The processing load of reporting reduced the response time of the operational systems,
  • The database designs of operational systems were not optimized for information analysis and reporting,
  • Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system
  • Development of reports in operational systems often required writing specific computer programs which was slow and expensive

As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, as well as personal computers and office automation software such as spreadsheet, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to a growth of this type of computer system. For other uses, see Mainframe. ... Minicomputer (colloquially, mini) is a largely obsolete term for a class of multi-user computers which make up the middle range of the computing spectrum, in between the largest multi-user systems (traditionally, mainframe computers) and the smallest single-user systems (microcomputers or personal computers). ... Screenshot of a spreadsheet under OpenOffice A spreadsheet is a rectangular table (or grid) of information, often financial information. ...


As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved through several fundamental stages:

Off line Operational Databases 
Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
Off line Data Warehouse 
Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
Real Time Data Warehouse 
Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
Integrated Data Warehouse 
Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

Architecture

The concept of "data warehousing" dates back at least to the mid-1980s, and possibly earlier. In essence, it was intended to provide an architectural model for the flow of data from operational systems to decision support environments. It attempted to address the various problems associated with this flow, and the high costs associated with it. In the absence of such an architecture, there usually existed an enormous amount of redundancy in the delivery of management information. In larger corporations it was typical for multiple decision support projects to operate independently, each serving different users but often requiring much of the same data. The process of gathering, cleaning and integrating data from various sources, often legacy systems, was typically replicated for each project. Moreover, legacy systems were frequently being revisited as new requirements emerged, each requiring a subtly different view of the legacy data. In Data Warehousing, an Operational System is the Software System which uses a Database for doing daily operations of an organization. ... Decision support systems are a class of computerized information systems that support decision making activities. ... A legacy system is an antiquated computer system or application program which continues to be used because the user (typically an organisation) does not want to replace or redesign it. ...


Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection/storage/staging areas for corporate data. From here data could be distributed to "retail stores" or "data marts" which were tailored for access by decision support users (or "consumers"). While the data warehouse was designed to manage the bulk supply of data from its suppliers (e.g. operational systems), and to handle the organization and storage of this data, the "retail stores" or "data marts" could be focused on packaging and presenting selections of the data to end-users, to meet specific management information needs. A data mart (DM) is a specialized version of a data warehouse (DW). ... Decision support systems are a class of computerized information systems that support decision making activities. ... A data mart (DM) is a specialized version of a data warehouse (DW). ... Economics and commerce define an end-user as the person who uses a product. ...


Somewhere along the way this analogy and architectural vision was lost, as some vendors and industry speakers redefined the data warehouse as simply a management reporting database. This is a subtle but important deviation from the original vision of the data warehouse as the hub of a management information architecture, where the decision support systems were actually the data marts or "retail stores". Decision support systems are a class of computerized information systems that support decision making activities. ... A data mart (DM) is a specialized version of a data warehouse (DW). ...


Storage

In OLTP — online transaction processing systems relational database design use the discipline of data modeling and generally follow the Codd rules of data normalization in order to ensure absolute data integrity. Less complex information is broken down into its most simple structures (a table) where all of the individual atomic level elements relate to each other and satisfy the normalization rules. Codd defines 5 increasingly stringent rules of normalization and typically OLTP systems achieve a 3rd level normalization. Fully normalized OLTP database designs often result in having information from a business transaction stored in dozens to hundreds of tables. Relational database managers are efficient at managing the relationships between tables and result in very fast insert/update performance because only a little bit of data is affected in each relational transaction. OLTP (Online Transaction Processing) is a form of transaction processing conducted via computer network. ... A relational database is a database that conforms to the relational model, and refers to a databases data and schema (the databases structure of how that data is arranged). ... In computer science, data modeling is the process of creating a data model by applying a data model theory to create a data model instance. ... Edgar Ted Codd Edgar F. Ted Codd (August 23, 1923 – April 18, 2003) was a British computer scientist who made seminal contributions to the theory of relational databases. ... // Database normalization is a design technique by which relational database tables are structured in such a way as to make them less vulnerable to certain types of logical inconsistencies and anomalies. ... In computer science and telecommunications, the term data integrity has the following meanings: The condition in which data is identically maintained during any operation, such as transfer, storage, and retrieval. ... Database normalization is a design technique for structuring relational database tables. ... Edgar Ted Codd Edgar F. Ted Codd (August 23, 1923 – April 18, 2003) was a British computer scientist who made seminal contributions to the theory of relational databases. ... OLTP (Online Transaction Processing) is a form of transaction processing conducted via computer network. ...


OLTP databases are efficient because they are typically only dealing with the information around a single transaction. In reporting and analysis, thousands to billions of transactions may need to be reassembled imposing a huge workload on the relational database. Given enough time the software can usually return the requested results, but because of the negative performance impact on the machine and all of its hosted applications, data warehousing professionals recommend that reporting databases be physically separated from the OLTP database. OLTP (Online Transaction Processing) is a form of transaction processing conducted via computer network. ...


In addition, data warehousing suggests that data be restructured and reformatted to facilitate query and analysis by novice users. OLTP databases are designed to provide good performance by rigidly defined applications built by programmers fluent in the constraints and conventions of the technology. Add in frequent enhancements, and too many a database is just a collection of cryptic names, seemingly unrelated and obscure structures that store data using incomprehensible coding schemes; all factors that while improving performance, complicate use by untrained people. Lastly, the data warehouse needs to support high volumes of data gathered over extended periods of time and are subject to complex queries and need to accommodate formats and definitions inherited from independently designed package and legacy systems.


Designing the data warehouse data Architecture synergy is the realm of Data Warehouse Architects. The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level because this provides for the most useful and flexible basis for use in reporting and information analysis. However, because of different focus on specific requirements, there can be alternative methods for design and implementing data warehouses. There are two leading approaches to organizing the data in a data warehouse: the dimensional approach advocated by Ralph Kimball and the normalized approach advocated by Bill Inmon. Whilst the dimension approach is very useful in data mart design, it can result in a rats nest of long term data integration and abstraction complications when used in a data warehouse . Ralph Kimball, PhD, is a widely published author on the subject of data warehousing and business intelligence. ... Bill Inmon (William Harvey Inmon) (b. ...


In the "dimensional" approach, transaction data is partitioned into either a measured "facts" which are generally numeric data that captures specific values or "dimensions" which contain the reference information that gives each transaction its context. As an example, a sales transaction would be broken up into facts such as the number of products ordered, and the price paid, and dimensions such as date, customer, product, geographical location and salesperson. The main advantages of a dimensional approach is that the data warehouse is easy for business staff with limited information technology experience to understand and use. Also, because the data is pre-joined into the dimensional form, the data warehouse tends to operate very quickly. The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business. Information and communication technology spending in 2005 Information technology (IT), as defined by the Information Technology Association of America (ITAA), is the study, design, development, implementation, support or management of computer-based information systems, particularly software applications and computer hardware. ...


The "normalized" approach uses database normalization. In this method, the data in the data warehouse is stored in third normal form. Tables are then grouped together by subject areas that reflect the general definition of the data (customer, product, finance, etc.) The main advantage of this approach is that it is quite straightforward to add new information into the database — the primary disadvantage of this approach is that because of the number of tables involved, it can be rather slow to produce information and reports. Furthermore, since the segregation of facts and dimensions is not explicit in this type of data model, it is difficult for users to join the required data elements into meaningful information without a precise understanding of the data structure. Database normalization is a design technique for structuring relational database tables. ... The third normal form (3NF) is a normal form used in database normalization to check if all the non-key attributes of a relation depend only on the candidate keys of the relation. ... A data model is a model that describes how data are represented and used in an abstract way. ... A binary tree, a simple type of branching linked data structure. ...


Subject areas are just a method of organizing information and can be defined along any lines. The traditional approach has subjects defined as the subjects or nouns within a problem space. For example, in a financial services business, you might have customers, products and contracts. An alternative approach is to organize around the business transactions, such as customer enrollment, sales and trades.


Advantages

There are many advantages to using a data warehouse, some of them are:

  • Data warehouses enhance end-user access to a wide variety of data.
  • Decision support system users can obtain specified trend reports, e.g. the item with the most sales in a particular area within the last two years.
  • Data warehouses can be a significant enabler of commercial business applications, particularly customer relationship management (CRM) systems.

Customer relationship management (CRM) is a broad term that covers concepts used by companies to manage their relationships with customers, including the capture, storage and analysis of customer, vendor, partner, and internal process information. ...

Concerns

  • Extracting, transforming and loading data consumes a lot of time and computational resources.
  • Data warehousing project scope must be actively managed to deliver a release of defined content and value.
  • Compatibility problems with systems already in place.
  • Security could develop into a serious issue, especially if the data warehouse is web accessible.
  • Data Storage design controversy warrants careful consideration and perhaps prototyping of the data warehouse solution for each project's environments.

See also

A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (commonly called bitmaps) and answers most queries by performing bitwise logical operations on these bitmaps. ... The term business intelligence (BI) dates to 1958. ... Business performance management (BPM) is a set of processes that help organizations optimize their business performance. ... Data integration is the process of combining data residing at different sources and providing the user with a unified view of these data [1]. This process emerges in a variety of situations both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different... A data mart (DM) is a specialized version of a data warehouse (DW). ... Data mining is the principle of sorting through large amounts of data and picking out relevant information. ... A database management system (DBMS) is computer software designed for the purpose of managing databases. ... An Executive Information System (EIS) is a computer-based system intended to facilitate and support the information and decision making needs of senior executives by providing easy access to both internal and external information relevant to meeting the strategic goals of the organization. ... ETL also means Express Toll Lanes, see Express Toll Lanes. ... Intelligent Documents is a general term to describe web documents with more functionality than a page designed to emulate paper. ... Master Data Management (MDM), also known as Reference Data Management, is a discipline in Information Technology (IT) that focuses on the management of reference or master data that is shared by several disparate IT systems and groups. ... Online Analytical Processing, or OLAP (IPA: ), is an approach to quickly provide answers to analytical queries that are multidimensional in nature. ... OLTP (Online Transaction Processing) is a form of transaction processing conducted via computer network. ... According to Bill Inmon, an operational data store (ODS) is a subject-oriented, integrated, volatile, current-valued, detailed-only collection of data in support of an organizations need for up-to-the-second, operational, integrated, collective information. ... Shadow system is a term used in Information Services for any application relied upon for business processes that is not under the jurisdiction of a centralized Information Systems department. ... The introduction to this article provides insufficient context for those unfamiliar with the subject matter. ... The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single fact table with a compound primary key, with one segment for each dimension and with additional columns of additive, numeric facts. ...

References

  • William H. Inmon, Richard D. Hackathorn: Using the Data Warehouse, John Wiley & Son's, ISBN 0-471-05966-8
  • Pyle, Dorian. Business Modeling and Data Mining. Morgan Kaufmann, 2003. ISBN 1-55860-653-X
  • Ralph Kimball, Margy Ross: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), John Wiley & Sons, ISBN 0-471-20024-7
  • Stephen Haag, Maeve Cummings, Donald J. McCubbery, Alain Pinsonneault, Richard Donvan:Managements Information System for the Information Age, Third Canadian Edition, McGraw-Hill Ryerson, ISBN 0-07-095569-7
  • Data Warehousing Gotchas
  • ODP Resources on Data Warehouse
  • ODP Articles on Data Warehouse
  • Greenfield, Larry. DW InfoCenter
  • Manning, Ian Data Warehousing - What Is It. http://www.datawarehouse.com/article/?articleId=3204&searchTerm=dilemma
  • Agile Best Practices for Data Warehousing
  • Darmawikarta, Djoni (2007); Dimensional Data Warehousing with MySQL, Pub. BrainySoftware. ISBN 0975212826.

  Results from FactBites:
 
MySQL AB :: Data Warehouse & Berichtswesen (0 words)
Während noch vor wenigen Jahren die meisten Daten in zentralen Data-Warehouse- und wenigen abteilungsinternen Datamart-Systemen gespeichert wurden, registrieren wir heute den Einsatz kleinerer verteilter Datendepots überall in Unternehmen.
Jetzt Webseminar "MySQL for Data Warehouse" abrufen »
Und da MySQL sich durch seine hohe Benutzerfreundlichkeit auszeichnet, können sie dies mit nur wenigen oder gar keinen Data-Warehouse-Spezialisten und Datenbankadministratoren bewerkstelligen.
Data Warehousing Technology (3513 words)
Universal data access means that, theoretically at least, end-users, regardless of location or Information Access tool, should be able to access any or all of the data in the enterprise that is necessary for them to do their job.
Data Staging is also called copy management or replication management, but in fact, it includes all of the processes necessary to select, edit, summarize, combine and load data warehouse and information access data from operational and/or external databases.
The scope of a data warehouse may be as broad as all the informational data for the entire enterprise from the beginning of time, or it may be as narrow as a personal data warehouse for a single manager for a single year.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.