Translate

Data warehousing Architecture

Saturday, January 26, 2013

Bottom-up design (Ralph Kimball)


In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note that in Kimball methodology, the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.
Data marts contain, primarily, dimensions and facts. Facts can contain either atomic data and, if necessary, summarized data. The single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[10] The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.
The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "Drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.
Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".
Some consider it an advantage of the Kimball method, that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including The Bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method gives itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the "Sales" department, by building a Sales-data mart. Upon completion of the Sales-data mart, the business might then decide to expand the warehousing activities into the, say, "Production department" resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrable, is that they share the same "Bus", that will be, that the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and that the individual data marts links that information from the bus. Note that this does not require 100% awareness from the onset of the data warehousing effort, no master plan is required upfront. The Sales-data mart is good as it is (assuming that the bus is complete) and the Production-data mart can be constructed virtually independent of the Sales-data mart (but not independent of the Bus).
If integration via the bus is achieved, the data warehouse, through its two data marts, will not only be able to deliver the specific information that the individual data marts are designed to do, in this example either "Sales" or "Production" information, but can deliver integrated Sales-Production information, which, often, is of critical business value.

Meta data

Metadata are data about data

Data Warehouse Environment


  • Source systems that provide data to the warehouse or mart
  • Data integration technology and processes that are needed to prepare the data for use
  • Different architectures for storing data in an organization's data warehouse or data marts
  • Different tools and applications for the variety of users
  • Metadata, data quality, and governance processes must be in place to ensure that the warehouse or mart meets its purposes

Benefits of a data warehouse


  • Maintain data history, even if the source transaction systems do not.
  • Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger
  • Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
  • Provide a single common data model for all data of interest regardless of the data's source
  • Restructure the data so that it makes sense to the business users.
  • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems
  • Add value to operational business applications, notably customer relationship management (CRM) systems.

What is ETL

The typical ETL-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data miningonline analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

What is Data Mart

Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.

What is a Data Warehouse

It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before they are used in the Data Warehouse for reporting.

Sign up for PayPal and start accepting credit card payments instantly.