- Business Objects
- IBM Cognos
- SQL Server Analysis Services
- MicroStrategy
- Palo OLAP Server
Translate
Friday, February 8, 2013
Popular OLAP Tools
OLAP Tool Functionalities
Before we speak about OLAP tool selection criterion, we must first distinguish between the two types of OLAP tools, MOLAP (Multidimensional OLAP) and ROLAP (Relational OLAP).
1. MOLAP: In this type of OLAP, a cube is aggregated from the relational data source (data warehouse). When user generates a report request, the MOLAP tool can generate the create quickly because all data is already pre-aggregated within the cube.
2. ROLAP: In this type of OLAP, instead of pre-aggregating everything into a cube, the ROLAP engine essentially acts as a smart SQL generator. The ROLAP tool typically comes with a 'Designer' piece, where the data warehouse administrator can specify the relationship between the relational tables, as well as how dimensions, attributes, and hierarchies map to the underlying database tables.
Right now, there is a convergence between the traditional ROLAP and MOLAP vendors. ROLAP vendor recognize that users want their reports fast, so they are implementing MOLAP functionalities in their tools; MOLAP vendors recognize that many times it is necessary to drill down to the most detail level information, levels where the traditional cubes do not get to for performance and size reasons.
So what are the criteria for evaluating OLAP vendors? Here they are:
ETL PROCESS
The three-stage ETL process and the ETL tools implementing the concept might be a response for the needs described above.
The ‘ETL’ shortcut comes from 'Extract, transform, and load' – the words that describe the idea of the system. The ETL tools were created to improve and facilitate data warehousing.
The ‘ETL’ shortcut comes from 'Extract, transform, and load' – the words that describe the idea of the system. The ETL tools were created to improve and facilitate data warehousing.
- The Etl process consists of the following steps:
- Initiation
- Build reference data
- Extract from sources
- Validate
- Transform
- Load into stages tables
- Audit reports
- Publish
- Archive
- Clean up
Sometimes those steps are supervised and performed indirectly but its very time-consuming and may be not so accurate.
The purpose of using ETL Tools is to save the time and make the whole process more reliable.
The purpose of using ETL Tools is to save the time and make the whole process more reliable.
Data Warehousing tools
- Comercial ETL Tools:
- IBM Infosphere DataStage
- Informatica PowerCenter
- Oracle Warehouse Builder (OWB)
- Oracle Data Integrator (ODI)
- SAS ETL Studio
- Business Objects Data Integrator(BODI)
- Microsoft SQL Server Integration Services(SSIS)
- Ab Initio
- Freeware, open source ETL tools:
- Pentaho Data Integration (Kettle)
- Talend Integrator Suite
- CloverETL
- Jasper ETL
Saturday, January 26, 2013
Bottom-up design (Ralph Kimball)
In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note that in Kimball methodology, the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.
Data marts contain, primarily, dimensions and facts. Facts can contain either atomic data and, if necessary, summarized data. The single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[10] The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.
The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "Drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.
Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".
Some consider it an advantage of the Kimball method, that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including The Bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method gives itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the "Sales" department, by building a Sales-data mart. Upon completion of the Sales-data mart, the business might then decide to expand the warehousing activities into the, say, "Production department" resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrable, is that they share the same "Bus", that will be, that the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and that the individual data marts links that information from the bus. Note that this does not require 100% awareness from the onset of the data warehousing effort, no master plan is required upfront. The Sales-data mart is good as it is (assuming that the bus is complete) and the Production-data mart can be constructed virtually independent of the Sales-data mart (but not independent of the Bus).
If integration via the bus is achieved, the data warehouse, through its two data marts, will not only be able to deliver the specific information that the individual data marts are designed to do, in this example either "Sales" or "Production" information, but can deliver integrated Sales-Production information, which, often, is of critical business value.
Data Warehouse Environment
- Source systems that provide data to the warehouse or mart
- Data integration technology and processes that are needed to prepare the data for use
- Different architectures for storing data in an organization's data warehouse or data marts
- Different tools and applications for the variety of users
- Metadata, data quality, and governance processes must be in place to ensure that the warehouse or mart meets its purposes
Benefits of a data warehouse
- Maintain data history, even if the source transaction systems do not.
- Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger
- Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
- Provide a single common data model for all data of interest regardless of the data's source
- Restructure the data so that it makes sense to the business users.
- Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems
- Add value to operational business applications, notably customer relationship management (CRM) systems.
What is ETL
The typical ETL-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database
This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
What is Data Mart
Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.
What is a Data Warehouse
It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before they are used in the Data Warehouse for reporting.
The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before they are used in the Data Warehouse for reporting.
Subscribe to:
Posts (Atom)
