Developing & Implementing Data Warehouses with SQL Server 7.0

 

Overview of Data Warehousing

  • a data warehouse is a database containing data that usually represents the business history of an organization

  • must store many years of information while still answering queries quickly

  • data is organized to support analysis rather than to process real-time transactions

  • OLAP - provides a multidimensional presentation of data warehouse data

  • data mining - finding patterns among the information

  • data mart - smaller version of a data warehouse

  • when designing a data warehouse, use either:

    1. top down approach - design the big warehouse first, then divide it into smaller marts

    2. bottom up approach - design the smaller units first, then integrate them into one big warehouse

Characteristics of data

  • data is stored centrally

  • data is converted to a common format for consistency

  • includes only subject-oriented key information

  • historical

  • read-only; fast query and retrieval is key, so an appropriate level of summarization is needed

  • fact table contains only numbers and keys that reference textual data in the dimension tables

  • dimension table contains descriptions of the data in the fact table and provides a hierarchical structure

Warehousing Processes

  1. retrieve data from operational systems such as an RDBMS, tape, or external data

  2. transform the data to the appropriate format (see the sketch after this list)

  3. populate the warehouse

  4. users can then query the data with tools such as an EIS (executive information system)
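
A minimal T-SQL sketch of steps 2 and 3, assuming a hypothetical staging table and star-schema tables (all table and column names here are invented for illustration):

    -- transform staging data to the warehouse format and populate the fact table
    INSERT INTO FactSales (TimeKey, StoreKey, SalesAmount, UnitsSold)
    SELECT t.TimeKey,
           s.StoreKey,
           CONVERT(money, stg.sales_amt),   -- normalize to a consistent data type
           CONVERT(int, stg.units)
    FROM StagingSales stg
    JOIN DimTime  t ON t.TheDate   = stg.sale_date
    JOIN DimStore s ON s.StoreCode = stg.store_code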

 

Design Considerations

  • human factors: use the default character set plus Unicode data types to support all languages, or use the same code page for client and server unless they use only the first 128 characters

  • filegroups: use filegroups to spread the database across drives to improve performance

  • fault tolerance: disk striping with parity, or disk mirroring

Business Analysis Process

  • identify objectives (e.g., the need to know which factors led to a drop in sales)

  • gather and analyze information; find out from workers what sort of information is needed for the data warehouse

  • identify only one segment at first, to build a data mart

  • sketch a conceptual data model by hand

  • identify data sources that have integrity

  • determine how long data is kept in the warehouse; two years is a common recommendation (different data marts may keep different durations)

 

Schemas

Star schema

  • most popular

  • has one central fact table

  • each dimension table has a one-to-many relationship to the fact table

  • dimension tables can be updated, while the fact table is more long-term

  • the fact table uses a multipart key and can contain pre-calculated summary values (see the DDL sketch after this list)

  • a larger fact table can be faster because it contains more pre-calculated values, at the expense of higher storage and maintenance costs
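
A minimal DDL sketch of a star schema, with one central fact table whose multipart key is made of foreign keys into the dimension tables (all names are invented for illustration):

    CREATE TABLE DimTime (
        TimeKey  int      NOT NULL PRIMARY KEY,
        TheDate  datetime NOT NULL,
        TheMonth int      NOT NULL,
        TheYear  int      NOT NULL
    )

    CREATE TABLE DimStore (
        StoreKey  int         NOT NULL PRIMARY KEY,
        StoreCode char(10)    NOT NULL,
        City      varchar(50) NOT NULL,
        Country   varchar(50) NOT NULL
    )

    -- the fact table holds only keys and numbers; text lives in the dimensions
    CREATE TABLE FactSales (
        TimeKey     int   NOT NULL REFERENCES DimTime (TimeKey),
        StoreKey    int   NOT NULL REFERENCES DimStore (StoreKey),
        SalesAmount money NOT NULL,
        UnitsSold   int   NOT NULL,
        PRIMARY KEY (TimeKey, StoreKey)
    )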

Snowflake schema

  • a variation of star: more levels of dimension tables, more normalized, lower row counts, more complex, more joins

  • once at the lowest level of detail, you cannot drill down further

System Requirements

Computer: DEC Alpha AXP or x86

Memory: 32 megabytes (MB) minimum (64 MB recommended)

Hard disk space: 35 - 50 MB (85 MB including common files and samples)

Server operating system: Microsoft Windows NT® Server 4.0 with Service Pack 4 or later, or Windows NT Workstation 4.0 with Service Pack 4

Client operating system: Microsoft Windows® 95 + DCOM95, Windows 95 OSR2 + DCOM95, or Windows 98; Internet Explorer is required for Microsoft Management Console (MMC) and HTML Help

Installation Options

  • OLAP server

  • OLAP Manager

  • Client Components (client components and other related files) - required to run the OLAP Manager and the OLAP server

  • Sample applications

Installation Parameters

  • -r causes Setup.exe to automatically generate a silent response file (.iss)

  • -s performs a silent unattended installation using the response file created with -r

  • -f1<path\ResponseFile> defines alternate location and name of the response file (.iss file)

  • -f2<path\LogFile> defines alternate location and name of the log file
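
For example, an unattended installation might be recorded and then replayed like this (paths are illustrative):

    rem record answers into a response file (.iss)
    Setup.exe -r

    rem replay the recorded answers silently, naming the response and log files
    Setup.exe -s -f1C:\Setup\olap.iss -f2C:\Setup\olap.log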

OLAP Manager

  • console application that provides a robust user interface for accessing OLAP servers and the metadata repositories that define multidimensional database structures

Key functions of OLAP Manager

  • Define databases and data sources

  • Build and process cubes

  • Specify storage options and optimize query performance

  • Manage server security

  • Browse data sources, shared dimensions, and security roles

OLAP Cube Architecture

  • multidimensional representation of detail and summary data

  • consists of a data source, dimensions, measures, and partitions

  • data source identifies and connects to the database containing the data warehouse data that is the source of data for the cube

  • dimensions map data warehouse dimension table information into a hierarchy of levels, and can be created for use in an individual cube or multiple cubes

  • virtual dimension maps the properties of members of another dimension into a dimension that can then be used in cubes

  • measures identify the numerical values from the fact table that are summarized for analysis; displayed in rows and columns, they form the core of the cube information presented to users

  • fact table columns can be additive or nonadditive, and OLAP Services can use both types as measures

  • partitions are the multidimensional storage containers that hold cube data

  • user-defined partitions are available only in SQL Server™ OLAP Services Enterprise Edition

  • design is based on the analytical requirements of users

  • a virtual cube is a logical view of portions of one or more cubes; it can be used to join relatively dissimilar cubes that share a common dimension

  • cubes require substantial storage to contain the data and precalculated summary information in multidimensional structures - aggregations are precalculated summaries of data that provide the mechanism for rapid query response

  • precalculation of all possible aggregations in a cube results in the fastest possible response time for all queries at the expense of storage and processing time. If no aggregations are precalculated (0%), little storage space is required beyond that necessary to store the base data

  • another factor that affects storage requirements is sparsity, the proportion of empty cells in a cube. SQL Server uses the following techniques to minimize storage requirements: storage is not allocated for empty cells, data is compressed, and a sophisticated algorithm designs efficient summary aggregations

Cube Storage Modes

MOLAP storage

  • multidimensional structure to contain aggregations and a copy of the base data

  • provides the most rapid query response times

  • most appropriate for frequently used cubes and for rapid query response

ROLAP storage

  • uses tables in the data warehouse relational database to store a cube’s aggregations

  • does not store a copy of the base data

  • query response is generally slower

  • typically used for large data sets that are infrequently queried, such as historical data from earlier years

  • aggregations cannot be created for a cube with ROLAP storage if the data source is OLAP Services

HOLAP storage

  • combines attributes of both MOLAP and ROLAP in that aggregation data is stored in MOLAP structures, while base data is left in the data warehouse’s relational database

  • generally suitable for cubes that require rapid query response for summaries based on a large amount of base data

Partitions

  • cubes can be divided into partitions, each of which can be stored using a different mode

  • partitions are invisible to users and can be stored on different servers, providing a clustered approach to cube storage

  • OLAP Services provides a Partition wizard to assist in creating partitions

  • a cube may return incorrect results for some queries if a portion of the cube's data is included in more than one of its partitions

  • partitions of a cube can be merged

Processing Cube

  • process - a complete load of the cube, i.e. all dimension and fact table data is read and all specified aggregations are calculated

  • changes in the data warehouse schema that affect the structure of cubes require those cubes to have their structure changed and then be processed; changes in (or additions to) data in the data warehouse do not require cubes to be completely processed

Incrementally updating a cube

  • appropriate when new data is to be added to a cube

  • does not require that the cube be processed.

  • does not affect the existing data that has already been processed

  • can be performed while users continue to query the cube

  • after update, users have access to the additional data without having to disconnect and reconnect

Refreshing a cube’s data

  • causes a cube’s data to be cleared and reloaded and its aggregations recalculated

  • appropriate when the underlying data in the data warehouse has changed but the cube’s structure remains the same

  • faster, as aggregation tables do not have to be redesigned

  • can be performed while users continue to query the cube and users will have access to the updated data without having to disconnect and reconnect

Fact table and partition arrangement

  • when different fact tables are used for a cube's partitions, all of those fact tables and dimensions must have the same structure as the cube's fact table and dimensions

  • you should ensure that no data is duplicated among the fact tables by using filters, although duplicated data is technically allowed

  • when the same fact table is used for multiple partitions, data items should not be used in more than one partition - use filters for all partitions in a cube to extract mutually exclusive data sets from the fact table

Merging partitions

  • partitions can be merged only if they have the same structure, they are stored in the same mode (MOLAP, HOLAP, or ROLAP), and contain identical aggregation designs

  • to ensure that partitions have the same aggregation design, copy the aggregation design from another of the cube's partitions when creating a partition in the Partition wizard

  • fact tables are not merged automatically when you merge partitions. You must do it manually

  • filters of both partitions are ORed together to create a filter for the resulting partition which specifies the set of facts used in the resulting partition

  • when merging partitions that use different fact tables, the resulting partition will refer only to the target partition’s fact table. Facts from the source partition’s fact table must be merged manually

  • a merged MOLAP (multidimensional OLAP) partition with an incomplete fact table contains an internally merged copy of the fact table data and will operate correctly until it is processed; a HOLAP (hybrid OLAP) or ROLAP (relational OLAP) partition with an incomplete fact table contains accurate aggregations but incomplete facts, which leads to incorrect returned data

  • the absence of the unavailable facts might not be noticed unless a user attempts to drill down to a fact in the unavailable table, or executes a query that requires a fact from the unavailable table

  • when merging partitions that were created by specifying data slices in the Partition wizard, the merged partition can contain unexpected incorrect data unless you create a filter that specifies the data in the resultant partition.

  • a data slice is specified when you create a partition using the Partition wizard. The wizard attempts to create a filter on the fact table to specify the data to be included in the partition if the level’s MemberKeyColumn and MemberNameColumn properties point to the same column. This is the default if the dimension or cube has not been edited

Dimension

Dimension tables

  • contain the details that describe the fact table

  • have fewer rows than the fact table

  • columns are mainly character types

  • conventional dimensions: e.g., a time dimension (month, date, ...) or a location dimension based on country, city, etc.

  • shared dimensions - data marts sharing each other's dimension tables

  • degenerate dimensions: represent a business event but not a numeric fact; have no association with any dimension table

  • junk dimensions: attributes not related to the business objective, but still important

  • the fact table is much larger than the dimension tables, so estimate the required size based on the fact table

Defining Dimensions

  • each column in a dimension contributes a level to the dimension; levels are ordered by specificity and organized in a hierarchy that provides logical avenues for drill-down

  • each level contains members that are the values within the column that defines the level

  • in tabular browsers, members provide the column headings, row headings, and subheadings by which measures are separated and displayed to cube users

  • in graphical browsers, members provide other types of descriptive labels that serve the same function as in tabular browsers

  • each dimension table’s primary key must join to a foreign key in a cube’s fact table or another dimension table, although key columns are not required in the dimension definition

  • multiple-hierarchy dimensions provide similar yet alternate views of cube data; they are defined as two or more dimensions whose names share the same prefix followed by a period, with different suffixes

Rebuilding Structure

  • re-creates and loads the dimension

  • required after the structure of the dimension is changed, or relationships between members in the dimension hierarchy are changed

  • when a shared dimension is processed with the rebuild the dimension structure option, all cubes that incorporate the shared dimension immediately become unavailable to users and must be processed before they can be used again

  • when a shared dimension’s structure is edited and saved but not processed, it will automatically be processed when any cube incorporating the dimension is processed. Any other cubes that incorporate the dimension immediately become unavailable to users and must be processed before they can be used again

  • you can incrementally update a dimension, which updates a dimension when new members have been added but no structural changes have been made

  • cube that incorporates a shared dimension remains available to users while the dimension is incrementally updated

  • added dimension members are available in the cube after the update is complete

Calculated Members

  • dimension member whose value is calculated at run time using an expression that you specify when you define the calculated member

  • enables you to add members and measures to a cube without increasing its size

  • only the definitions for calculated members are stored, and values are calculated in memory
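
For example, a calculated member can be defined at query time in MDX; the cube, measure, and dimension names below are illustrative, in the style of the FoodMart sample that ships with OLAP Services:

    -- Profit is computed in memory at run time; only its definition is stored
    WITH MEMBER [Measures].[Profit]
         AS '[Measures].[Store Sales] - [Measures].[Store Cost]'
    SELECT { [Measures].[Store Sales], [Measures].[Profit] } ON COLUMNS,
           [Store].[Store Country].MEMBERS ON ROWS
    FROM [Sales]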

 

Virtual Cubes

  • a combination of multiple cubes in one logical cube

  • can also be based on a single cube to expose only selected subsets of its measures and dimensions

  • stores only its definition, not the data of its component cubes

  • requires virtually no physical storage space

  • provides security function by limiting some users’ view of the underlying cubes

  • after creation, you must process it in order to establish the internal links to the specified dimensions and measures in its underlying cube or cubes

Member Properties

  • attribute of a dimension member

  • can be used in queries and thus provide end users with more options when analyzing cube data

  • acts as the basis of virtual dimensions.

  • values for member properties must be read from a column in the same dimension table as the associated members

  • created in the Dimension editor by associating the column that contains values for the member property with the level that contains the members

 

Virtual Dimension

  • logical dimension based on a property of a level in a physical dimension

  • enables end users to analyze cube data based on the member properties of dimension level members in a cube

  • can be added to a cube only if the dimension that supplies its member property is also included in the cube

  • does not increase the cube’s size because, unlike an ordinary dimension, aggregation data is not stored in the cube

  • does not affect cube processing time because it is calculated in memory when needed

Write-Enabling

  • client applications can record changes to the cube’s data

  • allows end users to explore scenarios by changing cell values and analyzing the effects of the changes on cube data

  • an end user’s change is stored in the write-back table as a difference from the currently displayed value; the original value in the cube is preserved, and an audit trail of changes is recorded in the write-back table

  • changes can only be made to cells at the lowest level - cells that do not contain aggregated information

  • the separate write-back table can be converted to a partition, permanently incorporating the changes into the cube, or discarded, which returns the cube to its original state

  • end user is permitted to record changes in a cube’s write-back table only if the user belongs to a role with read/write permissions assigned to the cube

Data and Structural Changes

  • data additions can be managed by carefully defining partition filters, and by designing a strategy to synchronize OLAP and data warehouse data

  • changes to correct errors in a data warehouse can be minimized by applying care during the data transformation, validation, and scrubbing operations

  • changes to correct errors in basic data should be incorporated in the source database and then migrated to the data warehouse in a controlled manner

  • changes to data in the fact table can affect the accuracy of queries to a cube until the cube is processed. The refresh data processing method can be used to reload the cube’s data and recalculate the aggregations

Visibility to clients during changes

  • when a cube that is currently online is processed, it remains online until the processing has been completed, at which time the online cube is replaced by the new cube version

  • when a cube is processed using the full process method, online clients will be disconnected from the cube when the switch is made to the new version of the cube, and the clients must individually reconnect to access the new version

  • when a cube is processed using either the incremental update or the refresh data method, online clients will not be disconnected from the cube when the processing completes. The new version of the cube will be immediately visible with no break in service

  • if a shared dimension is processed using the rebuild the dimension structure method, all cubes that use the dimension will immediately become unavailable to clients and must be processed before they can be used again

  • if a shared dimension is processed using the incremental update method, cubes that use the dimension remain available to clients and any new members added to the dimension automatically become available to clients when the dimension processing is complete

  • dimension hierarchies can be affected by changes to data in the data warehouse dimension tables, because a hierarchy is based on relationships between members in a dimension table; in this case, the dimension must be rebuilt using the rebuild the dimension structure processing method

  • structure of OLAP cubes and dimensions can be affected by changes to the design of the data warehouse such as the addition, deletion, or alteration of tables, or relationships between tables. In such cases, you must modify the design of affected cubes and dimensions, redefine partitions and aggregations, and completely process the modified cubes and dimensions

  • for each OLAP server, a repository is created to store metadata for multidimensional objects such as cubes, dimensions, and so on. This is a Microsoft Access (.mdb) database at \Program Files\OLAP Services\Bin\msmdrep.mdb. You can use the Migrate Repository wizard to migrate this repository to a SQL Server (.mdf) database, but the process cannot be undone

Security

  • security is controlled via access control rights that are managed using roles

  • enabled only if the server is installed on the Windows NT NTFS file system, which enforces access control lists

  • three levels of access control

    1. Read - supported functionality includes browsing of data and data structures (metadata); does not allow modifying or processing data

    2. Read/Write - all read access functionality, capable of modifying data in cubes designated and enabled for write-back

    3. Admin - by default, the user account used to install the server on a particular computer has Admin privileges on that computer. Only members of this group can start the OLAP Manager user interface and use administrative functions, including security management

Changing a User’s Access Rights

  • the time that elapses between a change to a user’s access rights in OLAP Services and the actual effect of the change depends on the value of the Auto Synch Period initialization property

  • if the Auto Synch Period property is set to null or 0 (zero), synchronization does not occur at a constant interval but occurs as a result of users’ actions

  • if the Auto Synch Period property is set to a non-null, non-zero value, users’ logon names and authorizations are compared to their access rights defined in OLAP Services at the specified interval, so changes to a user’s access rights that occurred since the last synchronization then take effect
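
Auto Synch Period is a PivotTable Service initialization property supplied on the client connection string; a hypothetical example (server and database names are illustrative, value in milliseconds):

    Provider=MSOLAP;Data Source=OlapBox;Initial Catalog=FoodMart;Auto Synch Period=10000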

Roles

  • map Microsoft® Windows NT® user accounts and user groups to security category assignments for cubes

  • created and managed at the database level via the Manage Roles dialog box

  • service name for OLAP Services is MSSQLServerOLAPService. Logon account associated with this service must have permissions to access data sources that OLAP Services administrators can access through the OLAP Manager

Building Custom Applications

  • custom applications can be created to manage the OLAP server; create and maintain OLAP objects such as cubes, dimensions, and security roles; extend the user interface; and connect to the OLAP server, query data in cubes, and create local cubes

  • OLAP Services recognizes both the SQL dialect and the multidimensional expressions (MDX) dialect

  • MDX - multidimensional expressions for manipulating multidimensional data; MDX functions are supported in the definitions of calculated members, and in a full language implementation for building local cubes and querying cube data using PivotTable® Service with OLE DB and Microsoft ActiveX® Data Objects (ADO)

  • the Cube editor's Calculated Member Builder can help with developing MDX

  • Decision Support Objects (DSO) enable you to create applications that enhance, augment, and automate your OLAP installation

  • Add-in Programs - you can create and register add-in programs that will be called by the OLAP Add-In Manager in response to user activity in the OLAP Manager user interface. They can optionally use Decision Support Objects (DSO) to manage server objects, and multiple add-in programs can be registered

Nature of PivotTable Service

  • OLE DB provider that supports the optional OLE DB for OLAP extensions introduced in OLE DB 2.0 and ADO 2.0

  • in-process

  • designed to provide online and offline data analysis and online access to OLAP data

  • functions as a client of OLAP Services

  • can work with only a single local cube partition

Functions of PivotTable Service

  • connects to OLAP Services as a client component

  • functions as a service provider for relational data sources to implement multidimensional functionality

  • exposes the OLE DB interfaces with OLAP extensions

  • functions as a tabular data provider by supporting a subset of SQL

  • functions as a multidimensional data provider by supporting MDX

  • enables client applications to create local cubes directly from a relational data source

  • functions as a mobile desktop OLAP client that enables users to download from data sources and store the data in a multidimensional structure on a local computer for offline analysis

Performance Tuning

  • database engine is largely self-tuning – you should let SQL Server take care of most of the tuning work

  • increase the amount of RAM - access to data in RAM cache is much faster than access from disk

  • create and maintain good indexes - please refer to the SQL Server 7 Database Design Cramsession regarding Clustered and Non-clustered indexes, as well as the purpose of Fill Factor. Keep in mind that, because a clustered index physically orders the table data, clustered indexes are much better than non-clustered indexes for queries that match columns or search for ranges of columns that are mostly non-unique

  • Monitor disk I/O subsystem performance - ensure the database server is running without disk queuing

  • Application and Query Tuning - use SQL Server Profiler to monitor and log a SQL Server's workload, then submit the logged workload to the SQL Server Index Tuning Wizard to make appropriate index changes

  • SQL Server Performance Monitor - detects bottlenecks with the revised set of Performance Monitor objects and counters

  • SQL Server Query Analyzer and Graphical ShowPlan - visually analyze problematic SQL queries and statistics I/O

  • Max Async I/O - with RAID (Redundant Array of Inexpensive Disks) attached to a database server that is capable of very high disk I/O transfer rates, max async I/O should be set higher. The goal is to make Checkpoint fast enough to finish before another checkpoint is needed. Command to use is in SQL Server Query Analyzer: "sp_configure 'max async io', <value>"

  • Worker Threads - the total number of these threads available to service all incoming command batches is dictated by the setting for the sp_configure option max worker threads. If not enough threads are available, thread sharing will occur, which negatively affects performance

  • LazyWriter - aims to produce free buffers during periods of low disk I/O so that disk I/O resources are readily available for use. If LazyWriter has problems keeping the free buffer count steady, or at least above zero, the disk subsystem is likely unable to provide the disk I/O performance LazyWriter needs; add more physical disk drives in that case. You can also adjust LazyWriter's disk I/O request behavior with max async I/O

  • Checkpoint – writes dirty pages out to the SQL Server data files in order to even out SQL Server disk I/O activity over a longer time period. You may adjust Checkpoint's dirty page flushing behavior with max async I/O. For example, if disk queuing occurs at unacceptable levels, decrease max async I/O and add more disks to the disk subsystem

  • DBCC SHOWCONTIG can be used to reveal whether excessive page splitting has occurred on a table - Scan Density should be as close to 100 percent as possible; if it is not, rebuild the clustered index on that table using the DROP_EXISTING option to defragment the table (see the sketch after this list)

  • FILLFACTOR on the CREATE INDEX and DBCC DBREINDEX commands lets you specify how full to make each index and data page, leaving the remainder as open space. Note that the PAD_INDEX option applies FILLFACTOR to the non-leaf-level index pages. The optimal FILLFACTOR value depends upon how much new data will be inserted within a given time frame into an 8-KB index or data page

  • reads tend to outnumber writes in an OLAP system - FILLFACTOR should be set at 100 percent so that all index and data pages are filled completely for maximum I/O performance
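
A sketch of those commands, assuming an invented fact table name (note that in SQL Server 7.0, DBCC SHOWCONTIG takes an object ID rather than a table name):

    -- check fragmentation: Scan Density should be close to 100 percent
    DECLARE @id int
    SELECT @id = OBJECT_ID('FactSales')
    DBCC SHOWCONTIG (@id)

    -- rebuild the clustered index in place to defragment the table;
    -- FILLFACTOR = 100 fills pages completely, which suits read-mostly OLAP tables
    CREATE CLUSTERED INDEX IX_FactSales
    ON FactSales (TimeKey, StoreKey)
    WITH FILLFACTOR = 100, DROP_EXISTING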

Other Issues

  • Linked servers allow SQL Server to execute commands against OLAP Services using the OLE DB provider for OLAP

  • When setting up a linked server to an OLAP server, keep in mind that (a sketch follows this list):

    1. If login is made to SQL Server using a SQL Server security login name and password, set up the SQL Server services to run under either a local or domain user account rather than the SYSTEM account. If login is made to SQL Server using Windows NT authentication, SQL Server passes the credentials of that Windows NT account to OLAP Services

    2. Either the SQL Server services startup account or the Windows NT account should have access to OLAP Services. Make this account part of the OLAP Administrators local group on the OLAP server computer, or create a database role within the OLAP Manager and provide this role with read/write access to the cubes
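
A sketch of such a linked-server definition using the OLE DB provider for OLAP Services (MSOLAP); the server, data source, and catalog names are invented for illustration:

    EXEC sp_addlinkedserver
        @server     = 'OLAP_SRV',
        @srvproduct = '',
        @provider   = 'MSOLAP',
        @datasrc    = 'OlapBox',     -- OLAP server machine name
        @catalog    = 'FoodMart'     -- OLAP database

    -- an MDX query through the linked server returns a "flattened rowset"
    SELECT * FROM OPENQUERY(OLAP_SRV,
        'SELECT {[Measures].MEMBERS} ON COLUMNS FROM [Sales]')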

  • OLAP Services internally uses the NT computer name for licensing information, so services may fail to start after you rename the Windows NT computer. To recover, either:

    1. Re-run the SQL Server OLAP Services setup so that it uses the latest Windows NT computer name. Remember to reinstall any service pack that was in use

    2. Change the Windows NT computer name back to the original computer name

  • When using OLAP Services to pull data from SQL Server using trusted or Windows NT authentication security, the MSSQLServerOLAPService service must be configured to run under a domain or local user account; otherwise, OLAP may fail to process dimensions or cubes. By default MSSQLServerOLAPService service runs under a local system account

  • Apart from being a member of the OLAP Administrators group, the MSSQLServerOLAPService account must have the appropriate permissions on SQL Server: permissions required will vary depending on the type of storage structure selected

    1. When using MOLAP storage, the MSSQLServerOLAPService account must have at least SELECT permissions on source database.

    2. If you use ROLAP or HOLAP storage, the MSSQLServerOLAPService account must have at least SELECT and CREATE TABLE permissions on source database

  • SQL Server 7.0 can perform queries against OLE DB providers by using the OPENQUERY or OPENROWSET Transact-SQL functions, or by using a query with four-part names that include a linked-server name. The query is limited to the abbreviated SELECT syntax supported by OLAP Services, but can include MDX syntax to return "flattened rowsets"

  • When a client application connects to SQL Server and performance is poor, check how to optimize the query and schema for better performance - especially the star/snowflake schema, including schema design, indexing, key relationships, referential integrity, and insufficient data scrubbing

  • One possible way to optimize the schema: before processing an OLAP Services cube, declare a primary key in each dimension table. Then (a DDL sketch follows this list):

    1. For star dimensions, declare foreign key (FK) relationships between each dimension table and the corresponding fact table.

    2. For snowflaked dimensions, you also declare FK relationships between each secondary dimension table and the primary dimension table that it augments.

    3. Define indexes on each of the primary keys in the dimension tables and in the fact tables, and also on each of the foreign keys in the fact table(s)

    4. Perform cleanup: remove all NULLs for data items that are being moved into OLAP Services
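
A sketch of steps 1 and 3 for a simple star schema (table and column names invented; snowflaked secondary dimension tables would get analogous foreign keys per step 2):

    -- primary keys on the dimension tables, declared before processing the cube
    ALTER TABLE DimTime  ADD CONSTRAINT PK_DimTime  PRIMARY KEY (TimeKey)
    ALTER TABLE DimStore ADD CONSTRAINT PK_DimStore PRIMARY KEY (StoreKey)

    -- step 1: foreign keys from the fact table to each dimension table
    ALTER TABLE FactSales ADD CONSTRAINT FK_FactSales_Time
        FOREIGN KEY (TimeKey) REFERENCES DimTime (TimeKey)
    ALTER TABLE FactSales ADD CONSTRAINT FK_FactSales_Store
        FOREIGN KEY (StoreKey) REFERENCES DimStore (StoreKey)

    -- step 3: index the fact table's foreign keys (the primary key constraints
    -- above create indexes on the dimension tables automatically)
    CREATE INDEX IX_FactSales_Time  ON FactSales (TimeKey)
    CREATE INDEX IX_FactSales_Store ON FactSales (StoreKey)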



 