70-229-Designing and Implementing Databases with Microsoft
SQL Server 2000 Enterprise Edition
Before you start
This study guide provides you with information on the many
different aspects of "SQL Server 2000 Database Design". Before you
proceed with this subject, please read through the study material
for the following and make sure you are 100% comfortable with the
SQL Server architecture
The reason you need to be familiar with SQL 7 is that the new
version has many things similar to the version 7. Especially for the
database design concepts, they are about the same.
Version 7 and Version 2000 have too many things in common. In
fact, experience on 70-028 and 70-029 will be extremely helpful.
This study guide will have its focus on the new features of SQL
Server 2000.
Do NOT rely solely on this study notes for the exam. By all means
read more than one book on the subject and make sure you understand
the material well enough so that you could be ready for the
questions. There is no quick way to succeed for this topic. Ideally
you must work things out and gain experience before even trying to
sign up for the exam.
Your Study Track for SQL Server 2000 DB Design
Make sure you are comfortable with the concept of
Relational Database System. SQL Server is a relational database
system. Also know about SQL Structured Query Language, which is the
language you will use to control many of the database operations.
Make sure you know the basic concept of the Client Server
Computing Model. The client presents and manipulates data on the
desktop computer, while the server runs at the back end to store and
retrieve protected data. SQL Server mostly works in a multi tier set
up, but not as the front end though.
As of this writing many of the SQL Server 2000 resources
are still in BETA testing stage. Due to the lack of SQL 2000
material in the market, you may want to start with those of SQL 7
first. Version 7 and Version 2000 have too many things in common.
This study guide will have its focus on the new features of version
2000 of SQL Server. You should visit the following pages to get
yourself familiar with the version 7 fundamentals before proceeding
Information for Version 7 of SQL Server DB Design should have
covered the following identical information:
-
Developing a Logical Data Model
-
Create and alter databases
-
Create and alter database objects
-
Alter database objects to support replication
-
Troubleshoot failed object creation
-
Retrieving and Modifying Data
-
Programming Business Logic
-
Manage data manipulation
-
Tuning and Optimizing Data Access
-
Designing a Database Security Plan
-
Create and manage application roles
Know XML, as SQL Server 2000 has integrated into it the
support of XML. Short for Extensible Markup Language, XML is a
specification developed by the W3C. Being the pared-down version of
SGML, it is designed especially to allow designers to create their
own customized tags, enabling the definition, transmission,
validation, and interpretation of data between applications and
between organizations.
Data Types
SQL Server 2000 introduces the 64-bit integer (bigint), variant
(sql_variant), and table data types. The bigint data type is an
integer that supports data from -9,223,372,036,854,775,808 through
9,223,372,036,854,775,807 with a storage size of 8 bytes. Note that
functions will return bigint only if the parameter expression is a
bigint data type, and that SQL Server will not automatically promote
other integer data types to bigint. Sql_variant data type does not
store values of text, ntext, image, timestamp, and sql_ variant. It
allows a single column to store data values of different types.
To support the new bigint data type, we have two new built-in
functions that return integer value of type bigint:
BIG_COUNT
ROWCOUNT_ BIG
To work with a sql_variant data type, you should always cast the
variant with the CAST operator. For example:
SET @xyz_chr = CAST(@xyz_var AS VARCHAR(12))
SQL_VARIANT_PROPERTY is a function for returning the base data
type and other information about a sql_variant instance. The
syntax:
SQL_VARIANT_PROPERTY(expression, property)
The property parameter contains the name of the sql_variant
property that can take a value of BaseType, Precision, Scale,
TotalBytes, Collation, or MaxLength.
An extended property has a name and a value being as a
sql_variant with a size of max 7500 bytes. For operations related to
the adding, updating, and dropping extended properties, the
following SP can be used:
-
o sp_ setextendedproperty
-
o sp_dropextendedproperty
-
o sp_updateextendedproperty
In addition, you can use the user-defined function
fn_listextendedproperty to retrieve the extended properties if
needed.
Table data type is a local variable for temporarily storing a
rowset. It can be used in place of temporary tables of the tempdb
database. Keep in mind that table data type is managed in memory
only, it provides performance benefits over other disks based
alternatives. You define such a data type like this:
DECLARE @local_variable TABLE <table_definition>
Together with a user defined function that returns a table data
type, you can declare an internal table variable and return that
variable as the return value. We refer this as rowset functions.
This is an attractive alternative to views, given the fact that
views are limited to a single select statement, while user-defined
functions can contain multiple statements in views, as they can be
included in the FROM clause of a SQL statement.
User-defined Functions and Triggers
User-defined functions are similar to stored procedures. They
accept zero or more input parameters, and can return a scalar data
type such as int, decimal, varchar, or sql_variant as well as the
table data type. However, you must specify the RETURNS value and to
terminate with the RETURN statement.
To have a user-defined function returns a scalar data type, you
need to define the return type and specify the value in the return
statement:
CREATE FUNCTION dave.MyProductsTable ()
RETURNS TABLE
AS
RETURN SELECT TOP 5
ProductName AS
MyProducts, UnitPrice
FROM Products
ORDER BY Products.UnitPrice
When the return type is a local table variable, this allows you
to use the local table variable for inserts and other operations
prior to the return.
A special type of table exists that returns a user-defined
function (in-line function). It has a return type of table without
the table definition. It actually returns the resultset of a single
select statement from which the table definition is derived.
One reason of using function over stored procedure is that it
allows you to include the function in the select statement:
SELECT Products.*, Manu.*
FROM MyProductsTable() AS MyProducts
INNER JOIN …
You need to understand the determinism of a function, as all
functions are either deterministic or nondeterministic.
Deterministic functions always return the same result for a given
set of parameters, while nondeterministic functions do not.
SCOPE_IDENTITY is a function to be used for the @@IDENTITY global
variable to return the last value inserted into an identity column
that has the same scope. IDENT_CURRENT is a function to be used for
the @@IDENTITY global variable to return the last identity value
generated for a specified table in any session and any scope.
Triggers are special stored procedures for executing instructions
automatically when there is an update, insert, or delete statement
raised against a table. You can find two new trigger types in SQL
Server 2000, the AFTER trigger and the INSTEAD OF trigger. AFTER is
for table use only. For an AFTER trigger to fire, the statement must
complete without an error. You may specify multiple AFTER triggers
for each triggering action, and you may specify the order of trigger
execution using sp_settriggerorder. Note that all other AFTER
triggers will fire in undefined order. In contrast, INSTEAD OF
triggers can be specified on both tables and views.
Integrity Constraints
There are many different types of integrity. Domain Integrity
refers to the requirement that data values in each column must be
valid. To enforce, use Primary and Foreign Keys, Default definitions
and Check constraints. With Entity Integrity, each row in a given
table must be unique in the entity. To enforce, you use primary key.
With Referential Integrity, relationships between tables are
maintained through Primary Key and Foreign Key. In SQL Server 2000,
this referential integrity concept has been extended to support
cascading delete and update operations.
With the NO ACTION constraint, you can prevent data on a
referenced table from being deleted or updated if corresponding
records exist on the referring table. You can also prevent data on
the referring table from being inserted or updated if no
corresponding entry exists on the referenced table.
CONSTRAINT FK_myOrder_Details FOREIGN KEY
(OrderID) REFERENCES
dbo.myOrders(OrderID)
ON UPDATE NO ACTION
ON DELETE NO ACTION,
You may also allow update and delete operations to be
cascaded from the referenced table to the referred table. For
example, if you change an exam code, it may make more sense to
cascade this new code to all the related exam detail records. You do
this with:
ON UPDATE CASCADE
Performance
There are mainly two kinds of database systems: OLTP and DSS.
With OLTP you want to make sure update is fast and table structure
is optimized. You tend to normalize the tables to avoid data
redundancy. We classify data as redundant when data is irrelevant;
data has duplicates; or data is basically extracted from another
column of data. Note that with DSS you need fast retrieval speed,
and redundant data is unavoidable.
In a First Normal Form, no table has columns that define similar
attributes and no column contains multiple values in a single row.
In a Second Normal Form, each column that is not part of the primary
key should depend on all columns that make a composite Primary key
and not only depend on a subset of the composite Primary key. In a
Third Normal Form, columns that are not covered by the Primary Key
should not depend on each other. You may not want to go to Fourth
Normal Form and Fifth Normal Form, as they may involve too many
complex joins which are costly to process.
The more the joins involved in a query, the more workload a
processor has to handle. Inner Joins uses a comparison operator to
match rows from two columns based on the values in the common
columns to return rows that match. In contrast, Outer Joins returns
all rows from at least one of the tables or views specified in the
From clause. With Left Outer Joins, you may return all rows from the
left table regardless, and that the right table will return NULL
where there is no match. Right Outer Join is the other way around.
Full Outer Joins returns all rows from both tables, and that NULL
values will fill the non-matched fields. With Cross Joins, all rows
in the left table are matched with all rows in the right table to
produce a Cartesian product which is most likely meaningless.
Sometimes you may prefer to use the UNION clause to combine the
result sets from multiple SELECT statements into one result set if
the result sets contain the same number of columns with compatible
data types.
When we talk about UNION, we should also know that SQL Server
2000 supports updateable UNION views over local or remote linked
servers, known as distributed partitioned views. This allows servers
to cooperate to provide a single view of partitioned data and
provide unlimited scalability. To define distributed partitioned
views, you need to first create an image of the database on each of
the participating servers, then define the member tables, then
define each member server as linked server, and finally define the
distributed partitioned views in all of the member databases. This
type of partitioning is known as horizontal partitioning.
Generally, for OLTP, disk I/O is always the bottleneck. To
optimize I/O performance, you should place tables and indexes across
as many different physical disks as possible, and placing the
transaction log on its own disk. Also, for tables that are likely to
participate in joins, you should place them on different disks so
that they can be scanned in parallel.
Know when to use clustered index and when to use non-clustered
index. Since clustered index actually rearrange the actual order of
the records, there can be only one per table. Generally, clustered
index should be kept as narrow as possible to reduce disk space
consumption and reduce I/O burden caused by data modification in the
table. In contrast, Non-clustered indexes use the clustered index
keys as pointers to data. You should create clustered index on the
Primary key column and create non-clustered indexes on columns often
used in WHERE clauses. Always use the column that has more unique
values first in a composite index.
SQL Server Query Analyzer is a GUI tool tightly integrated with
the Index Tuning Wizard and SQL Server Profiler that enables you to
write queries, execute multiple queries simultaneously, view
results, analyze the query plan, and receive assistance to improve
the query performance. Database Maintenance Plan Wizard helps
setting up the core maintenance tasks necessary to ensure your
database performs well. Note that maintenance activates can also
include back up, check for integrity and consistency…etc.
New in SQL Server 2000 is indexed view. In SQL Server 2000, a
view that has a unique clustered index is referred to as an indexed
view. You may actually create a unique clustered index as well as
non-clustered indexes on a view to improve data access performance
on complex queries. However, keep in mind that not all queries will
benefit from indexed views. Most importantly, if the indexed views
are not used, there is no benefit.
You should consider using indexed view in schema that involves
any subset of tables referenced in the query, any subset of the
conditions in the query for that subset of tables, grouping columns
or aggregate functions.
Decision support queries usually reference large numbers of rows
and aggregate large amounts of information into concise aggregates.
By using indexed view you will force the view's resultset to be
stored in the database, leading to substantially better response
times.
With the NOEXPAND option you can force the query optimizer to
treat the view like an ordinary table with a clustered index. Or you
can explicitly exclude indexed views from consideration by using the
EXPAND VIEWS option with the query. The SCHEMABINDING option is used
to prevent the base referenced tables from being changed without
view adjustment. For the best result, indexes on tables and indexed
views should be designed concurrently to avoid redundant
recommendations that incur high storage and maintenance overhead.
Also, when designing the indexed view, you should design them to be
used by multiple queries and to keep the index as small as possible
by using the fewest number of columns and bytes. Sometimes multiple
smaller indexed views are preferable than a single big one.
You may use Index Tuning Wizard for creating indexed view, as it
will make recommendation for you. For maintenance, SQL Server
automatically take care of it. However, if the indexed view
references several tables, updating any of them may require updating
the indexed view, which is expensive relatively. You need to take
this into account before considering the use of indexed view, if the
view will reference multiple sources.
To define an index on a view, use the create index statement as
follow:
CREATE UNIQUE CLUSTERED INDEX myindex ON OrderTotals
(OrderID)
Nonclustered indexes are supported only on view with previously
defined unique clustered index. Indexes defined on computed columns
are allowed if the expression defined for the column only references
columns from the table containing the computed column and is
deterministic, meaning calculated values do not change subsequent
invocations. Using indexes on computed columns is recommended when
you are trying to create covering indexes. For views using the GROUP
BY aggregation, the COUNT_ BIG(*) statement is mandatory. In
addition, all grouping columns must appear in the view's select
list.
Keep in mind that index cannot be created on a computed column if
the column expression references nondeterministic functions. Also,
clustered index cannot be created on a view if the view references
nondeterministic functions.
Collations
With SQL Server 7.0, during install you must specify a default
code page and sort order. All databases will then be locked into
that particular code page and sort order. In SQL Server 2000,
collation acts as a collection of the sort order for Unicode data
types, the sort order for non-Unicode character data types, and the
code page that is used to store non-Unicode character data types.
You can specify collations at the database or column level, in
addition to the default collation specified during installation. In
addition to support collations at design time, you may specify
collations for individual statements to create queries on tables
supporting multilingual data specific to the language of the
data.
Take a look at this example code fragment for column level
collation:
CREATE TABLE myBody (
myName NVARCHAR(128) COLLATE SQL_Latin1_General_CP1_CI_AI,
myType VARCHAR(10),
myLang VARCHAR(10),
)
Take a look at this example code fragment for database level
collation:
CREATE DATABASE myBody
COLLATE SQL_Latin1_General_CP1_CI_AI
XML
In terms of XML Data Retrieval, results of SELECT statements can
be returned as XML document fragments through using the FOR XML
clause:
FOR XML {mode} [,XMLDATA][, ELEMENTS][, BINARY Base64]
With RAW mode you transform each row in the SQL query into an XML
element with the generic row identifier. With AUTO mode you can
return query results in a nested XML tree. Within AUTO mode you can
specify the ELEMENTS option to have columns returned as
sub-elements. The option BINARY Base64 allows you to have binary
data to be returned in base64 encoding. The XMLDATA option you can
specify that XML-Data schema information should also be returned
together.
With EXPLICIT mode you can specify the shape of the XML tree for
manually ensuring the generated XML structure. The EXPLICIT mode
query will produce a universal table with information about the
resulting XML tree, and the table will be vertically partitioned
into groups to transform into XML elements. The Tag column will
store the tag number of the current element. The parent column will
store the tag number of the parent. You use these metadata tags
together to describe a parent and child hierarchy in the XML
tree.
OpenXML is a rowset function used to expose data from the XML
document as a relational rowset. You may use it as a table reference
and allow yourself to use the data in XML documents for inserts or
updates into database tables. However, to use it you must first
create an internal instance of an XML document with
sp_xml_preparedocument. Remove it with sp_xml_removedocument to free
up memory once your job is done.
When we talk about XML you may wonder about the web integration
aspect of SQL Server 2000. SQL Server always act as the backend in
the scenario. You may use the variant datatype for internet
application that collects information from users. When someone
submit a value via a form, instead of saving the form value as
character together with all associated metadata, you can simply save
the value as a sql_variant with proper casting in place.
SQL Structure
For the SQL language, make sure you fully understand how SQL is
used in TSQL context. With the Where clause, aggregate functions
cannot be used. Without a Where clause, a tablescan will occur that
degrade performance. With the Group clause, all columns in the
Select list must occur in the Group clause with NO aliase. With the
Having clause, all columns must be listed in the group by clause or
be used in aggregate functions. With the Order By clause, all
columns in the Compute by clause must appear in the order by clause.
With the Compute By clause all columns presented in the compute by
clause must be listed in the ORDER BY clause without aliases. The
main difference between COMPUTE and GROUP BY is that GROUP BY
produces a single result set, while COMPUTE produces multiple result
sets.
It is important to remember the order of SQL commands after the
Select clause, from clause and where clause: Group By + Having /
Order by + Compute By. In the Transcender 029 exam you will see a
lot of practical cases relevant to this topic. Also make sure you
know how to read ER Diagram and know how to distinguish between one
to one, one to many and many to many relationships. Also be sure to
know what is the best column for creating primary key and foreign
key.
|