Site Server 3.0
Deployment
Content Deployment distributes content to directories on multiple destination servers, including servers on remote secure networks. Data validation and restart capabilities ensure reliable deployment. You can deploy files, directories, and ACLs (access control lists).
Content Deployment uses both staging and end-point servers. The staging server receives content from authors and administrators, stages content for review, and deploys content. The end-point server receives content from the staging server.
The content deployment process:
1. Administrators create matching projects or routes on the staging and end-point servers: one to deploy content and one to receive content. A project cannot run unless matching projects are configured at each end (see the sketch below).
2. The content to be deployed is either submitted by an author or retrieved from an Internet server by the administrator.
3. Content Deployment replicates the content from the staging server to the end-point server.
4. Both the staging and the end-point servers generate reports for the administrator, documenting deployment status.
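The matching-projects requirement in step 1 can be pictured with a small conceptual sketch in Python. This is not Site Server's object model; the Project class and the can_replicate check are hypothetical, used only to show that the same project name must exist at both ends with complementary roles.

    # Conceptual sketch only (not the Site Server API): a replication can run
    # only when an identically named project exists on both ends.
    from dataclasses import dataclass

    @dataclass
    class Project:
        name: str        # must match on the staging and end-point servers
        role: str        # "deploy" on the staging server, "receive" on the end point
        source_dir: str

    def can_replicate(staging: Project, endpoint: Project) -> bool:
        """True only if matching projects are configured at each end."""
        return (staging.name == endpoint.name
                and staging.role == "deploy"
                and endpoint.role == "receive")

    staging = Project("webcontent", "deploy", r"D:\staging\webcontent")
    endpoint = Project("webcontent", "receive", r"D:\wwwroot\webcontent")
    print(can_replicate(staging, endpoint))   # True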
Content deployment servers retrieve and deploy content based on their project and route definitions. The staging server runs Windows NT. It receives content from content authors, from other staging servers, and from HTTP/FTP servers, then tests the content before deploying it to end-point servers or other staging servers.
Testing enables the administrator to review the deployed content and ensure that all the links are functioning properly.
The end-point server runs Windows NT or UNIX. It receives data from the staging server. The end-point server cannot deploy content; it is used to handle user requests for Web pages. UNIX servers can only be used as end-point servers.
Before deploying content, the administrator needs to configure all staging and end-point servers with the appropriate properties. These properties determine:
1. How event messages will be distributed among the various event stores
2. Who should receive e-mail reports
3. Which servers will receive content posts
4. How many rollbacks to allow
Server properties are set individually on each of the servers at your site.
There are 3 possible configurations:
1. Single-server configuration: involves only one server, which contains a source and a destination directory. Content authors submit to the source directory, where the administrator tests and approves the content before deploying it to the destination directory. This configuration is common at small intranet sites.
2. Staging/end-point configuration: content is deployed from a single staging server to one or more end-point servers. Common in intranets and small ISPs.
3. Multiple stage and deploy configuration: multiple staging servers and multiple end-point servers move content around a complex, geographically distributed Internet site. Content is distributed through a chain of staging servers around the world. This configuration is normally used by large ISPs that require mirrored content on multiple servers.
Content deployment uses 3 kinds of projects to replicate content:
1. Content deployment projects: used when deploying content from a staging server to mid-point or end-point servers
2. Internet retrieval projects: used to retrieve content from an HTTP/FTP server
3. Component deployment projects: used to deploy Java applets and COM components packaged in .cab files
You can use the Content Deployment wizard instead of creating the projects manually.
Configuring projects requires specifying information about the content you want to stage and deploy. After you specify the type of project, you must specify:
1. Which content to include in the project
2. Where to deploy the content
3. When to deploy the content
4. Project security information
5. Who receives project status reports
Sources and destinations are the most important elements of a content deployment project. You must create a project on EVERY server included in the replication.
Content deployment projects can include:
1. End-point servers: if an end-point server is the destination, you must first create the project on that server before it can be a destination
2. Routes: using routes, any project can be stored in a route directory
3. Directories: directories can be destinations
Replicating metabase information enables you to clone your Web servers. The only part of the configuration that is not cloned is the TCP/IP configuration.
Routes are predetermined paths that staging and
end-point servers use to deploy and receive
content.
After a route has been created, an identically named project is created on every server in that route so that you can deploy content to those servers. The administrator must create the route on every server in the route.
There are 2 ways to create a route:
1. Manually, using the MMC or WebAdmin
2. Using the New Route wizard, which creates the route on every server for you and is available from the MMC
Provide the following to each server:
1. The route name
2. The route directory, which contains the route and all projects that use the route
3. The names of all servers included in the route
You can change the servers in a route at any time, either by adding new ones or removing existing ones. You can delete a route using the MMC or WebAdmin. ALL projects using the deleted route are affected. When deleting a route, consider the following:
1. All replications using that route should be complete
2. You should delete the route from all servers in the route
3. The route is deleted from the project definitions of all projects using that route
Filtering content enables you to determine, at the file or directory level, what to include in and what to exclude from replications. Creating a filter requires building an ordered list of files or directories to include in or exclude from a replication. Adding content filters to a project does not affect content that is already deployed.
Filters are evaluated and applied in the order displayed on your screen. Since one filter can undo another, think through all of the filters you want to apply, then begin with the most general filter and work toward the most specific (see the sketch after these notes).
When using filters, note the following:
1. When a directory is excluded, all files within that directory and all of its subdirectories are excluded
2. When a file is included, its parent directory is also included
3. If an invalid filter is included, no content is included
4. If an invalid filter is excluded, no content is excluded
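The effect of filter ordering can be shown with a small Python sketch. This is not the Content Deployment filter engine; the select function and the glob-style patterns are illustrative only, but they show why a general exclude should come before a specific include.

    # Conceptual sketch of ordered include/exclude filters: rules are applied
    # in list order, so a later rule can undo an earlier one.
    import fnmatch

    def select(paths, filters):
        """filters: ordered list of (action, pattern); action is 'include' or 'exclude'."""
        selected = set(paths)                      # start from everything, then refine
        for action, pattern in filters:
            matches = {p for p in paths if fnmatch.fnmatch(p, pattern)}
            if action == "exclude":
                selected -= matches
            else:                                  # a later include overrides earlier excludes
                selected |= matches
        return sorted(selected)

    paths = ["htdocs/index.htm", "htdocs/drafts/new.htm", "htdocs/drafts/ok.htm"]
    rules = [("exclude", "htdocs/drafts/*"),          # most general rule first
             ("include", "htdocs/drafts/ok.htm")]     # then the specific exception
    print(select(paths, rules))
    # ['htdocs/drafts/ok.htm', 'htdocs/index.htm']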
Securing your projects requires configuring project users, authentication accounts, and ACL deployment.
You can effectively secure your project by creating groups, each with its own set of access privileges. Windows NT and Site Server administrators have full control.
Members of the Publishing Operators group perform service operations, such as starting, stopping, and rolling back projects.
Publishing Administrators can administer local and remote servers. Their tasks include:
1. Project administration tasks: adding, deleting, and editing routes; adding and deleting users to/from projects
2. Server administration tasks: adding/removing servers, modifying server properties, and starting/stopping/pausing content deployment
When content is replicated, the sending server must have the right Windows NT credentials to present to the staging/end-point servers. The content sent is signed but NOT encrypted: anyone intercepting it can read it, but any modification in transit can be detected.
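The signed-but-not-encrypted distinction can be illustrated with a generic integrity check. This is not Content Deployment's actual mechanism; the shared key and content are made up, and HMAC is used only to show that a signature detects tampering without hiding the data.

    # Illustration of "signed but not encrypted": the content travels in the
    # clear, but tampering is detected because the signature no longer matches.
    import hmac, hashlib

    key = b"shared-deployment-secret"           # hypothetical shared credential
    content = b"<html>original page</html>"     # readable by anyone on the wire
    signature = hmac.new(key, content, hashlib.sha256).hexdigest()

    def verify(received: bytes, received_sig: str) -> bool:
        expected = hmac.new(key, received, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, received_sig)

    print(verify(content, signature))                         # True  - intact
    print(verify(b"<html>tampered page</html>", signature))   # False - modified in transit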
Projects can be run:
1. Manually
2. Automatically, whenever content changes
3. On a schedule
4. On a schedule with the Apply option, so that content is applied when scheduled
Administrators can post data via upload pages or the Posting Acceptor. The Posting Acceptor is a server-based tool that receives content from content authors over the HTTP protocol. The Posting Acceptor can forward/repost data. FrontPage 98 and Visual InterDev can also be used to post data.
To monitor deployment, use the Content Deployment service. The MMC project pages show the status of a project, and Performance Monitor maintains records of transmission and authorization data. After analyzing the data you can make adjustments, if necessary.
The rollback command is similar to an undo command: you can specify as many rollbacks as you want, but you cannot roll back content at the source (a conceptual sketch follows).
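One way to picture rollbacks is as an undo history of deployments kept at the destination. This is only a conceptual Python sketch; the Destination class and snapshot dictionaries are hypothetical and not how the service stores rollback data.

    # Each deployment keeps a snapshot; a rollback restores the previous one.
    from collections import deque

    class Destination:
        def __init__(self, max_rollbacks: int):
            self.history = deque(maxlen=max_rollbacks + 1)  # current + older snapshots

        def deploy(self, content: dict):
            self.history.append(dict(content))

        def rollback(self) -> dict:
            if len(self.history) < 2:
                raise RuntimeError("no earlier deployment to roll back to")
            self.history.pop()            # discard the current deployment
            return self.history[-1]       # the previous snapshot becomes current

    dest = Destination(max_rollbacks=2)
    dest.deploy({"index.htm": "v1"})
    dest.deploy({"index.htm": "v2"})
    print(dest.rollback())                # {'index.htm': 'v1'}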
3 standard reports are available:
1. Project reports
2. Replication reports
3. Full reports
You can configure the server to send these reports by e-mail.
Search
Site Server includes a Search feature similar to Index Server. The main differences between the 2 are:

Functionality | Site Server Search | Index Server
Crawl capabilities | Generated Web content, Web sites, Exchange public folders, ODBC databases | File system files
Number of computers that can be searched | Multiple computers, including Internet sites | One location
Integration with Windows NT security | Across multiple computers | Single server
Updates of catalogs | Scheduled full crawls, with incremental crawls | Automatic, based on file change notification
Automatic distribution of catalogs to other servers | YES | NO
Distributed indexing | YES | NO
Multiple catalog searching | YES | NO
Administration | Wizard, MMC, WebAdmin, command line | MMC and WebAdmin
A catalog is a document index containing information about a document, but not the actual document. For each catalog you build, you must first create a catalog definition.
A catalog definition contains the instructions and parameters for building a catalog, and it is stored on the catalog build server (the host used to build catalogs).
Search builds a catalog by:
1. Using a catalog definition to gather content
2. Extracting words and attributes from the collected documents
3. Creating a catalog index
4. Compiling and propagating the created catalog
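The gather/extract/index steps can be pictured with a toy inverted index. This is a minimal sketch, not the Gatherer or the real catalog format; the noise-word list and documents are made up, but the structure (words mapped to the documents that contain them) is the general idea behind a catalog index.

    # Minimal gather -> extract -> index sketch.
    import re

    NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}   # sample noise-word list

    documents = {                                          # "gathered" content
        "intro.htm": "An overview of the deployment process",
        "faq.htm": "Answers to deployment and search questions",
    }

    index = {}                                             # word -> set of documents
    for url, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):   # word extraction
            if word not in NOISE_WORDS:
                index.setdefault(word, set()).add(url)

    print(sorted(index["deployment"]))   # ['faq.htm', 'intro.htm']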
Search uses 2 services: the Gatherer service to create catalogs, and the Search service to perform the actual searches.
Search hosts can be of 2 kinds:
1. Build servers, which build the catalogs
2. Search servers, which store the catalogs and perform the actual searches. They must have access privileges that allow visitors to search them.
A single host can perform both roles, but Microsoft does not advise it.
You can conserve network
bandwidth by placing the build server close to the location
where documents are stored and the search server close to the
site that visitors will search.
You can use multiple
servers to distribute the load.
Collecting documents is called gathering. There are 3 kinds of crawls:
1. Web link crawl: Search uses HTTP to crawl documents by following links.
2. File crawl: Search uses the file protocol to crawl the files in a directory. Search can crawl ANY file system that can be mounted remotely.
3. MS Exchange crawl: Search uses an Exchange public folder as a start address and crawls messages on a computer running MS Exchange Server by using the Exchange protocol.
The crawl history is a list of links that Search has crawled. It is used to eliminate duplicate crawling of the same directory.
Search uses filters to extract the full text and document attributes from the files it gathers. Filters are standard plug-in modules that conform to the MS standard IFilter interface.
To prevent the index from becoming filled with words that do not help visitors find documents, Search uses noise word lists.
3 actions are involved in creating and using a catalog: gathering, compiling, and propagating.
When a document is catalogued, Search determines which language the document is written in, and then sets the value of the "detectedlanguage" attribute for the document.
When a search is conducted, the "detectedlanguage" value is checked. Depending on the language, different word breakers and word stemmers are used.
Word breakers are responsible for identifying the individual words in a document.
Word stemmers are responsible for taking a word and generating grammatically correct variations of that word.
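A rough illustration of what these two components do: the real word breakers and stemmers are language-specific and far more sophisticated, so this is only a toy English example.

    import re

    def break_words(text):
        """Word breaker: split a document into individual words."""
        return re.findall(r"[a-zA-Z]+", text)

    def stem(word):
        """Very naive English stemmer: strip a few common suffixes."""
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    words = break_words("Deploying deployed deployments")
    print([stem(w.lower()) for w in words])   # ['deploy', 'deploy', 'deployment']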
A search system can use multiple servers. Since WebAdmin lets you administer only the host on which it is running, you must use the MMC to manage a multiple-host search system.
There are some options to keep in mind when specifying what to crawl.
Kinds of crawls:
1. Crawl all subdirectories under the start address, or
2. Crawl by following links from the start address
You can limit the page and site hops that Search makes (see the sketch below).
Kinds of files:
1. By specifying the types of files to crawl or to avoid
2. By selecting the protocol to use
3. By specifying whether to crawl links (URLs) with question marks
How far Search can go:
1. Directory depth
2. Page and site hops
3. Site and path rules
4. Specific Microsoft Exchange public folders
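How hop limits bound a link crawl can be sketched as follows. This is a conceptual crawler, not the Gatherer: the crawl function, the toy link map, and the way page and site hops are counted are assumptions made for illustration.

    # Page hops: links followed from the start address.
    # Site hops: how many times the crawl may move to a different host.
    from urllib.parse import urlparse
    from collections import deque

    def crawl(start, get_links, max_page_hops=2, max_site_hops=1):
        """get_links(url) -> list of URLs; supplied by the caller in this sketch."""
        seen = {start}
        queue = deque([(start, 0, 0)])              # (url, page_hops, site_hops)
        while queue:
            url, pages, sites = queue.popleft()
            yield url
            if pages == max_page_hops:
                continue
            for link in get_links(url):
                hop = sites + (urlparse(link).netloc != urlparse(url).netloc)
                if link not in seen and hop <= max_site_hops:
                    seen.add(link)
                    queue.append((link, pages + 1, hop))

    links = {  # toy site: the start page links to a local page and an external site
        "http://intranet/start.htm": ["http://intranet/a.htm", "http://partner/b.htm"],
        "http://intranet/a.htm": [], "http://partner/b.htm": [],
    }
    print(list(crawl("http://intranet/start.htm", lambda u: links.get(u, []))))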
When crawling, Search identifies itself by user agent and/or by e-mail address.
To follow crawling etiquette, Search enables:
1. Setting hit frequency rules
2. Obeying the rules of robot exclusion
3. Leaving behind an e-mail address to contact in case of problems
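Robot exclusion and hit frequency look roughly like this in practice; the sketch uses only the Python standard library, and the user agent, URLs, and robots.txt content are made up for illustration.

    import time
    import urllib.robotparser

    USER_AGENT = "SampleCrawler"       # how the crawler identifies itself
    HIT_DELAY_SECONDS = 5              # hit frequency rule for this site

    robots = urllib.robotparser.RobotFileParser()
    robots.parse("User-agent: *\nDisallow: /private/".splitlines())   # site's rules

    for url in ["http://intranet/public/index.htm", "http://intranet/private/x.htm"]:
        if not robots.can_fetch(USER_AGENT, url):
            print("skipping (robot exclusion):", url)
            continue
        print("fetching:", url)        # real code would issue the request here
        time.sleep(HIT_DELAY_SECONDS)  # be polite between hits to the same site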
Other options to set:
1. Resource usage: the amount of system resources used to build the catalogs
2. Access accounts: used to access each catalog server
3. Crawling identification
4. System proxy information
5. Crawling timeout periods
6. Save and load configurations: used to save configurations on the catalog build server and load them onto other servers
To log on to the Search administrative interface, you must be one of the following:
1. An administrator on the local host
2. A Site Server Search administrator
3. A Site Server Knowledge administrator
4. A Site Server administrator
You must set up an administrative access account on the catalog build server that has administrator privileges on all hosts to which you want to propagate catalogs.
To access external data when crawling, you must set up a content access account on the catalog build server.
Security depends on the kind of data you need to access:
1. File access: the content access account determines which files Search can access
2. HTTP access: Search first tries anonymous access, then uses the content access account
3. MS Exchange: the content access account is authenticated when crawling folders
When configuring your search system, consider the following:
- Multiple search servers are useful when accommodating a high volume of requests
- Multiple catalog build servers are useful when you have a lot of rapidly changing content that you want to crawl frequently
When creating a catalog definition, consider the following:
1. The number of catalogs you need
2. The type of information for which site visitors will search
3. Whether to TAG your HTML documents before cataloging them
4. Where to start crawling and from which sites to gather documents
5. Whether to adjust the site hit frequency
6. Whether to set site rules that limit which sites or paths Search crawls
7. Which files you want to crawl and which protocols to use
8. The type of information (attributes) to store in catalogs
9. Whether you need to change the default schedule for building catalogs
10. Whether to use different hosts for searching
If you want to add messages from MS Exchange public folders, you must first configure your search host with information about the computer running Exchange Server.
If you want to build a catalog that contains database records, you must create a database catalog definition and then set up or modify the ASP pages.
Catalogs can be one of the following types:
1. Crawl catalogs: built by crawling files
2. Notification catalogs: built by receiving information through a notification source
3. Database catalogs: built by crawling a table in an ODBC database
Catalog definitions for crawl catalogs must contain the catalog name, the start addresses, and the crawling policy. They can also contain information on site and path rules, file types, propagation, and the build schedule.
Catalog definitions for notification catalogs must include the catalog name and the notification source. They can also contain information on the hosts to propagate to and how many documents to receive before updating the catalog.
Catalog definitions for database catalogs must contain the catalog name, the ODBC source, the table to catalog, and the database columns to use for content and description.
When building a catalog, you have 2 build options:
1. Full builds: start with an empty catalog and use the start addresses as the starting points for the crawl.
2. Incremental builds: start with the start addresses and a previous catalog, and update anything that has changed since the last crawl.
NOT all changes made to a catalog definition affect an incremental build: in some cases you must restart with a full build. The difference is sketched below.
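A minimal way to contrast the two build modes, using file modification times as a stand-in for "changed since the last crawl". This is not the real build process; the functions and the use of mtimes are assumptions made for illustration.

    import os

    def full_build(paths):
        """Full build: start from an empty catalog and gather everything."""
        return {p: os.path.getmtime(p) for p in paths}

    def incremental_build(paths, previous_catalog):
        """Incremental build: re-gather only new or changed documents."""
        catalog = dict(previous_catalog)
        for p in paths:
            mtime = os.path.getmtime(p)
            if catalog.get(p) != mtime:       # new or changed since the last crawl
                catalog[p] = mtime            # re-gather just this document
        return catalog

    files = [__file__]                        # any readable files work here
    catalog = full_build(files)
    catalog = incremental_build(files, catalog)
    print(len(catalog), "documents catalogued")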
You can view the status of a catalog at any time, to see whether the catalog is in an idle, crawling, compiled, propagating, or error state.
Smaller catalogs give higher search performance. One way to decrease the size of a catalog is to reduce the number of retrievable attributes.
To run a search, you must specify which part of the catalog to search. Catalogs are organized by columns, and you must specify which columns to search.
Results ASP pages are stored in a virtual directory on your Web server that has Script permission.
Search performance can be optimized in 2 areas:
1. Cataloging performance: by minimizing the use of other resources during cataloging and minimizing catalog space
2. Searching performance
Cataloging can be improved by:
1. Configuring the server for maximum network throughput
2. Stopping Index Server (if not used)
3. Minimizing the number of columns in the catalog
4. Using incremental crawls when building a catalog
5. Scheduling catalog builds
6. Setting the site hit timing for crawling
7. Setting timeout periods for crawling
8. Setting the resource use on the catalog build server
Search speed and accuracy can be improved by:
1. Improving performance on the search server (e.g. stopping Index Server)
2. Decreasing the size of the catalog being searched
3. Decreasing the number of catalogs
4. Setting the resource use on the search server
5. Search page design
6. How well your search page helps your users target results
Membership Services
A Membership server is a collection of software components that manages P&M (Personalization and Membership) user data and other information. It performs 4 key functions:
1. Managing user registration and user data
2. Protecting and sharing user data
3. Verifying user identity
4. Controlling access to content on your site
Each Membership server can have some of the following components:
1. Membership Directory: the central repository of user data
2. Authentication service: ties together the various functions involved in site security
3. Active User Object (AUO): presents a single interface that applications use to access and integrate data from multiple user directories
4. LDAP service: LDAP is an Internet standard for accessing user information in directories; the service provides standard, platform-independent access to the Membership Directory (see the sketch after this list)
5. Message Builder service: constructs and sends mailings
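Because the Membership Directory is exposed through LDAP, a generic LDAP client can read it. The sketch below assumes the third-party ldap3 Python package; the host name, bind DN, password, base DN, and attribute names are hypothetical and depend entirely on how the directory was set up.

    from ldap3 import Server, Connection, ALL

    # Hypothetical host; the port depends on how the LDAP service was configured.
    server = Server("membership-server", get_info=ALL)
    conn = Connection(server,
                      user="cn=Administrator,ou=Members,o=ExampleSite",
                      password="secret",
                      auto_bind=True)

    conn.search(search_base="ou=Members,o=ExampleSite",
                search_filter="(cn=jsmith)",
                attributes=["cn", "mail"])
    for entry in conn.entries:
        print(entry.entry_dn, entry.mail)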
Site Server enables you to choose from many configuration options:
1. Single server: typically used for test and evaluation purposes or for small Web sites. All components reside on a single computer.
2. Basic multiple servers: for larger sites. The Membership Directory database is installed on a dedicated computer.
3. Replicated multiple servers: suitable for high-end sites. Multiple application servers are deployed to support multiple application types and performance requirements. Each application server has a Membership server instance installed on it.
4. Dedicated LDAP configuration: it may be advantageous to put a tier of one or more dedicated LDAP service computers between the Web servers and the Membership Directory database and stop the LDAP services that reside on the Web servers. This configuration can offer an ideal balance of security and efficiency: the application server can be exposed to the Internet, while the LDAP service can sit behind a firewall.
Configuration limitations:
- When using Windows NT Authentication mode, only one LDAP service can be configured
- When using MS Access as the database for the Membership Directory, only one LDAP service can be configured, and it must reside on the same computer as the Access database
- IIS, the LDAP service, and SQL Server all require large amounts of RAM
The Membership Directory is the central repository of data where you can store:
1. User data: user profiles with personalization data and, optionally, passwords
2. Site data: information about your site and organization, such as the site vocabulary
3. The Membership Directory schema: defines the objects, data, and relationships of the user and site data in the Membership Directory
The directory tree is a representation of the Membership Directory as a hierarchical structure (or tree) of data objects. There are two general categories: container objects, which have child objects in the tree, and leaf objects, which have no child objects.
The Authentication service retrieves user properties, including passwords, from the Membership Directory and supplies them to the AUO. It also validates the password provided by the user by comparing it to the one in the Membership Directory.
The AUO is an Active Directory Service (ADS) Component Object Model (COM) object that you can configure to access and integrate user attribute data from a Membership Directory and other data sources. Using the AUO, you can create a virtual user attribute schema that can be accessed from any script or program.
User accounts can be created in 3 ways: with administrative pages, with analysis pages, and with registration pages.
There are 3 types of users:
- Anonymous users: do not have an account and are not tracked at all in the Membership Directory or the Windows NT Server directory database
- Cookie-identified users: are tracked in the Membership Directory by means of an automatically assigned GUID. Cookie-identified users do not have a password and do not register as members.
- Registered users: are tracked in the Membership Directory
With Membership authentication, you can create 2 kinds of user objects in the Membership Directory:
- Security account objects: represented by a Membership user account and password
- Cookie user objects: identified by a GUID that is stored on the server and in the client browser's cookie file
When you create a Membership Directory, you must specify the authentication method used.
With Membership authentication, users and passwords are stored in the Membership Directory. With Windows NT authentication, they are stored in the Windows NT Server directory database.
Membership authentication has advantages for Internet sites, while Windows NT authentication is useful for intranet sites.
Windows NT authentication methods: Cookie authentication, Clear Text/Basic authentication, Windows NT Challenge/Response, and client certificates.
Membership authentication methods: Automatic Cookie, Clear Text/Basic, HTML Forms, Distributed Password, and client certificates.
In Distributed Password Authentication, the user's identity is validated by password only.
Public is a built-in P&M group to which every user belongs.
When a user tries to authenticate and fails, the user is routed to the authentication pages. The name of this file is privilegedcontent.asp, and all users have access to this file.
There are 4 levels of protection you can apply to your content:
1. Public content: not protected
2. Registered content: users must fill in a form
3. Secured content: only registered users can access it
4. Subscribed content: provided to a subset of your registered users
When a user requests a particular file, access control is used to check whether the user has permission. ACLs (access control lists) are used for this.
Personalizing Content
Personalization and Membership enables any Internet site to automatically present unique, personalized content to specified users, using a variety of delivery mechanisms.
Before P&M can use content from each source, authors must first identify (tag) the content with attributes that administrators define to address specific user needs.
User profiles are stored in the Membership Directory and are the primary source of user information for personalization. They contain a set of demographic and user-preference data that is used to provide a more personalized experience to site visitors.
Personalization rules are statements that test a condition and then perform an action when the condition is true.
Personalized information can be delivered through personalized Web pages, e-mail, or push channels.
The Membership Directory schema is the data structure that defines how user profiles are stored.
Attribute schema objects define attributes. All attribute definitions are stored as attribute schema objects.
Class schema objects define classes.
All objects in the Membership Directory, including user profiles, are instances of a class, and every object is defined by and consists of a set of attributes.
Managing schema objects can involve the following tasks:
- Defining a new attribute
- Defining a new class
- Editing an attribute definition
- Editing a class definition
How user attribute values are collected:
- Explicit profiling takes place when users answer questions on a registration form and establish an account.
- The Change Properties page enables users to change their own properties stored in the Membership Directory. Only registered users are allowed to use this page.
- Automatic recording can occur via Active Server Pages (ASP).
- Some data can be migrated from pre-existing databases using the P&M migration tool.
One of the simplest forms of personalization is to add a user property to a Web content template. There are 2 ways of doing this: by using the Insert Property design-time control (DTC) or by using VBScript. The idea is sketched below.
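The underlying idea, shown as a plain string substitution rather than the DTC or VBScript syntax: a property from the user's profile is merged into the page template at delivery time. The template, the givenName attribute, and the profile dictionary are hypothetical.

    template = "<html><body><p>Welcome back, {givenName}!</p></body></html>"

    # A user profile as it might come from the Membership Directory.
    user_profile = {"givenName": "Maria", "city": "Rome"}

    page = template.format(**user_profile)   # unreferenced attributes are ignored
    print(page)                              # ...Welcome back, Maria!...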
Rule Builder is a tool found in Rule Manager. Rule Builder is used to create new rules, or modify existing rules, for delivering personalized information through Web pages or e-mail. Rules are usually built in Rule Manager and then saved in rule sets.
Rule exceptions set up conditions that prevent a rule from being executed, even when its other conditions are met (see the sketch below).
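Rules and rule exceptions amount to condition/action pairs with a blocking condition. The sketch below is only an illustration of that idea (Rule Builder itself generates page logic); the profile attributes and actions are invented.

    # Each rule: fire the action when the condition is true and the exception is not.
    rules = [
        {"condition": lambda u: u.get("city") == "Seattle",
         "exception": lambda u: u.get("age", 0) < 18,     # rule exception blocks the rule
         "action": "Show the local-events column"},
        {"condition": lambda u: u.get("has_car") == "no",
         "exception": lambda u: False,                    # no exception for this rule
         "action": "Show the transit-pass banner"},
    ]

    def personalize(profile):
        return [r["action"] for r in rules
                if r["condition"](profile) and not r["exception"](profile)]

    print(personalize({"city": "Seattle", "age": 30, "has_car": "yes"}))
    # ['Show the local-events column']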
Personalized Web pages are created with the rules you specified earlier.
Personalized e-mail can be sent with Direct Mailer. Direct Mailer retrieves the custom text files and then assembles them into a customized e-mail message.
You can have 2 types of distribution lists, static or dynamic, depending on whether you create them yourself or they are built dynamically from a database.
Knowledge Manager
Knowledge Manager is a Web-based application that can filter documents by areas of interest, define and schedule briefs regarding content, and browse files by category.
It is mainly used to enable visitors to find information and to receive updates when information is added or changed.
Knowledge Manager makes information available in several ways:
- By searching your company's accumulated knowledge (the searchable content must be catalogued with Search)
- By browsing categories of organization
- By staying up to date on topics of interest
- By choosing to receive e-mail updates
- By sharing expertise through the creation of briefs
- By selecting the available channels on the intranet
The search page is essentially a query page used to get information.
Knowledge Manager enables site visitors to learn from each other by using briefs prepared by co-workers, the administrator, local experts, or other site visitors.
Briefs are documents that contain information organized around a specific topic. Briefs are composed of 2 types of sections: saved search sections, which are saved search queries, and link list sections, which are lists of useful URLs along with their descriptions.
The brief delivery page offers users a choice of how they want to receive brief updates. Users can receive brief updates through e-mail or through their personal briefing channel.
Channels are conduits through which information is stored and delivered. They are set up by administrators and are centered on topics. Users then decide which channels to subscribe to.
After setting up the search center and creating briefs, you only need to add defaults and links for your site to use Knowledge Manager. The basic configuration is in the config.idc file.
The Knowledge Manager database is in Access format and is located in the Mssiteserver\Data\Knowledge directory. The database contains the following tables:
- Briefs: stores briefs alphabetically, along with the author, creation date, and so on
- Details: links the Briefs and Filters tables
- Filters: contains information about all sections/filters
- URLs: contains information about all link list sections
Analyzing Web site usage
You can use Analysis to analyze usage at your site, including who visits the site, where they go, and how long they stay.
Analysis offers the following tools:
- Usage Import: controls how you import log files
- Report Writer: provides standard report definitions
- Custom Import: allows you to import custom data
- Scheduler: automates tasks performed by Report Writer
If you are installing Site Server on multiple computers, it is advisable to install Analysis on a dedicated computer.
You can improve analysis by configuring IIS to gather the following information:
- Referrer data
- User agent data
- Cookies
You MUST use the MS W3C Extended Log File Format, configured on your IIS.
If your server is already logging data when you install a new filter, you need to RESTART your SERVER for the filter to take effect.
The data falls into 5 categories:
1. Hits: any request made by a user
2. Requests: any hits that successfully retrieve content
3. Visits: a series of requests by one user
4. Users: any entities, associated with a host name, that access a site
5. Organizations: groups of related users that have registered one or more domain names
A sketch of the first three categories follows.
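The hit/request/visit distinction can be made concrete by counting over a few W3C extended log lines. The field layout follows the #Fields directive shown; the log values and the 30-minute visit timeout are assumptions made for illustration, not the importer's actual logic.

    from datetime import datetime, timedelta

    log = """#Fields: date time c-ip cs-uri-stem sc-status
    1999-04-01 10:00:01 10.0.0.7 /default.htm 200
    1999-04-01 10:00:05 10.0.0.7 /missing.htm 404
    1999-04-01 11:30:00 10.0.0.7 /default.htm 200"""

    hits, requests, visits = 0, 0, 0
    last_seen = {}                                   # c-ip -> time of previous hit
    VISIT_TIMEOUT = timedelta(minutes=30)

    for line in log.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        date, time_, ip, uri, status = line.split()
        hits += 1                                    # every user request is a hit
        if status.startswith("2"):                   # successful retrieval = request
            requests += 1
        when = datetime.fromisoformat(f"{date} {time_}")
        if ip not in last_seen or when - last_seen[ip] > VISIT_TIMEOUT:
            visits += 1                              # a new series of requests = visit
        last_seen[ip] = when

    print(hits, requests, visits)                    # 3 2 2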
Usage Import enables you to import log files into the Analysis database, after which you can start Report Writer and run reports on the data. With Usage Import you can also delete imported log files and requests.
Use the following to configure and manage usage imports:
- Server Manager: the primary tool for configuring usage imports
- Import Manager: the tool for importing log files
- Import History Manager: the tool for managing log files once they have been imported
Once you have configured Usage Import, you can import log files with the Import Manager. You can either import a single log file or import several log files from one log data source in a single import. By importing several log files in one import, the logs are stitched together.
When importing from external data sources, you can import:
- User data from P&M
- Content data from Content Analyzer
- Advertising data from Ad Server
- Custom data files you create
- Document title files you create
To enrich the data in your database, you can use:
1. IP resolution (the IP addresses must be resolved first; a reverse-lookup sketch follows this list)
2. Whois queries
3. Title lookups
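IP resolution amounts to turning the client IP addresses found in log files into host names. A minimal standard-library sketch (the example addresses are placeholders, and unresolved addresses are simply left as-is):

    import socket

    def resolve(ip):
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)   # reverse DNS (PTR) lookup
            return hostname
        except OSError:                                 # no PTR record or DNS unreachable
            return ip                                   # leave the address unresolved

    for ip in ["127.0.0.1", "192.0.2.55"]:              # 192.0.2.0/24 is a documentation range
        print(ip, "->", resolve(ip))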
You can use Scheduler to automate a number of tasks performed in Report Writer, Custom Import, and Usage Import. This is useful if you want to optimize system resources or automate regular tasks.
Using Scheduler consists of scheduling import jobs and adding tasks to the jobs. The imports run according to the schedule you specify. Once you have scheduled a job, you can choose to activate it. When a job runs, messages are logged to Uimport.log.
Report Writer enables you to generate reports with which you can identify trends and your most popular pages, learn how users navigate through your site, and analyze where users come from.
Every report has a report definition. Report definitions are made up of elements that you can add, delete, or modify.
The basic elements of a report definition are:
- Sections: indicated by a funnel icon
- Calculations: indicated by a calculator icon
- Dimensions: indicated by a cube icon
- Measures: indicated by a circular icon
- Presentations: indicated by table or graph icons
You can run Report Writer from 3 interfaces: the Windows interface, WebAdmin, and the command line.
Analyzing Web site Content
Site Server provides a tool called Content Analyzer to analyze the content of a Web site.
With Content Analyzer you can analyze resources and the routes that connect them.
You can access Content Analyzer through one of 3 interfaces:
- Windows interface
- WebAdmin
- Command line
When you create a project for a Web site, Content Analyzer explores the site, starting with the URL of the site you specify.
Content Analyzer distinguishes between 5 kinds of resources:
- HTML pages
- Images (all image files)
- Gateways (programs that dynamically create content)
- Other protocols (FTP, news, mail, gopher, and telnet)
- Other resources (audio files, video files, and all other resources)
A Content Analyzer project is a file that contains a map, a graphical view of your site. You create a new project by exploring the site you want to analyze. You can create a project starting from a URL or from a file in your file system.
When Content Analyzer finishes exploring the site, it saves the results in a project and displays the site map. A project contains the following:
- The structure of the Web site
- A starting URL
- A project name
- Any special settings you select using project options
The options for exploring include the following:
- Explore the entire site or not
- Set routes by URL hierarchy: the process of determining whether one of the alternate routes makes a better main route than the existing main route
- What kind of site report to generate
- Ignore the case of URLs
- If there is no default file, map all files in the directory
- Verify offsite links
- Honor the robot protocol
- Include URLs with arguments
- Standard or custom user agent
- Copy the site to a local directory or not
When you create your project, you can choose to explore the entire site or explore it with limits. For example, you may want to analyze just part of a branch or explore a single page. You can do that by specifying the number of pages/levels.
Once you have created a project, you can use Content Analyzer to:
- View statistics for your project
- Remap a project
- Schedule regular updates
You can use Content Analyzer's search option to find and analyze resources on your Web site. Content Analyzer offers both a quick search and a more advanced search that lets you specify almost limitless combinations of criteria.
Each Content Analyzer window offers different advantages:
- Use the site window to examine the site's structure and identify broken links
- Use the analysis window to view the results of your searches
- Use the project window to focus on a particular resource
The site window uses 2 methods of showing the site structure:
- The outline pane displays pages and resources in a hierarchical tree
- The hyperbolic pane displays an overview map of the site
The analysis window displays the results of a search and the respective resources in 2 panes:
- The results pane lists the resources that match your criteria
- The browser pane displays either the resource as it appears (e.g. an image) or the HTML code for the resource
The properties window displays details about the links and properties of the selected resource or page:
- The resource pane displays the individual properties of a resource, such as its author, expiration date, label, URL, MIME type, or size
- The links pane displays details about the links to and from the resource
There are 3 kinds of link types:
- Links to resources: links pointing from a selected page to other resources
- In links from pages: links pointing to a selected resource from other pages in the site
- Links on the route to a resource: links from the site start page passing through to the selected resource
When analyzing links, the main goal is to make sure that links connect the proper resources to one another. You can use the site window to examine the link structure. To begin examining a link, you must FIRST select a resource and then select a link type to show.
A link is usually broken because the resource it points to has been moved, renamed, or deleted, or because the URL is mistyped.
Usage data can be gathered from log files and associated with your site. The information that you can gather is of 2 types:
- The hit count: the number of times a resource was accessed
- The referrer URL
With this data you can:
- Display the busiest links
- Analyze the links to and from the most popular pages
- Determine which external links point to your pages
- Choose a new place to base a main route for analysis
Ports to remember

Service | Port
FTP | 21
Telnet | 23
SMTP | 25
HTTP | 80
SSL | 443
SQL | 1433