
Washington, September 15-18, 1999 – London, November 21-24, 1999
Site Server Content Management
Dina Berry
Introduction to Site Server
Site Server is a collection of tools,
applications, code, and components that make web site development and
maintenance easier. The easiest way to approach Site Server is to learn what
the marketing terms actually translate into in the product. The two pieces we
will be using are marketed as 'Knowledge' and 'Publishing'; the third piece is
'Analysis'. Knowledge consists of Personalization and Membership (P&M, backed
by the LDAP database), the Knowledge Manager application, Search, several
components, and lots of sample code. Publishing consists of Content Management
and Content Publishing.
If you want to e-commerce-enable your web
site, take a look at Site Server Enterprise. It makes building an e-store very
simple. Of course, once you want to change the store, things get more
complicated.
Introduction to Content Management
Content
Management is the organization and management of files in their native
formats, such as Word, Excel, or PDF. While everyone agrees the browser is
becoming the single application interface, this concept is often misused. If
the information is stored in a database, it is easy to see why you would use an
ASP/ADO (HTML) page to view the data. However, many people think that
information should always be viewed in HTML and spend time converting
documents to HTML; every time a document is updated in its native format, it
has to be re-converted. Content Management aims to catalog files in their
native format but allow access via the web. This removes a lot of unnecessary
work and adds a level of detail (document properties) for finding the document
you are looking for.
Dina Berry
As the XBuilder Product Manager, Dina Berry focuses on product development,
quality assurance, and customer service. She received her B.S. in Computer
Science from Principia College in Elsah, IL, and has worked in hardware support
for Northern Telecom and systems design at Oppenheimer Mutual Funds. During a
three-year tenure at Microsoft, Berry worked on the OLE DB and Site Server
teams. She is a conference speaker and the published author of Teach Yourself
ActiveX Programming in 21 Days and Teach Yourself Active Web Database Programming
in 21 Days. Her love of database and Internet technologies has led her to build
several e-business web sites in her spare time.
What Problem does CM solve?
Content Management brings the organized
file server to the web. By organized, I mean searchable beyond the directory
or file name used. When working with a file server, user standards are
generally very loose and governed more by file system permissions than by
document content. It's important that the time spent creating a document is not
wasted because the document is rarely referenced as intended, or is lost
altogether after its initial use.
Content Management allows you to create
'stores' of content. Each store has a unifying purpose, whether it is company,
department, subject, or location; this is up to you to define. Inside each
store are content types such as White Paper or Press Release (or whatever you
want to call them). Each content type is defined by the properties that make up
that content type. This gives you single-level hierarchical storage of files by
content type and makes these files searchable by all the information submitted
about each file.
Note: CM is not designed for multi-tiered storage. Yahoo is a good
example of multi-tiered storage: you can keep drilling into a content section
level by level until you reach the content area you were interested in. A CM
store is the final location and has no built-in support for this kind of
multi-tiered storage.
Some content type properties will be
universal, such as content author and creation date, while others will be
specific to the content type. For example, time-sensitive data will probably
have timestamps indicating the date range of the document's relevance. This
information does not apply to files that are always relevant, regardless of the
date range the document covers.
It is important to understand that the
content type and content type properties are the criteria for searching the
content store.
Each file type (defined by its extension,
such as *.doc) has three kinds of properties.
The first kind is properties that can be
derived from the file itself. An HTML file can be scanned for META tags or other
well-defined HTML tags. Microsoft Office files have file properties (found
under File > Properties in an Office application) that can be scanned.
The second kind is properties that you (as
the application developer) define. These can hold any type of information and
are entered when the file is submitted.
The third kind is the minimum set of
properties that any file has: date created, file name, size, etc. If the file
type is not known (i.e., there is no corresponding filter to read the file's
properties), only the second and third kinds of properties can be submitted.
While the first kind (derived from the
file) saves time and is less prone to error, its usefulness depends on the
original content author filling in these properties. If you have a file type
that does not have a filter (and many file types don't), you can still enjoy
the rich content management features and searchability.
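To make the three kinds of properties concrete, here is a small Python model of how they might combine for one submitted file. It is a sketch under stated assumptions, not Site Server code: the `FILTERS` set and the `collect_properties` helper are hypothetical, and later sources simply override earlier ones.

```python
import os

# Hypothetical set of extensions that have a property filter (illustrative only).
FILTERS = {".doc", ".xls", ".ppt", ".htm", ".html"}


def collect_properties(filename, size, created, derived=None, entered=None):
    """Merge the three kinds of properties for one submitted file (toy model)."""
    # Third kind: the minimum set that every file has.
    props = {"FileName": filename, "Size": size, "Created": created}
    # First kind: only usable when a filter can read properties from the file.
    ext = os.path.splitext(filename)[1].lower()
    if ext in FILTERS and derived:
        props.update(derived)
    # Second kind: values the content author enters at submission time.
    if entered:
        props.update(entered)
    return props


doc = collect_properties("spec.doc", 2048, "1999-09-15",
                         derived={"Title": "Spec"}, entered={"Topic": "CM"})
pdf = collect_properties("report.pdf", 1024, "1999-09-15",
                         derived={"Title": "Report"}, entered={"Topic": "CM"})
print(doc)  # includes Title, read via the (assumed) .doc filter
print(pdf)  # no Title: .pdf has no filter in this toy FILTERS set
```

The point of the model is the fallback: a file type without a filter still gets the developer-defined and system properties, so it stays searchable.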
Why should you use CM?
The idea of managing files in a
well-structured and searchable manner is not new. There are several very good
(and expensive) file management systems on the market. Used properly, any CM
software will enable you to use documents to their fullest value because you
can search on specific criteria. Most file systems give you a bare minimum of
information about a file, so you would have to open each file to find what you
are looking for.
The idea behind Site Server's CM is that
much of the programming and usability work has already been done. With very
little effort, you can enjoy feature-rich management of your file system.
Adding more content stores, content types, properties, and views into the files
is very easy. Surprisingly easy! You should be able to find all the files you
are looking for, and once you have narrowed the list, you should be able to
pick out the right file without opening each one.
While everyone agrees Content Management is
important, it is rarely used to its full time- and money-saving potential. It
is difficult to have an out-of-the-box solution for web servers because someone
always feels the need to change the look and feel to suit their own purposes.
And with a file server, it is too easy to skip the web-based file submission
and just place the file in the appropriate directory. Content Management
requires consistent use to be worth your time, regardless of whose CM software
you use.
What can CM be integrated with?
Site Server's CM is a collection of tools,
code, components, and LDAP storage. The pieces fit together quickly and well.
You can replace any component (a different upload control, a different
database for storage), but expect to put time into the integration. Think
carefully about when to change and why: 99% of the code is written for you or
can be copied from other files, and every change lengthens the implementation
time.
Access by user ID is one of the common
integrations. If you use P&M, you can control access by membership user. You
can also control access via NTLM. Make sure you understand all of the code
before changing permissions; most of the files contain code for NTLM and
anonymous access.
A natural integration for CM is version
control: the ability to store different versions of the same file so that you
can see the lifetime of a file. This is a common tool in software development.
While the need is real and integration with Site Server's CM is possible,
real-world experience leads me to say that version control is a lot of work.
The upload process would have to update the version control software (via some
API set) as well as supply all the information the version control submission
needs. Some version control software needs a specific user to be authenticated,
which requires authentication in the web-based submission. My last integration
with Microsoft's SourceSafe ended poorly because the SourceSafe COM object
didn't handle authentication passed through the IIS web server. Resolve these
issues before development begins.
Examples of Site Server's Content Management
Site Server CM comes with two working
examples: FPSample and CMSample. FPSample is meant to be a FrontPage
application, while CMSample doesn't need FrontPage; it is straight ASP. To show
you how to use CM, I'll also build a third sample as part of this discussion.
It won't be very complex, but it will show you each step and each gotcha.
All the samples have the same basic visual
elements:
- Content viewed by content type (such as white paper, press release, etc.)
- Submission page (works with IE or Netscape)
- Approval page
- Content that I (the current user context) have submitted
There are many ways to view the content,
but the code that is readily available is based on content type or user
context. If you want other ways to view the content, schedule this into the
development cycle. Remember that searching is based only on content type and
content type properties.
Pieces of Content Management
Content Management is made up of the ASP
pages, the LDAP database, the directory structure, the Index Server, and the
Upload Control. The ASP pages are used to place information into and get
information out of the 'Content Store'. The ASP pages also use support files
including images, stylesheets, included asp function files, and Site Server
Rule files (*.prf). The LDAP database is the storage of the data definitions
for the content store. The name and location of the content store, the content
types and the content type properties are kept in the LDAP database. The actual
information that is searched is not kept in the LDAP database, but instead kept
in the Index Server catalog. The directory structure is the 'content store'.
Each subdirectory in the main directory is either a content type or a
subsidiary function such as upload or approve. You can see where a file is in
the process of getting into the content store by where it is currently located.
A file is uploaded to the upload directory. Once the content type is set, the
file moves to the appropriate content type directory or the approval directory.
The upload control is the component that moves the file from the client-machine
to the server.
Architecture of Content Management
The architecture of Content Management may
appear complex because most of the work is hidden from the user and the
administrator. However, the system is really just moving files from the client
machine to the appropriate content type directory. To submit any file
(regardless of content type or content author), you use the submit.asp page,
which hosts the upload control. The file is uploaded and placed in the /upload
directory, and the web site returns the content type choice page. These choices
are determined by querying the LDAP database for that particular content store.
When a content author chooses a content type, the ASP page queries the LDAP
database for that content type's properties. Once the properties are entered
and the content author submits the form, the content properties are indexed and
the file is moved to the next appropriate directory: either the approve
directory or the final content directory. If the file needs approval, it is
moved from the approve directory to the content directory once it is approved.
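The flow described above can be simulated in a few lines of Python. This is a toy model under stated assumptions: the `ContentStore` class, its method names, and the per-type approval flag are hypothetical, and indexing and the LDAP lookups are left out. It only tracks which directory a file sits in, which is exactly how you can tell where a file is in the process.

```python
from pathlib import PurePosixPath


class ContentStore:
    """Toy model: a file's directory shows where it is in the process."""

    def __init__(self, root, approval_required):
        self.root = PurePosixPath(root)
        # Hypothetical per-type flag: True if files of that type need approval.
        self.approval_required = approval_required
        self.location = {}  # file name -> directory it currently sits in
        self.pending = {}   # file name -> content type awaiting approval

    def upload(self, name):
        # submit.asp: the upload control places the file in /upload.
        self.location[name] = self.root / "upload"

    def submit_properties(self, name, content_type):
        # Content type chosen and properties entered: move the file on.
        if self.location.get(name) != self.root / "upload":
            raise ValueError(f"{name} has not been uploaded")
        if self.approval_required[content_type]:
            self.location[name] = self.root / "approve"
            self.pending[name] = content_type
        else:
            self.location[name] = self.root / content_type

    def approve(self, name):
        # Approval moves the file from /approve to its content type directory.
        content_type = self.pending.pop(name)
        self.location[name] = self.root / content_type


store = ContentStore("/cm", {"PressReleases": True, "ImageLibrary": False})
store.upload("logo.gif")
store.submit_properties("logo.gif", "ImageLibrary")
store.upload("q3.doc")
store.submit_properties("q3.doc", "PressReleases")
print(store.location["q3.doc"])  # /cm/approve
store.approve("q3.doc")
print(store.location["q3.doc"])  # /cm/PressReleases
```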
Installation of Content Management
Installation of Site Server is divided into
three sections: Publishing, Knowledge, and Analysis. Installation of CM
requires that both Publishing and Knowledge are installed. Please refer to
Microsoft’s Support site for the most up-to-date information on installation
steps and service packs.
The default installation installs the
LDAP server with a Microsoft Access database. For the purposes of CM, this is
fine, since CM doesn't rely too heavily on the LDAP server. If you are also
using P&M on a heavily loaded web server, you should install the LDAP server
with SQL Server instead of Microsoft Access.
What is a Content Store?
A content store is the top-level directory
structure. If you need to move, reinstall, or back up the content store, you
should back up the directory structure and the LDAP database. You will also
need to back up the Index Server catalog; be aware that the catalog may contain
more files than just your content store, so how much of the Index Server to
back up is a choice you'll have to make.
Create a New Content Store
Since the Content Store is just the LDAP
database and the file directory, these are the objects you must create and
manage. The first step in creating the store is creating the appropriate
objects in the LDAP directory. This is partially automated for you by
makecm.vbs, a VBScript file that creates the LDAP objects. While most of the
file is not meant to be edited, the top portion (labeled “Start of configurable
part”) can be. This portion creates the content types and the content type
properties. The content store application that holds them is specified through
the command-line switches used when makecm.vbs is executed.
The following code is necessary to create
each content type and its associated properties:
' Content type properties (stored in LDAP)
Set DocList.ImageLibrary.Fields = CreateObject(DictionaryProgId)
DocList.ImageLibrary.Fields.ContentAuthor = "Yes"
DocList.ImageLibrary.Fields.Size = "Integer"
DocList.ImageLibrary.Fields.Type = "Yes"
DocList.ImageLibrary.Fields.Title = "Yes"
' Properties with extra settings get their own dictionary
Set DocList.ImageLibrary.Fields.CreateDate = CreateObject(DictionaryProgId)
DocList.ImageLibrary.Fields.CreateDate.Type = "StringTime"
Set DocList.ImageLibrary.Fields.Topic = CreateObject(DictionaryProgId)
DocList.ImageLibrary.Fields.Topic.Type = "Vocabulary"
' Settings for the ASP pages and approval
DocList.ImageLibrary.Directory = "ImageLibrary"
DocList.ImageLibrary.HTMLFile = "ImageLibrary.asp"
DocList.ImageLibrary.Description = "ImageLibrary"
DocList.ImageLibrary.Approve = 0
The DocList
object is a "Commerce.Dictionary". This might imply that you need Site Server
Enterprise for the Commerce Dictionary, but you don't. While it is not the
standard Scripting.Dictionary object, it works much the same way.
The first set of settings (the Fields
entries) describes the content type properties as they will be added to the
LDAP database. The second set (Directory, HTMLFile, Description, and Approve) is
also added to the LDAP server but deals with the ASP pages and approval.
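If it helps to see the two groups of settings side by side, the ImageLibrary definition can be mirrored with plain Python dictionaries. This is a hypothetical sketch: `define_content_type` is not part of Site Server, and treating `Approve = 1` as "approval required" is an assumption based on the `Approve = 0` setting in the example.

```python
def define_content_type(name, fields, approve=False):
    """Hypothetical mirror of one DocList entry from makecm.vbs."""
    return {
        # First group: content type properties stored in LDAP.
        "Fields": dict(fields),
        # Second group: settings for the ASP pages and approval.
        "Directory": name,
        "HTMLFile": f"{name}.asp",
        "Description": name,
        "Approve": 1 if approve else 0,  # assumed: 1 means approval required
    }


doc_list = {}
doc_list["ImageLibrary"] = define_content_type(
    "ImageLibrary",
    {
        "ContentAuthor": "Yes",
        "Size": "Integer",
        "Type": "Yes",
        "Title": "Yes",
        "CreateDate": {"Type": "StringTime"},
        "Topic": {"Type": "Vocabulary"},
    },
)
print(doc_list["ImageLibrary"]["HTMLFile"])  # ImageLibrary.asp
```

Adding another content type is one more `define_content_type` call, which matches the need to repeat the makecm.vbs section once per content type.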
For every content type, you will need to
repeat this section with the appropriate modifications. Before running the
script, make sure the directory for the content store has been created. When
you execute the script, you specify the directory location, the server, the
LDAP server, the IIS virtual root, the LDAP application name, and the user name
and password for the LDAP security context. After the script has run,
everything is done except for the ASP scripts. There is now a config.inc file
at the root of your content store. If you open it, you will see something
similar to this:
<%
const DPSVRoot = "/cm"
const LDAPServer = "dina001:1002"
const ApplicationName = "dinacm"
const FromClause = " FROM ContentManagement..SCOPE(' ""Z:\dinacm"" ') "
%>
The first constant is the IIS virtual root.
The second is the LDAP server and port. The third is the application name used
in the LDAP server. The fourth is the “from” clause used in queries against the
Index Server.
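A view page's query against Index Server combines a SELECT list, the FROM clause from config.inc, and a WHERE clause. The sketch below only assembles that string so you can see its shape; the column names and the `build_query` and `from_clause` helpers are assumptions, and the `catalog..SCOPE()` form is copied from the config.inc example above.

```python
def from_clause(catalog, scope_dir):
    # Same shape as FromClause in config.inc: catalog..SCOPE(' "path" ')
    return f" FROM {catalog}..SCOPE(' \"{scope_dir}\" ') "


def build_query(columns, catalog, scope_dir, where=""):
    """Assemble an Index Server SQL string (hypothetical helper)."""
    sql = "SELECT " + ", ".join(columns) + from_clause(catalog, scope_dir)
    if where:
        sql += "WHERE " + where
    return sql.strip()


print(build_query(["FileName", "Size"], "ContentManagement", r"Z:\dinacm",
                  where="Size > 1000"))
```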
The last step is to generate the ASP pages
that connect to the content store. To generate them correctly, first copy the
following files from the CMSample directory into the root of your content
store:
| File | Purpose |
|---|---|
| Saveprops.asp | Save properties of content type |
| Getprops.asp | Get properties of content type |
| Common.asp | Common functions and values |
| Submit.asp | Page with upload control |
| Type.asp | List of content types |
| Cpview.asp | Content author’s list of submissions |
| Approve.asp | Approval page |
| AdminList.asp | [CHECK THIS] List of files awaiting approval |
| Exists.asp | Checks ability to overwrite file |
| Repost.asp | Repost |
| Files.asp | List of files |
| Filename.asp | List of properties on files |
| Menu.txt | Navigation in this ASP application |
| Deploy.asp | Used to propagate from staging to live |
| Mydocs.prf | Rule set to find content author’s submissions |
| Approve.prf | Rule set to find files that are not yet approved |
| Defaultviewtemplate.ast | Template used to create pages for each content type; has hard-coded URL values of http://localhost/cmsample/ |
To create each content type’s ASP pages,
open the HTML version of the Site Server Administration tool. While you can
view the content type information from the MMC version, that version won’t
create the ASP pages. Log in to the HTML Site Server Admin site for Publishing,
then:
1) Select the content store and click the ‘Properties’ button.
2) Select ‘Content Types’.
3) For each content type:
a. Select the content type and click the ‘Properties’ button.
b. Verify that all the information is correct, check the following boxes, and
click ‘Submit’:
i. ‘Generate Properties Form’
ii. ‘Generate View Page’
When you are done, you should have three
new files for each content type (where ContentType is the content type's name):
| File | Purpose |
|---|---|
| ContentType.asp | Properties page |
| ContentTypeView.asp | View page |
| ContentType.prf | Rule for finding files from Index Server |
A defaultviews.txt file is also created.
For this store's ImageLibrary, PressReleases, and Headlines content types, it
contains:
ImageLibraryView.asp|ImageLibraryView.asp
PressReleasesView.asp|PressReleasesView.asp
HeadlinesView.asp|HeadlinesView.asp
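Because the generated file names follow a pattern, a short script can tell you whether generation finished for every content type. The helper names below are hypothetical, and the format of each defaultviews.txt line (the view page name twice, separated by "|") is taken only from the sample above.

```python
def generated_files(content_type):
    """Pages the HTML admin generates for one content type."""
    return [f"{content_type}.asp", f"{content_type}View.asp", f"{content_type}.prf"]


def defaultviews_text(content_types):
    """One 'page|page' line per content type, as in the sample file."""
    return "".join(f"{ct}View.asp|{ct}View.asp\n" for ct in content_types)


def missing_files(content_types, existing):
    """List generated files that are not yet in the content store root."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in existing}
    return [f for ct in content_types for f in generated_files(ct)
            if f.lower() not in present]


types = ["ImageLibrary", "PressReleases", "Headlines"]
print(defaultviews_text(types), end="")
print(missing_files(["Headlines"], ["headlines.asp", "HeadlinesView.asp"]))
```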
You will also need to copy or create your
own \images\ directory and modify menu.txt for your application’s navigation.
Tips/Tricks
1. Install SS and CM on a FAT partition to begin with. This makes everything
much easier until you know the code, because FAT has no file-level permissions
to get in your way.
2. CM uses NTLM everywhere. You should have a solid understanding of NTLM
and P&M if you plan to use authentication. If you don't plan to use
authentication, rip it out of the code. Recognize that if you do use NTLM, you
can only use Internet Explorer as your browser.
3. The original version of Content Management has problems with some of
the data types in the generated view and property pages. Make sure you can
either fix the code or use different data types before getting too far along.
The CM Sample uses text and LDAP lists only. None of the other data types are
used.
4. The Defaultviewtemplate.ast file has to be modified. It has a hard-coded
virtual directory of /cmsample/. This isn't obvious, but if you don't change it
you'll get errors when you try to run your content store.
5. The 'Type' column from the rule file throws an error on the view page.
Remove references to the Type column in the SQL query and the result set.
6. There is no default.htm or default.asp. You will need to
create one or turn directory browsing on.
7. If you want to use the FP Sample or copy its code, make sure to change the
ADO ProgID by removing "1.5" from the end of it. The symptom of this error is
a "Server.CreateObject Failed" error.
8. Take a look at the other files in the CM and FP Samples. There are
files that are not necessary to copy in order to get your store up and running
but the extra files may be just what you are looking for. For example, the
CMSample has an alldocs.asp file that shows all docs in the store.
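Tips 4 and 7 are both simple text substitutions, so they can be applied to copied sample files with a small script. This sketch assumes your virtual root replaces /cmsample/ and that the only versioned ProgIDs you need to fix end in ".1.5" (as with the ADO ProgID in tip 7); `fix_sample_source` is a hypothetical helper, not a Site Server tool.

```python
import re

# A quoted two-part ProgID ending in ".1.5", e.g. "ADODB.Connection.1.5".
VERSIONED_PROGID = re.compile(r'"([A-Za-z]+\.[A-Za-z]+)\.1\.5"')


def fix_sample_source(text, new_root):
    """Apply tips 4 and 7 to the text of a copied sample file."""
    # Tip 4: replace the hard-coded /cmsample/ virtual directory.
    text = text.replace("/cmsample/", f"/{new_root.strip('/')}/")
    # Tip 7: drop the ".1.5" suffix so the version-independent ProgID is used.
    return VERSIONED_PROGID.sub(r'"\1"', text)


src = ('Set conn = Server.CreateObject("ADODB.Connection.1.5")\n'
       '<!-- #INCLUDE VIRTUAL="/cmsample/Awards.prf" -->')
print(fix_sample_source(src, "cm"))
```

Run it on copies and diff the results before replacing the originals; a blind substitution can also touch text you did not intend to change.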
Creating your own view page
The view pages that the Site Server HTML
Admin creates are based on content type. You will probably want to add more
view pages with different criteria, for example, finding all files where
property X = value Y. To make a new view page, copy one of the view pages that
the HTML Admin generated. When you open the view page in Notepad (not IE), you
will see the code for the Ruleset object that contains the rule set file name.
You will also notice that the same rule file is pulled in with an #INCLUDE
directive. You only need one of these references; I recommend deleting the
object and just using the include statement.
<OBJECT ID="FormatRuleset1" WIDTH=487 HEIGHT=237
  CLASSID="CLSID:F78EAED2-F867-11D0-9F89-0000F8040D4E">
  <PARAM NAME="_ExtentX" VALUE="12885">
  <PARAM NAME="_ExtentY" VALUE="6271">
  <PARAM NAME="RecordSetURL" VALUE="http://localhost/cmsample/PressReleases.prf">
  <PARAM NAME="FormattingText" VALUE="<% Response.Write "<TR>" %>
<% Response.Write "<td VALIGN=baseline WIDTH=42><img SRC=images/globul2a.gif WIDTH=20 HEIGHT=20 SPACE=11></td><td><font face=Verdana size=2>" %>
<% Response.Write " <A HREF=""" & GetDocUrl(MemRecordSet("path"),MemRecordSet("ContentType")) & """>" & getfilename(MemRecordSet("FileName")) & " " & MemRecordSet("Size") & " Bytes </A></font></td>" %>
<% Response.Write "</TR>" %> ">
  <PARAM NAME="DisplayType" VALUE="3">
</OBJECT>
<!-- #INCLUDE VIRTUAL="/cmsample/PressReleases.prf" -->
If you open this file in Visual InterDev
(VID), the object will appear as a design-time control (DTC), and you will have
to edit it through the DTC interface. This may be more cumbersome than you want
if you are not familiar with DTCs. Editing the file in Notepad won't cause
problems as long as you only change values and don't delete any parameters or
object attributes.
Once you have changed the prf file name,
you may need to alter the code that prints out the result set. Because rule
files generated by Rule Manager differ only in their WHERE clause, every result
set has the same column names, so you can write a single included file that
loops through any of them, and the code can be very generic. The result set
also has a lot of columns; you should reduce them to just the ones you need.
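The logic of that single generic include can be sketched in Python: because every result set has the same columns, one routine can render any of them if you tell it which columns to keep. Each row here is a dictionary standing in for a MemRecordset row; `render_rows` is hypothetical, and the HTML escaping is an addition that the ASP samples do not necessarily perform.

```python
from html import escape


def render_rows(rows, columns):
    """Render any result set as table rows, keeping only the listed columns."""
    out = []
    for row in rows:
        # Missing columns render as empty cells instead of raising an error.
        cells = "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
        out.append(f"<tr>{cells}</tr>")
    return "\n".join(out)


rows = [{"Title": "Q3 <draft>", "Size": 2048, "ContentType": "PressReleases"}]
print(render_rows(rows, ["Title", "Size"]))
```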
If you would prefer to modify the code for
each view page, you need the basics for a result set: a loop that runs until
the end of the result set (EOF) is reached, plus code to print each column's
value for each row. The result set is always called ‘MemRecordset’. The
following is some simple example code for the view file, placed after the rule
file include:
<%
' Write the title, author, and a link for every document in the result set
Do While Not MemRecordset.EOF
    Response.Write MemRecordset("Title") & "<br>"
    Response.Write MemRecordset("ContentAuthor") & "<br>"
    Response.Write " (<A HREF=""" & _
        GetDocUrl(MemRecordset("path"), MemRecordset("ContentType")) & _
        """>" & getfilename(MemRecordset("FileName")) & _
        " " & MemRecordset("Size") & " Bytes</A>)"
    MemRecordset.MoveNext
Loop
MemRecordset.Close
%>
MemRecordset is opened as a persistent,
read-only cursor. You will also want to change the look and feel of the page to
meet your needs. If you want to return a different MIME type such as XML, you
only need to change the view page, not the rule file.
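Because only the view page decides the output format, the same rows can be written out as XML instead of HTML. In an ASP view page you would also set `Response.ContentType` to an XML type; the Python sketch below shows only the serialization, with hypothetical element names (`documents`, `document`) and rows modeled as dictionaries.

```python
import xml.etree.ElementTree as ET


def render_xml(rows, columns):
    """Serialize result-set rows as XML, one child element per kept column."""
    root = ET.Element("documents")
    for row in rows:
        doc = ET.SubElement(root, "document")
        for col in columns:
            ET.SubElement(doc, col).text = str(row.get(col, ""))
    return ET.tostring(root, encoding="unicode")


print(render_xml([{"Title": "Q3", "Size": 2048}], ["Title", "Size"]))
```

Column names are used directly as element names, which works for names like `Title` and `ContentType` but would need mapping for names that are not valid XML.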
Creating a Rule File (*.prf)
Most of the Content Management code is
straightforward. The only tricky part is the rule file. If you are familiar with
ADO/SQL query code, the rule file code will look familiar, though not exactly as
you have seen it before. You should create your rule file with the Rule Manager
application in the Site Server/Tools program group. Once you have created the
rule file, you can edit it with Notepad, but once you have edited it outside of
Rule Manager, don't expect to be able to re-edit it with Rule Manager.
When you open the Rule Manager, you will
need to specify the ‘application server’, which is the LDAP server. You can
create as many rules as you want, and have them applied in a specific order.
Each rule in the rule file will have a name for easy referencing. For most of
the Content Management work, you will most likely only have one rule in a rule
file.
The rule file ‘awards.prf’ provided with
CMSample is a good example. Open it in Rule Manager. There is one rule, named
Awards. Since it is the only rule, it has a priority of 1. The rule is visible
and reads:
“Select content
where approved contains yes
and where ContentType contains awards”
In Rule Manager, the underlined words and
phrases in the rule are selectable. If you have created rules in Outlook, you
will be familiar with how CM's Rule Manager works. There is a four-tab property
sheet to help you create your rule set. The rule set is generally divided
between AUO (Active User Object)/user properties and content type properties,
which lets you match users to content, or select based on user or content alone.
Once the rule file is generated, open it
to take a look. For the rule above, the specific code in the rule file is:
Do
    bstrSSPMQuery = ""
    bstrMemLog = "Awards+"
    fSSPMRunVBScript = FALSE
    bstrMemLog = bstrMemLog & "Awards+"
    If Len(bstrSSPMQuery) > 0 Then
        bstrSSPMQuery = bstrSSPMQuery & " OR "
    End If
    ' Condition 1: approved contains "yes"
    bstrSSPMQuery = bstrSSPMQuery & "(CONTAINS(approved,'yes*') >0)"
    If Len(bstrSSPMQuery) > 3 Then
        bstrSSPMQuery = bstrSSPMQuery & " AND "
    End If
    ' Condition 2: ContentType contains "awards"
    bstrSSPMQuery = bstrSSPMQuery & "(CONTAINS(contentType, 'awards*') >0)"
    ' Stop once this rule has produced a query
    If ((Len(bstrSSPMQuery) > 0) OR (fSSPMRunVBScript = TRUE)) Then
        Exit Do
    End If
Loop While (False)
The rest of the file is the same regardless
of the rule. To change the sort order (ORDER BY clause), append it to the value
of “bstrSSPMQuery”.
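The generated rule boils down to joining `CONTAINS` conditions with AND and, optionally, appending an ORDER BY clause. The Python function below mirrors that shape under stated assumptions: `rule_query` is hypothetical, it uses one consistent spacing (the generated file is slightly inconsistent), and it does not model multiple rules or the VBScript-rule flag.

```python
def rule_query(conditions, order_by=None):
    """AND together CONTAINS clauses the way the generated Awards rule does."""
    # Each condition is (property, value); the trailing * makes it a prefix match.
    clauses = [f"(CONTAINS({prop}, '{value}*') >0)" for prop, value in conditions]
    query = " AND ".join(clauses)
    if order_by:
        # Appending to the query is how the sort order is changed.
        query += " ORDER BY " + order_by
    return query


print(rule_query([("approved", "yes"), ("contentType", "awards")], order_by="FileName"))
```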
You may also want to generate rule files
automatically. For example, to generate rule files and view files on the fly,
you need to understand the rule file syntax and know which part of the code to
change; creating and inspecting several rule files with Rule Manager is the
best way to learn this. Treat automated generation as an enhancement to your
process, not as the normal way of working.
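If you do automate rule generation, the varying part is small: the rule name and the list of conditions. This hypothetical Python generator emits a VBScript block in the same shape as the Awards rule; it simplifies the generated code (it omits the OR handling between rules and the duplicate `bstrMemLog` assignment), so compare its output against a rule file saved by Rule Manager before relying on it.

```python
def rule_fragment(rule_name, conditions):
    """Emit a VBScript Do...Loop block shaped like the generated Awards rule."""
    lines = [
        "Do",
        '  bstrSSPMQuery = ""',
        f'  bstrMemLog = "{rule_name}+"',
        "  fSSPMRunVBScript = FALSE",
    ]
    for prop, value in conditions:
        # Join each condition to the previous one with AND.
        lines.append("  If Len(bstrSSPMQuery) > 0 Then")
        lines.append('    bstrSSPMQuery = bstrSSPMQuery & " AND "')
        lines.append("  End If")
        lines.append(f"  bstrSSPMQuery = bstrSSPMQuery & \"(CONTAINS({prop}, '{value}*') >0)\"")
    lines += [
        "  If ((Len(bstrSSPMQuery) > 0) OR (fSSPMRunVBScript = TRUE)) Then",
        "    Exit Do",
        "  End If",
        "Loop While (False)",
    ]
    return "\n".join(lines)


print(rule_fragment("Awards", [("approved", "yes"), ("contentType", "awards")]))
```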
Summary
Content Management is a web application
that eases the file management burden by allowing files to be stored in their
native format (doc, xls, pdf, etc.) while making them available from a web
page. The content store is the application unit and is made up of ASP files and
include files, the Index Server, the LDAP database, and the upload control. CM
is flexible: you can modify any or all of the pieces to fit your needs while it
remains an accessible web application.