Five Principles of Distributed Development

The answer to this problem is what I call cooperative applications. These are Web applications built with a few conventions that facilitate runtime cooperation between programs. In the sort of loosely coupled and loosely managed network environment we're talking about, a bare minimum of policy is essential. As a practical matter, coordination and agreement between development teams must be kept to a minimum. Past experience with object-oriented development and reuse suggests that complicated policies and long lists of rules won't survive contact with actual development.

I've formulated five basic principles that promote cooperation between distributed applications. They are general in nature and can be implemented on any Web-hosting platform. Thus, while my examples are largely drawn from Microsoft technology and tools, nothing in the principles prevents an implementation in Java on Unix, for example.

The Five Principles of Cooperative Network Application Development are as follows:

1. Applications Will Be Built From Coarse-Grained Services: applications will be implemented by coordinating and combining the results obtained from server-based modules, termed services, that are larger than individual components and each of which answers a specific question or handles a specific problem or task.
2. Services Will Be Discovered By Querying Directories: applications will find the location and name of the services they need at runtime by querying a directory. They will ask not for a specific implementation, but for any service implementation that addresses their needs.
3. Services Will Be Provided As Self-Describing Data: applications will deal with services by exchanging structured data. The data will be written according to vocabularies defined for the problem or task that the service addresses.
4. Services Will Be Enlisted On A Transient Basis: applications will find and use services in a small number of round trips (preferably one) and will not require state to be held between round trips. Neither application nor service will assume any long-term availability of the other partner in the exchange.
5. Services Must Support Extension And Degrade Gracefully: services must take future enhancement into account, both of their own logic and exchange formats and of those of other applications and services. When an application or service encounters a new version of exchange data, it should make as much use of the data as possible. Applications and services must never break if they do not receive exactly the format they expect.

Before considering each principle in greater detail, I should note that implementing these principles requires slightly more coordination in practice. I will describe some conventions for implementing each principle later in this presentation. You are free to come up with different conventions to reach these goals in your own organization. Even with the conventions, the degree of commonality that must be enforced throughout the organization is small. If general agreement is reached at a high level, individual programming teams can be given a great deal of freedom in selecting technologies and structuring applications without compromising the Five Principles. I believe that, as a practical matter, this is the most that can be achieved on a repeatable basis. Fortunately, you will see that a great deal can be accomplished with this small beginning.

Services

The single most important concept in this development philosophy is that of services. I define a service as an arbitrary collection of application code that is confined to one tier and accomplishes one well-defined task. A service is bigger than a component, but smaller than an entire application. In Web applications, a service can generally be implemented as a single Active Server Page.
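The services in this presentation are Active Server Pages; as a language-neutral illustration of "one endpoint, one well-defined task, XML out", here is a minimal sketch using Python's standard http.server module. The Order vocabulary, its element and attribute names, the in-memory ORDERS data, and the helper names are all hypothetical; only the shape of the idea (a single page that answers one kind of question and returns XML, not HTML) comes from the text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
import xml.etree.ElementTree as ET

# Hypothetical back-end data; inside a service, any storage technology will do.
ORDERS = {
    "1001": {"customer": "Widgets Ltd", "items": [("W-17", 3), ("W-42", 1)]},
}


def order_response(order_id: str) -> bytes:
    """Answer one question (what is in this order?) in a hypothetical Order vocabulary."""
    root = ET.Element("Order", ID=order_id)
    order = ORDERS.get(order_id)
    if order is None:
        ET.SubElement(root, "Error").text = "unknown order"
    else:
        ET.SubElement(root, "Customer").text = order["customer"]
        for sku, qty in order["items"]:
            ET.SubElement(root, "LineItem", SKU=sku, Quantity=str(qty))
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


class OrderService(BaseHTTPRequestHandler):
    """A single endpoint that does one well-defined task and speaks only XML."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        body = order_response(query.get("id", [""])[0])
        self.send_response(200)
        # XML, not HTML: presentation is left entirely to the consumer.
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (not started automatically here)."""
    HTTPServer(("localhost", port), OrderService).serve_forever()
```

Started with serve(), a request for http://localhost:8080/?id=1001 would return the Order document with a text/xml content type. Keeping the XML-building function separate from the HTTP handler also makes the service's vocabulary easy to test without a running server.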
Not every ASP will be a service. A service models one useful part of the business domain; it should describe some core activity or entity of your organization. A manufacturing organization, then, would have services to describe assembly lines or cells, as well as services to describe orders, customers, and products. If you have segmented your application well, each service can represent the queries it accepts and the results it returns using a pair of well-known data formats. An order will typically have a structure to denote individual line items, while an assembly line will describe its processing stations and current status.

Services will communicate via HTTP using an open standard for data representation that I will describe in much greater detail shortly. A client of a service need only know where to find the service and what data formats the service requires. The client needs no knowledge of how information is obtained, stored, and processed. Within a service, however, you are free to use whatever technology you wish, because a service is wholly implemented by a single development team. That team controls what technology is used, specifying, for example, an Oracle RDBMS for storage and DCOM for communication between the ASP and an application server. A service can be as open or as proprietary as its efficient implementation requires. When crossing service boundaries, however, we need to stick to open standards and the Five Principles.

Finding Services through Directories

The first thing a prospective user of a service needs is the URL for the service. Common techniques in the past have included machine-specific registry settings or initialization files. Not surprisingly, this sort of practice is fast giving way to the use of directory services. Directory services, usually referred to simply as directories, are scalable stores of information describing resources accessible through the network.
A directory is accessible to any machine or program on the network. Applications connect to their directory to obtain the information they need to find and use network resources. Using directories gives us two important features of our philosophy: location abstraction and dynamic resource discovery. An application that needs information about orders should not know about particular ASPs and servers. Instead, it should ask the directory for the URL of any service that takes queries about orders and replies in terms of an order vocabulary. When resources are taken off the network, their entries are removed from the directory. Since directories are hierarchical, we can control the visibility of services. We might introduce a new service by placing an entry in the local organization's portion of the directory, then move it higher as it proves its worth.

Communicating through Self-Describing Data

Conventional data representations are brittle. Get a field out of order or overrun a field by a single byte, and the data is unusable. More importantly, every program that uses a particular representation has to be on the same version and implement the format exactly. On the sort of network we're talking about, we'll frequently run into forward and backward compatibility issues. A service may have a minor bug in how it implements a given format, small enough that a human could understand what was being communicated, but an error nonetheless. If we use a tagged data format to build our structures, our programs can read the tags and know exactly what they are seeing at any given time.

Use Services and Release Them

HTTP is stateless by design. Preserving session state between pages is therefore a challenge for Web application developers, particularly for Web applications that must be scaled through the use of server farms. Worse, we'll be taking individual Active Server Pages from within existing applications and using them to create new applications.
It is therefore unreasonable to expect a service to hold state for us. The consumer of a service, whether it is the end-user client or an intermediate service, is the entity that should hold state, because it determines how the partial results from services are knit together to form a complete answer.

When building a cooperative application, then, make sure each HTTP round trip works like a transaction. A service should perform an atomic unit of work, and the data on which a service relies should be in a consistent state before and after processing a request. That way, if a client loses communication with a service, the service will be undisturbed. The team responsible for the service won't mind your application using it, because your application cannot disrupt theirs. Similarly, if your application loses a particular service, it can simply go back to the directory to locate another implementation of the vocabulary in which it is interested.

Promote Extension, Tolerate Degraded Data

Different organizations will look at the same entity in different ways. Almost any part of a business will want to talk about customers, for example, but each will have its own take on the concept. If we use tags wisely, applications will be able to receive data in an unfamiliar format and extract the parts they recognize.

XML has the advantages of wide acceptance and simplicity. Therefore, we'll use XML to transmit data between clients and services. Almost any computing platform can process XML, because almost any platform has facilities for processing text. If a ready-made parser is unavailable on a platform, one can be constructed. Text is less platform dependent than other data types; it changes to track human languages rather than processors. As you will see shortly, a few simple conventions will let us write XML vocabularies that promote cooperation between disparate organizations and their applications.
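Extracting the parts a client recognizes can be sketched with Python's xml.etree.ElementTree. This is a minimal sketch under stated assumptions: the Customer document, its Name and Phone properties, the Loyalty element, and the default values are hypothetical. The consumer reads children by tag, skips elements it does not know instead of failing, and falls back on defaults for anything the sender left out.

```python
import xml.etree.ElementTree as ET

# Properties this (hypothetical) client understands, with the defaults it
# falls back on when a sender leaves them out.
KNOWN_FIELDS = {"Name": "", "Phone": "unknown"}


def read_customer(xml_text: str) -> tuple[dict[str, str], list[str]]:
    """Pull the recognized properties out of a Customer-like document.

    Unknown child elements are collected and skipped rather than treated
    as errors, so a newer or differently specialized sender does not
    break this consumer.
    """
    root = ET.fromstring(xml_text)
    fields = dict(KNOWN_FIELDS)
    ignored = []
    for child in root:
        if child.tag in KNOWN_FIELDS:
            fields[child.tag] = (child.text or "").strip()
        else:
            ignored.append(child.tag)
    return fields, ignored


doc = """<Customer>
  <Loyalty tier="gold"/>
  <Name>Acme Corp</Name>
</Customer>"""

fields, ignored = read_customer(doc)
print(fields, ignored)  # {'Name': 'Acme Corp', 'Phone': 'unknown'} ['Loyalty']
```

Returning the list of ignored tags is optional; it simply lets a client log what it chose to skip instead of silently discarding it.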
Building Business Services in ASP

When it comes to implementing the Five Principles using ASP, nothing is more important than divorcing presentation issues from data. It is vital that our services speak XML, not HTML. An ASP implementing a service is expected to be able to parse any of the request vocabularies listed with it in the directory and return the appropriate XML response vocabulary. The ASP becomes an opaque package hiding the implementation details from the client. Within the boundaries of the ASP, I strongly advocate using software components to speed development.

Building an all-XML ASP is not as difficult as it sounds. Visual Studio generates an HTML shell for an ASP by default. If you serve the shell without making any changes, the receiving browser gets a well-formed HTML page. If you want to return only an XML document, strip away all the tags surrounding the script tags. Be sure that when you call Response.Write (or use the shorthand <%= something %>), you are returning XML text. If you wish to use the default style sheet for XML that is built into IE 5 (a very useful debugging technique), be sure to set Response.ContentType = "text/xml". You should also configure your server with the XML MIME type so that any XML files served from disk are accompanied by the proper HTTP header.

Directory Services

Directory services provide a hierarchy of information regarding the resources deployed on a network. Usually, that hierarchy mirrors the structure of the organization served by the network. This allows a client of the service to narrow its search scope to the part of the network of interest to it (which also decreases the impact of the search on the service itself). The service provides one or more APIs to the store and is optimized for read access. While a directory obviously needs to accept write operations, the resources of the network will typically change far less often than the directory is queried.
This allows directory designers to make implementation choices favoring speed over flexibility. Unlike a relational database, which we query in ad hoc and possibly unpredictable ways, a directory expects a structured query that indicates a path through the hierarchy to the desired data. Two major directory services are Active Directory in Windows 2000 and Netscape Directory Server. We will be referring to Active Directory in our examples. Both support LDAP, an open-standard protocol that is rapidly becoming the de facto standard for directory access. A directory service has a schema that describes the permissible entries in the directory. A schema has classes that describe the kinds of objects the directory can hold and properties that may be used to describe the attributes of any particular object.

What is LDAP?

LDAP, the Lightweight Directory Access Protocol, is a simple protocol for querying compliant directory services and uniquely naming the objects contained in the directory hierarchy. Some directory objects are containers, in that they may contain other objects as well as simple properties. Objects are given distinguished names (DNs), which combine the common name (CN) of the object with elements that indicate the object's location within the hierarchy. Thus, a DN includes the names of the directory objects that contain the object in question. For example, one DN might be CN=PrintSrv, OU=Sales, DC=Widgets. This means that the object PrintSrv, perhaps a print server, belongs to the Sales organizational unit (OU), which lies in the domain whose domain component (DC) is Widgets. This DN locates and names a specific machine in our network. LDAP also defines a syntax for queries and language bindings for the access protocol.

LDAP Queries

When we wish to retrieve information about a particular object from the directory, we bind to the object and then query it for its properties.
At a minimum, we need to provide a base that anchors our search in the directory. To further restrict the search, we might add a filter, specify which properties we want to receive, limit the scope of the search to some portion of the directory tree, and express some preferences as to how the search is carried out, such as a limit on the number of items returned or a time limit for the search. Everything shown in the syntax on our slide except the base DN (<base DN> in the syntax, "<LDAP://sales/DC=widgets>;" in the example) is optional. The base DN is needed to specify the root of the search. The example given in the slide provides a DN naming the sales container in the Widgets domain (note that we are binding to a container, not an atomic object). Since we don't want to hear about everything in that container, we limit the search with a filter: give us every object whose class (from the schema) is User AND whose mail property (email address) begins with the letter 's' followed by any other characters. We are only interested in the common name and email address (the cn and mail properties). Finally, we limit the search to the subtree whose root is our base DN. This is why we specified a container for our search: we have a general idea of where to look in the directory's tree, and we want to search from that point.

COM Components

LDAP provides a C language API for access, but a much more productive approach for ASP programmers is to use Active Directory Service Interfaces (ADSI), a family of COM components that greatly simplify access to LDAP-compliant directory services. ADSI is built on the idea of a service provider, which is a COM component that encapsulates a particular directory service. These need not be LDAP directories; indeed, ADSI includes providers for the current NT directory service in addition to Active Directory. We will focus on LDAP, however.
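The four parts of the slide's query (base, filter, attribute list, scope) can be illustrated without a real directory. The following Python toy is a sketch under loud assumptions, not ADSI or an LDAP client: entries live in a dictionary keyed by DN, DNs are assumed to contain no escaped commas, the filter is an ordinary Python predicate standing in for LDAP filter syntax, and the sample entries and addresses are hypothetical. The scope names base, onelevel, and subtree are the standard LDAP search scopes.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical directory entries keyed by distinguished name (DN).
DIRECTORY = {
    "OU=Sales,DC=widgets": {"objectClass": "organizationalUnit"},
    "CN=Sam,OU=Sales,DC=widgets": {"objectClass": "User", "cn": "Sam", "mail": "sam@widgets.example"},
    "CN=Ann,OU=Sales,DC=widgets": {"objectClass": "User", "cn": "Ann", "mail": "ann@widgets.example"},
    "CN=Sue,OU=East,OU=Sales,DC=widgets": {"objectClass": "User", "cn": "Sue", "mail": "sue@widgets.example"},
    "CN=Stan,OU=Ops,DC=widgets": {"objectClass": "User", "cn": "Stan", "mail": "stan@widgets.example"},
}


def normalize(dn: str) -> str:
    """Lower-case a DN and drop spaces around commas (escaped commas not handled)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def in_scope(dn: str, base: str, scope: str) -> bool:
    """Apply the three LDAP search scopes relative to the base DN."""
    dn, base, scope = normalize(dn), normalize(base), scope.lower()
    if scope == "base":
        return dn == base
    if scope == "onelevel":
        return dn.partition(",")[2] == base  # parent DN equals the base
    if scope == "subtree":
        return dn == base or dn.endswith("," + base)
    raise ValueError(f"unknown scope: {scope}")


def search(base: str, filter_fn: Callable[[dict], bool],
           attributes: list[str], scope: str = "subtree") -> list[dict]:
    """Return the requested attributes of every in-scope entry that passes the filter."""
    return [
        {name: entry.get(name) for name in attributes}
        for dn, entry in DIRECTORY.items()
        if in_scope(dn, base, scope) and filter_fn(entry)
    ]


def sales_filter(entry: dict) -> bool:
    """Same intent as the slide's filter (&(objectClass=User)(mail=s*))."""
    return entry.get("objectClass") == "User" and fnmatchcase(entry.get("mail", ""), "s*")


users = search("OU=Sales,DC=widgets", sales_filter, ["cn", "mail"], "SubTree")
print(users)
# [{'cn': 'Sam', 'mail': 'sam@widgets.example'}, {'cn': 'Sue', 'mail': 'sue@widgets.example'}]
```

Sue is found only because the scope is a subtree; with onelevel, only objects directly inside the Sales container would be considered, and Stan is excluded in either case because he lives outside the base DN.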
The ADSI provider for LDAP lets us bind to a particular object and query its properties using LDAP query syntax. An interesting facet of ADSI is that it also includes the ADO Data Source Object for ADS, a component that puts a relational face on Active Directory. This allows us to make ad hoc queries using either LDAP or SQL syntax. In ADSI 2.5 beta, however, the ADS DSO is read-only; if we wish to change a property of an object found with the DSO, we need to use the LDAP provider to bind to the object before we can write the new property value.

Some Conventions

We need a few conventions to implement the second principle. All teams working with the Five Principles within an organization need to follow these conventions if they are going to be able to find the services they need. While I will lay out the conventions I use, you are free to design your own, so long as they are consistent with the idea of locating services. The idea behind the conventions is to provide the information an application needs to find a service in a consistent way. An excellent way to promote the use of the conventions is to write a utility component that implements them on top of ADSI and the ADS DSO.

You need to provide a known class to contain the information for finding your services. Your objects must list all the vocabularies the service can speak, since that is how prospective clients will search the directory. Once found, the object representing your service must provide a URL to the ASP that implements the service. In short, a prospective client searches for a service that provides a needed vocabulary and expects to receive one or more URLs it can use to obtain service.

Active Directory Implementation

I use the serviceConnectionPoint class provided in the Active Directory schema to model the interface between a client and a service.
Since other users of the directory may also be using this class for other purposes, I give every service object a serviceClassName that begins with the prefix 'xml'. That way, I can use the prefix as a filter criterion, ensuring I receive only those objects that represent services as defined in the Five Principles. Next, I use the multi-valued property serviceBindingInformation of the serviceConnectionPoint class to hold a list of the vocabularies my service uses. Finally, I use the url property of this class to hold the URL for my ASP. I omit the protocol identifier (http://), since I will always be using the Web to access my services. Here's the format for an LDAP query that uses my conventions:
<base DN>;(&(objectClass=serviceConnectionPoint)(serviceClassName=xml*));url,serviceBindingInformation;scope
For example:
<LDAP://sales/DC=widgets>;(&(objectClass=serviceConnectionPoint)(serviceClassName=xml*));url,serviceBindingInformation;SubTree

Collaborating on Data

The heart of the Five Principles is the ability to work with less than perfect data. I've developed a few techniques that help me do this in XML.

First, many kinds of services will need to return collections of objects. Since we cannot know the syntax of all vocabularies that users will ever develop, we need to agree on a convention for expressing collections. That way, we have a clear idea of where the collection object ends and the contained objects begin.

I've already said that different services will have different views of the same real-world object. Each specializes the object with attributes specific to its view. By providing a mechanism for specialization, we give services and clients with differing views a way to locate the information they have in common and disregard specialized properties that are of no use to them.

Finally, data schemas change over time. Our understanding of objects deepens, and we detect errors in how we expressed our understanding. With a scheme for expressing version information, we allow services and clients to reconcile data from both older and newer versions to the version they understand. As with specialization, the focus is on permitting a client to get to the data it can use and discard the rest without compromising the integrity of the object.

Collections

In my convention, the root tag is always Collection. Every immediate child of the root is the root tag of one of the contained objects. The Collection tag may have any of three attributes to express information about the collection. ObjectType is a list of the root tag names of the contained objects in the collection. Order tells the client how the data was ordered; if it is sorted, it gives the names of the tags used for the sorting. Finally, Name gives a name to the collection.
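Here is a minimal sketch, using Python's xml.etree.ElementTree, of writing and reading a document under the Collection convention just described. The contained Customer elements, the collection name, and the helper names are hypothetical. The text does not say how the ObjectType list is delimited, so the sketch assumes a comma-separated list, and it sets Order to "sequential" (meaning the members are not sorted).

```python
import xml.etree.ElementTree as ET


def make_collection(items: list[ET.Element], name: str) -> ET.Element:
    """Wrap already-built object elements in a Collection root."""
    root = ET.Element("Collection", Name=name, Order="sequential")
    types: list[str] = []
    for item in items:
        root.append(item)  # each immediate child is one contained object
        if item.tag not in types:
            types.append(item.tag)
    # Assumed delimiter: a comma-separated list of the contained root tags.
    root.set("ObjectType", ",".join(types))
    return root


def read_collection(root: ET.Element) -> tuple[dict, list[ET.Element]]:
    """Return (metadata, members) without needing the members' vocabulary."""
    if root.tag != "Collection":
        raise ValueError("not a Collection document")
    meta = {key: root.get(key) for key in ("Name", "ObjectType", "Order")}
    return meta, list(root)


customers = []
for cid, cname in [("c1", "Acme"), ("c2", "Globex")]:
    customer = ET.Element("Customer", ID=cid)
    ET.SubElement(customer, "Name").text = cname
    customers.append(customer)

text = ET.tostring(make_collection(customers, "TopCustomers"), encoding="unicode")
meta, members = read_collection(ET.fromstring(text))
print(meta)  # {'Name': 'TopCustomers', 'ObjectType': 'Customer', 'Order': 'sequential'}
print([m.findtext("Name") for m in members])  # ['Acme', 'Globex']
```

The reader never looks inside the members; it only relies on the convention that every immediate child of Collection is the root of one contained object, which is what lets generic code handle collections of any vocabulary.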
Order can take the values sequential, ascending, or descending. Sequential simply indicates that the contained objects are provided on an as-is basis and are not sorted; no tag names are provided. For ascending or descending, however, we include a comma-delimited list of the tag names used for sorting, followed by a semicolon and the sort order keyword.

Specialization

Specialization allows an organization to agree on a small set of objects that have universal meaning throughout the organization. Developers working for specific departments or teams within the organization are able to extend those objects without creating conflicts with other teams expressing different views of the data. Most importantly, a client that cannot find a service providing exactly the vocabulary it needs can use a more specialized vocabulary and extract what it needs. If it can provide meaningful default values for properties, it can also use a service providing a less specialized form of the needed vocabulary.

When specializing a vocabulary, a complete document representing the generalized vocabulary is embedded as a child of the specialized form. The specialized properties are provided as tags on the same level as the root of the generalized form. Note that we can nest subdocuments, allowing us to specialize to any degree we wish. The example on the slide shows how the generalized vocabulary Customer is specialized one level to create the specialized vocabulary OrderCustomer. A Customer document is embedded, with its root element <Customer>, as a child of the specialized document's <OrderCustomer> root element.

Versions

XML vocabularies evolve their schemas by adding new properties, deleting obsolete properties, and even changing the nature and structure of some properties. As an example of the latter, a tag that contains only text might be refined in a later version to contain elements, so as to express more complex structure in the property. Backward compatibility is always a challenge in this situation.
The loosely coupled networks the Five Principles are intended to address have a harder problem: forward compatibility. Resolving the mismatch between the schema versions understood by a service and its client involves being able to reconcile some arbitrary version to some other version. That is, a client must be able to take the information passed by the service and convert it to some arbitrary version of the vocabulary without losing validity.

The convention I use is to include a Version attribute on any document that expresses several versions of the data for a particular object. I also use an EarliestVersion attribute, which expresses the earliest version of the schema that can be recreated from the information included in the document. When I version an object's data under this convention, each immediate child element of the root becomes a container for the properties of the object as of some version. Versions are expressed in sequential, ascending order. The first version subtree contains a baseline set of properties. Every version subtree after that follows three rules: new properties are simply added; if the structure of a property changes, the new structure is written to the version subtree; and dropped properties are simply left out.

Client software must reconcile the data by exception. It starts with the earliest version and proceeds until it reaches a version later than the one it is prepared to accept. New properties are added. If a property is found that existed in a prior version, the old property and its entire subtree are replaced by the new version of the property. Dropped properties will still appear in the reconciled version, but the client can ignore them either as reconciliation takes place or when the reconciled object is used. After all, the client understands the version and "knows" the dropped property is not part of that version.
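The reconciliation procedure above can be sketched in Python with xml.etree.ElementTree. The convention does not fix how a version subtree is labeled, so this sketch assumes each one is an immediate child of the root carrying an integer Number attribute; the VersionData tag, the Customer properties, and the known_properties parameter are hypothetical. The loop follows the steps in the text: start at the earliest version, stop at the first version later than the wanted one, add new properties, and let a re-declared property replace the old one along with its whole subtree.

```python
import xml.etree.ElementTree as ET


def reconcile(doc_text: str, wanted: int, known_properties=None) -> dict:
    """Rebuild an object's properties as of version `wanted`.

    Assumes integer version numbers and version subtrees that are
    immediate children of the root with a Number attribute.
    """
    root = ET.fromstring(doc_text)
    earliest = int(root.get("EarliestVersion", "1"))
    if wanted < earliest:
        raise ValueError(f"version {wanted} cannot be recreated (earliest is {earliest})")
    props = {}  # tag -> element
    for subtree in root:  # version subtrees appear in ascending order
        if int(subtree.get("Number")) > wanted:
            break
        for prop in subtree:
            # A new tag is added; a re-declared tag replaces the old
            # property together with its entire subtree.
            props[prop.tag] = prop
    if known_properties is not None:
        # Dropped properties survive reconciliation; a client that knows
        # its version's property list can discard them here.
        props = {tag: el for tag, el in props.items() if tag in known_properties}
    return props


doc = """<Customer Version="3" EarliestVersion="1">
  <VersionData Number="1"><Name>Acme</Name><Fax>555-0100</Fax></VersionData>
  <VersionData Number="2"><Email>info@acme.example</Email></VersionData>
  <VersionData Number="3">
    <Name><Legal>Acme Corporation</Legal><Trading>Acme</Trading></Name>
  </VersionData>
</Customer>"""

v2 = reconcile(doc, 2)
v3 = reconcile(doc, 3, known_properties={"Name", "Email"})
print(sorted(v2), v2["Name"].text)  # ['Email', 'Fax', 'Name'] Acme
print(sorted(v3), v3["Name"].findtext("Legal"))  # ['Email', 'Name'] Acme Corporation
```

In this example, version 3 restructures Name from plain text into child elements and silently drops Fax. A version 2 client never reads the third subtree, while a version 3 client gets the new Name structure and filters out the leftover Fax property.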
Avoiding Shared State

We now move away from explicit use of the Five Principles and into the area of generally good practices for intranet and Internet applications. Shared state is a problem for Web applications. HTTP is inherently stateless; each page request is a complete transaction as far as HTTP is concerned. While this makes for a robust protocol, it ignores the fact that applications generally require state. HTTP was originally intended for simple publishing, not applications.

What can we do? ASP provides stock objects for maintaining state, namely Application and Session. Unfortunately, these require a client to make all its requests against the same server, while sites that scale well generally use a Web server farm. Scalable directory services such as Microsoft Site Server's Membership Directory or Active Directory may be used to store session and application state information in lieu of the Application and Session objects. These directory services may be accessed from script via ADSI or Site Server's Active User Object (AUO).

No matter what approach is used, persistent state introduces complexity to Web applications. I believe the appropriate technique is to let consumers of data (browser clients or intermediate services) assume responsibility for maintaining state. They are, after all, the party most concerned with the integrity and use of persistent state. More important from the perspective of cooperative network applications, the resources that help create state may not be available from one invocation to the next. Using directory services allows us to find a replacement, but the consumer must be able to provide all the needed state information.

Handling Change

The Five Principles give us the tools to handle change and uncertainty, but there is much we can do to ensure we get the most out of them. Use directory scope to your advantage. Introduce new services as close to the known clients as possible.
As the service proves itself, you can migrate the directory entry to other organizational units. Never raise a directory object to a higher level unless you are prepared to handle increased traffic and are sure the service is a robust implementation of the vocabularies it purports to handle.

Circulate the schemas of the vocabularies you develop beyond your own organization. A client can only search for vocabularies it knows about. A team that has been using the generalized form of your vocabulary might be interested in the specialized form if you tell them about it.

When writing script to process an XML document, write the code so that it is tag driven and assumes as little context as it can without losing meaning. That way, if the document is well-formed but has a small mistake that compromises validity, your code will still handle it. For example, if the vocabulary schema for a NAME tag assumes that the child elements will be FNAME, MI, and LNAME, in that order, do not write your code so that it requires that order. If you think about it, that context isn't meaningful; it is an artifact of how the DTD or schema was written. If some programmer makes a mistake and writes LNAME first, you want to be able to accept it and continue. This is a prime benefit of a tagged format like XML; use it whenever possible.

Use versioning conservatively. Versioning imposes its own burden on programming, but you can alleviate it by writing utility scripts or objects. If you think a service will have a long life, use versioning. When writing client code, be prepared to encounter versioning. The same goes for specialization. You may be expecting CUSTOMER. If you receive SVC_CUSTOMER, dive in and look for the general form, CUSTOMER.

The Five Principles provide the tools for overcoming many of the problems encountered in loosely coupled networks of Web-based applications. It is not enough to follow the Principles like a recipe, however. You must take the philosophy to heart.
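The two habits described above, reading by tag rather than by position and digging into a specialized document for the general form, can be combined in a short Python sketch using xml.etree.ElementTree. The element names CUSTOMER, SVC_CUSTOMER, NAME, FNAME, MI, and LNAME come from the text; the ID attribute, the PRIORITY element, and the helper names are hypothetical. Because a specialized document embeds the complete general document as a child, searching the tree for the wanted tag finds it however many levels of specialization the sender used, and also works when the general form itself is received.

```python
import xml.etree.ElementTree as ET


def find_general_form(root: ET.Element, wanted: str):
    """Return the first element named `wanted` (the root itself included), or None."""
    return next(root.iter(wanted), None)


def read_name(name_el: ET.Element) -> str:
    """Read NAME by tag, not by position: child order carries no meaning here."""
    parts = {child.tag: (child.text or "").strip() for child in name_el}
    ordered = (parts.get("FNAME"), parts.get("MI"), parts.get("LNAME"))
    return " ".join(p for p in ordered if p)  # a missing MI is simply skipped


received = """<SVC_CUSTOMER>
  <CUSTOMER ID="42">
    <NAME><LNAME>Smith</LNAME><FNAME>Jane</FNAME></NAME>
  </CUSTOMER>
  <PRIORITY>high</PRIORITY>
</SVC_CUSTOMER>"""

customer = find_general_form(ET.fromstring(received), "CUSTOMER")
if customer is None:
    raise ValueError("no CUSTOMER data in this document")
print(customer.get("ID"), read_name(customer.find("NAME")))  # 42 Jane Smith
```

The sender wrote LNAME before FNAME and omitted MI, yet the client still produces a usable name; the specialized PRIORITY property is simply never looked at.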
When writing a service, provide more than the bare minimum of information. When writing a consumer of a service, be prepared to perform as much processing as you can with the information provided. Be ready to insert reasonable defaults for properties that aren't provided when you receive a format more generalized than you were expecting. Don't insist upon validity unless you absolutely need it. Read tags and their attributes, and infer what you can from tags you don't recognize.

Issues of Scale

Using ASP to implement services reduces the problem of scaling an application in our approach to that of scaling a Web site. Keeping state as close to the point of use as possible allows us to use server farms without difficulty. If your service uses an application tier outside the Web server, that tier must be scalable. Your data storage scheme, generally an RDBMS, must also scale well and work with a stateless Web tier.

Whenever possible, make use of asynchronous messaging to smooth out spikes in demand. Many applications on the Web do not require that an activity be completed at the time of demand, only that the requesting client receive positive indication that the request has been received and will be performed. Some activities, such as fulfilling an order involving physical goods, do not occur at the pace of digital service; in that case, asynchronous messaging is a necessity. Sometimes the request is only a command to perform some unit of work. In that case, there is no need to block the client until the work is performed. In fact, holding the resource will adversely affect all other clients. Accept the request and send an acknowledgement.

Let's look at a Web-based architecture that uses these techniques.

A Scalable Architecture

Our client uses scripts for two reasons. First, any activity that doesn't need to be handled by the server can be off-loaded to the individual clients. Second, we can use scripts to maintain state information for the application.
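The advice to accept a request, acknowledge it at once, and do the work later can be sketched with Python's standard queue, threading, and uuid modules standing in for a real message-queuing product. Everything here is an illustrative assumption: the in-process queue, the single worker thread, the ticket-based acknowledgement, and the results dictionary (standing in for wherever completed work is reported). A production system would use a durable queue so that accepted requests survive a restart.

```python
import queue
import threading
import uuid

work_queue: queue.Queue = queue.Queue()  # requests accepted but not yet performed
results: dict[str, str] = {}             # ticket -> outcome, read later by a consumer


def accept(request: dict) -> dict:
    """Enqueue the work and acknowledge immediately instead of blocking the client."""
    ticket = str(uuid.uuid4())
    work_queue.put((ticket, request))
    return {"status": "accepted", "ticket": ticket}


def worker() -> None:
    """Drain the queue at whatever pace the back end can sustain."""
    while True:
        ticket, request = work_queue.get()
        try:
            results[ticket] = f"fulfilled order for {request['qty']} x {request['sku']}"
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

acks = [accept({"sku": "W-17", "qty": n}) for n in (1, 2, 3)]
work_queue.join()  # demo only: wait for the backlog to clear before inspecting results
print([a["status"] for a in acks], len(results))  # ['accepted', 'accepted', 'accepted'] 3
```

The caller gets its acknowledgement as soon as the request is queued; the worker, sized for average rather than peak load, clears any backlog when demand eases.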
Client requests are received by an HTTP server farm (which may be implemented using the Windows Load Balancing Service) and allocated to individual servers to manage demand. No assumption is made as to the identity of the server handling any given request. The ASPs that implement the services of this tier use components. Some of these components are distributed objects executing on an application tier; others use resources (e.g., a database) that are not resident on the Web servers.

In this diagram, asynchronous messaging and a scalable application tier are used to address the demands placed on the application tier. First, if the semantics of the application permit asynchronous messaging, we use it to smooth out spikes in demand. We size the application server for the average load and use queues to handle the overflow. Backlogs are cleared when demand eases. In such a case, the consumer (here, some ASP or component on the Web tier) must check the output queue to retrieve results. This should not be the entity that placed the request, as that would block the browser client, negating one of the benefits of asynchronous messaging. Second, we can use COM+ Load Balancing or some other application server software (e.g., Netscape Application Server) to scale the application tier. The goal is to share extra computing power across multiple consumers (the Web servers) without compromising the integrity of the application.

The data tier of the application is the hardest to scale. Databases are stateful by definition. The traditional approach is to throw hardware at the problem. High-end Unix servers with large numbers of processors are one solution; using a mainframe as the data repository is another. If you wish to use PC hardware, you should consider partitioning the database. In this technique, you come up with some rule for performing a horizontal partitioning of the data.
In a simplistic scheme, you might decide that all records whose index field begins with the letters 'A' through 'M' are in one database, while the rest are in another. Both databases use the same schema. Partitioning allows you to use multiple inexpensive servers, but it forces the application tier (and perhaps the Web tier) to embed the partitioning rule. In addition, you must ensure that your partitioning rule distributes the load evenly across database servers. In spite of these drawbacks, partitioning the database is a useful and potentially inexpensive alternative to high-end hardware.

Summary

What have we learned about cooperative applications? First, I believe the future of Web application development belongs to networks of loosely coupled services offering data and processing on a dynamic basis. It's the runtime equivalent of "Buy Before Build". Applications should dynamically enlist existing resources and string the results together to form a complete computing solution to a given problem.

The Five Principles of Cooperative Network Application Development are designed to promote the successful creation and maintenance of such applications using existing Web technology. Work to get your organization to agree to use the Principles and whatever minimal set of conventions you need to implement them. I believe this to be a reasonable task in terms of management and social engineering. Anything more complicated is doomed to failure. If the Web has taught us anything, it is that simplicity survives, thrives, and spreads.

Within the boundaries of a service implementation, use the best tools for the job. Don't insist on or expect standardization in terms of tools and component technologies. Let development teams use the tools they know best. Use components wherever possible. They are a modular solution that runs faster and needs less maintenance than long stretches of server-side script.
The Web is a rapid application development environment, and components feed into that process. The service can be as proprietary as you wish.

Restrict the interface between services to the use of open standards. Even if you are building several services for your own application, assume that some other development team will use one of the services by itself. If you have used open standards, such as XML for data exchange, they will be able to do that. If you instead rely on proprietary object technologies or a protocol of your own devising, other teams may not be able to use what you have written. Make it easy to share the results of your labors. Popular software always gets budget money for subsequent enhancement and maintenance, while single-use software languishes in the hands of the team that constructed it.

None of the conventions I presented to aid in implementing the Five Principles are carved in stone. Carving them in stone would be contrary to the Principles themselves. They are a philosophy as much as they are a set of rules. Find whatever conventions and techniques work for your organization. Use the Principles as the one common thread of your organization's development. If everyone agrees to cooperate, everyone gains the freedom to innovate.