
Great New Regular Expression Tool available on VBXML!

Chris Stefano and Corey Haines have created a neat tool for using and learning Regular Expressions! Check it out
I have the beginnings of a subweb page on Regular Expressions. Check it out

Intro to this page

When I work as a contract programmer at different offices, I put my copy of Joe Celko's SQL for Smarties prominently on display on my desk. That's enough for some people to think of me as a database expert. Fortunately, I have actually read SQL for Smarties, so, I am a database expert. Next to it is my copy of Jeffrey Friedl's Mastering Regular Expressions. Few people are impressed by it, but they should be. Whenever I was tempted to think that Friedl's multi-page dissection of a given regex was more detail than I really needed, some practical implication would pop off the page and I was glad I had stuck to it.

SQL and Regular Expressions are declarative, not procedural. As such, mastering them requires excursions outside the realm of how-to books. Even if bare competence with these tools is your goal, some examination of the theory is needed.

XPath and XSLT are also declarative. And they have some other characteristics that will seem odd to the Visual Basic programmer. In this page I am going to try to look at "programming" XSLT from the perspective of a theoretical practitioner or a practical theoretician. In practice, looking at theory means I will mention a lot of stuff that might be obvious, but maybe one of those obvious things will be a revelation to you.

Disclaimer: If you haven't even taken a look at XSLT, this page isn't for you. If you are an expert in it, much of this you already know. If you have minimum standards in the area of technical prose, this page is not for you. Otherwise, welcome.

InfoWorld on XML extensions

In the July 3, 2000 issue of InfoWorld there is a "Spotlight" on XML Extensions. We read in the TOC that "keeping up with XML extensions is a time-consuming endeavor." Evidently so. There are several points where the authors seem confused or give a confused summary. They rate the different extensions (e.g. XLink, DTD, XPath, RDF, etc.) in terms of Relevance, Acceptance and "Standards Prospects", but they do not make clear which extensions have been implemented yet.

For example, both XPath and XPointer are listed as "High" in terms of acceptance. But XPointer is only a candidate recommendation, while XPath became an "official" recommendation last November. MSXML3 almost completely implements XPath, but so far does not implement the additional functionality of XPointer.

There is a somewhat confusing discussion about XSL and XSLT, perhaps inevitably so, since they had such limited space to explain the matter. When XSL is mentioned alone, it usually denotes XSLT, XPath and XSL-fo (formatting objects) together. When XSLT and XSL are mentioned together, XSL in that context denotes XSL-fo (and not XSLT). It is like a mainframer calling a Mac a "PC", whereas a Mac person uses "PC" to specifically denote something other than a Mac.

XSLT is an approved recommendation (i.e. a standard) and is implemented in MSXML3 and a number of other processors. XSL-fo (just called XSL on the w3 site) is close to being a candidate recommendation.

Here are a few sentences from InfoWorld.

XSLT applies XSL formatting rules (style sheets) to produce either a restructured XML document or an HTML page. Internet Explorer has a built-in XSL style sheet that formats XML files for display. XSLT is extremely powerful and can be used to translate documents from one XML layout to another. Overall, XSL and XSLT are two of the most successful XML spin-offs, so they should be considered essential components of any XML toolkit.

Here are some comments.

InfoWorld: XSLT applies XSL formatting rules (style sheets) to produce either a restructured XML document or an HTML page.
Comment: XSLT applies XSLT rules to ....

InfoWorld: Internet Explorer has a built-in XSL style sheet that formats XML files for display.
Comment: IE has a built-in stylesheet that is built around (IE 5) XSLT, CSS1 and DHTML.

InfoWorld: XSLT is extremely powerful and can be used to translate documents from one XML layout to another.
Comment: Yes, correct!

InfoWorld: Overall, XSL and XSLT are two of the most successful XML spin-offs, so they should be considered essential components of any XML toolkit.
Comment: XSL doesn't exist yet, so its "success" will catch many industry experts by surprise.

Any article on this matter should also inform the reader that Microsoft seems content with CSS 1 and willing to ignore XSL, while going full bore on XSLT. Netscape is intent on CSS 2 and XSL-fo (formatting objects). (It doesn't help things that Microsoft's version of XSLT in IE 5 was usually referred to as XSL.) Confused?

Under XML Query we read "At present, the awkward XPath pattern-matching syntax is the only way to search an XML document." Awkward? XPath is a model of simplicity. How could XPath be any simpler without being vastly less powerful? Have the authors seen the XML Query proposals such as YATL, Lorel, XML-QL or Quilt?

Lastly, there seems to be a lack of perspective. For example, next to the entry on XLink is an entry on XInclude, and both are given the same rating (high) in the relevance category. Now XInclude is going to be a really, really handy feature for people who make web sites, while XLink will (in time) fundamentally change the way the world uses the World Wide Web.

The XSLT Programming Language?

The first time I encountered people writing about (what we now call) XSLT as a true programming language, my mind had trouble accepting the notion. How could a stylesheet syntax be a language?

Cascading Style Sheets hardly fired my imagination, yet these writers certainly seemed excited about XSLT. In November of 1998, I attended a seminar given by Dr. Steve DeRose (co-editor of XPath, XPointer and XLink), and the light was turned on. XSLT was about more than displaying XML on the web. It was about... well, if you are reading this you probably know the many things it will do.

So from that time I realized that XSLT was a true language, but even I was surprised by some of the examples in Michael Kay's book "XSLT".

I have read a couple of books on Artificial Intelligence that work through the Knight's Tour problem. This problem is how to move a Knight on a chess board so that every square is visited, and visited only once. I have seen this problem worked out using the AI languages Prolog and Lisp. Trying to code this in VB would definitely be "non-trivial".

So when I saw a section entitled Knight's Tour in Mr. Kay's book, I did a double take and thought "surely that can't be the Knight's Tour" (as if there were others?). Indeed it is. Mr. Kay has implemented the Knight's Tour in a 17k stylesheet (with no JavaScript).

Upon reflection, the ability to solve this in XSLT makes perfect sense, since XSLT is "powered" by recursion, which is what powers Lisp and (in a different way) Prolog.
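To make the recursion point concrete, here is a minimal sketch in JavaScript. It is not Mr. Kay's stylesheet and it is not XSLT; it only imitates the discipline a stylesheet imposes: no explicit loops and no reassigned variables, just functions that call themselves. The 5x5 board, the corner start, and every name in it (SIZE, JUMPS, moves, tryMoves, tour) are assumptions made for illustration. Trying the most constrained square first is Warnsdorff's rule, added here only to keep the search short.

```javascript
// Knight's Tour by recursion and backtracking (illustrative sketch).
const SIZE = 5; // assumption: a small board keeps the search quick
const JUMPS = [[1, 2], [2, 1], [2, -1], [1, -2], [-1, -2], [-2, -1], [-2, 1], [-1, 2]];

const onBoard = ([r, c]) => r >= 0 && r < SIZE && c >= 0 && c < SIZE;
const visited = (path, [r, c]) => path.some(([pr, pc]) => pr === r && pc === c);

// Unvisited squares a knight can reach from the given square.
const moves = (path, [r, c]) =>
  JUMPS.map(([dr, dc]) => [r + dr, c + dc])
    .filter((sq) => onBoard(sq) && !visited(path, sq));

// Try the first candidate; if it leads nowhere, recurse on the rest.
function tryMoves(path, candidates) {
  if (candidates.length === 0) return null; // dead end: backtrack
  const [first, ...rest] = candidates;
  return tour(path.concat([first])) || tryMoves(path, rest);
}

// Extend the path until every square has been visited exactly once.
function tour(path) {
  if (path.length === SIZE * SIZE) return path;
  const here = path[path.length - 1];
  // Warnsdorff's rule: prefer squares with the fewest onward moves.
  const next = moves(path, here)
    .sort((a, b) => moves(path, a).length - moves(path, b).length);
  return tryMoves(path, next);
}

const solution = tour([[0, 0]]); // array of [row, column] pairs in visiting order
```

Each call either finishes the tour, hands a longer path to itself, or reports failure so the caller can try the next square, which is the same shape a recursive named template takes.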

So the bottom line is that XSLT is a true language and one that gives the programmer the opportunity to be clever and elegant.

VB and Real Programmers

I once worked next to a C++ programmer who could hardly hide his contempt for VB. One day our boss told him to write a utility and, since he wanted it fast, he told this C++ programmer to write it in VB. Oh the insult! I helped him get a start, showing him this and that, and then let him go at it. A couple of times he asked me some questions, and then on the third day he came over to my cube exclaiming "It's done! It's done! This program does real work and I did it in three days." I told him that it is a fairly common thing for VB programmers to complete several programs within their lifetimes.

Books on the Declarative Tools available to VBers

Two Smart Guys Overheard

If you have been programming with the MS XML parser for a while, you were no doubt surprised to discover that transformNode() is not part of the w3's XSLT standard. Being able to invoke transformNode() off of any XML node is an extremely handy way of processing a small sub-branch. The trick was to have a template that matched a period (.) instead of having a template that matched on the root (/), as you can see in the following code:

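<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<!-- IE5 XSL only: '.' matches whatever node transformNode() was called on -->
<xsl:template match='.'>
    <xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match='fooKids'>
    <h2><xsl:value-of/></h2>
</xsl:template>
</xsl:stylesheet>

' Pick the sub-branch to process, then transform just that node.
' xslDoc is an assumed name for the DOMDocument loaded with the stylesheet above.
Set xNode = xmlSession.selectSingleNode("fooDads[@id='Mark']")
sAnswer = xNode.transformNode(xslDoc.documentElement)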
 

So with the code above, the instruction <xsl:apply-templates select="*"/> in the first template would have selected the children of <fooDads id="Mark">. If you try the same thing with MSXML3, it complains:
. may not appear to the right of / or // or be used with |.

This was incomprehensible to me, since no slash was anywhere near there. As it turns out, the dot match is simply illegal in XSLT. The explanation from Jonathan Marsh, MS's XML main man: "In IE5 XSL, match="." literally means "true if the context node exists". Since XPath and XSLT both require a context node, this is always true." Several people complained about this, and there were some interesting posts from Michael Kay, creator of SAXON, and Mr. Marsh.

From Mike Kay

XSLT 1.0 only allows a stylesheet to be applied to the root node of the source document. The Microsoft transformNode() is a proprietary extension that allows the transformation to be applied starting at any node (or possibly just an element? I'm not sure). But there is no XSLT match pattern that matches "the starting node of the transformation", because it isn't needed as (in the standard) this is always the root.

If the stylesheet needs to know what the starting node was, I'd suggest passing the information in as a parameter. In fact it's probably good advice not to use transformNode() on any node other than the root, to keep your stylesheet portable: I think you can always achieve the same effect using parameters.


From Jonathan Marsh

Sorry, Michael, I don't agree with your analysis here at all. XSLT is designed as a whole to enable incremental regeneration. To suggest that an interface that exposes incremental execution to the programmer is not allowed goes against both the letter and the spirit of the XSLT recommendation. To suggest that an API to XSLT processing is proprietary might lead one to surmise that there IS a standard API, which is also not the case. The transformNode method offers significant benefits not available in other APIs, and enables applications that would be impossible with a more limited API. As such it provides a differentiating factor for MSXML.

There is nothing wrong with a template that matches all nodes, and thus is sensitive to the context it gets executed on. Such a stylesheet is completely portable, although the limitations of other APIs may make it impossible to get the desired results. I'm not sure how parameters can be made to solve this without proprietary extensions like saxon:evaluate() anyway. Even if it is possible, passing the parameter into the stylesheet must be done through proprietary interfaces.

Back to Mark Bosley

If you have read Michael Kay's XSLT, you know he is brilliant and a visionary. But in this case I have to side with Mr. Marsh. transformNode() is simply great. For a client-server programmer moving to XML, transformNode() allows using XML in all sorts of small things, as well as for displaying web pages.

Please Hang up and Try Again

Another Regex Code Diet

I don't want to beat a dead horse, well actually I do. In fact, I'll beat two of them.
Need to prevent non-alphanumeric characters?
Need to validate an email?
Corey Haines reminded me that there is an entire world out there beyond .org and .com. A meager correction.
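For readers without the linked pages at hand, here is a minimal sketch of what such checks might look like in JavaScript. The patterns and the function names (isAlphanumeric, looksLikeEmail) are assumptions made for illustration, not the expressions from those pages, and the email pattern is deliberately loose: it accepts any top-level domain of two or more letters rather than a fixed list.

```javascript
// Assumed patterns for illustration; not the expressions from the linked pages.

// One or more ASCII letters or digits, and nothing else.
const ALPHANUMERIC = /^[A-Za-z0-9]+$/;

// something@something.tld, where the top-level domain is any run of two or
// more letters, so .net, .info or .de pass just as .com and .org do.
const EMAIL = /^[^\s@]+@[^\s@]+\.[A-Za-z]{2,}$/;

function isAlphanumeric(text) {
  return ALPHANUMERIC.test(text);
}

function looksLikeEmail(text) {
  return EMAIL.test(text);
}
```

A check like looksLikeEmail only screens out obvious typing mistakes; it says nothing about whether the address exists.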

AAAgggggggggghhhhhh...

5/20/2000- I have just been doing some DOM manipulations in server-side JavaScript. The XML file was large and deep and the exercise got so painful that I ended up doing all my coding in VB and then porting it back to JavaScript. The Visual InterDev debugger is not a debugger! It is just a post-mortem tool. JavaScript's lack of an Option Explicit statement, the case-sensitivity of variable names and the lack of an interactive debugger make a sorry combination. Have people who diss VB ever used VB?

AAAgggggggggghhhhhh...(part 2)

6/11/2000- I am working with an e-Commerce framework that is nothing less than a "revolutionary architectural model"! Wow! I can't tell you the name, because what the company saved on programming, it probably spends on lawyers. The data is stored in SQL Server, but it is available to an ASP programmer through an "xml cloud". Not a bad concept. (Editor's note: this last sentence was not ironic. I'll point those out to you whenever they occur.)

I created some customer sign-up pages following mock-ups done by an advertising company. I didn't include the country, because the mock-up didn't have a country and the xml-schema did not require it. We were able to create, save and edit new customers and so we moved on.

Later, we began to try to do some purchases and the system failed and we didn't know why. Did I mention that the documentation is horrid? The documentation is horrid. Some high-level pages aimed at managers and then a bunch of minutiae written by what I would guess are contract tech-writers.

Although there are some audit functions in the product, the audit functions didn't work, or maybe they did work but were not designed to specify why an error occurred. We couldn't really see what was going on, because the error was occurring in a C++ module and nobody at the company could tell us what the problem was. Well, at length, we were finally able to figure it out. The purchase routine failed if the country was not specified!

OK, dear reader, put your sarcasm hat on and help me out!

Reader: You said there was a schema. Why didn't they require the country there?
Me: Your tone is not sarcastic enough. Ask me the next question.
Reader: Why didn't the C++ component tell the audit facility it needed a country?
Me: Hey, you are getting better!
Reader: Doesn't the fact that it is a framework mean that it is supposed to be designed, rather than shoveled together, and that it should have good documentation? Isn't the whole purpose of a framework to make programmers more productive?

Now you know why I entitled this story AAAgggggggggghhhhhh.

Moral

If you have data requirements down the pike, enforce those requirements at the toll booth! Would you let a user of your VB program enter a bunch of alpha characters in a phone number text box and then have an operation fail later on without telling why it failed? No, that would be bad programming.
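As a rough sketch of the toll-booth idea, the JavaScript below refuses to save a customer unless every field a later step depends on is present, and it names the fields that failed. Everything in it is hypothetical: the field names, the rules, and the functions validateCustomer and saveCustomer are assumptions for illustration and have nothing to do with the framework in the story.

```javascript
// Hypothetical rules: every field name and pattern here is an assumption.
const RULES = {
  name: (v) => typeof v === "string" && v.trim() !== "",
  country: (v) => typeof v === "string" && v.trim() !== "",
  // Digits plus common phone punctuation, at least seven characters.
  phone: (v) => typeof v === "string" && /^[0-9()+\-. ]{7,}$/.test(v),
};

// Returns the names of the fields that fail, so the caller can say why.
function validateCustomer(customer) {
  return Object.keys(RULES).filter((field) => !RULES[field](customer[field]));
}

// Enforce the rules at the point of entry, not in the purchase routine.
function saveCustomer(customer, store) {
  const problems = validateCustomer(customer);
  if (problems.length > 0) {
    throw new Error("Cannot save customer; missing or invalid: " + problems.join(", "));
  }
  store.push(customer);
}
```

With rules like these, a sign-up page that omits the country fails at sign-up, with a message that says so, instead of at the first purchase.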

 

Where did <xsl:eval>xml</xsl:eval> go?

If you had done much with the old MSXML parser, you were no doubt disappointed when you discovered that <xsl:eval>xml</xsl:eval> no longer worked in MSXML2. <xsl:eval>xml</xsl:eval> in a template body output the xml branch of the context node. Thus, when your stylesheet seemed to be missing matches, you could drop <xsl:eval>xml</xsl:eval> in and see what was going on. It was a great debugging tool.

I experimented for an hour trying to come up with a workaround. I gave up and posted a request to the MS newsgroup on MSXML2, and then found my answer five minutes later in the SDK info. Here it is:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:msxsl="urn:schemas-microsoft-com:xslt"
         xmlns:user="http://mycompany.com/mynamespace" version="1.0">
<msxsl:script language="JScript" implements-prefix="user">
function xml(nodelist){
        return nodelist.nextNode().xml;
}
</msxsl:script>


<xsl:template match="/">
<xsl:value-of select="user:xml(.)"/>
</xsl:template>
</xsl:stylesheet>
Nota bene:
  • the script block uses a separate namespace, msxsl
  • the period indicates the current node; the keyword this does not
  • don't forget the new namespace: xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

Case Insensitivity

Those of you who used the XSL pattern language in IE 5.0 might be wondering why case-insensitive comparison is not included in XPath. I think the problem comes from international considerations. For example, does HAENDEL equal händel? Not to an English-speaking programmer, but to a German one it does. (This is so because "ae" is an accepted German spelling of "ä" wherever the umlaut is not available, for example in capitals set without umlauts.)

The XML editors' mandate is to create the standards quickly, to have these standards be easy to implement, and to have them uncontroversial enough that every implementer won't create a different flavor. Thus, case-insensitivity didn't make the cut.
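The HAENDEL question can be sketched in a few lines of JavaScript. The folding table and the names (sameIgnoringCase, foldGerman, sameInGerman) are assumptions for illustration only. The point is that "ignore case" has no single answer: someone has to pick a convention. In XPath 1.0 itself the usual workaround is translate() with explicit alphabets, which forces the same choice onto the stylesheet author.

```javascript
// Simple case mapping: fine for ASCII and for letters with a one-to-one partner.
const sameIgnoringCase = (a, b) => a.toLowerCase() === b.toLowerCase();

// Assumed German-style folding, for illustration only: spell out umlauts and
// sharp s before comparing, as is done when those letters are unavailable.
const GERMAN_FOLD = {
  "\u00e4": "ae", // a-umlaut
  "\u00f6": "oe", // o-umlaut
  "\u00fc": "ue", // u-umlaut
  "\u00df": "ss", // sharp s
};
const foldGerman = (s) =>
  s.toLowerCase().replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => GERMAN_FOLD[ch]);
const sameInGerman = (a, b) => foldGerman(a) === foldGerman(b);

const name = "h\u00e4ndel"; // the lower-case name with the umlaut

const englishAnswer = sameIgnoringCase("HAENDEL", name); // plain lower-casing
const germanAnswer = sameInGerman("HAENDEL", name);      // umlaut-aware folding
```

Here englishAnswer is false and germanAnswer is true, for the same pair of strings.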


Here's an article I wrote with client-side JavaScript

Shrinking Code With the XML Parser
-DevX Exclusive-
by T. Mark Bosley
XML promises to provide richer data, but one client/server programmer discovers that XML (in the form of the IE parser) also delivers new programming capabilities that allow you to do more with less code.


Mark Bosley is a programmer based in Milwaukee, WI. You can reach him at mark@lightcc.com

Stay tuned for a VB string library...........
 
