
By Mark Wilson
I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
First Posted 06/05/2002

How to convert Word (RTF) documents to XML for auto publication


Summary: In the new TopXML tutorials section, we wanted authors to upload their Word documents so that we could convert those documents to XML and automatically publish them in any format using XSLT. This case study details how we did this.

Something that I think will become more common is allowing users and authors to upload an RTF document and have their content automatically published in your company's or website's styles, with tables, lists and so on handled correctly. Basically, this is the integration of the publication section of a content management system.

We had a requirement that the authors of our TopXML tutorials write their tutorials in an editor (such as Open Office or MS Word) using a specific template that we supplied, which had styles like code, worksheet, solution. 

Each new page of the tutorial had to start with an H2 style so that we could create the separate pages. We needed to take this document and convert it to an XML document, marking up these styles appropriately, so that we could use XSLT to transform the whole tutorial in a matter of seconds. The tool also had to handle images, so that any images in the document would be saved to our server.
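As a rough illustration of the H2 page-break rule, here is a minimal Python sketch that groups converted elements into pages. It is not the code TopXML used (the real site did this with XSLT). The `<tutorial>` root element, the sample content and the `split_into_pages` helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of converted output: a flat run of elements under a
# made-up <tutorial> root, with an <h2> marking the start of each page.
SAMPLE = """<tutorial>
  <h2>Introduction</h2>
  <para>What the tutorial covers.</para>
  <h2>For XML</h2>
  <para>Returning query results as XML.</para>
  <code>SELECT 1</code>
</tutorial>"""


def split_into_pages(root):
    """Group the root's children into pages, starting a new page at each <h2>."""
    pages = []
    for child in root:
        if child.tag == "h2":
            # itertext() also picks up text nested in <font> or <emphasis>.
            pages.append({"title": "".join(child.itertext()).strip(), "body": []})
        elif pages:
            pages[-1]["body"].append(child)
        # Content before the first <h2> is skipped in this sketch.
    return pages


pages = split_into_pages(ET.fromstring(SAMPLE))
for number, page in enumerate(pages, start=1):
    print(number, page["title"], [el.tag for el in page["body"]])
```

Each page keeps its heading text as the title and the elements that follow it as the body, so every page can then be transformed and written out separately.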

We needed a product that was a COM component so this could be a server-side solution.

We searched through many products and finally found one that suited our needs, which was also flexible enough to allow us to define the XML output of a style, such as:

  • handling URLs correctly,
  • exporting our code style sections into XML CDATA sections

Many of the products we looked at only worked within MS Word or only did the conversion through a user interface, which ruled them out for a server-side solution.

We finally found a great tool called 'Logictran RTF Converter', which includes a COM component that can be used from ASP, plus a user interface, so that you can experiment with your conversions and easily view the output.

The process for a TopXML tutorial conversion is as follows. The author goes to their Author section on TopXML and is instructed to upload their RTF document (saved from either Open Office or MS Word). The document is saved to their tutorial directory and the images are uploaded to the server. The author then fills in the other metadata for the tutorial. Within a few seconds, Logictran takes the document and converts it to XML, retaining the styles, colors and so on. Our solution then runs transformations over the XML and outputs the tutorial. We can even offer the author a preview of what it will look like before they submit it to us.

Have a look at page 3 of Scott Klein's SQLXML tutorial. It has a heading, new paragraphs, a code section and a list section.

In the converted XML document, Logictran converted these first few paragraphs to:

<h2>
  <font face=Arial size=4><emphasis>For XML</emphasis></font>
</h2>
<para>
  <anchor id=Heading9 />New with SQL Server 2000 is the ability to return the results of a query in XML format. This is accomplished by adding the FOR XML clause at the end of the SELECT statement. It is not difficult to use, and the syntax is as follows:
</para>
<code><![CDATA[FOR XML mode [, XMLDATA] [, ELEMENTS] [, BINARY BASE 64]]]></code>
<para>The arguments in brackets are optional, but definitions of all the arguments are:</para>
<itemizedlist mark=bullet>
  <listitem>
    <para>
      <emphasis>mode</emphasis>- this is the only required argument. It specifies how the XML will be returned in the result set. There are 3 values that can be used:
    </para>
  </listitem>
</itemizedlist>

So with the above XML, I could use XSLT to convert <para> elements to a new paragraph, or <itemizedlist> sections to <ul> lists.
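To make that mapping concrete, here is a minimal Python sketch of the same idea. It stands in for the XSLT and is not the stylesheet used on TopXML. The `TAG_MAP` table, the `to_html` helper and the choice to drop attributes are assumptions for illustration, and the sample input quotes its attribute values so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from converter element names to HTML element names.
TAG_MAP = {"para": "p", "itemizedlist": "ul", "listitem": "li", "emphasis": "em"}


def to_html(element):
    """Return a copy of `element` with mapped tag names and no attributes."""
    html = ET.Element(TAG_MAP.get(element.tag, element.tag))
    html.text = element.text
    html.tail = element.tail
    for child in element:
        html.append(to_html(child))
    return html


SAMPLE = (
    '<itemizedlist mark="bullet"><listitem><para>'
    "<emphasis>mode</emphasis> - the only required argument."
    "</para></listitem></itemizedlist>"
)

converted = to_html(ET.fromstring(SAMPLE))
print(ET.tostring(converted, encoding="unicode"))
# prints: <ul><li><p><em>mode</em> - the only required argument.</p></li></ul>
```

An XSLT stylesheet would express the same rules as one template per element name, which is what makes a predictable, style-driven XML vocabulary so convenient to publish from.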

If there are images in the document, Logictran saves those images in the same directory as the document and marks them up in the output XML document.

If you are looking for a similar solution, Logictran's default XML output for RTF documents would adequately meet most publishing and formatting needs. But we had a few needs out of the norm, such as having the 'code' style embed its data in a CDATA section so that we could preserve the formatting of code sections. To achieve this, you only need to change a text file that tells the Logictran processor how to output the styles it encounters. The Logictran documentation explains how to do this.
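The reason a CDATA section preserves code is that nothing inside it needs escaping: characters such as < and &, along with line breaks and indentation, come back exactly as written. The Python sketch below shows this round trip. It does not show Logictran's configuration file, whose format is not covered here. The `wrap_in_cdata` helper and the sample SQL are hypothetical.

```python
from xml.dom import minidom


def wrap_in_cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Code containing characters that would otherwise need escaping in XML.
code_sample = "SELECT *\n  FROM Orders\n WHERE Total < 10 AND Name <> 'A&B'"
document = "<code>" + wrap_in_cdata(code_sample) + "</code>"

# Parse the document and join the CDATA node(s) back into one string.
parsed = minidom.parseString(document)
recovered = "".join(node.data for node in parsed.documentElement.childNodes)
print(recovered == code_sample)
```

The only sequence that cannot appear inside a CDATA section is `]]>`, which is why the helper splits it across two adjacent sections.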

Because this solution worked so well, we are planning to use the same architecture to upgrade this Code Library section. Rather than writing in our editor, you will be able to use Open Office (or MS Word) to output an RTF document and upload it. We'll then be able to handle images as well.

If you have the same need, give the Logictran RTF Converter a try.

If you have had the same successes with Logictran, please let us know in the Talkback section below.


