|
Summary
In the new TopXML tutorials section, we wanted authors to upload their Word documents so that we could convert those documents to XML and automatically publish them in any format using XSLT. This case study details how we did this.
Something that I think will become more common is to allow users/authors to
upload an RTF document and have their content automatically published with
your company's or website's styles, with tables, lists and so on handled correctly.
Basically, this is the publication part of a content management
system.
We had a requirement that the authors of our TopXML
tutorials write their tutorials in an editor (such as Open Office or MS Word) using a specific template that
we supplied, which had styles such as code, worksheet and solution.
Each new page of the tutorial had to start with a heading in the H2 style so that
we could split the document into separate pages. We needed to take this document
and convert it to an XML document, marking up these styles appropriately, so that
we could use XSLT to transform the whole tutorial in a matter of seconds. The tool
also had to handle images, so that any images in the document would be
saved to our server.
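The page-splitting rule above (a new page starts at every H2 heading) can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code we run on TopXML: the `tutorial` root element and the sample content are assumptions, while `h2`, `para` and `code` follow the element names in the converted output.

```python
import xml.etree.ElementTree as ET


def split_into_pages(root):
    """Group the children of root into pages; each <h2> starts a new page."""
    pages = []
    for child in root:
        # Open a new page at every <h2>, and also for any content
        # that appears before the first heading.
        if child.tag == "h2" or not pages:
            pages.append([])
        pages[-1].append(child)
    return pages


# Hypothetical converted document; the <tutorial> wrapper is an assumption.
SAMPLE = """<tutorial>
  <h2>Introduction</h2>
  <para>Welcome.</para>
  <h2>For XML</h2>
  <para>New with SQL Server 2000...</para>
  <code>SELECT 1</code>
</tutorial>"""

pages = split_into_pages(ET.fromstring(SAMPLE))
for number, page in enumerate(pages, start=1):
    title = "".join(page[0].itertext()).strip()
    print(number, title, len(page))
```

Each entry in `pages` is a list of elements that can then be transformed and written out as one page of the tutorial.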
We needed a product that was a COM component so this could be a server-side solution.
We searched through many products and finally found one that suited our
needs, which was also flexible enough to allow us to define the XML output of a
style, such as:
- handling URLs correctly,
- exporting our code style sections into XML CDATA sections.
Many of the products we looked at only worked within MS Word or only did the
conversion through a user interface, which ruled them out for a server-side
solution.
We finally found a great tool called 'Logictran
RTF Converter', which has a COM component that supports ASP, plus a user
interface, so that you can experiment with your conversions and easily view the
output.
The process for this TopXML tutorial conversion is as follows. The author goes to
their Author section on TopXML and is instructed to upload their RTF
document (saved from the editor, either Open Office or MS Word). The document is saved to their tutorial directory and the images are uploaded to the server. The author
then fills in the other metadata for the tutorial. Within
a few seconds, Logictran takes the document and converts it to XML, retaining the styles, colors and so on. Our
solution then runs transformations over the XML and outputs the tutorial. We can even offer the author a preview of what it will look like before they submit it to us.
Have a look at page
3 of Scott Klein's SQLXML tutorial. It has a heading, several paragraphs, a code section and a list section.
Logictran converted the first few paragraphs of that page to the following XML:
<h2>
  <font face="Arial" size="4"><emphasis>For XML</emphasis></font>
</h2>
<para>
  <anchor id="Heading9" />New with SQL Server 2000 is the ability to return the
  results of a query in XML format. This is accomplished by adding the FOR XML
  clause at the end of the SELECT statement. It is not difficult to use, and the
  syntax is as follows:
</para>
<code><![CDATA[FOR XML mode [, XMLDATA] [, ELEMENTS] [, BINARY BASE 64]]]></code>
<para>The arguments in brackets are optional, but definitions of all the arguments are:</para>
<itemizedlist mark="bullet">
  <listitem>
    <para>
      <emphasis>mode</emphasis> - this is the only required argument. It specifies how the XML will be
      returned in the result set. There are 3 values that can be used:
    </para>
  </listitem>
</itemizedlist>
With the above XML, I can use XSLT to convert <para> elements to
HTML paragraphs, or <itemizedlist> sections to <ul> lists.
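To make the mapping described above concrete, here is a minimal sketch of the same idea. It is written in Python with the standard library as a stand-in, not in XSLT, and it is not our production stylesheet. The `TAG_MAP` table, the `to_html` helper and the `div` wrapper in the sample are all assumptions made for illustration; only the source element names come from the converted output.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from converter element names to HTML element names.
TAG_MAP = {
    "para": "p",
    "itemizedlist": "ul",
    "listitem": "li",
    "emphasis": "em",
    "code": "pre",
}


def to_html(node):
    """Copy node recursively, renaming mapped tags and dropping attributes."""
    out = ET.Element(TAG_MAP.get(node.tag, node.tag))
    out.text = node.text  # text before the first child
    out.tail = node.tail  # text after this element's closing tag
    for child in node:
        out.append(to_html(child))
    return out


# Hypothetical fragment; the <div> wrapper only provides a single root.
SAMPLE = (
    "<div><para>The arguments are:</para>"
    '<itemizedlist mark="bullet"><listitem><para>'
    "<emphasis>mode</emphasis> - the only required argument."
    "</para></listitem></itemizedlist></div>"
)

html = to_html(ET.fromstring(SAMPLE))
print(ET.tostring(html, encoding="unicode"))
```

An XSLT stylesheet expresses the same thing as one template per element, which is what makes it easy to publish the same XML in more than one format.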
If there are images in the document, Logictran saves those images in the same
directory as the document and marks them up in the output XML document.
If you are looking for a similar solution, Logictran's default XML output for
RTF documents will adequately meet most formatting needs. We had a few needs out of the norm, such as having
the 'code' style embed its content in a CDATA section, so that we could preserve
the formatting of code sections. To achieve this, you only need to change
a text file that tells the Logictran processor how to output the styles it
encounters; the Logictran documentation explains how to do this.
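To show why we wanted code sections in CDATA, here is a small sketch in Python using the standard library. It does not show Logictran's configuration file or its API; the `code_element` helper and the sample snippet are assumptions for illustration only. The point is that text inside a CDATA section keeps its line breaks and characters such as < without any escaping.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def code_element(source_text):
    """Return a <code> element whose content is one CDATA section.

    Assumes source_text does not contain the sequence ']]>', which
    cannot appear inside a CDATA section.
    """
    doc = Document()
    code = doc.createElement("code")
    code.appendChild(doc.createCDATASection(source_text))
    doc.appendChild(code)
    return code.toxml()


# Hypothetical code sample with markup characters and indentation.
snippet = "SELECT *\n  FROM Orders\n WHERE Total < 100 AND Note = '<none>'"

xml_text = code_element(snippet)
print(xml_text)

# Parsing the element back returns the code exactly as it was written.
round_trip = ET.fromstring(xml_text).text
print(round_trip == snippet)
```

A transformation can then copy the content of the code element straight into the published page without having to undo any escaping.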
Because this solution worked so well, we are planning to use the same architecture to upgrade this Code Library section. Rather than writing in our editor, you will be able to use Open Office (or MS Word) to output an RTF document and upload it. We'll then be able to handle images as well.
If you have the same need, give Logictran
RTF Converter a try.
If you have had the same successes with Logictran, please let us know in the Talkback section below.
|