
Remove Rsid Attributes and Elements before Comparing Documents

Blogger : MSDN Blogs
Category : .NET XML, System.XML
Blogged date : 2008 Nov 04


A convenient way to explore Open XML markup is to create a small document, modify it slightly in the Word user interface, save it, and then compare it with the original using the Open XML Diff utility that comes with the Open XML SDK V2.  However, Word adds rsid (revision identifier) elements and attributes that enable merging of two documents that have forked.  These show up as changed, and obscure the differences that we’re looking for.  An easy way to deal with this is to remove them before comparing documents; we can safely do so without changing the content of the document.  This post presents a bit of code that uses the Open XML SDK and LINQ to XML to strip these elements and attributes so that the documents are easy to compare.

For more information on rsid elements and attributes, see Brian Jones’s blog post on them.
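To make concrete what gets stripped, here is a minimal sketch that runs the same kind of LINQ to XML query against a hand-written paragraph fragment, without the Open XML SDK.  The class name RsidDemo, the Strip method, and the rsid values in the fragment are made up for illustration; the attribute names are the ones used in the listing later in this post.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RsidDemo
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Attribute names taken from the listing in this post.
    static readonly string[] RsidAttributeNames =
        { "rsidTr", "rsidSect", "rsidRDefault", "rsidR", "rsidDel" };

    // Removes the listed rsid attributes from the element and all of its
    // descendants, and returns the same (modified) element.
    public static XElement Strip(XElement root)
    {
        foreach (string name in RsidAttributeNames)
            root.DescendantsAndSelf().Attributes(W + name).Remove();
        return root;
    }

    public static void Main()
    {
        // Hand-written fragment; the rsid values are invented.
        XElement p = XElement.Parse(
            "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' " +
            "w:rsidR='00A77427' w:rsidRDefault='007F1D13'>" +
            "<w:r w:rsidR='00B12345'><w:t>Hello</w:t></w:r></w:p>");

        Console.WriteLine("Before:");
        Console.WriteLine(p);
        Console.WriteLine("After:");
        Console.WriteLine(Strip(p));
    }
}
```

The text of the paragraph is untouched; only the bookkeeping attributes disappear, which is why the stripped documents diff cleanly.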

This post also contains two of my most commonly used little extension methods: one gets an XDocument from an Open XML part, and the other saves that XDocument back into the part.  The XDocument is stored as an annotation on the Open XML part, so it is loaded only once, and the save method writes out the same instance that was modified.
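The annotation idea can be tried in isolation.  LINQ to XML objects have AddAnnotation and Annotation<T> methods with the same names as the ones used on the Open XML part, so the sketch below uses a plain XElement as a hypothetical stand-in for a part.  GetCached and the holder element are illustrative names, not part of the SDK or of the listing in this post.

```csharp
using System;
using System.Xml.Linq;

public static class AnnotationCacheDemo
{
    // Returns the XDocument cached on 'holder', calling 'load' only when
    // no XDocument annotation is present yet.
    public static XDocument GetCached(XElement holder, Func<XDocument> load)
    {
        XDocument cached = holder.Annotation<XDocument>();
        if (cached != null)
            return cached;
        cached = load();
        holder.AddAnnotation(cached);
        return cached;
    }

    public static void Main()
    {
        int loads = 0;
        XElement holder = new XElement("part");   // stand-in for an Open XML part
        Func<XDocument> load = () =>
        {
            loads++;
            return XDocument.Parse("<root/>");
        };

        XDocument first = GetCached(holder, load);
        XDocument second = GetCached(holder, load);

        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(loads);                          // 1
    }
}
```

Because both calls return the same instance, changes made through one reference are visible to whatever later reads the annotation, which is what lets a separate save method find the modified document.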

This little program takes any number of files as arguments, and strips these extraneous elements and attributes from each of the files.  The files are opened for editing and modified in place, so run it on copies.  Its use:

C:\> StripRsid Test1.docx Test2.docx

 

Here is the listing of this program (code is attached to this post, as well):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class LocalExtensions
{
    // Returns the part's XML as an XDocument.  The document is loaded on
    // first use and cached as an annotation on the part.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader streamReader = new StreamReader(part.GetStream()))
        using (XmlReader xmlReader = XmlReader.Create(streamReader))
            xdoc = XDocument.Load(xmlReader);
        part.AddAnnotation(xdoc);
        return xdoc;
    }

    // Writes the cached XDocument, if there is one, back into the part,
    // replacing the part's previous contents.
    public static void SaveXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
        {
            using (XmlWriter xw =
              XmlWriter.Create(part.GetStream(FileMode.Create, FileAccess.Write)))
                xdoc.WriteTo(xw);
        }
    }
}

class Program
{
    // get rid of every rsid attribute/element in the doc.
    // they exist to enable merging of forked documents; not something
    // we're interested in here.  if we don't delete these nodes, they
    // show up as changed.
    private static void CleanUp(XDocument doc)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        doc.Descendants().Attributes(w + "rsidTr").Remove();
        doc.Descendants().Attributes(w + "rsidSect").Remove();
        doc.Descendants().Attributes(w + "rsidRDefault").Remove();
        doc.Descendants().Attributes(w + "rsidR").Remove();
        doc.Descendants().Attributes(w + "rsidDel").Remove();
        doc.Descendants(w + "rsid").Remove();
    }

    static void Main(string[] args)
    {
        foreach (var file in args)
        {
            // open each file for editing; it is modified in place.
            using (WordprocessingDocument doc =
                WordprocessingDocument.Open(file, true))
            {
                // main document part
                XDocument xDoc = doc.MainDocumentPart.GetXDocument();
                CleanUp(xDoc);
                doc.MainDocumentPart.SaveXDocument();

                // header parts
                foreach (var h in doc.MainDocumentPart.HeaderParts)
                {
                    xDoc = h.GetXDocument();
                    CleanUp(xDoc);
                    h.SaveXDocument();
                }

                // footer parts
                foreach (var f in doc.MainDocumentPart.FooterParts)
                {
                    xDoc = f.GetXDocument();
                    CleanUp(xDoc);
                    f.SaveXDocument();
                }
            }
        }
    }
}
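The last line of CleanUp removes rsid elements rather than attributes.  As far as I know, those elements typically live under w:rsids in the settings part rather than in the main document, headers, or footers.  The sketch below shows the same query against a hand-written settings fragment, using LINQ to XML only.  The class and method names and the rsid values are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RsidElementDemo
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Same query as the last line of CleanUp: removes elements named
    // exactly w:rsid, and nothing else.
    public static void RemoveRsidElements(XDocument doc)
    {
        doc.Descendants(W + "rsid").Remove();
    }

    public static void Main()
    {
        // Hand-written fragment; the rsid values are invented.
        XDocument settings = XDocument.Parse(
            "<w:settings xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:rsids>" +
            "<w:rsidRoot w:val='00A77427'/>" +
            "<w:rsid w:val='007F1D13'/>" +
            "<w:rsid w:val='00A77427'/>" +
            "</w:rsids>" +
            "</w:settings>");

        RemoveRsidElements(settings);

        Console.WriteLine(settings.Descendants(W + "rsid").Count());     // 0
        Console.WriteLine(settings.Descendants(W + "rsidRoot").Count()); // 1
    }
}
```

Note that w:rsidRoot survives, because the query matches the element name exactly.  If the settings part also needs to be cleaned before a comparison, one option (an assumption on my part, not something the listing above does) would be to run GetXDocument, CleanUp, and SaveXDocument on doc.MainDocumentPart.DocumentSettingsPart as well, after checking it for null.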

