When I wrote my initial post on my new approach to XSD versioning, I promised that I'd post code. Here's the first cut:
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class Program
{
    // Usage: <program> document.xml schema.xsd
    static void OriginalMain(string[] args)
    {
        // Open both files read-only and make sure they get closed
        using (FileStream xml = new FileStream(args[0], FileMode.Open, FileAccess.Read))
        using (FileStream xsd = new FileStream(args[1], FileMode.Open, FileAccess.Read))
        {
            XmlReader reader = null;
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.IgnoreWhitespace = true;
            settings.ValidationFlags = XmlSchemaValidationFlags.None; //ReportValidationWarnings;
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas.Add(XmlSchema.Read(xsd, null));
            settings.Schemas.Compile();

            // wire-up anonymous callback delegate; it runs inside reader.Read(),
            // so the reader is sitting on the node that failed validation
            int badDepth = -1;
            bool nextNodeInvalid = false;
            settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs ea)
            {
                Console.WriteLine("Event -- {0}: {1}", ea.Severity, ea.Message);
                Dump("Event", reader);
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.SchemaInfo != null &&
                    reader.SchemaInfo.Validity == XmlSchemaValidity.NotKnown &&
                    nextNodeInvalid == false)
                {
                    // error on a start tag: unexpected content begins here
                    Console.WriteLine("Filtering out unexpected stuff now...");
                    nextNodeInvalid = true;
                    badDepth = reader.Depth;
                }
                else if (reader.NodeType == XmlNodeType.EndElement &&
                    reader.SchemaInfo != null &&
                    reader.SchemaInfo.Validity == XmlSchemaValidity.Valid)
                {
                    // error on an end tag: expected content is absent
                    Console.WriteLine("Other stuff expected, ignoring...");
                }
            };

            reader = XmlReader.Create(xml, settings);
            while (reader.Read())
            {
                if (nextNodeInvalid)
                {
                    // skip ahead to the end tag of the bad element's parent;
                    // also stop at end of input so the loop can't spin forever
                    int targetDepth = badDepth - 1;
                    while (reader.Depth > targetDepth && reader.Read())
                    {
                        Dump("Filtering", reader);
                    }
                    nextNodeInvalid = false;
                    badDepth = -1;
                }
                Dump("Main", reader);
            }
        }
    }

    // Prints the reader's current node, prefixed with the stage that saw it
    static void Dump(string stage, XmlReader reader)
    {
        Console.WriteLine("{0} -- {1}\t{2}\t{3}\t{4}\t{5}", stage, reader.NodeType, reader.Name, reader.Value,
            reader.SchemaInfo == null ? "<none>" : reader.SchemaInfo.Validity.ToString(), reader.Depth);
    }
}
What this code does is pretty simple. It catches XSD validation errors, and if one occurs on the start tag of an element, which indicates that unexpected stuff is present, it ignores that error and any others that occur at or below that depth in the document. This version also happens to catch XSD errors that occur on the close tag of an element, which indicates that expected stuff is absent. With optional extensions this is never a problem, but it was interesting to make it work with extensions that aren't marked minOccurs="0".

Anyway, when it detects extra unexpected stuff, it reports that content as "filtered", meaning that it could be removed from the XML stream and the resulting document could be assumed to be valid as per your current schema (you'd really want to do another validation pass to be absolutely sure, but I wouldn't bother in general).

My next step is to wrap this in an XmlReader implementation so that it can be piped into a serializer more simply. It needs its own reader because it relies on control of the Read loop to filter stuff out. Anyway, I'll post that when it's working correctly.

BTW, one thing to note about this code is that it only works with the sequence compositor. With xs:all compositors, the children can appear in any order, so you can't assume that an unknown element is the beginning of content from a later version. I don't have any problem with this limitation, but it has to be mentioned.
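
To make the filtering easier to try without files on disk, here is a minimal, self-contained sketch of the same depth-based skip. It is a simplification, not the code above: the handler only checks that the error was raised on a start tag, and the FilterDemo class, the KeptElements method, and the person/name/nickname schema and document are all made up for illustration. The schema plays the role of "version 1", and the document carries a trailing nickname element that only a later version would know about.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class FilterDemo
{
    // Hypothetical "v1" schema: person is a sequence holding only <name>.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='person'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='name' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Hypothetical "v2" document: <nickname> is unknown to the v1 schema.
    const string Xml =
        "<person><name>Ann</name><nickname><short>A</short></nickname></person>";

    // Returns the names of the elements that survive filtering.
    public static List<string> KeptElements()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(XmlSchema.Read(new StringReader(Xsd), null));

        XmlReader reader = null;
        int badDepth = -1; // depth of the first unexpected element, or -1

        // Simplified handler: remember the depth of the first start tag
        // that fails validation and ignore everything else.
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs ea)
        {
            if (badDepth < 0 && reader.NodeType == XmlNodeType.Element)
            {
                badDepth = reader.Depth;
            }
        };

        List<string> kept = new List<string>();
        using (reader = XmlReader.Create(new StringReader(Xml), settings))
        {
            while (reader.Read())
            {
                if (badDepth >= 0)
                {
                    // Skip to the parent's end tag (or end of input).
                    int targetDepth = badDepth - 1;
                    while (reader.Depth > targetDepth && reader.Read()) { }
                    badDepth = -1;
                }
                if (reader.NodeType == XmlNodeType.Element)
                {
                    kept.Add(reader.Name);
                }
            }
        }
        return kept;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", KeptElements().ToArray()));
    }
}
```

The intent is that person and name are kept while nickname and everything nested inside it are skipped. Note that the skip runs all the way to the parent's end tag, so any known siblings that followed the unknown element would be dropped too, which is another way of seeing why this only makes sense when new content is appended to the end of a sequence.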