
George Cristian Bina wrote:
Hi Eliot,
I see that you filed a bug against Xerces: http://issues.apache.org/jira/browse/XERCESJ-1104
Note that it uses an XMLEntityResolver interface (not the SAX EntityResolver) that is at XNI level and that should allow some control over system versus uri mappings if the XMLEntityResolver set uses an XML Catalog. This is the interface I thought we should implement to allow the uri mapping. The XMLEntityResolver interface defines one method:
Yes, I've been digging into the code and I have a first stab at a fix. What I've done is extend the XMLEntityResolver interface to add a resolveResourceByUri() method that does nothing but try to resolve the system ID value using the resolver's resolveUri() method.

The problem is that resolveEntity() only works for true entities (that is, resources that would be mapped via SYSTEM and PUBLIC catalog entries). It would be inappropriate to use URI entries to try to resolve an external parsed or unparsed entity, in the same way it's inappropriate to use SYSTEM or PUBLIC to resolve a schema location URI.

In the case of no-namespace schemas you have no choice but to either use some out-of-band binding or use schema location hints. This is one reason I recommend against using no-namespace schemas. They're no better than external DTD subsets because you have no clear and reliable way to do a non-syntactic binding of document to schema. That is, mapping namespace URIs to schemas is non-syntactic, in that the syntax of the document is not directly locating the schema. Rather, the binding is indirect through the namespace, which is an invariant property of the document that directly affects the document's inherent semantics. A DOCTYPE declaration or schema location hint, by contrast, is a purely syntactic reference: it is not an inherent property of the document, and its presence or absence doesn't affect the document's inherent semantics.

Anyway, hopefully I'll be able to report more a bit later. My analysis at this point is that there's a fundamental architectural flaw in the current Xerces implementation in that it doesn't distinguish XML entities from other resources that might be involved in processing and validating a document (i.e., schemas). The approach described above is really a hack to get around this flaw with the least disruptive change.
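
To make the entity-versus-resource distinction concrete, here is a minimal Java sketch. It is not the actual Xerces change. SketchCatalog and SketchResolver, along with addSystem(), addPublic(), and addUri(), are hypothetical names invented for illustration and stand in for a real XML Catalog implementation. Only resolveEntity(), resolveUri(), and resolveResourceByUri() echo names used above. Trying SYSTEM before PUBLIC is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an XML Catalog; not a real Xerces or resolver class. */
final class SketchCatalog {
    private final Map<String, String> systemEntries = new HashMap<>();
    private final Map<String, String> publicEntries = new HashMap<>();
    private final Map<String, String> uriEntries = new HashMap<>();

    SketchCatalog addSystem(String systemId, String target) {
        systemEntries.put(systemId, target);
        return this;
    }

    SketchCatalog addPublic(String publicId, String target) {
        publicEntries.put(publicId, target);
        return this;
    }

    SketchCatalog addUri(String uri, String target) {
        uriEntries.put(uri, target);
        return this;
    }

    /** Entity lookup: consults SYSTEM entries, then PUBLIC entries; never URI entries. */
    String resolveEntity(String publicId, String systemId) {
        String target = (systemId == null) ? null : systemEntries.get(systemId);
        if (target == null && publicId != null) {
            target = publicEntries.get(publicId);
        }
        return target;
    }

    /** Resource lookup: consults URI entries only. */
    String resolveUri(String uri) {
        return (uri == null) ? null : uriEntries.get(uri);
    }
}

/** Hypothetical resolver that exposes the two lookups as separate methods. */
final class SketchResolver {
    private final SketchCatalog catalog;

    SketchResolver(SketchCatalog catalog) {
        this.catalog = catalog;
    }

    /** For external parsed or unparsed entities. Returns null when unmapped. */
    String resolveEntity(String publicId, String systemId) {
        return catalog.resolveEntity(publicId, systemId);
    }

    /** For non-entity resources such as schemas: hands the system ID value to resolveUri(). */
    String resolveResourceByUri(String systemId) {
        return catalog.resolveUri(systemId);
    }
}
```

With a catalog holding a SYSTEM entry for a DTD and a URI entry for a schema location, resolveEntity() finds only the former and resolveResourceByUri() finds only the latter, so neither kind of entry can be picked up for the wrong kind of resource.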

I suspect that the same problem exists in the Xerces XInclude processing--I would not be surprised if href= values on xi:include elements are resolved via SYSTEM and PUBLIC entries. But I don't have the time or energy to dive into that code just now.

It looks like I may have to experiment with writing my own XMLEntityResolver implementation in order to implement my desired recursive and fallback catalog resolution behaviors.

Cheers,

Eliot

--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8841
ekimber@innodata-isogen.com
www.innodata-isogen.com