
I'm trying to set up some new catalogs for DITA (I'm tasked with defining how DITA will use various forms of URL and public ID to handle version-specific and version-independent references).

I'm trying to use oXygen 6.2 to develop and test my catalogs and I'm not having much luck, which is probably mostly user error, but I think there may be some limitations in oXygen's catalog implementation and I'm trying to figure out which is which. Unfortunately, the online help for catalogs is not helpful in this case.

I'm trying to do several things:

1. Set up entries that map namespace URIs to local URLs for the corresponding schema document.
2. Set up entries that map one URL to another URL.
3. Set up entries that map public IDs to system IDs that are fully-qualified URLs that then map, via URI entries, to local files.

I'm running into several problems. In all cases, I have set the catalog options to not use the default catalogs and I only list one top-level catalog, which then uses <nextCatalog> elements to point to subordinate catalogs that do the actual mapping, i.e.:

<catalog> <nextCatalog catalog="dtd/catalog-dita-dtd.xml" /> </catalog>

--------------------------------------------
Problem 1: HTTP URL not resolved via catalog

I have a document with this DOCTYPE declaration:

<!DOCTYPE topic SYSTEM "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd">

In my catalog I have this entry:

<uri name="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" uri="topic.dtd"/>

I have verified that the file "topic.dtd" is where the catalog says it is (using "edit file at cursor" within oXygen).

When I validate the document, I get this error:

"Description: F HttpException-dita.oasis-open.org (http://dita.oasis-open.org/DITA/version.1.0/topic.dtd)"

This suggests that the validator is trying to resolve the URL and, upon failure to do so (the URL isn't currently resolvable), is not then trying to resolve it via the catalog, which it should be doing.

Also, in this case, even if the URL were resolvable over the net, I would prefer to have it resolved via the catalog--I didn't see an option for controlling that behavior.

-----------------------------------------------------
Problem 2: Non-HTTP URN gives malformed URL exception

In another document I have this DOCTYPE declaration:

<!DOCTYPE topic SYSTEM "urn:oasis:http://dita.oasis-open.org/DITA/topic.dtd">

When I try to validate I get this failure:

"Description: F MalformedURLException-unknown protocol: urn"

This suggests that oXygen is expecting the SYSTEM value to be an HTTP URL, which is not required by the XML spec (the SYSTEM value is a URI, not a URL).

In this case, I would expect the validator to try to resolve the URN via the catalog (where it is mapped) and only if that fails return a "could not resolve URN" error, not a malformed URL exception.

------------------------------------------------------
Problem 3: Unclear response from catalog verbose trace

It's also not clear what these catalog trace messages are telling me:

Description: Public: null System: http://dita.oasis-open.org/DITA/version.1.0/topic.dtd = null

In particular, what is the final "null" indicating--does it indicate that the file to which the URI is mapped couldn't be found? I couldn't find an explanation of this case in the online docs.

Cheers,
Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8841
ekimber@innodata-isogen.com
www.innodata-isogen.com

Eliot Kimber wrote:
I'm running into several problems. In all cases, I have set the catalog options to not use the default catalogs and I only list one top-level catalog, which then uses <nextCatalog> elements to point to subordinate catalogs that do the actual mapping, i.e.:
Follow up: Using the resolver class in the XML Commons resolver.jar, I have verified that my catalogs are correct, in that given the starting catalog and the input URI to resolve, the resolver is able to resolve the URI to the expected value.

Cheers,
E.

Hello,

These are user errors, not limitations of the <oXygen/> catalog implementation.

1. First, the root element of an OASIS catalog must be in the "urn:oasis:names:tc:entity:xmlns:xml:catalog" namespace, for example:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> ... </catalog>

as the "Resource Failures" section of the OASIS XML Catalogs specification states:

http://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.res.fail

The non-normative XML Schema listed in the specification requires that *all* the catalog elements are in that namespace, and that is what <oXygen/> requires:

<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:oasis:names:tc:entity:xmlns:xml:catalog' ...

Second, you tried to resolve an external identifier with a URI entry of the catalog. You must use an external identifier entry, for example *system*:

<system systemId="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" uri="topic.dtd"/>

For external identifier entries please see:

http://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.ext.ent

For URI entries please see:

http://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.uri.ent

The catalog works like this: try to resolve the URL via the catalog, and *only* upon failure try to resolve the URL by accessing the remote location. So the possible user preference that you specified does not make sense.

2. Problem 2 depends on problem 1. Validation works as you expect if you put all the catalog elements in the right namespace and use an external identifier entry in your catalog instead of a URI entry.

3. The substring following the "=" character is the result of resolving the pair (public ID, system ID) via the catalog. "null" indicates that the identifier could not be resolved via the catalog. XML Catalog operations are completely independent of accessing any resource specified by a URL. You can find this explained in the OASIS XML Catalogs specification directly linked from the "Working with XML Catalogs" section of the <oXygen/> User Manual:

http://www.oxygenxml.com/doc/ug-standalone-en/editing-xml-documents.html#usi...

Best regards,
Sorin

Eliot Kimber wrote:
I'm trying to set up some new catalogs for DITA (I'm tasked with defining how DITA will use various forms of URL and public ID to handle version-specific and inspecific references).
I'm trying to use oXygen 6.2 to develop and test my catalogs and I'm not having much luck, which is probably mostly user error but I think there may be some limitations in oXygen's catalog implementation and I'm trying to figure out which is which. Unfortunately, the online help for catalogs is not helpful in this case.
I'm trying to do several things:
1. Set up entries that map namespace URIs to local URLs for the corresponding schema document
2. Set up entries that map one URL to another URL.
3. Set up entries that map public IDs to system IDs that are fully-qualified URLs that then map, via URI entries, to local files
I'm running into several problems. In all cases, I have set the catalog options to not use the default catalogs and I only list one top-level catalog, which then uses <nextCatalog> elements to point to subordinate catalogs that do the actual mapping, i.e.:
<catalog> <nextCatalog catalog="dtd/catalog-dita-dtd.xml" /> </catalog>
-------------------------------------------- Problem 1: HTTP URL not resolved via catalog
I have a document with this DOCTYPE declaration:
<!DOCTYPE topic SYSTEM "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd">
In my catalog I have this entry:
<uri name="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" uri="topic.dtd"/>
I have verified that the file "topic.dtd" is where the catalog says it is (using "edit file at cursor" within oXygen).
When I validate the document, I get this error:
"Description: F HttpException-dita.oasis-open.org (http://dita.oasis-open.org/DITA/version.1.0/topic.dtd)"
This suggests that the validator is trying to resolve the URL and, upon failure to do so (the URL isn't currently resolvable), is not then trying to resolve it via the catalog, which it should be doing.
Also, in this case, even if the URL was resolvable over the net, I would prefer to have it resolved via the catalog--I didn't see an option for controlling that behavior.
----------------------------------------------------- Problem 2: Non-HTTP URN gives malformed URL exception
In another document I have this doctype declaration:
<!DOCTYPE topic SYSTEM "urn:oasis:http://dita.oasis-open.org/DITA/topic.dtd">
When I try to validate I get this failure:
"Description: F MalformedURLException-unknown protocol: urn"
This suggests that oXygen is expecting the SYSTEM value to be an HTTP URL, which is not required by the XML spec (the SYSTEM value is a URI, not a URL).
In this case, I would expect the validator to try to resolve the URN via the catalog (where it is mapped) and only if that fails then return "could not resolve URN" error, not a malformed URN exception.
------------------------------------------------------ Problem 3: Unclear response from catalog verbose trace
It's also not clear what these catalog trace messages are telling me:
Description: Public: null System: http://dita.oasis-open.org/DITA/version.1.0/topic.dtd = null
In particular, what the final "null" is indicating--does that indicate that the file to which the URI is mapped couldn't be found? I couldn't find an explanation of this case in the online docs.
Cheers,
Eliot
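
[Editorial sketch] The difference between a uri entry and a system entry described in this reply can be checked outside the editor. The following is a minimal sketch, assuming Java 11 or later and the JDK's own javax.xml.catalog API as a stand-in; it is not the Apache XML Commons resolver or the oXygen implementation discussed in the thread, and the class name EntryKindDemo is made up for illustration. It looks up the DOCTYPE system identifier in a catalog that holds only a uri entry and in one that holds only a system entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical demo class; only the javax.xml.catalog calls are real JDK API.
class EntryKindDemo {
    static final String DTD_URL =
        "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd";

    /** Writes a catalog holding the given entries to a temp directory and loads it. */
    static Catalog load(String entries) {
        try {
            Path dir = Files.createTempDirectory("catalog-demo");
            Path file = dir.resolve("catalog.xml");
            // The root element carries the OASIS catalog namespace.
            Files.writeString(file,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + entries + "</catalog>");
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Catalog uriOnly = load("<uri name=\"" + DTD_URL + "\" uri=\"topic.dtd\"/>");
        Catalog systemOnly =
            load("<system systemId=\"" + DTD_URL + "\" uri=\"topic.dtd\"/>");

        // A system-identifier lookup searches system-type entries only,
        // so the uri-only catalog yields no match for it.
        System.out.println("uri entry, system lookup:    " + uriOnly.matchSystem(DTD_URL));
        System.out.println("uri entry, uri lookup:       " + uriOnly.matchURI(DTD_URL));
        System.out.println("system entry, system lookup: " + systemOnly.matchSystem(DTD_URL));
    }
}
```

With this API a lookup that finds no entry returns null, which is in line with the "= null" reading of the trace message given in point 3.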

Sorin Ristache wrote:
Hello,
There are only user errors, not limitations in the <oXygen/>'s catalog implementation.
1. First the root element of an OASIS catalog must be in the "urn:oasis:names:tc:entity:xmlns:xml:catalog" namespace, for example:
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> ... </catalog>
Yes, I have all the correct declarations, I just didn't show them for brevity.
Second you tried to resolve an external identifier with a URI entry of the catalog. You must use an external identifier entry, for example *system*:
<system systemId="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" uri="topic.dtd"/>
Oops. This was an error on my part.
The catalog works like this: try to resolve the URL via the catalog, *only* upon failure try to resolve the URL by accessing the remote location. So the possible user preference that you specified does not make sense.
I'm not sure I understand your response. I'm saying that *I want it to work the way I said I want it to work:* try to resolve via the catalog *first*, then only on failure try to resolve via the net. I want that option and there's no reason not to provide it. There's nothing in the Entity Resolution spec that requires you to try to resolve a system ID via the net first. From the spec: "This Committee Specification does not dictate when an entity manager should access this catalog; for example, an application may attempt other mapping algorithms before or after accessing this catalog." Therefore, there's every reason to offer users the option of how they want catalogs to be applied when resolving entities and resources.
XML Catalog operations are completely independent of accessing any resource specified by a URL. You can find this explained in the OASIS XML Catalogs specification directly linked from the "Working with XML Catalogs" section of the <oXygen/> User Manual:
I'm not sure what you mean here: the entity resolution spec is quite clear that it can be used for resolving non-entity resources referenced from within an XML document (or in any other processing context for that matter):

"A catalog can be used in two different, independent ways: (1) it can be used to locate the replacement text for an external entity, or (2) it can be used to locate an alternate URI reference for a resource."

Therefore, it's not unreasonable to expect any parser to use an available catalog to resolve both entity external identifiers and non-entity resource URIs. In fact, one would expect this to always be available for schema locations, since without it it would be very hard to migrate from DTDs to schemas in environments where catalogs have been used to manage local access to DTD components. Again, user options should be provided, i.e. "use catalogs to resolve non-entity URIs?".

So, given that, here's where I've gotten after fixing my error identified above:

- I can resolve system IDs that map directly to a local file.

But what I would like to be able to do, and sort of expected to work, is to map through several levels of resolution. In particular, given this DOCTYPE declaration:

<!DOCTYPE topic SYSTEM "http://dita.oasis-open.org/DITA/version.1/topic.dtd">

I would like to be able to resolve through this system of catalog entries:

<!-- Application-version-dependent, resource-version-independent URL: -->
<system systemId="http://dita.oasis-open.org/DITA/version.1/topic.dtd" uri="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd"/>
<!-- Version-specific URL: -->
<system systemId="http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" uri="./topic.dtd"/>

That is, I was hoping that the entity resolution would work as follows:

Step 1: resolve system ID "http://dita.oasis-open.org/DITA/version.1/topic.dtd" to URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd"

Step 2: try to resolve URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" via catalog, resolve it to URI "./topic.dtd"

Step 3: try to resolve URI "./topic.dtd" via catalog. Find no entry, try to resolve it using outside system, find local file topic.dtd.

However, it appears that oXygen's resolver does not try to resolve the URI returned in step 1, causing it to fail the validation with an HttpException.

Essentially, in the case where an oXygen user has requested that all URIs be resolved via the catalogs, it should just automatically be applied recursively, because every URI will be passed to a "resolveUri()" method that will try to resolve it first via the catalog, which will have the effect of recursing through all relevant catalog entries until no entry is found for a result URI.

Also, it appears that the oXygen resolver is resolving noNamespaceSchemaLocation= values via SYSTEM catalog entries and not URI entries. I don't think this is correct, given that noNamespaceSchemaLocation and schemaLocation are not entity references but references to non-entity resources. They should be resolved via URI entries exclusively.

Cheers,
Eliot

Eliot Kimber wrote:
That is, I was hoping that the entity resolution would work as follows:
Step 1: resolve system ID "http://dita.oasis-open.org/DITA/version.1/topic.dtd" to URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd"
Step 2: try to resolve URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" via catalog, resolve it to URI "./topic.dtd"
Step 3: try to resolve URI "./topic.dtd" via catalog. Find no entry, try to resolve it using outside system, find local file topic.dtd.
However, it appears that Oxygen's resolver does not try to resolve the URI returned in step 1, causing it to fail the validation with an HttpException.
Note that this is the behavior of the client of the resolver, not the behavior of the resolver itself (which always does just a single lookup). The Entity Resolution spec makes it clear that the resolver does not do recursive entry processing. That is, it's the software component that calls the resolve*() method on the resolver that implements the above recursive algorithm, not the resolver. That's one reason this needs to be an oXygen-level option, because this behavior is a function of the processors integrated with oXygen, not the core resolver (which I presume is the Apache commons resolver).

Cheers,
E.
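
[Editorial sketch] The three steps quoted above, with the repetition done by the caller rather than by the resolver, can be sketched as a loop around a single-lookup call. This is a minimal sketch under stated assumptions: it uses the JDK's javax.xml.catalog API (Java 11 or later) in place of the Apache resolver, and the class ChainedLookup and its resolveChain() method are hypothetical names, not part of any library. The loop stops when no entry matches and refuses to continue if an identifier comes up twice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical client-side wrapper; the catalog itself still does single lookups.
class ChainedLookup {
    /** Writes a catalog holding the given entries to a temp directory and loads it. */
    static Catalog load(String entries) {
        try {
            Path dir = Files.createTempDirectory("catalog-chain");
            Path file = dir.resolve("catalog.xml");
            Files.writeString(file,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + entries + "</catalog>");
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Follows system entries until no entry matches; fails if an identifier repeats. */
    static String resolveChain(Catalog catalog, String systemId) {
        Set<String> seen = new LinkedHashSet<>();
        String current = systemId;
        while (seen.add(current)) {                      // add() is false on a repeat
            String next = catalog.matchSystem(current);  // one lookup per iteration
            if (next == null) {
                return current;                          // no further mapping: done
            }
            current = next;
        }
        throw new IllegalStateException("catalog cycle: " + seen + " -> " + current);
    }
}
```

Called with the two system entries from the earlier message and the version.1 identifier, resolveChain() would pass through the version.1.0 URL and end at the location of topic.dtd relative to the catalog file.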

Hi Eliot,

There are two requirements that we extracted from these emails:

1. Allow multiple levels of indirection through the catalog mappings
2. Use uri mappings instead of system mappings for schema locations

We will consider both; the second one is a little more difficult to implement. The first requirement also needs attention, as the catalogs may contain direct or indirect recursion.

We currently use the catalog as an EntityResolver set on the XML parser, so it is not possible to use uri mappings, as all we get at that level is a resolveEntity call. Using uri mappings requires parser-specific support, in our case working at the XNI (Xerces Native Interface) level.

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Eliot Kimber wrote:
Eliot Kimber wrote:
That is, I was hoping that the entity resolution would work as follows:
Step 1: resolve system ID "http://dita.oasis-open.org/DITA/version.1/topic.dtd" to URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd"
Step 2: try to resolve URI "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd" via catalog, resolve it to URI "./topic.dtd"
Step 3: try to resolve URI "./topic.dtd" via catalog. Find no entry, try to resolve it using outside system, find local file topic.dtd.
However, it appears that Oxygen's resolver does not try to resolve the URI returned in step 1, causing it to fail the validation with an HttpException.
Note that this is the behavior of the client of the resolver, not the behavior of the resolver itself (which always does just a single lookup). The Entity Resolution spec makes it clear that the resolver does not do recursive entry processing.
That is, it's the software component that calls the resolve*() method on the resolver that implements the above recursive algorithm, not the resolver. That's one reason this needs to be an oXygen-level option, because this behavior is a function of the processors integrated with oXygen, not the core resolver (which I presume is the Apache commons resolver).
Cheers,
E.

George Cristian Bina wrote:
Hi Eliot,
There are two requirements that we extracted from these emails:
1. Allow multiple levels of indirection through the catalog mappings
Yes
2. Use uri mappings instead of system mappings for schema locations
Yes.
We will consider both, the second one is a little more difficult to implement.
There is a third requirement, which is to provide control over whether to use the catalog first or second when resolving system IDs and URIs. The current behavior is to try to resolve a URI via the net and only if that fails, try the catalog. I would like the option of doing the reverse, namely attempting to do all resolution locally first.
The first requirement also needs attention as the catalogs may contain direct or indirect recursion.
I'm not sure I follow you here--I think there can only be direct recursion--that is, a given entry either resolves to another URI mapped in the catalog or it doesn't. If it doesn't then you will never return to the catalog (because either you will resolve the resource from where it is served or resolution will fail completely).
We are using currently the catalog as an EntityResolver set on the XML parser thus it is not possible to use uri mappings as all we get at that level is a resolveEntity call.
Hmm. Is this an aspect of a standard API behavior or the behavior of the specific parser you're using? I'm not necessarily conversant with this level of detail in SAX or JAXP. If I'm asking for something that's at odds with current APIs perhaps I need to rethink my request (not that the request is incorrect in principle, but if it's in conflict with established, albeit incorrect, practice, far be it from me to buck that trend).
Using uri mappings requires usage of parser specific support, in our case working at XNI (Xerces Native Interface) level.
Hmm, OK. I'm not sure what this means at the implementation level, but I assume this means that you've got to use a tighter binding to the parser.

Cheers,
E.

Hi Eliot, See a few comments below. Eliot Kimber wrote:
George Cristian Bina wrote:
Hi Eliot,
There are two requirements that we extracted from these emails:
1. Allow multiple levels of indirection through the catalog mappings
Yes
2. Use uri mappings instead of system mappings for schema locations
Yes.
We will consider both, the second one is a little more difficult to implement.
There is a third requirement, which is to provide control over whether to use the catalog first or second when resolving system IDs and URIs.
The current behavior is to try to resolve a URI via the net and only if that fails, try the catalog. I would like the option of doing the reverse, namely attempting to do all resolution locally first.
As far as we know/tested oXygen does not do that, and cannot do that. The catalog resolver acts as an entity resolver on the parser so it is called first to resolve an entity. Also I am not aware of any parser that will perform such a fallback as you describe, to try to access a resource and to fallback to something else if that fails.
The first requirement also needs attention as the catalogs may contain direct or indirect recursion.
I'm not sure I follow you here--I think there can only be direct recursion--that is, a given entry either resolves to another URI mapped in the catalog or it doesn't. If it doesn't then you will never return to the catalog (because either you will resolve the resource from where it is served or resolution will fail completely).
By recursion I mean getting back to the same id, for instance:

direct recursion:
map http://www.example.com to http://www.example.com

indirect recursion:
map http://www.example.com/1 to http://www.example.com/2
map http://www.example.com/2 to http://www.example.com/1
We are using currently the catalog as an EntityResolver set on the XML parser thus it is not possible to use uri mappings as all we get at that level is a resolveEntity call.
Hmm. Is this an aspect of a standard API behavior or the behavior of the specific parser you're using? I'm not necessarily conversant with this level of detail in SAX or JAXP. If I'm asking for something that's at odds with current APIs perhaps I need to rethink my request (not that the request is incorrect in principle, but if it's in conflict with established, albeit incorrect, practice, far be it from me to buck that trend).
We mainly use SAX in oXygen; for validation we create an XMLReader. At the SAX level all you can set on the XMLReader is an EntityResolver or, since SAX 2.0, an EntityResolver2:

http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html
http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html

The methods defined by these interfaces do not provide enough information to tell whether you are trying to resolve an external entity or a schema location, so the issue is with the general XML API and not with the specific parser we are using, which is Xerces 2.7.1.
Using uri mappings requires usage of parser specific support, in our case working at XNI (Xerces Native Interface) level.
Hmm, OK. I'm not sure what this means at the implementation level, but I assume this means that you've got to use a tighter binding to the parser.
Exactly. There is one more issue, I think. If we implement this support for using URI mappings to resolve schemas, then someone who validates the same document from the command line using the Apache resolver at the SAX level will get a different behavior, as that will use the system mappings.
Cheers,
E.
Best Regards,
George
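
[Editorial sketch] The arrangement described in this message, a catalog resolver installed as the SAX EntityResolver so that it is asked first whenever the parser needs an external entity, can be sketched with JDK classes alone. This is an illustration only, assuming Java 11 or later: it uses javax.xml.catalog.CatalogResolver and the JDK's bundled parser rather than oXygen's setup with Xerces 2.7.1, and the class name CatalogFirstParse and the "origin" attribute are invented for the demonstration. The local DTD declares a default attribute value, so seeing that value in the parsed document indicates the DTD came from the catalog mapping.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the resolver and parser calls are standard JDK API.
class CatalogFirstParse {
    static final String DTD_URL =
        "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd";

    /** Parses a document whose DOCTYPE names DTD_URL; returns the defaulted attribute. */
    static String parseWithCatalog() {
        try {
            Path dir = Files.createTempDirectory("catalog-parse");
            // Local stand-in for the DTD, with a default value we can detect later.
            Files.writeString(dir.resolve("topic.dtd"),
                "<!ELEMENT topic (#PCDATA)>\n"
                + "<!ATTLIST topic origin CDATA \"local-copy\">");
            Path catalog = dir.resolve("catalog.xml");
            Files.writeString(catalog,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + "<system systemId=\"" + DTD_URL + "\" uri=\"topic.dtd\"/>"
                + "</catalog>");

            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // CatalogResolver implements org.xml.sax.EntityResolver, so the parser
            // calls it with the DOCTYPE identifiers before opening any URL itself.
            reader.setEntityResolver(
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri()));

            String[] origin = new String[1];
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName,
                                         Attributes atts) {
                    if (qName.equals("topic")) {
                        origin[0] = atts.getValue("origin");
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(
                "<!DOCTYPE topic SYSTEM \"" + DTD_URL + "\"><topic>text</topic>")));
            return origin[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the resolver only ever receives a public ID and a system ID here, which is the limitation of the SAX-level interface pointed out above.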

George Cristian Bina wrote:
The current behavior is to try to resolve a URI via the net and only if that fails, try the catalog. I would like the option of doing the reverse, namely attempting to do all resolution locally first.
As far as we know/tested oXygen does not do that, and cannot do that. The catalog resolver acts as an entity resolver on the parser so it is called first to resolve an entity. Also I am not aware of any parser that will perform such a fallback as you describe, to try to access a resource and to fallback to something else if that fails.
Hmm--I must have misunderstood the implications of the original failures I was seeing. Since I have to look into the core Xerces code that does entity resolution and schema lookup (see below) I'll poke into this more.
By recursion I mean getting back to the same id, for instance:
direct recursion: map http://www.example.com to http://www.example.com
indirect recursion: map http://www.example.com/1 to http://www.example.com/2, map http://www.example.com/2 to http://www.example.com/1
I would call this circular references (cycles). One would have to detect cycles--any process that does recursive lookup in any context must do cycle detection. But again, if this is all built-in Xerces behavior then of course there's nothing y'all should do.
There is one more issue, I think. If we implement this support for using URI mappings to resolve schemas, then someone who validates the same document from the command line using the Apache resolver at the SAX level will get a different behavior, as that will use the system mappings.
I'm starting to understand the issue a bit more and I agree that this is really an issue with the low-level parser implementation. I'm trying to find where in the Xerces code schema locations are resolved as system IDs rather than through URI calls. I think the correct solution is to fix the code at the parser level, so I don't think there's anything oXygen needs to do or could reasonably do about this.

Cheers,
E.

Hi Eliot,

I see that you filed a bug against Xerces:

http://issues.apache.org/jira/browse/XERCESJ-1104

Note that it uses an XMLEntityResolver interface (not the SAX EntityResolver) that is at the XNI level and that should allow some control over system versus uri mappings if the XMLEntityResolver that is set uses an XML Catalog. This is the interface I thought we should implement to allow the uri mapping. The XMLEntityResolver interface defines one method:

public XMLInputSource resolveEntity(XMLResourceIdentifier resourceIdentifier)

and the XMLResourceIdentifier has the following fields:

publicId
expandedSystemId
literalSystemId
baseSystemId
namespace

So basically, in a catalog-aware implementation of the XMLEntityResolver one can check whether a namespace is specified and, if it is, try to map that namespace through the catalog uri mappings to a resource; if that fails, it can try to map the system ID to a resource, also using uri mappings. I'm not sure, however, what that gives for a no-namespace schema, for instance.

I will watch the bug to see the feedback from the Xerces developers.

Best Regards,
George

Eliot Kimber wrote:
George Cristian Bina wrote:
The current behavior is to try to resolve a URI via the net and only if that fails, try the catalog. I would like the option of doing the reverse, namely attempting to do all resolution locally first.
As far as we know/tested oXygen does not do that, and cannot do that. The catalog resolver acts as an entity resolver on the parser so it is called first to resolve an entity. Also I am not aware of any parser that will perform such a fallback as you describe, to try to access a resource and to fallback to something else if that fails.
Hmm--I must have misunderstood the implications of the original failures I was seeing. Since I have to look into the core Xerces code that does entity resolution and schema lookup (see below) I'll poke into this more.
By recursion I mean getting back to the same id, for instance:
direct recursion: map http://www.example.com to http://www.example.com
indirect recursion: map http://www.example.com/1 to http://www.example.com/2, map http://www.example.com/2 to http://www.example.com/1
I would call this circular references (cycles). One would have to detect cycles--any process that does recursive lookup in any context must do cycle detection.
But again, if this is all built-in Xerces behavior then of course there's nothing y'all should do.
There is one more issue, I think. If we implement this support for using URI mappings to resolve schemas, then someone who validates the same document from the command line using the Apache resolver at the SAX level will get a different behavior, as that will use the system mappings.
I'm starting to understand the issue a bit more and I agree that this is really an issue with the low-level parser implementation.
I'm trying to find where in the Xerces code schema locations are resolved as system IDs rather than through URI calls. I think the correct solution is to fix the code at the parser level, so I don't think there's anything oXygen needs to do or could reasonably do about this.
Cheers,
E.
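
[Editorial sketch] The lookup order George outlines in this message (try the namespace against the catalog's uri mappings, then fall back to the system ID, again through uri mappings) can be sketched independently of XNI. The sketch below is an assumption-laden illustration: it uses the JDK's javax.xml.catalog API (Java 11 or later) rather than a Xerces XMLEntityResolver, the class SchemaLookup and method resolveSchema() are hypothetical, and the two parameters stand in for the namespace and expandedSystemId fields of XMLResourceIdentifier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper; not the Xerces XMLEntityResolver interface itself.
class SchemaLookup {
    /** Writes a catalog holding the given entries to a temp directory and loads it. */
    static Catalog load(String entries) {
        try {
            Path dir = Files.createTempDirectory("catalog-schema");
            Path file = dir.resolve("catalog.xml");
            Files.writeString(file,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + entries + "</catalog>");
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Namespace first, then system ID, both through uri-type entries only. */
    static String resolveSchema(Catalog catalog, String namespace, String systemId) {
        if (namespace != null) {
            String byNamespace = catalog.matchURI(namespace);
            if (byNamespace != null) {
                return byNamespace;
            }
        }
        // A no-namespace schema has only its location hint to go on.
        return systemId == null ? null : catalog.matchURI(systemId);
    }
}
```

For a no-namespace schema the namespace argument is null, so the result depends entirely on whether the schema location hint itself has a uri entry.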

George Cristian Bina wrote:
Hi Eliot,
I see that you filed a bug against Xerces: http://issues.apache.org/jira/browse/XERCESJ-1104
Note that it uses an XMLEntityResolver interface (not the SAX EntityResolver) that is at XNI level and that should allow some control over system versus uri mappings if the XMLEntityResolver set uses an XML Catalog. This is the interface I thought we should implement to allow the uri mapping. The XMLEntityResolver interface defines one method:
Yes, I've been digging into the code and I have a first stab at a fix. What I've done is extended the XMLEntityResolver interface to add a resolveResourceByUri() method which does nothing but try to resolve the system ID value using the resolver's resolveUri() method. The problem is that resolveEntity() only works for true entities (that is, resources that would be mapped via SYSTEM and PUBLIC catalog entries). It would be inappropriate to use URI entries to try to resolve an external parsed or unparsed entity, in the same way it's inappropriate to use SYSTEM or PUBLIC to resolve a schema location URI. In the case of no-namespace schemas you have no choice but to either use some out-of-band binding or use schema location hints. This is one reason I recommend against using no-namespace schemas. They're no better than external DTD subsets because you have no clear and reliable way to do a non-syntactic binding of document to schema. That is, mapping namespace URIs to schemas is non-syntactic, in that the syntax of the document is not directly locating the schema. Rather the binding is indirect through the namespace, which is an invariant property of the document that directly affects the documents inherent semantics, as opposed to either a DOCTYPE declaration or schema location hint, which is a purely syntactic reference that is not an inherent property of the document the presence or absence of doesn't affect the inherent semantics of the document. Anyway, hopefully I'll be able to report more a bit later. My analysis at this point is that there's a fundamental architectural flaw in the current Xerces implementation in that it doesn't distinguish XML entities from other resources that might be involved in processing and validating a document (i.e., schemas). The approach shown above is really a hack to get around this flaw with the least disruptive change. 
I suspect that the same problem exists in the Xerces XInclude processing--I would not be surprised if href= values on xi:include elements are resolved via SYSTEM and PUBLIC entries. But I don't have time or energy to dive into that code just now.

It looks like I may have to experiment with writing my own XMLEntityResolver implementation in order to implement my desired recursive and fallback catalog resolution behaviors.

Cheers,

Eliot
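To make the separation described above concrete, here is a minimal sketch in plain Java. It is not the Xerces code and not the actual patch: the class name SplitCatalogSketch, the in-memory tables, and the example identifiers are all hypothetical, and the real XNI resolveEntity() takes an XMLResourceIdentifier rather than a string. It only illustrates the idea that true entities are looked up through SYSTEM entries while other resources, such as schema documents, are looked up through URI entries, with no crossover between the two.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical two-table catalog: SYSTEM entries and URI entries are kept apart. */
final class SplitCatalogSketch {
    private final Map<String, String> systemEntries = new LinkedHashMap<>();
    private final Map<String, String> uriEntries = new LinkedHashMap<>();

    /** Registers a SYSTEM-style entry (system identifier to local target). */
    SplitCatalogSketch system(String systemId, String target) {
        systemEntries.put(systemId, target);
        return this;
    }

    /** Registers a URI-style entry (URI name to local target). */
    SplitCatalogSketch uri(String name, String target) {
        uriEntries.put(name, target);
        return this;
    }

    /** For true entities (external DTD subsets, parsed or unparsed entities): SYSTEM entries only. */
    Optional<String> resolveEntity(String systemId) {
        return Optional.ofNullable(systemEntries.get(systemId));
    }

    /** For other resources such as schema documents: URI entries only. */
    Optional<String> resolveResourceByUri(String uri) {
        return Optional.ofNullable(uriEntries.get(uri));
    }

    public static void main(String[] args) {
        // Both identifiers and both targets below are made up for illustration.
        SplitCatalogSketch catalog = new SplitCatalogSketch()
                .system("http://example.org/doc.dtd", "dtd/doc.dtd")
                .uri("http://example.org/ns/doc", "xsd/doc.xsd");

        System.out.println(catalog.resolveEntity("http://example.org/doc.dtd"));
        System.out.println(catalog.resolveResourceByUri("http://example.org/ns/doc"));
        // A schema namespace is not an entity, so the entity lookup does not see it.
        System.out.println(catalog.resolveEntity("http://example.org/ns/doc"));
    }
}
```

A caller that wants fallback behavior would have to ask for it explicitly, for example by trying resolveResourceByUri() only after resolveEntity() comes back empty, rather than getting it implicitly from a single lookup method.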

Eliot Kimber wrote:
Yes, I've been digging into the code and I have a first stab at a fix.
What I've done is extended the XMLEntityResolver interface to add a resolveResourceByUri() method which does nothing but try to resolve the system ID value using the resolver's resolveUri() method.
The problem is that resolveEntity() only works for true entities (that is, resources that would be mapped via SYSTEM and PUBLIC catalog entries). It would be inappropriate to use URI entries to try to resolve an external parsed or unparsed entity, in the same way it's inappropriate to use SYSTEM or PUBLIC to resolve a schema location URI.
I'm trying to hack, I mean fix, the Xerces code for schema resolution to use URI catalog entries, but in trying to test my fix using a DOM parser I can't figure out how to set up the configuration so the schema loader uses my fixed XMLCatalogEntityResolver instead of the default XMLEntityManager.

I'm wondering if someone at oXygen could show me how oXygen sets this up, since clearly oXygen is able to set up this configuration.

Thanks,

Eliot

Hi Eliot,

Well, we do not set an XMLEntityResolver in oXygen yet; we will probably do that to have the XML Catalog working at that level. You can set an XMLEntityResolver as a property; here is an example on the Xerces website:
http://xml.apache.org/xerces2-j/faq-xcatalogs.html

***
// Set the resolver on the parser.
reader.setProperty(
    "http://apache.org/xml/properties/internal/entity-resolver",
    resolver);
***

Xerces has also implemented an XML Catalog resolver at the XNI level; the page above is the FAQ for the XML Catalog support in Xerces. I have not looked inside that class, however, to see how it handles schema resources.

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

George Cristian Bina wrote:
You can set an XMLEntityResolver as a property; here is an example on the Xerces website: http://xml.apache.org/xerces2-j/faq-xcatalogs.html
OK--that's what I am doing (setting the entity resolver property on the reader).

Not really an issue to discuss on this forum, but the problem I'm running into is that while the above sets the resolver for the parser, it doesn't set it for the schema loader, which, at least as I have things configured, defaults to the base XMLEntityResolver.

But when using oXygen I do get schema locations resolved through the catalog, albeit via the wrong kind of entries, so I must be missing something somewhere.

Anyway, I'm sure I'll figure it out eventually. Thanks for the help.

Cheers,

Eliot
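For readers who want to see the "kind of entry" distinction in a running catalog implementation, the following sketch uses the JDK's own javax.xml.catalog API (Java 9 and later). This is neither the Xerces resolver nor whatever oXygen uses, and the class name EntryKindsSketch, the namespace URI, and the target file names are assumptions made for the example. It writes a small catalog to a temporary file and shows that system and uri entries are matched by separate lookups.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

/** Sketch: system and uri catalog entries are separate lookup tables. */
final class EntryKindsSketch {
    static final String DTD_ID = "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd";
    static final String NS_URI = "http://example.org/ns/topic"; // hypothetical namespace URI

    /** Writes a two-entry catalog to a temporary file and loads it. */
    static Catalog load() {
        String xml =
              "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <system systemId=\"" + DTD_ID + "\" uri=\"topic.dtd\"/>\n"
            + "  <uri name=\"" + NS_URI + "\" uri=\"topic.xsd\"/>\n"
            + "</catalog>\n";
        try {
            Path file = Files.createTempFile("catalog-sketch", ".xml");
            file.toFile().deleteOnExit();
            Files.writeString(file, xml);
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Catalog catalog = load();
        // Each identifier matches only through its own kind of entry.
        System.out.println("system(DTD_ID) = " + catalog.matchSystem(DTD_ID));
        System.out.println("uri(DTD_ID)    = " + catalog.matchURI(DTD_ID));
        System.out.println("uri(NS_URI)    = " + catalog.matchURI(NS_URI));
        System.out.println("system(NS_URI) = " + catalog.matchSystem(NS_URI));
    }
}
```

The mapped targets do not need to exist for the match to succeed; the catalog only rewrites identifiers, and the relative uri values are resolved against the location of the catalog file.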

Hi Eliot,

In oXygen we use the parser at the SAX level (that is why we resolve the schemas using the system ID mappings), so we just set an EntityResolver:

XMLReader.setEntityResolver(EntityResolver)
http://www.saxproject.org/apidoc/org/xml/sax/XMLReader.html#setEntityResolve...)

Best Regards,
George
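As an illustration of the SAX-level hook mentioned above, here is a minimal sketch using only the JDK's built-in parser. It is not oXygen's code: the class name SaxResolverSketch, the one-line stand-in DTD, and the sample document are assumptions for the example. The resolver receives the system ID from the DOCTYPE declaration and returns local content for it, so the parser never has to fetch the HTTP URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: intercept the external DTD subset with a SAX EntityResolver. */
final class SaxResolverSketch {
    static final String DTD_ID = "http://dita.oasis-open.org/DITA/version.1.0/topic.dtd";

    /** Parses a small document and returns the system IDs the parser asked the resolver for. */
    static List<String> parseAndRecord() {
        List<String> asked = new ArrayList<>();
        EntityResolver resolver = (publicId, systemId) -> {
            asked.add(systemId);
            if (DTD_ID.equals(systemId)) {
                // Stand-in DTD for the example; a real setup would map to a local topic.dtd.
                InputSource local = new InputSource(new StringReader("<!ELEMENT topic (#PCDATA)>"));
                local.setSystemId(systemId);
                return local;
            }
            return null; // null tells the parser to use its default resolution
        };
        String doc = "<!DOCTYPE topic SYSTEM \"" + DTD_ID + "\">\n<topic>hello</topic>";
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(doc)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return asked;
    }

    public static void main(String[] args) {
        System.out.println("Resolver was asked for: " + parseAndRecord());
    }
}
```

Because the SAX EntityResolver only sees a public ID and a system ID, a catalog plugged in at this level has no way to tell a DTD reference from a schema location, which is consistent with schemas being resolved through system ID mappings as described in the message above.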
participants (3): Eliot Kimber, George Cristian Bina, Sorin Ristache