Re: [oXygen-user] Re: [dita-users] All possible Xpath generator?

Hedley,
Unfortunately the list of "all possible XPaths to a text file" is infinite in many cases, due to the possibility of recursive structures such as nested div or section elements, lists inside lists, or inline elements that may have arbitrary inline elements in their content.
Do you really want a path such as "/doc/body/div/div/div/div/list/item/list/item/list/item/p/figure/caption/p/b/i/mono/i/b" even if such a path points to an element that could be valid?
I think Dan is right that the requirement needs some refinement.
Cheers, Wendell
At 08:42 PM 2/12/2008, you wrote:
Dan:
At Wednesday, 13/02/2008, 12:16 PM, Dan wrote: your post is a bit confusing, and some better details/explanations would be nice to see. What do you mean by "write a list of all possible absolute Xpaths to a text file"?
Rephrasing my original request: I am developing a CSS implementation for [instance documents that conform to] an XML schema. It would really help to check if all [required CSS class matches] have been covered if I could find a utility that would scan a DTD (including *.mod inclusions) or XML Schema to write a list of all absolute -- not including wildcards -- [Xpaths from the root element to each possible leaf element] ... to a text file. For example, using a possible path from a DITA DTD:
/reference/refbody/section/p
This would help determine which class definitions can be generic no matter in what context an element appears (e.g. <i>) and which may need different treatment depending on context (e.g. */section/title). Oh, and if the generator could also list each Xpath in reverse, from leaf node to root, that would be pleasant:
p\section\refbody\reference\
Then I could sort the list of paths so that all instances where <p> was a leaf would be together and I could decide which contexts could share a CSS class and which would need context-specific classes.
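The reversal and sorting step described here is simple once a list of forward paths exists. A minimal Python sketch, assuming the paths are already available as plain strings (the function names and the second and third sample paths are made up for illustration):

```python
def reverse_path(path):
    """Turn '/a/b/c' into 'c\\b\\a\\' (leaf first, backslash separated)."""
    steps = path.strip("/").split("/")
    return "\\".join(reversed(steps)) + "\\"


def group_by_leaf(paths):
    """Map each leaf element name to the sorted reversed paths ending in it."""
    groups = {}
    for rev in sorted(reverse_path(p) for p in paths):
        leaf = rev.split("\\", 1)[0]
        groups.setdefault(leaf, []).append(rev)
    return groups


# Illustrative input only; a real list would come from the path generator.
sample = [
    "/reference/refbody/section/title",
    "/reference/refbody/section/p",
    "/reference/refbody/example/p",
]

for leaf, contexts in group_by_leaf(sample).items():
    print(leaf)
    for rev in contexts:
        print("   ", rev)
```

Because the reversed strings start with the leaf name, a plain sort is enough to bring every context of <p> together.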
I've tried using the <oXygen/> instance generator on the DITA task.xsd, but even limiting recursion depth and number of repetitions, it produces very large files, possibly not completing in my lifetime. And then there is the problem of extracting the Xpaths.
Hope that makes it clearer, Hedley
-- Hedley Stewart Finger 28 Regent Street Camberwell VIC 3124 Australia Tel. +61 3 9809 1229 Mobile +61 412 461 558, E-mail <mailto:hfinger@handholding.com.au>
_______________________________________________ oXygen-user mailing list oXygen-user@oxygenxml.com http://www.oxygenxml.com/mailman/listinfo/oxygen-user
====================================================================== Wendell Piez mailto:wapiez@mulberrytech.com Mulberry Technologies, Inc. http://www.mulberrytech.com 17 West Jefferson Street Direct Phone: 301/315-9635 Suite 207 Phone: 301/315-9631 Rockville, MD 20850 Fax: 301/315-8285 ---------------------------------------------------------------------- Mulberry Technologies: A Consultancy Specializing in SGML and XML ======================================================================

I was thinking the same thing as Wendell wrote, regarding recursive structures. However, if the request is limited to non-recursive structures (and to DTDs, not XML Schema... personal preference) it sounds like a fun little project.
.mod inclusion should come for free with any library that processes DTDs, because as I understand it, we're talking about parameter entities, which are a required part of an XML parser (at least, of an XML parser that handles DTDs at all).
I was wondering what is an easy-to-use DTD parser that exposes the rules (not just a validation function) via an API. If there is one for Python, for example, it seems like writing this tool would be easy. So far the most promising I've found is xml.parsers.expat, which offers the following handler for element type declarations:
ElementDeclHandler(name, model)
Called once for each element type declaration. name is the name of the element type, and model is a representation of the content model.
This should allow one to build up a data structure of element type declarations, and use that to generate a list of possible XPaths (within constraints).
As for recursive structures... you could detect them pretty easily, I think. And when they occur, you could indicate them something like this:
/doc/body/div/div...
/doc/body/div/list/item/list...
where the ... indicates that the last element name begins a recursion. Since DTDs basically describe context-free grammars, we don't need to worry about recursion beyond one level.
Another alternative would be to limit the number of levels of recursion, or the total depth of any XPath generated. E.g. you could tell the generator to generate all XPaths with up to 2 levels of recursion (of all element types), or up to 10 path steps ('/'). This sort of allowed-but-constrained recursion could be useful in some cases, where you have e.g. divs within divs that have different meaning from divs not within divs.
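To make the idea concrete, here is a rough Python sketch of that approach. It is not the prototype mentioned in this thread. It assumes a made-up toy DTD given as an internal subset (so no external .mod files are resolved), and the helper names child_names, read_declarations and paths are invented for illustration. It relies on expat passing the content model as a nested (type, quantifier, name, children) tuple. Elements declared ANY are treated as having no children, which is a simplification.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import model as M

# Hypothetical toy DTD (not DITA), declared inline for the sake of the example.
DOC = """<!DOCTYPE doc [
<!ELEMENT doc (body)>
<!ELEMENT body (div+)>
<!ELEMENT div (div | p)*>
<!ELEMENT p (#PCDATA | b)*>
<!ELEMENT b (#PCDATA)>
]>
<doc/>"""


def child_names(model):
    """Flatten a content model tuple into an ordered list of child names."""
    ctype, _quant, name, children = model
    if ctype == M.XML_CTYPE_NAME:
        return [name]
    names = []
    for child in children:
        for n in child_names(child):
            if n not in names:
                names.append(n)
    return names


def read_declarations(text):
    """Return {element name: [allowed child element names]}."""
    decls = {}

    def on_element_decl(name, model):
        decls[name] = child_names(model)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return decls


def paths(decls, root, max_steps=10):
    """List a path to every reachable element, marking recursion with '...'."""
    out = []

    def walk(name, trail):
        trail = trail + [name]
        path = "/" + "/".join(trail)
        if name in trail[:-1]:
            # The element already occurs higher up: stop and flag recursion.
            out.append(path + "...")
            return
        out.append(path)
        if len(trail) < max_steps:
            for child in decls.get(name, []):
                walk(child, trail)

    walk(root, [])
    return out


for line in paths(read_declarations(DOC), "doc"):
    print(line)
```

For the toy DTD this lists /doc/body/div/div... as the one recursive path and stops there. The max_steps argument corresponds to the "up to 10 path steps" limit. Note that the sketch lists a path to every element, not only to leaves.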
I'm assuming from your description that you are not concerned about attributes or about text content of elements. Also, you said "Xpaths from the root element to each possible leaf element", which means we have to remove XPaths for elements that cannot be leaves (which in your case means they must have child *elements*, not just text or attributes or anything else). That makes it a bit harder.
I've hacked up a python prototype and will email it to you, once I have it checking for recursion.
Lars
On 2/14/2008 11:13 AM, Wendell Piez wrote:
Hedley,
Unfortunately the list of "all possible XPaths to a text file" is infinite in many cases, due to the possibility of recursive structures such as nested div or section elements, lists inside lists, or inline elements that may have arbitrary inline elements in their content.
Do you really want a path such as "/doc/body/div/div/div/div/list/item/list/item/list/item/p/figure/caption/p/b/i/mono/i/b" even if such a path points to an element that could be valid?
I think Dan is right that the requirement needs some refinement.
Cheers, Wendell
At 08:42 PM 2/12/2008, you wrote:
Dan:
At Wednesday, 13/02/2008, 12:16 PM, Dan wrote: your post is a bit confusing, and some better details/explanations would be nice to see. What do you mean by "write a list of all possible absolute Xpaths to a text file"?
Rephrasing my original request: I am developing a CSS implementation for [instance documents that conform to] an XML schema. It would really help to check if all [required CSS class matches] have been covered if I could find a utility that would scan a DTD (including *.mod inclusions) or XML Schema to write a list of all absolute -- not including wildcards -- [Xpaths from the root element to each possible leaf element] ... to a text file. For example, using a possible path from a DITA DTD:
/reference/refbody/section/p
This would help determine which class definitions can be generic no matter in what context an element appears (e.g. <i>) and which may need different treatment depending on context (e.g. */section/title). Oh, and if the generator could also list each Xpath in reverse, from leaf node to root, that would be pleasant:
p\section\refbody\reference\
Then I could sort the list of paths so that all instances where <p> was a leaf would be together and I could decide which contexts could share a CSS class and which would need context-specific classes.
I've tried using the <oXygen/> instance generator on the DITA task.xsd, but even limiting recursion depth and number of repetitions, it produces very large files, possibly not completing in my lifetime. And then there is the problem of extracting the Xpaths.
Hope that makes it clearer, Hedley
-- Hedley Stewart Finger 28 Regent Street Camberwell VIC 3124 Australia Tel. +61 3 9809 1229 Mobile +61 412 461 558, E-mail <mailto:hfinger@handholding.com.au>
participants (2): Lars Huttar, Wendell Piez