
Karl,

The good folks at syncROsoft will undoubtedly chime in with features to recommend, but be aware that what you are asking is very open-ended: how to use and develop tools to perform analytics on documents and schemas. In the general case, I doubt it's possible to build a set of tools for everyone that does this comprehensively and well. Document sets vary too much: in their schemas, in how (and whether) they are already documented, in their designs and implementation patterns, and in the requirements of the transformations applied to them. This is not to say that there are no sweet spots that a toolkit like oXygen can find (it has already found a few); just that whatever oXygen gives you, you will inevitably be on your own to an extent. You may even be in a position to envision and implement something that is very useful to you without necessarily being useful to anyone else (or even to you on the next project). That's just what life on the leading edge is like.

That having been said, the answer to your specific questions is certainly yes. As you suggest, XSLT is great for this sort of thing. Many things can also be done ad hoc using XPath (especially XPath 2.0) from oXygen's XPath query. So "distinct-values(//*/name())" will list the names of all elements appearing in the document, while "distinct-values(//div/@class)" will give you all the values of @class attributes appearing on div elements. And so on. (IINM oXygen has even promised to give us a way to export the results of these queries as XML, which will be extra useful.)

I have also found Schematron to be very useful and fairly lightweight for edge-case diagnostics over sets of documents. For analytics over large sets of documents, you may wish, for performance reasons, to load your documents into an XML database. Certain databases can be configured as back ends to oXygen, as documented on the oXygen site.
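For anyone who wants the same two ad-hoc checks outside oXygen, here is a rough Python sketch using only the standard library's xml.etree.ElementTree. It approximates the XPath 2.0 queries above: the distinct element names in a document, and the distinct @class values on div elements. It assumes a single well-formed document without namespaces (ElementTree reports namespaced names as "{uri}local", unlike XPath's name()), and it returns results sorted, whereas distinct-values() makes no ordering promise. The function names and the file name sample.xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def distinct_element_names(root):
    """Rough analogue of distinct-values(//*/name()), returned sorted."""
    # root.iter() walks the element itself and all of its descendants.
    return sorted({el.tag for el in root.iter()})


def distinct_div_classes(root):
    """Rough analogue of distinct-values(//div/@class), returned sorted."""
    # Each @class value is kept as a whole string, as in the XPath query.
    return sorted({div.get("class") for div in root.iter("div")
                   if div.get("class") is not None})


if __name__ == "__main__":
    # Inline sample; for a real file use ET.parse("sample.xml").getroot()
    # (sample.xml is a hypothetical name).
    doc = ET.fromstring(
        "<body><div class='note'><p>a</p></div>"
        "<div class='note'/><div class='warn'><p>b</p></div><div/></body>"
    )
    print(distinct_element_names(doc))  # ['body', 'div', 'p']
    print(distinct_div_classes(doc))    # ['note', 'warn']
```

Running the same functions over every file in a directory and merging the sets gives a crude whole-collection inventory, though an XML database will scale better for large sets.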
The hardest part of this (and the reason it's not necessarily easy to generalize) is defining your requirements: what do you want to find, and how do you want the report to look? Once you have done that, implementation is generally pretty straightforward. For example, here's a simple XSLT 2.0 template that lists all the element types appearing in a document, with their counts broken down by parent:

  <xsl:template match="/">
    <xsl:for-each-group select="//*" group-by="name()">
      <element count="{count(current-group())}">
        <name>
          <xsl:value-of select="current-grouping-key()"/>
        </name>
        <xsl:for-each-group select="current-group()/.." group-by="name()">
          <parent count="{count(current-group())}">
            <xsl:value-of select="current-grouping-key()"/>
          </parent>
        </xsl:for-each-group>
      </element>
    </xsl:for-each-group>
  </xsl:template>

Many useful variations on this are readily imaginable.

Cheers,
Wendell

p.s. I do find RELAX NG easier for analytics than XSD, but that may be personal taste.

At 09:35 PM 4/30/2008, you wrote:
OK, baby steps! Using the Trang converter, I've created an XML schema from the XML source. The output I chose was W3C XML Schema (recommendations for other formats are welcome; I do not know the pros and cons here). Now I am going to mark up the schema with xs:annotation and xs:documentation. Right so far?
OK, how about this. I have my stylesheet and I have the XML source. Arbitrarily, I have chosen to use a number of the elements in the stylesheet. Is it possible to create a USED and NOT USED report of elements and attributes from the two documents? That is, a document outlining which ones are used, maybe with a count of occurrences? I could write another transformation to figure this out, but if it is built into oXygen then that would be great.
Karl..
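Karl's USED / NOT USED question can also be approximated with a small script. The Python sketch below (standard library only) counts element and attribute occurrences in the source document, then marks a name as used if it appears as a token in a match, select, or test attribute of an XSLT instruction in the stylesheet. This is a crude textual heuristic, not an XPath parser: it assumes the source has no namespaces, it ignores wildcards such as * and @*, it does not follow xsl:include or xsl:import, and it will treat a quoted string literal as a reference. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Heuristic: any name-like token in these attributes counts as a reference.
PATTERN_ATTRS = ("match", "select", "test")
NAME_TOKEN = re.compile(r"@?[A-Za-z_][\w.\-]*")


def source_inventory(doc_root):
    """Count element and attribute occurrences (no namespaces assumed)."""
    elements = Counter(el.tag for el in doc_root.iter())
    attributes = Counter(name for el in doc_root.iter() for name in el.attrib)
    return elements, attributes


def stylesheet_tokens(xsl_root):
    """Collect name-like tokens from match/select/test on XSLT instructions."""
    tokens = set()
    for el in xsl_root.iter():
        if not el.tag.startswith("{%s}" % XSL_NS):
            continue  # skip literal result elements such as <p> or <out>
        for attr in PATTERN_ATTRS:
            value = el.get(attr)
            if value:
                tokens.update(NAME_TOKEN.findall(value))
    return tokens


def used_report(doc_root, xsl_root):
    """Split source names into 'used' and 'not_used', keeping their counts."""
    elements, attributes = source_inventory(doc_root)
    tokens = stylesheet_tokens(xsl_root)
    report = {"used": {}, "not_used": {}}
    for name, count in elements.items():
        report["used" if name in tokens else "not_used"][name] = count
    for name, count in attributes.items():
        key = "@" + name
        report["used" if key in tokens else "not_used"][key] = count
    return report


if __name__ == "__main__":
    source = ET.fromstring(
        "<book id='b1'><title>T</title><para class='x'>a</para>"
        "<para>b</para><note/></book>")
    stylesheet = ET.fromstring(
        "<xsl:stylesheet xmlns:xsl='%s' version='2.0'>"
        "<xsl:template match='book'><out>"
        "<xsl:apply-templates select='title | para'/></out></xsl:template>"
        "<xsl:template match='para'><p><xsl:value-of select='@class'/></p>"
        "</xsl:template></xsl:stylesheet>" % XSL_NS)
    print(used_report(source, stylesheet))
```

Since names are matched as bare tokens, a stylesheet relying on * or built-in template rules will under-report usage; treat the NOT USED list as candidates to check by hand rather than a verdict.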
======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================