Re: [oXygen-user] Commenting and Documenting Customer XML

2 May 2008

Karl,

The good folks at syncROsoft will undoubtedly chime in with features 
to recommend, but be aware that what you are asking is very 
open-ended: how do you use and develop tools to perform analytics on 
documents and schemas? In the general case, I doubt it's possible to 
build a set of tools for everyone that does this comprehensively and 
well, given the variety of different sorts of document sets, their 
schemas, how and whether they are already documented, their designs 
and implementation patterns, and the requirements of different sorts 
of transformations. This is not to say that there are no sweet spots 
that a toolkit like oXygen can find (it's already found a few); just 
that whatever oXygen gives you, inevitably you are going to be on 
your own to an extent -- to say nothing of being in a position to 
envision and possibly implement something that is very useful for you 
without necessarily being useful to anyone else (or even to you on 
the next project). That's just what life on the leading edge is like.

That having been said, the answer to your specific questions is 
certainly yes. As you suggest, XSLT is great for this sort of thing. 
Many things can also be accomplished ad hoc, using XPath (especially 
XPath 2.0) from oXygen's XPath query. For example, 
"distinct-values(//*/name())" will list the names of all elements 
appearing in the document, while "distinct-values(//div/@class)" will 
give you all the values of @class attributes appearing on div 
elements. Etc. etc. (IINM oXygen has even promised to give us a way 
to export the results of these queries in XML, which will be extra 
useful.) I have also found Schematron to be very useful and fairly 
lightweight for edge-case diagnostics over sets of documents.
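
On the USED / NOT USED question specifically, here is a rough XSLT 2.0 
sketch, not a finished tool. It runs over the XML source, loads the 
stylesheet to be inspected with doc(), and flags each element name by 
whether it turns up as a token in any match, select or test attribute. 
The parameter name stylesheet-uri, its default 'customer.xsl', and the 
usage/element result names are all made up for illustration. The check 
is a crude string match: it knows nothing about namespace prefixes or 
attribute value templates, and a source element that shares its name 
with an XPath keyword or function will be reported as used.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the stylesheet to inspect.
       A relative URI is resolved against the location of this stylesheet. -->
  <xsl:param name="stylesheet-uri" select="'customer.xsl'"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Every name-like token found in match, select and test attributes
         of the inspected stylesheet; \C+ splits on runs of non-name characters -->
    <xsl:variable name="tokens" select="distinct-values(
        for $a in doc($stylesheet-uri)//xsl:*/(@match | @select | @test)
        return tokenize($a, '\C+'))"/>
    <!-- 'usage' is an invented wrapper so the report has a single root -->
    <usage>
      <xsl:for-each-group select="//*" group-by="name()">
        <xsl:sort select="current-grouping-key()"/>
        <!-- used is 'true' when the element name equals any token -->
        <element name="{current-grouping-key()}"
                 count="{count(current-group())}"
                 used="{current-grouping-key() = $tokens}"/>
      </xsl:for-each-group>
    </usage>
  </xsl:template>

</xsl:stylesheet>
```

Run it against the source document with any XSLT 2.0 processor, passing 
stylesheet-uri as a parameter. Changing the outer select from //* to 
//@* should give the same kind of report for attributes.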

For analytics over large sets of documents, for performance reasons 
you may wish to load your documents into an XML database. Certain 
databases can be configured as back ends to oXygen, as documented on 
the oXygen site.

The hardest part of this -- and the reason why it's not necessarily 
easy to generalize -- is in defining your requirements: what do you 
want to find and how do you want the report to look?

Having done that, implementing is generally pretty straightforward. 
For example, here's a simple XSLT 2.0 template that lists all the 
element types appearing with their occurrences by parent:

<xsl:template match="/">
   <xsl:for-each-group select="//*" group-by="name()">
     <element count="{count(current-group())}">
       <name>
         <xsl:value-of select="current-grouping-key()"/>
       </name>
       <xsl:for-each-group select="current-group()/.." group-by="name()">
         <parent count="{count(current-group())}">
           <xsl:value-of select="current-grouping-key()"/>
         </parent>
       </xsl:for-each-group>
     </element>
   </xsl:for-each-group>
</xsl:template>

Many useful variations on this are readily imaginable.
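
One such variation, as a sketch: the same grouping applied to the 
attributes each element type carries rather than to its parents, 
wrapped in a complete stylesheet so it can be run as is. The inventory 
and attribute result names are invented for the example, and the sort 
is only there to make the report easier to scan.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- 'inventory' is an invented wrapper so the result has a single root -->
    <inventory>
      <!-- One group per distinct element name in the document -->
      <xsl:for-each-group select="//*" group-by="name()">
        <xsl:sort select="current-grouping-key()"/>
        <element name="{current-grouping-key()}"
                 count="{count(current-group())}">
          <!-- Within that element type, one group per attribute name -->
          <xsl:for-each-group select="current-group()/@*" group-by="name()">
            <attribute name="{current-grouping-key()}"
                       count="{count(current-group())}"/>
          </xsl:for-each-group>
        </element>
      </xsl:for-each-group>
    </inventory>
  </xsl:template>

</xsl:stylesheet>
```

Comparing an attribute's count with its element's count shows at a 
glance whether the attribute is always present or only sometimes.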

Cheers,
Wendell

p.s. I do find RelaxNG easier for analytics than XSD, but that may be 
personal taste.

At 09:35 PM 4/30/2008, you wrote:
...
Ok, baby steps!
Using Trang Converter, I've created an XML schema from XML source.
The output I chose was W3C XML Schema (recommendations for other
formats are welcome, I do not know the pros/cons here).  Now I am
going to mark up the schema with xs:documentation and xs:annotation.
Right so far?
Ok, how about this.  I have my stylesheet and I have the XML source.
Arbitrarily I have chosen to use a number of the elements in the
stylesheet.  Is it possible to create a USED and NOT USED resource of
elements and attributes from the 2 documents?  So a document outlining
the uses, maybe the count of occurrences?  I could write another
transformation to figure this out, er, but if it is built into oXygen
then that would be great.
Karl..
======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================