
Hi,

At 09:13 AM 6/20/2008, Sorin wrote:
Version 9.3, which will be released in a couple of weeks, will include an Archive Browser view that is able to open and browse Word and Excel documents saved in XML format, that is, .docx files and .xlsx files. In the Archive Browser view, the files that are included in such a Word or Excel document can be opened and edited in Oxygen, so migrating the data to a DITA document will be easy: just apply an XSLT stylesheet to the XML file containing the data that must be imported.

I like this approach, as there are several other XML vocabularies that might be wanted as targets for upconversion. Nothing against DITA, of course, but it makes sense to consider requirements for other tag sets as well.

However, those of us who have even glanced at the .docx format know that it's a ravenous beast of unorthodox tagging practice, for which it will be a challenge to write stylesheets.

One solution to this problem would entail a generic stylesheet that upconverts .docx into a more regular and proper sort of XML, in which (just to mention the most glaring problem) mixed content is actually mixed content. Such a plain vanilla word-processing XML would make a much more tractable source format for conversion into arbitrary targets such as DITA or what have you.

I dare say this stylesheet will be a devil to write, especially if it aims to be comprehensive. All the more reason to solve this problem once instead of making everyone solve it on their own.

An alternative (which might be more feasible) might be a library of XSLT templates and functions that would help take care of the hard parts.

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
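
For readers unfamiliar with the mixed-content problem mentioned above, here is a rough sketch of the kind of normalization being discussed. It is written in Python rather than XSLT, purely for illustration, and it handles only one case: bold runs. A .docx file is a ZIP archive whose main body lives in `word/document.xml`; there, a paragraph (`w:p`) is a flat sequence of runs (`w:r`), each carrying its own formatting properties (`w:rPr`), rather than text with inline elements nested in it. The function names, the `doc`/`p`/`b` target vocabulary, and the in-memory sample file are all assumptions made for this example, and the sample archive contains only the one part the sketch reads, so it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def read_document_xml(docx_file):
    """Parse word/document.xml out of a .docx (ZIP) file or file object."""
    with zipfile.ZipFile(docx_file) as archive:
        with archive.open("word/document.xml") as part:
            return ET.parse(part).getroot()


def paragraphs_to_mixed_content(root):
    """Rewrite each w:p as a <p> with real mixed content.

    Simplification: a run counts as bold if its w:rPr contains a w:b
    element at all; w:val="0", styles, and every other property are ignored.
    """
    out = ET.Element("doc")  # hypothetical "plain vanilla" target vocabulary
    for w_p in root.iter(W + "p"):
        p = ET.SubElement(out, "p")
        last = None  # most recent child element appended to <p>
        for w_r in w_p.findall(W + "r"):
            text = "".join(t.text or "" for t in w_r.findall(W + "t"))
            r_pr = w_r.find(W + "rPr")
            bold = r_pr is not None and r_pr.find(W + "b") is not None
            if bold:
                if last is not None and last.tag == "b" and not last.tail:
                    last.text += text  # merge adjacent bold runs
                else:
                    last = ET.SubElement(p, "b")
                    last.text = text
            elif last is None:
                p.text = (p.text or "") + text  # text before any child
            else:
                last.tail = (last.tail or "") + text  # text after a child
    return out


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml."""
    body = (
        '<w:document xmlns:w="%s"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">A </w:t></w:r>'
        '<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> word</w:t></w:r>'
        '</w:p></w:body></w:document>' % W_NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", body)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    root = read_document_xml(make_sample_docx())
    print(ET.tostring(paragraphs_to_mixed_content(root), encoding="unicode"))
```

Even this toy case needs run merging and text/tail bookkeeping, which hints at why a comprehensive version (lists, tables, styles, fields, tracked changes) would be far more work, and why a shared stylesheet or template library would be worth having.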