Feature request - Copy-and-paste from Word/Excel/HTML to DITA

Some XML authoring tools can copy-and-paste from Word / Excel / HTML and intelligently re-tag into DITA. Oxygen doesn't seem to be able to do that yet. Can you add it to your list of feature requests? Thanks very much! Seraphim

Hello,

Version 9.3, which will be released in a couple of weeks, will include an Archive Browser view that can open and browse Word and Excel documents saved in XML format, that is, .docx and .xlsx files. In the Archive Browser view, the files contained in such a Word or Excel document can be opened and edited in Oxygen, so migrating the data to a DITA document will be easy: just apply an XSLT stylesheet to the XML file containing the data that must be imported.

To import from an Excel document saved in the old format (.xls file), try the action for importing MS Excel files from the menu File -> Import. The action creates an XML file with a configurable structure. This file can be merged into a DITA document with a custom XSLT stylesheet that you write.

The DITA Open Toolkit includes a stylesheet for converting an HTML document to a DITA one. It is called h2d.xsl and it is located in [Oxygen-folder]/frameworks/dita/DITA-OT/demo/h2d. Just create a transformation scenario with this stylesheet and apply it to your HTML documents. The type of the DITA result is configurable with a stylesheet parameter: concept, reference, task, or topic. The DITA result may contain <required-cleanup> elements, which you should edit manually. You can find more details at: http://dita-ot.sourceforge.net/doc/DITA-h2d.html

Regards,
Sorin

Seraphim Larsen wrote:
Some XML authoring tools can copy-and-paste from Word / Excel / HTML and intelligently re-tag into DITA.
Oxygen doesn't seem to be able to do that yet. Can you add it to your list of feature requests?
Thanks very much! Seraphim
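
The idea behind the Archive Browser approach above, that a .docx file is a ZIP archive whose XML parts can be read directly, can be sketched outside Oxygen as well. The following is a minimal illustration only, not how Oxygen implements the view; the function name `docx_paragraphs` is hypothetical, and it assumes the main document body is stored in the archive member `word/document.xml`, as it is in a standard .docx package.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml in a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph in a .docx file.

    `docx_file` may be a path or a binary file-like object.
    (Hypothetical helper, written for illustration.)
    """
    with zipfile.ZipFile(docx_file) as archive:
        # The main document body lives in this archive member.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across <w:t> elements, one or more per run.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

This only recovers plain text. Mapping the content to DITA elements would still need a stylesheet or equivalent logic, as described in the reply.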

Hi, At 09:13 AM 6/20/2008, Sorin wrote:
Version 9.3 which will be released in a couple of weeks will include an Archive Browser view that is able to open and browse Word and Excel documents saved in XML format, that is .docx files and .xlsx files. In the Archive Browser view the files that are included in such a Word or Excel document can be opened and edited in Oxygen so migrating the data to a DITA document will be easy: just apply an XSLT stylesheet to the XML file containing the data that must be imported.
I like this approach, as there are several other XML vocabularies that might be wanted as targets for upconversion. Nothing against DITA, of course, but it makes sense to consider requirements for other tag sets as well.

However, those of us who have even glanced at the .docx format know that it's a ravenous beast of unorthodox tagging practice, for which it will be a challenge to write stylesheets.

One solution to this problem would entail a generic stylesheet that upconverts .docx into a more regular and proper sort of XML, in which (just to mention the most glaring problem) mixed content is actually mixed content. Such a plain vanilla word-processing XML would make a much more tractable source format for conversion into arbitrary targets such as DITA or what have you.

I dare say this stylesheet will be a devil to write, especially if it aims to be comprehensive. All the more reason to solve this problem once instead of making everyone solve it on their own.

An alternative (which might be more feasible) might be a library of XSLT templates and functions that would help take care of the hard parts.

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
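
The "mixed content" problem mentioned above can be illustrated with a small sketch. In WordprocessingML, a paragraph is a flat sequence of runs (`w:r`), each carrying its own formatting properties, rather than text with inline elements. The code below is only an illustration in Python, not the generic XSLT stylesheet proposed in the message; the function names and the target `p`/`b`/`i` vocabulary are hypothetical, and the formatting logic is deliberately simplified (a run that is both bold and italic is treated as bold, and `w:val` overrides on `w:b`/`w:i` are ignored).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def _append_text(parent, text):
    """Append plain text to parent as real mixed content."""
    if len(parent):
        # Text after a child element belongs in that child's tail.
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def paragraph_to_mixed(w_p):
    """Convert one <w:p> element into a <p> with inline <b>/<i> children.

    Hypothetical helper; simplified handling of run properties.
    """
    p = ET.Element("p")
    for run in w_p.iter(f"{W}r"):
        text = "".join(t.text or "" for t in run.iter(f"{W}t"))
        if not text:
            continue
        props = run.find(f"{W}rPr")
        bold = props is not None and props.find(f"{W}b") is not None
        italic = props is not None and props.find(f"{W}i") is not None
        tag = "b" if bold else "i" if italic else None
        if tag is None:
            _append_text(p, text)
        elif len(p) and p[-1].tag == tag and not p[-1].tail:
            # Merge adjacent runs that share the same formatting.
            p[-1].text = (p[-1].text or "") + text
        else:
            ET.SubElement(p, tag).text = text
    return p
```

Merging adjacent runs with identical formatting matters because Word often splits a single phrase across several runs. A comprehensive converter would also have to handle styles, lists, tables, fields, and much more, which is the hard part the message refers to.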
participants (3)
- Seraphim Larsen
- Sorin Ristache
- Wendell Piez