[oXygen XML Editor Blog] - Batch converting HTML to XHTML

oXygen XML Editor Blog
///////////////////////////////////////////

Batch converting HTML to XHTML
Posted: 12 Jun 2017 02:03 AM PDT
http://feedproxy.google.com/~r/AboutOxygenXmlEditor/~3/aF0B0Z1Zw1I/batch-converting-html-to-xhtml.html?utm_source=feedburner&utm_medium=email

Let's say you have a bunch of existing HTML documents, possibly not well-formed, and you want to process them using XSLT. For example, you may want to migrate the HTML to DITA using the predefined XHTML to DITA Topic transformation scenario available in Oxygen. So you need to create well-formed XHTML documents (that is, valid XML) from the existing HTML documents, and you need to do this as an automated batch process.

Many open source projects provide processors that can convert HTML to its well-formed XHTML equivalent. For this blog post we'll use HTML Tidy.

Here are the steps to automate this process:

1. Create a new folder on your hard drive (for example, I created one on my Desktop: C:\Users\radu_coravu\Desktop\tidy) and download into it the HTML Tidy executable for your platform from http://binaries.html-tidy.org/

2. In the same folder as the Tidy executable, create an ANT build file called build.xml with the following content:

    <project basedir="." name="TidyUpHTMLtoXHTML" default="main">
      <basename property="filename" file="${file}"/>
      <target name="main">
        <exec command="tidy.exe -o ${output.dir}/${filename} ${file}"/>
      </target>
    </project>

3. In the Oxygen Project view, link the entire folder where the original HTML documents are located.

4. Right-click the folder, choose Transform->Configure Transformation Scenarios... and create a new transformation scenario of type ANT Scenario.

5. Modify the following properties in the transformation scenario:

   - Change the scenario name to something relevant, like HTML to XHTML.
   - Change the Working Directory to point to the folder where the ANT build file is located, in my case: C:\Users\radu_coravu\Desktop\tidy
   - Change the Build file to point to your custom build.xml, in my case: C:\Users\radu_coravu\Desktop\tidy\build.xml
   - In the Parameters tab, add a parameter called file with the value ${cf} and a parameter called output.dir whose value is the path to the output folder where the equivalent XHTML files will be stored. In my case I set it to: C:\Users\radu_coravu\Desktop\testOutputXHTML

6. Apply the new transformation scenario to the entire folder containing the HTML documents.

At the end, the output folder will contain the XHTML equivalents of the original HTML files. These XHTML documents can later be processed using XML technologies like XSLT or XQuery.
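
As a hedged variant of the build file from the post, the sketch below passes each Tidy argument separately instead of using a single command string, so paths containing spaces are not split apart. It also adds Tidy's -asxhtml option, on the assumption that Tidy would otherwise write cleaned-up HTML rather than XHTML, and a mkdir step, on the assumption that the output folder may not exist yet. These additions are not part of the original post; the file and output.dir parameters are the ones defined in the transformation scenario.

```xml
<project basedir="." name="TidyUpHTMLtoXHTML" default="main">
  <!-- ${file} and ${output.dir} are the scenario parameters described in the post. -->
  <basename property="filename" file="${file}"/>
  <target name="main">
    <!-- Assumption: create the output folder up front in case it does not exist. -->
    <mkdir dir="${output.dir}"/>
    <!-- tidy.exe is the Windows name used in the post; other platforms will differ. -->
    <exec executable="tidy.exe">
      <!-- Assumption: explicitly ask Tidy for XHTML output. -->
      <arg value="-asxhtml"/>
      <arg value="-o"/>
      <!-- One arg element per argument keeps paths with spaces intact. -->
      <arg value="${output.dir}/${filename}"/>
      <arg value="${file}"/>
    </exec>
  </target>
</project>
```

Because the exec task does not fail the build on a non-zero exit code unless failonerror is set, a file for which Tidy only reports warnings should not stop the rest of the batch.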