Re: [oXygen-user] Xpath and Saxon return tabs as text

9 Sep 2008

Hi,

In the meantime, Philipp should be aware that 
there is generally only a loose binding between a 
schema (or DTD) and a document, such that (other 
things being equal) processors will not 
automatically strip whitespace-only text nodes 
from documents without explicit instruction to do 
so. This is by design, since schemas are not 
always available to processors, and indeed some 
operations can and should be able to run without 
schemas. Whitespace stripping without a schema is 
dangerous and can frequently result in corrupt 
data where whitespace was stripped improperly.
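
To make the default behaviour concrete, here is a minimal sketch in Python. It assumes the standard-library xml.dom.minidom parser rather than Saxon or oXygen, and both the sample document and the helper text_nodes are made up for illustration. With no schema and no instruction to strip, the indentation comes back as text nodes:

```python
from xml.dom import minidom

# Hypothetical sample, indented like the file in the original question.
SAMPLE = """<TEI>
    <teiHeader>
        <title>MS Einsiedeln</title>
    </teiHeader>
</TEI>"""


def text_nodes(node):
    """Yield every text node below `node` in document order (roughly //text())."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)


doc = minidom.parseString(SAMPLE)

# Every text node, including the whitespace-only ones used for indentation.
all_text = [t.data for t in text_nodes(doc)]

# Only the text nodes that contain something other than whitespace.
non_blank = [s for s in all_text if s.strip()]

print(len(all_text))  # 5
print(non_blank)      # ['MS Einsiedeln']
```

Filtering on the query side, as non_blank does, leaves the document itself untouched; the rough XPath equivalent would be //text()[normalize-space()].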

Accordingly, although the XPath 2.0/XQuery family 
of technologies provides for schema-aware 
whitespace stripping, Philipp may have to get used 
to its not always being available, for example 
when using XPath 1.0.

In general, it's something to watch out for; 
automatic whitespace stripping can easily fall 
into the category of "be careful what you wish for".
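
As an illustration of how stripping can go wrong, here is a small Python sketch, again assuming the standard-library xml.dom.minidom module. The mixed-content fragment and the helpers strip_whitespace_only and string_value are hypothetical. Removing every whitespace-only text node without knowing the content model also removes the space between two inline elements:

```python
from xml.dom import minidom

# Hypothetical mixed content: the single space between the two
# inline elements is significant, not indentation.
MIXED = "<p><hi>Die</hi> <hi>gotheit</hi></p>"


def strip_whitespace_only(node):
    """Remove every whitespace-only text node below `node`, with no schema knowledge."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not child.data.strip():
                node.removeChild(child)
        else:
            strip_whitespace_only(child)


def string_value(node):
    """Concatenate all text below `node`, like the XPath string() function."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes)


mixed_doc = minidom.parseString(MIXED)
before = string_value(mixed_doc)
strip_whitespace_only(mixed_doc)
after = string_value(mixed_doc)

print(before)  # Die gotheit
print(after)   # Diegotheit
```

A schema or DTD is what tells a processor that <p> has mixed content, where such a node has to be kept, while <TEI> has element-only content, where it is safe to drop.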

Cheers,
Wendell

At 11:23 AM 9/3/2008, Sorin wrote:
...
Hello,
Saxon 9 has an option for stripping whitespace 
nodes but Oxygen allows you to set it only for 
transformations (Preferences -> XML -> 
XSLT-FO-XQuery -> XSLT -> Saxon -> Saxon-B/SA). 
If you set the above option to strip whitespace 
nodes and you run an XSLT transform that uses 
the expression //text() you can see that the 
list of nodes does not contain such nodes. In 
the next version we will add this Saxon 9 option for XPath expressions too.
...
...
Philipp Steinkrüger wrote:
...
Dear Oxygen-Users,
I am having a problem with an indented XML file. The file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<TEI 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns="http://www.i-d-e.de/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>MS Einsiedeln</title>
            </titleStmt>
            <publicationStmt>
                <p>publicationsStmt empty</p>
            </publicationStmt><sourceDesc>
                <p>sourceDesc empty</p>
            </sourceDesc></fileDesc>
    </teiHeader>
    <text>
        <body>
            <div>
                <div>
                    <div>
                        <p><c>D</c>ie gotheit 
it beloſen<lb/>in dem vater n<ex>atur</ex>elich
                                dar<lb/>vmbe 
it er alvermvgende<lb/>vnd enpfat niht von ite<lb
                                />des<gap 
reason=""/> er elber nit en it an<lb/>iner go<unclear
                                >tl</unclear>icher macht wan<lb/>ers
weelich i<ex>n</ex> ime vnd
                                an<lb/>ime 
elben beloſen hat<space unit="letters" quantity="1"
                        /></p>
   </div>
     </div>
       </div>
    </body>
  </text>
</TEI>
Now, using the following XPath 2.0 expression: 
//text(), the tabs are returned as text-nodes, 
for example the first tab before the tag 
<teiHeader>. In fact, my DTD does not allow 
#PCDATA inside <TEI>, but the document is 
validated without any problems. To me this 
seems kind of schizophrenic, or am I mistaken? 
Btw: the same file in XMLSpy with its built-in 
XSLT engine as well as MS XML parser with the 
same XPath expression does not return the tabs as text-nodes.
Any ideas?
Philipp
PS: I am using Oxygen 9.3
======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================