In-element whitespace and Author Mode

Dear list,

I am sorry for bringing up yet another whitespace question. I almost believe that in XML the devil lives in the whitespace. Also, it is quite possible that our problem is specific to our particular situation and of no interest to other projects. But then again, maybe someone is able to help nonetheless.

We deal with TEI XML documents recording line breaks that in many cases are not meant to represent a word boundary:

<lb n="016_011"/>que con el pre<lb break="no" rendition="#noHyphen"
    n="016_012"/>sidente o juez que reside en la prouincia: puede
<lb n="016_013"/>hazer thesoreros y receptores en su prouincia:

In order to improve readability of the XML source, all our lines begin at the leftmost position of the line, no matter what nesting level the current paragraph is at. The exception is lines that begin with four spaces in order to align the @n attribute with other lines and yet have the lb element begin without intervening whitespace at the end of the preceding line/word fragment.

But when I edit the document in Author mode, it removes the line break within the element, so that the first of the following is a very long line and the snippet is only two lines long:

<lb_n="016_011"/>que_con_el_pre<lb_break="no"_rendition="#noHyphen"_n="016_012"/>sidente_o_juez_que_reside_en_la_prouincia:_puede
<lb n="016_013"/>hazer thesoreros y receptores en su prouincia:

(Whitespace and indenting preferences are mentioned below.)

Now our workflow relies on an external file providing links to certain places in the TEI file:

<a href="W0004.xml#line=449;column=1">016_013</a>

Therefore it is somewhat annoying that editing the TEI leads to ("hard") lines being drawn together and the external file increasingly pointing to wrong places.

I understand that Author mode parses the XML into a DOM tree and re-serializes it on save, so I don't know if this behaviour can be changed at all. But then how would you suggest we approach this problem? (Can we point to the relevant place based on the @n attribute of the lb element? If we had to provide all the lbs with @xml:ids, I think it would thwart our attempts to make the XML sources more readable. And all of this would help us with linking the two files, but the XML file would still end up with bad readability.)

Thank you for any idea,
Andreas

P.S. I have selected "Preserve empty lines", "Preserve text as it is" and "Preserve line breaks in attributes" in Options | Preferences | Editor/Format/XML and added "*" to the "Preserve Space" elements. I also think I have deactivated pretty printing everywhere I could. In Editor | Edit Modes | Author | Format and indent, I have chosen "only the modified content".

--
Dr. Andreas Wagner
Project "The School of Salamanca"
Academy of Sciences and Literature, Mainz
and Institute of Philosophy, Goethe University Frankfurt
http://salamanca.adwmainz.de
IGF HP 25 / R 2.455
Norbert-Wollheim-Platz 1
60629 Frankfurt am Main
Tel. +49 (0)69/798-32774
Fax +49 (0)69/798-32794
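On the question of pointing to a place through the @n attribute of the lb element: the following is only a minimal sketch in Python (standard library), not part of the original message. It assumes the documents use the standard TEI namespace and that @n values are unique within a file; the helper name find_lb and the sample string are hypothetical. It shows that such a lookup does not depend on how the source is broken into lines.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents are in the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def find_lb(root, n_value):
    """Return the first tei:lb element whose @n equals n_value, or None."""
    for lb in root.iter(f"{{{TEI_NS}}}lb"):
        if lb.get("n") == n_value:
            return lb
    return None


# Hypothetical sample built from the snippet in the message above,
# wrapped in a <p> so that it is a well-formed document.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<lb n="016_011"/>que con el pre<lb break="no" rendition="#noHyphen"\n'
    '    n="016_012"/>sidente o juez que reside en la prouincia: puede\n'
    '<lb n="016_013"/>hazer thesoreros y receptores en su prouincia:'
    '</p>'
)

root = ET.fromstring(SAMPLE)
target = find_lb(root, "016_013")
# The text of the printed line follows the empty lb element as its "tail".
line_text = target.tail.strip()
```

Because the lookup works on the parsed tree, it gives the same result whether the lb with n="016_012" is written across two source lines or collapsed onto one.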

Dear Andreas,

Oxygen 17.1 will come with a set of format and indent options which will allow you to define special elements before or after which line breaks should be added. But I'm afraid that won't help much in your case because:

1) The "Editor / Format / XML" preferences page has lists of specific elements, but in your case you want to apply special formatting to the "lb" element only if it does not have the break="no" attribute set on it. We do not yet support this kind of lookup. We probably will in a future version.

2) Even if (1) were implemented, the Author editing mode normalizes the entire XML content when it gets presented in the visual editing mode. So when the content gets saved there, there are no guarantees that the formatting will remain precisely as it was. For example, you seem to want to align the "n" attribute precisely under the "n" attribute on the previous line. Even if you specify that you want the entire document to be considered as space-preserve, spaces between attributes and elements are not significant, and the Author mode will not preserve them exactly as they were.

About your workflow relying on lines and columns in a specific XML document for referencing:

<a href="W0004.xml#line=449;column=1">016_013</a>

I would consider this quite a bad practice. Any new line you insert at the beginning of the document would make all the links point to the wrong content. Usually links should be made to elements which have IDs. For TEI you should probably define an "<anchor xml:id='idValue'/>" and point to that. You could ask on the TEI Users List what the best approach would be for this.

So if you want to keep your current workflow, unfortunately editing in the Author editing mode is not an option for you.

Regards,
Radu

Radu Coravu
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

In reply to Andreas Wagner's message of 9/17/2015 12:08 PM.
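As a footnote to the point about line and column references going stale: if such links have to be kept for the time being, they could in principle be regenerated from the @n values after each edit rather than maintained by hand. The following Python sketch is only an illustration under stated assumptions, not a feature of Oxygen or of the project's tooling. The names lb_positions and make_link are hypothetical, @n values are assumed to be unique, and the regular expression is a deliberate simplification that does not parse XML (it would also match lb tags inside comments or CDATA sections).

```python
import re

# Simplified pattern: an <lb ...> start tag carrying an n="..." attribute.
# [^>] also matches newlines, so tags broken across lines are found too.
LB_TAG = re.compile(r'<lb\b[^>]*?\sn="([^"]*)"[^>]*>')


def lb_positions(xml_text):
    """Map each lb/@n value to the 1-based (line, column) of its '<'."""
    positions = {}
    for match in LB_TAG.finditer(xml_text):
        start = match.start()
        line = xml_text.count("\n", 0, start) + 1
        column = start - (xml_text.rfind("\n", 0, start) + 1) + 1
        # Keep the first occurrence if an @n value is repeated.
        positions.setdefault(match.group(1), (line, column))
    return positions


def make_link(file_name, n_value, positions):
    """Build a link in the line/column fragment style quoted in the thread."""
    line, column = positions[n_value]
    return f'<a href="{file_name}#line={line};column={column}">{n_value}</a>'


# Hypothetical sample: the snippet before and after Author mode joins
# the line break inside the second lb element.
before = (
    '<lb n="016_011"/>que con el pre<lb break="no" rendition="#noHyphen"\n'
    '    n="016_012"/>sidente o juez que reside en la prouincia: puede\n'
    '<lb n="016_013"/>hazer thesoreros y receptores en su prouincia:\n'
)
after = before.replace('"\n    n=', '" n=')

link_before = make_link("W0004.xml", "016_013", lb_positions(before))
link_after = make_link("W0004.xml", "016_013", lb_positions(after))
```

In this sample the link for 016_013 moves from line 3 to line 2 once the line break is joined, which is the drift described in the thread; rebuilding the links from @n keeps them in step with the file, although pointing to elements with IDs, as suggested above, avoids the issue altogether.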