Re: [oXygen-user] Content of oXygen-user Digest, Vol 27, Issue 8

Unicode 4.0 does not contain the character you're looking for. It's found in Unicode 6.0 Cyrillic Extended-B. Java 1.6 is based on Unicode 4.0. Java 1.7 is based on Unicode 6.0. Thus, to get the proper mapping, you'll need to be using Java 1.7.

---------------------- Sample Program ----------------------
public class Case {
    public static void main(String[] args) {
        // Infer the Unicode version from the Java specification version.
        String unicodeVersion;
        String specVersion = System.getProperty("java.specification.version");
        if (specVersion.equals("1.7"))
            unicodeVersion = "6.0";
        else if (specVersion.equals("1.6"))
            unicodeVersion = "4.0";
        else
            unicodeVersion = "n/a";
        System.out.println(unicodeVersion);

        // 'A' (U+0041) and CYRILLIC CAPITAL LETTER IOTIFIED A (U+A656)
        char[] originalChars = { 0x41, 0xa656 };
        String theString = new String(originalChars);
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t" + Character.isLowerCase(theString.charAt(1)));

        // Same three checks after lowercasing
        theString = theString.toLowerCase();
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t" + Character.isLowerCase(theString.charAt(1)));
    }
}

---------------------- Java 1.6 Output ----------------------
4.0
A	?
65	42582
false	false
a	?
97	42582
true	false

---------------------- Java 1.7 Output ----------------------
6.0
A	?
65	42582
false	false
a	?
97	42583
true	true

-Erik

On 1/15/13 11:00 AM, "oxygen-user-request@oxygenxml.com" <oxygen-user-request@oxygenxml.com> wrote:
Today's Topics:
1. Re: unicode support? (Oxygen XML Editor Support)
----------------------------------------------------------------------
Message: 1
Date: Tue, 15 Jan 2013 18:02:24 +0200
From: Oxygen XML Editor Support <support@oxygenxml.com>
Subject: Re: [oXygen-user] unicode support?
To: David Birnbaum <djbpitt@gmail.com>
Cc: oxygen-user@oxygenxml.com
Message-ID: <50F57D90.1060208@oxygenxml.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hello,
This is XSLT processor related. My guess is Saxon 9 doesn't process the lower-case() function as you expect. This could also be further delegated as Java related, since Saxon 9 runs on top of Java and I'm guessing it uses its uppercase/lowercase mapping mechanism. Further investigation is necessary.
I've also looked at the default-collation attribute from XSLT, but it doesn't seem to affect this.
Regards, Adrian
Adrian Buza oXygen XML Editor and Author Support
Tel: +1-650-352-1250 ext.202 Fax: +40-251-461482 support@oxygenxml.com http://www.oxygenxml.com
David Birnbaum wrote:
Dear <oXygen/> support,
I'm trying to case-fold some early Cyrillic text, which includes characters from the Unicode Cyrillic B range (http://www.unicode.org/charts/PDF/UA640.pdf), and the lower-case() function does not seem to be returning what I expect. I am testing in the XPath browser box in <oXygen/> 14.1 (set to XPath 2.0), but I get the same results when performing an XSLT transformation using Saxon-PE 9.4.0.4.
Input: string-to-codepoints('Ꙗ')
Output (as expected): 42582

Input: string-to-codepoints(lower-case('Ꙗ'))
Output (incorrect): 42582
That is, I get the same result when I process this upper-case letter regardless of whether I try to convert it to lower case.
The lower-case counterpart of U+A656 is U+A657. The case mapping seems to be correct in the Unicode property table at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, where the relevant lines are:
A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
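For readers unfamiliar with the file format: each UnicodeData.txt record is a semicolon-separated list in which, counting from zero, field 12 is the simple uppercase mapping, field 13 the simple lowercase mapping and field 14 the simple titlecase mapping. The following is only a minimal Java sketch of reading the lowercase field from the two records above; the class and method names are hypothetical and not part of any library.

```java
class UnicodeDataLine {
    // Zero-based index of the simple lowercase mapping in a UnicodeData.txt record.
    private static final int SIMPLE_LOWERCASE_FIELD = 13;

    /** Returns the simple lowercase mapping stored in one record, or -1 if the field is empty. */
    public static int simpleLowercase(String line) {
        // A limit of -1 keeps trailing empty fields, which these records rely on.
        String[] fields = line.split(";", -1);
        if (fields.length <= SIMPLE_LOWERCASE_FIELD || fields[SIMPLE_LOWERCASE_FIELD].isEmpty()) {
            return -1;
        }
        return Integer.parseInt(fields[SIMPLE_LOWERCASE_FIELD], 16);
    }

    public static void main(String[] args) {
        String capital = "A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;";
        String small = "A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656";
        // The capital letter records a lowercase mapping; the small letter does not.
        System.out.println(Integer.toHexString(simpleLowercase(capital)));
        System.out.println(simpleLowercase(small));
    }
}
```

Run as written, this should print a657 for the capital letter and -1 for the small letter, which agrees with the expectation that U+A656 lowercases to U+A657.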
For comparison (ASCII-range characters):
Input: string-to-codepoints('A')
Output (as expected): 65

Input: string-to-codepoints(lower-case('A'))
Output (as expected): 97
It looks, then, as if the lower-case() function works properly on some Unicode characters, such as those in the ASCII range, but not on others, such as those in the Cyrillic B range. The Cyrillic B characters have been in Unicode since version 5.1.0 (April 4, 2008); Unicode is now at 6.2.0. Is this a bug (and if so, whose bug is it?), or are my expectations based on a misunderstanding?
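One way to narrow down where such a difference comes from, assuming the XSLT processor relies on the Java runtime's character tables (an assumption at this point in the thread, not something confirmed here), is to ask the JVM directly how it treats a code point. The sketch below uses only the standard Character.isDefined and Character.toLowerCase methods; the class and method names are hypothetical.

```java
class CodePointProbe {
    /** Describes how the running JVM's character tables treat one code point. */
    public static String describe(int cp) {
        String label = String.format("U+%04X", cp);
        if (!Character.isDefined(cp)) {
            // The JVM's Unicode data predates this character, so no case mapping can exist.
            return label + ": not assigned in this JVM's Unicode tables";
        }
        int lower = Character.toLowerCase(cp);
        if (lower == cp) {
            return label + ": defined, unchanged by toLowerCase";
        }
        return label + String.format(": lowercases to U+%04X", lower);
    }

    public static void main(String[] args) {
        System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
        System.out.println(describe(0x41));   // ASCII 'A'
        System.out.println(describe(0xA656)); // CYRILLIC CAPITAL LETTER IOTIFIED A
    }
}
```

If the second line reports the code point as not assigned, the behaviour is a limitation of the runtime's Unicode data rather than of the lower-case() function itself.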
Thanks,
David (djbpitt@gmail.com)
_______________________________________________ oXygen-user mailing list oXygen-user@oxygenxml.com http://www.oxygenxml.com/mailman/listinfo/oxygen-user
End of oXygen-user Digest, Vol 27, Issue 8 ******************************************

Hello Erik,

Thank you for looking into this. I can confirm that this is indeed the issue and only Java 7 returns the expected result. I've also tested Saxon/Oxygen with Java SE 7 and "string-to-codepoints(lower-case('Ꙗ'))" returns the correct value: 42583

Instructions for running Oxygen with Java 7:

1. Download and install Java SE 7 from: http://www.oracle.com/technetwork/java/javase/downloads/index.html
   Make sure you download Java SE 7 of the same architecture (32/64) as the kit of Oxygen that you have installed.
2. To bypass the Java VM bundled with Oxygen for the Windows and Linux installations of Oxygen, quit the application, navigate to the Oxygen installation folder and rename the 'jre' folder, e.g. to 'jreold'. When you start Oxygen, it will automatically pick up and use your system installed Java VM.
2b. (optional) You could copy the system installed JRE to the Oxygen folder and rename the folder to 'jre'.
3. Check the Java version in Oxygen: Help > About, System properties, "java.runtime.version"

For OS X, currently (v14.1) it's not possible to use the Oxygen .app launchers with Java SE 7. When you start them they will look for Java SE 6 and use that instead. To start Oxygen with Java SE 7 on OS X, you have to open a Terminal window and run the command line script: oxygenMac.sh for XML Editor, oxygenAuthorMac.sh for XML Author or oxygenDeveloperMac.sh for XML Developer.

Please note that we do not yet encourage using Java 7 with Oxygen on OS X. If you do not need Java 7 for a specific reason (like in this case), we recommend sticking with Java SE 6 for Oxygen for the time being (use the .app launchers).

Regards,
Adrian

Adrian Buza
oXygen XML Editor and Author Support
Tel: +1-650-352-1250 ext.202
Fax: +40-251-461482
support@oxygenxml.com
http://www.oxygenxml.com

Holley, Erik wrote:
Unicode 4.0 does not contain the character you're looking for. It's found in Unicode 6.0 Cyrillic Extended-B. Java 1.6 is based on Unicode 4.0. Java 1.7 is based on Unicode 6.0. Thus, to get the proper mapping, you'll need to be using Java 1.7.
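As an alternative to mapping the Java version to a Unicode version by hand, the running JVM can be probed with sentinel characters. The sketch below is hypothetical and only yields a lower bound: it assumes U+A656 was first assigned in Unicode 5.1 (as stated in the original question), U+20B9 INDIAN RUPEE SIGN in 6.0 and U+20BA TURKISH LIRA SIGN in 6.2, and it reports the newest of those versions whose sentinel the JVM recognises.

```java
class UnicodeLowerBound {
    // Sentinel code points and the Unicode version assumed to have first assigned each one.
    private static final int[] SENTINELS = { 0xA656, 0x20B9, 0x20BA };
    private static final String[] VERSIONS = { "5.1", "6.0", "6.2" };

    /** Returns the newest listed Unicode version whose sentinel is defined in this JVM. */
    public static String lowerBound() {
        String best = "older than " + VERSIONS[0];
        for (int i = 0; i < SENTINELS.length; i++) {
            if (!Character.isDefined(SENTINELS[i])) {
                break; // later sentinels belong to newer versions, so stop here
            }
            best = VERSIONS[i];
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Unicode data is at least: " + lowerBound());
    }
}
```

On a runtime based on Unicode 4.0 this should report "older than 5.1", and on one based on Unicode 6.0 it should report "6.0"; newer runtimes will report "6.2" because the sketch has no later sentinels.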