
Unicode 4.0 does not contain the character you're looking for. It belongs to the Cyrillic Extended-B block, which was added in Unicode 5.1 and is therefore present in Unicode 6.0. Java 1.6 is based on Unicode 4.0; Java 1.7 is based on Unicode 6.0. Thus, to get the proper mapping, you'll need to be running Java 1.7.

----------------------
Sample Program
----------------------
public class Case {

    public static void main(String[] args) {
        // Map the Java specification version to the Unicode version it is based on.
        String unicodeVersion;
        String specVersion = System.getProperty("java.specification.version");
        if (specVersion.equals("1.7")) {
            unicodeVersion = "6.0";
        } else if (specVersion.equals("1.6")) {
            unicodeVersion = "4.0";
        } else {
            unicodeVersion = "n/a";
        }
        System.out.println(unicodeVersion);

        // U+0041 LATIN CAPITAL LETTER A and U+A656 CYRILLIC CAPITAL LETTER IOTIFIED A
        char[] originalChars = { 0x41, 0xa656 };
        String theString = new String(originalChars);

        // Before case folding: characters, code points, isLowerCase
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t"
                + Character.isLowerCase(theString.charAt(1)));

        theString = theString.toLowerCase();

        // After case folding: the same three lines
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t"
                + Character.isLowerCase(theString.charAt(1)));
    }
}

----------------------
Java 1.6 Output
----------------------
4.0
A       ?
65      42582
false   false
a       ?
97      42582
true    false

----------------------
Java 1.7 Output
----------------------
6.0
A       ?
65      42582
false   false
a       ?
97      42583
true    true

-Erik

On 1/15/13 11:00 AM, "oxygen-user-request@oxygenxml.com" <oxygen-user-request@oxygenxml.com> wrote:
Today's Topics:
1. Re: unicode support? (Oxygen XML Editor Support)
----------------------------------------------------------------------
Message: 1
Date: Tue, 15 Jan 2013 18:02:24 +0200
From: Oxygen XML Editor Support <support@oxygenxml.com>
Subject: Re: [oXygen-user] unicode support?
To: David Birnbaum <djbpitt@gmail.com>
Cc: oxygen-user@oxygenxml.com
Message-ID: <50F57D90.1060208@oxygenxml.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hello,
This is XSLT processor related. My guess is Saxon 9 doesn't process the lower-case() function as you expect. This could also be further delegated as Java related, since Saxon 9 runs on top of Java and I'm guessing it uses its uppercase/lowercase mapping mechanism. Further investigation is necessary.
I've also looked at the default-collation attribute from XSLT, but it doesn't seem to affect this.
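One way to carry out that investigation is to ask the JVM directly, bypassing the XSLT processor. The following is a minimal sketch, not a statement about how Saxon works internally: it assumes only that the processor runs on the same JVM, and the class name CaseProbe and its method names are hypothetical. It reports whether the running JVM's Unicode tables define a code point at all and what lowercase mapping they give for it.

```java
// Sketch: ask the running JVM what it knows about a code point.
// CaseProbe and its method names are hypothetical, chosen for this example.
final class CaseProbe {

    /** True if the JVM's Unicode tables assign this code point at all. */
    static boolean isKnown(int codePoint) {
        return Character.isDefined(codePoint);
    }

    /** Lowercase mapping according to the JVM; the input comes back unchanged if no mapping is known. */
    static int lower(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    /** One-line summary of what the JVM knows about the code point. */
    static String describe(int codePoint) {
        return String.format("U+%04X defined=%b upper=%b lower-mapping=U+%04X",
                codePoint,
                isKnown(codePoint),
                Character.isUpperCase(codePoint),
                lower(codePoint));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe(0x41));   // LATIN CAPITAL LETTER A
        System.out.println(describe(0xA656)); // CYRILLIC CAPITAL LETTER IOTIFIED A
    }
}
```

If the second line reports defined=false, the JVM's tables predate the character, and any lowercase function built on those tables will leave it unchanged.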
Regards, Adrian
Adrian Buza oXygen XML Editor and Author Support
Tel: +1-650-352-1250 ext.202 Fax: +40-251-461482 support@oxygenxml.com http://www.oxygenxml.com
David Birnbaum wrote:
Dear <oXygen/> support,
I'm trying to case-fold some early Cyrillic text, which includes characters from the Unicode Cyrillic B range (http://www.unicode.org/charts/PDF/UA640.pdf), and the lower-case() function does not seem to be returning what I expect. I am testing in the XPath browser box in <oXygen/> 14.1 (set to XPath 2.0), but I get the same results when performing an XSLT transformation using Saxon-PE 9.4.0.4.
Input: string-to-codepoints('Ꙗ')
Output (as expected): 42582

Input: string-to-codepoints(lower-case('Ꙗ'))
Output (incorrect): 42582
That is, I get the same result when I process this upper-case letter regardless of whether I try to convert it to lower case.
The lower-case counterpart of U+A656 is U+A657. The case mapping seems to be correct in the Unicode property table at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, where the relevant lines are:
A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
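To make the reading of those records explicit: in UnicodeData.txt the fields are separated by semicolons, and the simple lowercase mapping is the fourteenth field (index 13 when counting from zero). The sketch below pulls that field out of the two records quoted above. The class name UnicodeDataLine and its method are hypothetical, and the code handles only single-record lines like these, not the range entries that also occur in the file.

```java
// Sketch: extract the simple lowercase mapping from a UnicodeData.txt record.
// UnicodeDataLine is a hypothetical helper written for this example.
final class UnicodeDataLine {

    // Zero-based index of the Simple_Lowercase_Mapping field.
    static final int SIMPLE_LOWERCASE_FIELD = 13;

    /**
     * Returns the simple lowercase mapping of the record's code point,
     * or the code point itself when the field is empty.
     */
    static int simpleLowercase(String record) {
        // Limit -1 keeps trailing empty fields, so every record yields 15 fields.
        String[] fields = record.split(";", -1);
        int codePoint = Integer.parseInt(fields[0], 16);
        String lower = fields[SIMPLE_LOWERCASE_FIELD];
        return lower.isEmpty() ? codePoint : Integer.parseInt(lower, 16);
    }

    public static void main(String[] args) {
        String capital = "A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;";
        String small = "A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656";
        System.out.println(String.format("%04X", simpleLowercase(capital))); // prints A657
        System.out.println(String.format("%04X", simpleLowercase(small)));   // prints A657
    }
}
```

Both records resolve to U+A657, which is the value the lower-case() call would be expected to return for the capital letter.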
For comparison (ASCII-range characters):
Input: string-to-codepoints('A')
Output (as expected): 65

Input: string-to-codepoints(lower-case('A'))
Output (as expected): 97
It looks, then, as if the lower-case() function works properly on some Unicode characters, such as those in the ASCII range, but not on others, such as those in the Cyrillic B range. The Cyrillic B characters have been in Unicode since version 5.1.0 (April 4, 2008); Unicode is now at 6.2.0. Is this a bug (and if so, whose bug is it?), or are my expectations based on a misunderstanding?
Thanks,
David (djbpitt@gmail.com)
------------------------------------------------------------------------
_______________________________________________ oXygen-user mailing list oXygen-user@oxygenxml.com http://www.oxygenxml.com/mailman/listinfo/oxygen-user
------------------------------
End of oXygen-user Digest, Vol 27, Issue 8 ******************************************