
What is the difference between the chars() and codePoints() methods in the CharSequence interface?

I read the Javadoc but don't understand the difference; both methods return the same result for me.
Also, can anyone explain what ‘zero-extending’ means?

Javadoc of chars() method

Returns a stream of int zero-extending the char values from this sequence. Any char which maps to a surrogate code point is passed through uninterpreted.
The stream binds to this sequence when the terminal stream operation commences (specifically, for mutable sequences the spliterator for the stream is late-binding). If the sequence is modified during that operation then the result is undefined.


Javadoc of codePoints() method

Returns a stream of code point values from this sequence. Any surrogate pairs encountered in the sequence are combined as if by Character.toCodePoint and the result is passed to the stream. Any other code units, including ordinary BMP characters, unpaired surrogates, and undefined code units, are zero-extended to int values which are then passed to the stream.
The stream binds to this sequence when the terminal stream operation commences (specifically, for mutable sequences the spliterator for the stream is late-binding). If the sequence is modified during that operation then the result is undefined.

Solution:

A ‘char’ is a 16-bit unsigned value in Java, so there are 65536 possible chars.
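Because a char is unsigned, ‘zero-extending’ it to an int means the upper 16 bits of the int are filled with zeros, so the result is never negative. The sketch below contrasts that with a short holding the same bit pattern, which is sign-extended instead. The class and method names are made up for illustration.

```java
public class ZeroExtension {
    // Widens a char to int; Java fills the upper 16 bits with zeros.
    static int widenChar(char c) {
        return c;
    }

    // Widens a short to int; Java copies the sign bit into the upper 16 bits.
    static int widenShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uFFFF';          // all 16 bits set
        short s = (short) 0xFFFF;   // same bit pattern, but signed
        System.out.println(widenChar(c));  // 65535
        System.out.println(widenShort(s)); // -1
    }
}
```

This is all that chars() does to each element: it hands back the 16-bit value unchanged, just stored in an int.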

Unicode, however, now defines more than 65536 characters, each of which is identified by a ‘codepoint’, a number from 0 to 0x10FFFF (1,114,111).

It is therefore not possible to represent every character as a single Java ‘char’. A codepoint larger than 65535 can be represented in two ways: as a pair of chars (known as a surrogate pair, which is how a String stores it) or as a single 32-bit int codepoint.

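The difference between char and codepoint shows up only for codepoints larger than 65535. For text made up entirely of characters at or below 65535, chars() and codePoints() produce identical streams, which is why both appear to return the same result.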
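A small sketch can make this visible. It compares the two streams for a plain ASCII string and for a string containing U+1F600, a codepoint above 65535. The class name, helper methods, and sample strings are chosen here for illustration only.

```java
import java.util.Arrays;

public class CharsVsCodePoints {
    // Returns the 16-bit char values of s, each zero-extended to an int.
    static int[] charValues(CharSequence s) {
        return s.chars().toArray();
    }

    // Returns the codepoints of s, with surrogate pairs combined.
    static int[] codePointValues(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String plain = "abc";
        String withEmoji = "a\uD83D\uDE00"; // 'a' followed by U+1F600

        // Identical results when every character fits in one char.
        System.out.println(Arrays.toString(charValues(plain)));          // [97, 98, 99]
        System.out.println(Arrays.toString(codePointValues(plain)));     // [97, 98, 99]

        // Different results once a surrogate pair is present.
        System.out.println(Arrays.toString(charValues(withEmoji)));      // [97, 55357, 56832]
        System.out.println(Arrays.toString(codePointValues(withEmoji))); // [97, 128512]
    }
}
```

With the second string, chars() yields three elements (the two surrogates pass through uninterpreted), while codePoints() yields two.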

Note that the 32-bit ‘codepoint’ value is not simply the concatenation of the two 16-bit ‘char’ values. The surrogate pair is appropriately decoded.
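To illustrate the decoding, here is a sketch that applies the standard UTF-16 arithmetic by hand and compares it with Character.toCodePoint, the method the Javadoc refers to. The class name and the decode and concatenate helpers are hypothetical; the concatenation is included only to show that it gives a different number.

```java
public class SurrogateDecoding {
    // Decodes a surrogate pair by hand: 10 bits from each half, plus 0x10000.
    static int decode(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    // Naive concatenation of the two 16-bit values, shown only for contrast.
    static int concatenate(char high, char low) {
        return (high << 16) | low;
    }

    public static void main(String[] args) {
        char high = '\uD83D';
        char low = '\uDE00';
        System.out.printf("decoded:      U+%X%n", decode(high, low));                 // U+1F600
        System.out.printf("library:      U+%X%n", Character.toCodePoint(high, low));  // U+1F600
        System.out.printf("concatenated: 0x%X%n", concatenate(high, low));            // 0xD83DDE00
    }
}
```

The hand-written decode assumes its arguments really are a high and a low surrogate; in real code, Character.toCodePoint together with Character.isSurrogatePair is the safer choice.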
