
What is the proper charset for encoding System#lineSeparator()?

Let us say we need to verify the following method.

    /**
     * Prints {@code hello, world} to {@link System#out}, followed by a system-dependent line separator.
     *
     * @param args command line arguments.
     */
    public static void main(String... args) {
        System.out.printf("hello, world%n"); // %n expands to System.lineSeparator()
    }
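The method relies on the fact that java.util.Formatter replaces %n with the platform line separator returned by System.lineSeparator(). A minimal sketch that checks this without touching System.out (the class name LineSeparatorDemo and the formatted() helper are illustrative, not part of the question):

```java
public class LineSeparatorDemo {

    // Formats the same pattern as HelloWorld.main, but returns the String instead of printing it
    static String formatted() {
        return String.format("hello, world%n");
    }

    public static void main(String... args) {
        String expected = "hello, world" + System.lineSeparator();
        // %n is documented to produce the same string as System.lineSeparator()
        System.out.println(formatted().equals(expected)); // prints true
    }
}
```

This only establishes which characters are printed; which bytes end up in the stream still depends on the charset of the PrintStream, which is what the question is about.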

Now we can verify that the method prints hello, world.

    /**
     * Verifies that the {@link HelloWorld#main(String...)} method prints {@code hello, world}, to the
     * {@link System#out}, followed by a system-dependent line separator.
     *
     * @author Jin Kwon <onacit_at_gmail.com>
     * @see System#lineSeparator()
     */
    @DisplayName("main(args) prints 'hello, world' followed by a system-dependent line separator")
    @Test
    public void main_PrintHelloWorld_() {
        final var out = System.out;
        try {
            // --------------------------------------------------------------------------------------------------- given
            final var buffer = new ByteArrayOutputStream();
            System.setOut(new PrintStream(buffer));
            // ---------------------------------------------------------------------------------------------------- when
            HelloWorld.main();
            // ---------------------------------------------------------------------------------------------------- then
            final var actual = buffer.toByteArray();
            final var expected = ("hello, world" + System.lineSeparator()).getBytes(StandardCharsets.US_ASCII);
            Assertions.assertArrayEquals(expected, actual); // JUnit 5 order: expected first, then actual
        } finally {
            System.setOut(out);
        }
    }

The questionable part is the .getBytes(StandardCharsets.US_ASCII).


I don't think it's wrong to presume that the system-dependent line separator can be encoded with US_ASCII.

Or is Charset#defaultCharset() the right charset for the %n?
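Whether US_ASCII "works" depends on the charset that actually encodes the output. A small sketch, assuming the separator is "\n" or "\r\n" as on common platforms, that compares the separator's bytes under a few standard charsets (SeparatorBytes and separatorBytes are illustrative names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SeparatorBytes {

    // Encodes the platform line separator with the given charset
    static byte[] separatorBytes(Charset charset) {
        return System.lineSeparator().getBytes(charset);
    }

    public static void main(String... args) {
        byte[] ascii = separatorBytes(StandardCharsets.US_ASCII);
        // ASCII-compatible charsets encode CR (0x0D) and LF (0x0A) as the same single bytes
        System.out.println(Arrays.equals(ascii, separatorBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(Arrays.equals(ascii, separatorBytes(StandardCharsets.ISO_8859_1))); // true
        // UTF-16 uses two bytes per character plus a byte-order mark, so the bytes differ
        System.out.println(Arrays.equals(ascii, separatorBytes(StandardCharsets.UTF_16)));     // false
        // The charset new PrintStream(OutputStream) would use on this JVM
        System.out.println(Charset.defaultCharset());
    }
}
```

So the US_ASCII shortcut gives the right bytes only by coincidence, when the encoding charset happens to be ASCII-compatible.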

Solution:

You should use the same encoding that encoded the byte array returned by buffer.toByteArray().

It is the PrintStream's job to turn strings into bytes, so what encoding does your PrintStream use? You created it with the PrintStream(OutputStream) constructor. The documentation says:

Characters written to the stream are converted to bytes using the default charset, or where out is a PrintStream, the charset used by the print stream.

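So you should use Charset.defaultCharset() to encode the expected string into a byte array. US_ASCII happens to give the same bytes for "\n" and "\r\n" whenever the default charset is ASCII-compatible (for example UTF-8 or ISO-8859-1), but not when it is something like UTF-16.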
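A self-contained sketch of the same check without JUnit, encoding the expectation with Charset.defaultCharset() as suggested. The helloWorld() stand-in and the captureStdout helper are hypothetical names introduced here; on JDK 18 and later the default charset is UTF-8 unless overridden (JEP 400).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // Stand-in for HelloWorld.main(String...) from the question
    static void helloWorld() {
        System.out.printf("hello, world%n");
    }

    // Hypothetical helper: runs the action with System.out redirected and returns the raw bytes
    static byte[] captureStdout(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // PrintStream(OutputStream) converts characters to bytes with the default charset
        System.setOut(new PrintStream(buffer));
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original);
        }
        return buffer.toByteArray();
    }

    static boolean printsHelloWorld() {
        byte[] actual = captureStdout(DefaultCharsetCheck::helloWorld);
        // Encode the expectation with the same charset the PrintStream used
        byte[] expected = ("hello, world" + System.lineSeparator()).getBytes(Charset.defaultCharset());
        return Arrays.equals(expected, actual);
    }

    public static void main(String... args) {
        System.out.println(printsHelloWorld()); // prints true
    }
}
```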

Also consider passing your own Charset to the PrintStream with a constructor that accepts one, such as PrintStream(OutputStream, boolean, Charset) (Java 10 and later), and using the same charset to encode the expected string. This makes it explicit that both sides use the same charset.
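A sketch of that variant, assuming Java 10 or later for the PrintStream(OutputStream, boolean, Charset) constructor. The class and method names are illustrative; the point is that one Charset value drives both the stream and the expected bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetCheck {

    // Stand-in for HelloWorld.main(String...) from the question
    static void helloWorld() {
        System.out.printf("hello, world%n");
    }

    // Redirects System.out through a PrintStream that uses the given charset (Java 10+ constructor)
    static byte[] captureStdout(Runnable action, Charset charset) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true, charset));
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original);
        }
        return buffer.toByteArray();
    }

    static boolean printsHelloWorld(Charset charset) {
        byte[] actual = captureStdout(ExplicitCharsetCheck::helloWorld, charset);
        // Same charset on both sides, so the comparison no longer depends on the JVM default
        byte[] expected = ("hello, world" + System.lineSeparator()).getBytes(charset);
        return Arrays.equals(expected, actual);
    }

    public static void main(String... args) {
        System.out.println(printsHelloWorld(StandardCharsets.UTF_8)); // prints true
    }
}
```

Comparing bytes mirrors the original test; decoding with buffer.toString(charset) and comparing strings would work equally well.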
