Java: Concatenating chars with String.charAt() and + operator turns them into encoded UTF-8

I’m having trouble properly concatenating characters out of a String[] using String.charAt() and the + operator in the following method.

private PcbGroup createPcbGroup(String[] metadata, PcbGroup pcbGroup) {
    char group_short = metadata[2].charAt(0);
    
    pcbGroup.setId(Integer.parseInt(metadata[0]));
    pcbGroup.setGroup_name(metadata[1]);
    for (int i = 1; i < metadata[2].length(); i++) {
        group_short += metadata[2].charAt(i);
    }
    pcbGroup.setGroup_short(group_short);

    // create and return pcbGroup of this metadata
    return pcbGroup;
}  

I’m reading a CSV file with a BufferedReader and populating String[] metadata from it. The content of String[] metadata is [3, "Foo", ML]. The line char group_short = metadata[2].charAt(0); correctly assigns 'M' to char group_short. It then turns into ? (space intended) when I concatenate it with the second character 'L'.

When I save this object, Hibernate complains about an incorrect string value, which appears to be '\xC2\x99'. So first ML turned into ?, and then Hibernate interpreted it as '\xC2\x99'.

Hibernate: insert into pcb_group (group_name, group_short, id) values (?, ?, ?)

2022-11-22 10:41:35.902  WARN 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 1366, SQLState: HY000
2022-11-22 10:41:35.903 ERROR 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : (conn=3030) Incorrect string value: '\xC2\x99' for column 'group_short' at row 1
2022-11-22 10:41:35.903  INFO 2988 --- [           main] o.h.e.j.b.internal.AbstractBatchImpl     : HHH000010: On release of batch it still contained JDBC statements

I’ve been struggling with this little thing for a few hours now and it’s getting on my nerves, so I could use some help.

Solution:

This line doesn’t do what you seem to think it does:

group_short += metadata[2].charAt(i);

While this might look like a string concatenation to you, it’s not. group_short is of type char, meaning it holds a single character*.

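What this does is add the numeric (UTF-16 code unit) value of each following character to that of the first one. In Java, char + char is evaluated as an int, and the compound assignment += silently casts the sum back to char, which is the only reason the line compiles at all (group_short = group_short + metadata[2].charAt(i) would be rejected as a possible lossy conversion from int to char). For ML this gives 'M' (77) + 'L' (76) = 153, which is U+0099, a C1 control character whose UTF-8 encoding is exactly the bytes 0xC2 0x99 from the error message. None of this is semantically meaningful for your use case (one could argue that it’s a very simple kind of hashing, but it’s not even good at that).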
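The arithmetic can be reproduced in isolation. The sketch below re-runs the question's loop on the string "ML" and prints the resulting char both as a code point and as UTF-8 bytes; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CharArithmeticDemo {

    // Same loop as in the question: += on a char adds the numeric values
    // and implicitly casts the int result back to char.
    public static char sumChars(String s) {
        char acc = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            acc += s.charAt(i); // equivalent to acc = (char) (acc + s.charAt(i))
        }
        return acc;
    }

    // Renders the UTF-8 bytes of a string as \xNN escapes, like the error message does.
    public static String utf8Escaped(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char c = sumChars("ML");
        System.out.println((int) 'M' + " + " + (int) 'L' + " = " + (int) c); // 77 + 76 = 153
        System.out.printf("U+%04X%n", (int) c);                              // U+0099
        System.out.println(utf8Escaped(String.valueOf(c)));                  // \xC2\x99
    }
}
```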

What you want to do is have a String (or, ideally, a StringBuilder) variable and do proper concatenation:

String group_short = "" + metadata[2].charAt(0);

// and later in the loop:
group_short += metadata[2].charAt(i);
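For the StringBuilder variant, here is a minimal sketch of the question's method rewritten that way. PcbGroup below is a hypothetical stand-in because the real entity is not shown, and it assumes the group_short field and its setter have been changed from char to String (a String cannot be passed to a setter that takes a char).

```java
public class PcbGroupDemo {

    // Hypothetical stand-in for the question's entity, with group_short as a String.
    public static class PcbGroup {
        private int id;
        private String group_name;
        private String group_short;

        public void setId(int id) { this.id = id; }
        public void setGroup_name(String group_name) { this.group_name = group_name; }
        public void setGroup_short(String group_short) { this.group_short = group_short; }
        public int getId() { return id; }
        public String getGroup_name() { return group_name; }
        public String getGroup_short() { return group_short; }
    }

    public static PcbGroup createPcbGroup(String[] metadata, PcbGroup pcbGroup) {
        pcbGroup.setId(Integer.parseInt(metadata[0]));
        pcbGroup.setGroup_name(metadata[1]);

        // Append each character to a StringBuilder instead of adding char values together.
        StringBuilder groupShort = new StringBuilder();
        for (int i = 0; i < metadata[2].length(); i++) {
            groupShort.append(metadata[2].charAt(i));
        }
        pcbGroup.setGroup_short(groupShort.toString());

        return pcbGroup;
    }

    public static void main(String[] args) {
        String[] metadata = {"3", "Foo", "ML"};
        PcbGroup group = createPcbGroup(metadata, new PcbGroup());
        System.out.println(group.getGroup_short()); // ML
    }
}
```

Since this loop copies every character of metadata[2] unchanged, pcbGroup.setGroup_short(metadata[2]) gives the same result; building the string character by character only pays off if you need to filter or transform characters along the way.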

* Because of the complexity of Unicode, and because Java Strings use UTF-16, this is not entirely accurate: multiple char values can be required to make up a single "character" in the human-language sense. It’s close enough for this issue, though.
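One concrete case is a character outside the Basic Multilingual Plane, which UTF-16 stores as two char values (a surrogate pair). The sketch below uses the emoji U+1F600, chosen purely for illustration, to show that length() counts char values rather than code points.

```java
public class SurrogateDemo {

    // "A" followed by U+1F600, which needs a surrogate pair in UTF-16.
    public static String sample() {
        return "A" + new String(Character.toChars(0x1F600));
    }

    public static void main(String[] args) {
        String s = sample();

        System.out.println(s.length());                             // 3 (char values)
        System.out.println(s.codePointCount(0, s.length()));        // 2 (code points)
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        // Iterating by code point keeps the surrogate pair together.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp)); // U+0041, U+1F600
    }
}
```

If the input may contain such characters, iterating with codePoints() and appending with StringBuilder.appendCodePoint() avoids splitting a pair in half.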
