
How to convert single-byte charset (non-ASCII) ByteArray into Kotlin UTF8 String (How to avoid ��?)

I have an API that returns results in a specific single-byte charset (Windows-1257), and I am reading the result in Kotlin as follows:

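import java.net.HttpURLConnection
import java.net.URL

val connection = URL("http://192.168.1.21:92/someAPI").openConnection() as HttpURLConnection
// Read the whole body; a single read() into a fixed-size buffer may return only part of it
val byteArray: ByteArray = connection.inputStream.use { it.readBytes() }
val tmp = String(byteArray, Charsets.UTF_8).trim()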

Of course, this code is wrong, because it assumes that byteArray holds a UTF-8-encoded string. The obvious fix would be Charsets.WIN_1257, but Kotlin's Charsets object has no such constant. My byte array holds a Windows-1257-encoded string, so how can I get a correct (UTF-8) String from it?

Here is a minimal test that isolates the problem and can be run at https://play.kotlinlang.org:


fun main() {
    // The bytes 0xE2 0x72 are "ār" encoded in Windows-1257
    val byteArray: ByteArray = byteArrayOf(0xe2.toByte(), 0x72)
    println(String(byteArray, Charsets.UTF_8))
}

Decoding with UTF_8 produces:

�r

But I expect:

ār
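To see why, it helps to compare the bytes each charset produces for the expected text. The sketch below (the helper name encodedHex is illustrative, not from the question) encodes "ār" both ways: in UTF-8, ā takes two bytes (C4 81), whereas Windows-1257 uses the single byte E2. A UTF-8 decoder reads E2 as the start of a three-byte sequence, and because 72 is not a continuation byte it emits the replacement character U+FFFD and then decodes 72 as 'r'.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: encode text with the given charset and show the bytes as hex
fun encodedHex(text: String, charset: Charset): String =
    text.toByteArray(charset).joinToString(" ") { "%02x".format(it.toInt() and 0xff) }

fun main() {
    // Same two characters, two different byte sequences
    println("UTF-8:        " + encodedHex("ār", Charsets.UTF_8))                 // c4 81 72
    println("windows-1257: " + encodedHex("ār", Charset.forName("windows-1257"))) // e2 72
    // Decoding the Windows-1257 bytes as UTF-8 yields U+FFFD followed by 'r'
    val garbled = String(byteArrayOf(0xe2.toByte(), 0x72), Charsets.UTF_8)
    println(garbled == "\uFFFDr") // true
}
```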

Solution:

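Look at Charset.availableCharsets(): the JVM supports many more charsets than the few constants in Kotlin's Charsets object, so Charset.forName("Windows-1257") should work (the lookup is case-insensitive).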
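A minimal sketch of that suggestion, assuming a standard JDK that ships the windows-1257 charset (it is an extended charset, so a stripped-down runtime might not include it, in which case forName throws UnsupportedCharsetException). The names WINDOWS_1257, decodeWin1257 and fetchWin1257 are illustrative; fetchWin1257 simply reads the whole body with readBytes() and decodes it, reusing the URL from the question. Note that a Kotlin String is UTF-16 internally, so "getting a UTF-8 string" really means decoding with the right charset; UTF-8 only matters again when the text is encoded back to bytes.

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.nio.charset.Charset

// windows-1257 is not a constant in Kotlin's Charsets object, but the JVM provides it by name
val WINDOWS_1257: Charset = Charset.forName("windows-1257")

// Decode raw Windows-1257 bytes into a Kotlin String
fun decodeWin1257(bytes: ByteArray): String = String(bytes, WINDOWS_1257)

// Hypothetical helper for the HTTP case: read the whole response body, then decode it
fun fetchWin1257(url: String): String {
    val connection = URL(url).openConnection() as HttpURLConnection
    return try {
        connection.inputStream.use { decodeWin1257(it.readBytes()) }.trim()
    } finally {
        connection.disconnect()
    }
}

fun main() {
    // List the charset names the running JVM knows that mention 1257
    println(Charset.availableCharsets().keys.filter { "1257" in it })

    val byteArray = byteArrayOf(0xe2.toByte(), 0x72)
    val text = decodeWin1257(byteArray)
    println(text) // ār

    // If UTF-8 bytes are needed (e.g. for a file or another API), re-encode explicitly
    val utf8 = text.toByteArray(Charsets.UTF_8)
    println(utf8.size) // 3
}
```

For the real API call, fetchWin1257("http://192.168.1.21:92/someAPI") would replace the UTF_8 decoding in the question; if the server sends a charset in its Content-Type header, reading it from there would be more robust than hard-coding windows-1257.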
