Format a `Debug` value into a UTF-16 string

Suppose I have a value of a type that implements `Debug`, and I want the result of formatting that value encoded as UTF-16. One way to do this is to use `format!` and then convert the resulting `String` to UTF-16:

use std::fmt::Debug;

#[derive(Debug)]
pub struct User {
    name: String,
    some_ids: [u8; 16],
    // more fields, etc. Quite a few of them, actually
}

pub fn display_to_u16(user: &User) -> Vec<u16> {
    let asutf8 = format!("{user:#?}");
    asutf8.encode_utf16().collect()
}

But it seems wasteful not to write the result directly as a UTF-16 string, or more precisely as a vector of UTF-16 code units. Is there any way to format a value directly as UTF-16?
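The difference between code points and code units matters here: `encode_utf16` yields `u16` code units, and a character outside the Basic Multilingual Plane becomes two of them (a surrogate pair). A small standard-library-only illustration; the helper name `utf16_units` is hypothetical:

```rust
/// Hypothetical helper: collects the UTF-16 *code units* of `s`
/// (not its code points).
fn utf16_units(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

fn main() {
    // 'é' (U+00E9) lies in the Basic Multilingual Plane: one code unit.
    assert_eq!(utf16_units("\u{e9}"), vec![0x00E9_u16]);

    // '😀' (U+1F600) is a single code point (one `char`) but needs two
    // UTF-16 code units: a surrogate pair.
    assert_eq!("\u{1F600}".chars().count(), 1);
    assert_eq!(utf16_units("\u{1F600}"), vec![0xD83D_u16, 0xDE00_u16]);
}
```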


Note: The real requirement is to work with an abstract `dyn Debug` value as part of an implementation of `tracing_subscriber::field::Visit::record_debug`. The `User` type serves only as an example. Simply implementing a different serialization scheme is not feasible; working with the `Debug` trait is an integral part of the question.

>Solution :

The `format!()` macro always returns a `String`. The `write!()` macro, however, is the generic version: it writes into anything that implements `std::fmt::Write` (or `std::io::Write`). So if there is a type that implements one of those traits and encodes to UTF-16 on the fly, that would be your best bet.

I took a quick look and didn’t find a 3rd-party crate that provides this, but it’s easy enough to implement yourself:

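use std::fmt::{self, Write};

/// Accumulates formatted output as UTF-16 code units.
struct Utf16Writer(Vec<u16>);

impl Write for Utf16Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Re-encode each UTF-8 slice handed over by the formatter as UTF-16.
        self.0.extend(s.encode_utf16());
        Ok(())
    }

    fn write_char(&mut self, c: char) -> fmt::Result {
        // Encode a single char directly, without going through a &str.
        self.0.extend_from_slice(c.encode_utf16(&mut [0; 2]));
        Ok(())
    }
}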

And then you can use it like so:

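use std::fmt::Write; // brings `write_fmt` into scope for `write!`

pub fn display_to_u16(user: &User) -> Vec<u16> {
    let mut writer = Utf16Writer(Vec::new());
    // Only fails if `User`'s `Debug` impl returns an error;
    // `Utf16Writer` itself never does.
    write!(writer, "{user:#?}").unwrap();
    writer.0
}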
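Because the real requirement is a `&dyn Debug` inside `record_debug`, the same writer works without knowing the concrete type. A minimal sketch, assuming a writer like `Utf16Writer` above; the function name `debug_to_utf16` and the cut-down `User` are hypothetical, and the `tracing` crate is left out so the example stands alone. Inside a `Visit` impl you would pass it the `value: &dyn Debug` argument of `record_debug`.

```rust
use std::fmt::{self, Debug, Write};

/// Collects formatted output directly as UTF-16 code units.
/// The default `write_char` encodes the char as UTF-8 and forwards to
/// `write_str`, so overriding it is optional.
struct Utf16Writer(Vec<u16>);

impl Write for Utf16Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.extend(s.encode_utf16());
        Ok(())
    }
}

/// Hypothetical helper: pretty-prints any `Debug` value, including a
/// type-erased `&dyn Debug`, straight into UTF-16.
fn debug_to_utf16(value: &dyn Debug) -> Vec<u16> {
    let mut writer = Utf16Writer(Vec::new());
    // Can only fail if the value's own `Debug` impl returns `Err`.
    write!(writer, "{value:#?}").expect("Debug impl returned an error");
    writer.0
}

#[derive(Debug)]
struct User {
    name: String,
    some_ids: [u8; 2],
}

fn main() {
    let user = User { name: "Zo\u{eb}".to_string(), some_ids: [1, 2] };
    // `&user` coerces to `&dyn Debug`.
    let utf16 = debug_to_utf16(&user);
    // Same result as the two-step format!-then-encode approach.
    let expected: Vec<u16> = format!("{user:#?}").encode_utf16().collect();
    assert_eq!(utf16, expected);
}
```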

You can measure the performance yourself, but I’m skeptical that this will be faster, though it may use less memory overall, since the complete UTF-8 `String` never has to exist alongside the `Vec<u16>`. Every piece of output reaches a `fmt::Write` implementation as a `&str` (or occasionally a single `char`), so the data is typically produced as UTF-8 first and only then re-encoded as UTF-16. You are not really saving any encoding steps.
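The point about `str` slices can be observed directly: a `fmt::Write` implementation only sees UTF-8 `&str` pieces, and derived `Debug` output arrives as many small ones. A minimal sketch with a hypothetical `ChunkRecorder` writer that keeps each piece it receives:

```rust
use std::fmt::{self, Write};

/// Hypothetical writer that stores every `&str` it is given, to show
/// how the formatting machinery feeds a `fmt::Write` implementation.
#[derive(Default)]
struct ChunkRecorder {
    chunks: Vec<String>,
}

impl Write for ChunkRecorder {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.chunks.push(s.to_owned());
        Ok(())
    }
}

#[derive(Debug)]
struct User {
    name: String,
    some_ids: [u8; 2],
}

/// Formats `value` with `{:#?}` and returns the pieces the writer saw.
fn debug_chunks(value: &dyn fmt::Debug) -> Vec<String> {
    let mut rec = ChunkRecorder::default();
    write!(rec, "{value:#?}").expect("formatting failed");
    rec.chunks
}

fn main() {
    let user = User { name: "Ann".to_string(), some_ids: [7, 8] };
    let chunks = debug_chunks(&user);
    // The output arrives as several UTF-8 slices, not one string...
    assert!(chunks.len() > 1);
    // ...which together spell out exactly what `format!` produces.
    assert_eq!(chunks.concat(), format!("{user:#?}"));
    println!("{} chunks: {:?}", chunks.len(), chunks);
}
```

Each of those slices is what `Utf16Writer::write_str` would re-encode, which is why avoiding the intermediate `String` saves an allocation rather than an encoding pass.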
