Replacing any sequences of spaces, tabs, newlines etc with single spaces using nom

I am learning parsing with nom right now, and there are a few problems that I cannot solve myself. I’d like to ask about one of them here.
First: I am not sure whether I actually need nom for this, but it seemed the easiest way to detect any combination of spaces, tabs, newlines and carriage returns. I want to replace every sequence of these characters with a single space. My rather inelegant solution looks like this:

enum EmptyOrStr {
    Empty,
    Str(char),
}

fn replace_multispaces<'a>(i: &str) -> IResult<&str, String> {
    let (rest, tokens) = many0(alt((
        map(multispace1, |_| EmptyOrStr::Empty),
        // i would like to use `not(multispace1)` instead of `anychar`,
        // but then the output type is `()` and I cannot put it into the 
        // `EmptyOrStr::Str( )`-variant
        map(anychar, |s| EmptyOrStr::Str(s)),
    )))(i)?;

    Ok((
        "",
        tokens
            .into_iter()
            .map(|t| match t {
                EmptyOrStr::Str(s) => format!("{s}"), // because s is a `char` not a `&str`
                EmptyOrStr::Empty => " ".to_string(),
            })
            .collect::<String>(),
    ))
}

I didn’t manage to do this with &str. Intuitively, I’d guess it would be better for the second parser inside alt to use something like take_till1(is_multispace1), but that needs a condition, not a parser.
I think there are just too many things used together for me to understand everything.

Ideally, replacing multispaces with single spaces wouldn’t need nom at all. But the multispace1 function is pretty practical, I guess.

Are there any obvious ways I could improve this?

>Solution :

If you just want to split the input on any run of consecutive whitespace characters, you can use split_ascii_whitespace() from the standard library.

The example function below uses split_ascii_whitespace() to split the input and then rejoins the pieces with a single space character.

fn strip_multispace(input: &str) -> String {
    input.split_ascii_whitespace().collect::<Vec<&str>>().join(" ")
}

fn main() {
    let s = "This is    a string with\t multiple\n\r\n white\t\t\r\nspaces";
    println!("{}", strip_multispace(s));
}

This outputs:

This is a string with multiple white spaces
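
One difference from the nom version in the question: split_ascii_whitespace() drops leading and trailing whitespace entirely, while the many0/alt parser turns those runs into a single space as well. If that behavior matters, a single pass over the characters is one way to keep it. This is only a minimal sketch using the standard library; the name collapse_whitespace is made up for illustration, and the character set (space, tab, carriage return, newline) is chosen to mirror what multispace1 matches.

```rust
/// Collapses every run of spaces, tabs, carriage returns and newlines
/// into one space. Leading and trailing runs also become one space
/// instead of being dropped.
fn collapse_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // True while we are inside a run of whitespace characters.
    let mut in_run = false;
    for c in input.chars() {
        if matches!(c, ' ' | '\t' | '\r' | '\n') {
            // Emit a space only for the first character of the run.
            if !in_run {
                out.push(' ');
            }
            in_run = true;
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out
}

fn main() {
    let s = "  leading\t\tand trailing\r\n";
    // prints " leading and trailing " (with the surrounding quotes)
    println!("{:?}", collapse_whitespace(s));
}
```

For input without whitespace at either end, this should give the same result as strip_multispace above.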