How to clean a string from non-alphanumeric characters, but keep certain ones?

January 31, 2022

I have a string that contains non-alphanumeric characters along with English and non-English letters. I need to strip the non-alphanumeric characters but keep a few of them. For instance, let's say I want to keep only the comma and the colon.

Example:
String st = "I, Love: ( Coding {}+-), codificación";

I want the output to be "I,Love:Coding,codificación"

Is there a regex that can do that?

Note: the method below removes all non-alphanumeric characters, including the ones I want to keep.

public static String cleanText(String text) {
     return text.replaceAll("\\P{LD}+", "");
}

Solution:

You can use

public static String cleanText(String text) {
    return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    // or return text.replaceAll("[^\\p{LD}:,]+", "");
}

Details:

[^ – start of a negated character class
- \p{L} – any Unicode letter
- \p{N} – any Unicode number (decimal digits, plus other numeric characters such as Ⅷ or ½)
- : – a colon
- , – a comma
]+ – end of the character class, repeat one or more times.
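To illustrate what \p{N} covers in the class above, here is a small sketch comparing it with \p{Nd} (decimal digits only). The \p{Nd} variant and the class and method names are assumptions added for illustration; they are not part of the answer.

```java
class NumberClassDemo {
    // Same pattern as in the answer: keeps letters, any Unicode number, colon, comma.
    static String keepAnyNumber(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }

    // Assumed variant: keeps only decimal digits (Unicode category Nd) instead of all numbers.
    static String keepDecimalDigits(String text) {
        return text.replaceAll("[^\\p{L}\\p{Nd}:,]+", "");
    }

    public static void main(String[] args) {
        // a1, VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT, ARABIC-INDIC DIGIT THREE
        String sample = "a1 \u00BD \u2167 \u0663";
        System.out.println(keepAnyNumber(sample));     // keeps all three numeric characters
        System.out.println(keepDecimalDigits(sample)); // drops the fraction and the Roman numeral
    }
}
```

If only digits should survive, the narrower \p{Nd} (or plain 0-9 for ASCII digits) may be the better fit.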

A complete Java demo:

class Test
{
    public static void main(String[] args)
    {
        String st = "I, Love: ( Coding {}+-), codificación";
        System.out.println(cleanText(st));
    }

    // Removes every character that is not a letter, a number, a colon, or a comma.
    public static String cleanText(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }
}
// => I,Love:Coding,codificación
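If the set of characters to keep varies, the same negated class can be built at run time. The following is a minimal sketch under that assumption; the two-argument cleanText and the KeepCharsDemo class are hypothetical and not part of the answer. Each kept character that is not a letter or digit is backslash-escaped so that ], ^, - and \ are treated literally inside the class.

```java
class KeepCharsDemo {
    // Hypothetical helper: strips everything except letters, numbers,
    // and the characters listed in `keep`.
    static String cleanText(String text, String keep) {
        StringBuilder cls = new StringBuilder("[^\\p{L}\\p{N}");
        keep.codePoints().forEach(cp -> {
            // Escape non-alphanumerics; letters and digits are appended as-is
            // because a backslash before them could form a regex construct.
            if (!Character.isLetterOrDigit(cp)) {
                cls.append('\\');
            }
            cls.appendCodePoint(cp);
        });
        cls.append("]+");
        return text.replaceAll(cls.toString(), "");
    }

    public static void main(String[] args) {
        String st = "I, Love: ( Coding {}+-), codificación";
        System.out.println(cleanText(st, ":,")); // same result as the fixed pattern above
        System.out.println(cleanText(st, "+-")); // keeps plus and minus instead
    }
}
```

For text cleaned in a loop, compiling the pattern once with Pattern.compile and reusing it avoids recompiling on every call.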