I have a string that contains non-alphanumeric characters as well as English and non-English letters. I need to strip the non-alphanumeric characters from it, but I want to keep a few of them. For instance, let's say I want to keep only the comma and the colon.
Example:
String st = "I, Love: ( Coding {}+-), codificación";
I want the output to be "I,Love:Coding,codificación"
Is there a regex that can do that?
Note: the method below removes all non-alphanumeric characters from the text.
public static String cleanText(String text) {
    return text.replaceAll("\\P{LD}+", "");
}
>Solution :
You can use
public static String cleanText(String text) {
    return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    // or return text.replaceAll("[^\\p{LD}:,]+", "");
}
Details:
[^ – start of a negated character class
\p{L} – any Unicode letter
\p{N} – any Unicode number (decimal digits, plus letter-like and other numeric characters)
: – a colon
, – a comma
]+ – end of the character class, repeated one or more times.
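Note that \p{N} covers more than the decimal digits 0-9. As a minimal sketch of the difference (the class and method names here are made up for illustration), the following compares \p{N} with the narrower \p{Nd}, which matches decimal digits only:

```java
class NumberClassDemo {
    // Keeps letters and every Unicode number (categories Nd, Nl and No).
    static String keepAnyNumber(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // Keeps letters and decimal digits only (category Nd).
    static String keepDecimalDigits(String text) {
        return text.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        // x, superscript two (No), one half (No), Roman numeral eight (Nl), then 42
        String s = "x\u00B2 \u00BD \u2167 42";
        System.out.println(keepAnyNumber(s));     // superscript, fraction and numeral survive
        System.out.println(keepDecimalDigits(s)); // only "x42" survives
    }
}
```

If characters such as superscripts or fractions should be removed as well, \p{Nd} is probably the better choice.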
A complete Java demo:
class Test
{
    public static void main (String[] args)
    {
        String st = "I, Love: ( Coding {}+-), codificación";
        System.out.println(cleanText(st));
    }

    public static String cleanText(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }
}
// => I,Love:Coding,codificación
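If the set of characters to keep varies, the character class can be built at runtime. The following is only a sketch: the two-argument cleanText overload and the KeepCharsDemo class are hypothetical, not part of the answer above. It escapes ASCII punctuation with a backslash, so that characters like ], ^ or - are treated literally inside the class:

```java
class KeepCharsDemo {
    // Hypothetical helper: removes every character that is not a Unicode letter,
    // a Unicode number, or one of the characters listed in `keep`.
    static String cleanText(String text, String keep) {
        StringBuilder cls = new StringBuilder("[^\\p{L}\\p{N}");
        for (int i = 0; i < keep.length(); ) {
            int cp = keep.codePointAt(i);
            // Backslash-escape ASCII non-alphanumerics so that ']', '^', '-' or '\'
            // are taken literally inside the character class.
            if (cp < 128 && !Character.isLetterOrDigit(cp)) {
                cls.append('\\');
            }
            cls.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        cls.append("]+");
        return text.replaceAll(cls.toString(), "");
    }

    public static void main(String[] args) {
        String st = "I, Love: ( Coding {}+-), codificación";
        System.out.println(cleanText(st, ":,"));  // I,Love:Coding,codificación
        System.out.println(cleanText(st, ":,-")); // I,Love:Coding-,codificación
    }
}
```

Since String.replaceAll compiles the pattern on every call, it may be worth caching a compiled java.util.regex.Pattern if the same keep set is used many times.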