Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Back-ticks in DataFrame.colRegex?

For PySpark, I find back-ticks enclosing regular expressions for
DataFrame.colRegex()
here,
here,
and in this SO
question
. Here is the
example from the DataFrame.colRegex doc string:

df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+

The answer to the SO question
doesn’t show back-ticks for Scala. It refers to the Java
documentation for the Pattern
class
,
but that doesn’t explain back-ticks.

This page
indicates the use of back-ticks in Python to represent the string
representation of the adorned variable, but that doesn’t apply
to a regular expression.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

What is the explanation for the back-ticks?

>Solution :

The back-ticks are used to delimit the column name in case it includes special characters. For example, if you had a column called column-1 and you try

SELECT column-1 FROM mytable

You will probably get a

non-existent column ‘column’

error as the interpreter will treat that as SELECT (column) - 1 FROM mytable. Instead, you can delimit the column name with back-ticks to get around that issue:

SELECT `column-1` FROM mytable
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading