I have a PySpark dataframe and a separate list of column names. I want to check whether any of the listed columns are missing and, if they are, create them and fill them with null values. Is there a straightforward way to do this in PySpark? I can do it… Read More Check if columns exist and if not, create and fill with NaN using PySpark
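A minimal sketch of the usual approach, assuming an active SparkSession named spark and hypothetical column names; each missing column is added as a typed null literal:

```python
from pyspark.sql import functions as F

# Hypothetical example frame; in practice `df` is the existing DataFrame.
df = spark.createDataFrame([(1, "a")], ["id", "name"])
required_cols = ["name", "age", "city"]

# Add each missing column as a null literal; the cast makes the new column's type explicit.
for c in required_cols:
    if c not in df.columns:
        df = df.withColumn(c, F.lit(None).cast("string"))
```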
I am converting a PySpark dataframe into SQL and am having a hard time converting .withColumn("portalcount", when(((F.col("tCounts") == 3) & (F.col("Type1").contains("pizza"))) & ((~(F.col("Type1").contains("singleside"))) | (~(F.col("Type1").contains("side")))), 2) .when(((F.col("tCounts") == 3) & (F.col("Type1").contains("pizza"))) & ((F.col("Type1").contains("singleside")) | (F.col("Type1").contains("side"))), 1) to CASE WHEN (tCounts = 3 AND Type1 IN 'pizza') AND (Type1 NOT IN 'singleside' OR Type1 NOT IN… Read More Case when for statement with multiple grouped conditions converted from Pyspark
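For reference, a hedged sketch of the SQL translation: PySpark's .contains() corresponds to LIKE with wildcards rather than IN, and the view name orders is an assumption here:

```python
# Assumes the DataFrame has been registered as a temp view named `orders`.
result = spark.sql("""
    SELECT *,
           CASE
               WHEN tCounts = 3 AND Type1 LIKE '%pizza%'
                    AND (Type1 NOT LIKE '%singleside%' OR Type1 NOT LIKE '%side%') THEN 2
               WHEN tCounts = 3 AND Type1 LIKE '%pizza%'
                    AND (Type1 LIKE '%singleside%' OR Type1 LIKE '%side%') THEN 1
           END AS portalcount
    FROM orders
""")
```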
I want to convert from the 03FEB23 format to yyyy-mm-dd in Python. How can I do it? Use the below code: from pyspark.sql.functions import * df=spark.createDataFrame([["1"]],["id"]) df.select(current_date().alias("current_date"), \ date_format("03MAR23","yyyy-MMM-dd").alias("yyyy-MMM-dd")).show() >Solution : from datetime import datetime date_str = '03FEB23' date = datetime.strptime(date_str, '%d%b%y') formatted_date = date.strftime('%Y-%m-%d') print(formatted_date) # Output: 2023-02-03
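If the value lives in a DataFrame column rather than a plain Python string, a sketch using Spark's own date functions (assuming a hypothetical column date_str and an active SparkSession spark):

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("03FEB23",)], ["date_str"])

# Parse ddMMMyy (e.g. 03FEB23) into a date, then render it as yyyy-MM-dd.
# Note: Spark 3's parser can be case-sensitive about month abbreviations, so
# uppercase names like FEB may need spark.sql.legacy.timeParserPolicy=LEGACY.
df = df.withColumn(
    "formatted",
    F.date_format(F.to_date("date_str", "ddMMMyy"), "yyyy-MM-dd"),
)
```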
In a PySpark dataframe, I have a column that holds list values, for example: [1,2,3,4,5,6,7,8]. I would like to convert the above into [[1,2,3,4], [5,6,7,8]], i.e. chunks of 4, for every column value. Please let me know how I can achieve this. Thanks for your help in advance. >Solution : You can use the transform function as… Read More Pyspark: Convert list to list of lists
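A minimal sketch of that transform approach, assuming a column named values and an active SparkSession spark: sequence generates the start position of each chunk, and slice cuts 4 elements from each:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([([1, 2, 3, 4, 5, 6, 7, 8],)], ["values"])

# sequence(1, size, 4) yields the start indices [1, 5, ...]; slice takes 4 elements from each.
df = df.withColumn(
    "chunks",
    F.expr("transform(sequence(1, size(values), 4), s -> slice(values, s, 4))"),
)
df.show(truncate=False)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

If the list length is not a multiple of 4, the final chunk simply comes out shorter.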
Assume I have two DataFrames: DF1: DATA1, DATA1, DATA2, DATA2 DF2: DATA2 I want to exclude every value that appears in DF2 while keeping duplicates in DF1. What should I do? Expected result: DATA1, DATA1 >Solution : Use a left anti join. When you join two DataFrames using a left anti join (leftanti), it returns only columns from… Read More pySpark check Dataframe contains in another Dataframe
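A minimal sketch, assuming both DataFrames share a single column named data and a SparkSession spark is available:

```python
df1 = spark.createDataFrame([("DATA1",), ("DATA1",), ("DATA2",), ("DATA2",)], ["data"])
df2 = spark.createDataFrame([("DATA2",)], ["data"])

# leftanti keeps only df1 rows with no match in df2, and df1's duplicates survive.
result = df1.join(df2, on="data", how="leftanti")
result.show()  # two DATA1 rows
```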
my dataframe looks like this:

| accountId | income | dateOfOrder |
| 123 | 60000 | 56347264327_01_20200110 |
| 321 | 52000 | 54346262452_01_20200218 |

I want to rename the dateOfOrder header to acct_order_dt and keep only the last 8 characters, which are dates in yyyyMMdd format. I want to preserve the order of this… Read More I have a date column in a pyspark dataframe that I want to change the title of and extract only the last 8 characters from while preserving its order
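A sketch under the columns shown above (SparkSession spark assumed): renaming first and then overwriting the column with withColumn keeps it in its original position:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [(123, 60000, "56347264327_01_20200110"), (321, 52000, "54346262452_01_20200218")],
    ["accountId", "income", "dateOfOrder"],
)

# Rename in place, then keep only the last 8 characters (a negative start counts from the end).
df = df.withColumnRenamed("dateOfOrder", "acct_order_dt")
df = df.withColumn("acct_order_dt", F.substring("acct_order_dt", -8, 8))

# Optionally parse the yyyyMMdd string into a proper date type.
df = df.withColumn("acct_order_dt", F.to_date("acct_order_dt", "yyyyMMdd"))
```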
I am attempting to move a process from Pandas into PySpark, but I am a complete novice in the latter. Note: this is an EDA process, so I am not too worried about having it as a loop for now; I can optimise that at a later date. Set up: import pandas as pd import… Read More Convert python pandas iterator and string concat into pyspark
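Since the set-up is truncated above, only a generic sketch of the usual translation is possible: row-wise string concatenation in pandas becomes a single column expression in PySpark (hypothetical columns a and b, SparkSession spark assumed):

```python
from pyspark.sql import functions as F

sdf = spark.createDataFrame([("foo", 1), ("bar", 2)], ["a", "b"])

# pandas:  df["label"] = df["a"] + "_" + df["b"].astype(str)
# PySpark: one vectorised column expression instead of iterating over rows.
sdf = sdf.withColumn("label", F.concat_ws("_", F.col("a"), F.col("b").cast("string")))
```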
I have a column which is in the "20130623" format. I am trying to convert it into dd-MM-yyyy. I have seen various posts online, including here, but I only found one solution, shown below: from datetime import datetime df = df2.withColumn("col_name", datetime.utcfromtimestamp(int("col_name")).strftime('%d-%m-%y')) However, it throws an error that the input should be int type, not… Read More why am I not able to convert string type column to date format in pyspark?
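A minimal sketch of the column-function route, assuming the df2 and col_name from the snippet above: Python's datetime operates on single values, whereas Spark columns need to_date and date_format:

```python
from pyspark.sql import functions as F

# Stand-in for the df2 referenced above.
df2 = spark.createDataFrame([("20130623",)], ["col_name"])

# Parse the yyyyMMdd string into a date, then format it as dd-MM-yyyy.
df = df2.withColumn(
    "col_name",
    F.date_format(F.to_date(F.col("col_name"), "yyyyMMdd"), "dd-MM-yyyy"),
)
```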
from pyspark.sql import SparkSession from pyspark.sql import functions as F spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame([(0.0, 1.2, -1.3), (0.0, 0.0, 0.0), (-17.2, 20.3, 15.2), (23.4, 1.4, 0.0),], ['col1', 'col2', 'col3']) df1 = df.agg(F.avg('col1')) df2 = df.agg(F.avg('col2')) df3 = df.agg(F.avg('col3')) If I have a dataframe, ID COL1 COL2 COL3 1 0.0 1.2 -1.3 2 0.0 0.0… Read More Pyspark calculate average of non-zero elements for each column
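Continuing from the df defined above, one common sketch: avg() skips nulls, so mapping zeros to null averages only the non-zero values of each column:

```python
from pyspark.sql import functions as F

# when() without otherwise() yields null for zeros, which avg() then ignores.
df.agg(*[
    F.avg(F.when(F.col(c) != 0, F.col(c))).alias(f"avg_{c}")
    for c in ["col1", "col2", "col3"]
]).show()
```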
What is the difference between: my_df = my_df.select(col('age').alias('age2')) and my_df = my_df.select(col('age').withColumnRenamed('age', 'age2')) >Solution : The second expression is not going to work; withColumnRenamed() is a DataFrame method, not a Column method, so you need to call it on your dataframe. I assume you mean: my_df = my_df.withColumnRenamed('age', 'age2') And to answer your question: once corrected, both rename the column in the same way, although select() returns only the selected columns while withColumnRenamed() keeps all the others.
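A small sketch of the corrected pair, with a hypothetical my_df (SparkSession spark assumed):

```python
from pyspark.sql.functions import col

my_df = spark.createDataFrame([(30, "x")], ["age", "name"])

renamed_a = my_df.select(col("age").alias("age2"))  # returns only age2
renamed_b = my_df.withColumnRenamed("age", "age2")  # keeps name as well
```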