Making a row_number column in a query for a combination of columns

Advertisements

I´m using pandasql library in Python to make an special query of DataFrames, I trying to make a new row number column from the combination of some columns. I have the next example:

tabla_1= pd.DataFrame(['a','b','c'],columns=['letras']) 
tabla_2= pd.DataFrame([1,2],columns=['numeros']) 
tabla_3= pd.DataFrame(['rojo','verde'],columns=['colores'])  
pysqldf = lambda q: sqldf(q, globals())  

# Here is my query that doesn't work
q = '''
SELECT 
 letras, 
 numeros, 
 colores,  
 ROW_NUMBER() OVER (PARTITION BY (letras||numeros))   
FROM 
 tabla_1 CROSS JOIN tabla_2 CROSS JOIN tabla_3
'''
pysqldf(q)

The table without the row number looks like this:

And the table with the new row number should look like this:

Row number should create a different number depending on the combinatio of the columns ‘letras’ and ‘numeros’.

I appreciate the help!

>Solution :

No need to use pandasql, you can easily do this using builtin pandas methods:

# Do a cross join
df = tabla_1.merge(tabla_2, how='cross').merge(tabla_3, how='cross')

# Do a groupby/partition and use ngroup to assign group numbers
df['rownum'] = df.groupby(['letras', 'numeros']).ngroup() + 1

Result

   letras  numeros colores  rownum
0       a        1    rojo       1
1       a        1   verde       1
2       a        2    rojo       2
3       a        2   verde       2
4       b        1    rojo       3
5       b        1   verde       3
6       b        2    rojo       4
7       b        2   verde       4
8       c        1    rojo       5
9       c        1   verde       5
10      c        2    rojo       6
11      c        2   verde       6

Leave a ReplyCancel reply