Remove duplicates using column value with some ignore condition

I have two columns in my Excel file and I want to remove duplicates from column 'A' with an ignore condition. The columns are as follows:
    A  B
    1  10
    1  20
    2  30
    2  40
    3  10
    3  20
Now, I want it to turn into this:
    A  B
    1  10
    2  30
    2…

How to get a group by with aggregations considering the value of the columns of a dataframe

I have a pandas dataframe like this:
    id  gender  column_1  column_2  column_3  column_n
    10  male    a         b         a         b
    10  female  b         c         b         c
    10  male    c         c         a         a
    10  male    b         a         a         b
I want to get this as output:
    id  column_name  male_%_a  male_%_b  male_%_c  female_%_a  female_%_b  female_%_c
    10  column_1     33.3…

How do I sort a dataframe column alphabetically starting with the letter "l"?

I have a dataframe that I would like to sort alphabetically beginning with the letter "l" (rather than "a"). Here’s my dataframe:
    import pandas as pd
    data = [['C:/folder/!!file this', 15], ['C:/folder/apple', 14], ['C:/folder/Land file', 10]]
    df = pd.DataFrame(data, columns=['Doc', 'Size'])
Here’s what I want my dataframe to look like:
    data = [['C:/folder/Land file', 10],…
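One possible approach, offered only as a sketch: rotate the alphabet so that it starts at "l" and pass a custom `key` to `sort_values`. The excerpt is truncated, so the full expected order is unknown; the code below assumes that the comparison is on the first character of the file name (the part after the last "/"), that it is case-insensitive, and that names starting with a non-letter go last. The helper name `rotated_key` is hypothetical.

```python
import string

import pandas as pd

data = [['C:/folder/!!file this', 15], ['C:/folder/apple', 14], ['C:/folder/Land file', 10]]
df = pd.DataFrame(data, columns=['Doc', 'Size'])

# Alphabet rotated so it begins at "l": l, m, ..., z, a, ..., k
letters = string.ascii_lowercase
start = letters.index('l')
rotated = letters[start:] + letters[:start]
rank = {ch: i for i, ch in enumerate(rotated)}


def rotated_key(col):
    """Map each path to the rank of its file name's first letter (hypothetical helper)."""
    # Assumption: only the first character of the file name decides the order.
    first = col.str.rsplit('/', n=1).str[-1].str[0].str.lower()
    # Assumption: names starting with a non-letter are ranked after all letters.
    return first.map(rank).fillna(len(rank))


result = df.sort_values('Doc', key=rotated_key, kind='stable').reset_index(drop=True)
print(result)
```

Because the sort is stable, rows whose file names share a first letter keep their original relative order; a secondary key would be needed to order them further.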

How to merge multiple columns of a dataframe using regex?

I have a df as follows:
    import pandas as pd
    df = pd.DataFrame(
        {'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
         'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
         'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
         'number_C2_E2': [None, None, '3', None, None, None, None, None],
         'fruit_C22_E2': [None, None, 'blueberry', None,…

Having the same index values when pivoting a dataframe from long to wide format gives an average value

Context: I’m trying to pivot a long-format dataframe to a wide-format dataframe, but I’m noticing a weird pattern in the wide-format result. It seems that if we have repeated values for the index (in my case, a date), it’s almost like it’s giving me an average instead of repeating each index value…
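The excerpt does not show which pivoting call was used. If it was `DataFrame.pivot_table`, the averaging would be expected, because its `aggfunc` defaults to `"mean"` and duplicate index/column pairs are aggregated. The sketch below uses made-up data and hypothetical column names (`date`, `variable`, `value`, `occurrence`) to reproduce the effect and to show one way of keeping every row, by adding a per-group counter to the index.

```python
import pandas as pd

# Hypothetical long-format data: the date 2023-01-01 appears twice for variable "x".
long_df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02'],
    'variable': ['x', 'x', 'x'],
    'value': [10.0, 20.0, 30.0],
})

# pivot_table aggregates duplicates; the default aggfunc is the mean.
averaged = long_df.pivot_table(index='date', columns='variable', values='value')

# To keep each repeated row, number the repeats and make that number part of the index.
long_df['occurrence'] = long_df.groupby(['date', 'variable']).cumcount()
wide = long_df.pivot(index=['date', 'occurrence'], columns='variable', values='value')

print(averaged)
print(wide)
```

Here `averaged` has one row per date, while `wide` keeps one row per (date, occurrence) pair. Passing a list as `index` to `pivot` requires a reasonably recent pandas version.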

ValueError: too many values to unpack when using apply to return multiple values

I am using the apply function to return 2 new columns, and then I got an error. I'm not sure what is wrong. Thanks for your help.
    import pandas as pd
    def calc_test(row):
        a = row['col1'] + row['col2']
        b = row['col1'] / row['col2']
        return (a, b)
    df_test_dict = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
    df_test = pd.DataFrame(df_test_dict)
    df_test
       col1  col2
    0     1    10
    1     2    20
    2     3    30
    3     4    40
    4     5    50
    df_test['a'], df_test['b'] = df_test.apply(lambda row: calc_test(row), axis=1)
    df_test
    ---------------------------------------------------------------------------…
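A likely cause, sketched here rather than taken from the truncated post: with `axis=1`, `apply` returns a single Series holding one tuple per row, so unpacking it into two targets tries to split five rows across two names. One way around this, assuming the goal is two new columns `a` and `b`, is to ask `apply` to expand the tuples into columns with `result_type='expand'`.

```python
import pandas as pd


def calc_test(row):
    a = row['col1'] + row['col2']
    b = row['col1'] / row['col2']
    return (a, b)


df_test = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]})

# result_type='expand' turns each returned tuple into separate columns,
# which can then be assigned to two new column names at once.
df_test[['a', 'b']] = df_test.apply(calc_test, axis=1, result_type='expand')

print(df_test)
```

An alternative that keeps the original two-target assignment is `df_test['a'], df_test['b'] = zip(*df_test.apply(calc_test, axis=1))`, which transposes the per-row tuples into two sequences.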

Reorder dataframe groupby medians following custom order

I have a dataset containing a bunch of data in the columns params and value. I’d like to count how many values each params contains (to use as labels in a boxplot), so I use mydf['params'].value_counts() to show this:
    slidingwindow_250     11574
    hotspots_1k_100        8454
    slidingwindow_500      5793
    slidingwindow_100      5366
    hotspots_5k_500        3118
    slidingwindow_1000     2898
    hotspots_10k_1k        1772
    slidingwindow_2500     1160…
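The custom order the asker wants is cut off in the excerpt, so the following is only a sketch on made-up data: `reindex` can put both the grouped medians and the counts into the same explicit order. The column names `params` and `value` come from the excerpt; the list `custom_order` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
mydf = pd.DataFrame({
    'params': ['slidingwindow_250', 'hotspots_1k_100', 'slidingwindow_250',
               'slidingwindow_500', 'hotspots_1k_100', 'slidingwindow_250'],
    'value': [1.0, 5.0, 3.0, 7.0, 9.0, 2.0],
})

# Assumption: the desired order is given as an explicit list of params labels.
custom_order = ['slidingwindow_500', 'slidingwindow_250', 'hotspots_1k_100']

# groupby sorts its keys alphabetically; reindex rearranges them to the custom order.
medians = mydf.groupby('params')['value'].median().reindex(custom_order)
counts = mydf['params'].value_counts().reindex(custom_order)

print(medians)
print(counts)
```

Reindexing both Series with the same list keeps the medians and the count labels aligned position by position, which is what a boxplot with annotated counts would need.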