R: Calculate percentage of observations in a column that are below a certain value for panel data

I have panel data and I would like to get the percentage of observations in a column (Size) that are below 1 million. My data is the following: structure(list(Product = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), Date = c("02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018",…

Check for duplicate rows for a subset of columns in a Pandas DataFrameGroupBy object

Suppose I have a groupby object (grouped on Col1) like below:

Col1  Col2  Col3  Col4  Col5
AAA   001   456   846   239   row1
      002   374   238   904   row2
      003   456   846   239   row3
BBB   001   923   222   398   row1
      002   923   222   398   row2
      003   755   656   949   row3
CCC   001   324   454   565   row1…
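One possible way to approach this kind of check, offered as a rough sketch rather than the asker's own solution: flag rows whose values repeat within the same Col1 group using `DataFrame.duplicated`. The excerpt is cut off before it says which columns form the subset, so the choice of Col3 to Col5 below is an assumption, and only the AAA and BBB rows visible in the excerpt are used.

```python
import pandas as pd

# Rows visible in the excerpt (the CCC group is truncated, so it is left out).
df = pd.DataFrame({
    "Col1": ["AAA"] * 3 + ["BBB"] * 3,
    "Col2": ["001", "002", "003"] * 2,
    "Col3": [456, 374, 456, 923, 923, 755],
    "Col4": [846, 238, 846, 222, 222, 656],
    "Col5": [239, 904, 239, 398, 398, 949],
})

# Assumption: the subset of interest is Col3..Col5.
subset = ["Col3", "Col4", "Col5"]

# Including the group key in the subset restricts the comparison to rows
# of the same group; keep=False marks every member of a duplicate set.
df["is_dup"] = df.duplicated(subset=["Col1"] + subset, keep=False)

# One boolean per group: does the group contain any duplicated rows?
has_dup = df.groupby("Col1")["is_dup"].any()

print(df)
print(has_dup)
```

Putting the group key into `subset` avoids a per-group `apply`, which tends to be slower on large frames.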

Pivot_wider: Combine Duplicate Observations AND Create New Variable Columns for Those Values

I’m new to R and have scoured the site to find a solution – I’ve found lots of similar, but slightly different questions. I’m stumped. I have a dataset in this structure:

SURVEY_ID  CHILD_NAME  CHILD_AGE
Survey1    Billy       4
Survey2    Claude      12
Survey2    Maude       6
Survey2    Constance   3
Survey3    George      22
Survey4    Marjoram    14
Survey4    LeBron…

rank for nan values based on group

I have a dataframe with a column d1, and I am trying to calculate an ‘out’ column that ranks the ‘nan’ values within that column.

import numpy as np
import pandas as pd

data_input = {'Name': ['Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault'],
              'type': ['Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber'],
              'd1': ['nan', '10', '10', '10', 'nan', 'nan', '20', '20', 'nan', 'nan', '30', '30', '30', 'nan']}
df_input = pd.DataFrame(data_input)

data_out = {'Name': ['Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault', 'Renault'],
            'type': ['Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Duster', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber', 'Triber'],
            'd1': ['nan', '10', '10', '10', 'nan', 'nan', '20', '20', 'nan', 'nan', '30', '30', '30', 'nan'],
            'out': [1, np.NaN, np.NaN, np.NaN, 2, 2, np.NaN, np.NaN, 1, 1, np.NaN, np.NaN, np.NaN, 2]}
df_out = pd.DataFrame(data_out)

If…
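The question text is cut off, so the intended rule is not stated. Judging only from the expected 'out' column, it looks as if each consecutive run of 'nan' values is numbered 1, 2, ... within its 'type' group, and non-'nan' rows get NaN. Under that assumption, a minimal sketch that reproduces the expected values could look like this (the helper names `is_nan`, `prev_nan`, `run_start` and `run_id` are illustrative, not from the original):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Renault"] * 14,
    "type": ["Duster"] * 7 + ["Triber"] * 7,
    # 'nan' is stored as a string here, exactly as in the question.
    "d1": ["nan", "10", "10", "10", "nan", "nan", "20",
           "20", "nan", "nan", "30", "30", "30", "nan"],
})

# 1 where d1 is the string 'nan', 0 otherwise.
is_nan = df["d1"].eq("nan").astype(int)

# Value of is_nan on the previous row of the same group (0 for the first row).
prev_nan = is_nan.groupby(df["type"]).shift(fill_value=0)

# A run starts on a 'nan' row whose predecessor in the group is not 'nan'.
run_start = ((is_nan == 1) & (prev_nan == 0)).astype(int)

# Counting run starts cumulatively per group gives the run number.
run_id = run_start.groupby(df["type"]).cumsum()

# Keep the run number only on 'nan' rows; other rows become NaN.
df["out"] = run_id.where(is_nan == 1)

print(df)
```

If d1 held real missing values rather than the string 'nan', `df["d1"].isna()` would take the place of `eq("nan")`.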