How to duplicate specific rows while changing the value in one column by splitting the comma-separated values of an original cell in R

To simplify this problem I’ll use a very basic subset of what the dataset might look like:

    library(dplyr)
    DF <- tibble(id = seq(1:4), label = c("A", "B", "C", "D"), val = c(NA, "5, 10", "20", "6, 7, 8"))
    DF
    # A tibble: 4 × 3
    #      id label val
    #   <int> <chr> <chr>
    #…

Modify a column in Python such that the numbering is continuous

I have a dataset given as such:

    #Load the required libraries
    import pandas as pd
    #Create dataset
    data = {'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'], 'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4], 'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes',…

Create a column based on values from other columns in pandas

I’m new to Python and pandas and I’m struggling with a problem. Here is a dataset:

    data = {'col1': ['a','b','a','c'], 'col2': [None,None,'a',None], 'col3': [None,'a',None,'b'], 'col4': ['a',None,'b',None], 'col5': ['b','c','c',None]}
    df = pd.DataFrame(data)

I need to create 3 columns based on the unique values of col1 to col4 and whenever the col1 or col2 or col3…

Dataset splitting with pandas sample and drop does not work as expected

I have a train dataset with 4,000 examples, and I want to split it randomly into 2 equal sub-datasets with 2,000 examples in each. As suggested here, I used the sample and drop methods like so:

    I1 = train_df.sample(frac=0.5, random_state=opts.seed)
    I2 = train_df.drop(index=I1.index)

However, it seems like it drops more indices for no apparent…
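One situation in which a sample-and-drop split comes out uneven is a non-unique index, because `drop` removes every row whose label matches. The excerpt does not say whether that is the case here, so the sketch below is only an illustration under that assumption. The frame contents, the duplicated index, and the `seed` variable (standing in for `opts.seed`) are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for train_df: 4,000 rows whose index labels
# are deliberately NOT unique (labels 0..999 each appear twice).
train_df = pd.DataFrame({"x": range(4000)}, index=[i % 3000 for i in range(4000)])

seed = 0  # assumption: stands in for opts.seed

# Split as in the question.
I1 = train_df.sample(frac=0.5, random_state=seed)
I2 = train_df.drop(index=I1.index)
# drop() removes every row carrying a sampled label, so a duplicated
# label can take an unsampled row with it and I2 can end up short.
print(len(I1), len(I2))

# With a unique index, the same two calls give two disjoint halves.
clean = train_df.reset_index(drop=True)
J1 = clean.sample(frac=0.5, random_state=seed)
J2 = clean.drop(index=J1.index)
print(len(J1), len(J2))
```

If the index of the real data turns out to be unique already, the cause lies elsewhere and this sketch does not apply.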

How to create a dictionary from a dataframe where the keys are column names and values are the number of values under each column?

I have a data frame whose dimensions are (356027, 163). I want to create a dictionary from the data frame with 163 keys, where each value is the number of entries in that column (i.e. the number of non-null entries). I tried using the to_dict() operation but couldn’t insert the values as the…
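As a minimal sketch of the mapping described above, `DataFrame.count()` returns the number of non-null entries per column as a Series, and `Series.to_dict()` turns that into a dictionary keyed by column name. The three-column frame here is a made-up example, not the (356027, 163) data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the real data.
df = pd.DataFrame({
    "a": [1, np.nan, 3],
    "b": ["x", None, None],
    "c": [1.0, 2.0, 3.0],
})

# count() ignores NaN/None; to_dict() maps column name -> count.
non_null_counts = df.count().to_dict()
print(non_null_counts)  # {'a': 2, 'b': 1, 'c': 3}
```

The same two calls should work unchanged on a wider frame, giving one key per column.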