How can I label a column of strings into numbered groups based on another column containing substrings?

My 1st column contains around 4920 different chemical compounds. For example: 0 Ag(AuS)2, 1 Ag(W3Br7)2, 2 Ag0.5Ge1Pb1.75S4, 3 Ag0.5Ge1Pb1.75Se4, 4 Ag2BBr, …, 4916 ZrTaN3, 4917 ZrTe, 4918 ZrTi2O, 4919 ZrTiF6, 4920 ZrW2. My 2nd column lists all the elements of the periodic table numerically by atomic number 0…
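A minimal sketch, assuming the goal is to tag each compound with the atomic numbers of the elements it contains. The sample rows, the shortened element table and the helper name `element_numbers` are hypothetical. It parses element symbols with a regex instead of plain substring tests, because a substring check would wrongly find "S" inside "Se" or "B" inside "Br".

```python
import re

import pandas as pd

# Hypothetical sample: a few compounds from the question and a small slice of an
# element table (symbol -> atomic number); the real table would list every element.
compounds = pd.Series(["Ag(AuS)2", "Ag2BBr", "ZrTe", "ZrW2"], name="compound")
elements = pd.DataFrame({
    "symbol": ["B", "S", "Br", "Zr", "Ag", "Te", "W", "Au"],
    "Z": [5, 16, 35, 40, 47, 52, 74, 79],
})

# An element symbol is one capital letter optionally followed by one lowercase
# letter, so "Ag2BBr" splits into "Ag", "B", "Br" rather than matching "B" twice.
SYMBOL_RE = re.compile(r"[A-Z][a-z]?")
z_by_symbol = dict(zip(elements["symbol"], elements["Z"]))


def element_numbers(formula: str) -> list:
    """Return the sorted, de-duplicated atomic numbers found in a formula.

    Symbols missing from the element table are silently skipped.
    """
    symbols = set(SYMBOL_RE.findall(formula))
    return sorted(z_by_symbol[s] for s in symbols if s in z_by_symbol)


labels = compounds.map(element_numbers)
print(pd.DataFrame({"compound": compounds, "elements": labels}))
```

Each compound then carries a list of atomic numbers that can be used as a group label, or exploded with `labels.explode()` to get one row per (compound, element) pair.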

Adding Greek symbol and superscript to ggplot axis text (tick marks)

I am trying to get the stable oxygen isotope symbol (δ¹⁸O) into the axis text (tick mark labels) in ggplot. Example data: df <- data.frame(author = c("one", "two", "three"), d18O = c("D", "D", "U"), Mg = c("I", "D", "D"), `Drip Rate` = c("U", "I", "I")) %>% pivot_longer(-c(author)) Example plot: df %>% ggplot(aes(x = name, fill =…

Identifying the categorical columns of a dataframe

I am trying to identify the categorical columns of a dataset so that I can convert them to numerical columns. I have looked at this, this, and this, among others, but I still seem to be doing something wrong. EDIT: My code: import pandas as pd from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split from…
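A minimal sketch, assuming "categorical" means every non-numeric column. The toy frame and its column names are hypothetical. `select_dtypes(exclude="number")` picks up object, string and `category` columns alike, and `pd.get_dummies` one-hot encodes just those columns so a model such as `LogisticRegression` receives numeric input.

```python
import pandas as pd

# Hypothetical toy frame standing in for the question's dataset.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Oslo", "Lima", "Oslo"],
    "plan": pd.Categorical(["basic", "pro", "basic"]),
    "income": [3.1, 4.5, 5.2],
})

# Every column whose dtype is not numeric is treated as categorical.
# Note: datetime or bool columns would also land here if the frame had any.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
print(cat_cols)  # ['city', 'plan']

# One-hot encode only the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=cat_cols)
print(encoded.columns.tolist())
```

Excluding numeric dtypes, rather than including `"object"`, also catches columns stored with pandas' dedicated string dtype.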

Change certain categorical variables to a unified entry

Let’s say I have a dataframe with a column called animals. The entries look as follows: ‘A’, ‘A’, ‘A’, ‘B’, ‘B’, ‘B’, ‘C’, ‘C’, ‘E’, ‘F’, ‘G’, ‘H’, ‘I’. I want to change the entries ‘E’, ‘F’, ‘G’, ‘H’ and ‘I’ to a single unified entry called ‘D’. What is the best way to transform…
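One common pandas approach, sketched here with the question's thirteen entries rebuilt as a hypothetical frame, is to build a boolean mask with `isin` and assign the new label through `.loc`:

```python
import pandas as pd

# The question's entries: A A A B B B C C E F G H I
df = pd.DataFrame({"animals": list("AAABBBCCEFGHI")})

# Collapse the listed labels into a single "D" bucket; all other rows are untouched.
to_merge = ["E", "F", "G", "H", "I"]
df.loc[df["animals"].isin(to_merge), "animals"] = "D"

print(df["animals"].value_counts().to_dict())
```

`df["animals"].replace(to_merge, "D")` gives the same result in a single expression; the mask version is handy when the same condition is reused elsewhere.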

Grouping a pandas dataframe with categorical strings

I have the following df: df = pd.DataFrame({'Cat':['tq','tb','ta','tb','ta','tq','tb','tq','ta'], 'col1':['a','a','a','b','b','c','c','c','a'], 'col2':['aa','aa','aa','aa','ba','ba','cc','cc','cc'], 'val':np.random.rand(9)}) I would like to create the following ranking: df['Cat'] = pd.Categorical(df['Cat'],['tb','tq','ta']) However, when I try to do a groupby sum: df2 = df.groupby(['col1','Cat','col2'])['val'].sum() I end up with a 27-row table instead of the desired 8 rows that would occur were I to…
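The 27 rows come from `observed=False`: when a grouper is categorical, pandas crosses every category with every value of the other keys (3 × 3 × 3 here). A minimal sketch using the question's rows follows; it assumes a seeded generator in place of `np.random.rand(9)` so the result is repeatable. Passing `observed=True` keeps only the combinations that actually occur, while the `Cat` level still follows the `tb`, `tq`, `ta` order. With exactly these nine rows every (col1, Cat, col2) combination happens to be distinct, so nine rows come back.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for np.random.rand(9)
df = pd.DataFrame({
    "Cat": ["tq", "tb", "ta", "tb", "ta", "tq", "tb", "tq", "ta"],
    "col1": ["a", "a", "a", "b", "b", "c", "c", "c", "a"],
    "col2": ["aa", "aa", "aa", "aa", "ba", "ba", "cc", "cc", "cc"],
    "val": rng.random(9),
})
df["Cat"] = pd.Categorical(df["Cat"], categories=["tb", "tq", "ta"])

# observed=False: every category is crossed with every col1/col2 value (3*3*3 rows).
full = df.groupby(["col1", "Cat", "col2"], observed=False)["val"].sum()

# observed=True: only combinations present in the data are kept.
df2 = df.groupby(["col1", "Cat", "col2"], observed=True)["val"].sum()
print(len(full), len(df2))
```

Passing `observed` explicitly also avoids relying on its default, which has changed across pandas versions.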