Efficient way of removing leading and trailing whitespace around dashes in text strings and then splitting the text into multiple pandas columns

Suppose the pandas dataframe contains the following:

    import pandas as pd
    df = pd.DataFrame({'text': ['ABC - XYZ- Some Text',
                                'DEF- XYZ -sometext',
                                'GHI -XYZ - sometext',
                                'JKL-XYZ- sometext',
                                'MNO1- XYZ- some text',
                                'MNO2 - XYZ - some text',
                                'MNO3 - XYZ-some text',
                                'MNO4-XYZ -some text',
                                'MNO5- XYZ-sometext -someother text',
                                'MNO6 -XYZ -sometext-someother text']})

All I…
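One possible approach, sketched here on a few of the rows above: normalize the separators with a regular expression, then split. The excerpt is cut off before the desired output, so the three-column layout, the `n=2` limit, and the column names `code`, `group`, and `description` are all assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'text': [
    'ABC - XYZ- Some Text',
    'DEF- XYZ -sometext',
    'JKL-XYZ- sometext',
    'MNO5- XYZ-sometext -someother text',
    'MNO6 -XYZ -sometext-someother text',
]})

# Collapse optional whitespace around each hyphen into a bare hyphen.
# The character class also matches an en dash, in case the text contains one.
cleaned = df['text'].str.replace(r'\s*[-–]\s*', '-', regex=True)

# Assumption: three target columns. n=2 stops after two splits, so any
# later hyphens stay inside the last column.
parts = cleaned.str.split('-', n=2, expand=True)
parts.columns = ['code', 'group', 'description']  # hypothetical names
```

Both steps are vectorized string operations, so no Python-level loop over rows is needed. If every hyphen should start a new column instead, dropping `n=2` would be the natural variation.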

Split multiple columns into multiple columns, pandas

I have a dataframe

    df = pd.DataFrame({'≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
                       '9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
                       '14–15': {1: '290 4.4', 2: nan, 3: '473 3.6'},
                       '16–17': {1: '194 2.9', 2: nan, 3: '252 1.9'},
                       '18–20': {1: '185 2.8', 2: nan, 3: '184 1.4'},
                       '≥21': {1:…
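A minimal sketch of one way to do this, using only the first three columns (the remaining data is cut off in the excerpt). Each cell is assumed to hold two space-separated numbers; the sub-column names `count` and `pct` are hypothetical, since the desired output is not shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    '≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
    '9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
    '14–15': {1: '290 4.4', 2: np.nan, 3: '473 3.6'},
})

# Split every column on whitespace into two sub-columns.
# Missing cells stay NaN in both halves.
parts = {
    col: df[col].str.split(expand=True).set_axis(['count', 'pct'], axis=1)
    for col in df.columns
}

# Concatenating a dict side by side yields MultiIndex columns:
# (original column, sub-column).
wide = pd.concat(parts, axis=1).astype(float)
```

A single sub-column can then be selected with a tuple, for example `wide[('≤8', 'pct')]`.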

Get the most frequent value of several variables

I am trying to get the most frequent value for each variable in a dataset in Python. For example, I want to know the most frequent preferred color for a person per city.

    data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'John', 'Bettany', 'Leo', 'Aubrie', 'Martha', 'Grant'],
            'Age': [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
            'Prefered color': ['green', 'green', 'red', 'blue',…
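The data above is truncated, so the sketch below uses a small made-up frame to show the general idea: group by city and take the mode of every other column. The `City` column name and all values here are hypothetical and are not taken from the question.

```python
import pandas as pd

# Hypothetical illustration data; not the dataset from the question.
df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Paris', 'Rome', 'Rome', 'Rome'],
    'Prefered color': ['green', 'green', 'red', 'blue', 'red', 'blue'],
    'Age': [20, 21, 20, 18, 24, 18],
})

# Series.mode() returns all tied modes in sorted order;
# .iloc[0] keeps the first one, which is an arbitrary tie-break.
most_frequent = df.groupby('City').agg(lambda s: s.mode().iloc[0])
```

If ties matter, returning the full `s.mode()` result per group instead of its first element would be one way to keep them.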

Pandas groupby filter only last two rows

I am working on pandas manipulation and want to select only the last two rows of column "B" within each group. How can I do this without reset_index and filter (that is, inside the groupby)?

    import pandas as pd
    df = pd.DataFrame({
        'A': list('aaabbbbcccc'),
        'B': [0,1,2,5,7,2,1,4,1,0,2],
        'V': range(10,120,10)
    })
    df

My attempt:

    df.groupby(['A','B'])['V'].sum()

Required output:

    A B a 1 20 2…
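One reading of the truncated required output is "after summing, keep the last two `B` entries within each `A`". Under that assumption, a second `groupby` on the `A` index level followed by `tail(2)` is a possible sketch; it avoids both `reset_index` and `filter`, though it does not happen inside the first groupby.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# Sum V per (A, B); the result is sorted by the group keys.
summed = df.groupby(['A', 'B'])['V'].sum()

# Assumption: "last two rows" means the two largest B values per A.
last_two = summed.groupby(level='A').tail(2)
```

For group `a` this keeps the entries for `B` = 1 and `B` = 2 with values 20 and 30, which agrees with the visible part of the required output.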

How to find and calculate common letters between words in pandas

I have a dataset with some words in it, and I want to compare two columns and count the letters they have in common. For example, I have:

    data = {'Col_1': ['Heaven', 'Jako', 'Sm', 'apizza'],
            'Col_2': ['Heaven', 'Jakob', 'Smart', 'pizza']}
    df = pd.DataFrame(data)

    | Col_1  | Col_2  |
    |--------|--------|
    | Heaven | Heaven |
    |…
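The expected result is cut off, so "common letters" is interpreted here as a multiset intersection: a letter that appears twice in both words counts twice. That interpretation, the case-sensitive comparison, and the `common` column name are assumptions. A minimal sketch with `collections.Counter`:

```python
from collections import Counter

import pandas as pd

data = {'Col_1': ['Heaven', 'Jako', 'Sm', 'apizza'],
        'Col_2': ['Heaven', 'Jakob', 'Smart', 'pizza']}
df = pd.DataFrame(data)

def common_letters(a: str, b: str) -> int:
    """Count letters shared by a and b, respecting multiplicity."""
    # Counter & Counter keeps the minimum count of each shared key.
    return sum((Counter(a) & Counter(b)).values())

# 'common' is a hypothetical column name.
df['common'] = [common_letters(a, b) for a, b in zip(df['Col_1'], df['Col_2'])]
```

To ignore case, both words could be lowercased before counting; to count each distinct letter once, `len(set(a) & set(b))` would be the alternative.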

IOB format merge

I have a dataframe in IOB format as below:

    Name     Label
    Alan     B-PERSON
    Smith    I-PERSON
    is       O
    Alice's  B-PERSON
    uncle    O
    from     O
    New      B-LOCATION
    York     I-LOCATION
    city     I-LOCATION

I would like to convert it into a new dataframe as below:

    Name           Label
    Alan Smith     PERSON
    Alice's        PERSON
    New York city  LOCATION

Any help is much…
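A possible sketch of the conversion: drop the `O` rows, start a new entity at every `B-` tag, and join the tokens of each entity. It assumes well-formed tags, meaning every `I-` row follows a `B-` or `I-` row of the same entity; the helper column `group` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alan', 'Smith', 'is', "Alice's", 'uncle', 'from',
             'New', 'York', 'city'],
    'Label': ['B-PERSON', 'I-PERSON', 'O', 'B-PERSON', 'O', 'O',
              'B-LOCATION', 'I-LOCATION', 'I-LOCATION'],
})

# Keep only tokens that belong to an entity.
ents = df[df['Label'] != 'O'].copy()

# Each B- tag opens a new entity; the running sum gives an entity id.
ents['group'] = ents['Label'].str.startswith('B-').cumsum()

# Strip the B-/I- prefix so only the entity type remains.
ents['Label'] = ents['Label'].str.replace(r'^[BI]-', '', regex=True)

merged = (
    ents.groupby('group')
        .agg(Name=('Name', ' '.join), Label=('Label', 'first'))
        .reset_index(drop=True)
)
```

On the sample above, `merged` contains the three rows shown in the desired output.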