Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I sort a dataframe column alphabetically starting with the letter "l"?

I have a dataframe that I would like to sort alphabetically beginning with the letter "l" (rather than "a").

Here’s my dataframe:

import pandas as pd

data = [['C:/folder/!!file this', 15], ['C:/folder/apple', 14], ['C:/folder/Land file', 10]]

df = pd.DataFrame(data, columns=['Doc', 'Size'])

Here’s what I want my dataframe to look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data = [['C:/folder/Land file', 10], ['C:/folder/!!file this', 15], ['C:/folder/apple', 14]]

df = pd.DataFrame(data, columns=['Doc', 'Size'])

Here’s what I have so far:

alphabet = """lmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijk!"#$%_'()*+,-./:;<=>?@[\]^&`{|}~"""
    
df = df.sort_values(by=['Doc'], key=lambda x: [
        alphabet.index(c) for c in x[0]])

I get the error code ValueError: substring not found.

I also tried the following, but it doesn’t change the order in the dataset:

def split(word):
    return list(word)


mylist = split(
    """lmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijk!"#$%_'()*+,-./:;<=>?@[\]^&`{|}~""")


alphabetical = pd.Categorical(mylist,
                              ordered=True)

df = df.sort_index(level=alphabetical)
print(df)

>Solution :

The exact logic is unclear, but one option is to use a translation table to reorder the priority of the characters:

# original priority
alphabet  = """!abcdefghijklmnopqrstuvwxyz"""
# new priority
alphabet2 = """lmnopqrstuvwxyz!abcdefghijk"""

t = str.maketrans(alphabet2, alphabet)

df.sort_values('Doc', key=lambda s: s.str.lower().str.translate(t))

Output:

                     Doc  Size
2    C:/folder/Land file    10
0  C:/folder/!!file this    15
1        C:/folder/apple    14

Intermediate translation:

0    r:/uc!stf/ooux!t hwxg
1          r:/uc!stf/pdd!t
2      r:/uc!stf/!pbs ux!t
Name: Doc, dtype: object

If you only want to consider the first letter of the file name as special and keep normal order of the letter for the rest, use numpy.lexsort:

order = np.lexsort([df['Doc'], ~df['Doc'].str.extract('/([^/]+)$', expand=False).str[0].isin(['l', 'L'])])

out = df.iloc[order]

Output:

                     Doc  Size
2    C:/folder/Land file    10
0  C:/folder/!!file this    15
1        C:/folder/apple    14
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading