Home How would you go about finding longest string per row in a data frame?

Questions

How would you go about finding longest string per row in a data frame?

November 23, 2022

I am writing a piece of code which allows me to open a CSV file and remove nan rows and also find strings that are too long in the data frame. I want the program to say what row the length of data exceeds the 30-character limit and give you an option to exit or skip.

I previously had it set up so it would go by columns instead, however im finding it difficult to locate the string when its set up like this.

for column in df:
        print(column,"->", df[column].astype(str).str.len().max())
        if df[column].astype(str).str.len().max() > 30 and column != ('Column 17'):
            print ("ERROR: Length of data exceeds 30 character limit")
            abill=int(input("1.Continue through file.\n2.Exit\n"))
            if abill==1:
                continue
            else:
                sys.exit()
        else:
            continue

This is my code at the moment.

>Solution :

I would recommend not to use a loop, but rather to vectorize.

So, you want to identify the strings longer than a threshold, except for excluded columns?

Assuming this example:

        col1        col2              col3
0        abc           A  this_is_excluded
1  defghijkl       BCDEF          excluded
2       mnop  GHIJKLMNOP          excluded

If you want to mask the long strings:

exclude = ['col3']  # do not consider the columns in this list
threshold = 9       # consider strings longer or equal to threshold

mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
          .reindex(columns=df.columns, fill_value=False)
       )

out = df.mask(mask, '') # mask with empty string

Output:

   col1   col2              col3
0   abc      A  this_is_excluded
1        BCDEF          excluded
2  mnop                 excluded

If you want to drop the rows with long strings:

exclude = ['col3']
threshold = 9

mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )

out = df.loc[~mask.any(axis=1)]

Output:

  col1 col2              col3
0  abc    A  this_is_excluded

If you want to drop the columns with at least one too long string:

exclude = ['col3']
threshold = 9

mask = (df.drop(columns=exclude, errors='ignore')
          .agg(lambda s: s.str.len().ge(threshold).any())
       )

out = df.drop(columns=mask[mask].index)

Output:

               col3
0  this_is_excluded
1          excluded
2          excluded

dataframe

byMR

Published November 23, 2022

Add a comment

The space between the items in the Bottom Navigation Bar

byMR

November 23, 2022

Questions

Cannot initialise wifi_config.ap.ssid with const char * ssid

byMR

November 23, 2022

Questions

I need to get the id for updating a field in angular

byMR

November 23, 2022

Questions

Why I got A RenderFlex overflowed by 8.0 pixels on the right With flutter?

byMR

November 23, 2022

Questions

While loop with logical operators – stuck in loop

byMR

November 23, 2022

Questions

How can you group a data frame and reshape from long to wide?

byMR

November 23, 2022

How would you go about finding longest string per row in a data frame?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

The space between the items in the Bottom Navigation Bar

Cannot initialise wifi_config.ap.ssid with const char * ssid

I need to get the id for updating a field in angular

Why I got A RenderFlex overflowed by 8.0 pixels on the right With flutter?

While loop with logical operators – stuck in loop

How can you group a data frame and reshape from long to wide?

Keep Up to Date with the Most Important News

How would you go about finding longest string per row in a data frame?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

The space between the items in the Bottom Navigation Bar

Cannot initialise wifi_config.ap.ssid with const char * ssid

I need to get the id for updating a field in angular

Why I got A RenderFlex overflowed by 8.0 pixels on the right With flutter?

While loop with logical operators – stuck in loop

How can you group a data frame and reshape from long to wide?

Discover more from Dev solutions