How to extract user IDs from URLs inside Pandas in Python?

The data frame has 1,050,000 rows.

Input: (a pandas dataframe column)

UserImage
    https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo
    https://play-lh.googleusercontent.com/EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g
    https://play-lh.googleusercontent.com/a-/AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E

Output:


UserIDs
AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ
EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g
AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E

>Solution:

This looks like a perfect use case for a regex: capture the last path segment of each URL, dropping any trailing suffix such as =mo:

df['UserIDs'] = df['UserImage'].str.extract(r'^.*/([^/=]+)[^/]*$')

Or, if you want to keep only alphanumeric characters, underscores, and hyphens:

df['UserIDs'] = df['UserImage'].str.extract(r'^.*/([-\w]+)[^/]*$')

Note the raw strings (r'...'), which avoid Python warnings about escape sequences like \w.
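To see how the pattern behaves on a single URL before applying it to the whole column, here is a small sketch using the standard-library re module on one of the question's sample URLs:

```python
import re

# Same pattern as the pandas extract above
pattern = re.compile(r"^.*/([-\w]+)[^/]*$")

url = "https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo"
m = pattern.match(url)

# The greedy ^.*/ consumes everything up to the last "/",
# [-\w]+ captures the ID (word characters and hyphens),
# and [^/]* swallows the trailing "=mo" suffix.
print(m.group(1))  # AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ
```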

Output:

                                           UserImage  \
0  https://play-lh.googleusercontent.com/a/AItbvm...   
1  https://play-lh.googleusercontent.com/EGemoI2N...   
2  https://play-lh.googleusercontent.com/a-/AFdZu...   

                                             UserIDs  
0     AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ  
1  EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQ...  
2       AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E  
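Putting it together, a self-contained sketch that builds a small data frame from the question's three sample URLs and runs the extraction:

```python
import pandas as pd

# Sample data mirroring the question's UserImage column
df = pd.DataFrame({
    "UserImage": [
        "https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo",
        "https://play-lh.googleusercontent.com/EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g",
        "https://play-lh.googleusercontent.com/a-/AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E",
    ]
})

# Capture the last path segment, keeping only word characters and hyphens
df["UserIDs"] = df["UserImage"].str.extract(r"^.*/([-\w]+)[^/]*$")

print(df["UserIDs"].tolist())
```

On the full 1,050,000-row frame the same one-liner applies unchanged; Series.str.extract is vectorized, so no explicit loop is needed.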

