Follow

Follow

Contact

Home Converting a List into a formated pandas Dataframe

Questions

Converting a List into a formated pandas Dataframe

byMR

August 4, 2023

I am trying to scrape the top 100 cities in the US from the following website:

**https://www.nationalpopularvote.com/100-biggest-cities-have-59849899-people-and-rural-areas-have-59492267-people
**

I have gotten the data from the website and converted into a list:

#Note this is not the full list because the full list is too long
List = ['City   Population', '1   New York, NY   8,175,133   \xa0Biggest city is 2.6% of U.S. population', '2   Los Angeles, CA   3,792,621   \xa0Top 2 cities are 3.8% of U.S. population', '3   Chicago, IL   2,695,598   \xa0Top 3 cities are 4.7% of U.S. population]

I am trying to convert that into an organized dataframe.

I have already tried this:

df = pd.DataFrame(List)

print(df)

I wish this could have been that simple but it returns the following:

    0                                    City   Population
    1    1   New York, NY   8,175,133    Biggest city i...
    2    2   Los Angeles, CA   3,792,621    Top 2 citie...
    3    3   Chicago, IL   2,695,598    Top 3 cities ar...
    4    4   Houston, TX   2,099,451    Top 4 cities ar...
    ..                                                 ...
    97   97   Birmingham, AL   212,237
    98   98   Rochester, NY   210,565
    99   99   San Bernadino, CA   209,924
    100  100   Spokane, WA   208,916   Top 100 cities a...
    101                                 Total   59,849,899
[102 rows x 1 columns]

The problem is, its not actually ‘orginized’; I cant do print(df['City']). I want this:

    0   RANK CITY           POPULATION                    
    1    1   New York, NY   8,175,133    
    2    2   Los Angeles, CA   3,792,621    
    3    3   Chicago, IL   2,695,598    
    4    4   Houston, TX   2,099,451    
    ...............................                                    
    97   97   Birmingham, AL   212,237
    98   98   Rochester, NY   210,565
    99   99   San Bernadino, CA   209,924
    100  100   Spokane, WA   208,916  

[101 rows x 3 columns]

Can someone help me with this?

>Solution :

You can do this in a one-liner:

Use pd.read_html with the header and index_col parameters both set to 0. The result will be a list of dfs. In this case with only one df, so we select the first element ([0]).
Use df.reset_index to reset the index, since the original index values will have been turned into floats on account of the NaN value in the final row.

import pandas as pd

url = 'https://www.nationalpopularvote.com/100-biggest-cities-have-59849899-people-and-rural-areas-have-59492267-people'

df = pd.read_html(url, header=0, index_col=0)[0].reset_index(drop=True)

df.head()

               City  Population                                Unnamed: 3
0      New York, NY     8175133   Biggest city is 2.6% of U.S. population
1   Los Angeles, CA     3792621  Top 2 cities are 3.8% of U.S. population
2       Chicago, IL     2695598  Top 3 cities are 4.7% of U.S. population
3       Houston, TX     2099451  Top 4 cities are 5.4% of U.S. population
4  Philadelphia, PA     1526006  Top 5 cities are 5.9% of U.S. population

# use `df.rename` to change the name of the 3rd (nameless) column)
df = df.rename(columns={'Unnamed: 3': 'Comment'})

python-requests

byMR

Published August 04, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

Fix range of matplotlib histogram with multiple datasets?

byMR

August 4, 2023

Questions

How can I (compliantly) read from a tmpfile()?

byMR

August 4, 2023

Questions

Random elements showing up on the array in c

byMR

August 4, 2023

Questions

An example of how pytorch clip_grad_norm_ works

byMR

August 4, 2023

Questions

Subtyping in Java results in unexpected output

byMR

August 4, 2023

Questions

Tried to export file as PDF but had error Named argument not found excel VBA

byMR

August 4, 2023