Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to update multiple columns of given dataframe with new corresponding scraped data?

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I’ve inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This line finds the AAPL sales and income information in the snapshot table

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I’m hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
    import pandas as pd
    import requests
    import bs4
    import time
    import random
    
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    
    def testDf(version):
        url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'
        
        page = 1
    
        screen = requests.get(url.format(version=version, page=page), headers=headers)
        soup = bs4.BeautifulSoup(screen.text, features='lxml')
        pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)
    
        data = []
        for page in range(1, 1 * pages, 20):
            print(version, page)
            screen = requests.get(url.format(version=version, page=page), headers=headers).text
            tables = pd.read_html(screen)
            tables = tables[-2]
            tables.columns = tables.iloc[0]
            tables = tables[1:]
            data.append(tables)
            time.sleep(random.random())
        return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)
    
    
    df = testDf('152').copy()
    df.insert(2,'Sales',"")
    df.insert(3,'Income',"")
    
    # The newly converted dataframe is in the df variable
    print(df) 

Note: I have inserted Sales and Income Fields for later on.

First Output

         No. Ticker   Sales  Income  ...     P/E    Price  Change    Volume
        0    1   AAPL                ...   30.71   172.17   0.10%  84987917
        1    2   MSFT                ...   35.10   314.04   0.05%  32609473
        2    3   TSLA                ...  333.21  1026.96  -3.54%  27835827
        3    4     FB                ...   23.72   331.79  -0.20%  1446342
    [4 rows x 13 columns]

        
        

Additional Code

    ticker = (df.iloc[0,1])
    #---------- Pulling the data from chosen stock ticker ----------#
    
    url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())
    
    req = requests.get(url,headers=headers)
    table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
    df = table[0]
    
    print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

            No. Ticker      Sales  Income  ...     P/E    Price  Change    Volume
            0    1   AAPL                  ...   30.71   172.17   0.10%  84987917
            1    2   MSFT                  ...   35.10   314.04   0.05%  32609473
            2    3   TSLA                  ...  333.21  1026.96  -3.54%  27835827
            3    4     FB                  ...   23.72   331.79  -0.20%  1446342
        [4 rows x 13 columns]
    
        Sales: 365.82B: Income: 94.68B
            

Desired Output

            No. Ticker   Sales  Income   ...     P/E    Price  Change    Volume
            0    1   AAPL 365.82B 94.68M  ...   30.71   172.17   0.10%  84987917
            1    2   MSFT                 ...   35.10   314.04   0.05%  32609473
            2    3   TSLA                 ...  333.21  1026.96  -3.54%  27835827
            3    4     FB                 ...   23.72   331.79  -0.20%  1446342
        [4 rows x 13 columns]
        

>Solution :

How to achieve?

Put your additional code into a def and use zip() in combination with map() to update your columns:

df['Sales'],df['Income'] = zip(*df['Ticker'].map(lambda x: get_ticker(x)))

Example

def get_ticker(ticker):
    #---------- Pulling the data from chosen stock ticker ----------#
    
    url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())
    
    req = requests.get(url,headers=headers)
    table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
    df = table[0]
    return(df[1][3],df[1][2])


df['Sales'],df['Income'] = zip(*df['Ticker'].map(lambda x: get_ticker(x)))

Output

No. Ticker Sales Income Company Sector Industry Country Market Cap P/E Price Change Volume
1 AAPL 365.82B 94.68B Apple Inc. Technology Consumer Electronics USA 2821.90B 30.71 172.17 0.10% 84987917
2 MSFT 176.25B 67.88B Microsoft Corporation Technology Software – Infrastructure USA 2356.61B 35.10 314.04 0.05% 32609473
3 TSLA 46.85B 3.47B Tesla, Inc. Consumer Cyclical Auto Manufacturers USA 1031.33B 333.21 1026.96 -3.54% 27835827
4 FB 112.33B 40.30B Meta Platforms, Inc. Communication Services Internet Content & Information USA 924.82B 23.72 331.79 -0.20% 14463421
5 BRK-B 268.68B Berkshire Hathaway Inc. Financial Insurance – Diversified USA 777.07B 319.78 2.09% 5540310
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading