Home How to update multiple columns of given dataframe with new corresponding scraped data?

Questions

How to update multiple columns of given dataframe with new corresponding scraped data?

January 9, 2022

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I’ve inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

This line finds the AAPL sales and income information in the snapshot table

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I’m hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
    import pandas as pd
    import requests
    import bs4
    import time
    import random
    
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    
    def testDf(version):
        url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'
        
        page = 1
    
        screen = requests.get(url.format(version=version, page=page), headers=headers)
        soup = bs4.BeautifulSoup(screen.text, features='lxml')
        pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)
    
        data = []
        for page in range(1, 1 * pages, 20):
            print(version, page)
            screen = requests.get(url.format(version=version, page=page), headers=headers).text
            tables = pd.read_html(screen)
            tables = tables[-2]
            tables.columns = tables.iloc[0]
            tables = tables[1:]
            data.append(tables)
            time.sleep(random.random())
        return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)
    
    
    df = testDf('152').copy()
    df.insert(2,'Sales',"")
    df.insert(3,'Income',"")
    
    # The newly converted dataframe is in the df variable
    print(df)

Note: I have inserted Sales and Income Fields for later on.

First Output

         No. Ticker   Sales  Income  ...     P/E    Price  Change    Volume
        0    1   AAPL                ...   30.71   172.17   0.10%  84987917
        1    2   MSFT                ...   35.10   314.04   0.05%  32609473
        2    3   TSLA                ...  333.21  1026.96  -3.54%  27835827
        3    4     FB                ...   23.72   331.79  -0.20%  1446342
    [4 rows x 13 columns]

Additional Code

    ticker = (df.iloc[0,1])
    #---------- Pulling the data from chosen stock ticker ----------#
    
    url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())
    
    req = requests.get(url,headers=headers)
    table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
    df = table[0]
    
    print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

            No. Ticker      Sales  Income  ...     P/E    Price  Change    Volume
            0    1   AAPL                  ...   30.71   172.17   0.10%  84987917
            1    2   MSFT                  ...   35.10   314.04   0.05%  32609473
            2    3   TSLA                  ...  333.21  1026.96  -3.54%  27835827
            3    4     FB                  ...   23.72   331.79  -0.20%  1446342
        [4 rows x 13 columns]
    
        Sales: 365.82B: Income: 94.68B

Desired Output

            No. Ticker   Sales  Income   ...     P/E    Price  Change    Volume
            0    1   AAPL 365.82B 94.68M  ...   30.71   172.17   0.10%  84987917
            1    2   MSFT                 ...   35.10   314.04   0.05%  32609473
            2    3   TSLA                 ...  333.21  1026.96  -3.54%  27835827
            3    4     FB                 ...   23.72   331.79  -0.20%  1446342
        [4 rows x 13 columns]

>Solution :

How to achieve?

Put your additional code into a def and use zip() in combination with map() to update your columns:

df['Sales'],df['Income'] = zip(*df['Ticker'].map(lambda x: get_ticker(x)))

Example

def get_ticker(ticker):
    #---------- Pulling the data from chosen stock ticker ----------#
    
    url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())
    
    req = requests.get(url,headers=headers)
    table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
    df = table[0]
    return(df[1][3],df[1][2])


df['Sales'],df['Income'] = zip(*df['Ticker'].map(lambda x: get_ticker(x)))

Output

No.	Ticker	Sales	Income	Company	Sector	Industry	Country	Market Cap	P/E	Price	Change	Volume
1	AAPL	365.82B	94.68B	Apple Inc.	Technology	Consumer Electronics	USA	2821.90B	30.71	172.17	0.10%	84987917
2	MSFT	176.25B	67.88B	Microsoft Corporation	Technology	Software – Infrastructure	USA	2356.61B	35.10	314.04	0.05%	32609473
3	TSLA	46.85B	3.47B	Tesla, Inc.	Consumer Cyclical	Auto Manufacturers	USA	1031.33B	333.21	1026.96	-3.54%	27835827
4	FB	112.33B	40.30B	Meta Platforms, Inc.	Communication Services	Internet Content & Information	USA	924.82B	23.72	331.79	-0.20%	14463421
5	BRK-B	268.68B	–	Berkshire Hathaway Inc.	Financial	Insurance – Diversified	USA	777.07B	–	319.78	2.09%	5540310