I chose to analyze and measure three different risk categories that would be found within S&P 500 firms.
I used textual analysis to measure these risks. I first downloaded the annual reports (10K’s) of all of the current S&P 500 companies (this is done in download_text_files.ipynb). Then, I extracted the text from each 10K report and used regular expressions to count how many times certain words appeared next to eachother within the text. With a higher the count, that would imply that the given risk would be higher for that particular firm. (this is done in measure_risk.ipynb)
For example, in measuring the Supply Dependence Risk, I used the following regular expression:
(['(supply|supplies|supplier|logistics|labor)','(risk|harm|adverse|negative|cost|suffer)'],15)This expression finds words such as “supply” and “labor” and if they are near any of the words such as “risk” and “adverse” (within 15 words), then that would add one to the total risk exposure of that firm.
The following expressions were used in determining the other risk factors:
Risk Factor | Word group 1 | Word group 2 | Max num words between groups |
---|---|---|---|
Competitor Based Risk | competitor,competition,competitive,position,customer | risk,harm,factor,adverse | 15 |
Regulatory/Political Based Risk (1) | government,regulation,FDA,law,politic,compliance | risk,harm,cost,time,liability,adverse | 15 |
Regulatory/Political Based Risk (2) | government,regulation,FDA,law,politic,compliance | risk,harm,cost,time,liability,adverse | 7 |
Regulatory/Political Based Risk (3) | regulat,SEC,politic,international,tax | harm,negativ,adverse,decreas,susceptible | 15 |
Supply Dependence Risk
Supply chain was a large issue at the beginning of the covid outbreak. There are several firms that are more reliant on the supply of materials, labor, and other factors. I chose to measure this risk specifically because I wanted to see if there would be data to support my hypothesis that there would be lower returns for firms that depend on supply chain during the start of a pandemic-like outbreak.
Competitor Based Risk
Almost all firms have competition, but some face more competition than others. I chose to measure this type of risk because I was interested in learning how competition within different industries and firms affected returns during economic downturn.
Regulatory/Political Risk
Regulatory issues is something most firms have to deal with, and come covid came an onslaught of new regulations such as mask mandates and policy controls. I was interested in capturing correlations that could suggest that firms with a higher exposure to this risk were negatively affected regarding their returns when covid started.
The following code will output a summary of statistical properties regarding the risk measurements.
#Load output/sp500_accting_plus_textrisks.csv file
import seaborn as sns
import pandas as pd
from statsmodels.formula.api import ols as sm_ols
sample_csv = "output/sp500_accting_plus_textrisks.csv"
sample_firms_risks = pd.read_csv(sample_csv)
sample_firms_risks = sample_firms_risks.drop(['Unnamed: 0'], axis=1)
sample_firms_risks = sample_firms_risks.rename(columns={
'Regulatory and Political-based Risk Exposure': 'Regulatory Risk 1',
'Regulatory and Political-based(2) Risk Exposure': 'Regulatory Risk 2',
'Regulatory and Political-based(3) Risk Exposure': 'Regulatory Risk 3',
'Competitor-based Risk Exposure': 'Competitor Risk',
'Supply Dependence-based Risk Exposure': 'Supply Risk'})
#Describe statistical data of the risks
sample_firms_risks[['Regulatory Risk 1',
'Regulatory Risk 2',
'Regulatory Risk 3',
'Competitor Risk',
'Supply Risk']].describe()
Regulatory Risk 1 | Regulatory Risk 2 | Regulatory Risk 3 | Competitor Risk | Supply Risk | |
---|---|---|---|---|---|
count | 492.000000 | 492.000000 | 492.000000 | 492.000000 | 492.000000 |
mean | 10.932927 | 5.093496 | 2.097561 | 6.843496 | 4.034553 |
std | 9.564529 | 5.839974 | 2.103879 | 5.361656 | 4.016365 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 5.000000 | 1.000000 | 1.000000 | 3.000000 | 1.000000 |
50% | 8.000000 | 3.000000 | 2.000000 | 6.000000 | 3.000000 |
75% | 13.000000 | 6.000000 | 3.000000 | 9.000000 | 6.000000 |
max | 71.000000 | 39.000000 | 15.000000 | 35.000000 | 27.000000 |
As displayed in the output above of 492 datapoints, it is clear that I have a variation of risk measurements not only within risk categories, but between risk categories. For instance, all of my risk measurements have non-zero data within at least the 25% percentile, and a decently sized range from the 25%-75% percentile. This means that I have variation within my results, which is critical in measuring the potential correlation of these risk factors in respective to the firms returns. Further, the mean output for regulatory risk 1 is significantly different from regulatory risk 2 and 3, thus demonstrating the differences within my use of risk measurement.
reg1zero = sample_firms_risks[['Regulatory Risk 1']].value_counts()[0]
reg2zero = sample_firms_risks[['Regulatory Risk 2']].value_counts()[0]
reg3zero = sample_firms_risks[['Regulatory Risk 3']].value_counts()[0]
compzero = sample_firms_risks[['Competitor Risk']].value_counts()[0]
supplyzero = sample_firms_risks[['Supply Risk']].value_counts()[0]
data = {'Firms with no values': [reg1zero, reg2zero, reg3zero, compzero, supplyzero],
'% of firms with no values': [(reg1zero/492)*100, (reg2zero/492)*100,
(reg3zero/492)*100, (compzero/492)*100, (supplyzero/492)*100]}
df = pd.DataFrame(data, index =['Regulatory Risk 1',
'Regulatory Risk 2',
'Regulatory Risk 3',
'Competitor Risk',
'Supply Risk'])
df
Firms with no values | % of firms with no values | |
---|---|---|
Regulatory Risk 1 | 5 | 1.016260 |
Regulatory Risk 2 | 58 | 11.788618 |
Regulatory Risk 3 | 117 | 23.780488 |
Competitor Risk | 14 | 2.845528 |
Supply Risk | 96 | 19.512195 |
Additionally, the above data table shows the number of firms that had a “0” value within the respective risk measurement. Considering that there are 492 firms, the amount of firms that had no values is quite low. Only 1% of firms had no “Regulatory Risk 1” and the maximum amout of “no values” was within the “Regulatory Risk 3” category with 23.8%.
The following are several examples of risk matches that resulted from my defined expressions. In these examples, it is clear that the type of risk is found within the corporation.
Regulatory/Political Risk </br>
Abbott (ABT): “Abbott is subject to numerous governmental regulations and it can be costly to comply with these regulations and to develop compliant products and processes.”
Bank of America (BAC): “Economic or geopolitical stress in one or more countries could have a negative impact regionally or globally, resulting in, among other things, market volatility.”
Boeing (BA): “We conduct a significant portion of our business pursuant to U.S. government contracts, which are subject to unique risks.”
Competitor Risk </br>
Assurant (AIZ): “Significant competitive pressures, changes in customer preferences and disruption could adversely affect our results of operations.”
Discover (DFS): “These competitive factors affect our ability to attract and retain customers, increase usage of our products and maximize the revenue generated by our products.”
Northrop Grumman(NOC) “If we are unable to attract and retain a qualified workforce, we may be unable to maintain our competitive position and our future success could be materially adversely affected.”
Supply Dependence Risk </br>
Medtronic (MDT) “A reduction or interruption in supply, and an inability to develop alternative sources for such supply, could adversely affect our ability to manufacture our products in a timely or cost-effective manner and could result in lost sales.”
Moderna (MRNA) “Failure to effectively maintain our cold-chain supply logistics, by us or third parties, has in the past and could in the future lead to additional manufacturing costs and delays in our ability to supply required quantities for clinical trials or otherwise.”
Nike (NKE) “Continued volatility in the availability and prices for commodities and raw materials we use in our products and in our supply chain (such as cotton or petroleum derivatives) could have a material adverse effect on our costs, gross margins and profitability.”
Using an arbitrary risk value, I have narrowed down the top ~10 firms who independently have the highest supply dependence, regulator/political, and competitor based risk.
Firms With High Supply Dependence Risk
high_supply_risk = sample_firms_risks.loc[sample_firms_risks['Supply Risk'] >= 16]
high_supply_risk = high_supply_risk[['Symbol', 'Security']]
high_supply_risk
Symbol | Security | |
---|---|---|
48 | APTV | Aptiv |
60 | BAX | Baxter |
128 | CTRA | Coterra |
176 | EL | Estée Lauder Companies |
180 | ES | Eversource |
181 | EXC | Exelon |
321 | MOS | Mosaic |
342 | NUE | Nucor |
415 | SO | Southern Company |
435 | TSLA | Tesla |
Firms With High Regulatory/Political Risk
high_reg_risk = sample_firms_risks.loc[sample_firms_risks['Regulatory Risk 1'] >= 40]
high_reg_risk = high_reg_risk[['Symbol', 'Security']]
high_reg_risk
Symbol | Security | |
---|---|---|
59 | BAC | Bank of America |
141 | DXCM | DexCom |
237 | HII | Huntington Ingalls Industries |
273 | LHX | L3Harris |
278 | LDOS | Leidos |
285 | LMT | Lockheed Martin |
312 | MRNA | Moderna |
378 | PTC | PTC |
419 | STT | State Street |
Firms With High Competitor Risk
high_comp_risk = sample_firms_risks.loc[sample_firms_risks['Competitor Risk'] >= 22]
high_comp_risk = high_comp_risk[['Symbol', 'Security']]
high_comp_risk
Symbol | Security | |
---|---|---|
50 | AIZ | Assurant |
59 | BAC | Bank of America |
118 | CMA | Comerica |
144 | DFS | Discover |
156 | DUK | Duke Energy |
236 | HUM | Humana |
237 | HII | Huntington Ingalls Industries |
328 | NEM | Newmont |
338 | NOC | Northrop Grumman |
368 | PNC | PNC Financial Services |
443 | TRV | Travelers |
Based off of this data, the risk measurements are likely “valid” in terms of that they capture, generally, the predisposed risk of each firm in a quantifiable nature. First, in doing simple searches within a random firm’s 10K, the areas where the firm addresses their risk regarding a specific topic are generally caught by the regular expression. It is easier to find areas where the regular expression correctly caught the risk presented than not.
Furthermore, the outputs of the top ~10 firms of each risk, in majority, make sense.
For example, the top firms with Supply Dependence Risk mainly revolved around firms that had a large role in the energy (gas, electric) or auto industry, which makes complete sense. Energy and auto companies are dependent on supply or suppliers and would it would adversely affect them if they changed suddenly.
The top firms in the Regulatory/Political Risk were within a broad category of industries including pharmaceuticals, aerospace, engineering, and defense technology. Despite this broad range, it does make sense that these specific firms/industries would have a higher regulatory risk exposure.
The top firms in the Competitor Risk were companies such as Bank of America, Discover, Duke Energy, and PNC. Despite these firms being powerhouses in their respective industries, they do face intense competition which explains their high hit rate for the competitor risk category.
The code below will first calculate the compounded weekly returns of each firm from 03/09/2020 - 03/13/2020. It will then merge this data into the risk dataset (sample_firms_risks) to create one final dataframe containing all necessary observations (risk data, accounting data, return data). This dataframe is called final_sample.
#Read in the firm_returns dataset
firm_returns = pd.read_stata("2019-2020-stock_rets cleaned.dta")
firm_returns = firm_returns[['date', 'ticker', 'ret']]
#restrict dataset to show specified dates only
firm_returns = firm_returns[(firm_returns['date'] == 20200309) |
(firm_returns['date'] == 20200310) |
(firm_returns['date'] == 20200311) |
(firm_returns['date'] == 20200312) |
(firm_returns['date'] == 20200313)]
firm_returns['ret'] = pd.to_numeric(firm_returns['ret']) #make the 'ret' column a numeric type
firm_returns = firm_returns.assign(R = (1+firm_returns['ret'])) #add new column 'R' which is simply 1 + 'ret' col
#group the dataset by ticker
firm_returns = firm_returns.groupby(['ticker'])
#compute weekly firm returns for each firm
wkly_firm_returns = firm_returns['R'].prod() - 1
#clean up DF
wkly_firm_returns = wkly_firm_returns.to_frame() #convert series into DF
wkly_firm_returns = wkly_firm_returns.rename(columns={'R': 'Weekly Return 03/09-03/13'})
wkly_firm_returns = wkly_firm_returns.reset_index()
#merge sample_firms_risks into the weekly return DF to get one final DF
final_sample = sample_firms_risks.merge(wkly_firm_returns,
left_on='Symbol',
right_on='ticker',
how='left')
final_sample = final_sample.drop(['ticker'],axis=1)
Number of Firms in Dataset
print("There are",final_sample['Regulatory Risk 1'].count(),"number of firms with risk data, and",
final_sample['Weekly Return 03/09-03/13'].count(),"firms with return data.")
There are 492 number of firms with risk data, and 488 firms with return data.
Summary, Statistics, EDA
print("The number of rows and variables within the final dataset is:",final_sample.shape,"\n")
print("The full list of variables (columns) in the dataset is:\n\n",final_sample.columns)
The number of rows and variables within the final dataset is: (492, 47)
The full list of variables (columns) in the dataset is:
Index(['Symbol', 'Security', 'Regulatory Risk 1', 'Regulatory Risk 2',
'Regulatory Risk 3', 'Competitor Risk', 'Supply Risk', 'gvkey',
'lpermno', 'datadate', 'fyear', 'sic', 'sic3', 'td', 'long_debt_dum',
'me', 'l_a', 'l_sale', 'div_d', 'age', 'atr', 'smalltaxlosscarry',
'largetaxlosscarry', 'l_emp', 'l_ppent', 'l_laborratio', 'Inv',
'Ch_Cash', 'Div', 'Ch_Debt', 'Ch_Eqty', 'Ch_WC', 'CF', 'td_a', 'td_mv',
'mb', 'prof_a', 'ppe_a', 'cash_a', 'xrd_a', 'dltt_a', 'invopps_FG09',
'sales_g', 'dv_a', 'short_debt', '_merge', 'Weekly Return 03/09-03/13'],
dtype='object')
To understand what each of the variable names mean, please visit this link:</br> https://github.com/LeDataSciFi/ledatascifi-2022/blob/main/data/ccm_variable_descriptions.csv
The following table is the summary statistics of each variable within the dataset:
final_sample.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
Regulatory Risk 1 | 492.0 | 10.932927 | 9.564529 | 0.000000 | 5.000000 | 8.000000 | 13.000000 | 7.100000e+01 |
Regulatory Risk 2 | 492.0 | 5.093496 | 5.839974 | 0.000000 | 1.000000 | 3.000000 | 6.000000 | 3.900000e+01 |
Regulatory Risk 3 | 492.0 | 2.097561 | 2.103879 | 0.000000 | 1.000000 | 2.000000 | 3.000000 | 1.500000e+01 |
Competitor Risk | 492.0 | 6.843496 | 5.361656 | 0.000000 | 3.000000 | 6.000000 | 9.000000 | 3.500000e+01 |
Supply Risk | 492.0 | 4.034553 | 4.016365 | 0.000000 | 1.000000 | 3.000000 | 6.000000 | 2.700000e+01 |
gvkey | 352.0 | 45655.275568 | 61312.551897 | 1045.000000 | 6357.250000 | 13973.000000 | 61612.250000 | 3.160560e+05 |
lpermno | 352.0 | 53572.482955 | 30231.047786 | 10104.000000 | 19474.750000 | 58246.000000 | 82644.250000 | 9.343600e+04 |
fyear | 352.0 | 2018.886364 | 0.317821 | 2018.000000 | 2019.000000 | 2019.000000 | 2019.000000 | 2.019000e+03 |
sic | 352.0 | 4330.014205 | 1949.495636 | 100.000000 | 2849.250000 | 3786.000000 | 5507.750000 | 8.742000e+03 |
sic3 | 352.0 | 432.784091 | 194.979666 | 10.000000 | 284.750000 | 378.500000 | 550.750000 | 8.740000e+02 |
td | 352.0 | 12119.868341 | 21713.135840 | 0.000000 | 1817.975000 | 5079.242500 | 12152.750000 | 1.884020e+05 |
long_debt_dum | 352.0 | 0.982955 | 0.129625 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 |
me | 352.0 | 57432.237628 | 116693.961167 | 2963.886500 | 13191.317000 | 22455.288000 | 51945.526250 | 1.023856e+06 |
l_a | 352.0 | 9.708541 | 1.229763 | 6.569794 | 8.800306 | 9.686929 | 10.557991 | 1.322070e+01 |
l_sale | 352.0 | 9.317635 | 1.256966 | 4.097822 | 8.472732 | 9.231821 | 10.018299 | 1.314555e+01 |
div_d | 352.0 | 0.744318 | 0.436865 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000e+00 |
age | 352.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 |
atr | 352.0 | 0.234018 | 0.235356 | 0.000000 | 0.125926 | 0.199817 | 0.241626 | 1.000000e+00 |
smalltaxlosscarry | 273.0 | 0.721612 | 0.449029 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000e+00 |
largetaxlosscarry | 273.0 | 0.197802 | 0.399074 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000e+00 |
l_emp | 352.0 | 3.327105 | 1.160559 | 0.455524 | 2.440164 | 3.263833 | 4.197118 | 6.025866e+00 |
l_ppent | 352.0 | 7.904878 | 1.546549 | 3.690204 | 6.795529 | 7.825421 | 9.037236 | 1.111335e+01 |
l_laborratio | 352.0 | 4.650632 | 1.317926 | 0.511044 | 3.834218 | 4.385719 | 5.330502 | 9.931146e+00 |
Inv | 352.0 | 0.054387 | 0.084964 | -0.329408 | 0.020732 | 0.048013 | 0.089258 | 4.238831e-01 |
Ch_Cash | 352.0 | 0.008847 | 0.065049 | -0.315808 | -0.007998 | 0.003960 | 0.023985 | 3.837106e-01 |
Div | 352.0 | 0.025462 | 0.027004 | 0.000000 | 0.000000 | 0.020492 | 0.037559 | 1.385936e-01 |
Ch_Debt | 352.0 | 0.014027 | 0.072364 | -0.265326 | -0.019568 | 0.000000 | 0.031572 | 4.217628e-01 |
Ch_Eqty | 352.0 | -0.042757 | 0.058526 | -0.282758 | -0.062731 | -0.023404 | -0.002746 | 1.741915e-01 |
Ch_WC | 352.0 | 0.011417 | 0.044580 | -0.252402 | -0.005327 | 0.006631 | 0.024374 | 3.726431e-01 |
CF | 352.0 | 0.123344 | 0.077283 | -0.288764 | 0.074645 | 0.113733 | 0.162095 | 3.332969e-01 |
td_a | 352.0 | 0.328435 | 0.192516 | 0.000000 | 0.204722 | 0.319227 | 0.432302 | 1.245754e+00 |
td_mv | 352.0 | 0.185137 | 0.140769 | 0.000000 | 0.092256 | 0.160139 | 0.262688 | 8.095309e-01 |
mb | 352.0 | 3.027132 | 2.091248 | 0.877849 | 1.583681 | 2.412643 | 3.655906 | 1.308288e+01 |
prof_a | 352.0 | 0.151264 | 0.074569 | -0.323828 | 0.101974 | 0.138951 | 0.186640 | 3.903839e-01 |
ppe_a | 352.0 | 0.247675 | 0.219468 | 0.009521 | 0.091756 | 0.162726 | 0.336247 | 9.285623e-01 |
cash_a | 352.0 | 0.126782 | 0.138790 | 0.002073 | 0.032281 | 0.072970 | 0.167961 | 6.946123e-01 |
xrd_a | 352.0 | 0.031364 | 0.050330 | 0.000000 | 0.000000 | 0.009533 | 0.043306 | 3.367946e-01 |
dltt_a | 352.0 | 0.295314 | 0.181393 | 0.000000 | 0.176565 | 0.283319 | 0.386684 | 1.071959e+00 |
invopps_FG09 | 331.0 | 2.702297 | 2.106053 | 0.405435 | 1.250688 | 2.160294 | 3.298536 | 1.216423e+01 |
sales_g | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
dv_a | 352.0 | 0.025462 | 0.027004 | 0.000000 | 0.000000 | 0.020492 | 0.037559 | 1.385936e-01 |
short_debt | 346.0 | 0.112720 | 0.111597 | 0.000000 | 0.026783 | 0.085531 | 0.152638 | 7.610294e-01 |
Weekly Return 03/09-03/13 | 488.0 | -0.122035 | 0.090437 | -0.610145 | -0.159448 | -0.106716 | -0.065169 | 1.177476e-01 |
The following table shows the number of unique values for the specified variables:
final_sample[['Regulatory Risk 1','Regulatory Risk 2',
'Regulatory Risk 3','Competitor Risk',
'Supply Risk','Weekly Return 03/09-03/13']].nunique()
Regulatory Risk 1 44
Regulatory Risk 2 33
Regulatory Risk 3 14
Competitor Risk 31
Supply Risk 23
Weekly Return 03/09-03/13 488
dtype: int64
1. Some of the 10K’s were not properly downloaded. </br> Some of the 10K’s that were downloaded are not actually full 10K reports and rather are a very short version of them, or a wrong document. For example the firms “A” and “V” both were not downloaded correctly. Both of these firms have “0” values for all of the risk data collected. This should not affect the overall data too much, since there are over 490 firms, however there could be other 10K reports downloaded incorrectly. </br> </br> 2. Some of the accounting data is missing. </br> There are several columns within the accounting data that are clearly wrong. For example sales_g has all ‘NaN’ data, whereas the age category is all ‘0’. This won’t affect my analysis as long as I do not use these types of data within my correlations.
This section will use the weekly returns from 03/09 - 03/13 and risk data to but will measure potential correlations between the two variables.
Measurement: Supply Dependence Risk
ret_v_supply = sns.lmplot(data=final_sample,
x='Supply Risk', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Supply Dependence Risk ",
xlabel="Num Supply Dependence Risk Hits",
ylabel="Firm Week Return 03/09-03/13")
ret_v_supply;
Regression Analysis for Supply Dependence
sm_ols('Q("Weekly Return 03/09-03/13") ~ Q("Supply Risk")',data = final_sample).fit().summary()
Dep. Variable: | Q("Weekly Return 03/09-03/13") | R-squared: | 0.012 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.009 |
Method: | Least Squares | F-statistic: | 5.657 |
Date: | Fri, 25 Mar 2022 | Prob (F-statistic): | 0.0178 |
Time: | 12:06:52 | Log-Likelihood: | 483.60 |
No. Observations: | 488 | AIC: | -963.2 |
Df Residuals: | 486 | BIC: | -954.8 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | -0.1123 | 0.006 | -19.439 | 0.000 | -0.124 | -0.101 |
Q("Supply Risk") | -0.0024 | 0.001 | -2.379 | 0.018 | -0.004 | -0.000 |
Omnibus: | 172.409 | Durbin-Watson: | 2.136 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 662.686 |
Skew: | -1.578 | Prob(JB): | 1.26e-144 |
Kurtosis: | 7.758 | Cond. No. | 8.15 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Measurement: Competitor Risk
ret_v_comp = sns.lmplot(data=final_sample,
x='Competitor Risk', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Competitor Risk ",
xlabel="Num Competitor Risk Hits",
ylabel="Firm Week Return 03/09-03/13")
ret_v_comp;
Regression Analysis for Competitor Risk
sm_ols('Q("Weekly Return 03/09-03/13") ~ Q("Competitor Risk")',data = final_sample).fit().summary()
Dep. Variable: | Q("Weekly Return 03/09-03/13") | R-squared: | 0.000 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.002 |
Method: | Least Squares | F-statistic: | 0.1401 |
Date: | Fri, 25 Mar 2022 | Prob (F-statistic): | 0.708 |
Time: | 12:06:53 | Log-Likelihood: | 480.84 |
No. Observations: | 488 | AIC: | -957.7 |
Df Residuals: | 486 | BIC: | -949.3 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | -0.1240 | 0.007 | -18.682 | 0.000 | -0.137 | -0.111 |
Q("Competitor Risk") | 0.0003 | 0.001 | 0.374 | 0.708 | -0.001 | 0.002 |
Omnibus: | 166.003 | Durbin-Watson: | 2.111 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 602.293 |
Skew: | -1.536 | Prob(JB): | 1.64e-131 |
Kurtosis: | 7.492 | Cond. No. | 14.1 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Measurement: Regulatory/Political Risk (all variations)
ret_v_reg1 = sns.lmplot(data=final_sample,
x='Regulatory Risk 1', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Regulatory Risk (1) ",
xlabel="Num Regulatory Risk (1) Hits",
ylabel="Firm Week Return 03/09-03/13")
ret_v_reg1;
ret_v_reg2 = sns.lmplot(data=final_sample,
x='Regulatory Risk 2', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Regulatory Risk (2) ",
xlabel="Num Regulatory Risk (2) Hits",
ylabel="Firm Week Return 03/09-03/13")
ret_v_reg2;
ret_v_reg3 = sns.lmplot(data=final_sample,
x='Regulatory Risk 3', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Regulatory Risk (3) ",
xlabel="Num Regulatory Risk (3) Hits",
ylabel="Firm Week Return 03/09-03/13")
ret_v_reg3;
Regression Analysis for Regulatory Risk (1)
sm_ols('Q("Weekly Return 03/09-03/13") ~ Q("Regulatory Risk 1")',data = final_sample).fit().summary()
Dep. Variable: | Q("Weekly Return 03/09-03/13") | R-squared: | 0.001 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.001 |
Method: | Least Squares | F-statistic: | 0.3067 |
Date: | Fri, 25 Mar 2022 | Prob (F-statistic): | 0.580 |
Time: | 12:06:56 | Log-Likelihood: | 480.93 |
No. Observations: | 488 | AIC: | -957.9 |
Df Residuals: | 486 | BIC: | -949.5 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | -0.1194 | 0.006 | -19.226 | 0.000 | -0.132 | -0.107 |
Q("Regulatory Risk 1") | -0.0002 | 0.000 | -0.554 | 0.580 | -0.001 | 0.001 |
Omnibus: | 168.798 | Durbin-Watson: | 2.118 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 627.751 |
Skew: | -1.555 | Prob(JB): | 4.85e-137 |
Kurtosis: | 7.605 | Cond. No. | 22.1 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Risk Data vs. Return Analysis </br> Interestingly, none of my risk factors I measured had a strong negative or positive correlation with the returns from 03/09 to 03/24. However, certain risks had slight negative correlations. The most significant one was with Supply Dependence Risk. As shown with the regression analysis there is an intercept of -0.1123 (mostly negative returns) and a slope of-0.0024. It is expected that this should be a negative slope since firms with a more dependence on supply and supply chains would be disrupted during the initial outbreak of covid. However, I did expect there to be a steeper slope.
In regards to all of the regulatory variations and competitor risk data, they showed almost no, or very slight correlations with the return data. This means that having this type of risk did not really affect the firms differently when the pandemic hit. In contrast, I expected specifically the regulatory risk to have a negative correlation compared to the weekly returns.
This section will use the same type of analysis as presented before but will measure important accounting data to returns as opposed to risk data.
Measurement: PPE as a % of Assets
ret_v_ppe_a = sns.lmplot(data=final_sample,
x='ppe_a', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. PPE as % of Assets",
xlabel="PPE as a % of Assets",
ylabel="Firm Week Return 03/09-03/13")
ret_v_ppe_a;
Measurement: Cash as a % of Assets
ret_v_cash_a = sns.lmplot(data=final_sample,
x='cash_a', y='Weekly Return 03/09-03/13').set(
title="Firm Week Return v. Cash as % of Assets",
xlabel="Cash as a % of Assets",
ylabel="Firm Week Return 03/09-03/13")
ret_v_cash_a;
Accounting Data vs. Return Analysis </br> I isolated two different accounting data fields in order to see correlations with weekly returns. The first one was PPE as a percentage of Assets. As expected, there is a negative correlation between the two. When a firm has more PPE on hand, that means the firm has more leverage and thus will be more susceptible to economic impacts (if market does poor, these firms do really poor and vise versa). In contrary, there is a positive correlation between returns and cash as a % of assets. Firms with higher cash on hand did better during the initial covid outbreak, which also can be expected.
This section will use the same type of analysis as presented before but will measure the returns on 03/24 (stimulus day) as opposed to the previous 03/09 - 03/13.
#Read in the firm_returns dataset
firm_returns2 = pd.read_stata("2019-2020-stock_rets cleaned.dta")
firm_returns2 = firm_returns2[['date', 'ticker', 'ret']]
#restrict dataset to show specified dates only
firm_returns2 = firm_returns2[(firm_returns2['date'] == 20200324)]
firm_returns2['ret'] = pd.to_numeric(firm_returns2['ret']) #make the 'ret' column a numeric type
firm_returns2 = firm_returns2.assign(R = (1+firm_returns2['ret'])) #add new column 'R' which is simply 1 + 'ret' col
#group the dataset by ticker
firm_returns2 = firm_returns2.groupby(['ticker'])
#compute weekly firm returns for each firm
wkly_firm_returns2 = firm_returns2['R'].prod() - 1
#clean up DF
wkly_firm_returns2 = wkly_firm_returns2.to_frame() #convert series into DF
wkly_firm_returns2 = wkly_firm_returns2.rename(columns={'R': 'Weekly Return 03/24'})
wkly_firm_returns2 = wkly_firm_returns2.reset_index()
#merge sample_firms_risks into the weekly return DF to get one final DF
final_sample_stimmy = sample_firms_risks.merge(wkly_firm_returns2,
left_on='Symbol',
right_on='ticker',
how='left')
final_sample_stimmy = final_sample_stimmy.drop(['ticker'],axis=1)
Measurement: Supply Dependence Risk
ret_v_supply_stimmy = sns.lmplot(data=final_sample_stimmy,
x='Supply Risk', y='Weekly Return 03/24').set(
title="Firm Week Return v. Supply Dependence Risk ",
xlabel="Num Supply Dependence Risk Hits",
ylabel="Firm Week Return 03/24")
ret_v_supply_stimmy;
Measurement: Competitor Risk
ret_v_comp_stimmy = sns.lmplot(data=final_sample_stimmy,
x='Competitor Risk', y='Weekly Return 03/24').set(
title="Firm Week Return v. Competitor Risk ",
xlabel="Num Competitor Risk Hits",
ylabel="Firm Week Return 03/24")
ret_v_comp_stimmy;
Measurement: Regulatory/Political Risk (1)
ret_v_reg1_stimmy = sns.lmplot(data=final_sample_stimmy,
x='Regulatory Risk 1', y='Weekly Return 03/24').set(
title="Firm Week Return v. Regulatory Risk (1) ",
xlabel="Num Regulatory Risk (1) Hits",
ylabel="Firm Week Return 03/24")
ret_v_reg1_stimmy;
Measurement: PPE as a % of Assets
ret_v_ppe_a_stimmy = sns.lmplot(data=final_sample_stimmy,
x='ppe_a', y='Weekly Return 03/24').set(
title="Firm Week Return v. PPE as % of Assets",
xlabel="PPE as a % of Assets",
ylabel="Firm Week Return 03/24")
ret_v_ppe_a_stimmy;
Stimmy Day Return Data Analysis </br> With stimulus day announced on March 24, 2020, the returns over that period skyrocked (comparatively to previous returns in that month due to covid). Every single firm in my dataset had a positive return on that day. Thus, when analyzing the returns vs. several risk measurements, it would be predictable of such correlation to be the opposite of what my previous correlations displayed.
Interestingly, this proved to be somewhat correct. The PPE vs. returns switched to a postive correlation, and the Competitor Risk vs. returns switched to be a slight positive correlation as well. However the supply dependence risk showed no correlation at all, likewise with the regulatory risk.