Data Project - Stock Market Analysis

Risk Analysis Monte Carlo

We have already learnt about Monte Carlo method in the previous post - Monte Carlo. In this post I am going to discuss on how can that method be used for determining the maximum risk involved in investing to a stock.

#import libraries
import pandas as pd
from pandas import Series,DataFrame
import numpy as np

# For Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# For reading stock data from yahoo
import pandas_datareader as pdr
from datetime import datetime
from __future__ import division
import as web
tech_list = ['AAPL','GOOG','MSFT','AMZN']

# Set up End and Start times for data grab
end =
start = datetime(end.year - 1,end.month,

for stock in tech_list:   
    globals()[stock] = web.DataReader(stock,'yahoo',start,end)    
#Descriptive & Exploratory analysis
Open High Low Close Volume Adj Close
count 252.000000 252.000000 252.000000 252.000000 2.520000e+02 252.000000
mean 115.796667 116.627698 115.155357 115.973928 3.193192e+07 115.217969
std 15.948756 15.931637 15.978002 15.994669 1.433693e+07 16.482845
min 90.000000 91.669998 89.470001 90.339996 1.147590e+07 89.008370
25% 104.767498 105.959999 104.060001 105.499998 2.352475e+07 104.394087
50% 113.225003 114.055001 112.389999 113.069999 2.818830e+07 112.006823
75% 128.062502 129.665000 127.875000 128.832501 3.588788e+07 128.276348
max 147.539993 148.089996 146.839996 147.509995 1.119850e+08 147.509995
# Let's see historical view of the closing price for the last one year
AAPL['Adj Close'].plot(legend=True,figsize=(10,4))


# Now let's plot the total volume of stock being traded each day over the past year


#Calculate and plot the simple moving average (SMA) for upto 10, 20 and 50 days
moving_avg_day = [10,20,50]

for ma in moving_avg_day:
    column_name = "MA for %s days" % (str(ma))
    AAPL[column_name] =pd.rolling_mean(AAPL['Adj Close'],ma)
AAPL[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(15,6))


#Now lets analyse the Daily Return Analysis as our first step to determing the volatility of the stock prices

# We'll use pct_change to find the percent change for each day
AAPL['Daily Return'] = AAPL['Adj Close'].pct_change()
# Then we'll plot the daily return percentage
AAPL['Daily Return'].plot(figsize=(12,4),legend=True,linestyle='--',marker='o')

#Average daily return
#Using dropna to eliminate NaN values
sns.distplot(AAPL['Daily Return'].dropna(),bins=100,color='red')


#Now lets do a comparative study to analyze the returns of all the stocks in our list

# Grab all the closing prices for the tech stocks into one new DataFrame
closing_df = web.DataReader(['AAPL','GOOG','MSFT','AMZN'],'yahoo',start,end)['Adj Close'] 
# Let's take a quick look
2016-05-05 91.865625 659.090027 701.429993 48.660218
2016-05-06 91.353293 673.950012 711.119995 49.098687
2016-05-09 91.422261 679.750000 712.900024 48.786888
2016-05-10 92.042972 703.070007 723.179993 49.712543
2016-05-11 91.146389 713.229980 715.289978 49.741774
# Make a new DataFrame for storing stock daily return
tech_rets = closing_df.pct_change()
# Using joinplot to compare the daily returns of Google and Microsoft
<seaborn.axisgrid.JointGrid at 0x24cfc1bf080>


The correlation coefficient of 0.6 indicates a strong correlation between the daily return values of Microsoft and Google. This inplies that if google’s stock value increases, microsoft’s stock value and hence daily return increases and vice versa. This analysis can further be extended by first segmentation of stock based on industry and then further analysis can be done to determine the choice of stock within that industry.

# We can also simply call pairplot on our DataFrame for visual analysis of all the comparisons
<seaborn.axisgrid.PairGrid at 0x24cfc6a4dd8>


#Another analysis on the closing prices for each stock and their individual comparisons
# Call PairPLot on the DataFrame
close_fig = sns.PairGrid(closing_df)

# Using map_upper we can specify what the upper triangle will look like.

# We can also define the lower triangle in the figure, including the plot type (kde) or the color map (BluePurple)

# Finally we'll define the diagonal as a series of histogram plots of the closing price
<seaborn.axisgrid.PairGrid at 0x24cfdf8a160>


#For a quick look at the correlation analysis for all stocks, we can plot for the daily returns using corrplot
from seaborn.linearmodels import corrplot as cor
C:\Users\saj16\Anaconda3\lib\site-packages\seaborn\ UserWarning: The `corrplot` function has been deprecated in favor of `heatmap` and will be removed in a forthcoming release. Please update your code.
  warnings.warn(("The `corrplot` function has been deprecated in favor "
C:\Users\saj16\Anaconda3\lib\site-packages\seaborn\ UserWarning: The `symmatplot` function has been deprecated in favor of `heatmap` and will be removed in a forthcoming release. Please update your code.
  warnings.warn(("The `symmatplot` function has been deprecated in favor "

<matplotlib.axes._subplots.AxesSubplot at 0x24cfff485f8>


Risk Analysis: Bootstraping method, Monte Carlo method

#It is important to note the risk involved with the expected return for each stock visually before starting the analysis
rets = tech_rets.dropna()

area = np.pi*20

plt.scatter(rets.mean(), rets.std(),alpha = 0.5,s =area)

# Setting the x and y limits & plt axis titles
plt.xlabel('Expected returns')

for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
        xy = (x, y), xytext = (50, 50),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        arrowprops = dict(arrowstyle = '-', connectionstyle = 'arc3,rad=-0.3'))


As seen in the above plot the less the expected return, lesser the risk involved. Next lets determine the “Value at Risk”, that is the worst daily loss that can be encountered with 95 % Confidence.


The 0.05 empirical quantile of daily returns is at -0.015. That means that with 95% confidence, worst daily loss will not exceed 1.5%. If we have a 1 million dollar investment, our one day 5% VaR is 0.015 * 1,000,000 = $15,000.

Value at Risk using the Monte Carlo method

Using the Monte Carlo to run many trials with random market conditions, then we’ll calculate portfolio losses for each trial. After this, we’ll use the aggregation of all these simulations to establish how risky the stock is. Let’s start with a brief explanation of what we’re going to do:

We will use the geometric Brownian motion (GBM), which is technically known as a Markov process. This means that the stock price follows a random walk and is consistent with (at the very least) the weak form of the efficient market hypothesis (EMH): past price information is already incorporated and the next price movement is “conditionally independent” of past price movements. This means that the past information on the price of a stock is independent of where the stock price will be in the future, basically meaning, you can’t perfectly predict the future solely based on the previous price of a stock.

The equation for geometric Browninan motion is given by the following equation:

Where S is the stock price, mu is the expected return (which we calculated earlier),sigma is the standard deviation of the returns, t is time, and epsilon is the random variable. We can mulitply both sides by the stock price (S) to rearrange the formula and solve for the stock price.

Now we see that the change in the stock price is the current stock price multiplied by two terms. The first term is known as “drift”, which is the average daily return multiplied by the change of time. The second term is known as “shock”, for each tiem period the stock will “drift” and then experience a “shock” which will randomly push the stock price up or down. By simulating this series of steps of drift and shock thousands of times, we can begin to do a simulation of where we might expect the stock price to be.

# Set up time horizon and delta values
days = 365
dt = 1/days

# Calculate mu (drift) from the expected return data we got for AAPL
mu = rets.mean()['GOOG']

# Calculate volatility of the stock from the std() of the average return
sigma = rets.std()['GOOG']
def stock_monte_carlo(start_price,days,mu,sigma):
    ''' This function takes in starting stock price, days of simulation, mu, sigma and returns simulated price array'''

    price = np.zeros(days)
    price[0] = start_price    
    shock = np.zeros(days)
    drift = np.zeros(days)
    # Calculating and returning price array for number of days
    for x in range(1,days):
        shock[x] = np.random.normal(loc=mu * dt, scale=sigma * np.sqrt(dt))
        drift[x] = mu * dt
        price[x] = price[x-1] + (price[x-1] * (drift[x] + shock[x]))
    return price
# Get start price from GOOG.head()
start_price = 560.85
simulations = np.zeros(100)

for run in range(100):
    simulations[run] = stock_monte_carlo(start_price,days,mu,sigma)[days-1]
plt.title('Monte Carlo Analysis for Google')
<matplotlib.text.Text at 0x24c805d4278>


# Now we'lll define q as the 1% empirical qunatile, this basically means that 99% of the values should fall between here
q = np.percentile(simulations, 1)
# Now let's plot the distribution of the end prices

# Using plt.figtext to fill in some additional information onto the plot

# Starting Price
plt.figtext(0.6, 0.8, s="Start price: $%.2f" %start_price)
# Mean ending price
plt.figtext(0.6, 0.7, "Mean final price: $%.2f" % simulations.mean())

# Variance of the price (within 99% confidence interval)
plt.figtext(0.6, 0.6, "VaR(0.99): $%.2f" % (start_price - q,))

# Display 1% quantile
plt.figtext(0.15, 0.6, "q(0.99): $%.2f" % q)

# Plot a line at the 1% quantile result
plt.axvline(x=q, linewidth=4, color='r')

# Title
plt.title(u"Final price distribution for Google Stock after %s days" % days, weight='bold');


This basically means that for every initial stock you purchase there is about $13.61 at risk 1% of the time as predicted by Monte Carlo Simulation.