3-3-3-15. Mini-Project: Statistics From Stock Data

Statistics from Stock Data

# We import pandas into Python
import pandas as pd

# We read in a stock data data file into a data frame and see what it looks like
df = pd.read_csv('./GOOG.csv')

# We display the first 5 rows of the DataFrame
df.head()
To Do
# We load the Google stock data into a DataFrame
google_stock = 

# We load the Apple stock data into a DataFrame
apple_stock = 

# We load the Amazon stock data into a DataFrame
amazon_stock = 

You can check that you have loaded the data correctly by displaying the head of the DataFrames.

# We display the google_stock DataFrame
google_stock.head()

You will now join the three DataFrames above to create a single new DataFrame that contains all the Adj Close for all the stocks. Let’s start by creating an empty DataFrame that has as row indices calendar days between 2000-01-01 and 2016-12-31. We will use the pd.date_range() function to create the calendar dates first and then we will create a DataFrame that uses those dates as row indices:

# We create calendar dates between '2000-01-01' and  '2016-12-31'
dates = pd.date_range('2000-01-01', '2016-12-31')

# We create and empty DataFrame that uses the above dates as indices
all_stocks = pd.DataFrame(index = dates)

To Do
# Change the Adj Close column label to Google
google_stock = 

# Change the Adj Close column label to Apple
apple_stock = 

# Change the Adj Close column label to Amazon
amazon_stock = 

You can check that the column labels have been changed correctly by displaying the datadrames

# We display the google_stock DataFrame
google_stock.head()

Now that we have unique column labels, we can join the individual DataFrames to the all_stocks DataFrame. For this we will use the dataframe.join() function. The function dataframe1.join(dataframe2) joins dataframe1 with dataframe2. We will join each dataframe one by one to the all_stocks dataframe. Fill in the code below to join the dataframes, the first join has been made for you:

# We join the Google stock to all_stocks
all_stocks = all_stocks.join(google_stock)

# We join the Apple stock to all_stocks
all_stocks = 

# We join the Amazon stock to all_stocks
all_stocks =

You can check that the dataframes have been joined correctly by displaying the all_stocks dataframe

# We display the all_stocks DataFrame
all_stocks.head()
# Describe the all_stocks DataFrame
all_stocks.describe()

To Do

# Print the coumn-wise count of NaN values, if any, in the all_stocks dataframe.


# Remove any rows that contain NaN values. Do this operation inplace. 

Now that you have eliminated any NaN values we can now calculate some basic statistics on the stock prices. Fill in the code below

# Print the average stock price for each stock

# Print the median stock price for each stock

# Print the standard deviation of the stock price for each stock  

# Print the correlation between stocks

We will now look at how we can compute some rolling statistics, also known as moving statistics. We can calculate for example the rolling mean (moving average) of the Google stock price by using the Pandas dataframe.rolling().mean() method. The dataframe.rolling(N).mean() calculates the rolling mean over an N-day window. In other words, we can take a look at the average stock price every N days using the above method. Fill in the code below to calculate the average stock price every 150 days for Google stock

# We compute the rolling mean using a 150-Day window for Google stock
rollingMean = 

We can also visualize the rolling mean by plotting the data in our dataframe. In the following lessons you will learn how to use Matplotlib to visualize data. For now I will just import matplotlib and plot the Google stock data on top of the rolling mean. You can play around by changing the rolling mean window and see how the plot changes.

# this allows plots to be rendered in the notebook


# We import matplotlib into Python
import matplotlib.pyplot as plt


# We plot the Google stock data
plt.plot(all_stocks['Google'])

# We plot the rolling mean ontop of our Google stock data
plt.plot(rollingMean)
plt.legend(['Google Stock Price', 'Rolling Mean'])
plt.show()

Solution:

# We import pandas into Python
import pandas as pd

# We read in a stock data data file into a data frame and see what it looks like
df = pd.read_csv('./GOOG.csv')

# We display the first 5 rows of the DataFrame
df.head()

# We load the Google stock data into a DataFrame
google_stock = pd.read_csv('GOOG.csv', index_col = ['Date'],  parse_dates = True, usecols = ['Date', 'Adj Close'])

# We load the Apple stock data into a DataFrame
apple_stock = pd.read_csv('AAPL.csv',  index_col = ['Date'],  parse_dates = True, usecols = ['Date', 'Adj Close'])
                       
# We load the Amazon stock data into a DataFrame
amazon_stock = pd.read_csv('AMZN.csv', index_col = ['Date'],  parse_dates = True, usecols = ['Date', 'Adj Close'])

# We display the google_stock DataFrame
google_stock.head()

# We create calendar dates between '2000-01-01' and  '2016-12-31'
dates = pd.date_range('2000-01-01', '2016-12-31')

# We create and empty DataFrame that uses the above dates as indices
all_stocks = pd.DataFrame(index = dates)

# Change the Adj Close column label to Google
google_stock = google_stock.rename(columns = {'Adj Close' : 'Google'})

# Change the Adj Close column label to Apple
apple_stock = apple_stock.rename(columns = {'Adj Close' : 'Apple'})

# Change the Adj Close column label to Amazon
amazon_stock = amazon_stock.rename(columns = {'Adj Close' : 'Amazon'})

# We display the google_stock DataFrame
google_stock.head()

# We join the Google stock to all_stocks
all_stocks = all_stocks.join(google_stock)

# We join the Apple stock to all_stocks
all_stocks = all_stocks.join(apple_stock)

# We join the Amazon stock to all_stocks
all_stocks = all_stocks.join(amazon_stock)

# We display the all_stocks DataFrame
all_stocks.head()

# Describe the all_stocks DataFrame
all_stocks.describe()

# Print the coumn-wise count of NaN values, if any, in the all_stocks dataframe
all_stocks.isnull().sum()

# Remove any rows that contain NaN values. Do this operation inplace. 
all_stocks.dropna(axis=0, inplace = True)

# Print the average stock price for each stock
print('The average stock price for each stock is: \n', all_stocks.mean(), '\n')
# Note: You can get the same result by printing it individually one-by-one as `all_stocks['Google'].mean()`

# Print the median stock price for each stock
print('The median stock price for each stock is: \n', all_stocks.median(), '\n')

# Print the standard deviation of the stock price for each stock  
print('The standard deviation of the stock price for each stock  is: \n', all_stocks.std(), '\n')

# Print the correlation between stocks
print('The correlation between stocks is: \n', all_stocks.corr(), '\n')

# We compute the rolling mean using a 150-Day window for Google stock
rollingMean = all_stocks['Google'].rolling(150).mean()
rollingMean

# this allows plots to be rendered in the notebook


# We import matplotlib into Python
import matplotlib.pyplot as plt


# We plot the Google stock data
plt.plot(all_stocks['Google'])

# We plot the rolling mean ontop of our Google stock data
plt.plot(rollingMean)
plt.legend(['Google Stock Price', 'Rolling Mean'])
plt.show()

이 글 공유하기:

이것이 좋아요:

좋아하기 가져오는 중...
%d 블로거가 이것을 좋아합니다: