2 Managing Financial Data
In this chapter we will learn how to download/import financial time series data from various sources. Most importantly we will use the quantmod
library through the package tidyquant
to download financial data from a variety of sources. We will also mention other possibilities as well as the fact that one can always import and ‘.xlsx’ (Excel) file.
2.1 Getting Data
2.1.1 quantmod
and the tidyquant
package
The tidyquant
package comes with a variety of readily compiled data sets and allows access to several different data sources. We start this chapter by loading the tidyverse
and the tidyquant
package. Basically tidyquant
provides four different types of commands:
- Get a Stock Index using
tq_index()
or data form a Stock Exchange usingtq_exchange()
- Get Quantitative Data using
tq_get()
- Transmute data (change in place) using
tq_transmute()
and Mutate data (add new variable) by usingtq_mutate()
. These functions provide functionality that integratesxts
,zoo
,quantmod
,TTR
and thePerformanceAnalytics
packages. - Do some performance analysis using
tq_performance()
and aggregate portfolios usingtq_portfolio()
.
The tidyquant
package includes several vignettes to help us get up to speed quickly:
-
Introduction to
tidyquant
withvignette("TQ00-introduction-to-tidyquant")
-
Core Functions in tidyquant with
vignette("TQ01-core-functions-in-tidyquant")
-
R Quantitative Analysis Package Integrations in
tidyquant
withvignette("TQ02-quant-integrations-in-tidyquant")
-
Scaling and Modeling with
tidyquant
withvignette("TQ03-scaling-and-modeling-with-tidyquant")
-
Charting with
tidyquant
withvignette("TQ04-charting-with-tidyquant")
-
Performance Analysis with
tidyquant
we will take a closer look at Performance Analysis in section 3.3
We start by loading the relevant packages and then employ tq_exchange_options()
to give us a list of all stock exchanges available through the package.
library(tidyverse)
library(tidyquant)
tq_exchange_options()
#> [1] "AMEX" "NASDAQ" "NYSE"
In a second step we download data from all three exchanges and have a brief look. We start with data from AMEX7, followed by NASDAQ8 and the most famous NYSE9
amex <- tq_exchange("AMEX")
nasdaq <- tq_exchange("NASDAQ")
nyse <- tq_exchange("NYSE")
# amex
amex
# nasdaq
nasdaq
# nyse
nyse
#> # A tibble: 274 x 7
#> symbol company last.sale.price market.cap country ipo.year industry
#> <chr> <chr> <dbl> <dbl> <chr> <int> <chr>
#> 1 AAMC Altisource As~ 25.7 52827918 United S~ NA "Investme~
#> 2 AAU Almaden Miner~ 0.398 54682731 Canada 2015 ""
#> # ... with 272 more rows
#> # A tibble: 4,503 x 7
#> symbol company last.sale.price market.cap country ipo.year industry
#> <chr> <chr> <dbl> <dbl> <chr> <int> <chr>
#> 1 AACG ATA Creativi~ 2.68 84959184 China NA Service to ~
#> 2 AADI Aadi Bioscie~ 29.4 611603870 United ~ NA Biotechnolo~
#> # ... with 4,501 more rows
#> # A tibble: 3,135 x 7
#> symbol company last.sale.price market.cap country ipo.year industry
#> <chr> <chr> <dbl> <dbl> <chr> <int> <chr>
#> 1 A Agilent Techn~ 174. 52655580872 "United~ 1999 Electrica~
#> 2 AA Alcoa Corpora~ 48.8 9124674397 "" 2016 Metal Fab~
#> # ... with 3,133 more rows
In a next step, we show a list of all indices that are available through the package using tq_index_options()
:
tq_index_options() # find all indices available
#> [1] "DOW" "DOWGLOBAL" "SP400" "SP500" "SP600"
We download the holdings for all five indices10. To not having to download all individual index holdings, we make use of map()
from the purrr
package contained in the tidyverse
. Thereby we apply the function tq_index()
to each vector element and save the output in a list. In a second step, we name the different list elements using names()
so that collapsing the list using bind_rows()
and specifying the name of the .id variable as ‘IndexID’ gets us the name of each index in an extra .id column. From each of the five indices (DOW11, DOWGLOBAL12, SP40013, SP50014 and SP60015) we show the first two lines using group_by()
and slice()
. For all indices we find the following variables: Stock symbol, company name, identifier and the current weight in the index.16
indexholdings <- c("DOW","DOWGLOBAL","SP400","SP500","SP600") %>% map(tq_index)
names(indexholdings) <- c("DOW","DOWGLOBAL","SP400","SP500","SP600")
indexholdings %>% bind_rows(.id = "IndexID") %>% group_by(IndexID) %>%
slice(1:2)
#> # A tibble: 10 x 9
#> # Groups: IndexID [5]
#> IndexID symbol company identifier sedol weight sector shares_held
#> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 DOW UNH UnitedHealth~ 91324P10 29177~ 0.0790 Health C~ 5714596
#> 2 DOW GS Goldman Sach~ 38141G10 24079~ 0.0752 Financia~ 5714596
#> 3 DOWGLOBAL MT-NL ArcelorMitta~ BYPBS6 BYPBS~ 0.0132 Materials 38443
#> 4 DOWGLOBAL MTN-ZA MTN Group Li~ 656320 65632~ 0.0124 Communic~ 142262
#> 5 SP400 CGNX Cognex Corpo~ 19242210 22082~ 0.00685 Informat~ 1551993
#> 6 SP400 MOH Molina Healt~ 60855R10 22127~ 0.00671 Health C~ 513214
#> # ... with 4 more rows, and 1 more variable: local_currency <chr>
Last, we get a list of all valid data sets (websites) that we can get data from by employing the tq_get()
command.
tq_get_options() # find all data sources available
#> [1] "stock.prices" "stock.prices.japan" "dividends"
#> [4] "splits" "economic.data" "quandl"
#> [7] "quandl.datatable" "tiingo" "tiingo.iex"
#> [10] "tiingo.crypto" "alphavantager" "alphavantage"
#> [13] "rblpapi"
Those are (at the time of writing this book):
- ‘stock.prices’: Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance.
- ‘dividends’: Retrieves the dividends for a stock symbol from Yahoo Finance.
- ‘splits’: Get the split ratio for a stock symbol from Yahoo Finance.
- ‘stock.prices.japan’: Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance Japan.
- ’economic.data*: Get economic data from FRED.
- ‘quandl’: Get data sets from Quandl. Requires a valid api-key, see
quandl_api_key
. - ‘quandl.datatable’: Get data tables from Quandl.
- ‘tiingo’: Get data sets from Tiingo. Requires a valid api-key, see
tiingo_api_key
. - ‘tiingo.iex’: Get data sets from IEX, The Investors Exchange through Tiingo.
- ‘tiingo.crypto’: Get cryptocurrency prices collected by Tiingo.
- ‘alphavantager’: Get data sets from Alpha Vantager. Requires a valid api-key, see
av_api_key
. - ‘rblpapi’: Get data sets from Bloomberg. Requires a valid connection to a Bloomberg terminal, see
Rblpapi::blpConnect()
.
The data set we will be using in the following consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. Therefore we need to merge both data sets using inner_join()
because we only want to keep symbols from the S&P500 that are also traded on NYSE or NASDAQ:
stocks.selection <- indexholdings %>% bind_rows(.id = "IndexID") %>%
filter(IndexID=="SP500") %>% select(-IndexID) %>%
inner_join(rbind(nyse,nasdaq) %>%
select(symbol,last.sale.price,market.cap,ipo.year),by=c("symbol")) %>% # join data sets
dplyr::filter(ipo.year<2000,!is.na(market.cap)) %>% # filter years with ipo<2000 or ipo=NA
arrange(desc(weight)) %>% # sort in descending order
slice(1:10)
stocks.selection
In a next step, we download stock prices from Yahoo Finance by using tq_get()
. Data from that source usually comes in the OHLC format (open,high,low,close) with additional information (volume, adjusted). In plain text this means the first (open) and last (close) as well as the highest (high) and lowest (low) traded price of the day. Volume shows the physical number of shares traded of that stock today, whereas adjusted is adjusted for dividends using the methodology from the Center for Research in Security Prices. All variables are adjusted for stock splits. To better understand dividend adjustment, we could check
#> # A tibble: 1 x 3
#> # Groups: symbol [1]
#> symbol date value
#> <chr> <date> <dbl>
#> 1 MSFT 2004-11-15 3.08
#> # A tibble: 3 x 8
#> # Groups: symbol [1]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 MSFT 2004-11-12 30.2 30.2 29.8 30.0 162269000 19.2
#> 2 MSFT 2004-11-15 27.3 27.5 27.2 27.4 104468000 19.5
#> 3 MSFT 2004-11-16 27.3 27.3 27.0 27.1 64522600 19.3
where one clearly sees the paid dividend to reduce the OHLC values, without having an impact on the adjusted price.
So we download data for all ten stocks as well as the S&P500-index for the time period 2000-01-01 to 2020-12-31.
stocks.prices <- stocks.selection$symbol %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2020-12-31") %>%
group_by(symbol)
index.prices <- "^GSPC" %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2020-12-31")
#> # A tibble: 20 x 8
#> # Groups: symbol [10]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2000-01-03 0.936 1.00 0.908 0.999 535796800 0.858
#> 2 AAPL 2000-01-04 0.967 0.988 0.903 0.915 512377600 0.786
#> 3 ADBE 2000-01-03 16.8 16.9 16.1 16.4 7384400 16.3
#> 4 ADBE 2000-01-04 15.8 16.5 15.0 15.0 7813200 14.9
#> 5 AMZN 2000-01-03 81.5 89.6 79.0 89.4 16117600 89.4
#> 6 AMZN 2000-01-04 85.4 91.5 81.8 81.9 17487400 81.9
#> # ... with 14 more rows
Dividends and stock splits can also be downloaded:
stocks.dividends <- stocks.selection$symbol %>%
tq_get(get = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
#> # A tibble: 6 x 3
#> # Groups: symbol [3]
#> symbol date value
#> <chr> <date> <dbl>
#> 1 AAPL 2012-08-09 0.00338
#> 2 AAPL 2012-11-07 0.00338
#> 3 ADBE 2000-03-23 0.00156
#> 4 ADBE 2000-06-30 0.00156
#> 5 CSCO 2011-03-29 0.06
#> 6 CSCO 2011-07-05 0.06
stocks.splits <- stocks.selection$symbol %>%
tq_get(get = "splits",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
#> # A tibble: 6 x 3
#> # Groups: symbol [3]
#> symbol date value
#> <chr> <date> <dbl>
#> 1 AAPL 2000-06-21 0.5
#> 2 AAPL 2005-02-28 0.5
#> 3 AAPL 2014-06-09 0.143
#> 4 ADBE 2000-10-25 0.5
#> 5 ADBE 2005-05-24 0.5
#> 6 CSCO 2000-03-23 0.5
A variety of (professional) data services are integrated into tidyquant
(through quantmod
) which I will list in the following subsections:
2.1.2 Quandl
Quandl provides access to many different financial and economic databases. To use it, one should acquire an api key by creating a Quandl account.17 Searches can be done using quandl_search()
(I personally would use their homepage to do that). Data can be downloaded as before with tq_get()
, be aware that you can download either single time series or entire datatables with the arguments get = "quandl"
and get = "quandl.datatable"
. Note that in the example for ‘Apple’ below, the adjusted close prices are different from the ones of Yahoo. An example for a datatable is Zacks Fundamentals Collection B.
quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
tq_get(get = "quandl",
from = "2000-01-01",
to = "2019-12-31",
column_index = 11, # numeric column number (e.g. 1)
collapse = "daily", # can be “none”, “daily”, “weekly”, “monthly”, “quarterly”, “annual”
transform = "none") # for summarizing data: “none”, “diff”, “rdiff”, “cumul”, “normalize”
#> # A tibble: 3 x 14
#> symbol date open high low close volume ex.dividend split.ratio
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 WIKI/AAPL 2000-01-03 105. 112. 102. 112. 4783900 0 1
#> 2 WIKI/AAPL 2000-01-04 108. 111. 101. 102. 4574800 0 1
#> 3 WIKI/AAPL 2000-01-05 104. 111. 103 104 6949300 0 1
#> # ... with 5 more variables: adj.open <dbl>, adj.high <dbl>, adj.low <dbl>,
#> # adj.close <dbl>, adj.volume <dbl>
2.1.3 Alpha Vantage
Alpha Vantage provides access to a real-time and historical financial data. Here we also need to get and set an api key (for free). Then we can download either time series data specifying av_fun="TIME_SERIES_DAILY_ADJUSTED"
or intraday data (av_fun="TIME_SERIES_INTRADAY"
and interval="5min"
) via tq_get()
.
av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals
#> # A tibble: 5 x 10
#> symbol timestamp open high low close adjusted_close volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2021-05-07 131. 131. 129. 130. 130. 78973273
#> 2 AAPL 2021-05-10 129. 130. 127. 127. 127. 88071229
#> 3 AAPL 2021-05-11 124. 126. 123. 126. 126. 126142826
#> 4 AAPL 2021-05-12 123. 125. 122. 123. 123. 112172282
#> 5 AAPL 2021-05-13 125. 126. 124. 125. 125. 105861339
#> # ... with 2 more variables: dividend_amount <dbl>, split_coefficient <dbl>
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals
#> # A tibble: 5 x 7
#> symbol timestamp open high low close volume
#> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2021-09-28 11:45:00 143. 143. 143. 143. 863237
#> 2 AAPL 2021-09-28 11:50:00 143. 143. 143. 143. 1028675
#> 3 AAPL 2021-09-28 11:55:00 143. 143. 142. 143. 1216676
#> 4 AAPL 2021-09-28 12:00:00 143. 143. 143. 143. 794039
#> 5 AAPL 2021-09-28 12:05:00 143. 143. 142. 142. 1299687
2.1.4 FRED (Economic Data)
A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. Below we download the 1M- and 3M- risk-free-rate for the US. Note that these are annualized rates!
#> # A tibble: 6 x 3
#> # Groups: symbol [2]
#> symbol date price
#> <chr> <date> <dbl>
#> 1 TB1YR 2020-09-01 0.13
#> 2 TB1YR 2020-08-01 0.13
#> 3 TB1YR 2020-07-01 0.15
#> 4 TB3MS 2020-09-01 0.11
#> 5 TB3MS 2020-08-01 0.1
#> 6 TB3MS 2020-07-01 0.13
2.1.5 Bloomberg and Datastream
Bloomberg is officially integrated into the tidyquant
-package, but one needs to have Bloomberg running on the terminal in use. Refinitiv’s Datastream is not integrated but has a nice R
-Interface in the package DatastreamDSWS2R
. However, you need to have the Refinitiv Datastream Web Services API licensed! In this case the package allows for convenient retrieval of data. If this is not the case, then you have to manually retrieve your data using the Excel-Client, save it as “.xlsx”-file, and import using readxl::read_xlsx()
from the readxl
-package.
2.1.6 Fama-French Data (Kenneth French’s Data Library)
To download Fama-French data in batch there is a package FFdownload
that I maintain and that now can be installed via install.packages("FFdownload")
from CRAN or install its development version via remotes::install_github("sstoeckl/ffdownload")
from my github page. Currently you can either download all data or skip the (large) daily files using the command exclude_daily=TRUE
or specify an inputlist
of file names that the package will try to match to the data sets of the website. For a detailed reference on how to use the package please see this article on my website. The result is a list
of time series in xts
format having the following structure: ‘FFdata$dataset$timeframe$filecontent’.
The data sets (factors and factor portfolios) provided on Kenneth French’s data library all relate to research in asset pricing conducted by Kenneth French and Nobel laureate Eugene Fama. The most famous of their papers specify the famous three-factor model containing the market factor as well as a size and value factor (Fama and French 1992) and the five-factor model that adds profitability and investment (Fama and French 2014). One additional factor that has gained fame and can also be downloaded on their website is the momentum factor by Carhart (1997).
library(FFdownload)
library(timetk)
FFdownload(output_file = "data/FFdata_20201007.RData", # output file for the final data set
inputlist = c("F-F_Research_Data_Factors","F-F_Research_Data_Factors_daily_CSV","F-F_Momentum_Factor","F-F_Momentum_Factor_daily_CSV"),
exclude_daily = TRUE) # exclude daily data
load(file = "data/FFdata_20201007.RData")
factors <- FFData$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
tk_tbl(rename_index="date") %>% # make tibble
left_join(FFData$`x_F-F_Momentum_Factor`$monthly$Temp2 %>%
tk_tbl(rename_index="date"), by ="date") %>%
mutate(date=as.Date(date, frac=1)) %>% # make proper month-end date format
gather(key=FFvar,value = price,-date) # gather into tidy format
factors %>% group_by(FFvar) %>% slice(1,n())
#> # A tibble: 10 x 3
#> # Groups: FFvar [5]
#> date FFvar price
#> <date> <chr> <dbl>
#> 1 2000-01-31 HML -1.89
#> 2 2020-12-31 HML -1.43
#> 3 2000-01-31 Mkt.RF -4.74
#> 4 2020-12-31 Mkt.RF 4.63
#> 5 2000-01-31 Mom 1.92
#> 6 2020-12-31 Mom -2.26
#> # ... with 4 more rows
2.1.7 Cryptocurrency Data
Cryptocurrency data could be downloaded from Coinmarketcap.com using my crypto2
-package that can be installed via install.packages("crypto2")
from CRAN or install its development version via remotes::install_github("sstoeckl/crypto2")
from my github page. Available functions are
-
crypto_list()
a list of all coins that are listed as either being active, delisted or untracked according to the CMC API documentation -
crypto_info()
a list of all information available for available coins according to the CMC API documentation -
crypto_history()
the most powerful function of this package that allows to download the entire available history for all coins covered by CMC according to the CMC API documentation -
fiat_list()
a mapping of all fiat currencies (plus precious metals) available via the CMC API documentation -
exchange_list()
a list of all exchanges available as either being active, delisted or untracked according to the CMC API documentation -
exchange_info()
a list of all information available for all given exchanges according to the CMC API documentation
Be aware, that crypto currencies (CCs) are very volatile, risky and not controlled by any authority. A large number of coins has been de-listed from Coinmarketcap.com, which often led to a total loss for all investors in that crypto currency.
We start by downloading a list of all active crypto currencies.
library(crypto2)
# List all active coins
coins <- crypto_list(only_active=TRUE)
#> # A tibble: 6,881 x 8
#> id name symbol slug rank is_active first_historica~ last_historical~
#> <int> <chr> <chr> <chr> <int> <int> <date> <date>
#> 1 1 Bitcoin BTC bitcoin 1 1 2013-04-28 2021-09-29
#> 2 2 Litecoin LTC litecoin 16 1 2013-04-28 2021-09-29
#> 3 3 Namecoin NMC namecoin 716 1 2013-04-28 2021-09-29
#> 4 4 Terracoin TRC terracoin 2062 1 2013-04-28 2021-09-29
#> 5 5 Peercoin PPC peercoin 762 1 2013-04-28 2021-09-29
#> # ... with 6,876 more rows
From those crypto currencies, we select the 10 longest living coins and download additional information:
# retrieve information for all (the first 3) of those coins
coin_info <- crypto_info(coins,limit=3)
#> > Scraping crypto info
#> > Processing historical crypto data
#> # A tibble: 3 x 19
#> id name symbol category description slug logo subreddit notice
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Bitcoin BTC coin "## **What Is ~ bitco~ https:~ bitcoin ""
#> 2 2 Litecoin LTC coin "## What Is Li~ litec~ https:~ litecoin ""
#> 3 3 Namecoin NMC coin "Namecoin (NMC~ namec~ https:~ namecoin ""
#> # ... with 10 more variables: date_added <chr>, twitter_username <chr>,
#> # is_hidden <int>, date_launched <lgl>,
#> # self_reported_circulating_supply <lgl>, self_reported_tags <lgl>,
#> # status <dttm>, tags <list>, urls <list>, platform <lgl>
[TBD]
2.2 Manipulate Data
A variety of transformations can be applied to (financial) time series data. We present some examples while merging together our stock file with the index, the risk-free rate from FRED and the Fama-French-Factors.
Doing data transformations in tidy data sets is either called via transmute()
(change variable/data set, only return calculated column) or mutate()
(add transformed variable). In the tidyquant
-package these functions are called tq_transmute()
and tq_mutate()
, because they simultaneously allow changes of periodicity (daily to monthly) and therefore the returned data set can have less rows than before. The core of these functions is the provision of a mutate_fun
that can come from the the xts/zoo
, quantmod
(Quantitative Financial Modelling & Trading Framework for R) and TTR
(Technical Trading Rules) packages.
In a first step we show how to change the periodicity of the data (where we keep the adjusted closing price and the volume information). For this we make use of the mutate_fun
__to_period()__
where we specify the period
as (assuming daily data) ‘weeks,’ ‘months,’ ‘quarters’ or ‘years.’ The option index_at
allows us to select a final index and the corresponding format, namely ‘yearmon,’ ‘yearqtr’ (both formats only make sense with monthly/quarterly aggregation), ‘firstof,’ ‘lastof’ (specifying the first/last date of the period
), startof
or endof
(detailing the first last date in the data given the respective period
). select
chooses which input columns to use/keep, while name
overrides column-names. For certain aggregation times there exist ready-made commands to.weekly()
, to.monthly()
, to.quarterly()
and to.yearly()
. In the following we aggregate to weekly and monthly adjusted prices. Note, that the to-period()
functions always select the last observation per period.
stocks.prices %>%
tq_transmute(select = c(adjusted,volume), mutate_fun = to.weekly,
indexAt = "lastof") %>% slice(1,2) %>% head(4)
#> # A tibble: 4 x 4
#> # Groups: symbol [2]
#> symbol date adjusted volume
#> <chr> <date> <dbl> <dbl>
#> 1 AAPL 2000-01-07 0.763 460734400
#> 2 AAPL 2000-01-14 0.770 390376000
#> 3 ADBE 2000-01-07 16.1 8253200
#> 4 ADBE 2000-01-14 16.5 5136800
stocks.prices %>%
tq_transmute(select = c(adjusted,volume), mutate_fun = to.monthly,
indexAt = "yearmon") %>% slice(1,2) %>% head(4)
#> # A tibble: 4 x 4
#> # Groups: symbol [2]
#> symbol date adjusted volume
#> <chr> <yearmon> <dbl> <dbl>
#> 1 AAPL Jan 2000 0.795 701680000
#> 2 AAPL Feb 2000 0.879 368961600
#> 3 ADBE Jan 2000 13.7 16459200
#> 4 ADBE Feb 2000 25.3 6718800
From this we can either calculate returns per period manually or use the function periodReturn()
which allows to specify a period
that takes care of the last step while simultaneously calculating the desired return. There are two types of returns available using type
, the ‘arithmetic’ return and the ‘log’ return. Given \(P_t\) to be the price of an asset at time \(t\), we let \(p_t=ln(P_t)\) be its log-price.18
- Then the arithmetic or discrete return is calculated as: \(R_t=\frac{P_{t} - P_{t-1}}{P_{t-1}}=\frac{P_{t}}{P_{t-1}}-1\), while
- the log or continuously compounded return is: \(p_t - p_{t-1}=\left(\frac{P_t}{P_{t-1}}\right)\).
Both have their advantages, as the arithmetic return allows for easier cross-sectional aggregation (i.e. the formation of portfolios as weighted sum of returns) while the log-return allows for easier time series aggregation (i.e. the log-return in a month is just the sum of the daily log-returns). In the following we calculate monthly returns19 for the ten stocks and the index. We then merge the adjusted price and return information for each stock, while simultaneously merging the return of the S&P500 index as well as the 4 Fama-French-Carhart-Factors (Mkt.RF, SMB, HML, Mom) plus the risk-free interest rate (RF).
stocks.returns <- stocks.prices %>%
tq_transmute(select = adjusted, mutate_fun = to.monthly, indexAt = "yearmon") %>%
left_join(stocks.prices %>%
tq_transmute(select = adjusted, mutate_fun = periodReturn, period="monthly",
type="arithmetic", indexAt = "yearmon") %>%
dplyr::rename(return = monthly.returns),by = c("symbol","date")) %>%
left_join(index.prices %>%
tq_transmute(select = adjusted, mutate_fun = periodReturn, period="monthly",
type = "arithmetic", indexAt = "yearmon") %>%
dplyr::rename(sp500 = monthly.returns),by = c("date")) %>%
left_join(factors %>% mutate(price = price/100) %>% mutate(date=as.yearmon(date)) %>%
pivot_wider(names_from = FFvar, values_from = price),by="date")
stocks.returns %>% ungroup() %>% slice(1:2,(n()-1):n()) # show first and last entry
#> # A tibble: 4 x 10
#> symbol date adjusted return sp500 Mkt.RF SMB HML RF Mom
#> <chr> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL Jan 2000 0.795 -0.0731 -0.0418 -0.0474 0.0579 -0.0189 0.0041 0.0192
#> 2 AAPL Feb 2000 0.879 0.105 -0.0201 0.0245 0.215 -0.0981 0.0043 0.182
#> 3 UPS Nov 2020 168. 0.0955 0.108 0.125 0.0565 0.0213 0.0001 -0.124
#> 4 UPS Dez 2020 164. -0.0283 0.0305 0.0463 0.0482 -0.0143 0.0001 -0.0226