HighFrequencyDataTools        package:fBasics        R Documentation

_T_o_o_l_s _f_o_r _F_X _H_i_g_h _F_r_e_q_u_e_n_c_y _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     A collection and description of functions for the  management of
     high frequency financial market time  series, especially for FX
     series collected from a  Reuters data feed. The collection
     includes functions  for the management of dates and times
     formatted in the ISO-8601 string "CCYYMMDDhhmm", functions for the
     filtering and outlier detection of high frequency FX data records as
     collected from a Reuters data feed,  and functions which can be
     used to calculate log-prices,  log-returns, to extract subsamples,
     to interpolate  in time, to build business time scales, and to 
     de-seasonalize and de-volatilize high frequency  financial market
     data. 

     'CCYYMMDDhhmm' Dates and Times functions are:

       'xjulian'       Julian minute counts for 'CCYYMMDDhhmm' formats,
       'xdate'         'CCYYMMDDhhmm' from Julian minute counts,
       'xday.of.week'  day of week from 'CCYYMMDDhhmm' dates/times,
       'xleap.year'    decides whether a 'CCYYMMDDhhmm' date falls in a leap year.
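     As an illustration of the 'CCYYMMDDhhmm' stamp arithmetic, a
     minimal base R sketch (not the package implementation; 'xjulian'
     and 'xdate' perform the actual minute counting):

```r
# Decompose a CCYYMMDDhhmm stamp with integer division (illustrative only):
x     <- 197301011530
date  <- x %/% 10000            # 19730101
hhmm  <- x %%  10000            # 1530
year  <- date %/% 10000         # 1973
month <- (date %/% 100) %% 100  # 1
day   <- date %% 100            # 1
hour  <- hhmm %/% 100           # 15
min   <- hhmm %% 100            # 30
```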

     Filter and outlier detection functions are:     

       'fxdata.contributors'  Creates a table with contributor names,
       'fxdata.parser'        Parses FX contributors and delay times,
       'fxdata.filter'        Filters price and spread values from FX records,
       'fxdata.varmin'        Aggregates records to variable minutes format.

     Functions for De-Seasonalization and De-Volatilization:

       'xts.log'      Calculates logarithms for xts time series values,
       'xts.diff'     Differences xts time series values with lag 1,
       'xts.cut'      Cuts a piece out of a xts time series,
       'xts.interp'   Interpolates for equidistant time steps,
       'xts.map'      Creates a volatility adjusted time-mapping,
       'xts.upsilon'  Interpolates a time series in upsilon time,
       'xts.dvs'      Creates a de-volatilized time series,
       'xts.dwh'      Plots intra-daily/weekly histograms.

_U_s_a_g_e:

     xjulian(xdates, origin = 19600101)
     xdate(xjulians, origin = 19600101)
     xday.of.week(xdates)
     xleap.year(xdates)

     fxdata.contributors(x, include = 10)
     fxdata.parser(x, parser.table)
     fxdata.filter(x, parameter = "strong", doprint = TRUE)
     fxdata.varmin(x, digits = 4)

     xts.log(xts)
     xts.diff(xts)
     xts.cut(xts, from.date, to.date)
     xts.interp(xts, deltat = 1, method = "constant")
     xts.map(xts, mean.deltat, alpha) 
     xts.upsilon(xts, weekly.map = seq(from = 59, by = 60, length = 168), 
         method = "constant", doplot = TRUE, ...)
     xts.dvs(xts, k, volatility, doplot = TRUE, ...) 
     xts.dwh(xts, deltat = 60, period = "weekly", dolog = TRUE, 
         dodiff = TRUE, doplot = TRUE) 

_A_r_g_u_m_e_n_t_s:

   alpha: the scaling exponent, a numeric value. For a random walk this
          will be 2. 

  deltat: the time in minutes between interpolated data points,  by
          default 1 minute. 

  digits: an integer value, the number of digits for the 'BID' and
          'ASK' price. By default four. 

dolog, dodiff: two logicals. Should the logarithm of the input data be
          taken? Should the difference of the input data be taken?
          Note, if both 'dolog' and 'dodiff' are set to true,  the
          input data are expected to be price values. 

  doplot: a logical. Should a plot be displayed? 

 doprint: a logical, should the filter parameters be printed? 

from.date, to.date: ISO-8601 start and end dates, [CCYYMMDD]. 

 include: an integer value; the contributors are sorted by frequency
          and the first 'include' market makers are selected. By
          default 10. 

       k: sampling frequency, an integer value, typically of the order
          of 10 data points. 

mean.deltat: the average size of the time intervals in minutes, an
          integer value. 

  method: a character string naming the interpolation method, either
          "linear" or "constant". 

  origin: the origin date of the counter, in ISO-8601 date format,
          [CCYYMMDD]. By default January 1st, 1960. 

parameter: a character string, either 'strong' or 'weak' denoting the
          filter parameter settings. 

parser.table: the table of contributors produced by
          'fxdata.contributors', a data.frame. In this table market
          leaders are marked. 

  period: a string, either "weekly", "daily" or "both" selecting the 
          type of the histogram. By default "weekly". 

volatility: average volatility, a numeric value. Takes values of the
          order of the variance of the time series data. 

weekly.map: an integer vector of time intervals, by default 168 hourly
          intervals, spanning one week. Volatility based maps can be
          created by the function 'xts.map'. 

       x: a 6 column standardized FX data frame with XDATE, DELAY, 
          CONTRIBUTOR, BID, ASK and FLAG fields. 

  xdates: a numeric vector of ISO-8601 formatted Gregorian dates/times, 
           [CCYYMMDDhhmm]. 

xjulians: a numeric vector of Julian Minute Counts. 

     xts: a list with date/time 't' in ISO-8601 format, [CCYYMMDDhhmm],
          and data values 'x'. 

     ...: arguments to be passed. 

_D_e_t_a_i_l_s:

     *Date and Time Functions:* 

      Note that the 'x*' prefix indicates the "extended" date format 
     including time management functionality, whereas in 'sjulian', 
     'sdate', etc. the 's*' prefix indicates the "standard" or "simple" 
     date format, handling days, months, years and centuries. 

     *The Data Preprocessing Process:* 

      'fxdata.contributors' creates a contributor list from an FX high
     frequency data file as collected from a Reuters data feed and
     marks the market leaders. 'fxdata.parser' uses the information
     from the contributor list to select the data records from the
     market leaders. As input serves a standardized high frequency data
     file. Then the function 'fxdata.filter' filters the FX data
     records, and finally the function 'fxdata.varmin' creates a
     "variable minutes" formatted data file, i.e. all data records
     within the same minute are averaged. The preprocessed data are the
     starting point for further investigations. 
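     Taken together, the preprocessing steps above chain as follows (a
     sketch; 'usdthb' is the example data set used in the Examples
     section below):

```r
data(usdthb)
# 1. Rank contributors and mark the market leaders:
parser.table <- fxdata.contributors(usdthb, include = 5)
# 2. Keep only the records from the marked market makers:
parsed <- fxdata.parser(usdthb, parser.table)
# 3. Filter implausible price and spread values:
filtered <- fxdata.filter(parsed, parameter = "strong")
# 4. Average all accepted records within the same minute:
varmin <- fxdata.varmin(filtered$accepted, digits = 4)
```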

     *The Standardized FX high frequency data file structure:* 

      'x' is a standardized data frame with 6 columns. The first 
     column gives the date/time 'XDATE' in ISO-8601 format 
     [CCYYMMDDhhmm], the second column is a measure for the feed 
     'DELAY', the third column denotes the 'CONTRIBUTOR' code, the
     fourth and fifth columns are the 'BID' and 'ASK' price, and the
     last column is an information 'FLAG', to add additional
     information.  
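     A toy record in this layout could be set up as follows (column
     names as described above; the values are purely illustrative, not
     real quotes):

```r
# A minimal standardized FX data frame with the six required columns:
x <- data.frame(
  XDATE       = c(199706021204, 199706021205),  # CCYYMMDDhhmm stamps
  DELAY       = c(0, 0),                        # feed delay measure
  CONTRIBUTOR = c("BKBT", "BKBT"),              # contributor code
  BID         = c(24.30, 24.35),                # bid price
  ASK         = c(24.50, 24.55),                # ask price
  FLAG        = c(0, 0)                         # additional information
)
```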

     *The Contributor List:* 

      The output of the 'fxdata.contributors' function is used as
     input for the function 'fxdata.parser', which extracts the records
     of the contributors marked as market makers in the output table. 

     *The Parser:* 

      The function 'fxdata.parser' parses the data. The parser table,
     'parser.table', is a data frame with 4 columns: 'CONTRIBUTOR'
     denotes a code naming the contributor, 'COUNTS' gives the number
     of counts, how often the contributor appeared in the file,
     'PERCENT' the same as a percent value, and 'SELECT' denotes a
     logical value; if TRUE the contributor belongs to the group of the
     market makers, otherwise not. 

     *Variable Minutes Formatted Files:* 

      The function 'fxdata.varmin' creates data records within a
     variable minutes format. 

     *Log Prices and Log Returns:* 

      The function 'xts.log' is mainly used to create log-prices from 
     high frequency price records and the function 'xts.diff' is used
     to create log-returns.  
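     In base R terms the two steps amount to the following (a sketch on
     a plain numeric vector; the package functions operate on the
     't'/'x' list structure instead):

```r
# Log-prices and lag-1 log-returns from a price vector:
prices      <- c(100.0, 100.5, 100.2, 101.0)
log.prices  <- log(prices)        # what xts.log does to xts$x
log.returns <- diff(log.prices)   # what xts.diff (lag 1) then yields
```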

     *Subsamples:* 

      The function 'xts.cut' is mainly used to create a subsample from
     data records. If the start and/or end date are out of the time
     range  the time series is simply forward/backward extrapolated
     with the first  and/or last value. 

     *Interpolation:* 

      The function 'xts.interp' is used to interpolate data records. 
     The method allows for two different kinds of interpolations,
     either  '"linear"' for a linear interpolation or '"constant"' for 
     a constant interpolation keeping the previous value in time (left 
     value) within the interpolation region. 
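     The two interpolation modes correspond to base R's 'approx' (a
     sketch; 'xts.interp' additionally builds the equidistant minute
     grid from the ISO-8601 stamps):

```r
# Linear vs. constant (previous-value) interpolation on a minute grid:
t    <- c(0, 3, 7, 10)             # observation times in minutes
x    <- c(1.0, 1.2, 0.9, 1.1)      # observed values
grid <- seq(0, 10, by = 1)         # equidistant 1-minute grid
lin  <- approx(t, x, xout = grid, method = "linear")$y
con  <- approx(t, x, xout = grid, method = "constant", f = 0)$y  # keep left value
```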

     *Business Time Maps:* 

      The function 'xts.map' is mainly used to create the time map 
     required by the  function 'xts.upsilon'. Important: The argument
     'xts' must  start on a Monday and end on a Sunday. Use 'xts.cut'
     to guarantee this. 

     *De-Seasonalization:* 

      The function 'xts.upsilon' is  used to create data records with 
     volatility  adjusted time steps obtained from the "upsilon time"
     approach. These  time steps can be taken from the time map
     created by the function 'xts.map'. The data records are
     interpolated according to this  time schedule. 

     *De-Volatilization:* 

      The de-volatilization algorithm is based on Zhou's approach. The
     algorithm used by the function 'xts.dvs' reduces the sample
     frequency by keeping the variance of  the price changes constant,
     therefore the name "de-volatilization".  The procedure removes
     volatility by sampling data at different dates  for different
     times. When the market is highly volatile more data are sampled.
     Equivalently, the time is stretched. When the market is less
     volatile, less data are sampled. Equivalently, the time is
     compressed. Although the resulting subsequence has unequally spaced
     calendar date/time intervals, it produces an almost equally
     volatile time series. This time series is called a de-volatilized
     time series, or "dv-Series". 
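     A much simplified sketch of the sampling idea (an illustration of
     the principle only, not the package's algorithm): accumulate
     squared price changes and emit a record whenever the accumulated
     variance reaches the target 'volatility':

```r
# Sketch of de-volatilized sampling: keep an index whenever the
# accumulated squared price change reaches the target variance.
devol.sample <- function(x, volatility) {
  keep <- 1
  acc  <- 0
  for (i in 2:length(x)) {
    acc <- acc + (x[i] - x[i - 1])^2
    if (acc >= volatility) {   # enough variance accumulated:
      keep <- c(keep, i)       # sample this point ...
      acc  <- 0                # ... and reset the accumulator
    }
  }
  keep                         # indices of the dv-series subsequence
}
```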

     *Daily/Weekly Histogram Plots:* 

      Financial market data exhibit seasonal structures over the day or
      week. This can be made explicit by daily or weekly histogram
     plots  of the data using the function 'xts.dwh'.

_V_a_l_u_e:

     *Date and Time Functions:* 

     'xjulian' 
      returns a numeric vector of Julian minute counts. 

     'xdate' 
      returns a numeric vector of ISO-8601 formatted dates/times, i.e.
     [CCYYMMDDhhmm]. 

     'xday.of.week' 
      returns a numeric vector with entries  between '0' (Sunday) and 
     '6' (Saturday). 

     'xleap.year' 
      returns a logical vector with entries TRUE or FALSE, indicating
     whether the date falls in a leap year or not. 

     *Filter and Outlier Detection:* 

     'fxdata.contributors' 
      returns a dataframe with the following columns: 'CONTRIBUTOR', 
     the code naming the contributor, a character string; 'COUNTS', 
     the counts, how often the contributor appeared in the file, an
     integer; 'PERCENT', the same in percent, a numeric value;
     'SELECT',  a logical. If TRUE the contributor belongs to the group
     of the  'n' market makers, otherwise not. 

     'fxdata.parser', 'fxdata.filter' 
      return a data frame with the same structure as 'x', i.e. a 
     standardized FX high frequency data file structure. 

     'fxdata.varmin' 
      returns a data frame with the same structure as 'x', i.e. a 
     standardized FX high frequency data file structure. The second
     column named 'DELAY' is not used and set to zero for each data
     record. The third column 'CONTRIBUTOR' is set to "MEAN", the
     method by which the variable minute record was evaluated. The last
     column 'FLAG' counts the number of values from which the variable
     minute data record was evaluated. 

     *De-seasonalization and de-volatilization:* 

     All functions besides 'xts.map' and 'xts.dwh' return a list with
     the following two components: 't', the date/time in ISO-8601
     format, [CCYYMMDDhhmm], and 'x', the transformed values of the
     input data records 'xts$x', a numeric vector. 

     'xts.map' 
      returns a list with the following two components: 'xmap', a numeric
     vector with the time intervals, 'ymap', a numeric vector, the
     values to be mapped. 

     'xts.dwh' 
       If 'daily' was selected, a list with the following two 
     components is returned: 'td', the daily histogram breaks, 'xd',
     the daily histogram frequencies.
      If 'weekly' was selected, a list with the following two 
     components is returned: 'tw', the weekly histogram breaks, 'xw',
     the weekly histogram frequencies.
      If 'both' was selected, a list with all four components is
     returned.

_N_o_t_e:

     These functions were written originally for R Version 1.5. Only
     minor changes were made to make these functions available for
     Version 1.9. Date and time classes are outdated, but the functions
     are still working.

     The file 'fdax97m.csv' is too large and therefore not part of  the
     'fBasics' distribution. Please contact _inf@rmetrics.org_.

_A_u_t_h_o_r(_s):

     Diethelm Wuertz for the Rmetrics R-port.

_R_e_f_e_r_e_n_c_e_s:

     ISO-8601 (1988);  _Data Elements and Interchange Formats -
     Information Interchange, Representation of Dates and Time_,
     International Organization for Standardization, Reference Number
     ISO 8601, 14 pages.

     Zhou B. (1995); _Forecasting Foreign Exchange Rates Subject to
     De-volatilization_,  in: Freedman R.S., Klein A.R., Lederman J.
     eds., Artificial Intelligence  in the Capital Markets, Irwin
     Publishing, Chicago, p. 137-156.

     Guillaume D.M., Dacorogna M.M., Dave R.R., Mueller U.A., Olsen
     R.B.,  Pictet O.V. (1997);  _From the bird's eye to the
     microscope:  a survey of new stylized facts of the intra-daily
     foreign  exchange markets_,  Finance and Stochastics 1, 95-129.

_E_x_a_m_p_l_e_s:

     ## xjulian - 
        xmpBasics("\nStart: Julian Counts > ")
        # Return the Julian minute counts for the last day of every 
        # month in year 2000 at 16:00, with origin January 1st, 2000:
        xjulian(c(20000131, 20000229, 20000331, 20000430, 20000531, 20000630,
          20000731, 20000831, 20000930, 20001031, 20001130, 20001231)*10000+1600,
          origin = 20000101)
          
     ## xdate - 
        xmpBasics("\nNext: Convert Julian Counts to Dates > ")
        # Manage Date/Time in Extended Date/Time Format, ISO-8601
        # Date: 1973-01-01 15:30
        xjulian(197301011530)
        print(xdate(xjulian(197301011530)), digits = 9)
       
     ## xday.of.week -
        # Calculate the day of week for 1973-01-01 16:15
        xmpBasics("\nNext: Compute Day of Week > ")
        xday.of.week(197301011615)
             
     ## xleap.year -
        xmpBasics("\nNext: Check for Leap Years > ")
        # Does February 1st, 2000 16:15 fall in a leap year?
        xleap.year(200002011615)    
        
     ## fxdata.contributors - 
        xmpBasics("\nStart: Filter Contributors > ")
        # Print contributor list:
        data(usdthb)
        usdthb[1:25, ]
        # Create contributor list:
        fxdata.contributors(usdthb, include = 5)
        
     ## fxdata.parser - 
        xmpBasics("\nNext: Parse Records > ")
        # Parse data:
        # Create a contributor list and mark the first 5 market makers:
        parser.table = fxdata.contributors(usdthb, include=5)
        # Parse the market makers and print the first 25 entries:
        fxdata.parser(usdthb, parser.table)[1:25,]
        
     ## fxdata.filter - 
        xmpBasics("\nNext: Filter Records > ")
        # Filter data and plot unfiltered data:
        par(mfrow = c(2, 1))
        NumberOfRecords = length(usdthb[,1])
        NumberOfRecords
        plot(usdthb[,4], type="l", 
             xlab = "Tick Number from Reuters THB=", 
             ylab = "100*log(Bid[n]/Bid[1])      Bid",
             ylim = c(-20,30), main="USDTHB June 1997 unfiltered")
        lines(x=c(1, NumberOfRecords), y=rep(usdthb[1,4], 2), col = 4)
        lines(-100*log(usdthb[1,4]/usdthb[,4]))
        lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
        # Filter the data:
        usdthb = fxdata.filter(usdthb, parameter="strong")
        # Quick And Dirty Time Scaling
        Records = length(usdthb$accepted[,4])
        scale = NumberOfRecords/Records
        # Plot filtered data:
        plot(x=(1:Records)*scale, y=usdthb$accepted[, 4], type = "l", 
             xlab = "Tick Number from Reuters THB=", 
             ylab = "100*log(Bid[n]/Bid[1])      Bid", 
             ylim = c(-20,30), main="USDTHB June 1997 filtered")
        y = rep(usdthb$accepted[1, 4], 2)
        lines(x=c(1, NumberOfRecords), y=y, col=4)
        y = -100*log(usdthb$accepted[1, 4]/usdthb$accepted[, 4])
        lines(x = (1:Records)*scale, y=y)
        lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
        
     ## fxdata.varmin - 
        xmpBasics("\nNext: Variable Minute Records > ")
        # Variable minute records from filter-accepted data,
        # create a varmin file and print the first 25 entries:
        fxdata.varmin(usdthb$accepted, digits = 5)[1:25, ]  
        
     ## xts.log - 
        xmpBasics("\nStart: Log Prices of FX Data > ")
        # Calculate log-prices from AUDUSD bid prices
        options(digits = 10)
        data(audusd)
        prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
        # Print the first 25 entries:
        log.prices = xts.log(prices)
        as.data.frame(log.prices)[1:25, ]
        
     ## xts.diff - 
        xmpBasics("\nNext: Returns of FX Data > ")
        # Calculate one-hourly AUDUSD log-returns
        prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
        # Calculate the returns and print the first 25 entries:
        data.frame(xts.diff(xts.log(prices)))[1:25, ]
        
     ## xts.cut - 
        xmpBasics("\nNext: Cut out a Piece From a FX File > ")
        # Retrieve the AUDUSD bid quotes for October 21, 1997, 16:00 
        prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
        # Retrieve prices and print the first 25 entries:
        data.frame(xts.cut(prices, from.date = 19971021, 
              to.date = 19971021))[1:25,]

     ## xts.interp - 
        xmpBasics("\nNext: Interpolation of FX Data > ")
        # Interpolate AUDUSD bid prices 
        # on a 15 minutes  time scale for October 21, 1997:
        prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
        # Interpolate the prices and print the first 25 entries:
        data.frame(xts.interp(prices, deltat = 15))[1:25, ]
        
     ## xts.map - 
        xmpBasics("\nNext: Create Business Time Map > ")
        options(object.size = 5e8)
        par(mfrow = c(2, 1))
        # Load and plot prices:
        data(fdax9710)
        index = list(t = fdax9710[,"XDATE"], x = fdax9710[,"FDAX"])  
        # Start on Monday - end on Sunday, 3 weeks:
        index = xts.cut(index, from.date=19971006, to.date=19971026)
        plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")   
        # Create hourly upsilon time map - start on Monday - end on Sunday:
        tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
        plot(x = tmap$xmap, y = tmap$ymap, ylim = c(0, max(tmap$ymap)), type="l", 
          main = "Time Mapping")   
        tmap 
        
     ## xts.upsilon -  
        xmpBasics("\nNext: De-seasonalize in Upsilon Time > ")
        index = list(t = fdax9710[,"XDATE"], x = fdax9710[,"FDAX"])  
        # Start on Monday - end on Sunday, 3 weeks:
        index = xts.cut(index, from.date = 19971006, to.date = 19971026)
        plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")   
        # Create hourly upsilon time map - start on Monday - end on Sunday:
        tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
        # Extract data records according to time map:
        index.ups = xts.upsilon(index, weekly.map = tmap$ymap, 
          main="Prices in Upsilon time")
         
     ## xts.dvs - 
        xmpBasics("\nNext: De-volatilize Time Series > ")
        index = list(t=fdax9710[,"XDATE"], x=fdax9710[,"FDAX"])  
        # Start on Monday - end on Sunday, 3 weeks:
        index = xts.cut(index, from.date=19971006, to.date=19971026)
        plot(index$x, type = "l", ylab = "Prices", main = "Prices in event time")    
        # Devolatilize Time Series With dv-Series Algorithm:
        index.dvs = xts.dvs(index, k = 8, 
          volatility = 13.15*var(diff(log(index$x))), main = "Prices from dv-series") 

     ## Not run: 
     ## xts.dwh -
        xmpBasics("\nNext: Plot daily/weekly Charts > ")
        # NOTE:
        # The file 'fdax97m.csv' is too large and therefore not part 
        # of this distribution. Please contact inf@rmetrics.org.
        data(fdax97m)
        xts = list(t = fdax97m[,"XDATE"], x = fdax97m[,"FDAX"])
        # Start on Monday - end on Sunday, 3 weeks:
        xts = xts.cut(xts, from.date = 19970106, to.date = 19971228)
        # Create Daily and Weekly Histograms:
        result = xts.dwh (xts, period = "both", dolog = TRUE, 
          dodiff = TRUE, deltat = 30, doplot = TRUE)
     ## End(Not run)      

