Time series prediction techniques have been used in many real-world applications such as financial market prediction, electric utility load forecasting, weather and environmental state prediction, and reliability forecasting. The underlying system models and time series data generating processes are generally complex for these applications, and the models for these systems are usually not known a priori. Accurate and unbiased estimation of the time series data produced by these systems cannot always be achieved using well known linear techniques, and thus the estimation process requires more advanced time series prediction algorithms. This paper provides a survey of time series prediction applications using a novel machine learning approach: the support vector machine (SVM).


Nicholas I. Sapankevych, Raytheon, USA
Ravi Sankar, University of South Florida, USA

Digital Object Identifier 10.1109/MCI.2009.932254
MAY 2009 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE

Introduction

The purpose of this paper is to present a general survey of Support Vector Machine (SVM) applications for time series prediction. This survey is based on publications and information found in technical books and journals, as well as other informative data sources such as SVM technology-oriented websites, including http://www.support-vector.net and http://www.kernel-machines.org. SVMs used for time series prediction span many practical application areas, from financial market prediction to electric utility load forecasting to medical and other scientific fields. Table 1 summarizes the number of SVM time series prediction publications in this survey with respect to application. As noted in Table 1, the two predominant (published) research activities are financial market prediction and electric utility forecasting. Several other applications are also listed, from control system applications to environment and weather forecasting to other applications involving non-linear processes.

It should be noted that the focus of this survey is on the applications and the general numerical accuracy of the SVM techniques associated with time series prediction. Where applicable, notes are made with respect to the training methodologies used to "tune" the SVMs for specific applications. Although training time (numerical computation time) is an important design criterion, most of the applications listed in this survey use data sets that are mainly static or change slowly with time (such as stock forecasting using daily closing prices). The reader is directed to the web based references listed in the reference section of this survey to learn more about training techniques.

Traditionally, SVMs, as well as other learning algorithms such as neural networks, are used for classification in pattern recognition applications. These learning algorithms have also been applied to general regression analysis: the estimation of a function by fitting a curve to a set of data points. The application of SVMs to the general regression analysis case is called Support Vector Regression (SVR) and is vital for many of the time series prediction applications described in this paper. For comparison, Table 2 contrasts selected attributes and challenges associated with some of the most common classic methods, artificial neural network (ANN) based time series prediction methods, and SVR. A more detailed performance summary of intelligent "tools" (i.e., ANN based methods) for time series prediction, specifically financial market time series prediction applications, can be found in Table 1 of [48].

The primary objective of this paper is to provide a survey of SVM time series prediction literature and data sources accompanied by the following: 1) a brief summary of the broad range of applications using SVM time series prediction methods, 2) a brief discussion of the observations generated from this survey with respect to the technical merits and challenges associated with SVM time series prediction, and 3) a resource for the reader to locate and research SVMs and their applications.

TABLE 1 Number of SVM time series prediction publications summarized in this survey, listed by application.

APPLICATION                             NUMBER OF PUBLISHED PAPERS
FINANCIAL MARKET PREDICTION                        21
ELECTRIC UTILITY FORECASTING                       17
CONTROL SYSTEMS AND SIGNAL PROCESSING               8
MISCELLANEOUS APPLICATIONS                          8
GENERAL BUSINESS APPLICATIONS                       5
ENVIRONMENTAL PARAMETER ESTIMATION                  4
MACHINE RELIABILITY FORECASTING                     3

Time Series Prediction Summary

The purpose of this section is to provide references and a general outline for time series prediction theory. There are vast amounts of technical references, books, and journal articles detailing time series prediction algorithms and theory for both linear and non-linear prediction applications. The reader is encouraged to research classical publications such as Orfanidis [1] and Kalman [2] for more details.

Fundamentally, the goal of time series prediction is to estimate some future value based on current and past data samples. Mathematically stated:

    x̂(t + Δt) = f(x(t − a), x(t − b), x(t − c), …),    (1)

where, in this specific example, x̂ is the predicted value of a (one dimensional) discrete time series x.

The objective of time series prediction is to find a function f(x) such that x̂, the predicted value of the time series at a future point in time, is unbiased and consistent (here i indexes the discrete time series values and N is the total number of samples). It should be noted that another measure of a predictor's goodness is efficiency as related to bias. The Cramér-Rao bound provides the lower bound for the variance of unbiased estimators [1]; if an estimator achieves this bound, it is said to be efficient. This analysis was not provided in any of the papers summarized in this survey.

Estimators generally fall into two categories: linear and non-linear. Over the past several decades, a vast amount of technical literature has been written about linear prediction: the estimation of a future value based on a linear combination of past and present values.

TABLE 2 Summary of advantages and challenges of classical, ANN based, and SVR time series prediction methods.
AUTOREGRESSIVE FILTER
  ADVANTAGES: CAN BE COMPUTATIONALLY EFFICIENT FOR LOW ORDER MODELS; CONVERGENCE GUARANTEED; MINIMIZES MEAN SQUARE ERROR BY DESIGN
  CHALLENGES: ASSUMES LINEAR, STATIONARY PROCESSES; CAN BE COMPUTATIONALLY EXPENSIVE FOR HIGHER ORDER MODELS

KALMAN FILTER
  ADVANTAGES: COMPUTATIONALLY EFFICIENT BY DESIGN; CONVERGENCE GUARANTEED; MINIMIZES MEAN SQUARE ERROR BY DESIGN
  CHALLENGES: ASSUMES LINEAR, STATIONARY PROCESSES; ASSUMES PROCESS MODEL IS KNOWN

MULTI-LAYER PERCEPTRON
  ADVANTAGES: NOT MODEL DEPENDENT; NOT DEPENDENT ON LINEAR, STATIONARY PROCESSES; CAN BE COMPUTATIONALLY EFFICIENT (FEED FORWARD PROCESS)
  CHALLENGES: NUMBER OF FREE PARAMETERS LARGE; SELECTION OF FREE PARAMETERS USUALLY CALCULATED EMPIRICALLY; NOT GUARANTEED TO CONVERGE TO OPTIMAL SOLUTION; CAN BE COMPUTATIONALLY EXPENSIVE (TRAINING PROCESS)

SVM/SVR
  ADVANTAGES: NOT MODEL DEPENDENT; NOT DEPENDENT ON LINEAR, STATIONARY PROCESSES; GUARANTEED TO CONVERGE TO OPTIMAL SOLUTION; SMALL NUMBER OF FREE PARAMETERS; CAN BE COMPUTATIONALLY EFFICIENT
  CHALLENGES: SELECTION OF FREE PARAMETERS USUALLY CALCULATED EMPIRICALLY; CAN BE COMPUTATIONALLY EXPENSIVE (TRAINING PROCESS)
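As a concrete illustration of the autoregressive filter row of Table 2, and of the lagged-regression framing of equation (1), the sketch below arranges a series into lagged windows and fits linear one-step-ahead weights by ordinary least squares. This is a minimal illustration under assumed conventions (uniform lags, function names are illustrative), not code from any surveyed paper.

```python
import numpy as np

def make_lagged_dataset(x, window=3, horizon=1):
    """Arrange a 1-D series into (features, target) pairs: each row of X
    holds `window` consecutive samples, and y holds the value `horizon`
    steps after the window, mirroring the framing of equation (1)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - window - horizon + 1
    X = np.stack([x[i:i + window] for i in range(n)])
    y = x[window + horizon - 1:][:n]
    return X, y

def fit_linear_predictor(X, y):
    """Least squares weights for a linear (autoregressive-style) predictor."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# An exact AR(1) process x(t) = 0.5 * x(t-1) is recovered perfectly.
series = [0.5 ** t for t in range(20)]
X, y = make_lagged_dataset(series, window=1, horizon=1)
w = fit_linear_predictor(X, y)   # w[0] is close to 0.5
```

The same lagged-window arrangement is what the non-linear (SVR) methods later in this survey consume as their input vectors; only the fitted function changes.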

Real world time series prediction applications generally do not fall into the category of linear prediction. Instead, they are typically characterized by non-linear models.

Support Vector Machines for Time Series Prediction

Support Vector Machines and Support Vector Regression are based on statistical learning theory, or VC theory (VC – Vapnik, Chervonenkis), developed over the last several decades. Many books, journal publications, and electronic references currently exist. The reader is directed to Vapnik's reference books [3, 4] and an introductory reference book by Cristianini and Shawe-Taylor [5] for further study. Brief, general descriptions of Vapnik's learning theory and SVM regression can be found in references [6-8]. Finally, many publicly available websites exist (at the time this paper was written) that also offer an extensive amount of information and software for SVMs; see references [94-104].

The Support Vector Machine (SVM), developed by Vapnik and others in 1995, is used for many machine learning tasks such as pattern recognition, object classification, and, in the case of time series prediction, regression analysis. Support Vector Regression, or SVR, is the methodology by which a function is estimated using observed data, which in turn "trains" the SVM. This is a departure from more traditional time series prediction methodologies in the sense that there is no "model" in the strict sense – the data drives the prediction.

Consider a set of time series data x(t), where t is a series of N discrete samples, t = {0, 1, 2, …, N−1}, and y(t + Δ) is some predicted value in the future (t greater than or equal to N). For a time series prediction algorithm, equation (1) defines a function f(x) whose output equals the predicted value for some prediction horizon. Using regression analysis, equations (2) and (3) define these prediction functions for linear and non-linear regression applications respectively:

    f(x) = (w · x) + b,    (2)
    f(x) = (w · φ(x)) + b.    (3)

If the data is not linear in its "input" space, the goal is to map the data x(t) to a higher dimension "feature" space via φ(x), then perform a linear regression in the higher dimensional feature space [11].

The goal is to find "optimal" weights w and threshold b, as well as to define the criteria for finding an "optimal" set of weights. First is the "flatness" of the weights, which can be measured by the Euclidean norm (i.e., minimize ‖w‖²). Second is the error generated by the estimation process, also known as the empirical risk, which is to be minimized. The overall goal is then the minimization of the regularized risk R_reg(f) (where f is a function of x(t)), defined as:

    R_reg(f) = R_emp(f) + (λ/2)‖w‖².    (4)

The scale factor λ is commonly referred to as the regularization constant, and this term is often referred to as the capacity control term. Its function is to reduce "over-fitting" of the data and minimize bad generalization effects. The empirical risk is defined as:

    R_emp(f) = (1/N) Σ_{i=0}^{N−1} L(x(i), y(i), f(x(i), w)),    (5)

where i is an index into the discrete time series t = {0, 1, 2, …, N−1}, y(i) is the "truth" data (training set) of the predicted value being sought, and L(·) is a "loss function" or "cost function" to be defined.

Two of the more common loss functions are the ε-insensitive loss function defined by Vapnik and the quadratic loss function typically associated with the Least Squares Support Vector Machine (LS-SVM). The details of the LS-SVM development can be found in [9, 10]. To solve for the optimal weights and minimize the regularized risk, a quadratic programming problem is formed (using the ε-insensitive loss function):

    minimize (1/2)‖w‖² + C Σ_{i=1}^{N} L(y(i), f(x(i), w)),

where

    L(y(i), f(x(i), w)) = |y(i) − f(x(i), w)| − ε   if |y(i) − f(x(i), w)| ≥ ε,
                        = 0                          otherwise.    (6)

This minimization is referred to as the regularized risk function. The constant C also absorbs the (1/N) summation normalization factor, and ε is the "tube size," referring to the precision by which the function is to be approximated. Both ε and C are user defined constants and are typically computed empirically. It is inherently assumed that a function f(x) actually exists and the optimization problem is feasible; however, errors may have to be accepted to make the problem feasible. To account for these errors, "slack variables" are typically introduced.

Solving for the optimal weights and bias values is an exercise in convex optimization, which is made much simpler by using Lagrange multipliers and forming the dual optimization problem given by (7):

    Maximize: −(1/2) Σ_{i,j=1}^{N} (α_i − α_i*)(α_j − α_j*)⟨x(i), x(j)⟩ − ε Σ_{i=1}^{N} (α_i + α_i*) + Σ_{i=1}^{N} y(i)(α_i − α_i*)

    Subject to: Σ_{i=1}^{N} (α_i − α_i*) = 0,  α_i, α_i* ∈ [0, C].    (7)

The solution for the weights is based on the Karush-Kuhn-Tucker conditions, which state that at the point of the optimal solution the product of the variables and constraints equals zero. Thus the approximation of the function f(x) is given as the sum of the optimal weights times the dot products between the data points.
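The ε-insensitive loss of equation (6), which determines which training points end up contributing as support vectors, is simple to state directly in code. The sketch below is a plain restatement of equations (5) and (6) with illustrative function names, not an implementation from any surveyed paper.

```python
def eps_insensitive_loss(y, f, eps=0.1):
    """Equation (6): residuals inside the eps 'tube' cost nothing;
    outside it, only the excess beyond eps is penalized (linearly)."""
    r = abs(y - f)
    return r - eps if r >= eps else 0.0

def empirical_risk(ys, fs, eps=0.1):
    """Equation (5) with L taken as the eps-insensitive loss: the mean
    loss over the N training pairs (y(i), f(x(i), w))."""
    return sum(eps_insensitive_loss(y, f, eps) for y, f in zip(ys, fs)) / len(ys)

# A residual of 0.05 sits inside the 0.1 tube (zero loss);
# a residual of 0.30 is penalized only for the 0.20 excess.
losses = [eps_insensitive_loss(1.0, 1.05), eps_insensitive_loss(1.0, 1.30)]
```

The flat region of this loss is what produces the sparseness discussed next: training points whose residuals stay inside the tube receive zero Lagrange multipliers and drop out of the final predictor.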

The approximating function is then:

    f(x) = Σ_{i=1}^{N} (α_i − α_i*)⟨x, x(i)⟩ + b.    (8)

Those data points on or outside the ε tube with non-zero Lagrange multipliers α are defined as support vectors. As can be seen, the number of data points with non-zero Lagrange multipliers (and hence non-zero optimal weights) is typically less than the entire data set, meaning one does not need the entire data set to define f(x). The sparseness of this solution is one of several advantages of this methodology.

To carry out non-linear regression using SVR, it is necessary to map the input space x(i) into a (possibly) higher dimension feature space φ(x(i)). Noting that the solution of the SVR relies on the dot products of the input data, a kernel function that satisfies Mercer's conditions can be generated as:

    k(x, x′) = ⟨φ(x), φ(x′)⟩,    (9)

which can be directly substituted back into equation (8), and the optimal weights w can be computed in feature space in exactly the same fashion. There are several kernel functions that satisfy Mercer's conditions (required for the generation of kernel functions), such as the Gaussian, polynomial, and hyperbolic tangent kernels. The use of kernels is the key in SVM/SVR applications: it provides the capability of mapping non-linear data into "feature" spaces that are essentially linear, where the optimization process can be duplicated as in the linear case. The use of Gaussian kernels appears to be the most prevalent choice, but typically empirical analyses are necessary in selecting an appropriate kernel function. SVR and its derivation are described in detail in publications found in [11-27], especially Smola and Schölkopf [13].

The resulting SVR architecture is given in Figure 1 (reproduced here from Figure 2 in [13]).

FIGURE 1 SVM architecture: the test vector x and the support vectors x_i are mapped into feature space via Φ, their dot products form the kernel values k(x, x_i), and these are weighted by α_i and summed with the bias b to form the output Σ α_i k(x, x_i) + b.

There are several Quadratic Programming (QP) methods that can be used for training SVMs, and most of the algorithms are publicly available (see the text references [3-5] and the general web based references found in [80-87]). The Sequential Minimal Optimization (SMO) algorithm, developed by Platt in 1999 and described in detail in [5, 13], is the most popular of the various methods that appear in the application summaries. It is beyond the scope of this survey paper to analyze and compare training algorithms, but the algorithms play an important role in the implementation of the SVR for practical applications and are mentioned in the summaries of these applications.

Financial Data Time Series Prediction Using Support Vector Regression

Of all the practical applications using SVR for time series prediction, financial data time series prediction appears to be the most studied, along with electrical load forecasting. Twenty one research papers are listed in the references (in chronological order) detailing SVR applications for specifically predicting stock market index (time series) values and miscellaneous financial market time series data. The inherently noisy, non-stationary and chaotic nature of this type of time series data appears to lend itself to the use of non-traditional time series prediction algorithms such as SVR. Many different variations of SVR, and combinations of SVRs with other learning techniques, are found for financial time series prediction and are summarized below.

Trafalis and Ince [28] compared SVR to more traditional neural network architectures, including feed forward multilayer perceptrons using back propagation and radial basis functions, for the prediction of stock price indices. Using the ε-insensitive loss function and several different quadratic optimization algorithms, the authors demonstrated the SVR's superior performance over the other NN based applications for a very small time window of three samples and a very small prediction horizon of one sample.

Tay and Cao [29] studied the use of SVR for predicting five specific financial time series sources, including the S&P 500 and several foreign bond indices. The results were compared to a feed-forward MLP using back propagation for a prediction horizon of five samples (days). The data was "pre-processed" by applying an exponential moving-average window, and outliers (identified as data beyond two standard deviations) were replaced with relatively close values. The data was broken down into three sets: a training set, a validation set, and a test set (typical for neural network training methodologies). The SVR significantly outperformed the BP NN. They conclude that the ability of the SVR to appropriately fit the data (as compared to the over-fitting issues related to MLP based NNs) is one key reason for the better performance. They published several other related SVR applications for financial data time series prediction [31, 32, 33, 39]. An alternative architecture using a "mixture of experts (ME)" approach is presented in [31]. This is a two stage approach, with the first stage being a self-organizing feature map (SOM) and the second stage containing a set of SVR "experts". The parameters for the kernel functions used, such as C and ε, were essentially derived empirically.
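Equation (9) and Mercer's condition discussed above can be checked numerically: on any finite sample, a valid kernel's Gram matrix is symmetric positive semidefinite. The sketch below does this for the Gaussian kernel named in the text; it is an illustration with assumed parameter names (e.g., `gamma`), not code from the surveyed papers.

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    """Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2), one of the
    Mercer kernels mentioned in the text."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def gram_matrix(X, gamma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j): the quantity the dual problem (7)
    and the predictor (8) consume in place of explicit feature maps."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = rbf_kernel(X[i], X[j], gamma)
    return K

# Mercer's condition shows up numerically as symmetry plus
# non-negative eigenvalues of the Gram matrix on any sample.
X = np.random.default_rng(0).normal(size=(8, 3))
K = gram_matrix(X)
eigs = np.linalg.eigvalsh(K)
```

This is the "kernel trick" in miniature: the optimization never needs φ(x) itself, only the N×N table of pairwise kernel values.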

The overall approach was shown to have not only better prediction performance as compared to a single SVR approach, but also superior convergence speed.

"Signal processing, control and communications systems applications face an additional challenge of being highly sensitive to computation time, as expected in real-time signal processing applications."

In [32], Tay and Cao proposed a modified version of SVR for financial time series prediction called C-ascending SVMs. The goal of this approach is to weight the most current (in time) ε-insensitive errors and de-weight the more distant ones – analogous to the discounted least squares approach. Both linear and exponential weighting functions were tested against several stock indices including the S&P 500. They conclude that better performance (for a five sample prediction horizon) can be obtained using this method as compared to a standard SVR implementation. They proposed another adaptive approach and modification to the SVM – the ε-descending support vector machine (ε-DSVM) [33]. Instead of the regularization constant changing with time, the tube width was varied with time and was weighted exponentially, with the most recent data points being penalized the most; every training data point uses a different tube size ε. For both simulated data (weighted sinusoids) and financial data sets (stock indices including the S&P 500), a better overall performance in NMSE was found using the ε-DSVM with a five sample prediction horizon. In [39], Cao and Tay proposed the Adaptive SVM (ASVM), which modifies both the tube size ε (see [33]) and the regularization constant C (see [32]). Increasing ε will decrease the number of support vectors (support vectors in SVR are the points on or outside the ε tube – the larger ε, the smaller the number of support vectors). The decrease in support vectors represents a sparser solution, with a tradeoff in prediction accuracy. The more recent time samples were given more weight and had greater influence on the solution. As compared to a weighted back propagation MLP, the ASVM showed better performance for five selected stock indices.

Van Gestel et al. [30] proposed the use of an LS-SVM in a Bayesian evidence framework. Both a point time series prediction model and volatility models for financial stock index prediction are developed in this paper. A marginal improvement in MSE, MAE, and Negative Log Likelihood (NLL) was found using this method compared to other traditional methods, such as auto regressive models, using US short term T-bill and DAX30 market data.

Yang et al. [34] proposed a non-fixed and asymmetrical margin, along with the use of momentum, to improve the SVR's ability to predict financial time series. The ε-insensitive loss function is modified to have different "upside" and "downside" margins (ε_u and ε_d) based on the standard deviation of the input data; the "margin" is the linear combination of this standard deviation and the momentum term. By applying these time varying parameters to the loss function, the authors showed that the MAE for one step ahead prediction of the Hang Seng Index (HSI) and the Dow Jones Industrial Average (DJIA) was improved vs. using standard AR and RBF models. They presented a similar discussion of asymmetrical margin determination for SVR, based on the standard deviation of the data, in [35]; a more thorough discussion of this approach can be found in H. Yang's thesis [37]. In [44], the same authors propose a two phase SVR training method for detecting outliers in the data, thus reducing the prediction error (RMSE and MAE in this case). In contrast to their earlier work, the "upside" (ε_u) and "downside" (ε_d) margins are adaptable, and they extend their asymmetric margin approach in [34] to be adaptable relative to the slack variables (also time dependent). The results showed a small increase in prediction performance at the price of retraining the SVR.

Abraham et al. [36] compared the one-step ahead time series prediction performance of an ANN using the Levenberg-Marquardt algorithm, an SVM, a Takagi-Sugeno neuro-fuzzy model, and a Difference Boosting Neural Network (DBNN). Only a brief description of SVM for classification applications was provided. For one step ahead prediction of the Nasdaq-100 index and the NIFTY index, the SVM performed marginally better. In [41], Abraham and Yeung extended the work in [36] by combining the outputs of the four approaches. The combining is done in two ways: 1) a direct approach using source selection, with the lowest absolute error of the four methods as the decision criterion, and 2) using a Genetic Algorithm (GA) to optimize the RMSE, MAP, and MAPE (see [41] for fitness function specifics of the GA). Again using the Nasdaq-100 index and the NIFTY index, the one-step-ahead prediction of these intelligent paradigm approaches showed that the direct approach outperformed the GA approach.

Ongsritrakul and Soonthronphisaj [38] combined several approaches, including MLPs, decision trees, and SVRs, to predict the gold price. The decision tree feeds "factors" into the SVR time series prediction process, which then serves as input to a linear regression model, an MLP, and an SVR to predict the gold price. The MSE, MAD, and MAPE were computed for the three models, and the SVR appears to outperform the other two models.

Liang and Sun [40] proposed an adaptive method for modifying the kernel function during the training of an SVR. Using a Gaussian RBF kernel, the authors proposed to modify the kernel function based on the method of information geometry. Using a conformal mapping, a new method was introduced that improves the precision of forecasting.

An optimal partition algorithm (OPA) was used to modify the kernel, making the kernel data dependent. Results using S&P 500 time series data as well as Shanghai Stock Exchange (CISSE) data were presented and compared to an unmodified SVR.

Kim [42] used an SVM, not an SVR, for prediction of the Korea composite stock price index (KOSPI). By selecting twelve "technical indicators" (i.e., features), he used the SVM to predict the direction of the daily change in the stock price. This is a slightly different application in the sense that the SVM only predicts the daily direction, not the actual index price itself (an SVR application). As compared to a three layer MLP using back propagation and to case-based reasoning (CBR – in this application, a nearest-neighbor implementation), the SVM approach provided better results than the other two approaches in predicting the index direction of change.

Bao et al. [43] proposed the use of SVR with the ε-insensitive loss function and an RBF kernel function to predict, five days (samples) ahead, the stock price of Haier, Inc. (Shanghai Stock Exchange). The procedure and normalization of the data is very similar to the work of Cao and Tay.

Similar to Kim [42], Huang et al. [45] proposed the use of an SVM for predicting the direction of the NIKKEI 225 index based on several inputs including interest rates, CPI, and other market attribute data. They compared this performance to linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and an Elman back propagation neural network (EBNN). In addition, they proposed a "combining model" that weights the output of the SVM with the other classification methods to produce a prediction. The authors stated that, individually, the SVM performed the best (highest "hit" ratio), and the "combined model" performed slightly better than the SVM.

Bao et al. [46] proposed a Fuzzy Support Vector Machines Regression (FSVMR) method for predicting the stock composite index of the Shanghai Stock Exchange. They stated that for two state classification problems, some of the input data points are corrupted by noise and should (possibly) be discarded, while others that are marginal but important should be assigned to a class. Using an ε-insensitive loss function and an RBF kernel function, the FSVMR was trained using a cross-validation method to find the variable parameters. They showed that this approach was more effective in NMSE than stand-alone SVR approaches.

Cao et al. [47] used an SVR with the ε-insensitive loss function to make one step ahead predictions of the British Pound to American Dollar exchange rate. Selecting the optimal parameters empirically using a validation data set, the authors stated that this method performed well, but resulting offsets (time shifts) move the regression curve to the right (noted by the authors for future study).

The last financial paper, by Quek and Ng [48], described a Genetic Complementary Learning (GCL) method for stock market prediction. Although the focus of this paper is on GCL, financial time series performance measurements were made against other traditional NN approaches including the SVM, and the SVM performed as well as the other methods with respect to the root mean square error (RMSE), except for the proposed GCL method. There is a comprehensive table in this reference that lists twenty alternative stock prediction methodologies published within the last ten years.

General Business Applications Using Support Vector Regression for Time Series Prediction

Following are summaries of five papers that describe the use of SVR for time series prediction relative to the following general business applications: (1) electricity price forecasting, (2) credit rating analysis, (3) customer "churning" in auto insurance market prediction, (4) financial failure of dotcoms, and (5) production value prediction of the Taiwanese machinery industry.

Sansom et al. [49] compared the performance of an SVR vs. an MLP for predicting Australian National Electricity Market prices one week ahead (seven samples). Using 90 days of electricity price data, they showed that the SVR outperformed the MLP in training time but obtained similar accuracy (MAE), which may have been due to the way the training data was selected and used.

Huang et al. [50] investigated the use of an SVM to estimate corporate credit ratings. This is not a traditional SVR application, but an SVM classification problem. They used two data sets, Taiwan and US corporate rating data, for training both an SVM with 21 variables as input and an ANN using back propagation. The authors stated that the SVM outperformed the BP results for both data sets.

Hur and Lim [51] used an SVM for predicting the customer "churn" ratio for auto insurance market prediction. In this application, "churn" represents a customer changing from one auto insurance company to another due to the increased accessibility of customers to "on-line" insurance vendors. As in the business application of Huang et al. [50], this application does not use an SVR methodology. Fifteen variables were selected as input to an SVM, and the SVM was trained to predict the "churn" ratio (see [51]). The SVM was shown to outperform an ANN.

Bose and Pal [52] analyzed the fate of failed dotcoms using an SVM (as in Huang et al. [50] and Hur and Lim [51]), not an SVR. The goal is to train the SVM to determine whether a dotcom would succeed (a "1" classification) or fail (a "0" classification) based on twenty four "financial ratios," such as the total debt to total assets ratio, the net income to total assets ratio, etc., as the vector input to the SVM. The SVM parameters were determined empirically, and the results show that it was easier to classify a surviving dotcom company than a failed one.

Pai and Lin [53] discussed the use of an SVR for predicting the one-step ahead production values of the Taiwanese machinery industry. They compared the performance of the standard SVR, using the ε-insensitive loss function and a Gaussian kernel, to a Seasonal Auto-Regressive Integrated Moving Average (SARIMA) method and a general regression neural network (GRNN).

8.Jenkins) was used to model long time period variations such as seasonal dependencies. Of all the practical applications using SVR for time series Using the MAE, MAPE, RMSE and NMSE prediction, financial data time series prediction as performance metrics, they showed that the SVR approach outperformed the other appears to be the most studied along with electrical two methods, especially in the MAE and load forecasting. MAPE metrics. the optimization problem of the LS-SVR is essentially a Environmental Parameter Estimation Using Support matrix calculation. Vector Regression for Time Series Prediction Prem and Srinivasa Raghavan [57] applied SVR in use SVR has been used for the prediction of environmental with the Network Weather Services – a “grid” of computa- parameters such as air quality parameters rainfall estimation tional nodes used for weather prediction. Their goal was to and detection, and weather forecasting. The following sec- optimize the network parameters such that the final weather tion summarizes the four papers for environmental parame- forecast output is the most accurate given the constraints of ter estimation: the computational architecture and network topology (i.e., Lu et al. [54] proposed the use of SVR to forecast air quali- QoS). By accurately predicting the need for different ty parameters. The determination of short term air quality resources required, the overall system can adapt more effi- relies on the use of non-linear regression methods such as ciently and provide better forecasting results (essentially, this SVR. The input data was respirable suspend particles (RSP) as is a dynamic scheduling problem for providing and main- collected with other major pollutants such as nitrogen oxides, taining Weather Prediction Services). As compared to other etc. 
Using SMO for training and a Gaussian kernel function AR methods, the SVR outperformed the other methods, with arbitrarily selected parameters, the SVR performance especially in multi-step ahead prediction of CPU time and (MAE) for predicting one week ahead outperformed an RBF network bandwidth. network using the same data set. A sensitivity analysis was pro- vided for the free parameters (regularization constant, kernel Electric Utility Load Forecasting Applications Using constants, etc.) and there was no set heuristic for determining Support Vector Regression for Time Series Prediction these parameters. A non-linear prediction problem found in power systems Wang et al. [56] extended their environmental pollution research is the forecasting of electrical power consumption prediction work from [54] and compare an SVR approach demands by consumers. There are many beneficial aspects to to an adaptive radial basis function (ARBF) network and an the accurate prediction of electrical utility load forecasting ARBF using principle component analysis (PCA). Again, including proper maintenance of electrical energy supply, the they tried to forecast respirable suspended particulate (RSP) efficient utilization of electrical power resources, and the proper concentrations with a 72 hour forecast horizon. The free administration and dissemination of these resources as related parameters of the SVR were determined empirically using to the cost of these resources to the consumer. Seventeen MAE, RMSE, and Wilmott’s index of agreement (WIA) as research papers concerning electricity load forecasting are metrics for indicating the most accurate predictions. For the summarized below: presented data, the SVM outperformed the ARBF network Chang et al. [58] proposed an SVR approach for the and the ARBF/PCA network for the three day ahead pre- EUNITE Network Competition which is the prediction of diction horizon. 
They implied, as in [54], the challenge daily maximal electrical load of January 1999 based on tem- remains to find a suitable heuristic to determine the free perature and electricity loading data from 1997 to 1998. It is parameters of the SVR. interesting to note that there is clearly a periodic component Trafalis et al. [55] applied both SVR and Least Squares within the data set due to the seasonal variation of consumer (LS) SVR methodologies to predict rainfall estimation and electricity demand, “holiday” effects (use of less electricity the presence of rain using WSR-88D radar data (also known during major holidays), and the impact of weather on elec- as NEXRAD). For the rainfall rate estimation problem, the tricity demand. Their inputs were several attributes, including LS-SVR using a polynomial kernel (refer to [55] as well as binary attributes for indicating which day of the week it is, is work from Suykens [9, 10] for more information on LS- it a holiday, etc. From these attributes, they formulated the SVR) outperformed the SVR using a Gaussian RBF kernel predicted max load, which is a numerical value. They con- and a linear regression technique in the MSE. For the detec- cluded that the use of the temperature data did not work as tion of rainfall (SVM type classification problem), they men- well because of the inherent difficulty in predicting tempera- tioned that the SVR slightly outperformed the LS-SVR for ture and they also concluded that this SVR approach was fea- the detection (classification of rain existence) of rain in a geo- sible for determining an accurate prediction model. Chen graphic grid. The authors pointed out that the use of the LS- et al. [62] approach described in [58] was the winning SVR loses the sparseness quality of the representation of the approach for the EUNITE Network Competition. The paper solution as compared to the SVM, noting that the solution of describes the SVM implementation. 
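The error metrics that recur throughout these comparisons (MAE, MAPE, RMSE, and NMSE) follow the standard formulas; a minimal plain-Python sketch (the surveyed papers prescribe no particular implementation):

```python
import math

def mae(y_true, y_pred):
    # mean absolute error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # mean absolute percentage error (requires nonzero true values)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # root mean squared error
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def nmse(y_true, y_pred):
    # MSE normalized by the variance of the true series
    n = len(y_true)
    mean_t = sum(y_true) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mse / var_t
```

Note that predicting the series mean everywhere yields an NMSE of exactly 1, which is why NMSE values below 1 indicate a model that beats the trivial mean predictor.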
With respect to the design details of [62], it is interesting to note that the use of temperature in their model actually decreases the accuracy of their predictions, which they state as sensitivity of the variance of the output to improper temperature estimations. They experimented with inputs excluding the previous (in time) load data and found poorer performance (note that the inputs to the SVR are not only time series load data).

Mohandes [59] compared the results of a standard SVR, using a sigmoid kernel function and the ε-insensitive loss function, to a standard autoregressive model of order one for short term electrical load forecasting (short term meaning a prediction horizon of less than one week). The preprocessing of the data included the elimination of annual periodicity and linearly increasing trends from the data. The author showed that the RMSE of the SVR was much lower than that of the AR method, especially as the number of training points was increased.

The electricity supply industry (ESI), generator companies, and other electrical utility load entities depend on load forecasting to maximize revenues. Sansom and Saha [60] therefore proposed the use of an SVM (not SVR) for forecasting wholesale (spot) electricity prices and compared its performance to a linear regression (as a sum of sine waves) and an ANN trained using back propagation. In their research, the inputs to these prediction methodologies were a set of 14 "attributes" including the spot price at different previous time samples, required capacity, etc. The SVM approach appeared to work well as compared to the other methodologies, but the authors stated that this approach, under certain circumstances where selected data points (attributes) were removed from the problem, performed far worse than the other approaches with respect to MAE. They mentioned that the superior SVM performance with all the data may have been "luck" and recommended further research.

Tian and Noore [61] proposed the use of an SVM to predict a wide span of forecast horizons (hourly and several days ahead) of electrical load using measurements from Berkeley, California. Their modeling also takes temperature and humidity into account as factors in training the SVM, along with a normalization of the data to the range [0, 1]. As compared to a cosine radial basis function neural network and a feed forward neural network, the SVM approach using electrical load, temperature, and humidity outperformed the other methods in MSE, RMSE, and the Durbin-Watson d statistic.

Dong et al. [63] discussed the use of SVM to predict "landlord energy consumption" – the electrical load necessary for large commercial buildings to operate normally (use of air conditioning, elevators, etc.). Their work considers other factors associated with the electrical load forecasting problem such as temperature, humidity, and global solar radiation. Their selection of an RBF kernel is based on stated shortcomings of other kernel functions such as the polynomial or sigmoid (complexity, for example). The authors stated that this approach is superior to other NN based approaches in performance and in the small number of model parameters to select.

Bao et al. [64] proposed the use of a self-organizing map (SOM) along with an SVM to predict short term electrical load forecasts based on EUNITE competition data. The purpose of the SOM is to cluster the training data based on time sample (i.e., day) and correlate the same weather conditions found on the training day(s) to the present day's weather conditions. In terms of performance, the authors stated that this hybrid approach outperforms the SVM by itself. It should also be noted that smoothing (preprocessing) the data, in their case, worsened the MAPE performance.

Pai and Hong [65] proposed a Recurrent Support Vector Machine with Genetic Algorithms (RSVMG) for the forecasting of electrical loads. The Genetic Algorithms (GAs) were used to determine the free parameters of the SVMs, specifically the regularization constant (C), the tube size (ε), and the Gaussian kernel parameter σ. A recurrent SVM (RSVM) was detailed as one of their approaches, which uses a standard MLP with back propagation combined with the SVM architecture. The output of the ANN was fed back (recurrent) to the inputs of the MLP prior to the SVM architecture. The authors compared the RSVMG approach to the SVMG model and the ANN model, and the results show, with respect to MAE, the superior performance of using the GA approach to select model parameters, as well as of introducing feedback (recursion) into the NN architecture.

Ji et al. [66] proposed the use of mutual information (MI – the computation of Shannon's entropy) to select the "best" input variables, i.e., the data points that maximize MI. Then, an LS-SVM is trained to make the prediction up to six samples ahead. The first two-thirds of the data set (Poland Electricity Dataset) were used to train the LS-SVM. Two methodologies were compared: a "direct" forecast, where the prediction horizon is calculated directly from N samples, and a "recursive" forecast, where a one-step ahead prediction is calculated repeatedly up to the desired prediction horizon (six samples in this case). The authors stated that direct prediction performed better in MSE than recursive prediction.

Zhang [67] discussed the use of SVM for short-term load forecasting. The author stated that most linear models, such as Kalman filtering, AR, and ARMA models, are not typically sufficient to model the nonlinearities associated with short term load forecasting processes. The use of SVR, with both electrical load data and corresponding weather time series data, appears to outperform other NN based techniques, including a back-propagation neural network. The author also used cross validation to select the free parameters of the RBF kernel function as well as the regularization constant. The MAPE of the SVM approach was lower than that of the BPNN.

Li et al. [68] proposed the use of both SVR and a "Similar Day Method" to predict next day forecasts of electrical loads. The purpose of the "Similar Day Method" was to identify days where the sampled data (electrical load, weather, etc.) is similar to the present day and use this information to "amend" the result given by the SVR. This essentially "corrects" the output of the SVR. The authors stated that this method is an effective short term load forecasting method as compared to using the SVM alone.

Pai and Hong [69] discussed the use of SVM for short term load forecasting using a simulated annealing (SA) algorithm, which is based on the annealing process of material physics, to select the SVM parameters. The simulated annealing algorithm combined with SVR is called SVMSA. Essentially, initial values of σ, C, and ε (the free parameters associated with the kernel function and the loss function) are set, and the SA algorithm selects a "provisional" state by randomly adjusting these free parameters. This procedure is repeated until a final state is found where the MAPE of the SVM is at some acceptable level. This technique was compared to ARIMA and a general regression neural network (GRNN) [53], and for Taiwanese electricity load data, this technique significantly outperformed the two other methods.

Similar to the work by Pai and Hong [65], Hsu et al. [72] described an alternative genetic algorithm based approach to selecting SVR parameters (GA-SVR). Using the real-valued genetic algorithm (RGA), they designed a GA-SVR using the same data as used in the EUNITE Network competition as described in Chang et al. [58, 62]. The GA was used to adaptively select the regularization parameter and the sigma value of the Gaussian kernel function. Using MAPE (the same metric used for the EUNITE competition), RMSE, and Max Error metrics, the authors showed that the use of a genetic algorithm to adaptively select the parameters of the SVR outperformed the winners of the EUNITE competition [58, 62].

He et al. [74] proposed a novel hybrid algorithm for short term electrical load forecasting.
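The simulated annealing search that Pai and Hong [69] describe (randomly perturb σ, C, and ε to form a "provisional" state, keep it probabilistically, and repeat until the error is acceptable) can be sketched generically. The objective function below is a placeholder for "train an SVR and return its validation MAPE"; the cooling schedule, step sizes, and parameter ranges are illustrative assumptions, not values from [69]:

```python
import math
import random

def anneal_svr_params(objective, init, scale, t0=1.0, cooling=0.95, steps=200, seed=1):
    """Simulated-annealing search over SVR free parameters (e.g., sigma, C, epsilon).

    objective: maps a parameter dict to a validation error (e.g., MAPE).
    init: starting parameter values; scale: perturbation size per parameter.
    """
    rng = random.Random(seed)
    current = dict(init)
    current_err = objective(current)
    best, best_err = dict(current), current_err
    temp = t0
    for _ in range(steps):
        # propose a "provisional" state by randomly adjusting the free parameters
        candidate = {k: max(1e-9, v + rng.uniform(-scale[k], scale[k]))
                     for k, v in current.items()}
        err = objective(candidate)
        # always accept improvements; accept worse states with probability e^(-delta/T)
        if err < current_err or rng.random() < math.exp(-(err - current_err) / temp):
            current, current_err = candidate, err
        if current_err < best_err:
            best, best_err = dict(current), current_err
        temp *= cooling  # cool the acceptance probability over time
    return best, best_err
```

With a toy objective whose minimum lies at known parameter values, the search moves steadily toward them; in practice the objective would wrap an SVR training and validation run, which dominates the cost of each step.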
They proposed using an selects a “provisional” state by randomly adjusting these free ARIMA model to estimate the linear portion of the electrical parameters. A repetition of this procedure is executed until a load time series data and an SVM to estimate the nonlinear final state is found where the MAPE of the SVM is found to residual, where the residual is the difference between the load be at some acceptable level. This technique is compared to the data and the linear estimation. The underlying assumption was ARIMA and general regression neural network (GRNN) [53] that the system model can be divided (equally) into a sum of and for Taiwanese electricity load data, this technique signifi- a linear and non-linear representation. Using MAPE as the cantly outperformed the two other methods. accuracy criteria, the single sample prediction horizon results Wu and Zhang [70] presented a hybrid of several were several percentage points better than the time series approaches for forecasting electrical load. Based on the model by itself. assumption that the electrical load data exhibits both chaotic and periodic behavior, they employed wavelet transforms as Machine Reliability Forecasting Applications Using well as an average mutual information (AMI) algorithm (based Support Vector Regression for Time Series Prediction on chaos theory) along with a Least Squares Support Vector Three papers are summarized below for the prediction of Machine (LS-SVM) to predict the maximal electrical load of machine reliability from mechanical vibration time series sig- EUNITE competition data. Based on other EUNITE publi- nals, automotive related reliability measures, and engine reliabil- cations of prediction results, this technique was claimed by the ity via prediction of MTBF using SVR. The prediction of authors to be superior in performance relative to MAPE and machine reliability is typically non-linear and several traditional maximal error (ME). 
The authors concluded that the selection (ARIMA as an example) and ANN approaches have been stud- of LS-SVM parameters is “tough” (i.e., assumed selected ied regarding this application; however, the use of SVR for this empirically). particular application has not been widely studied. Espinoza et al. [71] discussed an alternative solution to solv- Yang and Zhang [75] compared the use of an SVR and LS- ing Least Square Support Vector Machines using electrical load SVM vs. a back propagation neural network (BPNN), an RBF forecasting as an example application (note Suykens is a network, and a GRNN for predicting vibration time series sig- co-author for this publication). The goal was not to solve the nals related to the mechanical condition of machinery. For LS-SVM in dual space, but rather in primal space using eigen short term prediction (one step ahead prediction), the SVR value decomposition.The authors also stated that typically there using a Gaussian kernel outperformed all of the other methods are large data sets associated with these kinds of applications including the LS-SVM. For long term prediction (24 samples), and using a sparse representation of the data could provide the RBF network performed better with respect to the NMSE computational benefits. The entropy maximization was pro- as compared to the two SVM methods. Hong et al. [76] posed as one possible technique for generating subsamples of discussed the use of SVMG and RSVMG [65] models for the data. Using an RBF kernel function and cross-validation predicting the “period reliability ratio” for the automotive technique for parameter selection, the authors showed that the industry based on time series data containing vehicle damage maximum MSE found was less than 3% for one hour and 24 incidents and the number of damages repaired. For one-step hour ahead prediction. The authors present a more detailed ahead forecasting, the RSVMG model outperformed ARIMA, version of [71] in [73]. 
BPNN, ICBPNN and SVMG (no feedback) methods with MAY 2009 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 33

11. described in this paper could be used to SVR has been used for the prediction of environmental predict time series data specifically. parameters such as air quality parameters rainfall Huang and Cheng [80] proposed two different algorithms for admission control estimation and detection, and weather forecasting. and traffic scheduler schemes for internet web servers. The process of web client ser- respect to the RMSE. The key to this approach was the use of vicing is usually first come first serve, a technique that is not both a genetic algorithm and the use of feedback (recurrent well suited to handle “bursty” loads, which, in turn, can nega- network architecture) to aid in the selection of the free param- tively impact cost of internet sales vendors in the form of lost eters of the SVR. Hong and Pai [77] compared the SVR to transactions. The authors proposed a prediction mechanism to three other models (Duane, ARIMA, and GRNN) for the pre- forecast total maximum arrival rate and maximum average diction of engine failure. The authors noted that the prediction waiting time for priority groups using SVR and a fuzzy logic of engine failure is critical in both the repair and design process system. Using an event driven simulator, the authors showed of mechanical engines. The data set used as input was the significant increase in average throughput for two different pri- engine age at the time of unscheduled maintenance actions and ority task groups using SVR vs. the fuzzy logic system and the the outputs of the different models were the predicted engine legacy first come first serve paradigm. age of the next unscheduled maintenance action per mainte- Liu et al. [ 81] discussed methods to control plant nance period. The authors noted that the use of SVR exceeds responses and plant disturbances, treated as separate process, performance with respect to the NRMSE for all other models. using LS-SVM. 
The goal was to combine the plant output, which includes the plant disturbance, with the output of Control System and Signal Processing the LS-SVM (plant model approximation) to produce an Applications Using Support Vector estimate of the disturbance and fed back this estimate Regression for Time Series Prediction through an “inverse” LS-SVM to negate the disturbance via There are several research papers using SVR for time series the input of the actual plant. For a non-linear modeled prediction in the fields of control systems and signal processing. plant and a one-step-ahead prediction horizon, the authors These applications include: mobile position tracking, Internet successfully demonstrated the use of both SVR and an flow control, adaptive inverse disturbance cancelling, narrow- adaptive method for determining the free parameters of the band interference suppression, antenna beamforming, elevator SVM. The key aspect of this approach was the use of a traffic flow prediction, and dynamically tuned gyroscope drift Bayesian Evidence Framework for the adaptive selection of modeling. These diverse applications face the same nonlinear LS-SVM free parameters. prediction challenges as all of the other applications described Yang and Xie [82] proposed the use of SVR to reduce the in this survey. In addition, some of these applications face an effects of high-power narrowband interference (NBI) in spread additional challenge of being highly sensitive to computation spectrum systems. Adaptive filters used to solve this problem timing, as expected in real time signal processing applications. were time-domain nonlinear LMS adaptive filters (TDAF) and Summarized below are eight publications related to control frequency-domain nonlinear LMS adaptive filters (FDAF) theory and SVR time series prediction: which both have sensitivity to noise in estimating NBI. For this Suykens et al. 
[78] provided a detailed summary with real specific application, cross validation methods were too time- world (simplified) examples of non-linear control system theory costly to train the SVR and to determine the SVR free param- using Least Squares Support Vector Machines (LS-SVM). eters. Using a Gaussian kernel function, the authors noted that Important discussion topics related to closed loop control theory NBI suppression using SVR is a viable approach for NBI sup- such as local stability analysis were included. Several real world pression where computational time is a more critical aspect of examples were given: state space estimation for non-linear sys- this application. tem, inverted pendulum problem, and a ball and beam example. Ramon et al. [83] used an SVR approach to adaptively Gezici et al. [79] proposed the use of SVR to improve change antenna beam patterns (beamforming) in the presence the position estimation of users of wireless communications of interfering signals arriving at an arbitrary angle of arrival. devices. Multi-path, non-line-of-sight propagation, and mul- This particular application requires the use of complex variables tiple access interference are the main sources of geo-location (real and imaginary components of the objective function asso- error. They proposed the use of a two step process to esti- ciated with the signal weighting for the individual antenna ele- mate the position of the mobile user. First, an SVR (e ments) for the solution which required separate Lagrange -insensitive loss function and Gaussian kernel function) is multipliers for the real and imaginary components of the solu- used to predict an initial location. This process is followed tion. Because this is an adaptive beam forming problem, there is by a Kalman-Bucy (K-B) filter to refine the geo-location. also a computational time constraint. 
The authors used an alter- Although this application used the K-B filter for position native optimization method: the iterative reweighted least estimation, it is not a specific time series prediction applica- squares (IWRLS). Using a modified cost function (quadratic tion (rather, a tracking problem). However, the processes for data and linear for “outliers”), the authors demonstrated a 34 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2009

12.significant decrease in bit error rate (BER) as compared to a Ralaivola and d’Alche-Buc [87] discussed the modeling of minimum mean square error based algorithm. non-linear dynamic systems using SVR in conjunction with Luo et al. [84] proposed the use of an LS-SVM for the pre- Kalman filters. The discussion is based on the transformation diction of elevator traffic flow. ANNs have been used to study of the non-linear time series equation into a linear equation this problem and the LS-SVM was used here to improve the by the use of kernel functions. The authors proposed the use control system’s ability to predict traffic flow in order to of SVR to map the transformed data from the feature space improve elevator service quality. Using three different groups of back into the input space, noting that they use one SVR for elevator traffic data, the authors demonstrated the feasibility of each dimension of the input space (with the kernel trans- the LS-SVM for predicting traffic flow. There is a significant formed data as inputs to the SVR). Using both a one-step- computational tradeoff between the sparseness of the LS-SVM ahead prediction horizon as well as a 100 sample prediction solution compared to a standard SVR using other non- horizon for Mackey-Glass time series data and laser time quadratic loss functions and the computational complexity series data from the Santa Fe competition (see [87] for associated with the training of the LS-SVM. details), the authors showed results using both polynomial and Xu et al. [85] compared the use of an SVR using accumu- Gaussian kernel functions and state that this approach could lated generated operation (AGO) based on grey theory to an be comparable to other Kalman filtering approaches processes RBF neural network, a grey model, and a standard SVR to such as Extended Kalman Filters (EKF) or Unscented Kalman predict the drift of a dynamically tuned gyroscope. The AGO Filters (UKF). 
algorithm was used to pre-process the drift data in order to Ralaivola and d’Alche-Buc [88] extended their work from reduce noise and complexity of the original data set. Then, the [87] and proposed the Kernel Kalman Filter (KKF) where the SVM was trained and an inverse AGO algorithm (IAGO) was non-linear input space is transformed to a higher dimension applied after the SVM training to compute the model. A feature space by the use of kernels described in [87]. This B-spline kernel function was used for this application. As com- Kalman filtering method was used for both the filtering and pared to the RBF network, the AGO-SVM approach showed smoothing functions. The authors proposed the use of an superior performance in both the MAE and NMSE by almost Expectation-Maximization (EM) approach to determine the an order of magnitude. free parameters of the model, including the kernel parameters. Using Mackey-Glass data, Ikeda series data (laser dynamics), Miscellaneous Applications Using Support and Lorenz attractor data, the authors state the KKF accuracy Vector Regression for Time Series Prediction in the RMSE sense are superior to the MLP and SVR predic- There are eight other research papers describing the use of tion methods for both one-step-ahead prediction as well as SVR for time series prediction that were not specifically multiple step ahead prediction. associated with any of the discussed categories in this survey. Chang et al. [89] proposed applying an SVR to an unsuper- One paper pertains to a biological neuron application, two vised learning problem, specifically the unsupervised segmenta- papers describe the use of SVR to Kalman filtering meth- tion of time series data. Unsupervised segmentation can be ods, an application of switching dynamics associated with applied to many time series applications such as speech recog- unsupervised segmentation, two papers on SVM application nition, signal classification, and brain data. 
The authors used the to natural gas load forecasting, transportation travel time SVR as one component of a “competing” SVM architecture estimation, and the use of particle swarm optimization which is based on an “annealed competition of experts” (ACE) (PSO) used in conjunction with SVR. The references [87, methodology. They were specifically, predicting weighting coef- 88, 89] do not actually use SVR to directly solve a time ficients based on error terms. Simulated chaotic time series data, series prediction problem, but rather embed the use SVR Mackey-Glass data, and Santa-Fe data were used as input to this into their respective approaches. The work presented in the methodology and the proposed architecture appears feasible for last paper is not specifically a financial time series prediction the prediction of non-linear time series data. application, although financial time series data was used to Liu et al. [90] applied SVR to natural gas load forecasting evaluate an alternative SVM training methodology. These including factors related to natural gas loading such as weather eight publications are summarized below. related parameters (i.e., temperature, etc.), day of the week, hol- Frontzek et al. [86] used SVR to learn the dynamics of bio- idays, etc. The results of using SVR to predict seven day ahead logical neurons of Australian crayfish. To model this biological load forecasts were compared to a multi-layer perceptron ANN neural network, they used time series data of a pyloric dilator using a self-organizing feature map (SOFM) and the SVR out- neuron and SVR, with the e-insensitive loss function and a performed the hybrid ANN approach by several percentage Gaussian kernel function, for one-step ahead prediction of points in the MAPE. The same authors examined natural gas these time series data and compared results to an RBF network. 
load forecasting [91] using a Least Squares Support Vector The authors concluded that the Gaussian kernel function out- Machine and compared to an ANN using SOFM. The LS- performed other kernels, the SVR approach “learned” faster SVM implementation had similar performance characteristics than the RBF networks, and more data points (i.e. support vec- as found in [90]. A commercial software package (Natural Gas tors) produced better results. Pipeline Simulation and Load Forecasting – NGPSLF) based MAY 2009 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 35

…on LS-SVM was developed and implemented specifically for this application.

"The prediction of machine reliability is typically non-linear, and several traditional (ARIMA, as an example) and ANN approaches have been studied for this application; however, the use of SVR for this particular application has not been widely studied."

Wu et al. [92] used an SVR to analyze and predict travel time for highway traffic. Travel time prediction is essential for travel information systems, including the estimation of en-route times. Using a Gaussian kernel function and a standard SVR implementation, their SVR showed improved RME and RMSE results as compared to two other travel time prediction methods: current travel time prediction and historical mean prediction.

Zhang and Hu [93] proposed the use of Particle Swarm Optimization (PSO) for selecting certain features of the data to reduce the inputs to an SVM (essentially data pruning); the PSO was also used to optimize the SVM free parameters. Using a financial time series data set as input (CBOT-US), the authors showed that the PSO feature selection procedure was comparable to other genetic feature selection algorithms in terms of minimizing the prediction error. The main advantage demonstrated with this approach is the great improvement in computation time as compared to the other methodologies.

Discussion
In the wide spectrum of time series prediction applications using SVR techniques, the fundamental reason for considering SVR as an approach for time series prediction is the non-linear aspect of the prediction problem. This non-linearity is common to all of the discussed applications. There are several broad observations and generalizations that can be made based on the brief summaries presented in this paper.
❏ Traditional (and more sophisticated) model-based techniques generally do not perform as well as the SVR in predicting time series generated from non-linear systems. This is based on the fact that the SVR "lets the data speak for itself," whereas the model-based techniques typically cannot model the non-linear processes well.
❏ Traditional Artificial Neural Network (ANN) based techniques such as Multi-Layer Perceptrons do not necessarily perform as well as the SVR. This can be due to their inherent limitation of not being able to guarantee a global minimum in the optimization of the network. By design, the SVR guarantees a global minimum solution and is typically superior in its ability to generalize.
❏ There is no predetermined heuristic for the choice of several parameters and designs for the SVR; it appears to be very application specific (as well as individual designer specific). This appears to be one of the largest challenges in maturing this technology, as many of the papers presented offer different approaches for improving SVR performance through the adaptation of free parameters.
❏ There are several choices for solving the convex optimization problem inherent in the solution of the SVM.
❏ There are no measures of prediction uncertainty (i.e., covariance) associated with the predictions (note that Relevance Vector Machines address this issue; see [105] as an example).
For further information regarding this (and other) SVR-based techniques, the authors encourage readers to start with the introductory papers, texts, and especially the publicly available websites [94]–[104]. These websites serve a valuable purpose in advancing and disseminating this viable technology to the scientific community and can act as a resource and database for current SVM/SVR applications.

Challenges Associated with Using SVR for Time Series Prediction
From the technical literature reviewed in this paper, there appear to be several challenges associated with the use of SVR in the prediction of highly non-linear time series. Below is a summary of some of these technical challenges and issues:
1) Selection of kernel function: The choice of kernel function appears somewhat arbitrary, although the vast majority of the applications listed above use the Gaussian kernel. Some efforts empirically determine that the Gaussian kernel is superior to other kernel functions, but in general there appears to be no formal proof of optimality.
2) Free parameter selection: Some research has been done with respect to adaptively changing the free parameters associated with SVR training to improve prediction results, including the use of sophisticated genetic-based algorithms. Again, there is no "optimal" method for adaptation of the SVR parameters.
3) Use of SVR in "real time" applications: Of the applications mentioned in this paper, all but two required some sort of "real time" computational capability. There is very little mention of the computational cost of deriving the results, most likely due to the static nature of the datasets being analyzed.
4) Managing the trade space complexity of technical advantages: The technical tradeoffs and nuances between the design of the SVR system, the sparseness of the solution, the accuracy of the solution, and the computational efficiency in finding the solution have not been summarily defined in any of the papers reviewed. Several of these aspects have been analyzed together, but not all of them as a whole.
5) Selection of SVR optimization techniques: Several (publicly available) QP optimization packages exist to train the SVR, and the SMO algorithm appears to be the most

popular. The reader is referred to the web-based references of this survey paper for more information on a selected set of training methodologies.
6) Determining when to use the Least Squares SVM technique: LS-SVM approaches are sometimes more efficient to implement, at the expense of the sparseness of the solution. LS-SVM did not always outperform the SVR approaches in some of the listed applications.
7) Selection of performance metrics and benchmarks: There are no standard sets of metrics and benchmarks for SVR approaches, although several publicly available data sets are used to compare performances. RMSE and MAPE are the most typical metrics for goodness of the solution.

Conclusion
Support Vector Machines/Support Vector Regression (SVR) are powerful learning mechanisms that have been developed and matured over the last 15 years. They provide a method for predicting and forecasting time series for a myriad of applications including financial market forecasting, weather and environmental parameter estimation, electric utility load prediction, machine reliability forecasting, various signal processing and control system applications, and several other applications detailed in this survey paper. Non-traditional time series prediction methods are necessary for these types of applications due to the highly non-linear aspects of the data and processes.
In conclusion, SVR research continues to be a viable approach to the prediction of time series data in non-linear systems. Many methods and alternatives exist in the design of SVRs, and a great deal of flexibility is left to the designer in their implementation. This survey presents a summary of these methods with references to their associated applications.

References
[1] S. J. Orfanidis, Optimum Signal Processing: An Introduction, 2nd ed. New York: MacMillan, 1988.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME, J. Basic Eng., ser. D, vol. 82, pp. 35–45, 1960.
[3] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[4] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[5] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[6] V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 988–999, Sept. 1999.
[7] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," Adv. Neural Inform. Process. Syst., no. 9, pp. 155–161, 1997.
[8] B. Schölkopf, A. J. Smola, and C. Burges, Advances in Kernel Methods—Support Vector Learning. Cambridge, MA: MIT Press, 1999.
[9] J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett., vol. 9, no. 3, pp. 293–300, June 1999.
[10] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. Singapore: World Scientific, 2002.
[11] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, "Predicting time series with support vector machines," in Proc. Int. Conf. on Artificial Neural Networks, Springer, 1997.
[12] S. Mukherjee, E. Osuna, and F. Girosi, "Nonlinear prediction of chaotic time series using support vector machines," in Proc. 1997 IEEE Workshop—Neural Networks for Signal Processing VII, Sept. 24–26, 1997, pp. 511–520.
[13] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Royal Holloway College, London, U.K., NeuroCOLT Tech. Rep., 1998.
[14] N. de Freitas, M. Milo, P. Clarkson, M. Niranjan, and A. Gee, "Sequential support vector machines," in Proc. 1999 IEEE Signal Processing Society Workshop—Neural Networks for Signal Processing IX, Aug. 23–25, 1999, pp. 31–40.
[15] S. Rüping, "SVM kernels for time series analysis," CS Dept., AI Unit, University of Dortmund, Dortmund, Germany, 2001.
[16] K. L. Lee and S. A. Billings, "Time series prediction using support vector machines, the orthogonal and regularized orthogonal least-squares algorithms," Int. J. Syst. Sci., vol. 33, no. 10, pp. 811–821, 2002.
[17] L. Cao and Q. Gu, "Dynamic support vector machines for non-stationary time series forecasting," Intell. Data Anal., vol. 6, no. 1, pp. 67–83, 2002.
[18] J.-Y. Zhu, B. Ren, H.-X. Zhang, and Z.-T. Deng, "Time series prediction via new support vector machines," in Proc. 1st Int. Conf. on Machine Learning and Cybernetics, Nov. 2002, vol. 1, pp. 364–366.
[19] S. Rüping and K. Morik, "Support vector machines and learning about time," in Proc. 2003 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2003), Apr. 6–10, 2003, vol. 4, pp. IV-864–IV-867.
[20] L. J. Cao, "Support vector machine experts for time series prediction," Neurocomputing, vol. 51, pp. 321–339, Apr. 2003.
[21] Y. Mei-Ying and W. Xiao-Dong, "Chaotic time series prediction using least squares support vector machines," Chin. Phys., vol. 13, no. 4, pp. 454–458, Apr. 2004.
[22] J. M. Górriz, C. G. Puntonet, M. Salmerón, R. Martin-Clemente, and S. Hornillo-Mellado, "Using confidence interval of a regularization network," in Proc. 12th IEEE Mediterranean Electrotechnical Conf. (MELECON 2004), May 12–15, 2004, pp. 343–346.
[23] A. Hornstein and U. Parlitz, "Bias reduction for time series models based on support vector regression," Int. J. Bifurcation Chaos, vol. 14, no. 6, pp. 1947–1956, 2004.
[24] J. M. Górriz, C. G. Puntonet, M. Salmerón, and J. J. G. de la Rosa, "A new model for time series forecasting using radial basis functions and exogenous data," Neural Comput. Appl., vol. 13, no. 2, pp. 101–111, June 2004.
[25] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Stat. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[26] Y.-F. Deng, X. Jin, and Y.-X. Zhong, "Ensemble SVR for prediction of time series," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, pp. 3528–3534.
[27] J. An, Z.-O. Wang, Q. Yang, and Z. Ma, "A SVM function approximation approach with good performances in interpolation and extrapolation," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, pp. 1648–1653.
[28] T. B. Trafalis and H. Ince, "Support vector machine for regression and applications to financial forecasting," in Proc. IEEE-INNS-ENNS Int. Joint Conf. on Neural Networks 2000 (IJCNN 2000), vol. 6, July 24–27, 2000, pp. 348–353.
[29] F. E. H. Tay and L. J. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, pp. 309–317, 2001.
[30] T. Van Gestel, J. A. K. Suykens, D.-E. Baestaens, A. Lambrechts, G. Lanckriet, B. Vandaele, B. De Moor, and J. Vandewalle, "Financial time series prediction using least squares support vector machines within the evidence framework," IEEE Trans. Neural Netw., vol. 12, no. 4, pp. 809–821, July 2001.
[31] F. E. H. Tay and L. J. Cao, "Improved financial time series forecasting by combining support vector machines with self-organizing feature map," Intell. Data Anal., vol. 5, no. 4, pp. 339–354, 2001.
[32] F. E. H. Tay and L. J. Cao, "Modified support vector machines in financial time series forecasting," Neurocomputing, vol. 48, pp. 847–861, Oct. 2002.
[33] F. E. H. Tay and L. J. Cao, "ε-descending support vector machines for financial time series forecasting," Neural Process. Lett., vol. 15, no. 2, pp. 179–195, 2002.
[34] H. Yang, I. King, and L. Chan, "Non-fixed and asymmetrical margin approach to stock market prediction using support vector regression," in Proc. 9th Int. Conf. on Neural Information Processing (ICONIP '02), Nov. 18–22, 2002, vol. 3, pp. 1398–1402.
[35] H. Yang, L. Chan, and I. King, "Support vector machine regression for volatile stock market prediction," in Proc. 3rd Int. Conf. on Intelligent Data Engineering and Automated Learning (IDEAL '02), Springer-Verlag, 2002, pp. 391–396.
[36] A. Abraham, N. S. Philip, and P. Saratchandran, "Modeling chaotic behavior of stock indices using intelligent paradigms," Int. J. Neural, Parallel, Scientific Computat., vol. 11, no. 1/2, pp. 143–160, 2003.
[37] H. Yang, "Margin variations in support vector regression for the stock market prediction," Ph.D. dissertation, Chinese Univ. of Hong Kong, June 2003.
[38] P. Ongsritrakul and N. Soonthornphisaj, "Apply decision tree and support vector regression to predict the gold price," in Proc. Int. Joint Conf. on Neural Networks 2003, July 20–24, 2003, vol. 4, pp. 2488–2492.
[39] L. J. Cao and F. E. H. Tay, "Support vector machine with adaptive parameters in financial time series forecasting," IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1506–1518, Nov. 2003.
[40] Y. Liang and Y. Sun, "An improved method of support vector machine and its application to financial time series forecasting," Prog. Natural Sci., vol. 13, no. 9, pp. 696–700, 2003.
[41] A. Abraham and A. A. Yeung, "Integrating ensemble of intelligent systems for modeling stock indices," Lect. Notes Comput. Sci., vol. 2687, pp. 774–781, 2003.
[42] K.-J. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, pp. 307–319, 2003.
[43] Y. Bao, Y. Lu, and J. Zhang, "Forecasting stock price by SVMs regression," Lect. Notes Comput. Sci., vol. 3192, pp. 295–303, 2004.
[44] H. Yang, K. Huang, L. Chan, I. King, and M. R. Lyu, "Outliers treatment in support vector regression for financial time series prediction," in Proc. 11th Int. Conf. on Neural Information Processing (ICONIP 2004), Lecture Notes in Computer Science, 2004, vol. 3316, pp. 1260–1265.
[45] W. Huang, Y. Nakamori, and S.-Y. Wang, "Forecasting stock market movement direction with support vector machine," Comput. Operat. Res., vol. 32, no. 10, pp. 2513–2522, Oct. 2005.
[46] Y.-K. Bao, Z.-T. Liu, L. Guo, and W. Wang, "Forecasting stock composite index by fuzzy support vector machines regression," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, pp. 3535–3540.
[47] D.-Z. Cao, S.-L. Pang, and Y.-H. Bai, "Forecasting exchange rate using support vector machines," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, pp. 3448–3452.
[48] T. Z. Tan, C. Quek, and G. S. Ng, "Brain-inspired genetic complementary learning for stock market prediction," in Proc. 2005 IEEE Congress on Evolutionary Computation, Sept. 2–5, 2005, vol. 3, pp. 2653–2660.
[49] D. C. Sansom, T. Downs, and T. K. Saha, "Evaluation of support vector machine based forecasting tool in electricity price forecasting for Australian national electricity market participants," presented at the Australian Universities Power Engineering Conf., 2002.
[50] Z. Huang, H. Chen, C.-J. Hsu, W.-H. Chen, and S. Wu, "Credit rating analysis with support vector machines and neural networks: A market comparative study," Decision Support Syst., vol. 37, pp. 543–558, 2004.
[51] Y. Hur and S. Lim, "Customer churning prediction using support vector machines in online auto insurance service," in Proc. 2nd Int. Symp. on Neural Networks (ISNN 2005), Lecture Notes in Computer Science, May 30–June 1, 2005, vol. 3497, pp. 928–933.
[52] I. Bose and R. Pal, "Using support vector machines to evaluate financial fate of dotcoms," in Proc. Pacific Asia Conf. on Information Systems 2005, July 7–10, 2005, pp. 521–528.
[53] P.-F. Pai and C.-S. Lin, "Using support vector machines to forecast the production values of the machinery industry in Taiwan," Int. J. Adv. Manuf. Technol., vol. 27, no. 1/2, pp. 205–210, Nov. 2005.
[54] W. Lu, W. Wang, A. Y. T. Leung, S.-M. Lo, R. K. K. Yuen, Z. Xu, and H. Fan, "Air pollutant parameter forecasting using support vector machines," in Proc. 2002 Int. Joint Conf. on Neural Networks (IJCNN '02), May 12–17, 2002, vol. 1, pp. 630–635.
[55] T. B. Trafalis, B. Santosa, and M. B. Richman, "Prediction of rainfall from WSR-88D radar using kernel-based methods," Int. J. Smart Eng. Syst. Design, vol. 5, no. 4, pp. 429–438, Oct.–Dec. 2003.
[56] W. Wang, Z. Xu, and J. W. Lu, "Three improved neural network models for air quality forecasting," Eng. Computat., vol. 20, no. 2, pp. 192–210, 2003.
[57] H. Prem and N. R. Srinivasa Raghavan, "A support vector machine based approach for forecasting of network weather services," J. Grid Computing, vol. 4, no. 1, pp. 89–114, Mar. 2006.
[58] M.-W. Chang, B.-J. Chen, and C.-J. Lin, "EUNITE network competition: Electricity load forecasting," Nov. 2001.
[59] M. Mohandes, "Support vector machines for short-term load forecasting," Int. J. Energy Res., vol. 26, no. 4, pp. 335–345, Mar. 2002.
[60] D. C. Sansom and T. K. Saha, "Energy constrained generation dispatch based on price forecasts including expected values and risk," in Proc. IEEE Power Energy Society General Meeting 2004, June 6–10, 2004, vol. 1, pp. 261–266.
[61] L. Tian and A. Noore, "A novel approach for short-term load forecasting using support vector machines," Int. J. Neural Syst., vol. 14, no. 5, pp. 329–335, Aug. 2004.
[62] B.-J. Chen, M.-W. Chang, and C.-J. Lin, "Load forecasting using support vector machines: A study on EUNITE competition 2001," IEEE Trans. Power Syst., vol. 19, no. 4, pp. 1821–1830, Nov. 2004.
[63] B. Dong, C. Cao, and S. E. Lee, "Applying support vector machines to predict building energy consumption in tropical region," Energy Buildings, vol. 37, no. 5, pp. 545–553, May 2005.
[64] Z. Bao, D. Pi, and Y. Sun, "Short term load forecasting based on self-organizing map and support vector machine," Lect. Notes Comput. Sci., vol. 3610, pp. 688–691, 2005.
[65] P.-F. Pai and W. C. Hong, "Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms," Electric Power Syst. Res., vol. 74, no. 3, pp. 417–425, 2005.
[66] Y. Ji, J. Hao, N. Reyhani, and A. Lendasse, "Direct and recursive prediction of time series using mutual information selection," in Proc. IWANN 2005, June 8–10, 2005, pp. 1010–1017.
[67] M.-G. Zhang, "Short-term load forecasting based on support vector machine regression," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, vol. 7, pp. 4310–4314.
[68] X. Li, C. Sun, and D. Gong, "Application of support vector machine and similar day method for load forecasting," Lect. Notes Comput. Sci., vol. 3611, pp. 602–609, 2005.
[69] P.-F. Pai and W.-C. Hong, "Support vector machines with simulated annealing algorithms in electricity load forecasting," Energy Conv. Manage., vol. 46, no. 17, pp. 2669–2688, Oct. 2005.
[70] H.-S. Wu and S. Zhang, "Power load forecasting with least squares support vector machines and chaos theory," in Proc. Int. Conf. on Neural Networks and Brain, Oct. 13–15, 2005, vol. 2, pp. 1020–1024.
[71] M. Espinoza, J. A. K. Suykens, and B. De Moor, "Load forecasting using fixed-size least squares support vector machines," Lect. Notes Comput. Sci., vol. 3512, pp. 1018–1026, 2005.
[72] C.-C. Hsu, C.-H. Wu, S.-C. Chen, and K.-L. Peng, "Dynamically optimizing parameters in support vector regression: An application of electricity load forecasting," in Proc. 39th Annu. Hawaii Int. Conf. on System Sciences (HICSS 2006), Jan. 4–7, 2006, vol. 2, pp. 1–8.
[73] M. Espinoza, J. A. K. Suykens, and B. De Moor, "Fixed-size least square support vector machines: A large scale application in electrical load forecasting," Computat. Manage. Sci., vol. 3, no. 2, pp. 113–129, Apr. 2006.
[74] Y. He, Y. Zhu, and D. Duan, "Research on hybrid ARIMA and support vector machine model in short term load forecasting," in Proc. Int. Conf. on Intelligent Systems Design and Applications (ISDA '06), Oct. 2006, vol. 1, pp. 804–809.
[75] J. Yang and Y. Zhang, "Application research of support vector machines in condition trend prediction of mechanical equipment," in Proc. 2nd Int. Symp. on Neural Networks (ISNN 2005), Lecture Notes in Computer Science, May 30–June 1, 2005, vol. 3498, pp. 857–864.
[76] W.-C. Hong, P.-F. Pai, C.-T. Chen, and P.-T. Chang, "Recurrent support vector machines in reliability prediction," Lect. Notes Comput. Sci., vol. 3610, pp. 619–629, 2005.
[77] W.-C. Hong and P.-F. Pai, "Predicting engine reliability using support vector machines," Int. J. Adv. Manuf. Technol., vol. 28, no. 1/2, pp. 154–161, Feb. 2006.
[78] J. A. K. Suykens, J. Vandewalle, and B. De Moor, "Optimal control by least squares support vector machines," Neural Netw., vol. 14, no. 1, pp. 23–35, 2001.
[79] S. Gezici, H. Kobayashi, and H. V. Poor, "A new approach to mobile position tracking," in Proc. IEEE Sarnoff Symp. on Advances in Wired and Wireless Communications, Mar. 11–12, 2003, pp. 204–207.
[80] C.-J. Huang and C.-L. Cheng, "Application of support vector machines to admission control for proportional differentiated services enabled internet servers," in Proc. 2004 Int. Conf. on Hybrid Intelligent Systems, Dec. 5–8, 2004, pp. 248–253.
[81] X. Liu, J. Yi, and D. Zhao, "Adaptive inverse disturbance cancelling control system based on least square support vector machines," in Proc. 2005 American Control Conf., June 8–10, 2005, pp. 2625–2629.
[82] Q. Yang and S. Xie, "An application of support vector regression on narrow-band interference suppression in spread spectrum systems," Lect. Notes Comput. Sci., vol. 3611, pp. 442–450, Aug. 2005.
[83] M. Martínez Ramón, N. Xu, and C. G. Christodoulou, "Beamforming using support vector machines," IEEE Antennas Wireless Propagat. Lett., vol. 4, pp. 439–442, 2005.
[84] F. Luo, Y.-G. Xu, and J.-Z. Cao, "Elevator traffic flow prediction with least squares support vector machines," in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, Aug. 18–21, 2005, pp. 4266–4270.
[85] G. Xu, W. Tian, and Z. Jin, "An AGO-SVM drift modeling method for a dynamically tuned gyroscope," Measure. Sci. Technol., vol. 17, no. 1, pp. 161–167, Jan. 2006.
[86] T. Frontzek, T. N. Lal, and R. Eckmiller, "Learning and prediction of the nonlinear dynamics of biological neurons with support vector machines," in Proc. Int. Conf. on Artificial Neural Networks (ICANN 2001), Lecture Notes in Computer Science, 2001, vol. 2130, pp. 390–398.
[87] L. Ralaivola and F. d'Alche-Buc, "Dynamical modeling with kernels for nonlinear time series prediction," in Proc. Neural Information Processing Systems (NIPS 2003), Dec. 8–13, 2003.
[88] L. Ralaivola and F. d'Alche-Buc, "Nonlinear time series filtering, smoothing, and learning using the kernel Kalman filter," Université Pierre et Marie Curie, Paris, France, Tech. Rep., 2004.
[89] M.-W. Chang, C.-J. Lin, and R. C.-H. Weng, "Analysis of switching dynamics with competing support vector machines," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 720–727, May 2004.
[90] H. Liu, D. Liu, G. Zheng, and Y. Liang, "Research on natural gas load forecasting based on support vector regression," in Proc. 5th World Congress on Intelligent Control and Automation, June 15–19, 2004, pp. 3591–3595.
[91] H. Liu, D. Liu, Y.-M. Liang, and G. Zheng, "Research on natural gas load forecasting based on least squares support vector machine," in Proc. 3rd Int. Conf. on Machine Learning and Cybernetics, Aug. 26–29, 2004, pp. 3124–3128.
[92] C.-H. Wu, J.-M. Ho, and D. T. Lee, "Travel-time prediction with support vector regression," IEEE Trans. Intell. Transport. Syst., vol. 5, no. 4, pp. 276–281, Dec. 2004.
[93] C. Zhang and H. Hu, "Using PSO algorithm to evolve an optimum input subset for a SVM in time series forecasting," in Proc. 2005 IEEE Conf. on Systems, Man, and Cybernetics, Oct. 10–12, 2005, vol. 4, pp. 3793–3796.
[94] Support Vector Machines. [Online]. Available: http://www.support-vector.net/index.html
[95] Kernel Machines. [Online]. Available: http://www.kernel-machines.org/index.html
[96] International Neural Network Society. [Online]. Available: http://www.inns.org/
[97] Neural Computation. [Online]. Available: http://neco.mitpress.org/
[98] European Neural Network Society. [Online]. Available: http://www.snn.ru.nl/enns/
[99] Asia Pacific Neural Network Assembly. [Online]. Available: http://www.apnna.net/apnna
[100] Japanese Neural Network Society. [Online]. Available: http://www.jnns.org/
[101] Journal of Artificial Intelligence Research. [Online]. Available: http://www.jair.org/
[102] S. Rüping. [Online]. Available: http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/index.html
[103] R. Collobert and S. Bengio, "SVMTorch: Support vector machines for large-scale regression problems," J. Machine Learning Res., vol. 1, pp. 143–160, Sept. 2001.
[104] K. Pelckmans, J. A. K. Suykens, T. Van Gestel, J. De Brabanter, L. Lukas, B. Hamers, B. De Moor, and J. Vandewalle. [Online]. Available: http://www.esat.kuleuven.be/sista/lssvmlab/tutorial/lssvmlab_paper0.pdf
[105] M. E. Tipping, "The relevance vector machine," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA: MIT Press, 2000.
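To make the LS-SVM trade-off noted in challenge 6 concrete, the sketch below implements least squares support vector regression in the style of Suykens and Vandewalle [9], [10]: the quadratic program of standard SVR is replaced by a single linear system, at the cost of a non-sparse solution (every training point receives a nonzero weight). The toy sine series, window length, Gaussian kernel width, and regularization value are illustrative assumptions, not taken from any surveyed paper; prediction quality is scored with the RMSE and MAPE metrics noted in challenge 7.

```python
import numpy as np

def rbf_kernel(A, B, gamma_k=1.0):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma_k * d2)

def lssvm_fit(X, y, gamma=100.0, gamma_k=1.0):
    """Train an LS-SVM regressor: one linear solve instead of a QP.
    Solves [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, gamma_k) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, weights alpha (non-sparse)

def lssvm_predict(Xq, X, b, alpha, gamma_k=1.0):
    return rbf_kernel(Xq, X, gamma_k) @ alpha + b

def embed(series, d):
    """Sliding-window embedding: predict s[t] from the previous d samples."""
    X = np.array([series[t - d:t] for t in range(d, len(series))])
    return X, series[d:]

if __name__ == "__main__":
    s = np.sin(0.1 * np.arange(300)) + 2.0   # smooth, strictly positive toy series
    X, y = embed(s, d=5)
    b, alpha = lssvm_fit(X[:200], y[:200])
    pred = lssvm_predict(X[200:], X[:200], b, alpha)
    rmse = np.sqrt(np.mean((pred - y[200:]) ** 2))
    mape = np.mean(np.abs((pred - y[200:]) / y[200:])) * 100
    print(f"RMSE = {rmse:.4f}, MAPE = {mape:.2f}%")
```

Because the kernel expansion keeps all 200 training points, the sparseness of a standard SVR solution is lost, which is exactly the trade-off the surveyed LS-SVM applications weigh against the cheaper training step.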