UECM3543 Assignment Report
==========================
## SMART Objectives
Our objective is to achieve at least a 5% annual rate of return for the portfolio under the given conditions, following the **S-M-A-R-T criteria:**
**1. Specific**
To the point and behavioral in nature
**2. Measurable**
Characteristics that will define successful achievement
**3. Attainable**
Possible to attain. But a stretch. "Just out of reach, but not out of sight"
**4. Relevant**
Truly worth taking on; it is of value
**5. Time Bound**
A clearly identified time element.
# KPI SETTING
| PIC | Task | Measure | Actual | Target |
|-----|------|---------|--------|--------|
| Choy Chok Heng | Data Modeller | Number of models built | | Replicate the model as support and try to discover any further improvement. |
| Lim Zhenxun | Data Mining | Number of data points processed | | Obtain and clean the data from Bloomberg in order to minimize errors. |
| Tan Man Lin | Model Tester | Number of errors and suggestions | | Trial and error on the model to find any further improvement or weakness. |
| Vivian Quek Ee Wan | Reporting | <ul><li>Introduction (20%)</li><li>Assumption (20%)</li><li>Methodology (20%) </li><li>Finding (20%) </li><li>Conclusion (20%) </li></ul> | | Report our viewpoints on the model as well as the methodology. |
| Lai Weng Key | Strategist | Evaluate asset properties and justify | | Create a portfolio that suits the strategy and is able to make at least a 5% annualized return. |
# Gantt Chart
```mermaid
gantt
title A GANTT Diagram
section KPI Setting
Duration :des, 2017-05-29, 2017-08-18
Setting Gantt Chart :done , des1, 2017-06-05, 2d
Overall KPI Setting :done, after des1, 3d
section Literature Review
Literature Review :active, 2017-06-10, 2017-07-03
Read & Filter Articles :done, des2, 2017-06-10, 3d
Brain Storming & pick an article :done, des3, 2017-06-13, 3d
Data Mining & Processing :done, des4, 2017-06-15, 4d
Modelling :done,des5, 2017-06-17, 5d
Trial & Error :done,des6, 2017-06-19, 9d
Reporting :done,des7, 2017-06-29, 3d
section Trading Strategies
Proposal(assignment 2) :done,des8, after des7, 33d
Report :done,des69, after des8, 20d
```
Download the more detailed [Gantt Chart](https://www.dropbox.com/s/d7uogreuecc3nrt/gantt%20chart.xlsx?dl=0) in Excel form.
# Assignment 1: Literature Review
### *1.0 Introduction*
We have chosen the article “Evolution of worldwide stock markets, correlation structure and correlation based graphs” by Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N. Mantegna (2011) as the paper we investigate. The paper proposes several ways to study the daily correlation among the market indices of stock exchanges worldwide (57 different stock exchanges) from 1996 to 2009.
In their study, the authors found that the correlation among the market indices of different stock exchanges shows both fast and slow dynamics. The slow dynamics reflects a steady rise associated with the development and consolidation of globalization, whereas the fast dynamics is associated with critical events that arise in a specific region and rapidly affect the global system. In addition, by computing a correlation matrix for each trading month using a 3-month evaluation time period, they showed that the correlation matrices contain information about the global system that can be investigated through the average value of the correlation, correlation based graphs, and the spectral properties of the largest eigenvalues and eigenvectors.
Although the article presents several ideas, we primarily study the average correlation of the market indices. We provide two experimental results using different instruments to check whether the findings of the article hold. In this report, we are able to show the presence of both slow and fast dynamics.
### *2.0 Assumptions*
1. The evaluation time period is 260 days per trading year (△T = 1) and 65 days per trading quarter (△T = 0.25).
2. The market is assumed to operate on every weekday; public holidays are ignored.
### *3.0 Methodology*
Two types of financial instruments, currencies and indices, were selected. For the indices, 10 years of daily last prices of the stock market indices of four countries (KLCI, HSI, SSEC and S&P500) were chosen to carry out the experiment. The data were collected from the Bloomberg terminal, and the sampling method used is random sampling.
The experiment starts by extracting the data of the four countries from January 2006 to December 2016 from the Bloomberg terminal. We found that for some countries the data are not sampled daily. Therefore, we fill each blank with the previous price.
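The gap-filling step can be sketched with pandas as below. This is only an illustration: `fill_gaps` is a hypothetical helper name, and the dates and prices are made-up values, not Bloomberg data.

```python
import pandas as pd


def fill_gaps(prices: pd.Series) -> pd.Series:
    """Reindex to every weekday and carry the previous price forward."""
    weekdays = pd.bdate_range(prices.index.min(), prices.index.max())
    return prices.reindex(weekdays).ffill()


# Made-up prices with one missing weekday (Wednesday 2016-01-06).
raw = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-07']),
)
filled = fill_gaps(raw)
print(filled)
```

The missing Wednesday receives Tuesday's price, which matches the assumption that the market operates on every weekday.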
We perform our analysis on the daily logarithmic return, which, for each index i, is defined as:
$$r_i(t) = \ln P_i(t) - \ln P_i(t-1),$$
where $P_i(t)$ is the price of index $i$ on day $t$.
Finally, the average correlation between stock market index is calculated and plotted.
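A minimal sketch of the log return and of an average correlation over one evaluation window is given below. It averages over all pairs of indices, and the prices are synthetic numbers generated only for illustration; the function names and the 65-day window are assumptions for this example, not an extract of our submitted code.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """r_i(t) = ln P_i(t) - ln P_i(t - 1), one column per index."""
    return np.log(prices).diff().dropna()


def average_correlation(returns: pd.DataFrame) -> float:
    """Mean of the off-diagonal entries of the correlation matrix."""
    corr = returns.corr().to_numpy()
    upper = np.triu_indices_from(corr, k=1)  # count each pair once
    return float(corr[upper].mean())


# Synthetic prices: a shared factor plus independent noise per index.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, size=260)
prices = pd.DataFrame({
    name: 100 * np.exp(np.cumsum(common + rng.normal(0, 0.01, size=260)))
    for name in ['A', 'B', 'C', 'D']
})
returns = log_returns(prices)
window = returns.iloc[-65:]  # the last trading quarter (dT = 0.25)
print(average_correlation(window))
```

Sliding the window forward one day at a time and repeating the calculation for several window lengths gives the grid of values that a contour plot needs.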




---
### *Python code*
-----
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.set_printoptions(precision=4)

symbols = ['sp500', 'klse', 'shanghai', 'hong kong']


def compile_data():
    """Join the 'Last Price' column of every index into one CSV file."""
    main_df = pd.DataFrame()
    for instru in symbols:
        df = pd.read_csv('data/{}.csv'.format(instru), parse_dates=True, index_col=0)
        df.rename(columns={'Last Price': instru}, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            # Keep only the dates that every index has in common.
            main_df = main_df.join(df, how='inner')
    print('\nOriginal Data Compile')
    print(main_df.head())
    main_df.to_csv('joined_last_price.csv')


compile_data()


def visualize_data():
    """Plot the correlation matrix of the raw prices as a heatmap."""
    df = pd.read_csv('joined_last_price.csv', index_col=0)
    df_corr = df.corr()
    print(df_corr)
    data = df_corr.values
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    heatmap = ax.pcolor(data, cmap=plt.cm.RdYlGn)
    fig.colorbar(heatmap)
    # Put the tick labels at the centre of each cell.
    ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
    ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
    ax.invert_yaxis()
    ax.xaxis.tick_top()
    ax.set_xticklabels(df_corr.columns)
    ax.set_yticklabels(df_corr.index)
    plt.xticks(rotation=90)
    heatmap.set_clim(-1, 1)
    plt.tight_layout()
    # Save before show() so the file does not depend on the closed window.
    fig.savefig('Correlation Coefficient from Original Data.png')
    plt.show()


visualize_data()


def LogReturnCompile_Data():
    """Compute the daily log return of every index and join them."""
    lr_df = pd.DataFrame()
    for instru in symbols:
        df = pd.read_csv('data/{}.csv'.format(instru), parse_dates=True, index_col=0)
        # r(t) = ln P(t) - ln P(t - 1); the first row has no previous price (NaN).
        df[instru] = np.log(df['Last Price']).diff()
        df.drop(['Last Price'], axis=1, inplace=True)
        if lr_df.empty:
            lr_df = df
        else:
            lr_df = lr_df.join(df, how='inner')
    print('\nDaily Log Return Compile')
    print(lr_df.head())
    print(lr_df.describe())
    lr_df.to_csv('Joined Log Return.csv')


LogReturnCompile_Data()


def plotcontour():
    """Contour plot of the average correlation matrix prepared in Excel."""
    df = pd.read_csv('average-correlation.csv', parse_dates=True, index_col=0)
    x = np.linspace(2007, 2016, len(df.columns))  # time axis (years)
    y = np.linspace(0.25, 3, 12)                  # evaluation period dT (years)
    z = df.values
    fig = plt.figure()
    plt.xlabel('Time Frame')
    plt.ylabel('dt')
    plt.title('Average Correlation - Indices \n')
    cp = plt.contourf(x, y, z, cmap=plt.get_cmap('jet'))
    plt.colorbar(cp)
    fig.savefig('Contour AC.png')
    plt.show()


plotcontour()


def AverageCorrelation():
    """Average correlation of the S&P 500 with the four indices (itself included)."""
    print('Please wait a while: this is the slow part that gave us problems.')
    # Read the file once instead of once per window.
    df = pd.read_csv('Joined Log Return.csv', parse_dates=True, index_col=0)
    a = df['sp500'].to_numpy()
    b = df['klse'].to_numpy()
    c = df['shanghai'].to_numpy()
    d = df['hong kong'].to_numpy()

    def acc(i, j):
        # Window of j quarters (65 trading days each) starting at row i.
        window = slice(i, j * 65 - 1 + i)
        array = (np.corrcoef(a[window], a[window])
                 + np.corrcoef(a[window], b[window])
                 + np.corrcoef(a[window], c[window])
                 + np.corrcoef(a[window], d[window])) / 4
        # Each 2x2 matrix sums to 2 + 2r, so this is the mean of the four r values.
        return array.sum() / 2 - 1

    # Rows: start day (2542 start rows in our data); columns: 1 to 12 quarters.
    my_array = [[acc(j, i) for i in range(1, 13)] for j in range(2542)]
    m = np.array(my_array)
    print(m)
    cp = plt.contourf(m, cmap=plt.get_cmap('jet'))
    plt.colorbar(cp)
    plt.show()


AverageCorrelation()
```
#### The code above works only with our index data. The complete Python code can be downloaded from the link below.
* We were able to store the values in an array, but we could not handle the matrix well enough to plot the graph. Therefore, we decided to use Microsoft Excel to create the matrix and plot the graph.
#### [Final Coding](https://www.dropbox.com/s/dy6lt5atds3xq9p/assignment%201.rar?dl=0)
----
### *4.0 Limitations*
**4.1 Small sample size chosen**
There are over a hundred stock market indices around the world, and we studied the indices of only 4 countries, so this small sample size might affect the results. We calculated the correlation coefficients between KLCI, S&P500, HSI and SSEC. We also examined how world economic crises affect these stock markets.
**4.2 Incomplete daily stock price**
The extracted data are incomplete, and we assume that each blank has the same price as the previous day. Therefore, the correlation coefficient might be affected.
### *5.0 Discussion*

The dark blue region is the region where the past records are not sufficient to estimate the correlation on the same scale.
The contour plot of Average Correlation – Indices, which consists of KLCI, HSI and SSE, shows how the correlation with respect to the S&P500 varies across the different evaluation time periods from 2007 to 2016.
From our observations, the correlations are high when △T is small. This may be due to the fast dynamics and the fast development of the Asian markets relative to the US market. Conversely, the correlations become lower as △T increases. Although a financial crisis occurred around 2007 and 2008, it did not have much effect on the sampled countries. This may be because the Chinese government took timely and bold countermeasures to mitigate the impact of the global financial crisis. Since the third quarter of 2008, the Chinese authorities have adopted a combination of an active fiscal policy and a loose monetary policy. The appreciation of the RMB against the US dollar has been halted since the second half of 2008. Although these measures are quite useful for stimulating short-term economic growth, they cannot ensure long-term sustainable growth, and they might have generated new risks that spread a few years later, around 2009 and 2010. Furthermore, Malaysia is a Muslim country whose corporations engage with Islamic financial products, so the crisis had less influence there. There is a light blue (tiffany blue) spike around 2013 to 2014, but it does not contradict our findings because, according to the color bar, the values are not much different.

The contour plot of Average Correlation – Currencies, which consists of MYR, CNY and HKD, shows how the correlation with respect to the USD varies across the different evaluation time periods from 2007 to 2016.
From our observations, there is a large correlation among the currencies around 2011 to 2012, which may be due to the financial problems in the Eurozone. Several Eurozone member states (Greece, Portugal, Ireland, Spain and Cyprus) were unable to repay or refinance their government debt, or to bail out over-indebted banks under their national supervision, without the assistance of third parties such as other Eurozone countries. Moreover, there is some influence from the period leading up to Brexit, as Brexit around 2016 caused a fast dynamics.
### *6.0 Conclusion*
From both empirical results, we support the idea of the paper that the correlation method can show the presence of fast and slow dynamics. It works not only with indices but can also be applied to currencies to observe the influence of global financial events. We also provide evidence that the correlation among market indices over shorter time periods (less than 65 trading days) changes rather quickly.
# *Assignment 2*
## *Proposal*
### *1.0 Introduction*
In assignment 2, our objective is to achieve at least a 5% annual rate of return by using our strategies. We use two machine learning methods to carry out this assignment: neural networks and the support vector machine (SVM). First, we chose the moving average (MA) and the linear regression indicator to partition the price trend.
#### *Moving Average (MA)*
Moving average (MA) is a mathematical result that is calculated by averaging a number of past data points. Once determined, the resulting average is then plotted onto a chart then connected to create a moving average line in order to allow traders to look at smoothed data rather than focusing on the day-to-day price fluctuations that are inherent in all financial markets. Formula of MA:
$MA = \frac{\sum_{i = 1}^{N} \text{Close Price}_{i}}{N}$
The most common applications of MAs are to identify the trend direction and to determine support and resistance levels. A simple moving average is customizable in that it can be calculated for a different number of time periods, simply by adding the closing price of the security for a number of time periods and then dividing this total by the number of time periods, which gives the average price of the security over the time period. A simple moving average smooths out volatility and makes it easier to view the price trend of a security. If the simple moving average points up, the security's price is increasing. If it points down, the security's price is decreasing. The longer the timeframe for the moving average, the smoother the simple moving average. A shorter-term moving average is more volatile, but its reading is closer to the source data.
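As a small illustration of the formula, the simple moving average can be computed with a pandas rolling window. The prices below are made up, and `simple_moving_average` is a name chosen only for this sketch.

```python
import pandas as pd


def simple_moving_average(close: pd.Series, n: int) -> pd.Series:
    """Mean of the latest n closing prices; NaN until n prices exist."""
    return close.rolling(window=n).mean()


close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up prices
ma3 = simple_moving_average(close, 3)
print(ma3.tolist())
```

The first two entries are undefined because fewer than three prices are available, and on this steadily rising series the average stays one step behind the latest price.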
#### *Linear Regression Indicator*
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

LRI is a “Forecast Indicator”, aiming at forecasting tomorrow’s price plotted today. The Linear Regression indicator shows where prices, statistically, should be trading. One could therefore expect any major deviation from the regression line to be short-lived. LRI uses the least squares method. LRI is in many ways similar to a moving average, but does not exhibit as much "delay". The advantage of LRI is that it "fits" a line to the data points rather than averaging them, thereby creating an indicator that is highly sensitive to price changes.
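One common way to compute such an indicator, sketched here under our own assumptions, is to fit a least-squares line to each window of n prices and take the value of the line at the last point of the window. The function name and the prices are illustrative only.

```python
import numpy as np


def linear_regression_indicator(prices, n):
    """End-point value of a least-squares line fitted to each n-price window."""
    prices = np.asarray(prices, dtype=float)
    x = np.arange(n)
    lri = np.full(len(prices), np.nan)
    for end in range(n, len(prices) + 1):
        slope, intercept = np.polyfit(x, prices[end - n:end], 1)
        lri[end - 1] = intercept + slope * (n - 1)
    return lri


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up, perfectly linear prices
lri3 = linear_regression_indicator(prices, 3)
print(lri3)
```

On a perfectly linear series the fitted line passes through the latest price, whereas a 3-period simple average of the same windows would give 11, 12 and 13. This is the reduced "delay" described above.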
### *2.0 Partition*


We plot the Moving Average and the Linear Regression of the previous n days of daily close prices. Then, we use Linear Regression(n) and Moving Average(n) to partition the price trend. The value of n differs among assets, so we run a trial-and-error scan from n = 2 to n = 25 and assign the n that generates the highest profit. When the LRC crosses the MA from above, it indicates a sell signal, and vice versa. We collect the Profit and Loss from this primitive strategy. In some regions the strategy generates a negative return and gives us a false alarm. We therefore obtain action data such as [profit region, profit region, loss region, profit region…], and we then apply machine learning to learn the profit trend so that it can recognize a false alarm.
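The crossover rule can be sketched as follows. The two input series are made-up numbers, and `crossover_signals` is a hypothetical helper written for this illustration, not an extract of our final code.

```python
import numpy as np


def crossover_signals(lri, ma):
    """Mark 'sell' where LRI crosses below MA and 'buy' where it crosses above."""
    diff = np.asarray(lri, dtype=float) - np.asarray(ma, dtype=float)
    signals = [None] * len(diff)
    for t in range(1, len(diff)):
        if np.isnan(diff[t - 1]) or np.isnan(diff[t]):
            continue  # indicators are undefined at the start of the series
        if diff[t - 1] > 0 and diff[t] <= 0:
            signals[t] = 'sell'
        elif diff[t - 1] < 0 and diff[t] >= 0:
            signals[t] = 'buy'
    return signals


lri = [np.nan, 12.0, 13.0, 11.0, 10.0, 12.5]  # made-up indicator values
ma = [np.nan, 11.0, 12.0, 12.0, 11.0, 11.5]
signals = crossover_signals(lri, ma)
print(signals)
```

Each signal closes one partition and opens the next, so the days between two consecutive signals form one region whose profit or loss can be recorded.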
### *3.0 Methodology*
#### *Neural Networks*
The first machine learning method that we chose is the neural network. This method is biologically inspired by the structure of the human brain. Neural networks are forecasting methods that are based on simple mathematical models of the brain. They allow complex nonlinear relationships between the response variable and its predictors. Neural networks are used to estimate future values by processing past and current data.
A neural network can be thought of as a network of “neurons” organized in layers. It contains 3 layers: the input layer, the hidden layer and the output layer. The input layer feeds past data values into the next (hidden) layer. The black circles represent nodes of the neural network. The hidden layer encapsulates several complex functions that create predictors; often those functions are hidden from the user. A set of nodes (black circles) at the hidden layer represents mathematical functions that modify the input data; these functions are called neurons. The output layer collects the predictions made in the hidden layer and produces the final result, which is the model’s prediction.

The very simplest networks contain no hidden layers and are equivalent to linear regression. The coefficients attached to the predictors are called “weights”, and the forecasts are obtained by a linear combination of the inputs. If we add an intermediate layer with hidden neurons, the neural network becomes non-linear. This is known as a multilayer feed-forward network, where each layer of nodes receives inputs from the previous layer, and the outputs of nodes in one layer are inputs to the next layer. The inputs to each node are combined using a weighted linear combination. In the hidden layer, the result is then modified by a nonlinear function such as a sigmoid before being passed on as the input for the next layer. This tends to reduce the effect of extreme input values, thus making the network somewhat robust to outliers. We used the sigmoid function because most sigmoid functions have derivatives that are positive and easy to calculate.
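The forward pass of such a network can be sketched in a few lines of NumPy. The layer sizes (6 inputs, 4 hidden neurons, 1 output) and the random weights are arbitrary choices for illustration; a trained network would learn the weights from data.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, which squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    """Derivative s(z) * (1 - s(z)): always positive and cheap to compute."""
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: weighted sums followed by sigmoid activations."""
    hidden = sigmoid(x @ w_hidden + b_hidden)
    return sigmoid(hidden @ w_out + b_out)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))         # 5 samples with 6 input features
w_hidden = rng.normal(size=(6, 4))  # weights of 4 hidden neurons
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1))     # weights of the single output neuron
b_out = np.zeros(1)
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y_hat.ravel())
```

Because the output passes through a sigmoid, each prediction lies between 0 and 1 and can be thresholded into a 1 or 0 decision.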
#### *Support Vector Machine (SVM)*
We chose the support vector machine as our second machine learning method. The “Support Vector Machine” (SVM) is based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. SVM is a supervised machine learning algorithm which can be used for both classification and regression problems. Support vectors are simply the coordinates of individual observations. The Support Vector Machine is the frontier (hyperplane or line) which best segregates the two classes. A schematic example is shown in the illustration below. In this example, the objects belong either to class GREEN or to class RED. The separating line defines a boundary on the right side of which all objects are GREEN and to the left of which all objects are RED.

SVM has a technique called the kernel trick. Kernels are functions which take a low-dimensional input space and transform it into a higher-dimensional space. In this way, they can convert a non-separable problem into a separable problem, which is mostly useful in non-linear separation problems. There are a number of kernels that can be used in Support Vector Machine models. These include linear, polynomial, radial basis function (RBF) and sigmoid:

Gamma: an adjustable parameter of certain kernel functions.
C: the penalty parameter of the error term. It controls the tradeoff between a smooth decision boundary and classifying the training points correctly.
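For reference, the four kernels can be written in their commonly used forms as below. The parameter names `gamma`, `coef0` and `degree` follow the usual convention, and the default values and sample vectors are arbitrary choices for this sketch.

```python
import numpy as np


def linear_kernel(x, y):
    """k(x, y) = x . y"""
    return float(np.dot(x, y))


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """k(x, y) = (gamma * x . y + coef0) ** degree"""
    return float((gamma * np.dot(x, y) + coef0) ** degree)


def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """k(x, y) = tanh(gamma * x . y + coef0)"""
    return float(np.tanh(gamma * np.dot(x, y) + coef0))


x = np.array([1.0, 2.0])  # made-up sample vectors
y = np.array([2.0, 0.0])
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

A larger gamma makes the RBF kernel decay faster with distance, so each training point influences a smaller neighbourhood.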
### *Plug the partitions into machine learning methods*
$A^{T} X = B$
$A = \begin{pmatrix}
\text{std.deviation}_{1} & \Delta P_{1}\\
\text{std.deviation}_{2} & \Delta P_{2}\\
\vdots & \vdots \\
\text{std.deviation}_{n-1} & \Delta P_{n-1}
\end{pmatrix}$
$X = \begin{pmatrix}
\text{Weightage}_{1} \\
\vdots \\
\text{Weightage}_{n-1}
\end{pmatrix}$
$B = \begin{pmatrix}
\text{std.deviation}_{n} \\
\Delta P_{n}
\end{pmatrix}$
Matrix $A$ is our input, containing the standard deviation and $\Delta P$ of each past day.
Matrix $X$ is our weightage, which is calculated by machine learning from our historical data [$day(1)$ to $day(n-1)$] to assign a weight to each input.
Matrix $B$ is our result for $day(n)$. The transpose makes the dimensions agree: $A^{T}$ is $2 \times (n-1)$, $X$ is $(n-1) \times 1$, and $B$ is $2 \times 1$.
# Assignment 2: Trading strategies
### *1.0 Introduction*
Stock price prediction is one of the most widely studied and challenging problems, attracting researchers from many fields including economics, history, finance, mathematics, and computer science. The volatile nature of the stock market makes it difficult to apply simple time-series or regression techniques. Financial institutions and traders have created various proprietary models to try and beat the market for themselves or their clients, but rarely has anyone achieved consistently higher-than-average returns on investment. Nevertheless, the challenge of stock forecasting is so appealing because an improvement of just a few percentage points can increase profit by millions of dollars for these institutions.
Recently, researchers have turned to techniques in the computer science fields of big data and machine learning for stock price forecasting. These apply computational power to extend theories in mathematics and statistics. Machine learning algorithms use given data to “figure out” the solution to a given problem. Big data and machine learning techniques are also the basis for algorithmic and high-frequency trading routines used by financial institutions.
In this paper we focus on two machine learning techniques, Support Vector Machines (SVM) and Neural Networks (NN). We apply them to index futures (KLCI and S&P500), a currency pair (GBP/USD) and a bond (US1). We input six parameters to the model: standard deviation, buy/sell, momentum, duration, ratio, and profit and loss. These parameters are calculated from price trends that have been partitioned over the years 1990 through 2017. The objective of our research is to analyze whether the past 2 partitions, generated from our Linear Regression and Moving Average formulas, can be used with a Neural Network and a Support Vector Machine to predict whether the next action is a good action.
### *2.0 Background Information*
**S&P 500**
The S&P 500 is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ. The S&P 500 is a leading indicator of U.S. equities and a reflection of the performance of the large-cap universe, made up of companies which are selected by economists. The S&P 500 is a market-value-weighted index. Investment products based on the S&P 500 that are available to investors include index funds and exchange-traded funds.
**KLCI**
The KLCI is a capitalisation-weighted stock market index, composed of the 30 largest companies on the Bursa Malaysia by market capitalisation that meet the eligibility requirements of the FTSE Bursa Malaysia Index Ground Rules. The index is jointly operated by FTSE and Bursa Malaysia.
**US1**
US1 is the US 1-year bond. Bonds represent debt obligations and are therefore a form of borrowing. Bonds are often referred to as fixed-income investments. Bonds are simply long-term IOUs that represent claims against a firm’s assets. Because of the long-term nature of bonds, a longer time frame earns more money, whereas a shorter time frame incurs more loss.
**GBP/USD**
The British pound/U.S. dollar pair is one of the most liquid trades in forex. A currency acts as a tool for analysing the condition or economic strength of a country. The interest rate and the performance of the financial sector of each country are important indicators of its strength, which is then reflected in the value of its currency. Short-term currency trading has some distinct benefits over other investments. As a volatile investment, a currency can rise in value quickly, resulting in gains that would take months or years to earn in more conservative investments. The higher risk also means faster results and less waiting to see whether a given investment gamble will pay off. So a shorter time frame can generate more profit than a longer time frame.
**Moving Average (MA)**
Moving average (MA) is a mathematical result that is calculated by averaging a number of past data points. Once determined, the resulting average is then plotted onto a chart then connected to create a moving average line in order to allow traders to look at smoothed data rather than focusing on the day-to-day price fluctuations that are inherent in all financial markets. Formula of MA:
$MA = \frac{\sum_{i = 1}^{N} \text{Open Price}_{i}}{N}$
The most common applications of MAs are to identify the trend direction and to determine support and resistance levels. A simple moving average is customizable in that it can be calculated for a different number of time periods, simply by adding the opening price of the security for a number of time periods and then dividing this total by the number of time periods, which gives the average price of the security over the time period. A simple moving average smooths out volatility and makes it easier to view the price trend of a security. If the simple moving average points up, the security's price is increasing. If it points down, the security's price is decreasing. The longer the timeframe for the moving average, the smoother the simple moving average. A shorter-term moving average is more volatile, but its reading is closer to the source data.
**Linear Regression Indicator**
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

LRI is a “Forecast Indicator”, aiming at forecasting tomorrow’s price plotted today. The Linear Regression indicator shows where prices, statistically, should be trading. One could therefore expect any major deviation from the regression line to be short-lived. LRI uses the least squares method. LRI is in many ways similar to a moving average, but does not exhibit as much "delay". The advantage of LRI is that it "fits" a line to the data points rather than averaging them, thereby creating an indicator that is highly sensitive to price changes.
### *3.0 Assumption*
We assume that every transaction is executed at the Open Price and that there is no delay in executing a Buy or Sell order.
### *4.0 Methodology*
In this assignment we partitioned the data by using the linear regression indicator (LRC) and the moving average (MA). The flow of our code is shown below.





We use partitions 1 and 2, each described by the 6 features (Profit/Loss, Standard Deviation, Ratio, Momentum, Duration, Buy/Sell), to predict whether the action in partition 3 is a good action (1 = take action, 0 = do not take action). The SVM and the Neural Network each produce a prediction for partition 3 (the SVM y_hat and the NN y_hat), which we compare with our target (1 or 0) to determine the accuracy of the prediction. When y_hat is 1, we take action (buy or sell), whereas when y_hat is 0, we do not take any action.

To find the standard deviation, we use the following method. If action lines 9 and 10 form a partition, we use the open prices of lines 9 and 10 to find the standard deviation of the stock price for this partition. Likewise, if action lines 11 to 16 form a partition, we use the open prices of lines 11 to 16 to find the standard deviation of the stock price for that partition.

### *4.1 Feature Selection*
In our model we picked six features: Standard Deviation, Profit and Loss, Duration, Ratio, Momentum and Buy/Sell Signal. The standard deviation only indicates the deviation from the average stock price of the partition, so we need momentum to indicate the direction of the trend as well as its strength along the partition. We also include the duration to observe how long the momentum persists. The ratio indicates the change between the beginning and the end of the partition.
**Table 1: Features used in SVM and NN**
| Feature Name | Formula |
| --- | --- |
|Standard Deviation | $$s=\sqrt{\frac{\sum(x-\overline{x})^2}{n-1}}$$ |
|Duration |$$t_{i+1}-t_{i}$$ |
|Profit and Loss |$$P_{i+1}-P_{i}$$|
|Momentum |$$\frac{\sum_{i=t-n+1}^{t}y}{t}$$ |
|Ratio | $$\frac{P_{i+1}} {P_{i}}$$ |
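Four of the features in Table 1 can be sketched for a single partition as below. We assume here that $P_{i}$ and $P_{i+1}$ are the open prices at the beginning and the end of the partition and that $t_{i}$ and $t_{i+1}$ are the corresponding row numbers; the helper name and the prices are made up for illustration. Momentum and the Buy/Sell signal are left out because they need the full price series.

```python
import statistics


def partition_features(open_prices, start_index, end_index):
    """Features of one partition from its open prices (at least two prices)."""
    first, last = open_prices[0], open_prices[-1]
    return {
        'std_dev': statistics.stdev(open_prices),  # sample form, n - 1 divisor
        'duration': end_index - start_index,       # t_{i+1} - t_i
        'profit_loss': last - first,               # P_{i+1} - P_i
        'ratio': last / first,                     # P_{i+1} / P_i
    }


features = partition_features([100.0, 102.0, 104.0], start_index=11, end_index=13)
print(features)
```

For a sell action the sign of the profit and loss would be reversed, which is why the Buy/Sell signal is kept as a separate feature.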
### *5.0 Discussion and Results*
We use Linear Regression(n) and Moving Average(n) to partition the price trend. When the LRC crosses the MA from above, it indicates a sell signal, and vice versa. We collect the Profit and Loss from this primitive strategy. In some regions the strategy generates a negative return and gives us a false alarm.


From the KLCI bar chart above, we can observe that the profit is highest when N = 3. This is because a smaller N gives a smaller partition size and a larger number of partitions, which increases our number of transactions.
However, for the S&P500 (SPX), the signals obtained from our partition generate more loss than profit, resulting in a net loss.
For our chosen currency (GBPUSD), we obtained a negative result when N = 20. This may be because a currency is a highly volatile instrument that is more appropriate for short-term trading.
The bond, on the other hand, behaves in the opposite way to the currency: we obtained a negative result when N = 3. A bond is an instrument that is appropriate for long-term trading.
**Neural Network(NN) Profit**

The diagram above shows the profit after applying Neural Network.
**Neural Network(NN) Profit (Reverse)**


The diagram above shows the profit (Reverse Action) after applying Neural Network.
**Support Vector Machine(SVM) Profit**


The diagram above shows the profit after applying SVM.

The table above shows the selling and buying actions of the four instruments, based on profit, for the three different time frames. From the table, we can see that the majority of the instruments make their profit from buying rather than selling. The exception is GBP/USD with N = 3 rolling, where the profit is obtained from selling. The objective of this analysis is to give the user some idea of the upcoming action when the two machine learning methods provide different results.
**Support Vector Machine(SVM) Profit (Reverse)**


The diagram above shows the profit (Reversed Action) after applying SVM.
As the bar chart shows, after applying machine learning there is a significant increase in profit and a significant decrease in loss. Therefore, the net profit for all 4 instruments also increased.

According to the table above, the profit of every instrument in each time frame is obtained from buy actions. The objective of this analysis is to give the user some idea of the upcoming action when the two machine learning methods provide different results.
### *5.1 Model Performance*
**The diagrams below show that the profit for the 3 different rolling windows increased after we applied the Neural Network.**


**The diagrams below show that the profit for the 3 different rolling windows increased after we applied the Support Vector Machine.**


Both machine learning methods that we applied helped us to increase the profit.
### *5.2 Model Accuracy*
For each of the chosen machine learning methods, Support Vector Machine (SVM) and Neural Network (NN), we split the features into train and test data and calculate the accuracy on the train and test data respectively. The accuracies are tabulated below.
Remark:
NN Accuracy and SVM Accuracy are different from the score function.
They are calculated as the percentage of the Target (y) values that match the predicted y hat generated by our machine learning.
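The matching calculation described in the remark can be sketched as follows. The function name and the sample labels are made up for illustration.

```python
def match_accuracy(y_true, y_hat):
    """Fraction of targets that equal the predicted labels."""
    if len(y_true) != len(y_hat):
        raise ValueError('y_true and y_hat must have the same length')
    matches = sum(1 for target, predicted in zip(y_true, y_hat) if target == predicted)
    return matches / len(y_true)


y = [1, 0, 1, 1, 0]      # made-up targets
y_hat = [1, 0, 0, 1, 1]  # made-up predictions
accuracy = match_accuracy(y, y_hat)
print(accuracy)  # 3 of 5 labels match, so 0.6
```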
N = 3
| Information | KLCI | GBPUSD | US1 | SPX |
| --- | --- | --- | --- | --- |
| Accuracy of training set (NN)| 0.7390 | 0.7322 | 0.6950 | 0.7110 |
| Accuracy of testing set (NN)| 0.6564 | 0.6791 | 0.6639 | 0.6962 |
| NN Accuracy | 0.7169 | 0.7181 | 0.6860 | 0.7064 |
| Accuracy of training set (SVM) | 0.8874 | 0.6643 | 0.6855 | 0.8614 |
|Accuracy of testing set(SVM)| 0.5720 | 0.6580 | 0.6589 | 0.6578 |
| SVM Accuracy | 0.8068 | 0.6619 | 0.6777 | 0.8094 |
N = 10
| Information | KLCI | GBPUSD | US1 | SPX |
| --- | --- | --- | --- | --- |
| Accuracy of training set (NN)| 0.7156 | 0.7034 | 0.7424 | 0.7571 |
| Accuracy of testing set (NN)| 0.5959 | 0.6505 | 0.5966 | 0.6772 |
| NN Accuracy | 0.6821 | 0.6883 | 0.7030 | 0.7332 |
| Accuracy of training set (SVM) | 0.9977 | 0.6835 | 0.7386 | 0.9840 |
| Accuracy of testing set (SVM) | 0.5890 | 0.6324 | 0.6023 | 0.6825 |
| SVM Accuracy | 0.8906 | 0.6680 | 0.7016 | 0.9036 |

N = 20

| Information | KLCI | GBPUSD | US1 | SPX |
| --- | --- | --- | --- | --- |
| Accuracy of training set (NN)| 0.7462 | 0.7567 | 0.6877 | 0.7617 |
| Accuracy of testing set (NN)| 0.5606 | 0.59 | 0.6235 | 0.6047 |
| NN Accuracy | 0.6917 | 0.7096 | 0.6657 | 0.6279 |
| Accuracy of training set (SVM) | 1.0 | 0.7833 | 0.8538 | 1.0 |
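| Accuracy of testing set (SVM) | 0.5758 | 0.65 | 0.6824 | 0.6628 |
| SVM Accuracy | 0.8835 | 0.7444 | 0.8035 | 0.8986 |

From the tables above, we can see that the Support Vector Machine gives a higher overall accuracy than the Neural Network for KLCI and SPX in all 3 rolling data sets, and for all four instruments when N = 20. For GBPUSD and US1, the Neural Network is slightly higher when N = 3 and N = 10. The SVM's advantage comes mainly from the training set; on the testing set the two methods are much closer.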
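The accuracy figure described in the remark of this section can be sketched in a few lines of Python. This is only an illustration of the idea (the share of targets y that equal the predicted y hat); the sample values and the function name `match_accuracy` are hypothetical, and we assume that the overall "NN Accuracy" and "SVM Accuracy" rows are computed over the training and testing rows together.

```python
def match_accuracy(y_true, y_pred):
    """Return the fraction of targets that equal the corresponding prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute the accuracy of an empty sample")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical targets and predictions, for illustration only.
y = [1, 0, 1, 1, 0, 1, 0, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

train_acc = match_accuracy(y[:6], y_hat[:6])  # first 6 rows treated as training data
test_acc = match_accuracy(y[6:], y_hat[6:])   # last 2 rows treated as testing data
overall_acc = match_accuracy(y, y_hat)        # all rows together (assumed definition)

print(train_acc, test_acc, overall_acc)
```

Under this assumed definition, the overall accuracy is the average of the training and testing accuracy weighted by the number of rows in each set, so it always lies between the two.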
### Python Code
[final version code](https://www.dropbox.com/s/apwc12iibsuq8nj/assignment%202%20final2.rar?dl=0)
### 6.0 Backtesting and Application

**Figure 1 above: Predicted Signal and Machine Learning Results**

**Table 1 above: Future data for cross validation**

**Table 2 above: Current Data**
Remark: To generate the table above, we adjust our input to **df1 = df[:-11]**.
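The slice in the remark drops the last 11 rows of the input. A minimal sketch of the same idea with a plain Python list is shown below; the row values and the names `rows`, `current` and `future` are hypothetical, and reading the dropped rows as the "future data" used for cross validation is our assumption. Slicing a pandas DataFrame with `df[:-11]` selects rows by position in the same way.

```python
# Hypothetical observations; in the report the data is a pandas DataFrame `df`.
rows = list(range(100, 130))  # 30 placeholder rows

HOLD_OUT = 11                 # matches the -11 in df[:-11]

current = rows[:-HOLD_OUT]    # data the model is allowed to see
future = rows[-HOLD_OUT:]     # rows kept aside to check the prediction later

print(len(current), len(future))
```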
We use the last 2 partitions, each with 6 features (Profit/Loss, Standard Deviation, Ratio, Momentum, Duration, Buy/Sell), to predict whether the next partition is profitable, using both the SVM and the Neural Network. The target is True or False: True means take the action because it is a profit region, and False means do not take the action, or reverse it, because it is a loss region. From this we obtain the SVM y_hat and the NN y_hat for the 3rd partition, which we compare with our target (1 or 0) to determine the accuracy of our prediction. When y_hat is 1, we take action (buy or sell), whereas when y_hat is 0, we do not take any action.
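The way two past partitions are turned into one training row can be sketched as follows. This is a simplified illustration and not the code from the assignment: the dictionary keys, the sample values, the +1/-1 encoding of Buy/Sell, and the rule "profitable means Profit/Loss greater than zero" are all assumptions made for the example.

```python
# Hypothetical key names for the 6 features listed in the report.
FEATURES = ["pnl", "std", "ratio", "momentum", "duration", "side"]


def build_samples(partitions, lookback=2):
    """Build (X, y): features of the previous `lookback` partitions -> next partition label."""
    X, y = [], []
    for i in range(lookback, len(partitions)):
        row = []
        for past in partitions[i - lookback:i]:
            row.extend(past[name] for name in FEATURES)
        X.append(row)
        # Assumed target: 1 if the next partition made a profit, else 0.
        y.append(1 if partitions[i]["pnl"] > 0 else 0)
    return X, y


# Hypothetical partitions, for illustration only (side: 1 = buy, -1 = sell).
partitions = [
    {"pnl": 12.0, "std": 1.1, "ratio": 0.8, "momentum": 0.3, "duration": 5, "side": 1},
    {"pnl": -4.0, "std": 0.9, "ratio": 1.2, "momentum": -0.1, "duration": 3, "side": -1},
    {"pnl": 7.5, "std": 1.4, "ratio": 0.7, "momentum": 0.2, "duration": 6, "side": 1},
    {"pnl": -2.0, "std": 1.0, "ratio": 1.1, "momentum": -0.2, "duration": 4, "side": -1},
]

X, y = build_samples(partitions)
print(len(X), len(X[0]), y)
```

Each row of `X` holds 12 numbers (2 partitions times 6 features), and `y` holds the 1 or 0 target that a classifier such as the SVM or the Neural Network would be trained to predict.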
From Figure 1, we can see that our model predicts that the next region is a **False** region and our indicator predicts a **Short** signal for the next region. We then checked the model prediction against Table 1 and Table 2 as a cross validation, and the data matched our model prediction.
When both machine learning methods indicate the same result, the prediction is more reliable. If they give different results, we need to include sentiment analysis and other evidence before acting.
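The rule for combining the two predictions with the indicator signal can be sketched as below. The function name `decide` and the returned strings are hypothetical, and for the case where both models predict a loss region the sketch simply stays out of the market, although the report also allows reversing the action.

```python
def decide(svm_y_hat, nn_y_hat, indicator_signal):
    """Combine the SVM and NN predictions (1 or 0) with the indicator's signal."""
    if svm_y_hat != nn_y_hat:
        return "review"          # models disagree: look for other evidence first
    if svm_y_hat == 1:
        return indicator_signal  # both predict a profit region: follow the indicator
    return "no action"           # both predict a loss region: do not take the action


print(decide(0, 0, "short"))  # both models predict a False region
print(decide(1, 1, "short"))  # both models predict a True region
print(decide(1, 0, "long"))   # the models disagree
```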
### 7.0 Conclusion
After comparing the results obtained from both machine learning methods, we observe that the Support Vector Machine increased the profit the most. Hence, we believe the developed SVM model provides better prediction capability than the NN model.
The four financial instruments that we used in this assignment are traded on a futures exchange. A futures exchange is a central financial exchange where people can trade standardized futures contracts, that is, contracts to buy specific quantities of a commodity or financial instrument at a specified price with delivery set at a specified time in the future. Futures instruments are priced according to the movement of the underlying asset (stock, physical commodity, index, etc.).
Usually, clients hold a margin account with the exchange, and every day the swings in the value of their positions are added to or deducted from their margin account. If the margin account gets too low, they have to replenish it. In this way it is highly unlikely that a client will be unable to fulfill the obligations arising from the contracts. Therefore, trading on margin does not require us to pay the purchase cost of the instruments when we trade them in the futures markets. Thus, there is no purchase cost from which to calculate the rate of return of each financial instrument, and we only report Profit and Loss throughout this assignment.
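The daily settlement of a margin account described above can be illustrated with a short sketch. All figures are hypothetical, and the replenishment rule (topping the balance back up to the initial margin once it falls below a maintenance level) is an assumption used for the example, not a rule taken from any particular exchange.

```python
def run_margin_account(initial_margin, maintenance_margin, daily_pnl):
    """Apply daily gains and losses and record (balance, top_up) for each day."""
    balance = initial_margin
    history = []
    for pnl in daily_pnl:
        balance += pnl  # the daily swing is added to or deducted from the account
        top_up = 0.0
        if balance < maintenance_margin:
            # Assumed rule: replenish the account back to the initial margin.
            top_up = initial_margin - balance
            balance = initial_margin
        history.append((balance, top_up))
    return history


# Hypothetical daily profit and loss of one position.
daily_pnl = [-100.0, -200.0, 50.0, -400.0]
history = run_margin_account(1000.0, 750.0, daily_pnl)

total_pnl = sum(daily_pnl)
print(history, total_pnl)
```

The account only ever holds margin, never the purchase cost of the contract, which is why the result is reported as a Profit and Loss figure (here `total_pnl`) rather than as a rate of return.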
### Acknowledgement
I would like to express my appreciation to Dr. Lai An Chow and Dr. Goh Yong Kheng for their guidance during the preparation of the assignment for UECM3543 - Case Study on Investment and Trading.
On behalf of my group, I would like to express our gratitude to both of them: Dr. Lai An Chow for presenting us with a technical analysis method as well as a primitive investing idea to explore, and Dr. Goh Yong Kheng for presenting us with the idea of using machine learning, which will be our biggest takeaway when we graduate from UTAR since it will be an added advantage in our future careers.
I would also like to express my appreciation to my group members, who spent their time discussing the assignment in order to produce great work.
Last but not least, I am thankful to all our friends in UECM3543 - Case Study on Investment and Trading, who constantly shared great and creative ideas so that together we produced great work. They were a source of ideas; we learnt from each other and helped each other to break through bottlenecks.