Support Vector Machine and Support Vector Regression

Xiaoqi Zheng, 04/02/2020

In [1]:
library(e1071)

1. Support Vector Machine

1.1 Run SVM with default parameters

In [5]:
## A small example with the IRIS data set
data(iris)

## Split into train (70%) and test (30%) sets; consider set.seed() for reproducibility
idxs <- sample(1:nrow(iris), as.integer(0.7*nrow(iris)))
trainIris <- iris[idxs,]
testIris <- iris[-idxs,]
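Since iris has exactly 50 rows per species, a plain sample() as above can leave the classes unbalanced between train and test. A stratified split avoids this; the sketch below uses only base R (the set.seed() call and variable names are ours, not part of the cells above):

```r
set.seed(1)  # for reproducibility; not used in the split above
## Stratified 70/30 split: sample 35 of the 50 rows within each species
strat_idx <- unlist(lapply(split(1:nrow(iris), iris$Species),
                           function(rows) sample(rows, 35)))
trainStrat <- iris[strat_idx, ]
testStrat  <- iris[-strat_idx, ]
table(trainStrat$Species)  # 35 per class
```

Each class now contributes exactly 35 training and 15 test rows, so class proportions are identical in both sets.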
In [6]:
?svm #scale = TRUE, kernel = "radial", degree = 3, gamma = if (is.vector(x)) 1 else 1 / ncol(x), cost = 1,
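The defaults listed in the help page matter: with kernel = "radial", svm() uses the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2), and gamma defaults to 1/ncol(x) (0.25 for the four iris features). A minimal base-R sketch of the kernel itself (the function name rbf is ours, not part of e1071):

```r
## RBF (Gaussian) kernel, as used by svm() with kernel = "radial"
rbf <- function(x1, x2, gamma) {
  exp(-gamma * sum((x1 - x2)^2))
}

rbf(c(1, 2), c(1, 2), gamma = 0.25)  # identical points: similarity 1
rbf(c(1, 2), c(3, 4), gamma = 0.25)  # squared distance 8: exp(-2)
```

Larger gamma makes the similarity decay faster with distance, which is why gamma is one of the two parameters tuned below.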
In [16]:
## Default parameters
model <- svm(Species~., data=trainIris)
In [17]:
pred <- predict(model, testIris)
confus.matrix = table(real=testIris$Species, predict=pred)
print(confus.matrix)
accuracy = sum(diag(confus.matrix))/sum(confus.matrix)
cat("accuracy =",accuracy)
            predict
real         setosa versicolor virginica
  setosa         12          0         0
  versicolor      0         16         0
  virginica       0          3        14
accuracy = 0.9333333

1.2 Tune parameters for a better result

In [19]:
?tune.svm
In [20]:
## Tune SVM to find the best cost and gamma by 10-fold cross-validation.
svm_tune <- tune.svm(Species~., data = trainIris,
                 kernel="radial", cost=10^(-1:2), gamma=c(.5,1,2))
In [22]:
print(svm_tune)
Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:
 gamma cost
   0.5    1

- best performance: 0.01909091 
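tune.svm() performs an exhaustive grid search: every (cost, gamma) pair is fitted and scored by 10-fold cross-validation, and the pair with the lowest CV error is reported as the best. The grid it searches here can be enumerated with base R's expand.grid():

```r
## The grid tune.svm() searches above: 4 costs x 3 gammas = 12 candidate models
grid <- expand.grid(cost = 10^(-1:2), gamma = c(.5, 1, 2))
nrow(grid)   # 12
head(grid)   # one row per (cost, gamma) combination
```

The reported "best performance" (0.019...) is the mean CV misclassification error of the winning pair, not the test-set error.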

In [21]:
plot(svm_tune)
In [24]:
svm_model_after_tune <- svm(Species ~ ., data=trainIris, kernel="radial", 
                            cost=svm_tune$best.parameters$cost, gamma=svm_tune$best.parameters$gamma)
In [25]:
pred <- predict(svm_model_after_tune,testIris)
confus.matrix = table(real=testIris$Species, predict=pred)
print(confus.matrix)
accuracy = sum(diag(confus.matrix))/sum(confus.matrix)
cat("accuracy =",accuracy)
            predict
real         setosa versicolor virginica
  setosa         12          0         0
  versicolor      0         16         0
  virginica       0          2        15
accuracy = 0.9555556

2. Support Vector Regression

SVR also uses the svm() function from the R package 'e1071'; with a numeric response, svm() fits a regression instead of a classifier.
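SVR minimizes the epsilon-insensitive loss: residuals smaller than epsilon cost nothing, and larger ones cost only the excess over epsilon. A base-R sketch of this loss (the function name eps_loss is ours):

```r
## Epsilon-insensitive loss used by SVR: zero inside the epsilon "tube"
eps_loss <- function(residual, epsilon = 0.1) {
  pmax(0, abs(residual) - epsilon)
}

## First two residuals fall inside the tube (loss 0); the third costs ~0.4
eps_loss(c(-0.05, 0.05, 0.5), epsilon = 0.1)
```

This is why epsilon appears alongside cost in the tuning grid of section 2.2: it controls the width of the tube within which errors are ignored.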

2.1 Run SVR under default parameters

In [26]:
data = data.frame(x=1:20,
                  y=c(3,4,8,2,6,10,12,13,15,14,17,18,20,17,21,22,25,30,29,31))
In [27]:
plot(data$x, data$y, pch=16, xlab="X", ylab="Y")
In [29]:
## by linear regression
model <- lm(y ~ x , data) 
lm.pred = predict(model, data)
In [30]:
plot(data$x, data$y, pch=16, xlab="X", ylab="Y")
points(data$x, lm.pred, pch=2, col="red") ## fitted values
abline(model, col="red")
In [31]:
## by SVR
model <- svm(y ~ x , data)
In [32]:
svr.pred = predict(model, data)
In [33]:
plot(data$x, data$y, pch=16, xlab="X", ylab="Y")
points(data$x, svr.pred, pch=4, col="blue") ## the SVR fit is not linear
In [37]:
## Compare fits by root-mean-square error (RMSE)
cat("RMSE by lm is: ",sqrt(mean((data$y - lm.pred)^2)),"\n")
cat("RMSE by SVR is: ",sqrt(mean((data$y - svr.pred)^2)))
RMSE by lm is:  1.914203 
RMSE by SVR is:  1.795094

2.2 Tune parameters for SVR

In [39]:
# tune cost and epsilon in SVR by 10-fold cross-validation
# (note: the argument is 'ranges', not 'range')
tune.model = tune(svm,
                  y~x,
                  data=data,
                  ranges=list(cost=2^(2:9), epsilon = seq(0,1,0.1))
)
In [40]:
plot(tune.model)
In [42]:
model <- svm(y ~ x , data,cost = tune.model$best.parameters$cost,epsilon = tune.model$best.parameters$epsilon)
svr.pred = predict(model, data)
In [44]:
## Compare with the untuned fit by RMSE
cat("RMSE by tuned SVR is: ",sqrt(mean((data$y - svr.pred)^2)))
RMSE by tuned SVR is:  1.658112