# Illustrations of Support Vector Machines

##### Posted on 0 Comments

Use the e1071 library in R to demonstrate the support vector classifier and the SVM.

## Implement in R

```
library(e1071)
svm() ## kernel = "linear"
## small "cost" means wide margins, while large "cost" means narrow margins
```

An intuition is that large “cost” $C$ would be more strict for the misclassification, and hence narrow margins. Specially, $C = \infty$ corresponds to the separable case.

*Refer to 12.2 支持向量分类器 - ESL CN for more details.*

## Support Vector Classifier

demonstration for two-dimensional example

### generate the data

```
set.seed(1)
x = matrix(rnorm(20*2), ncol = 2)
y = c(rep(-1 ,10), rep(1 ,10) )
x[y == 1, ] = x[y == 1, ] + 1
plot(x , col = 3-y )
```

The classes are NOT linearly separable.

### fit the support vector classification

It is worth noting that we must encode the response as a **factor variable** in order to perform classification.

```
dat = data.frame(x = x, y = as.factor(y))
library(e1071)
svmfit = svm (y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
plot(svmfit, dat)
```

### determine the identities

```
svmfit$index
summary(svmfit)
```

### a smaller cost

```
svmfit = svm ( y ∼. , data = dat, kernel = "linear" , cost = 0.1, scale = FALSE )
plot(svmfit, dat)
svmfit$index
```

After setting a smaller cost, we will get a larger number of support vectors, because the margin is now wider.

### cross-validation

We can use ``tune()” to perform ten-fold cross-validation on a set of models of interest.

```
set.seed(1)
tune.out = tune (svm, y ~ ., data = dat, kernel ="linear",
ranges = list ( cost = c (0.001 , 0.01 , 0.1 , 1 ,5 ,10 ,100) ) )
```

The output is as follows:

So “cost = 0.1” results in the lowest cross-validation error rate, and we can access the best model as follows:

```
bestmod = tune.out$best.model
summary( bestmod )
```

### prediction

```
xtest = matrix ( rnorm (20*2) , ncol =2)
ytest = sample ( c ( -1 ,1) , 20 , rep = TRUE )
xtest [ ytest ==1 ,]= xtest [ ytest ==1, ] + 1
testdat = data.frame ( x = xtest , y = as.factor ( ytest ) )
ypred = predict ( bestmod , testdat )
table ( predict = ypred , truth = testdat$y )
```

## Support Vector Machine

### generate some data with a non-linear class boundary

```
set.seed(1)
x = matrix ( rnorm (200*2) , ncol =2)
x [1:100, ]= x [1:100, ]+2
x [101:150, ]= x [101:150, ] -2
y = c ( rep(1, 150) , rep(2, 50) )
dat = data.frame ( x =x , y = as.factor ( y ) )
plot(x, col = y)
```

### randomly split into training and testing groups.

```
train = sample (200 ,100)
```

### training data using radical kernel

```
svmfit = svm(y~. , data = dat [ train ,], kernel = "radial", cost =1)
plot ( svmfit , dat [ train ,])
```

### cross-validation

```
set.seed (1)
tune.out = tune(svm, y∼. , data = dat [ train ,] , kernel =" radial " ,
ranges = list ( cost = c (0.1 ,1 ,10 ,100 ,1000) ,
gamma = c (0.5 ,1 ,2 ,3 ,4) ) )
summary( tune.out )
```

Therefore, the best choice of parameters involves “cost=1” and “gamma=2”.

### prediction

```
table( true = dat [ - train ,"y"] , pred = predict ( tune.out$best.model, newdata = dat [ - train ,]) )
```

## ROC Curves

The ROC curve is a popular graphic for simultaneously displaying the two types of errors for all possible thresholds.

Varying the classifier threshold changes its **true positive rate** (sensitivity) and **false positive rate** (1 - specificity).

The overall performance of a classifier, summarized over all possible thresholds, is given by the area under the (ROC) curve (AUC). An ideal ROC curve will hug the top left corner, so the larger the AUC the better the classifier.

The ROCR package can be used to produce ROC curves.

```
library(ROCR)
rocplot = function ( pred , truth , ...) {
predob = prediction ( pred, truth )
perf = performance ( predob , "tpr" , "fpr")
plot ( perf,...) }
```

```
svmfit.opt = svm (y~., data = dat[train, ], kernel = "radial", gamma =2 , cost =1 , decision.values = T )
fitted = attributes(predict(svmfit.opt, dat[train, ], decision.values = TRUE))$decision.values
```

```
par ( mfrow = c (1 ,2) )
rocplot ( fitted , dat [ train ,"y"] , main ="Training Data")
```

By increasing $\gamma$ we can produce a more flexible fit and generate further improvements in accuracy.

```
svmfit.flex = svm(y~. , data = dat [ train ,] , kernel = "radial", gamma =50 , cost =1 , decision.values = T )
fitted = attributes ( predict ( svmfit.flex , dat [ train ,] , decision.values = T ) ) $decision.values
rocplot ( fitted , dat [ train ,"y"] , add =T , col ="red")
```

But on the test data, $\gamma = 2$ would be better.

```
fitted = attributes(predict(svmfit.opt, dat[-train, ], decision.values = T ) ) $decision.values
rocplot(fitted, dat[-train, "y"] , main = "Test Data")
fitted = attributes(predict(svmfit.flex, dat[-train, ], decision.values = T ) ) $decision.values
rocplot(fitted, dat[-train, "y"] , add =T, col ="red")
```

## SVM with Multiple Classes

```
set.seed (1)
x = rbind (x, matrix ( rnorm (50*2) , ncol =2) )
y = c ( y , rep (0 ,50) )
x [ y ==0 ,2]= x [ y ==0 ,2]+2
dat = data.frame ( x =x , y = as.factor ( y ) )
par(mfrow = c (1 ,1) )
plot (x , col = (y +1) )
```

```
svmfit = svm(y~., data = dat , kernel ="radial" , cost =10 , gamma =1)
plot ( svmfit , dat )
```

The e1071 library can also be used to perform support vector regression, if the response vector that is passed in to svm() is numerical rather than a factor.

for classification (y is factor)

for regression (y is NOT factor)

## Application to Gene Expression Data

### examine the dimension of the data

```
library(ISLR)
names(Khan)
dim(Khan$xtrain)
dim(Khan$xtest)
length(Khan$ytrain)
length(Khan$ytest)
table(Khan$ytrain)
table(Khan$ytest)
```

```
dat = data.frame ( x = Khan$xtrain, y = as.factor(Khan$ytrain))
out = svm ( y~., data = dat , kernel = "linear", cost =10)
summary(out)
table ( out$fitted , dat$y )
```

NO training errors. In fact, this is not surprising, because the large number of variables relative to the number of observations implies that it is easy to find hyperplanes that fully separate the classes.

```
dat.te = data.frame ( x = Khan$xtest , y = as.factor( Khan$ytest ) )
pred.te = predict ( out, newdata = dat.te )
table ( pred.te , dat.te$y )
```

## Reference

An Introduction to Statistical Learning with Applications in R