SVMlight: Support Vector Machine
Author: Thorsten
Joachims <thorsten@ls8.cs.uni-dortmund.de>
University of Dortmund, Informatik, AI-Unit Collaborative
Research Center on 'Complexity Reduction in Multivariate Data'
(SFB475) Version: 3.02 Date: 16.11.99
OverviewSVMlight is an implementation of
Support Vector Machines (SVMs) in C. The main features of the program are
the following:
- fast optimization algorithm
- working set selection based on steepest feasible descent
- "shrinking" heuristic
- caching of kernel evaluations
- use of folding in the linear case
- includes algorithm for approximately training large transductive
SVMs (TSVMs)
- can train SVMs with cost models
- handles many thousands of support vectors
- handles several ten-thousands of training examples
- supports standard kernel functions and lets you define your own
- uses sparse vector representation
DescriptionSVMlight is an implementation of
Vapnik's Support Vector Machine [Vapnik,
1995] for the problem of pattern recognition. The optimization
algorithm used in SVMlight is described in [Joachims,
1999a]. The algorithm has scalable memory requirements and can handle
problems with many thousands of support vectors efficiently. This new
version also includes a new algorithm for training large-scale
transductive SVMs. The algorithm proceeds by solving a sequence of
optimization problems lower-bounding the solution using a form of local
search. A detailed description of the algorithm can be found in [Joachims,
1999c]. You can now also use SVMs with cost models (see [Morik
et al., 1999]). The code has been used on a large range of problems,
including text classification [Joachims,
1999c][Joachims,
1998a], several image recognition tasks, and medical applications.
Many tasks have the property of sparse instance vectors. This
implementation makes use of this property which leads to a very compact
and efficient representation.
Source CodeThe source code is free for scientific use. Please
contact me, if you are planning to use the software for commercial
purposes. The software must not be modified and distributed without prior
permission of the author. If you use SVMlight in your
scientific work, please cite as
- T. Joachims, Making large-Scale SVM Learning Practical. Advances in
Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and
A. Smola (ed.), MIT-Press, 1999.
[PDF]
[Postscript
(gz)] I would also appreciate, if you sent me (a link to)
your papers so that I can learn about your research. The implementation
was developed on Solaris 2.5 with gcc, but compiles also on SunOS 3.1.4,
Solaris 2.7, Linux, IRIX, Windows NT, and Powermac (after small
modifications, see FAQ).
The source code is available at the following location:
Please send me email
and let me know that you got svm-light. I will put you on my mailing list
to inform you about new versions and bug-fixes. SVMlight
comes with a quadratic programming tool for solving small intermediate
quadratic programming problems. It is based on the method of Hildreth and
D'Espo and solves small quadratic programs very efficiently. Nevertheless,
if for some reason you want to use another solver, the new version still
comes with an interface to PR_LOQO. The PR_LOQO optimizer was written
by A. Smola. It can be
requested from http://svm.first.gmd.de/software/loqosurvey.html.
InstallationTo install SVMlight you need to
download svm_light.tar.gz. Create a new directory:
Move svm_light.tar.gz to this
directory and unpack it with
gunzip -c svm_light.tar.gz | tar xvf - Now execute
which compiles the system and creates the two
executables
svm_learn (learning
module) svm_classify (classification
module) If you do not want to use the built-in optimizer but
PR_LOQO instead, create a subdirectory in the svm_light directory
with
and copy the files pr_loqo.c and
pr_loqo.h (which you received by email) in there. Now execute
If the system does not compile
properly, check this FAQ.
How to useThis section explains how to use the
SVMlight software. A good introduction to the theory of
SVMs is Chris Burges' tutorial.
SVMlight consists of a learning module
(svm_learn) and a classification module (svm_classify).
The classification module can be used to apply the learned model to new
examples. See also the examples below for how to use svm_learn
and svm_classify.
svm_learn is called with the following parameters:
svm_learn [options] example_file model_file Available
options are:
General options:
-? -> this
help -v
[0..3] -> verbosity level (default 1)
Learning options:
-c
float -> C: trade-off between training error
and margin (default 1000)
-j
float -> Cost: cost-factor, by which training
errors on
positive examples outweight errors on negative
examples (default 1)
-b
[0,1] -> use biased hyperplane (i.e. x*w+b>0)
instead
of unbiased hyperplane (i.e. x*w>0) (default 1)
-i
[0,1] -> remove inconsistent training examples
and retrain (default 0) Transduction options:
-p
[0..1] -> fraction of unlabeled examples to be
classified
into the positive class (default is the ratio of
positive and negative examples in the training data) Kernel
options:
-t int -> type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(-gamma ||a-b||^2)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
-d
int -> parameter d in polynomial
kernel -g
float -> parameter gamma in rbf kernel
-s
float -> parameter s in sigmoid/poly kernel
-r
float -> parameter c in sigmoid/poly kernel
-u
string -> parameter of user defined kernel
Optimization options:
-q [2..400]
-> maximum size of QP-subproblems (default 10)
-m
[5..] -> size of cache for kernel evaluations in MB
(default 40)
The larger the faster...
-e
float -> eps: Allow that error for termination
criterion
[y [w*x+b] - 1] >= eps (default 0.001)
-h
[5..] -> number of iterations a variable needs to
be
optimal before considered for shrinking (default 100)
-f
[0,1] -> do final optimality check for variables
removed
by shrinking. Although this test is usually
positive, there is no guarantee that the optimum
was found if the test is omitted. (default 1) Output
options:
-l char -> file to write predicted labels of
unlabeled
examples into after transductive learning
-a
char -> write all alphas to this file after
learning
(in the same order as in the training set) The input file
example_file contains the training examples. The first lines may
contain comments and are ignored if they start with #. Each of the
following lines represents one training example and is of the following
format:
<class> .=. +1 | -1 | 0
<feature> .=. integer <value> .=.
real <line> .=. <class>
<feature>:<value> <feature>:<value> ...
<feature>:<value> The class label and each of the
feature/value pairs are separated by a space character. Feature/value
pairs MUST be ordered by increasing feature number. Features with value
zero can be skipped. The +1 as class label marks a positive example, -1 a
negative example respectively. A class label of 0 indicates that this
example should be classified using transduction. The predictions for the
examples classified by transduction are written to the file specified
through the -l option. The order of the predictions is the same as in the
training data.
The result of svm_learn is the model which is learned from the
training data in example_file. The model is written to
model_file. To classify test examples, svm_classify
reads this file. svm_classify is called with the following
parameters:
svm_classify [options] example_file model_file
output_file Available options are:
-h ->
Help. -v [0..3] -> Verbosity level (default
2). -f [0,1] -> 0: old output format of
V1.0
1: output the value of decision function (default) All test
examples in example_file are classified and the predicted classes
are written to output_file. There is one line per test example in
output_file containing the value of the decision function on that
example. The test example file has the same format as the one for
svm_learn. Again, <class> can have the value zero
indicating unknown.
If you want to find out more, try this FAQ.
Getting started: an Example Problem
Inductive SVMYou will find an example text classification problem
at
Download
this file into your svm_light directory and unpack it with
gunzip -c example1.tar.gz | tar xvf - This will create a
subdirectory example1. Documents are represented as feature
vectors. Each feature corresponds to a word stem (9947 features). The task
is to learn which Reuters
articles are about "corporate acquisitions". There are 1000 positive
and 1000 negative examples in the file train.dat. The file
test.dat contains 600 test examples. The feature numbers
correspond to the line numbers in the file words. To run the
example, execute the commands:
svm_learn example1/train.dat example1/model
svm_classify example1/test.dat example1/model
example1/predictions The accuracy on the test set is printed to
stdout.
Transductive SVMTo try out the transductive learner, you can use
the following dataset. I compiled it from the same Reuters articles as
used in the example for the inductive SVM. The dataset consists of only 10
training examples (5 positive and 5 negative) and the same 600 test
examples as above. You find it at
Download
this file into your svm_light directory and unpack it with
gunzip -c example2.tar.gz | tar xvf - This will create a
subdirectory example2. To run the example, execute the
commands:
svm_learn example2/train_transduction.dat example2/model
svm_classify example2/test.dat example2/model
example2/predictions The classification module is called only to
get the accuracy printed. The transductive learner is invoced
automatically, since train_transduction.dat contains
unlabeled examples (i. e. the 600 test examples). You can compare the
results to those of the inductive SVM by running:
svm_learn example2/train_induction.dat
example2/model svm_classify example2/test.dat
example2/model example2/predictions The file
train_induction.dat contains the same 10 (labeled) training
examples as train_transduction.dat.
Extensions and Additions
Questions and Bug ReportsIf you find bugs or you have problems
with the code you cannot solve by yourself, please contact me via email
<svm-light@ls8.cs.uni-dortmund.de>.
DisclaimerThis software is free only for non-commercial use. It
must not be modified and distributed without prior permission of the
author. The author is not responsible for implications from the use of
this software.
History
V3.01 -> V3.02
- Now examples can be read in correctly on SGIs.
V3.00 -> V3.01
- Fixed rare convergence bug for Hildreth and D'Espo solver.
V2.01 -> V3.00
- Training algorithm for transductive Support Vector Machines.
- Integrated core QP-solver based on the method of Hildreth and
D'Espo.
- Uses folding in the linear case, which speeds up linear SVM training
by an order of magnitude.
- Allows linear cost models.
- Faster in general.
V2.00 -> V2.01
V1.00 -> V2.00
- Learning is much faster especially for large training sets.
- Working set selection based on steepest feasible descent.
- "Shrinking" heuristic.
- Improved caching.
- New solver for intermediate QPs.
- Lets you set the size of the cache in MB.
- Simplified output format of svm_classify.
- Data files may contain comments.
V0.91 -> V1.00
- Learning is more than 4 times faster.
- Smarter caching and optimization.
- You can define your own kernel function.
- Lets you set the size of the cache.
- VCdim is now estimated based on the radius of the support vectors.
- The classification module is more memory efficient.
- The f2c library is available from here.
- Adaptive precision tuning makes optimization more robust.
- Includes some small bug fixes and is more robust.
- Source code for SVMlight
V1.00
V0.9 -> V0.91
- Fixed small bug which appears for very small C. Optimization did not
converge.
References
| [Joachims/99a] |
T. Joachims, 11 in: Making large-Scale SVM
Learning Practical. Advances in Kernel Methods - Support Vector
Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT-Press,
1999. [PDF]
[Postscript
(gz)] |
| [Joachims/99c] |
Thorsten Joachims, Transductive Inference
for Text Classification using Support Vector Machines.
International Conference on Machine Learning (ICML), 1999. [PDF]
[Postscript
(gz)] |
| [Joachims/98a] |
T. Joachims, Text Categorization with
Support Vector Machines: Learning with Many Relevant Features.
European Conference on Machine Learning (ECML), Claire Nédellec and
Céline Rouveirol (ed.), 1998. [PDF]
[Postscript
(gz)] |
| [Joachims/98c] |
Thorsten Joachims, Making large-Scale SVM
Learning Practical. LS8-Report, 24, Universität Dortmund, LS
VIII-Report, 1998. [PDF]
[Postscript
(gz)] |
| [Vapnik/95a] |
Vladimir N. Vapnik, The Nature of
Statistical Learning Theory. Springer,
1995.
|
Other SVM Resources
Last modified July 20th, 1999 by Thorsten
Joachims <thorsten@ls8.cs.uni-dortmund.de> |