Frequently Asked Questions

How fast is Support Vector Machine (SVM) learning?

Users often find that SVM learning takes too much time. However, this may reflect the particular choice of learning parameters rather than the algorithm itself. For instance, attempting to create a hard-margin SVM (with a very high complexity parameter C) using a linear kernel on a dataset that is not linearly separable will not converge and may lead to an infinite loop. In those cases, consider using the automatic parameter estimation routines in SequentialMinimalOptimization and in the chosen Kernel function. Values returned by those routines may lead to sparser machines and faster training times.
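To make the idea of a data-derived C concrete, here is a minimal Python sketch of one common heuristic, C = n / sum of k(x_i, x_i) (the inverse of the mean kernel diagonal), which is the default C used by SVMlight. Whether SequentialMinimalOptimization computes exactly this value is an assumption here, and estimate_complexity, linear_kernel and gaussian_kernel are hypothetical names, not the framework's API.

```python
import math


def linear_kernel(x, y):
    # Plain dot product <x, y>.
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma=1.0):
    # exp(-||x - y||^2 / (2 sigma^2)); note that k(x, x) is always 1.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def estimate_complexity(kernel, inputs):
    # Hypothetical helper: C = n / sum_i k(x_i, x_i), i.e. 1 / mean kernel diagonal.
    total = sum(kernel(x, x) for x in inputs)
    if total <= 0:
        raise ValueError("kernel diagonal must have a positive sum")
    return len(inputs) / total


inputs = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
print(estimate_complexity(linear_kernel, inputs))    # 3 / (1 + 4 + 25) = 0.1
print(estimate_complexity(gaussian_kernel, inputs))  # 3 / 3 = 1.0
```

Because the value scales with the magnitude of the inputs, it avoids the arbitrarily large C that turns the problem into a hard-margin one. Treat it as a starting point, not a tuned value.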

For best initial results, set the UseComplexityHeuristic property of the SMO learning algorithm to true when learning your initial machine. This will give you an indication of a reasonable running time for your problem. Also, if you are using Gaussian kernels, try estimating the kernel parameters from the data using the Gaussian.Estimate static method of the kernel class. The initial values returned by those heuristics can then be used as a starting point for hyperparameter tuning.
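For the Gaussian kernel width, a widely used rule of thumb is the median heuristic: set sigma to the median pairwise distance between training samples. The sketch below implements that heuristic in plain Python. It is not necessarily the statistic Gaussian.Estimate uses, and median_heuristic_sigma is a hypothetical name.

```python
import math
import statistics


def median_heuristic_sigma(inputs):
    # Median of all pairwise Euclidean distances between samples.
    # This enumerates O(n^2) pairs; for large datasets, sample random pairs instead.
    distances = [
        math.dist(inputs[i], inputs[j])
        for i in range(len(inputs))
        for j in range(i + 1, len(inputs))
    ]
    if not distances:
        raise ValueError("need at least two samples")
    return statistics.median(distances)


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(median_heuristic_sigma(points))  # distances 5, 10, 5 -> median 5.0
```

A width on the scale of typical distances keeps the kernel from being nearly 0 for all pairs (sigma too small) or nearly 1 for all pairs (sigma too large); both extremes tend to produce poor models.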

Why am I getting OutOfMemoryExceptions when training my Support Vector Machines?

The framework automatically builds a kernel function cache to speed up computations during SVM learning. However, in some cases this cache may consume too much memory and cause such exceptions. To trade memory consumption for CPU time, set the CacheSize property to a lower value. By default, the cache size equals the number of training samples, so kernel values for all training vectors may be cached; setting it to something lower (such as 1/20 of the number of training samples) might help.
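To pick a CacheSize, it helps to estimate how much memory the cache can take. The sketch below assumes each cached entry is one row of the kernel matrix holding one 8-byte double per training sample, and it ignores per-object overhead. The framework's actual layout may differ, so treat the result as a rough estimate; kernel_cache_bytes is a hypothetical helper.

```python
def kernel_cache_bytes(n_samples, cache_rows, bytes_per_value=8):
    # Rough estimate: cache_rows rows, each holding n_samples double values.
    if not 0 <= cache_rows <= n_samples:
        raise ValueError("cache_rows must be between 0 and n_samples")
    return cache_rows * n_samples * bytes_per_value


n = 50_000
full = kernel_cache_bytes(n, n)            # cache a row for every sample
reduced = kernel_cache_bytes(n, n // 20)   # 1/20 of the samples
print(f"full:    {full / 2**30:.1f} GiB")     # about 18.6 GiB
print(f"reduced: {reduced / 2**30:.2f} GiB")  # about 0.93 GiB
```

Caching every row grows quadratically with the number of samples, which is why the default can exhaust memory on large training sets; caching only a fraction of the rows keeps memory closer to linear in n, at the cost of recomputing evicted kernel values.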

Why can't I open Excel worksheets using ExcelReader?

To open Excel files, you have to install the Microsoft Access Database Engine 2010 Redistributable, available from the Microsoft website. This engine, also known as ACE, is a replacement for JET, the older database engine that could be used to open Excel 2003 workbooks. To use ACE in both 32-bit and 64-bit applications, you need to install both the 32-bit and the 64-bit redistributables. To install both, use the /passive command-line switch; otherwise the installation fails once it detects that another version of the component is already installed. After downloading the executables, run:

  • C:\Users\You\Downloads\AccessDatabaseEngine.exe /passive
  • C:\Users\You\Downloads\AccessDatabaseEngine_x64.exe /passive

Why am I getting "non-positive definite" or "variance is zero" exceptions?

If you encounter either of the following exceptions:

  • "Variance is zero. Try specifying a regularization constant in the fitting options", or
  • "Covariance matrix is not positive definite. Try specifying a regularization constant in the fitting options"

This might be a sign that the problem you are trying to solve is a tricky one, or that the assumed model is not a good fit for the samples you have.

One possible reason for these exceptions is the presence of a constant column in your data (that is, a column in which every sample has the same value). This is a problem because the standard deviation (or variance) of such a variable will be zero, and any method that assumes the variance is strictly positive (such as Gaussian models) will fail. One way to overcome this is to add a regularization constant to the variance calculations; the variance will then be very small, but nonzero.
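The sketch below (plain Python, hypothetical helper names) shows how a constant column produces a zero variance and how adding a small regularization constant keeps it strictly positive. It only illustrates the idea; it is not how the framework computes variances.

```python
import statistics


def constant_columns(rows):
    # Indices of columns whose values are all identical.
    return [j for j, col in enumerate(zip(*rows)) if len(set(col)) == 1]


def column_variances(rows, regularization=0.0):
    # Sample variance of each column, shifted by an optional small constant.
    return [statistics.variance(col) + regularization for col in zip(*rows)]


data = [
    [1.0, 5.0],
    [2.0, 5.0],
    [3.0, 5.0],
]
print(constant_columns(data))        # [1]
print(column_variances(data))        # [1.0, 0.0]
print(column_variances(data, 1e-6))  # second entry is now 1e-06
```

Dropping the constant column before fitting is often a cleaner fix than regularizing it, since the column carries no information for the model.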

Please note that adding this constant will likely reduce the accuracy of your models. However, since the models were already making wrong assumptions about your data, this may indicate that a different model would be better suited to your particular problem.

Finally, how to add this regularization constant depends on the model being estimated, but in most cases the solution involves creating a NormalOptions object, setting its Regularization property to a small value, and passing it to the model fitting algorithm. For examples involving Hidden Markov Models, see the examples on the BaumWelchLearning documentation page.
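Assuming, as is common, that the regularization constant is added to the diagonal of the covariance matrix, its effect can be seen with a Cholesky decomposition, which succeeds only for positive definite matrices. The Python sketch below uses hypothetical helper names and a covariance matrix whose second feature is constant; it is an illustration, not the framework's NormalOptions implementation.

```python
import math


def cholesky(a):
    # Return lower-triangular L with a = L L^T; raise if a is not positive definite.
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def regularize(cov, eps):
    # Add eps to the diagonal of a covariance matrix (ridge regularization).
    return [[v + (eps if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(cov)]


# Covariance of two features where the second one is constant (zero variance).
cov = [[1.0, 0.0],
       [0.0, 0.0]]

try:
    cholesky(cov)
except ValueError as err:
    print("without regularization:", err)

L = cholesky(regularize(cov, 1e-6))
print("with regularization, L[1][1] =", L[1][1])  # sqrt(1e-6), about 0.001
```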

Update: In the next version after 2.11, a new way of handling non-positive definite matrices within learning algorithms will be added to the framework. This new approach may help improve the accuracy of such models.
