LogisticRegressionAnalysis Class

Logistic Regression Analysis.

Inheritance Hierarchy

System.Object
Accord.MachineLearning.TransformBase<Double[], Double>
Accord.Statistics.Analysis.LogisticRegressionAnalysis

Namespace: Accord.Statistics.Analysis
Assembly: Accord.Statistics (in Accord.Statistics.dll) Version: 3.8.0

Syntax


[SerializableAttribute]
public class LogisticRegressionAnalysis : TransformBase<double[], double>, 
	IRegressionAnalysis, IMultivariateAnalysis, IAnalysis, ISupervisedLearning<LogisticRegression, double[], int>, 
	ISupervisedLearning<LogisticRegression, double[], double>

<SerializableAttribute>
Public Class LogisticRegressionAnalysis
	Inherits TransformBase(Of Double(), Double)
	Implements IRegressionAnalysis, IMultivariateAnalysis, IAnalysis, ISupervisedLearning(Of LogisticRegression, Double(), Integer), 
	ISupervisedLearning(Of LogisticRegression, Double(), Double)


The LogisticRegressionAnalysis type exposes the following members.

Constructors

	Name	Description
	LogisticRegressionAnalysis	Constructs a Logistic Regression Analysis.
	LogisticRegressionAnalysis(Double[][], Double[])	Obsolete. Constructs a Logistic Regression Analysis.
	LogisticRegressionAnalysis(Double[][], Double[], Double[])	Obsolete. Constructs a Logistic Regression Analysis.
	LogisticRegressionAnalysis(Double[][], Double[], String[], String)	Obsolete. Constructs a Logistic Regression Analysis.
	LogisticRegressionAnalysis(Double[][], Double[], Double[], String[], String)	Obsolete. Constructs a Logistic Regression Analysis.


Properties

	Name	Description
	Array	Obsolete. Gets the source matrix from which the analysis was run.
	ChiSquare	Gets the Chi-Square (Likelihood Ratio) Test for the model.
	Coefficients	Gets the collection of coefficients of the model.
	CoefficientValues	Gets the value of each coefficient.
	ComputeInnerModels	Gets or sets whether nested models should be computed in order to calculate the likelihood-ratio test of each of the coefficients. Default is false.
	Confidences	Gets the 95% Confidence Intervals (C.I.) for each coefficient found in the regression.
	Deviance	Gets the Deviance of the model.
	InformationMatrix	Gets the information matrix obtained during learning.
	Inputs	Gets or sets the names of the input variables for the model.
	Iterations	Gets or sets the maximum number of iterations to be performed by the regression algorithm. Default is 50.
	LikelihoodRatioTests	Gets the Likelihood-Ratio Tests for each coefficient.
	LogLikelihood	Gets the Log-Likelihood for the model.
	NumberOfInputs	Gets the number of inputs accepted by the model. (Inherited from TransformBase<TInput, TOutput>.)
	NumberOfOutputs	Gets the number of outputs generated by the model. (Inherited from TransformBase<TInput, TOutput>.)
	NumberOfSamples	Gets the number of samples used to compute the analysis.
	OddsRatios	Gets the Odds Ratio for each coefficient found during the logistic regression.
	Output	Gets or sets the name of the output variable for the model.
	Outputs	Obsolete. Gets the dependent variable value for each of the source input points.
	Regression	Gets the Logistic Regression model created and evaluated by this analysis.
	Regularization	Gets or sets the regularization value to be added in the objective function. Default is 1e-10.
	Result	Obsolete. Gets the resulting probabilities obtained by the logistic regression model.
	Source	Obsolete. Gets the source matrix from which the analysis was run.
	StandardErrors	Gets the Standard Error for each coefficient found during the logistic regression.
	Token	Gets or sets a cancellation token that can be used to stop the learning algorithm while it is running.
	Tolerance	Gets or sets the convergence tolerance of the regression algorithm: learning stops when the change between two consecutive iterations falls below this value. The change is measured as the largest absolute change in any parameter of the regression. Default is 1e-5.
	WaldTests	Gets the Wald Tests for each coefficient.
	Weights	Obsolete. Gets the sample weight associated with each input vector.
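Several of the properties above are closely related. As a hedged sketch of the usual textbook definitions (not the library's source code), the C# program below (top-level statements, no Accord.NET types) derives odds ratios, Wald z statistics and 95% Wald confidence intervals from a coefficient vector and its standard errors, laid out like CoefficientValues and StandardErrors with the intercept first. The numeric values are hypothetical, and whether OddsRatios, WaldTests and Confidences use exactly these formulas is an assumption.

```csharp
using System;

// Hypothetical values laid out like lra.CoefficientValues and
// lra.StandardErrors: index 0 is the intercept term.
double[] coefficients = { -2.0, Math.Log(2.0), 0.5 };
double[] standardErrors = { 1.0, 0.5, 0.25 };

// Two-sided 95% quantile of the standard normal distribution.
const double z95 = 1.959963984540054;

int k = coefficients.Length;
double[] oddsRatios = new double[k];
double[] waldZ = new double[k];
double[] lower = new double[k];
double[] upper = new double[k];

for (int i = 0; i < k; i++)
{
    // Odds ratio: factor by which the odds change per unit increase of the input
    // (for the intercept, exp(b0) is the baseline odds).
    oddsRatios[i] = Math.Exp(coefficients[i]);

    // Wald statistic: approximately standard normal under H0: coefficient = 0.
    waldZ[i] = coefficients[i] / standardErrors[i];

    // Interval built on the coefficient scale, then mapped to the odds-ratio scale.
    lower[i] = Math.Exp(coefficients[i] - z95 * standardErrors[i]);
    upper[i] = Math.Exp(coefficients[i] + z95 * standardErrors[i]);
}

for (int i = 0; i < k; i++)
    Console.WriteLine($"b{i}: OR = {oddsRatios[i]:F4}, z = {waldZ[i]:F4}, 95% CI = [{lower[i]:F4}, {upper[i]:F4}]");
```

Because the interval is symmetric on the coefficient scale, it is asymmetric on the odds-ratio scale: its endpoints multiply to the squared odds ratio.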


Methods

	Name	Description
	Compute	Obsolete. Computes the Logistic Regression Analysis.
	Compute(LogisticRegression)	Obsolete. Computes the Logistic Regression Analysis for an already computed regression.
	Compute(Double, Int32)	Obsolete. Computes the Logistic Regression Analysis.
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	FromSummary	Obsolete. Creates a new LogisticRegressionAnalysis from summarized data. In summary data, instead of having a set of inputs and their associated outputs, we have the number of times an input vector had a positive label in the data set and how many times it had a negative label.
	GetConfidenceInterval	Gets the confidence interval for a given input.
	GetHashCode	Serves as the default hash function. (Inherited from Object.)
	GetLikelihoodRatio	Obsolete. Gets the Log-Likelihood Ratio between this model and another model.
	GetPredictionInterval	Gets the prediction interval for a given input.
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	Learn(Double[][], Double[], Double[])	Learns a model that can map the given inputs to the given outputs.
	Learn(Double[][], Int32[], Double[])	Learns a model that can map the given inputs to the given outputs.
	Learn(Double[][], Int32[], Int32[])	Learns a model that can map the given inputs to the given outputs.
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	ToString	Returns a string that represents the current object. (Inherited from Object.)
	Transform(TInput[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from TransformBase<TInput, TOutput>.)
	Transform(Double[])	Applies the transformation to an input, producing an associated output. (Overrides TransformBase<TInput, TOutput>.Transform(TInput).)
	Transform(TInput[], TOutput[])	Applies the transformation to an input, producing an associated output. (Inherited from TransformBase<TInput, TOutput>.)


Extension Methods

	Name	Description
	HasMethod	Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
	IsEqual	Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
	To(Type)	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
	To<T>	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)


Remarks

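The Logistic Regression Analysis fits a logistic regression model and extracts information that helps interpret it, such as the standard errors and odds ratios of each coefficient, Wald and likelihood-ratio tests, and confidence intervals.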

This class can also be bound to standard controls such as the DataGridView by setting their DataSource property to the analysis' Coefficients property.

References:

E. F. Connor. Logistic Regression. Available at: http://userwww.sfsu.edu/~efc/classes/biol710/logistic/logisticreg.htm
C. Shalizi. Logistic Regression and Newton's Method. Lecture notes. Available at: http://www.stat.cmu.edu/~cshalizi/350/lectures/26/lecture-26.pdf
A. Storkey. Learning from Data: Learning Logistic Regressors. Available at: http://www.inf.ed.ac.uk/teaching/courses/lfd/lectures/logisticlearn-print.pdf
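The Shalizi and Storkey notes describe fitting logistic regression with Newton's method. The self-contained C# sketch below (top-level statements, no Accord.NET types) illustrates how settings like Iterations, Tolerance and Regularization can enter such a fit: it stops when the largest absolute parameter change falls below the tolerance, adds the regularization value to the diagonal of the information matrix (an assumption made for this sketch), and takes standard errors from the inverse of that matrix. The data and the helper names (FitNewton, Design, Dot, Invert) are hypothetical; this is not the library's actual implementation.

```csharp
using System;
using System.Linq;

// Hypothetical toy data: one input per sample, 0/1 labels (not linearly separable).
double[][] x = { new[] { 1.0 }, new[] { 2.0 }, new[] { 3.0 }, new[] { 4.0 }, new[] { 5.0 }, new[] { 6.0 } };
double[] y = { 0, 0, 1, 0, 1, 1 };

var (beta, information, iterations) = FitNewton(x, y, regularization: 1e-10, tolerance: 1e-5, maxIterations: 50);
double[,] covariance = Invert(information); // standard errors = sqrt of its diagonal
for (int j = 0; j < beta.Length; j++)
    Console.WriteLine($"b{j} = {beta[j]:F4}, SE = {Math.Sqrt(covariance[j, j]):F4}");
Console.WriteLine($"Stopped after {iterations} iteration(s).");

// Newton-Raphson for logistic regression; element 0 of each design row is the intercept.
static (double[] coef, double[,] info, int iter) FitNewton(
    double[][] xs, double[] ys, double regularization, double tolerance, int maxIterations)
{
    int n = xs.Length, d = xs[0].Length + 1;
    var coef = new double[d];
    var info = new double[d, d];
    int iter = 0;
    while (iter < maxIterations)
    {
        iter++;
        var gradient = new double[d];
        info = new double[d, d];
        for (int i = 0; i < n; i++)
        {
            double[] row = Design(xs[i]);
            double p = Logistic(Dot(coef, row));
            double w = p * (1 - p); // Bernoulli variance, i.e. the IRLS weight
            for (int a = 0; a < d; a++)
            {
                gradient[a] += (ys[i] - p) * row[a];
                for (int b = 0; b < d; b++)
                    info[a, b] += w * row[a] * row[b];
            }
        }
        // Assumption: the regularization value is added to the diagonal.
        for (int a = 0; a < d; a++)
            info[a, a] += regularization;

        // Newton step: step = info^-1 * gradient (explicit inverse for brevity).
        double[,] inverse = Invert(info);
        double maxChange = 0;
        for (int a = 0; a < d; a++)
        {
            double step = 0;
            for (int b = 0; b < d; b++)
                step += inverse[a, b] * gradient[b];
            coef[a] += step;
            maxChange = Math.Max(maxChange, Math.Abs(step));
        }
        // Stop when the largest absolute parameter change is below the tolerance.
        if (maxChange < tolerance)
            break;
    }
    return (coef, info, iter); // info was evaluated just before the final (tiny) step
}

static double[] Design(double[] xi) => xi.Prepend(1.0).ToArray();
static double Logistic(double t) => 1.0 / (1.0 + Math.Exp(-t));
static double Dot(double[] u, double[] v) => u.Zip(v, (a, b) => a * b).Sum();

// Gauss-Jordan inversion with partial pivoting (adequate for small matrices).
static double[,] Invert(double[,] m)
{
    int k = m.GetLength(0);
    var a = (double[,])m.Clone();
    var inv = new double[k, k];
    for (int i = 0; i < k; i++) inv[i, i] = 1.0;
    for (int c = 0; c < k; c++)
    {
        int pivot = c;
        for (int r = c + 1; r < k; r++)
            if (Math.Abs(a[r, c]) > Math.Abs(a[pivot, c])) pivot = r;
        for (int j = 0; j < k; j++)
        {
            (a[c, j], a[pivot, j]) = (a[pivot, j], a[c, j]);
            (inv[c, j], inv[pivot, j]) = (inv[pivot, j], inv[c, j]);
        }
        double diag = a[c, c];
        for (int j = 0; j < k; j++) { a[c, j] /= diag; inv[c, j] /= diag; }
        for (int r = 0; r < k; r++)
        {
            if (r == c) continue;
            double f = a[r, c];
            for (int j = 0; j < k; j++) { a[r, j] -= f * a[c, j]; inv[r, j] -= f * inv[c, j]; }
        }
    }
    return inv;
}
```

At the maximum-likelihood solution the score equations hold, so the residuals y − p sum to (approximately) zero, both on their own and when weighted by each input.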

Examples

The following example shows how to create a logistic regression analysis from a full dataset composed of input vectors and a binary output vector. Each input vector has an associated label (1 or 0) in the output vector, where 1 represents a positive label (yes, or true) and 0 represents a negative label (no, or false).

// Suppose we have the following data about some patients.
// The first variable is continuous and represents the patient's
// age. The second variable is dichotomous and indicates whether
// the patient smokes or not (this is completely fictional data).

double[][] inputs =
{
    //            Age  Smoking
    new double[] { 55,    0   },
    new double[] { 28,    0   },
    new double[] { 65,    1   },
    new double[] { 46,    0   },
    new double[] { 86,    1   },
    new double[] { 56,    1   },
    new double[] { 85,    0   },
    new double[] { 33,    0   },
    new double[] { 21,    1   },
    new double[] { 42,    1   },
};

// Additionally, we also have information about whether
// or not those patients had lung cancer. The array
// below gives 0 for those who did not, and 1 for those
// who did.

double[] output =
{
    0, 0, 0, 1, 1, 1, 0, 0, 0, 1
};

// Create a Logistic Regression analysis
var lra = new LogisticRegressionAnalysis()
{
    Regularization = 0
};

// Compute the analysis
LogisticRegression regression = lra.Learn(inputs, output);

// Now we can show a summary of the analysis
// Accord.Controls.DataGridBox.Show(lra.Coefficients);

Uncommenting the DataGridBox call displays the resulting coefficient table. The individual results can also be retrieved directly:

// We can also investigate all parameters individually. For
// example, the coefficient values are available in the
// vector:

double[] coef = lra.CoefficientValues;

// The first value refers to the model's intercept term. We
// can also retrieve the odds ratios and standard errors:

double[] odds = lra.OddsRatios;
double[] stde = lra.StandardErrors;

// We can use the analysis to predict a probability for a new patient:
double y = lra.Regression.Probability(new double[] { 87, 1 }); // 0.75

// For those inputs, the estimated probability is approximately 75%.

// We can also obtain confidence intervals for the probability:
DoubleRange ci = lra.GetConfidenceInterval(new double[] { 87, 1 });
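The probability returned by Regression.Probability in the patient example is the logistic function applied to the linear predictor b0 + b1·age + b2·smoking, with the intercept first as in CoefficientValues. A common textbook way to build a confidence interval for that probability is to form a Wald interval on the linear predictor, using the coefficient covariance matrix (the inverse of the information matrix), and then map both endpoints through the logistic function. The C# sketch below (top-level statements) illustrates that approach; the coefficient and covariance values are hypothetical, the helper ProbabilityWithInterval is made up for illustration, and whether GetConfidenceInterval uses exactly this construction is an assumption.

```csharp
using System;

// Hypothetical fitted model (intercept, age, smoking); NOT values reported by the library.
double[] coef = { -2.5, 0.03, 1.2 };
// Hypothetical coefficient covariance matrix (inverse of the information matrix).
double[,] cov =
{
    {  1.0,   -0.015, -0.05 },
    { -0.015,  0.0003, 0.0  },
    { -0.05,   0.0,    0.6  },
};

double[] input = { 87, 1 }; // age 87, smoker
var (p, lo, hi) = ProbabilityWithInterval(coef, cov, input, 1.959963984540054);
Console.WriteLine($"p = {p:F4}, 95% CI = [{lo:F4}, {hi:F4}]");

static (double p, double lower, double upper) ProbabilityWithInterval(
    double[] b, double[,] c, double[] xi, double z)
{
    int d = b.Length;
    var row = new double[d];
    row[0] = 1.0; // intercept term comes first
    Array.Copy(xi, 0, row, 1, xi.Length);

    double eta = 0; // linear predictor b0 + b1*x1 + ...
    for (int i = 0; i < d; i++) eta += b[i] * row[i];

    double variance = 0; // row' * C * row = variance of the linear predictor
    for (int i = 0; i < d; i++)
        for (int j = 0; j < d; j++)
            variance += row[i] * c[i, j] * row[j];

    double se = Math.Sqrt(variance);
    return (Logistic(eta), Logistic(eta - z * se), Logistic(eta + z * se));
}

static double Logistic(double t) => 1.0 / (1.0 + Math.Exp(-t));
```

Building the interval on the logit scale and transforming it keeps both endpoints inside (0, 1), which a symmetric interval around the probability itself would not guarantee.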

The analysis can also be created from data given in summary form. Instead of having one input vector associated with one positive or negative label, each distinct input vector is associated with the number of positive and the number of negative labels it received in the original dataset.

// Suppose we have a (fictitious) data set about patients who 
// underwent cardiac surgery. The first column gives the number
// of arterial bypasses performed during the surgery. The second
// column gives the number of patients whose surgery went well,
// while the third column gives the number of patients who had
// at least one complication during the surgery.
// 
int[,] data =
{
    // # of bypasses    success     complications
    {       1,             140,           45       },
    {       2,             130,           60       },
    {       3,             150,           31       },
    {       4,              96,           65       }
};

// Get input variable and number of positives and negatives
double[][] inputs = data.GetColumn(0).ToDouble().ToJagged();
int[] positive = data.GetColumn(1);
int[] negative = data.GetColumn(2);

// Create a new Logistic Regression Analysis from the summary data
var lra = new LogisticRegressionAnalysis();

// compute the analysis
LogisticRegression regression = lra.Learn(inputs, positive, negative);

// Now we can show a summary of the analysis
// Accord.Controls.DataGridBox.Show(lra.Coefficients);


// We can also investigate all parameters individually. For
// example, the coefficient values are available in the
// vector:

double[] coef = lra.CoefficientValues;

// The first value refers to the model's intercept term. We
// can also retrieve the odds ratios and standard errors:

double[] odds = lra.OddsRatios;
double[] stde = lra.StandardErrors;


// Finally, we can use it to estimate the probability of a
// successful surgery for a new patient with 4 bypasses
double y = lra.Regression.Probability(new double[] { 4 }); // ~0.67 (about 67%)
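Summary data carries the same information as a weighted dataset. One way to see this (a sketch, not necessarily how Learn(inputs, positive, negative) works internally) is to replace each row by the proportion of positives, pos / (pos + neg), with a sample weight equal to the row total pos + neg: the weighted Bernoulli log-likelihood Σ wᵢ[yᵢ ln pᵢ + (1 − yᵢ) ln(1 − pᵢ)] then equals Σ [posᵢ ln pᵢ + negᵢ ln(1 − pᵢ)] for any candidate probabilities pᵢ. The C# sketch below (top-level statements) uses the counts from the surgery example; the helper names are hypothetical.

```csharp
using System;

// Counts from the surgery example: successes (positive) and complications (negative).
int[] positive = { 140, 130, 150, 96 };
int[] negative = { 45, 60, 31, 65 };

var (proportions, weights) = ToWeightedProportions(positive, negative);
for (int i = 0; i < proportions.Length; i++)
    Console.WriteLine($"row {i}: y = {proportions[i]:F4}, weight = {weights[i]}");

// Any candidate probabilities give the same log-likelihood under both representations.
double[] prob = { 0.7, 0.7, 0.7, 0.7 };
Console.WriteLine(WeightedLogLikelihood(proportions, weights, prob));
Console.WriteLine(CountLogLikelihood(positive, negative, prob));

// Convert (positive, negative) counts into fractional outputs plus sample weights.
static (double[] outputs, double[] sampleWeights) ToWeightedProportions(int[] pos, int[] neg)
{
    var outputs = new double[pos.Length];
    var sampleWeights = new double[pos.Length];
    for (int i = 0; i < pos.Length; i++)
    {
        int total = pos[i] + neg[i];
        outputs[i] = pos[i] / (double)total;
        sampleWeights[i] = total;
    }
    return (outputs, sampleWeights);
}

// Weighted Bernoulli log-likelihood with fractional outputs.
static double WeightedLogLikelihood(double[] ys, double[] ws, double[] ps)
{
    double sum = 0;
    for (int i = 0; i < ys.Length; i++)
        sum += ws[i] * (ys[i] * Math.Log(ps[i]) + (1 - ys[i]) * Math.Log(1 - ps[i]));
    return sum;
}

// Binomial log-likelihood (without the constant term) computed from the raw counts.
static double CountLogLikelihood(int[] pos, int[] neg, double[] ps)
{
    double sum = 0;
    for (int i = 0; i < pos.Length; i++)
        sum += pos[i] * Math.Log(ps[i]) + neg[i] * Math.Log(1 - ps[i]);
    return sum;
}
```

Because both objectives agree for every set of probabilities, maximizing either one yields the same coefficients.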

The last example shows how to learn a logistic regression analysis using data given in the form of a System.Data.DataTable. This data is also heterogeneous, mixing both discrete (symbol) variables and continuous variables. This example is also available for MultipleLinearRegressionAnalysis.


// Note: this example uses a System.Data.DataTable to represent input data,
// but note that this is not required. The data could have been represented
// as jagged double matrices (double[][]) directly.

// If you have to handle heterogeneous data in your application, such as user records
// in a database, this data is best represented within the framework using a .NET
// DataTable object. In order to learn a classification or regression model
// using this DataTable, we first need to convert the table into a representation
// that the machine learning model can understand. Such a representation is quite often
// a matrix of doubles (double[][]).
var data = new DataTable("Customer Revenue Example");

data.Columns.Add("Day", "CustomerId", "Time (hour)", "Weather", "Buy");
data.Rows.Add("D1", 0, 8, "Sunny", true);
data.Rows.Add("D2", 1, 10, "Sunny", true);
data.Rows.Add("D3", 2, 10, "Rain", false);
data.Rows.Add("D4", 3, 16, "Rain", true);
data.Rows.Add("D5", 4, 15, "Rain", true);
data.Rows.Add("D6", 5, 20, "Rain", false);
data.Rows.Add("D7", 6, 12, "Cloudy", true);
data.Rows.Add("D8", 7, 12, "Sunny", false);

// One way to perform this conversion is by using a Codification filter. The Codification
// filter can take care of converting variables that actually denote symbols (i.e. the 
// weather in the example above) into representations that make more sense given the assumption
// of a real vector-based classifier.

// Create a codification codebook
var codebook = new Codification()
{
    { "Weather", CodificationVariable.Categorical },
    { "Time (hour)", CodificationVariable.Continuous },
    { "Revenue", CodificationVariable.Continuous },
};

// Learn from the data
codebook.Learn(data);

// Now, we will use the codebook to transform the DataTable into double[][] vectors. Due to
// the way the conversion works, we can end up with more columns in the output vectors
// than the ones we started with. If you would like more details about what those columns
// represent, you can pass them as 'out' parameters in the methods that follow below.
string[] inputNames;  // (note: if you do not want to run this example yourself, you 
string outputName;    // can see below the new variable names that will be generated)

// Now, we can translate our training data into numeric vectors using our codebook:
double[][] inputs = codebook.Apply(data, "Weather", "Time (hour)").ToJagged(out inputNames);
double[] outputs = codebook.Apply(data, "Buy").ToVector(out outputName);
// (note: the Apply method transforms a DataTable into another DataTable containing the codified
//  variables. The ToJagged and ToVector methods are then used to transform those tables into
//  double[][] matrices and double[] vectors, respectively.)

// If we would like to learn a logistic regression model for this data, there are two possible
// ways, depending on which aspect of the logistic regression we are most interested in. If we
// are interested in interpreting the logistic regression, performing hypothesis tests with the
// coefficients and performing an actual _logistic regression analysis_, then we can use the
// LogisticRegressionAnalysis class for this. If, however, we are only interested in using
// the learned model directly to predict new values for the dataset, then we could use the
// LogisticRegression and IterativeReweightedLeastSquares classes directly instead.

// This example deals with the former case. For the latter, please see the documentation page
// for the LogisticRegression class.

// We can create a new logistic regression analysis for the variables
var lra = new LogisticRegressionAnalysis()
{
    // We can also inform the names of the new variables that have been created by the
    // codification filter. Those can help when visualizing the analysis once it is
    // data-bound to a visual control such as a Windows.Forms.DataGridView or WPF DataGrid:

    Inputs = inputNames, // will be { "Weather: Sunny", "Weather: Rain", "Weather: Cloudy", "Time (hour)" }
    Output = outputName  // will be "Buy"
};

// Compute the analysis and obtain the estimated regression
LogisticRegression regression = lra.Learn(inputs, outputs);

// We can then compute the model's output (an estimated probability, not a 0/1 label)
// for the first sample:
double predicted = lra.Transform(inputs[0]); // result will be ~0.287

// Because we opted for a LogisticRegressionAnalysis instead of using a LogisticRegression
// model directly, we have further information about the regression available:
int inputCount = lra.NumberOfInputs;   // should be 4
int outputCount = lra.NumberOfOutputs; // should be 1
double logl = lra.LogLikelihood;       // should be -4.6035570737785525
ChiSquareTest x2 = lra.ChiSquare;      // should be 1.37789 (p=0.8480, non-significant)
double[] stdErr = lra.StandardErrors;  // should be high except for the last value of 0.27122079214927985 (due to the small dataset)
double[] or = lra.OddsRatios;          // should be 1.1116659950687609 for the last coefficient (related to time of day)
LogisticCoefficientCollection c = lra.Coefficients; // coefficient table (bind to a visual control for quick inspection)
double[][] h = lra.InformationMatrix;  // should contain Fisher's information matrix for the problem
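The ChiSquare and LogLikelihood values in the comments above are linked by the likelihood-ratio test: the statistic is 2 (LL_model − LL_null), where LL_null is the log-likelihood of an intercept-only model, compared against a chi-square distribution whose degrees of freedom equal the number of inputs (four here). The C# sketch below (top-level statements) recomputes it from the 5 positive and 3 negative "Buy" labels in the table and the reported log-likelihood; it reproduces the commented 1.37789 and p = 0.8480, although the code is a textbook illustration rather than the library's implementation. The helper ChiSquareSurvivalEvenDf is hypothetical and only handles even degrees of freedom, for which the chi-square tail probability has a closed form.

```csharp
using System;

// From the DataTable example: 5 "true" and 3 "false" labels in the Buy column,
// and the model log-likelihood reported in the comments (lra.LogLikelihood).
int positives = 5, negatives = 3;
double logLikelihood = -4.6035570737785525;
int degreesOfFreedom = 4; // number of inputs (coefficients excluding the intercept)

// Intercept-only (null) model: every sample gets the overall positive rate.
int n = positives + negatives;
double pBar = positives / (double)n;
double nullLogLikelihood = positives * Math.Log(pBar) + negatives * Math.Log(1 - pBar);

double chiSquare = 2 * (logLikelihood - nullLogLikelihood);
double deviance = -2 * logLikelihood; // for 0/1 labels the saturated log-likelihood is 0
double pValue = ChiSquareSurvivalEvenDf(chiSquare, degreesOfFreedom);

Console.WriteLine($"LR chi-square = {chiSquare:F5}, p = {pValue:F4}, deviance = {deviance:F4}");

// P(X > x) for a chi-square variable with an even number of degrees of freedom:
// exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
static double ChiSquareSurvivalEvenDf(double x, int df)
{
    if (df <= 0 || df % 2 != 0)
        throw new ArgumentException("df must be a positive even integer", nameof(df));
    double half = x / 2, term = 1, sum = 1;
    for (int j = 1; j < df / 2; j++)
    {
        term *= half / j;
        sum += term;
    }
    return Math.Exp(-half) * sum;
}
```

A p-value this large means the four inputs do not improve significantly on the intercept-only model, which matches the "non-significant" note in the example.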

Reference

Accord.Statistics.Analysis Namespace

Accord.Statistics.Analysis.MultipleLinearRegressionAnalysis