TFIDF Class
Namespace: Accord.MachineLearning
[SerializableAttribute] public class TFIDF : ParallelLearningBase, ITransform<string[], double[]>, ICovariantTransform<string[], double[]>, ITransform, ITransform<string[], Sparse<double>>, ICovariantTransform<string[], Sparse<double>>, IUnsupervisedLearning<TFIDF, string[], double[]>
The TFIDF type exposes the following members.

Constructors

Name | Description
---|---
TFIDF() | Constructs a new TFIDF.
TFIDF(String[][]) | Constructs a new TFIDF.

Properties

Name | Description
---|---
Counts | Gets the number of documents that contain each code word. Each element is associated with a word, and the value of the element gives the number of documents that contain this word.
Idf | Gets or sets the inverse document frequency (IDF) definition to be used.
InverseDocumentFrequency | Gets the inverse document frequency vector used to scale term-frequency vectors.
NumberOfDocuments | Gets the total number of documents considered by this TF-IDF.
NumberOfInputs | Gets the number of inputs accepted by the model.
NumberOfOutputs | Gets the number of outputs generated by the model.
NumberOfWords | Gets the number of words in this codebook.
ParallelOptions | Gets or sets the parallelization options for this algorithm. (Inherited from ParallelLearningBase.)
Tf | Gets or sets the term frequency (TF) definition to be used.
Token | Gets or sets a cancellation token that can be used to cancel the algorithm while it is running. (Inherited from ParallelLearningBase.)
UpdateDictionary | Gets or sets a value indicating whether new words should be added to the dictionary in the next call to Learn(String[][], Double[]).
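
To make the roles of Counts, NumberOfDocuments, and InverseDocumentFrequency more concrete, here is a minimal, self-contained sketch of TF-IDF weighting. It assumes the textbook definitions tf = 1 + ln(count) and idf = ln(N / n_t); the exact formulas behind each Tf and Idf option in the library may differ. The `TfIdfSketch` class and all of its method names are hypothetical and are not part of Accord.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Accord.NET.
public static class TfIdfSketch
{
    // Number of documents that contain each word (the quantity Counts describes).
    public static Dictionary<string, int> DocumentCounts(string[][] docs)
    {
        var counts = new Dictionary<string, int>();
        foreach (string[] doc in docs)
        {
            // Distinct() ensures a word is counted once per document.
            foreach (string word in doc.Distinct())
            {
                counts.TryGetValue(word, out int n); // n is 0 when the word is new
                counts[word] = n + 1;
            }
        }
        return counts;
    }

    // Assumed textbook IDF: ln(N / n_t).
    public static double Idf(int numberOfDocuments, int documentsWithWord)
    {
        return Math.Log((double)numberOfDocuments / documentsWithWord);
    }

    // Assumed log-scaled TF: 1 + ln(count), or 0 when the word is absent.
    public static double LogTf(int occurrences)
    {
        return occurrences > 0 ? 1 + Math.Log(occurrences) : 0;
    }

    // Produces one weight per vocabulary word, in vocabulary order.
    public static double[] Transform(string[] doc, string[] vocabulary,
        Dictionary<string, int> counts, int numberOfDocuments)
    {
        var result = new double[vocabulary.Length];
        for (int i = 0; i < vocabulary.Length; i++)
        {
            string word = vocabulary[i];
            int occurrences = doc.Count(w => w == word);
            result[i] = LogTf(occurrences) * Idf(numberOfDocuments, counts[word]);
        }
        return result;
    }
}
```

Under these assumed definitions, a word that appears in every document has an IDF of ln(1) = 0, so it contributes nothing to the feature vector regardless of how often it occurs.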

Methods

Name | Description
---|---
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetType | Gets the Type of the current instance. (Inherited from Object.)
Learn | Learns a model that can map the given inputs to the desired outputs.
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
ToString | Returns a string that represents the current object. (Inherited from Object.)
Transform(String[]) | Applies the transformation to an input, producing an associated output.
Transform(String[][]) | Applies the transformation to a set of input vectors, producing an associated set of output vectors.
Transform(String[], Sparse<Double>) | Applies the transformation to an input, producing an associated output.
Transform(String[], Sparse<Double>) | Applies the transformation to an input, producing an associated output.
Transform(String[], Double[]) | Applies the transformation to an input, producing an associated output.
Transform(String[][], Sparse<Double>[]) | Applies the transformation to a set of input vectors, producing an associated set of output vectors.
Transform(String[][], Sparse<Double>[]) | Applies the transformation to a set of input vectors, producing an associated set of output vectors.
Transform(String[][], Double[][]) | Applies the transformation to a set of input vectors, producing an associated set of output vectors.
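
The class declaration shows that the transformation can produce either a dense double[] or a Sparse<double>. The sketch below illustrates, in plain C#, why a sparse form can be attractive when the vocabulary is large and most weights are zero: only the non-zero entries are kept as index/value pairs. The `SparseSketch` helper is hypothetical; it does not use or reproduce Accord's Sparse<double> type, whose internal layout is not described here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Accord.NET.
public static class SparseSketch
{
    // Keeps only the non-zero entries as parallel index and value arrays.
    public static (int[] Indices, double[] Values) ToSparse(double[] dense)
    {
        var indices = new List<int>();
        var values = new List<double>();
        for (int i = 0; i < dense.Length; i++)
        {
            if (dense[i] != 0)
            {
                indices.Add(i);
                values.Add(dense[i]);
            }
        }
        return (indices.ToArray(), values.ToArray());
    }

    // Rebuilds the dense vector; positions not listed stay at zero.
    public static double[] ToDense(int[] indices, double[] values, int length)
    {
        var dense = new double[length];
        for (int k = 0; k < indices.Length; k++)
            dense[indices[k]] = values[k];
        return dense;
    }
}
```

A document that uses only a handful of the words in the codebook would store just those few pairs instead of one entry per word.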

Extension Methods

Name | Description
---|---
HasMethod | Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
IsEqual | Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
To(Type) | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
To<T> | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)

Examples

    using Accord.MachineLearning;                       // TFIDF, Tokenize
    using Accord.Statistics.Models.Regression;          // LogisticRegression
    using Accord.Statistics.Models.Regression.Fitting;  // IterativeReweightedLeastSquares

    // The Term-Frequency/Inverse-Document-Frequency model can be used to
    // extract finite-length feature vectors from sequences of arbitrary
    // length, such as texts:

    string[] texts =
    {
        @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas molestie malesuada nisi et placerat. Curabitur blandit porttitor suscipit. Nunc facilisis ultrices felis, vitae luctus arcu semper in. Fusce ut felis ipsum. Sed faucibus tortor ut felis placerat euismod. Vestibulum pharetra velit et dolor ornare quis malesuada leo aliquam. Aenean lobortis, tortor iaculis vestibulum dictum, tellus nisi vestibulum libero, ultricies pretium nisi ante in neque. Integer et massa lectus. Aenean ut sem quam. Mauris at nisl augue, volutpat tempus nisl. Suspendisse luctus convallis metus, vitae pretium risus pretium vitae. Duis tristique euismod aliquam",

        @"Sed consectetur nisl et diam mattis varius. Aliquam ornare tincidunt arcu eget adipiscing. Etiam quis augue lectus, vel sollicitudin lorem. Fusce lacinia, leo non porttitor adipiscing, mauris purus lobortis ipsum, id scelerisque erat neque eget nunc. Suspendisse potenti. Etiam non urna non libero pulvinar consequat ac vitae turpis. Nam urna eros, laoreet id sagittis eu, posuere in sapien. Phasellus semper convallis faucibus. Nulla fermentum faucibus tellus in rutrum. Maecenas quis risus augue, eu gravida massa."
    };

    // Split each text into its words:
    string[][] words = texts.Tokenize();

    // Create a new TF-IDF with options:
    var codebook = new TFIDF()
    {
        Tf = TermFrequency.Log,
        Idf = InverseDocumentFrequency.Default
    };

    // Compute the codebook (note: this should be done only for the training set)
    codebook.Learn(words);

    // Now, we can use the learned codebook to extract fixed-length
    // representations of the different texts (paragraphs) above:

    // Extract a feature vector from text 1:
    double[] bow1 = codebook.Transform(words[0]);

    // Extract a feature vector from text 2:
    double[] bow2 = codebook.Transform(words[1]);

    // We could also have transformed everything at once, i.e.
    // double[][] bow = codebook.Transform(words);

    // Now, since we have finite-length representations (both bow1 and bow2 should
    // have the same size), we can pass them to any classifier or machine learning
    // method. For example, we can pass them to a Logistic Regression classifier to
    // discern between the first and second paragraphs.

    // Let's create a Logistic classifier to separate the two paragraphs:
    var learner = new IterativeReweightedLeastSquares<LogisticRegression>()
    {
        Tolerance = 1e-4,  // Let's set some convergence parameters
        Iterations = 100,  // maximum number of iterations to perform
        Regularization = 0
    };

    // Now, we use the learning algorithm to learn the distinction between the two:
    LogisticRegression reg = learner.Learn(new[] { bow1, bow2 }, new[] { false, true });

    // Finally, we can predict using the classifier:
    bool c1 = reg.Decide(bow1); // Should be false
    bool c2 = reg.Decide(bow2); // Should be true