Package weka.clusterers
Class EM
- java.lang.Object
-
- All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Clusterer, DensityBasedClusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler
public class EM extends RandomizableDensityBasedClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler
Simple EM (expectation maximisation) class.
EM assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters. EM can decide how many clusters to create by cross validation, or you may specify a priori how many clusters to generate.
The cross validation performed to determine the number of clusters is done in the following steps:
1. the number of clusters is set to 1
2. the training set is split randomly into 10 folds.
3. EM is performed 10 times using the 10 folds the usual CV way.
4. the log-likelihood is averaged over all 10 results.
5. if the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step 2.
The number of folds is fixed to 10, as long as the number of instances in the training set is not smaller than 10. If it is smaller, the number of folds is set equal to the number of instances.
Valid options are:
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
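The cross-validation steps described above can be sketched as a simple loop. This is an illustrative sketch only, not the code used by this class: the `avgCvLogLikelihood` callback (standing in for steps 2 to 4) and the `maxClusters` safety bound are hypothetical names introduced for the example.

```java
import java.util.function.IntToDoubleFunction;

class ClusterCountSelection {

    /** Folds used for cross validation: 10, or the number of instances if there are fewer than 10. */
    public static int numFolds(int numInstances) {
        return Math.min(10, numInstances);
    }

    /**
     * Starts at one cluster and adds one cluster at a time for as long as the
     * averaged cross-validated log-likelihood keeps increasing.
     * avgCvLogLikelihood is a hypothetical callback that returns that average
     * for a given number of clusters; maxClusters is an assumed upper bound.
     */
    public static int selectNumClusters(IntToDoubleFunction avgCvLogLikelihood, int maxClusters) {
        int best = 1;
        double bestLogLikelihood = avgCvLogLikelihood.applyAsDouble(1);
        for (int k = 2; k <= maxClusters; k++) {
            double logLikelihood = avgCvLogLikelihood.applyAsDouble(k);
            if (logLikelihood <= bestLogLikelihood) {
                break; // no improvement: keep the previous number of clusters
            }
            best = k;
            bestLogLikelihood = logLikelihood;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy score that peaks at three clusters, purely for illustration.
        IntToDoubleFunction toyScore = k -> -Math.abs(k - 3);
        System.out.println(selectNumClusters(toyScore, 10));
        System.out.println(numFolds(7));
    }
}
```

In a real run the callback would train EM on each training split and average the log-likelihood over the held-out folds.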
- Version:
- $Revision: 9988 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructor    Description
EM()    Constructor.
-
Method Summary
Modifier and Type    Method    Description
void    buildClusterer(Instances data)    Generates a clusterer.
double[]    clusterPriors()    Returns the cluster priors.
java.lang.String    debugTipText()    Returns the tip text for this property
java.lang.String    displayModelInOldFormatTipText()    Returns the tip text for this property
Capabilities    getCapabilities()    Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans).
double[][][]    getClusterModelsNumericAtts()    Return the normal distributions for the cluster models
double[]    getClusterPriors()    Return the priors for the clusters
boolean    getDebug()    Get debug mode
boolean    getDisplayModelInOldFormat()    Get whether to display model output in the old, original format.
int    getMaxIterations()    Get the maximum number of iterations
double    getMinStdDev()    Get the minimum allowable standard deviation.
int    getNumClusters()    Get the number of clusters
java.lang.String[]    getOptions()    Gets the current settings of EM.
java.lang.String    getRevision()    Returns the revision string.
java.lang.String    globalInfo()    Returns a string describing this clusterer
java.util.Enumeration    listOptions()    Returns an enumeration describing the available options.
double[]    logDensityPerClusterForInstance(Instance inst)    Computes the log of the conditional density (per cluster) for a given instance.
static void    main(java.lang.String[] argv)    Main method for testing this class.
java.lang.String    maxIterationsTipText()    Returns the tip text for this property
java.lang.String    minStdDevTipText()    Returns the tip text for this property
int    numberOfClusters()    Returns the number of clusters.
java.lang.String    numClustersTipText()    Returns the tip text for this property
void    setDebug(boolean v)    Set debug mode - verbose output
void    setDisplayModelInOldFormat(boolean d)    Set whether to display model output in the old, original format.
void    setMaxIterations(int i)    Set the maximum number of iterations to perform
void    setMinStdDev(double m)    Set the minimum value for standard deviation when calculating normal density.
void    setMinStdDevPerAtt(double[] m)
void    setNumClusters(int n)    Set the number of clusters (-1 to select by CV).
void    setOptions(java.lang.String[] options)    Parses a given list of options.
java.lang.String    toString()    Outputs the generated clusters into a string.
-
Methods inherited from class weka.clusterers.RandomizableDensityBasedClusterer
getSeed, seedTipText, setSeed
-
Methods inherited from class weka.clusterers.AbstractDensityBasedClusterer
distributionForInstance, logDensityForInstance, logJointDensitiesForInstance, makeCopies
-
Methods inherited from class weka.clusterers.AbstractClusterer
clusterInstance, forName, makeCopies, makeCopy
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface weka.clusterers.Clusterer
clusterInstance
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this clusterer.
- Returns:
- a description of the clusterer suitable for displaying in the explorer/experimenter GUI
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class RandomizableDensityBasedClusterer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class RandomizableDensityBasedClusterer
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
-
displayModelInOldFormatTipText
public java.lang.String displayModelInOldFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDisplayModelInOldFormat
public void setDisplayModelInOldFormat(boolean d)
Set whether to display model output in the old, original format.
- Parameters:
d - true if model output is to be shown in the old format
-
getDisplayModelInOldFormat
public boolean getDisplayModelInOldFormat()
Get whether to display model output in the old, original format.
- Returns:
- true if model output is to be shown in the old format
-
minStdDevTipText
public java.lang.String minStdDevTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMinStdDev
public void setMinStdDev(double m)
Set the minimum value for standard deviation when calculating normal density. Increasing this value can help prevent arithmetic overflow resulting from multiplying large densities (arising from small standard deviations) when there are many singleton or near singleton values.
- Parameters:
m - minimum value for standard deviation
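The effect of a minimum standard deviation on a normal density can be illustrated with a short sketch. The class and method names below are hypothetical and this is not the implementation used by EM; it only assumes that the estimated standard deviation is floored at the minimum before the density is evaluated.

```java
class MinStdDevSketch {

    private static final double LOG_SQRT_2PI = 0.5 * Math.log(2 * Math.PI);

    /**
     * Log of the normal density at x. The standard deviation is floored at
     * minStdDev (an assumption made for this sketch), which caps how large
     * the density can become for near-singleton values.
     */
    public static double logNormalDensity(double x, double mean, double stdDev, double minStdDev) {
        double sd = Math.max(stdDev, minStdDev);
        double z = (x - mean) / sd;
        return -LOG_SQRT_2PI - Math.log(sd) - 0.5 * z * z;
    }

    public static void main(String[] args) {
        // A near-singleton attribute: the estimated standard deviation is almost zero.
        System.out.println(logNormalDensity(1.0, 1.0, 1e-12, 1e-6));
        // A larger floor yields a smaller peak density for the same data.
        System.out.println(logNormalDensity(1.0, 1.0, 1e-12, 1e-2));
    }
}
```

Working with log densities, as in this sketch, is another common way to avoid overflow when many per-attribute densities are combined.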
-
setMinStdDevPerAtt
public void setMinStdDevPerAtt(double[] m)
-
getMinStdDev
public double getMinStdDev()
Get the minimum allowable standard deviation.
- Returns:
- the minimum allowable standard deviation
-
numClustersTipText
public java.lang.String numClustersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setNumClusters
public void setNumClusters(int n) throws java.lang.Exception
Set the number of clusters (-1 to select by CV).
- Specified by:
setNumClusters in interface NumberOfClustersRequestable
- Parameters:
n - the number of clusters
- Throws:
java.lang.Exception - if n is 0
-
getNumClusters
public int getNumClusters()
Get the number of clusters.
- Returns:
- the number of clusters.
-
maxIterationsTipText
public java.lang.String maxIterationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMaxIterations
public void setMaxIterations(int i) throws java.lang.Exception
Set the maximum number of iterations to perform.
- Parameters:
i - the maximum number of iterations
- Throws:
java.lang.Exception - if i is less than 1
-
getMaxIterations
public int getMaxIterations()
Get the maximum number of iterations.
- Returns:
- the maximum number of iterations
-
debugTipText
public java.lang.String debugTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebug
public void setDebug(boolean v)
Set debug mode - verbose output.
- Parameters:
v - true for verbose output
-
getDebug
public boolean getDebug()
Get debug mode.
- Returns:
- true if debug mode is set
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of EM.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class RandomizableDensityBasedClusterer
- Returns:
- an array of strings suitable for passing to setOptions()
-
getClusterModelsNumericAtts
public double[][][] getClusterModelsNumericAtts()
Return the normal distributions for the cluster models.
- Returns:
- a double[][][] value
-
getClusterPriors
public double[] getClusterPriors()
Return the priors for the clusters.
- Returns:
- a double[] value
-
toString
public java.lang.String toString()
Outputs the generated clusters into a string.
- Overrides:
toString in class java.lang.Object
- Returns:
- the clusterer in string representation
-
numberOfClusters
public int numberOfClusters() throws java.lang.Exception
Returns the number of clusters.
- Specified by:
numberOfClusters in interface Clusterer
- Specified by:
numberOfClusters in class AbstractClusterer
- Returns:
- the number of clusters generated for a training dataset.
- Throws:
java.lang.Exception - if number of clusters could not be returned successfully
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans).
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Specified by:
getCapabilities in interface Clusterer
- Overrides:
getCapabilities in class AbstractClusterer
- Returns:
- the capabilities of this clusterer
- See Also:
Capabilities
-
buildClusterer
public void buildClusterer(Instances data) throws java.lang.Exception
Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
- Specified by:
buildClusterer in interface Clusterer
- Specified by:
buildClusterer in class AbstractClusterer
- Parameters:
data - set of instances serving as training data
- Throws:
java.lang.Exception - if the clusterer has not been generated successfully
-
clusterPriors
public double[] clusterPriors()
Returns the cluster priors.
- Specified by:
clusterPriors in interface DensityBasedClusterer
- Specified by:
clusterPriors in class AbstractDensityBasedClusterer
- Returns:
- the cluster priors
-
logDensityPerClusterForInstance
public double[] logDensityPerClusterForInstance(Instance inst) throws java.lang.Exception
Computes the log of the conditional density (per cluster) for a given instance.
- Specified by:
logDensityPerClusterForInstance in interface DensityBasedClusterer
- Specified by:
logDensityPerClusterForInstance in class AbstractDensityBasedClusterer
- Parameters:
inst - the instance to compute the density for
- Returns:
- an array containing the estimated log densities, one per cluster
- Throws:
java.lang.Exception - if the density could not be computed successfully
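Per-cluster log densities and cluster priors can be combined into cluster membership probabilities for an instance. The sketch below shows one numerically stable way to do this by subtracting the maximum before exponentiating. The class and method names are hypothetical, and this is not necessarily how the inherited distributionForInstance method is implemented.

```java
class PosteriorSketch {

    /**
     * Returns probabilities proportional to prior[i] * exp(logDensity[i]),
     * normalised to sum to 1. Subtracting the maximum log value before
     * exponentiating avoids underflow for very negative log densities.
     */
    public static double[] posterior(double[] logDensities, double[] priors) {
        int k = logDensities.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            logJoint[i] = logDensities[i] + Math.log(priors[i]);
            max = Math.max(max, logJoint[i]);
        }
        double[] result = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            result[i] = Math.exp(logJoint[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < k; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two clusters with equal priors; the second has three times the density.
        double[] p = posterior(new double[] {Math.log(0.2), Math.log(0.6)}, new double[] {0.5, 0.5});
        System.out.println(p[0] + " " + p[1]);
    }
}
```

The sketch assumes at least one cluster has a finite log density and a positive prior; otherwise the normalisation is undefined.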
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class AbstractClusterer
- Returns:
- the revision
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class.
- Parameters:
argv - should contain the following arguments:
-t training file [-T test file] [-N number of clusters] [-S random seed]
-
-