Class Discretize
- java.lang.Object
  - weka.filters.Filter
    - weka.filters.unsupervised.attribute.PotentialClassIgnorer
      - weka.filters.unsupervised.attribute.Discretize
- All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler, WeightedInstancesHandler, UnsupervisedFilter
- Direct Known Subclasses:
PKIDiscretize
public class Discretize extends PotentialClassIgnorer implements UnsupervisedFilter, WeightedInstancesHandler
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. The class attribute, if set, is skipped.
Valid options are:
-unset-class-temporarily Unsets the class index temporarily before the filter is applied to the data. (default: no)
-B <num> Specifies the (maximum) number of bins to divide numeric attributes into. (default = 10)
-M <num> Specifies the desired weight of instances per bin for equal-frequency binning. If this is set to a positive number, the -B option is ignored. (default = -1)
-F Use equal-frequency instead of equal-width discretization.
-O Optimize the number of bins using a leave-one-out estimate of the estimated entropy (for equal-width discretization). If this is set, the -B option is ignored.
-R <col1,col2-col4,...> Specifies the list of columns to discretize. 'first' and 'last' are valid indexes. (default: first-last)
-V Invert the matching sense of the column indexes.
-D Output binary attributes for discretized attributes.
- Version:
- $Revision: 8284 $
- Author:
- Len Trigg (trigg@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Discretize(): Constructor; initialises the filter.
Discretize(java.lang.String cols): Alternative constructor; sets the attribute indices immediately.
-
Method Summary
Modifier and Type, Method: Description
java.lang.String attributeIndicesTipText(): Returns the tip text for this property.
boolean batchFinished(): Signifies that this batch of input to the filter is finished.
java.lang.String binsTipText(): Returns the tip text for this property.
java.lang.String desiredWeightOfInstancesPerIntervalTipText(): Returns the tip text for this property.
java.lang.String findNumBinsTipText(): Returns the tip text for this property.
java.lang.String getAttributeIndices(): Gets the current range selection.
int getBins(): Gets the number of bins numeric attributes will be divided into.
Capabilities getCapabilities(): Returns the Capabilities of this filter.
double[] getCutPoints(int attributeIndex): Gets the cut points for an attribute.
double getDesiredWeightOfInstancesPerInterval(): Gets the DesiredWeightOfInstancesPerInterval value.
boolean getFindNumBins(): Gets the value of FindNumBins.
boolean getInvertSelection(): Gets whether the column selection is inverted.
boolean getMakeBinary(): Gets whether binary attributes should be made for discretized ones.
java.lang.String[] getOptions(): Gets the current settings of the filter.
java.lang.String getRevision(): Returns the revision string.
boolean getUseEqualFrequency(): Gets the value of UseEqualFrequency.
java.lang.String globalInfo(): Returns a string describing this filter.
boolean input(Instance instance): Inputs an instance for filtering.
java.lang.String invertSelectionTipText(): Returns the tip text for this property.
java.util.Enumeration listOptions(): Gets an enumeration describing the available options.
static void main(java.lang.String[] argv): Main method for testing this class.
java.lang.String makeBinaryTipText(): Returns the tip text for this property.
void setAttributeIndices(java.lang.String rangeList): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
void setAttributeIndicesArray(int[] attributes): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
void setBins(int numBins): Sets the number of bins to divide each selected numeric attribute into.
void setDesiredWeightOfInstancesPerInterval(double newDesiredNumber): Sets the DesiredWeightOfInstancesPerInterval value.
void setFindNumBins(boolean newFindNumBins): Sets the value of FindNumBins.
boolean setInputFormat(Instances instanceInfo): Sets the format of the input instances.
void setInvertSelection(boolean invert): Sets whether the column selection is inverted.
void setMakeBinary(boolean makeBinary): Sets whether binary attributes should be made for discretized ones.
void setOptions(java.lang.String[] options): Parses a given list of options.
void setUseEqualFrequency(boolean newUseEqualFrequency): Sets the value of UseEqualFrequency.
java.lang.String useEqualFrequencyTipText(): Returns the tip text for this property.
-
Methods inherited from class weka.filters.unsupervised.attribute.PotentialClassIgnorer
getIgnoreClass, getOutputFormat, ignoreClassTipText, setIgnoreClass
-
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper
-
Method Detail
-
listOptions
public java.util.Enumeration listOptions()
Gets an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class PotentialClassIgnorer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-unset-class-temporarily Unsets the class index temporarily before the filter is applied to the data. (default: no)
-B <num> Specifies the (maximum) number of bins to divide numeric attributes into. (default = 10)
-M <num> Specifies the desired weight of instances per bin for equal-frequency binning. If this is set to a positive number, the -B option is ignored. (default = -1)
-F Use equal-frequency instead of equal-width discretization.
-O Optimize the number of bins using a leave-one-out estimate of the estimated entropy (for equal-width discretization). If this is set, the -B option is ignored.
-R <col1,col2-col4,...> Specifies the list of columns to discretize. 'first' and 'last' are valid indexes. (default: first-last)
-V Invert the matching sense of the column indexes.
-D Output binary attributes for discretized attributes.
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class PotentialClassIgnorer
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class PotentialClassIgnorer
- Returns:
- an array of strings suitable for passing to setOptions
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
setInputFormat
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
Sets the format of the input instances.
- Overrides:
setInputFormat in class PotentialClassIgnorer
- Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
- Returns:
- true if the outputFormat may be collected immediately
- Throws:
java.lang.Exception - if the input format can't be set successfully
-
input
public boolean input(Instance instance)
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
-
batchFinished
public boolean batchFinished()
Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
- Overrides:
batchFinished in class Filter
- Returns:
- true if there are instances pending output
- Throws:
java.lang.IllegalStateException - if no input structure has been defined
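The interplay of input, batchFinished and output can be illustrated with a small stand-alone sketch. It does not use Weka: the class BufferingBinner and its methods are hypothetical and only mimic the pattern in which a filter buffers every value, computes its bins once the batch ends, and only then releases output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a batch filter; not part of Weka.
class BufferingBinner {
    private final int numBins;
    private final List<Double> buffer = new ArrayList<>();
    private final Queue<Integer> pending = new ArrayDeque<>();

    BufferingBinner(int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be >= 1");
        }
        this.numBins = numBins;
    }

    // Buffers the value; returns false because nothing can be output yet.
    boolean input(double value) {
        buffer.add(value);
        return false;
    }

    // Computes equal-width bins over the buffered values and queues the
    // resulting bin indices. Returns true if results are pending output.
    boolean batchFinished() {
        if (buffer.isEmpty()) {
            return false;
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : buffer) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        for (double v : buffer) {
            int bin = width == 0 ? 0 : (int) ((v - min) / width);
            pending.add(Math.min(bin, numBins - 1)); // the maximum falls into the last bin
        }
        buffer.clear();
        return !pending.isEmpty();
    }

    // Returns the next bin index, or null if nothing is pending.
    Integer output() {
        return pending.poll();
    }
}
```

In this sketch output() yields nothing until batchFinished() has been called, which is the behaviour the description above attributes to filters that need the whole batch first.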
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this filter.
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
findNumBinsTipText
public java.lang.String findNumBinsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getFindNumBins
public boolean getFindNumBins()
Gets the value of FindNumBins.
- Returns:
- Value of FindNumBins.
-
setFindNumBins
public void setFindNumBins(boolean newFindNumBins)
Sets the value of FindNumBins.
- Parameters:
newFindNumBins - Value to assign to FindNumBins.
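FindNumBins corresponds to the -O option, which optimizes the number of equal-width bins with a leave-one-out entropy estimate. The sketch below shows one way such a search can work. The scoring formula (a leave-one-out log-likelihood of a histogram density, dropping terms that do not depend on the bin count) and the names BinCountSearchSketch and findNumBins are assumptions made for illustration; Weka's own computation may differ.

```java
// Illustrative only; not Weka code.
class BinCountSearchSketch {
    // Returns the bin count in [1, maxBins] with the lowest leave-one-out score.
    static int findNumBins(double[] values, int maxBins) {
        if (values.length == 0 || maxBins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (max == min) {
            return 1; // a constant attribute cannot be split
        }
        int best = 1;
        double bestScore = Double.MAX_VALUE;
        for (int k = 1; k <= maxBins; k++) {
            double width = (max - min) / k;
            int[] counts = new int[k];
            for (double v : values) {
                counts[Math.min((int) ((v - min) / width), k - 1)]++;
            }
            double score = 0;
            for (int c : counts) {
                if (c < 2) { // a held-out value would see an empty bin
                    score = Double.MAX_VALUE;
                    break;
                }
                score -= c * Math.log((c - 1) / width);
            }
            if (score < bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }
}
```

Under this criterion any bin count that leaves a bin with fewer than two values is rejected, so sparse data tends to end up with few bins.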
-
makeBinaryTipText
public java.lang.String makeBinaryTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMakeBinary
public boolean getMakeBinary()
Gets whether binary attributes should be made for discretized ones.
- Returns:
- true if attributes will be binarized
-
setMakeBinary
public void setMakeBinary(boolean makeBinary)
Sets whether binary attributes should be made for discretized ones.
- Parameters:
makeBinary - if binary attributes are to be made
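To make the idea of binary output concrete, here is a minimal sketch of one plausible encoding: one 0/1 flag per cut point, set when the value lies above that cut point. The encoding and the names BinaryEncodingSketch and toBinary are assumptions for illustration, not a statement of how Weka lays out its binary attributes.

```java
// Illustrative only; not Weka code.
class BinaryEncodingSketch {
    // Returns one flag per cut point: 1 if the value lies above it, else 0.
    static int[] toBinary(double value, double[] cutPoints) {
        int[] flags = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            flags[j] = value > cutPoints[j] ? 1 : 0;
        }
        return flags;
    }
}
```

With this encoding an attribute split into k bins yields k - 1 binary attributes, and the ordering of the bins is preserved in the flags.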
-
desiredWeightOfInstancesPerIntervalTipText
public java.lang.String desiredWeightOfInstancesPerIntervalTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDesiredWeightOfInstancesPerInterval
public double getDesiredWeightOfInstancesPerInterval()
Gets the DesiredWeightOfInstancesPerInterval value.
- Returns:
- the DesiredWeightOfInstancesPerInterval value.
-
setDesiredWeightOfInstancesPerInterval
public void setDesiredWeightOfInstancesPerInterval(double newDesiredNumber)
Sets the DesiredWeightOfInstancesPerInterval value.
- Parameters:
newDesiredNumber - The new DesiredNumber value.
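The option text for -M states that a positive desired weight overrides the bin count given by -B. A minimal sketch of that precedence rule follows; the helper targetWeightPerBin is hypothetical, and the fallback of dividing the total weight evenly among the bins is an assumption about how the per-bin target would otherwise be derived.

```java
// Illustrative only; not Weka code.
class TargetWeightSketch {
    // Returns the weight each equal-frequency bin should aim for.
    static double targetWeightPerBin(double[] weights, double desiredWeight, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be >= 1");
        }
        if (desiredWeight > 0) {
            return desiredWeight; // a positive value takes precedence over numBins
        }
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        return total / numBins; // assumed fallback: spread the weight evenly
    }
}
```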
-
useEqualFrequencyTipText
public java.lang.String useEqualFrequencyTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUseEqualFrequency
public boolean getUseEqualFrequency()
Gets the value of UseEqualFrequency.
- Returns:
- Value of UseEqualFrequency.
-
setUseEqualFrequency
public void setUseEqualFrequency(boolean newUseEqualFrequency)
Sets the value of UseEqualFrequency.
- Parameters:
newUseEqualFrequency - Value to assign to UseEqualFrequency.
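Equal-frequency binning places cut points so that each bin holds roughly the same number of values, rather than spanning the same width. The following is a simplified greedy sketch of the idea. It ignores instance weights, and the names EqualFrequencySketch and cutPoints are illustrative, so it should not be read as Weka's algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; not Weka code.
class EqualFrequencySketch {
    // Returns at most numBins - 1 cut points, each midway between two distinct values.
    static double[] cutPoints(double[] values, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be >= 1");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double target = (double) sorted.length / numBins;
        List<Double> cuts = new ArrayList<>();
        int inBin = 0;
        for (int i = 0; i < sorted.length - 1; i++) {
            inBin++;
            // Cut only between distinct values so that equal values share a bin.
            if (inBin >= target && sorted[i] < sorted[i + 1] && cuts.size() < numBins - 1) {
                cuts.add((sorted[i] + sorted[i + 1]) / 2);
                inBin = 0;
            }
        }
        return cuts.stream().mapToDouble(Double::doubleValue).toArray();
    }
}
```

For the values 1 to 6 and three bins, this sketch yields the cut points 2.5 and 4.5, so each bin receives two values.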
-
binsTipText
public java.lang.String binsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getBins
public int getBins()
Gets the number of bins numeric attributes will be divided into.
- Returns:
- the number of bins.
-
setBins
public void setBins(int numBins)
Sets the number of bins to divide each selected numeric attribute into.
- Parameters:
numBins - the number of bins
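For equal-width discretization, the number of bins fixes the cut points once the minimum and maximum of the attribute are known. A minimal stand-alone sketch of that arithmetic follows; the names EqualWidthSketch and cutPoints are illustrative and the code is not taken from Weka.

```java
// Illustrative only; not Weka code.
class EqualWidthSketch {
    // Returns numBins - 1 evenly spaced cut points between min and max.
    static double[] cutPoints(double[] values, int numBins) {
        if (values.length == 0 || numBins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        if (width == 0) {
            return new double[0]; // constant attribute: a single interval
        }
        double[] cuts = new double[numBins - 1];
        for (int i = 1; i < numBins; i++) {
            cuts[i - 1] = min + width * i;
        }
        return cuts;
    }
}
```

With values ranging from 0 to 10 and five bins, the bin width is 2 and the cut points are 2, 4, 6 and 8.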
-
invertSelectionTipText
public java.lang.String invertSelectionTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getInvertSelection
public boolean getInvertSelection()
Gets whether the column selection is inverted.
- Returns:
- true if the selection is inverted, so that the supplied columns are left unchanged and the remaining columns are discretized
-
setInvertSelection
public void setInvertSelection(boolean invert)
Sets whether the column selection is inverted. If true, the columns given by the attribute indices are left unchanged and all other columns are discretized. If false, the given columns are discretized. In both cases only numeric attributes are affected.
- Parameters:
invert - the new invert setting
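The effect of inverting the matching sense (the -V option) can be sketched as a set complement over the attribute indices. The helper effectiveSelection below is hypothetical and works on 0-based indices; it only illustrates the logic and is not Weka code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not Weka code.
class SelectionSketch {
    // Returns the 0-based indices that are processed for the given selection.
    static List<Integer> effectiveSelection(int numAttributes, int[] selected, boolean invert) {
        boolean[] chosen = new boolean[numAttributes];
        for (int idx : selected) {
            if (idx < 0 || idx >= numAttributes) {
                throw new IllegalArgumentException("index out of range: " + idx);
            }
            chosen[idx] = true;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (chosen[i] != invert) { // inverting swaps selected and unselected
                result.add(i);
            }
        }
        return result;
    }
}
```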
-
attributeIndicesTipText
public java.lang.String attributeIndicesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getAttributeIndices
public java.lang.String getAttributeIndices()
Gets the current range selection.
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(java.lang.String rangeList)
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
- Parameters:
rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g.: first-3,5,6-last
- Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied
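The range-list syntax (1-based indexes, with 'first' and 'last' as keywords) can be illustrated with a small parser that converts a string such as first-3,5,6-last into 0-based indices. The class RangeListSketch is a hypothetical stand-alone sketch; it does not reproduce Weka's own range handling and may accept or reject inputs differently.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only; not Weka code.
class RangeListSketch {
    // Parses a 1-based range list into sorted, distinct 0-based indices.
    static int[] parse(String rangeList, int numAttributes) {
        SortedSet<Integer> indices = new TreeSet<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-", 2);
            int from = toIndex(ends[0], numAttributes);
            int to = ends.length == 2 ? toIndex(ends[1], numAttributes) : from;
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = from; i <= to; i++) {
                indices.add(i);
            }
        }
        return indices.stream().mapToInt(Integer::intValue).toArray();
    }

    // Converts "first", "last" or a 1-based number into a 0-based index.
    private static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        int oneBased;
        if (t.equals("first")) {
            oneBased = 1;
        } else if (t.equals("last")) {
            oneBased = numAttributes;
        } else {
            try {
                oneBased = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + token, e);
            }
        }
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }
}
```

For a dataset with eight attributes, first-3,5,6-last selects every attribute except the fourth.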
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] attributes)
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
- Parameters:
attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
-
getCutPoints
public double[] getCutPoints(int attributeIndex)
Gets the cut points for an attribute.
- Parameters:
attributeIndex - the index (from 0) of the attribute to get the cut points of
- Returns:
- an array containing the cut points (or null if the requested attribute has been discretized into only one interval).
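Once the cut points are known, mapping a numeric value to its bin is a simple search. The sketch below assumes intervals that are open on the left and closed on the right, so a value equal to a cut point falls into the lower bin; that convention and the helper name binIndex are assumptions for illustration. A null array is treated as a single interval, matching the return contract described above.

```java
// Illustrative only; not Weka code.
class BinLookupSketch {
    // Returns the 0-based bin for a value, given ascending cut points.
    static int binIndex(double value, double[] cutPoints) {
        if (cutPoints == null) {
            return 0; // only one interval
        }
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }
}
```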
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class.
- Parameters:
argv - should contain arguments to the filter: use -h for help
-
-