java.lang.Object
  org.apache.solr.analysis.PatternTokenizerFactory
public class PatternTokenizerFactory
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

"pattern" is the regular expression. "group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output from: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)

Using group >= 0 selects the matching group as the token. For example, if you have:

pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
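The effect of the "group" argument can be reproduced with plain java.util.regex, without any Solr classes. The following is a minimal sketch under that assumption: the class PatternGroupDemo and its tokenize helper are hypothetical names used only for illustration, and the pattern is written as '([^']+)' because the backslashes before the quote marks are not needed in a Java regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternGroupDemo {

    // Hypothetical helper (not part of Solr) that mirrors the documented
    // meaning of "group" using only the JDK regex classes.
    static List<String> tokenize(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        if (group < 0) {
            // group = -1: same pieces that String.split(regex) would return
            return Arrays.asList(pattern.split(input));
        }
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            // group = 0 is the whole match, group = 1 the first parenthesized group
            tokens.add(matcher.group(group));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokenize(regex, 0, input));  // prints ['bbb', 'ccc']
        System.out.println(tokenize(regex, 1, input));  // prints [bbb, ccc]
        System.out.println(tokenize(regex, -1, input)); // pieces between the matches
    }
}
```

With group=-1 the quoted sections act as delimiters, so the result contains the text between them ("aaa " and a single space) rather than the quoted words themselves.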
| Field Summary | |
|---|---|
| protected Map<String,String> | args |
| protected int | group |
| static String | GROUP |
| protected Pattern | pattern |
| static String | PATTERN |
| Constructor Summary | |
|---|---|
| PatternTokenizerFactory() | |
| Method Summary | |
|---|---|
| TokenStream | create(Reader input): Split the input using the configured pattern |
| Map<String,String> | getArgs(): The arguments passed to init() |
| static List<Token> | group(Matcher matcher, String input, int group): Create tokens from the matches in a matcher |
| void | init(Map<String,String> args): Require a configured pattern |
| static List<Token> | split(Matcher matcher, String input): This behaves just like String.split(), but returns a list of Tokens rather than an array of strings |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String PATTERN
public static final String GROUP
protected Map<String,String> args
protected Pattern pattern
protected int group
| Constructor Detail |
|---|
public PatternTokenizerFactory()
| Method Detail |
|---|
public void init(Map<String,String> args)
Specified by: init in interface TokenizerFactory

public Map<String,String> getArgs()
Specified by: getArgs in interface TokenizerFactory

public TokenStream create(Reader input)
Specified by: create in interface TokenizerFactory

public static List<Token> split(Matcher matcher,
String input)
public static List<Token> group(Matcher matcher,
String input,
int group)
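
The group(Matcher, String, int) signature suggests a helper that walks the matcher and turns each match into a token. The sketch below shows one way such a helper could look using only the JDK. It is an illustration, not the Solr implementation: SimpleToken is a hypothetical stand-in for Lucene's Token, and recording character offsets alongside the text is an assumption about what a token should carry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetTokenSketch {

    // Hypothetical stand-in for a token: the text plus its offsets in the input.
    static final class SimpleToken {
        final String text;
        final int start;
        final int end;

        SimpleToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Collect one token per match, taken from the requested capture group.
    static List<SimpleToken> groupTokens(Matcher matcher, String input, int group) {
        List<SimpleToken> tokens = new ArrayList<>();
        while (matcher.find()) {
            int start = matcher.start(group);
            int end = matcher.end(group);
            // start is -1 when the group did not take part in this match
            if (start >= 0) {
                tokens.add(new SimpleToken(input.substring(start, end), start, end));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        Matcher matcher = Pattern.compile("'([^']+)'").matcher(input);
        System.out.println(groupTokens(matcher, input, 1)); // prints [bbb[5,8), ccc[11,14)]
    }
}
```

Keeping the offsets lets later stages map a token back to its position in the original text, which plain String results cannot do.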