- add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary
- add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary with its statistics frequency
- add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
-
append an item at the tail
- add(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
Append a new Integer to the end.
- addPartSpeech(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
add a new part of speech to the word.
- addPartSpeech(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- addSyn(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
add a new synonym to the word.
- addSyn(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- ADictionary - Class in org.lionsoul.jcseg.tokenizer.core
-
Dictionary abstract super class
- ADictionary(JcsegTaskConfig, Boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
initialize the ADictionary
- AL_TODO_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
the default autoload task file name
- append(String) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a string to the buffer
- append(char[], int, int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append parts of the chars to the buffer
- append(char[]) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append some chars to the buffer
- append(char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a char to the buffer
- append(boolean) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a boolean value
- append(short) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a short value
- append(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append an int value
- append(long) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a long value
- append(float) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a float value
- append(double) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a double value
- APPEND_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the pinyin to the split IWord
- APPEND_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the synonym words to the split IWord.
- APPEND_PART_OF_SPEECH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the part of speech.
- appendCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- appendCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- appendLatinSyn(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
Check and append the synonyms of the specified word, including CJK and basic Latin words
All the synonyms share the same position, part of speech and word type as the primitive word
- appendWordFeatures(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
check and append the pinyin and the synonyms words of the specified word
- ASegment - Class in org.lionsoul.jcseg.tokenizer
-
abstract segmentation super class:
1.
- ASegment(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
-
initialize the segment
- ASegment(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
-
- autoFilter - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
auto filter the words with low score
- autoLoad() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the option values by automatically searching for the jcseg.properties file:
- AutoLoadFile - Class in org.lionsoul.jcseg.tokenizer.core
-
AutoLoad file that describes the autoload configuration files
- AutoLoadFile(String) - Constructor for class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- autoMinLength - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
automatically append words whose length exceeds the specified value
as a phrase
- CE_MIXED_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
- charAt(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
get the char at a specified position in the buffer
- CHECK_CE_MASk - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- CHECK_CF_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- Chunk - Class in org.lionsoul.jcseg.tokenizer
-
chunk concept for the MMSeg Chinese word segmentation algorithm; implements the IChunk interface
- Chunk(IWord[]) - Constructor for class org.lionsoul.jcseg.tokenizer.Chunk
-
- CJK_UNITS - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
chinese single units
- CJK_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
Chinese, Japanese, Korean words
- clear() - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- clear() - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
clear the buffer by resetting the count to 0
- CLEAR_STOPWORD - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
clear away the stopword.
- clearStopwords() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- clone() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
make clone available
- clone() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
rewrite the clone method
- clone() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
Interface to clone the current object
- CN_DNAME_1 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
first word of a Chinese double name
- CN_DNAME_2 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
second word of a Chinese double name
- CN_LNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
chinese last name
- CN_LNAME_ADORN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
the adorn(修饰) char before the last name
- CN_SNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
chinese single name
- CNFRA_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
Chinese fraction to Arabic fraction.
- cnFractionToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- CNNUM_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
Chinese numeric to Arabic.
- cnNumericToArabic(String, boolean) - Static method in class org.lionsoul.jcseg.util.NumericUtil
-
a static method to turn the Chinese numeric to Arabic numbers
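The conversion that cnNumericToArabic describes can be sketched as follows. This is a standalone illustration, not the NumericUtil implementation: the class name `CnNumeralSketch`, the method `toArabic`, and the restriction to well-formed numerals below 10^8 are all assumptions, and the meaning of the real method's boolean parameter is not covered here.

```java
// Hypothetical standalone sketch; not the Jcseg NumericUtil code.
final class CnNumeralSketch {
    // Chinese digits zero..nine, written as Unicode escapes so the file
    // compiles under any source encoding.
    private static final String DIGITS =
        "\u96F6\u4E00\u4E8C\u4E09\u56DB\u4E94\u516D\u4E03\u516B\u4E5D";

    /** Converts a well-formed Chinese numeral below 10^8 to a long. */
    static long toArabic(String s) {
        long total = 0;    // value of the completed ten-thousands group
        long section = 0;  // value accumulated inside the current group
        long digit = 0;    // last digit seen, still waiting for its unit
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int d = DIGITS.indexOf(c);
            if (d >= 0) {
                digit = d;
                continue;
            }
            switch (c) {
                case '\u5341': // ten; a bare "ten" counts as "one ten"
                    section += (digit == 0 ? 1 : digit) * 10;
                    break;
                case '\u767E': // hundred
                    section += digit * 100;
                    break;
                case '\u5343': // thousand
                    section += digit * 1000;
                    break;
                case '\u4E07': // ten thousand closes the current group
                    total += (section + digit) * 10000;
                    section = 0;
                    break;
                default:
                    throw new IllegalArgumentException("not a Chinese numeral: " + c);
            }
            digit = 0;
        }
        return total + section + digit;
    }
}
```

The sketch keeps three running values so that a digit is only committed once its unit character arrives; a trailing digit with no unit is added as-is.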
- cnNumToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- compareTo(TextRankSummaryExtractor.Document) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
override the compareTo method
compare document with its relevance score
- COMPLEX_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- ComplexSeg - Class in org.lionsoul.jcseg.tokenizer
-
Jcseg complex segmentation implementation, extended from the ASegment class
this needs the filtering work of the four MMSeg rules:
- ComplexSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- ComplexSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- config - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- config - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- contains(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
-
check whether the specified T already exists in the queue
- createDefaultDictionary(JcsegTaskConfig, boolean, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a default ADictionary instance:
1.
- createDefaultDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create the ADictionary according to the JcsegTaskConfig
check and load the lexicon by default
- createDefaultDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create the ADictionary according to the JcsegTaskConfig
- createDictionary(Class<? extends ADictionary>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a new ADictionary instance
- createJcseg(int, Object...) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
create the specified mode jcseg instance
- createSegment(Class<? extends ISegment>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
load the ISegment class with the given path
- createSingletonDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a singleton ADictionary object according to the JcsegTaskConfig
check and load the lexicon by default
- createSingletonDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a singleton ADictionary object according to the JcsegTaskConfig
- ctrlMask - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
segmentation runtime function control mask
- get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
return the IWord associated with the given key.
- get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- get(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- getAutoMinLength() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getAverageWordsLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getAverageWordsLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the average word length for all the chunks.
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
an abstract method to gain a CJK word from the
current position.
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
-
here we don't have to do anything
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the current task configuration instance.
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
get the current task config instance
- getDict() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the current dictionary instance.
- getDict() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
get the current dictionary instance
- getEnCharType(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the type of the english char
defined in this class and start with EN_.
- getEnSecondSeg() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getFile() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- getFrequency() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the frequency of the word,
use only when the word's length is one.
- getFrequency() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getIndex() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getIndex(String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
get the key's type index located in ILexicon interface
- getJarHome(Object) - Static method in class org.lionsoul.jcseg.util.Util
-
get the absolute parent path for the jar file.
- getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a reader
- getKeyphraseFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a file
- getKeyphraseFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a string
- getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get the key sentence from a reader
- getKeySentenceFromFile(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get key sentence from a file path
- getKeySentenceFromString(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get key sentence from a string
- getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a reader
- getKeywordsFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a file
- getKeywordsFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a string
- getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getLargestAverageWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
-
2.
- getLargestSingleMorphemicFreedomChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
-
the largest sum of degree of morphemic freedom of one-character words
this rule will return the chunks that own the largest sum of degree of morphemic freedom
of one-character
- getLastUpdateTime() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- getLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the length of the chunk(the number of the word)
- getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the length of the word
- getLength() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- getLength() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getLexiconFilePrefix() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
property about lexicon file.
- getLexiconFileSuffix() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getLexiconPath() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
return the lexicon directory path
- getMaxCnLnadron() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getMaximumMatchChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
-
1.
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getMaxWordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getMixCnLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getNameSingleThreshold() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next CJK word from the current position of the input stream
- getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
-
get the next CJK word from the current position of the input stream
this function is the core part of most segmentation implementations
- getNextLatinWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next Latin word from the current position of the input stream
- getNextMatch(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
match the next CJK word in the dictionary
- getNextPunctuationPairWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next punctuation pair word from the current position
of the input stream.
- getPairPunctuationText(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find pair punctuation of the given punctuation char
the purpose is to get the text between them
- getPartSpeech() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the part of speech of the word.
- getPartSpeech() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPinyin() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the pinyin of the word
- getPinyin() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPollTime() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the start position of the word.
- getPosition() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- getPosition() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPPTMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPropertieFile() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPunctuationPair(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the matching pair of a paired punctuation char
- getQueueSize() - Method in class org.lionsoul.jcseg.util.IPushbackReader
-
get the buffer size - the number of buffered data
- getScore() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getSeg() - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
- getSeg() - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
- getSentence() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getSentenceNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getSentenceSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- getSingleWordsMorphemicFreedom() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getSingleWordsMorphemicFreedom() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the degree of morphemic freedom for all
the single words.
- getSmallestVarianceWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
-
the smallest variance word length
this rule returns the chunks that own the smallest word-length variance
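A minimal sketch of the smallest-variance rule, assuming a chunk is represented as a plain `String[]` of words rather than Jcseg's IChunk, and assuming population variance of the word lengths is the measure (the index does not say which formula MMSegFilter uses). The names `VarianceRuleSketch`, `variance` and `smallestVariance` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not the Jcseg MMSegFilter code.
final class VarianceRuleSketch {
    /** Population variance of the word lengths in one non-empty chunk. */
    static double variance(String[] chunk) {
        double mean = 0;
        for (String word : chunk) {
            mean += word.length();
        }
        mean /= chunk.length;
        double sum = 0;
        for (String word : chunk) {
            double diff = word.length() - mean;
            sum += diff * diff;
        }
        return sum / chunk.length;
    }

    /** Keeps every chunk whose variance ties for the minimum. */
    static List<String[]> smallestVariance(List<String[]> chunks) {
        double best = Double.MAX_VALUE;
        for (String[] chunk : chunks) {
            best = Math.min(best, variance(chunk));
        }
        List<String[]> kept = new ArrayList<>();
        for (String[] chunk : chunks) {
            if (variance(chunk) == best) {
                kept.add(chunk);
            }
        }
        return kept;
    }
}
```

Returning all tied chunks, rather than a single winner, leaves room for a later rule to break the tie, which matches the way the index describes each rule as returning chunks in the plural.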
- getSTokenMinLen() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
- getStreamPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
get the current length of the stream
- getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
- getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get summary from a reader
- getSummaryFromFile(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get document summary from a file
- getSummaryFromString(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get document summary from a string
- getSyn() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the syn words of the word.
- getSyn() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getType() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the type of the word
- getType() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getValue() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the value of the word
- getValue() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- getValue() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getWords() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getWords() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getWords() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
get all the words in the chunk.
- getWordSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- getWordsVariance() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getWordsVariance() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the variance of all the words in all
the chunks.
- gisb - Variable in class org.lionsoul.jcseg.tokenizer.SentenceSeg
-
global string buffer
- I_CN_NAME - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
identify the chinese name?
- ialist - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- IChunk - Interface in org.lionsoul.jcseg.tokenizer.core
-
chunk interface for JCSeg, the most important concept of the MMSeg Chinese segmentation algorithm
- identifyCnName() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- idx - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
the index value of the current input stream
mainly for tracking the start position of the token
- idx - Variable in class org.lionsoul.jcseg.tokenizer.SentenceSeg
-
- IHashQueue<T extends IWord> - Class in org.lionsoul.jcseg.util
-
A normal queue based on a singly linked list
but with a hash index, so it is fast for searching
- IHashQueue() - Constructor for class org.lionsoul.jcseg.util.IHashQueue
-
- IHashQueue.Entry<T> - Class in org.lionsoul.jcseg.util
-
inner Entry node class
- IHashQueue.Entry(T, IHashQueue.Entry<T>, IHashQueue.Entry<T>) - Constructor for class org.lionsoul.jcseg.util.IHashQueue.Entry
-
- IIntFIFO - Class in org.lionsoul.jcseg.util
-
int first-in-first-out queue based on a singly linked list
- IIntFIFO() - Constructor for class org.lionsoul.jcseg.util.IIntFIFO
-
- IIntFIFO.Entry - Class in org.lionsoul.jcseg.util
-
Item Entry inner class
- IIntFIFO.Entry(int, IIntFIFO.Entry) - Constructor for class org.lionsoul.jcseg.util.IIntFIFO.Entry
-
- IIntQueue - Class in org.lionsoul.jcseg.util
-
char queue class based on a doubly linked list
Not thread safe
- IIntQueue() - Constructor for class org.lionsoul.jcseg.util.IIntQueue
-
- IIntQueue.Entry - Class in org.lionsoul.jcseg.util
-
inner Entry node class
- IIntQueue.Entry(int, IIntQueue.Entry, IIntQueue.Entry) - Constructor for class org.lionsoul.jcseg.util.IIntQueue.Entry
-
- ILexicon - Interface in org.lionsoul.jcseg.tokenizer.core
-
lexicon configuration class.
- insertionSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
-
insert sort method
- insertionSort(T[], int, int) - Static method in class org.lionsoul.jcseg.util.Sort
-
method to sort an subarray from start to end with insertion sort algorithm
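For reference, insertion sort over a subarray can be sketched like this. It is a standalone illustration rather than the code in org.lionsoul.jcseg.util.Sort; the class name `InsertionSortSketch` is hypothetical, and treating both `start` and `end` as inclusive is an assumption the index does not confirm.

```java
// Hypothetical standalone sketch; not the Jcseg Sort class.
final class InsertionSortSketch {
    /** Sorts arr[start..end] in place; both bounds inclusive (assumed). */
    static <T extends Comparable<? super T>> void insertionSort(T[] arr, int start, int end) {
        for (int i = start + 1; i <= end; i++) {
            T key = arr[i];
            int j = i - 1;
            // Shift larger elements one slot right, never crossing start.
            while (j >= start && arr[j].compareTo(key) > 0) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;
        }
    }
}
```

Elements outside the given range are left untouched, so the method can serve as the small-partition step of a larger sort.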
- IntArrayList - Class in org.lionsoul.jcseg.util
-
array list for the primitive int type, to use instead of ArrayList
this saves a lot of boxing and unboxing work
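The idea behind a primitive int list can be sketched as below: values live in an `int[]` that doubles when full, so no `Integer` objects are created. The class `IntListSketch` is hypothetical and only mirrors the `add(int)`, `get(int)`, `set(int, int)` and `clear()` signatures listed in this index; its `size()` accessor and the doubling growth policy are assumptions.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not the Jcseg IntArrayList code.
final class IntListSketch {
    private int[] items;
    private int size = 0;

    IntListSketch(int capacity) {
        items = new int[Math.max(1, capacity)];
    }

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }

    int get(int index) {
        checkIndex(index);
        return items[index];
    }

    void set(int index, int value) {
        checkIndex(index);
        items[index] = value;
    }

    int size() {
        return size;
    }

    /** Resets the count; the backing array is kept for reuse. */
    void clear() {
        size = 0;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
    }
}
```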
- IntArrayList() - Constructor for class org.lionsoul.jcseg.util.IntArrayList
-
- IntArrayList(int) - Constructor for class org.lionsoul.jcseg.util.IntArrayList
-
- IPushbackReader - Class in org.lionsoul.jcseg.util
-
IPushBackReader based on Reader
Not thread safe support unlimited unread operation
- IPushbackReader(Reader) - Constructor for class org.lionsoul.jcseg.util.IPushbackReader
-
- isAutoFilter() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- isAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
about lexicon autoload
- isb - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- isCJKChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the specified char is CJK, Thai...
- isCNNumeric(char) - Static method in class org.lionsoul.jcseg.util.NumericUtil
-
check the given char is chinese numeric or not
- isCnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isDecimal(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the specified char is a decimal including the full-width char
- isDigit(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the specified char is a digit or not
returns true if it is, false otherwise; this method can recognize full-width chars
- ISegment - Interface in org.lionsoul.jcseg.tokenizer.core
-
Jcseg segment interface
- isEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is a basic Latin, Russian or
Greek letter; returns true if it is, false otherwise
this method can recognize full-width chars and letters
- isENKeepPunctuaton(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is english keep punctuation
- isEnLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
include the full-width and half-width char
- isEnNumeric(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is an English numeric (48-57)
including the full-width char
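A sketch of a digit check that accepts both widths: ASCII digits occupy code points 48-57, and the full-width digits occupy U+FF10 to U+FF19, exactly 0xFEE0 above their ASCII counterparts. The class `WidthSketch` and its method names are hypothetical and do not reproduce StringUtil.

```java
// Hypothetical standalone sketch; not the Jcseg StringUtil code.
final class WidthSketch {
    /** ASCII digits '0'..'9'. */
    static boolean isHalfWidthDigit(int c) {
        return c >= 48 && c <= 57;
    }

    /** Full-width digits U+FF10..U+FF19. */
    static boolean isFullWidthDigit(int c) {
        return c >= 0xFF10 && c <= 0xFF19;
    }

    /** True for a digit in either width. */
    static boolean isEnNumeric(int c) {
        return isHalfWidthDigit(c) || isFullWidthDigit(c);
    }

    /** Maps a full-width digit to its ASCII form; other input is returned unchanged. */
    static int toHalfWidth(int c) {
        return isFullWidthDigit(c) ? c - 0xFEE0 : c;
    }
}
```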
- isEnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is half-width punctuation
- isFWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is a full-width char
AT+reader: the full-width punctuation is not included here
- isHWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is a half-width char or not
- isKeepPunctuation(char) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- isLetterNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the specified char is Letter number like 'ⅠⅡ'
returns true if it is, false otherwise
- isLowerCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isOtherNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the specified char is other number like '①⑩⑽㈩'
returns true if it is, false otherwise
- isPairPunctuation(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is pair punctuation or not
- isSync() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- IStringBuffer - Class in org.lionsoul.jcseg.util
-
string buffer class
- IStringBuffer() - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a default length 16
- IStringBuffer(int) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a specified length
- IStringBuffer(String) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a specified string
- isUpperCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isWhitespace(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given string is a whitespace
- IWord - Interface in org.lionsoul.jcseg.tokenizer.core
-
Word interface
- ladCJKPos() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- length() - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
return the length of the buffer
- LEX_PROPERTY_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
default lexicon property file name
- LexiconException - Exception in org.lionsoul.jcseg.tokenizer.core
-
JCSeg Dictionary configuration exception class
- LexiconException(String) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- LexiconException(Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- LexiconException(String, Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- load(File) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon file
- load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon path
- load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon input stream
- load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the option values from a specified
jcseg.properties property file
- load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the option values from an InputStream
of a jcseg.properties property file
- LOAD_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the pinyin of the CJK_WORDS
- LOAD_CJK_POS - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the word's part of speech
- LOAD_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the synonym words of the CJK_WORDS.
- loadCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- loadCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- loadClassPath() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from all the files under the specified class path.
- loadDirectory(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from all the files under a specified lexicon directory
- loadWords(JcsegTaskConfig, ADictionary, File) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words in the specified lexicon file into the dictionary
- loadWords(JcsegTaskConfig, ADictionary, String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon file path
- loadWords(JcsegTaskConfig, ADictionary, InputStream) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load words from a InputStream
- SEARCH_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- SearchSeg - Class in org.lionsoul.jcseg.tokenizer
-
search mode implementation: all the possible combinations will be returned,
built for search use.
- SearchSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
-
- SearchSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
-
- seg - Variable in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
the ISegment object
- seg - Variable in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
the ISegment object
- SegmentFactory - Class in org.lionsoul.jcseg.tokenizer.core
-
Segment factory to create singleton ISegment object
a path of the class that has implemented the ISegment interface must be given first
- SegmentFactory() - Constructor for class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
- sentence(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
-
key sentence extractor
- Sentence - Class in org.lionsoul.jcseg.tokenizer
-
sentence description class
- Sentence(String, int) - Constructor for class org.lionsoul.jcseg.tokenizer.Sentence
-
constructor
- Sentence(String) - Constructor for class org.lionsoul.jcseg.tokenizer.Sentence
-
- sentenceNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- sentenceSeg - Variable in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
sentence splitter object
- SentenceSeg - Class in org.lionsoul.jcseg.tokenizer
-
document sentence splitter
- SentenceSeg(Reader) - Constructor for class org.lionsoul.jcseg.tokenizer.SentenceSeg
-
constructor
- SentenceSeg() - Constructor for class org.lionsoul.jcseg.tokenizer.SentenceSeg
-
- set(int, int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- set(int, char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
set the char at the specified index
- setAppendCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAppendCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAppendPartOfSpeech(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAutoFilter(boolean) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setAutoload(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAutoMinLength(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setClearStopwords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setCnFactionToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setCnNumToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
set the current task configuration instance.
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
set the current task config
- setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
set the dictionary of the current tokenizer.
- setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
set the current dictionary instance
- setEnSecondSeg(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setFile(File) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- setICnName(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setIndex(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setKeepPunctuations(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setKeepUnregWords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setLastUpdateTime(long) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- setLength(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set a self-defined length
- setLength(int) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- setLength(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setLexiconPath(String[]) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKPos(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxCnLnadron(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- setMaxLength(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxWordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setMixCnLength(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setNameSingleThreshold(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setPartSpeech(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- setPartSpeech(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPinyin(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set the pinyin of the word
- setPinyin(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPollTime(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setPosition(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set the position of the word
- setPosition(int) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- setPosition(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPPT_MAX_LENGTH(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setScore(double) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
- setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
- setSentence(Sentence) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setSentenceNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- setSentenceSeg(SentenceSeg) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- setSTokenMinLen(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setSyn(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- setSyn(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setValue(String) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
-
- setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setWords(List<IWord>) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setWordSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- shellSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
-
shell sort algorithm
- SIMPLE_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
simple algorithm or complex algorithm
- SimpleSeg - Class in org.lionsoul.jcseg.tokenizer
-
Jcseg simple segmentation implementation, extended from ASegment
- SimpleSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- SimpleSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- SIMSTR - Static variable in class org.lionsoul.jcseg.util.STConverter
-
- SimToTraditional(String) - Static method in class org.lionsoul.jcseg.util.STConverter
-
convert the simplified words in the specified string
to traditional words.
- SimToTraditional(String, IStringBuffer) - Static method in class org.lionsoul.jcseg.util.STConverter
-
- size(int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
return the size of the dictionary
- size(int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- size() - Method in class org.lionsoul.jcseg.util.IHashQueue
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IIntFIFO
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IIntQueue
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- Sort - Class in org.lionsoul.jcseg.util
-
Implementations of various sort algorithms; the methods use the default compare method
- Sort() - Constructor for class org.lionsoul.jcseg.util.Sort
-
- START_SS_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- startAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
start the lexicon autoload thread
- STConverter - Class in org.lionsoul.jcseg.util
-
Simplified and traditional Chinese conversion class;
all the search work is based on
String.indexOf(int);
you may store all the words in a HashMap for the purpose of a faster fetch
- STConverter() - Constructor for class org.lionsoul.jcseg.util.STConverter
-
- STOKEN_MIN_LEN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
minimum length for the second split to make up a word
- STOP_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
- stopAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- StringUtil - Class in org.lionsoul.jcseg.util
-
a class to deal with English stop characters such as English punctuation
- StringUtil() - Constructor for class org.lionsoul.jcseg.util.StringUtil
-
- summary(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
-
summary extractor
- SummaryExtractor - Class in org.lionsoul.jcseg.extractor
-
document summary extractor
- SummaryExtractor(ISegment, SentenceSeg) - Constructor for class org.lionsoul.jcseg.extractor.SummaryExtractor
-
constructor
- sync - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-