
A

add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
add a new word to the dictionary
add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
add a new word to the dictionary with its statistical frequency
add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
add(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
append an item to the tail
add(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
Append a new Integer to the end.
addPartSpeech(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
add a new part of speech to the word.
addPartSpeech(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
addSyn(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
add a new synonym to the word.
addSyn(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
ADictionary - Class in org.lionsoul.jcseg.tokenizer.core
Dictionary abstract super class
ADictionary(JcsegTaskConfig, Boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.core.ADictionary
initialize the ADictionary
AL_TODO_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
the default autoload task file name
append(String) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a string to the buffer
append(char[], int, int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append parts of the chars to the buffer
append(char[]) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append some chars to the buffer
append(char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a char to the buffer
append(boolean) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a boolean value
append(short) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a short value
append(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append an int value
append(long) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a long value
append(float) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a float value
append(double) - Method in class org.lionsoul.jcseg.util.IStringBuffer
append a double value
APPEND_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
append the pinyin to the split IWord
APPEND_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
append the synonyms to the split IWord.
APPEND_PART_OF_SPEECH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
append the part of speech.
appendCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
appendCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
appendLatinSyn(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
Check and append the synonyms of the specified word, including CJK and basic Latin words. All the synonyms share the same position, part of speech, and word type as the original word
appendWordFeatures(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
check and append the pinyin and the synonyms of the specified word
ASegment - Class in org.lionsoul.jcseg.tokenizer
abstract segmentation super class: 1.
ASegment(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
initialize the segment
ASegment(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
 
autoFilter - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
auto filter the words with low score
autoLoad() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
initialize the value of its options by automatically searching for the jcseg.properties file:
AutoLoadFile - Class in org.lionsoul.jcseg.tokenizer.core
AutoLoad file that describes the autoload configuration files
AutoLoadFile(String) - Constructor for class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
 
autoMinLength - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
automatically append the words with a length over the specified value as a phrase

B

B - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
bucketSort(int[], int) - Static method in class org.lionsoul.jcseg.util.Sort
bucket sort algorithm
bucketSort(Integer[], int) - Static method in class org.lionsoul.jcseg.util.Sort
bucket sort algorithm
buffer() - Method in class org.lionsoul.jcseg.util.IStringBuffer
return the chars of the buffer

C

CE_MIXED_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
 
charAt(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
get the char at a specified position in the buffer
CHECK_CE_MASk - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
 
CHECK_CF_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
 
Chunk - Class in org.lionsoul.jcseg.tokenizer
chunk concept for the MMSeg Chinese word segmentation algorithm; implements the IChunk interface
Chunk(IWord[]) - Constructor for class org.lionsoul.jcseg.tokenizer.Chunk
 
CJK_UNITS - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
Chinese single units
CJK_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
Chinese, Japanese, and Korean words
clear() - Method in class org.lionsoul.jcseg.util.IntArrayList
 
clear() - Method in class org.lionsoul.jcseg.util.IStringBuffer
clear the buffer by resetting the count to 0
CLEAR_STOPWORD - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
clear away the stopwords.
clearStopwords() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
clone() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
make clone available
clone() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
override the clone method
clone() - Method in class org.lionsoul.jcseg.tokenizer.Word
Interface to clone the current object
CN_DNAME_1 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
first word of a Chinese double name
CN_DNAME_2 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
second word of a Chinese double name
CN_LNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
Chinese last name
CN_LNAME_ADORN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
the adorning (修饰, i.e. modifying) char before the last name
CN_SNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
Chinese single name
CNFRA_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
Chinese fraction to Arabic fraction.
cnFractionToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
CNNUM_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
Chinese numerals to Arabic numerals.
cnNumericToArabic(String, boolean) - Static method in class org.lionsoul.jcseg.util.NumericUtil
a static method to convert Chinese numerals to Arabic numbers
cnNumToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
compareTo(TextRankSummaryExtractor.Document) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
override the compareTo method; compares documents by their relevance score
COMPLEX_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
ComplexSeg - Class in org.lionsoul.jcseg.tokenizer
Jcseg complex segmentation implementation, extended from the ASegment class. This needs the filtering of the four MMSeg rules:
ComplexSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
 
ComplexSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
 
config - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
 
config - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 
contains(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
check whether the specified T already exists in the queue
createDefaultDictionary(JcsegTaskConfig, boolean, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create a default ADictionary instance: 1.
createDefaultDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create the ADictionary according to the JcsegTaskConfig; checks and loads the lexicon by default
createDefaultDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create the ADictionary according to the JcsegTaskConfig
createDictionary(Class<? extends ADictionary>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create a new ADictionary instance
createJcseg(int, Object...) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
create a Jcseg instance of the specified mode
createSegment(Class<? extends ISegment>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
load the ISegment class with the given path
createSingletonDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create a singleton ADictionary object according to the JcsegTaskConfig; checks and loads the lexicon by default
createSingletonDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
create a singleton ADictionary object according to the JcsegTaskConfig
ctrlMask - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
segmentation runtime function control mask

D

D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
data - Variable in class org.lionsoul.jcseg.util.IHashQueue.Entry
 
data - Variable in class org.lionsoul.jcseg.util.IIntFIFO.Entry
 
data - Variable in class org.lionsoul.jcseg.util.IIntQueue.Entry
 
deleteCharAt(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
delete the char at the specified position
deQueue() - Method in class org.lionsoul.jcseg.util.IIntFIFO
remove the first item from the queue
deQueue() - Method in class org.lionsoul.jcseg.util.IIntQueue
remove the node from the head. Make sure the size is larger than 0 by calling size() before invoking this method, or you will just get -1
DETECT_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
DetectSeg - Class in org.lionsoul.jcseg.tokenizer
Detect segmentation mode: returns only words found in the loaded dictionary. When a word is matched it is returned; otherwise the search continues for the next word in the dictionary
DetectSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DetectSeg
method to create the new ISegment
DetectSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DetectSeg
method to create a new ISegment
dic - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
the dictionary and task configuration instance
Dictionary - Class in org.lionsoul.jcseg.tokenizer
Dictionary class
Dictionary(JcsegTaskConfig, Boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.Dictionary
 
DictionaryFactory - Class in org.lionsoul.jcseg.tokenizer.core
Dictionary factory to create Dictionary instances. A path of a class that extends the ADictionary class must be given first
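
The deQueue() entry for IIntQueue in this section says an empty queue yields -1, which cannot be told apart from a stored -1 unless size() is checked first. The following is a minimal sketch of that contract, not the Jcseg source: IntQueueSketch and all of its members are hypothetical names, and the doubly linked layout is assumed from the IIntQueue class description.

```java
/** Hypothetical doubly linked int queue; illustrates the "-1 on empty" contract only. */
final class IntQueueSketch {
    private static final class Node {
        final int data;
        Node prev, next;
        Node(int data) { this.data = data; }
    }

    private Node head, tail;
    private int size;

    /** Append a value at the tail. */
    void enQueue(int value) {
        Node n = new Node(value);
        if (tail == null) {
            head = tail = n;
        } else {
            n.prev = tail;
            tail.next = n;
            tail = n;
        }
        size++;
    }

    /** Remove the head value; returns -1 when the queue is empty (assumed sentinel). */
    int deQueue() {
        if (size == 0) return -1;
        Node n = head;
        head = n.next;
        if (head == null) tail = null; else head.prev = null;
        size--;
        return n.data;
    }

    int size() { return size; }

    public static void main(String[] args) {
        IntQueueSketch q = new IntQueueSketch();
        q.enQueue(-1);            // a legitimate -1 value
        while (q.size() > 0) {    // guard with size() so -1 is never ambiguous
            System.out.println(q.deQueue());
        }
    }
}
```

Guarding every deQueue() call with size() keeps the sentinel from being mistaken for data.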

E

EC_MIXED_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
Chinese and English mixed words like B超, SIM卡.
EN_LETTER - Static variable in class org.lionsoul.jcseg.util.StringUtil
 
EN_NUMERIC - Static variable in class org.lionsoul.jcseg.util.StringUtil
 
EN_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
EN_PUN_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
 
EN_PUNCTUATION - Static variable in class org.lionsoul.jcseg.util.StringUtil
 
EN_SECOND_SEG - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
Whether to do the secondary split for complex Latin compounds
EN_UNKNOW - Static variable in class org.lionsoul.jcseg.util.StringUtil
 
EN_WHITESPACE - Static variable in class org.lionsoul.jcseg.util.StringUtil
 
EN_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
 
enQueue(int) - Method in class org.lionsoul.jcseg.util.IIntFIFO
add a new item to the queue
enQueue(int) - Method in class org.lionsoul.jcseg.util.IIntQueue
append an int to the tail
enSecondSeg(IWord, boolean) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
Do the secondary split for the specified complex Latin word. This splits a complex word composed of English letters, Arabic digits, and punctuation into multiple simple parts; for example, 'qq2013' is split into 'qq' and '2013'
equals(Object) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
you have to override the equals method because Jcseg requires it
equals(Object) - Method in class org.lionsoul.jcseg.tokenizer.Word
 

F

filter(IWord) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
word item filter
filter(IWord) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
word item filter
findCHName(char[], int, IChunk) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find a Chinese name from the current position of the input chars
findCHName(IWord, IChunk) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
Deprecated.
fwsTohws(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
a static method to replace the full-width char to the half-width char in a given string (65281-65374 for full-width char)
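
The fwsTohws entry above gives the full-width range as 65281-65374. Those code points sit at a fixed distance of 65248 from ASCII 33-126, so the conversion can be a single subtraction per char. A minimal sketch under that assumption follows; WidthSketch, fullToHalf, and halfToFull are hypothetical names, and this is not claimed to be the StringUtil implementation (for instance, it leaves the ideographic space untouched).

```java
/** Hypothetical helper; converts only the 65281-65374 range named in the index. */
final class WidthSketch {
    private static final int FW_START = 65281; // U+FF01, full-width '!'
    private static final int FW_END = 65374;   // U+FF5E, full-width '~'
    private static final int OFFSET = 65248;   // distance to ASCII 33..126

    /** Replace each full-width char in the range with its half-width counterpart. */
    static String fullToHalf(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= FW_START && c <= FW_END ? (char) (c - OFFSET) : c);
        }
        return sb.toString();
    }

    /** Reverse direction: ASCII 33..126 to the full-width range. */
    static String halfToFull(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 33 && c <= 126 ? (char) (c + OFFSET) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fullToHalf("ＱＱ２０１３")); // prints QQ2013
    }
}
```

Chars outside the range, such as CJK ideographs, pass through unchanged in both directions.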

G

get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
return the IWord associated with the given key.
get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
get(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
 
getAutoMinLength() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getAverageWordsLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
getAverageWordsLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
return the average word length for all the chunks.
getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
an abstract method to gain a CJK word from the current position.
getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ComplexSeg
 
getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
here we don't have to do anything
getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SimpleSeg
 
getConfig() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
get the current task configuration instance.
getConfig() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 
getConfig() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
get the current task config instance
getDict() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
get the current dictionary instance.
getDict() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
get the current dictionary instance
getEnCharType(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
get the type of the English char; the types are defined in this class and start with EN_.
getEnSecondSeg() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getFile() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
 
getFrequency() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the frequency of the word, use only when the word's length is one.
getFrequency() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getIndex() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
getIndex(String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
get the key's type index located in ILexicon interface
getJarHome(Object) - Static method in class org.lionsoul.jcseg.util.Util
get the absolute parent path for the jar file.
getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
get the keyphrase list from a reader
getKeyphraseFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
get the keyphrase list from a file
getKeyphraseFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
get the keyphrase list from a string
getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get the key sentence from a reader
getKeySentenceFromFile(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get key sentence from a file path
getKeySentenceFromString(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get key sentence from a string
getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
get the keywords list from a reader
getKeywordsFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
get the keywords list from a file
getKeywordsFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
get the keywords list from a string
getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
getLargestAverageWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
2.
getLargestSingleMorphemicFreedomChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
the largest sum of degree of morphemic freedom of one-character words: this rule returns the chunks that have the largest sum of degree of morphemic freedom of one-character words
getLastUpdateTime() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
 
getLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
return the length of the chunk (the number of words)
getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the length of the word
getLength() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
getLength() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getLexiconFilePrefix() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
property about lexicon file.
getLexiconFileSuffix() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getLexiconPath() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
return the lexicon directory path
getMaxCnLnadron() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getMaximumMatchChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
1.
getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
getMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getMaxWordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getMixCnLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getNameSingleThreshold() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
get the next CJK word from the current position of the input stream
getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
get the next CJK word from the current position of the input stream. This function is the core part of most segmentation implementations
getNextLatinWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
get the next Latin word from the current position of the input stream
getNextMatch(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
match the next CJK word in the dictionary
getNextPunctuationPairWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
get the next punctuation pair word from the current position of the input stream.
getPairPunctuationText(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find the paired punctuation of the given punctuation char; the purpose is to get the text between them
getPartSpeech() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the part of speech of the word.
getPartSpeech() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getPinyin() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the pinyin of the word
getPinyin() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getPollTime() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the start position of the word.
getPosition() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
getPosition() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getPPTMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getPropertieFile() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getPunctuationPair(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
get the matching pair of the given paired punctuation
getQueueSize() - Method in class org.lionsoul.jcseg.util.IPushbackReader
get the buffer size - the number of buffered data
getScore() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
getSeg() - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
 
getSeg() - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
 
getSentence() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
getSentenceNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
getSentenceSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
 
getSingleWordsMorphemicFreedom() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
getSingleWordsMorphemicFreedom() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
return the degree of morphemic freedom for all the single words.
getSmallestVarianceWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.MMSegFilter
the smallest variance of word length: this rule returns the chunks that have the smallest variance of word length
getSTokenMinLen() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
 
getStreamPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
get the current length of the stream
getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
 
getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get summary from a reader
getSummaryFromFile(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get document summary from a file
getSummaryFromString(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
get document summary from a string
getSyn() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the synonyms of the word.
getSyn() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getType() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the type of the word
getType() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getValue() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
return the value of the word
getValue() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
getValue() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
getWords() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
getWords() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
getWords() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
get all the words in the chunk.
getWordSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
 
getWordsVariance() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
getWordsVariance() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
return the variance of all the words in all the chunks.
gisb - Variable in class org.lionsoul.jcseg.tokenizer.SentenceSeg
global string buffer
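
The four MMSegFilter rules listed in this section (getMaximumMatchChunks, getLargestAverageWordLengthChunks, getSmallestVarianceWordLengthChunks, getLargestSingleMorphemicFreedomChunks) each narrow a set of candidate chunks. A rough sketch of how such rules can be chained is shown below. Everything in it is an assumption for illustration: ChunkSketch is a hypothetical stand-in for IChunk that stores only word lengths and one-character word frequencies, the rule order follows the published MMSeg description, and morphemic freedom is taken as the sum of the natural logs of those frequencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class MMSegRulesSketch {
    /** Hypothetical chunk: word lengths plus frequencies (> 0) of its one-character words. */
    static final class ChunkSketch {
        final int[] wordLengths;
        final int[] singleCharFreqs;

        ChunkSketch(int[] wordLengths, int[] singleCharFreqs) {
            this.wordLengths = wordLengths;
            this.singleCharFreqs = singleCharFreqs;
        }

        double totalLength() {
            double sum = 0;
            for (int len : wordLengths) sum += len;
            return sum;
        }

        double averageLength() { return totalLength() / wordLengths.length; }

        double variance() {
            double mean = averageLength();
            double sum = 0;
            for (int len : wordLengths) sum += (len - mean) * (len - mean);
            return sum / wordLengths.length;
        }

        double morphemicFreedom() {
            double sum = 0;
            for (int f : singleCharFreqs) sum += Math.log(f);
            return sum;
        }
    }

    /** Keep only the chunks whose score equals the best score (exact compare; fine for a sketch). */
    static List<ChunkSketch> keepBest(List<ChunkSketch> chunks, ToDoubleFunction<ChunkSketch> score) {
        double best = Double.NEGATIVE_INFINITY;
        for (ChunkSketch c : chunks) best = Math.max(best, score.applyAsDouble(c));
        List<ChunkSketch> kept = new ArrayList<>();
        for (ChunkSketch c : chunks) if (score.applyAsDouble(c) == best) kept.add(c);
        return kept;
    }

    /** Apply the four rules in order, stopping as soon as one chunk remains. */
    static ChunkSketch pick(List<ChunkSketch> chunks) {
        if (chunks.isEmpty()) throw new IllegalArgumentException("no candidate chunks");
        List<ChunkSketch> rest = keepBest(chunks, ChunkSketch::totalLength);      // rule 1: maximum match
        if (rest.size() > 1) rest = keepBest(rest, ChunkSketch::averageLength);   // rule 2: largest average word length
        if (rest.size() > 1) rest = keepBest(rest, c -> -c.variance());           // rule 3: smallest variance
        if (rest.size() > 1) rest = keepBest(rest, ChunkSketch::morphemicFreedom); // rule 4: largest morphemic freedom
        return rest.get(0);
    }
}
```

If several chunks still tie after the fourth rule, this sketch simply returns the first one.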

H

hashCode() - Method in class org.lionsoul.jcseg.tokenizer.Word
override the hash code generation algorithm, taking the value as the main factor
hwsTofws(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
a static method to replace the half-width char to the full-width char in a given string

I

I_CN_NAME - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
identify Chinese names?
ialist - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
 
IChunk - Interface in org.lionsoul.jcseg.tokenizer.core
chunk interface for JCSeg; the most important concept of the MMSeg Chinese segmentation algorithm
identifyCnName() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
idx - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
the index value of the current input stream, mainly for tracking the start position of the token
idx - Variable in class org.lionsoul.jcseg.tokenizer.SentenceSeg
 
IHashQueue<T extends IWord> - Class in org.lionsoul.jcseg.util
A normal queue based on a singly linked list but with a hash index, so it is fast for searching
IHashQueue() - Constructor for class org.lionsoul.jcseg.util.IHashQueue
 
IHashQueue.Entry<T> - Class in org.lionsoul.jcseg.util
inner Entry node class
IHashQueue.Entry(T, IHashQueue.Entry<T>, IHashQueue.Entry<T>) - Constructor for class org.lionsoul.jcseg.util.IHashQueue.Entry
 
IIntFIFO - Class in org.lionsoul.jcseg.util
int first-in-first-out queue based on a singly linked list
IIntFIFO() - Constructor for class org.lionsoul.jcseg.util.IIntFIFO
 
IIntFIFO.Entry - Class in org.lionsoul.jcseg.util
Item Entry inner class
IIntFIFO.Entry(int, IIntFIFO.Entry) - Constructor for class org.lionsoul.jcseg.util.IIntFIFO.Entry
 
IIntQueue - Class in org.lionsoul.jcseg.util
char queue class based on a doubly linked list. Not thread safe
IIntQueue() - Constructor for class org.lionsoul.jcseg.util.IIntQueue
 
IIntQueue.Entry - Class in org.lionsoul.jcseg.util
inner Entry node class
IIntQueue.Entry(int, IIntQueue.Entry, IIntQueue.Entry) - Constructor for class org.lionsoul.jcseg.util.IIntQueue.Entry
 
ILexicon - Interface in org.lionsoul.jcseg.tokenizer.core
lexicon configuration class.
insertionSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
insertion sort method
insertionSort(T[], int, int) - Static method in class org.lionsoul.jcseg.util.Sort
method to sort a subarray from start to end with the insertion sort algorithm
IntArrayList - Class in org.lionsoul.jcseg.util
array list for the basic int data type, to use instead of ArrayList. This saves a lot of boxing and unboxing work
IntArrayList() - Constructor for class org.lionsoul.jcseg.util.IntArrayList
 
IntArrayList(int) - Constructor for class org.lionsoul.jcseg.util.IntArrayList
 
IPushbackReader - Class in org.lionsoul.jcseg.util
IPushbackReader based on Reader. Not thread safe; supports unlimited unread operations
IPushbackReader(Reader) - Constructor for class org.lionsoul.jcseg.util.IPushbackReader
 
isAutoFilter() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
isAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
about lexicon autoload
isb - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
 
isCJKChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check the specified char is CJK, Thai...
isCNNumeric(char) - Static method in class org.lionsoul.jcseg.util.NumericUtil
check whether the given char is a Chinese numeral or not
isCnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
 
isDecimal(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is a decimal, including the full-width char
isDigit(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is a digit; returns true if it is, otherwise false. This method can recognize full-width chars
ISegment - Interface in org.lionsoul.jcseg.tokenizer.core
Jcseg segment interface
isEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is a basic Latin, Russian, or Greek letter; returns true if it is, otherwise false. This method can recognize full-width chars and letters
isENKeepPunctuaton(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the given char is an English keep punctuation
isEnLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
include the full-width and half-width char
isEnNumeric(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is an English numeric (48-57), including the full-width char
isEnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check the given char is half-width punctuation
isFWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the given char is a full-width char. Note: the full-width punctuation is not included here
isHWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check the given char is a half-width char or not
isKeepPunctuation(char) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
isLetterNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is a letter number like 'ⅠⅡ'; returns true if it is, otherwise false
isLowerCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
 
isOtherNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the specified char is an other number like '①⑩⑽㈩'; returns true if it is, otherwise false
isPairPunctuation(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
check the given char is pair punctuation or not
isSync() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 
IStringBuffer - Class in org.lionsoul.jcseg.util
string buffer class
IStringBuffer() - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
create a buffer with a default length 16
IStringBuffer(int) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
create a buffer with a specified length
IStringBuffer(String) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
create a buffer with a specified string
isUpperCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
 
isWhitespace(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
check whether the given char is a whitespace
IWord - Interface in org.lionsoul.jcseg.tokenizer.core
Word interface

J

JcsegException - Exception in org.lionsoul.jcseg.tokenizer.core
JCSeg exception class
JcsegException(String) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.JcsegException
 
JcsegException(Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.JcsegException
 
JcsegException(String, Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.JcsegException
 
JcsegTaskConfig - Class in org.lionsoul.jcseg.tokenizer.core
Jcseg segmentation task configuration class
JcsegTaskConfig() - Constructor for class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
create the config without initializing it. Note: this may cause incompatibility problems for older code that uses this constructor
JcsegTaskConfig(boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
create and initialize the config by auto load
JcsegTaskConfig(String) - Constructor for class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
create and initialize the task config from a properties file
JcsegTaskConfig(InputStream) - Constructor for class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
create and initialize the task config from an InputStream
JcsegTest - Class in org.lionsoul.jcseg.test
jcseg test program.
JcsegTest() - Constructor for class org.lionsoul.jcseg.test.JcsegTest
 

K

K1 - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
KEEP_UNREG_WORDS - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
keepUnregWords() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
keyphrase(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
keyphrase extractor
KeyphraseExtractor - Class in org.lionsoul.jcseg.extractor
key phrase extractor
KeyphraseExtractor(ISegment) - Constructor for class org.lionsoul.jcseg.extractor.KeyphraseExtractor
constructor
keywords(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
keywords extractor
KeywordsExtractor - Class in org.lionsoul.jcseg.extractor
document keywords extractor
KeywordsExtractor(ISegment) - Constructor for class org.lionsoul.jcseg.extractor.KeywordsExtractor
constructor
keywordsNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
keywordsNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 

L

ladCJKPos() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
length() - Method in class org.lionsoul.jcseg.util.IStringBuffer
return the length of the buffer
LEX_PROPERTY_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
default lexicon property file name
LexiconException - Exception in org.lionsoul.jcseg.tokenizer.core
JCSeg Dictionary configuration exception class
LexiconException(String) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
 
LexiconException(Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
 
LexiconException(String, Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
 
load(File) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from a specified lexicon file
load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from a specified lexicon path
load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from a specified lexicon input stream
load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
initialize its option values from a specified jcseg.properties properties file
load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
initialize its option values from an InputStream of a jcseg.properties properties file
LOAD_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
whether to load the pinyin of the CJK_WORDS
LOAD_CJK_POS - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
whether to load the word's part of speech
LOAD_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
whether to load the syn words of the CJK_WORDS.
loadCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
loadCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
loadClassPath() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from all the files under the specified class path.
loadDirectory(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from all the files under a specified lexicon directory
loadWords(JcsegTaskConfig, ADictionary, File) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words in the specified lexicon file into the dictionary
loadWords(JcsegTaskConfig, ADictionary, String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load all the words from a specified lexicon file path
loadWords(JcsegTaskConfig, ADictionary, InputStream) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
load words from an InputStream

M

main(String[]) - Static method in class org.lionsoul.jcseg.test.JcsegTest
 
match(int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
look up the dictionary to check whether the given key is in it
match(int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
MAX_CN_LNADRON - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
the max length for the adron of a Chinese last name, like the “老” in 老陈
MAX_LENGTH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
maximum length for maximum match (5-7)
maxIterateNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
maxIterateNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
maxIterateNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
maxWordsNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
max phrase length
mergeSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
merge sort algorithm
MIX_CN_LENGTH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
maximum length for the Chinese words after the Latin word.
MIX_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
MMSegFilter - Class in org.lionsoul.jcseg.tokenizer
mmseg default filter class
MMSegFilter() - Constructor for class org.lionsoul.jcseg.tokenizer.MMSegFilter
 

N

NAME_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
NAME_SINGLE_THRESHOLD - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
the threshold of the single word that is a single word when it and the last char of the name make up a word.
next() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
 
next() - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
segment a word from a char array from a specified position.
next() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
 
next() - Method in class org.lionsoul.jcseg.tokenizer.SentenceSeg
get the next sentence
next - Variable in class org.lionsoul.jcseg.util.IHashQueue.Entry
 
next - Variable in class org.lionsoul.jcseg.util.IIntFIFO.Entry
 
next - Variable in class org.lionsoul.jcseg.util.IIntQueue.Entry
 
nextBasicLatin(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find the letter or digit word from the current position, counting until the char is whitespace or not a letter or digit
nextCJKSentence(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
load a CJK char list from the stream, starting from the current position, until the char is not a CJK char
nextCNNumeric(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find the Chinese number from the current position, counting until the char at the specified position is not an other number or whitespace
nextLetterNumber(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find the letter number from the current position, counting until the char at the specified position is not a letter number or whitespace
nextOtherNumber(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
find the other number from the current position, counting until the char at the specified position is not an other number or whitespace
NUMERIC_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
NumericUtil - Class in org.lionsoul.jcseg.util
a class to deal with Chinese numeric
NumericUtil() - Constructor for class org.lionsoul.jcseg.util.NumericUtil
 

O

org.lionsoul.jcseg.extractor - package org.lionsoul.jcseg.extractor
 
org.lionsoul.jcseg.extractor.impl - package org.lionsoul.jcseg.extractor.impl
 
org.lionsoul.jcseg.test - package org.lionsoul.jcseg.test
 
org.lionsoul.jcseg.tokenizer - package org.lionsoul.jcseg.tokenizer
 
org.lionsoul.jcseg.tokenizer.core - package org.lionsoul.jcseg.tokenizer.core
 
org.lionsoul.jcseg.util - package org.lionsoul.jcseg.util
 

P

PPT_MAX_LENGTH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
the maximum length for the text between the pair punctuation.
PPT_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
prev - Variable in class org.lionsoul.jcseg.util.IHashQueue.Entry
 
prev - Variable in class org.lionsoul.jcseg.util.IIntQueue.Entry
 
printMatrix(double[][]) - Static method in class org.lionsoul.jcseg.util.Util
print the specified matrix
PUNCTUATION - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
pushBack(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
push back the data to the stream.
pushBack(int) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
push back the data to the stream

Q

qCNNumericToArabic(String) - Static method in class org.lionsoul.jcseg.util.NumericUtil
 
quickSelect(T[], int) - Static method in class org.lionsoul.jcseg.util.Sort
quick select algorithm
quicksort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
quick sort algorithm

R

read() - Method in class org.lionsoul.jcseg.util.IPushbackReader
read the next int from the stream; this will check the buffer queue first and take the first item of the buffer as the result
read(char[], int, int) - Method in class org.lionsoul.jcseg.util.IPushbackReader
read the specified block from the stream
reader - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
 
reader - Variable in class org.lionsoul.jcseg.tokenizer.SentenceSeg
 
readNext() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
read the next char from the current position
readNext() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
read the next char from the current position
readNext() - Method in class org.lionsoul.jcseg.tokenizer.SentenceSeg
read the next char from the current position
readUntil(char) - Method in class org.lionsoul.jcseg.tokenizer.SentenceSeg
loop the reader until the specified char is found.
remove(int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
remove the mapping associated with the given key
remove(int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
remove() - Method in class org.lionsoul.jcseg.util.IHashQueue
remove the node from the head; make sure the size is larger than 0 by calling size() before invoking this method, or you will just get null.
remove(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
remove the element at the specified position; using System.arraycopy instead of a loop may be more efficient
reset(Reader) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
input stream and reader reset.
reset(Reader) - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
reset the reader
reset(Reader) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
 
reset(Reader) - Method in class org.lionsoul.jcseg.tokenizer.SentenceSeg
stream/reader reset.

S

SEARCH_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
SearchSeg - Class in org.lionsoul.jcseg.tokenizer
search mode implementation; all the possible combinations will be returned, built for search use.
SearchSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
 
SearchSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
 
seg - Variable in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
the ISegment object
seg - Variable in class org.lionsoul.jcseg.extractor.KeywordsExtractor
the ISegment object
SegmentFactory - Class in org.lionsoul.jcseg.tokenizer.core
Segment factory to create singleton ISegment objects; the path of a class that implements the ISegment interface must be given first
SegmentFactory() - Constructor for class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
 
sentence(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
key sentence extractor
Sentence - Class in org.lionsoul.jcseg.tokenizer
sentence desc class
Sentence(String, int) - Constructor for class org.lionsoul.jcseg.tokenizer.Sentence
constructor
Sentence(String) - Constructor for class org.lionsoul.jcseg.tokenizer.Sentence
 
sentenceNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
sentenceSeg - Variable in class org.lionsoul.jcseg.extractor.SummaryExtractor
sentence splitter object
SentenceSeg - Class in org.lionsoul.jcseg.tokenizer
document sentence splitter
SentenceSeg(Reader) - Constructor for class org.lionsoul.jcseg.tokenizer.SentenceSeg
constructor
SentenceSeg() - Constructor for class org.lionsoul.jcseg.tokenizer.SentenceSeg
 
set(int, int) - Method in class org.lionsoul.jcseg.util.IntArrayList
 
set(int, char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
set the char at the specified index
setAppendCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setAppendCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setAppendPartOfSpeech(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setAutoFilter(boolean) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
setAutoload(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setAutoMinLength(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
setClearStopwords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setCnFactionToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setCnNumToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
set the current task configuration instance.
setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 
setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
set the current task config
setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
set the dictionary of the current tokenizer.
setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
set the current dictionary instance
setEnSecondSeg(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setFile(File) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
 
setICnName(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setIndex(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
setKeepPunctuations(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setKeepUnregWords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
setLastUpdateTime(long) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
 
setLength(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
self define the length
setLength(int) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
setLength(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
setLexiconPath(String[]) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setLoadCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setLoadCJKPos(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setLoadCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setMaxCnLnadron(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
setMaxLength(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setMaxWordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
setMixCnLength(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setNameSingleThreshold(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setPartSpeech(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
setPartSpeech(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
setPinyin(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
set the pinyin of the word
setPinyin(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
setPollTime(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setPosition(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
set the position of the word
setPosition(int) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
setPosition(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
setPPT_MAX_LENGTH(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setScore(double) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
 
setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
 
setSentence(Sentence) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
setSentenceNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
setSentenceSeg(SentenceSeg) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
 
setSTokenMinLen(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
 
setSyn(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
setSyn(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
 
setValue(String) - Method in class org.lionsoul.jcseg.tokenizer.Sentence
 
setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
setWords(List<IWord>) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
 
setWordSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
 
shellSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
shell sort algorithm
SIMPLE_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
simple algorithm or complex algorithm
SimpleSeg - Class in org.lionsoul.jcseg.tokenizer
Jcseg simple segmentation implementation, extended from ASegment
SimpleSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
 
SimpleSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
 
SIMSTR - Static variable in class org.lionsoul.jcseg.util.STConverter
 
SimToTraditional(String) - Static method in class org.lionsoul.jcseg.util.STConverter
convert the simplified words to traditional words of the specified string.
SimToTraditional(String, IStringBuffer) - Static method in class org.lionsoul.jcseg.util.STConverter
 
size(int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
return the size of the dictionary
size(int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
 
size() - Method in class org.lionsoul.jcseg.util.IHashQueue
get the size of the queue
size() - Method in class org.lionsoul.jcseg.util.IIntFIFO
get the size of the queue
size() - Method in class org.lionsoul.jcseg.util.IIntQueue
get the size of the queue
size() - Method in class org.lionsoul.jcseg.util.IntArrayList
 
Sort - Class in org.lionsoul.jcseg.util
Implementations of all kinds of sort algorithms, using the default compare method
Sort() - Constructor for class org.lionsoul.jcseg.util.Sort
 
START_SS_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
 
startAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
start the lexicon autoload thread
STConverter - Class in org.lionsoul.jcseg.util
Simplified and traditional Chinese conversion class; all the search work is based on String.indexOf(int), you may store all the words in a HashMap for the purpose of a faster fetch
STConverter() - Constructor for class org.lionsoul.jcseg.util.STConverter
 
STOKEN_MIN_LEN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
minimum length for the second split to make up a word
STOP_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
 
stopAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 
StringUtil - Class in org.lionsoul.jcseg.util
a class to deal with English stop chars, like the English punctuation
StringUtil() - Constructor for class org.lionsoul.jcseg.util.StringUtil
 
summary(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
summary extractor
SummaryExtractor - Class in org.lionsoul.jcseg.extractor
document summary extractor
SummaryExtractor(ISegment, SentenceSeg) - Constructor for class org.lionsoul.jcseg.extractor.SummaryExtractor
constructor
sync - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
 

T

T_BASIC_LATIN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
latin series.
T_CJK_PINYIN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
pinyin
T_CJK_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
Chinese, Japanese, Korean words
T_CN_NAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
chinese last name.
T_CN_NICKNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
chinese nickname like: 老陈
T_CN_NUMERIC - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
Chinese numeric
T_LEN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
 
T_LETTER_NUMBER - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
letter number like 'ⅠⅡ'
T_MIXED_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
Chinese and English mixed word like B超, SIM卡.
T_OTHER_NUMBER - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
other number like '①⑩⑽㈩'
T_PUNCTUATION - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
T_UNRECOGNIZE_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
useless chars like the CJK punctuation
TextRankKeyphraseExtractor - Class in org.lionsoul.jcseg.extractor.impl
document key phrase extractor based on the TextRank algorithm
TextRankKeyphraseExtractor(ISegment) - Constructor for class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
TextRankKeywordsExtractor - Class in org.lionsoul.jcseg.extractor.impl
document keywords extractor based on the TextRank algorithm
TextRankKeywordsExtractor(ISegment) - Constructor for class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
textRankSortedDocuments(List<Sentence>, List<List<IWord>>) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
get the documents ordered by relevance score.
TextRankSummaryExtractor - Class in org.lionsoul.jcseg.extractor.impl
summary extractor based on the TextRank algorithm
TextRankSummaryExtractor(ISegment, SentenceSeg) - Constructor for class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
 
TextRankSummaryExtractor.Document - Class in org.lionsoul.jcseg.extractor.impl
summary document inner class
TextRankSummaryExtractor.Document(int, Sentence, List<IWord>, double) - Constructor for class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
constructor
tokenize(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
string tokenize handler
toLowerCase(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
 
toString() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
 
toString() - Method in class org.lionsoul.jcseg.tokenizer.Sentence
override the toString method
toString() - Method in class org.lionsoul.jcseg.tokenizer.Word
 
toString() - Method in class org.lionsoul.jcseg.util.IStringBuffer
return the string of the current buffer
toUpperCase(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
 
TRASTR - Static variable in class org.lionsoul.jcseg.util.STConverter
 
TraToSimplified(String) - Static method in class org.lionsoul.jcseg.util.STConverter
convert the traditional words to simplified words of the specified string.
TraToSimplified(String, IStringBuffer) - Static method in class org.lionsoul.jcseg.util.STConverter
 

U

UNMATCH_CJK_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
unmatched word
unread(int) - Method in class org.lionsoul.jcseg.util.IPushbackReader
unread the specified data to the stream; in fact, this pushes the data back to the queue
unread(char[], int, int) - Method in class org.lionsoul.jcseg.util.IPushbackReader
unread a block from a char array to the stream
UNRECOGNIZE - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
 
Util - Class in org.lionsoul.jcseg.util
static methods for jcseg.
Util() - Constructor for class org.lionsoul.jcseg.util.Util
 

V

version - Static variable in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
 

W

windowSize - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
 
windowSize - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
 
Word - Class in org.lionsoul.jcseg.tokenizer
word class for jcseg; implements the IWord interface
Word(String, int) - Constructor for class org.lionsoul.jcseg.tokenizer.Word
 
Word(String, int, int) - Constructor for class org.lionsoul.jcseg.tokenizer.Word
 
wordPool - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
CJK word cache pool, Reusable string buffer and the array list for basic integer
wordSeg - Variable in class org.lionsoul.jcseg.extractor.SummaryExtractor
ISegment word tokenizer object

Copyright © 2016. All Rights Reserved.