| Package | Description |
|---|---|
| org.lionsoul.jcseg.extractor | |
| org.lionsoul.jcseg.extractor.impl | |
| org.lionsoul.jcseg.tokenizer | |
| org.lionsoul.jcseg.tokenizer.core | |
| org.lionsoul.jcseg.util | |

| Modifier and Type | Method | Description |
|---|---|---|
| protected boolean | KeywordsExtractor.filter(IWord word) | Word item filter. |
| protected boolean | KeyphraseExtractor.filter(IWord word) | Word item filter. |

| Modifier and Type | Method | Description |
|---|---|---|
| List<IWord> | TextRankSummaryExtractor.Document.getWords() | |

| Modifier and Type | Method | Description |
|---|---|---|
| void | TextRankSummaryExtractor.Document.setWords(List<IWord> words) | |
| protected TextRankSummaryExtractor.Document[] | TextRankSummaryExtractor.textRankSortedDocuments(List<Sentence> sentence, List<List<IWord>> senWords) | Get the documents ordered by relevance score. |

| Constructor | Description |
|---|---|
| TextRankSummaryExtractor.Document(int index, Sentence sentence, List<IWord> words, double score) | Constructor. |

| Modifier and Type | Class | Description |
|---|---|---|
| class | Word | Word class for jcseg; implements the IWord interface. |

| Modifier and Type | Field | Description |
|---|---|---|
| protected LinkedList<IWord> | ASegment.wordPool | CJK word cache pool, reusable string buffer, and the array list for basic integers. |

| Modifier and Type | Method | Description |
|---|---|---|
| IWord | Word.clone() | Interface to clone the current object. |
| protected IWord | ASegment.enSecondSeg(IWord w, boolean retfw) | Do the secondary split for the specified complex Latin word. This splits a word composed of English letters, Arabic numerals, and punctuation into multiple simple parts; for example, 'qq2013' is split into 'qq' and '2013'. |
| IWord | Dictionary.get(int t, String key) | |
| protected IWord | ASegment.getNextCJKWord(int c, int pos) | Get the next CJK word from the current position of the input stream. |
| protected IWord | SearchSeg.getNextCJKWord(int c, int pos) | Get the next CJK word from the current position of the input stream. This function is the core part of most segmentation implementations. |
| protected IWord | ASegment.getNextLatinWord(int c, int pos) | Get the next Latin word from the current position of the input stream. |
| protected IWord[] | ASegment.getNextMatch(char[] chars, int index) | Match the next CJK word in the dictionary. |
| protected IWord | ASegment.getNextPunctuationPairWord(int c, int pos) | Get the next punctuation pair word from the current position of the input stream. |
| IWord[] | Chunk.getWords() | |
| IWord | DetectSeg.next() | |
| IWord | ASegment.next() | |
| protected IWord | ASegment.nextBasicLatin(int c) | Find the letter or digit word starting at the current position, counting until the char is whitespace or is not a letter or digit. |
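
The secondary split described for `ASegment.enSecondSeg` can be illustrated with a standalone sketch. This is not the Jcseg implementation: the class `SecondarySplitSketch`, the `Kind` enum, and the `split` method are hypothetical names, and the sketch works on plain strings rather than `IWord` objects. It assumes that a "simple part" is a maximal run of characters of the same class (letter, digit, or anything else).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jcseg: splits a mixed Latin token
// into maximal runs of letters, digits, or other characters.
class SecondarySplitSketch {
    // Character classes assumed for this sketch only.
    private enum Kind { LETTER, DIGIT, OTHER }

    private static Kind kindOf(char ch) {
        if (Character.isLetter(ch)) return Kind.LETTER;
        if (Character.isDigit(ch)) return Kind.DIGIT;
        return Kind.OTHER;
    }

    /** Returns the maximal same-class runs of the token, in order. */
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= token.length(); i++) {
            // Close the current run at the end of the token or when the class changes.
            if (i == token.length()
                    || kindOf(token.charAt(i)) != kindOf(token.charAt(start))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("qq2013")); // prints [qq, 2013]
    }
}
```

The real method also takes a `retfw` flag and returns an `IWord`; how it uses them is not documented on this page, so the sketch leaves them out.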

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | ASegment.appendLatinSyn(IWord w) | Check and append the synonyms of the specified word, including CJK and basic Latin words. All synonyms share the same position, part of speech, and word type as the original word. |
| protected void | ASegment.appendWordFeatures(IWord word) | Check and append the pinyin and the synonyms of the specified word. |
| protected IWord | ASegment.enSecondSeg(IWord w, boolean retfw) | Do the secondary split for the specified complex Latin word. This splits a word composed of English letters, Arabic numerals, and punctuation into multiple simple parts; for example, 'qq2013' is split into 'qq' and '2013'. |
| boolean | ASegment.findCHName(IWord w, IChunk chunk) | Deprecated. |

| Constructor | Description |
|---|---|
| Chunk(IWord[] words) | |

| Modifier and Type | Method | Description |
|---|---|---|
| IWord | IWord.clone() | Make clone available. |
| abstract IWord | ADictionary.get(int t, String key) | Return the IWord associated with the given key. |
| IWord[] | IChunk.getWords() | Get all the words in the chunk. |
| IWord | ISegment.next() | Segment a word from a char array, starting at a specified position. |

| Modifier and Type | Class | Description |
|---|---|---|
| class | IHashQueue<T extends IWord> | A normal queue based on a singly linked list, with a hash index that makes searching fast. |
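
The idea behind a linked queue with a hash index can be sketched as follows. This is a minimal illustration and does not reproduce `IHashQueue`'s actual API: the class `HashIndexedQueueSketch` and its methods `add`, `poll`, `contains`, and `size` are hypothetical, and the element type is left unbounded instead of `T extends IWord`. It assumes the hash index only needs to answer membership queries, so it stores an occurrence count per element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Jcseg class: a FIFO queue on a singly
// linked list, plus a hash index for constant-time membership checks.
class HashIndexedQueueSketch<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;
    // Maps each queued element to the number of times it is in the queue.
    private final Map<T, Integer> index = new HashMap<>();

    /** Appends a value at the tail and records it in the index. */
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (tail == null) {
            head = tail = node;
        } else {
            tail.next = node;
            tail = node;
        }
        index.merge(value, 1, Integer::sum);
        size++;
    }

    /** Removes and returns the head value, or null if the queue is empty. */
    T poll() {
        if (head == null) return null;
        T value = head.value;
        head = head.next;
        if (head == null) tail = null;
        // Drop the index entry once the last occurrence leaves the queue.
        index.computeIfPresent(value, (k, count) -> count == 1 ? null : count - 1);
        size--;
        return value;
    }

    /** Membership check answered by the hash index, without walking the list. */
    boolean contains(T value) {
        return index.containsKey(value);
    }

    int size() {
        return size;
    }
}
```

The index costs extra memory per distinct element, which is the usual price for avoiding a linear scan of the list on every lookup.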
Copyright © 2016. All Rights Reserved.