annif.analyzer package
Submodules
annif.analyzer.analyzer module
Common functionality for analyzers.
- class annif.analyzer.analyzer.Analyzer(**kwargs)
Bases: object
Base class for language-specific analyzers. Either tokenize_words or _normalize_word must be overridden in subclasses. Other methods may be overridden when necessary.
- is_valid_token(word: str) -> bool
Return True if the word is an acceptable token.
- name = None
- token_min_length = 3
- tokenize_sentences(text: str) -> list[str]
Tokenize a piece of text (e.g. a document) into sentences.
- tokenize_words(text: str, filter: bool = True) -> list[str]
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (the default), return only valid tokens, excluding e.g. punctuation, numbers and very short words.
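The contract described above (a subclass overrides either tokenize_words or _normalize_word, and filter=True drops invalid tokens) can be illustrated with a minimal, self-contained sketch. It does not import Annif and is not Annif's implementation: AnalyzerSketch and LowercaseSketch are hypothetical stand-ins, and the regex-based tokenization and the "at least one letter" validity rule are assumptions made only for illustration.

```python
import re


class AnalyzerSketch:
    """Hypothetical stand-in mirroring the documented interface (not Annif's code)."""

    name = None
    token_min_length = 3

    def is_valid_token(self, word: str) -> bool:
        # Assumed rule: long enough and containing at least one letter.
        if len(word) < self.token_min_length:
            return False
        return any(ch.isalpha() for ch in word)

    def tokenize_sentences(self, text: str) -> list[str]:
        # Naive split after sentence-ending punctuation (an assumption).
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def tokenize_words(self, text: str, filter: bool = True) -> list[str]:
        # Naive tokenizer: runs of word characters, or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", text)
        return [
            self._normalize_word(tok)
            for tok in tokens
            if not filter or self.is_valid_token(tok)
        ]

    def _normalize_word(self, word: str) -> str:
        # Subclasses supply the language-specific normalization.
        raise NotImplementedError


class LowercaseSketch(AnalyzerSketch):
    """Overrides only _normalize_word, folding words to lower case."""

    name = "lowercase-sketch"

    def _normalize_word(self, word: str) -> str:
        return word.lower()


analyzer = LowercaseSketch()
sentences = analyzer.tokenize_sentences("Annif indexes text. It uses 3 backends!")
filtered = analyzer.tokenize_words("It uses 3 backends!")
unfiltered = analyzer.tokenize_words("It uses 3 backends!", filter=False)
```

With filter=True the short word, the number and the punctuation mark are dropped before normalization. With filter=False every token is normalized and returned.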
annif.analyzer.simple module
Simple analyzer for Annif. Only folds words to lower case.
- class annif.analyzer.simple.SimpleAnalyzer(param: None, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'simple'
annif.analyzer.simplemma module
Simplemma analyzer for Annif, based on the simplemma lemmatizer.
- class annif.analyzer.simplemma.SimplemmaAnalyzer(param: str, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'simplemma'
annif.analyzer.snowball module
Snowball analyzer for Annif, based on the NLTK Snowball stemmer.
- class annif.analyzer.snowball.SnowballAnalyzer(param: str, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'snowball'
annif.analyzer.spacy module
spaCy analyzer for Annif, which uses spaCy for lemmatization.
- class annif.analyzer.spacy.SpacyAnalyzer(param: str, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'spacy'
- tokenize_words(text: str, filter: bool = True) -> list[str]
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (the default), return only valid tokens, excluding e.g. punctuation, numbers and very short words.
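An analyzer that overrides tokenize_words as a whole, as this class does, can tokenize and lemmatize in a single pass. The sketch below shows only that override pattern and does not call spaCy: LemmatizingSketch and the TOY_LEMMAS table are hypothetical, and the lowercasing and regex tokenization are assumptions, not the behaviour of SpacyAnalyzer.

```python
import re

# Hypothetical lemma table standing in for a real lemmatizer's output.
TOY_LEMMAS = {"indexes": "index", "documents": "document", "was": "be"}


class LemmatizingSketch:
    """Illustrative analyzer that overrides tokenize_words directly."""

    token_min_length = 3

    def is_valid_token(self, word: str) -> bool:
        # Assumed rule: long enough and containing at least one letter.
        return len(word) >= self.token_min_length and any(c.isalpha() for c in word)

    def tokenize_words(self, text: str, filter: bool = True) -> list[str]:
        lemmas = []
        for tok in re.findall(r"\w+|[^\w\s]", text):
            if filter and not self.is_valid_token(tok):
                continue
            # Look up the lemma; fall back to the lowercased token itself.
            lemmas.append(TOY_LEMMAS.get(tok.lower(), tok.lower()))
        return lemmas


lemmas = LemmatizingSketch().tokenize_words("Annif indexes 2 documents.")
```

Here the number and the full stop are filtered out, and the remaining tokens are replaced by their entries in the toy table where one exists.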
annif.analyzer.voikko module
Voikko analyzer for Annif, based on the libvoikko library.
- class annif.analyzer.voikko.VoikkoAnalyzer(param: str, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'voikko'