annif.analyzer package
Submodules
annif.analyzer.analyzer module
Common functionality for analyzers.
- class annif.analyzer.analyzer.Analyzer(**kwargs)
Bases: object
Base class for language-specific analyzers. Either tokenize_words or _normalize_word must be overridden in subclasses. Other methods may be overridden when necessary.
- static is_available() → bool
Return True if the analyzer is available for use, False if not.
- is_valid_token(word: str) → bool
Return True if the word is an acceptable token.
- name = None
- token_min_length = 3
- tokenize_sentences(text: str) → list[str]
Tokenize a piece of text (e.g. a document) into sentences.
- tokenize_words(text: str, filter: bool = True) → list[str]
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (default), only return valid tokens (e.g. not punctuation, numbers or very short words).
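The subclassing contract described above can be sketched without Annif installed. The block below is a minimal stand-in, not Annif's actual code: the `Analyzer` class here only mirrors the documented attribute and method names, the regex tokenizer and the "long enough and contains a letter" rule in `is_valid_token` are assumptions, and `LowercaseAnalyzer` is a hypothetical subclass that overrides `_normalize_word`.

```python
import re
import unicodedata


class Analyzer:
    """Stand-in mirroring the documented interface; not Annif's implementation."""

    name = None
    token_min_length = 3

    def __init__(self, **kwargs):
        pass

    def is_valid_token(self, word: str) -> bool:
        # Assumed rule: the word is long enough and contains at least one letter.
        if len(word) < self.token_min_length:
            return False
        return any(unicodedata.category(ch).startswith("L") for ch in word)

    def _normalize_word(self, word: str) -> str:
        # Hook that subclasses are expected to override.
        raise NotImplementedError

    def tokenize_words(self, text: str, filter: bool = True) -> list[str]:
        # Assumed tokenizer: runs of word characters, or single symbols.
        tokens = re.findall(r"\w+|[^\w\s]", text)
        return [
            self._normalize_word(tok)
            for tok in tokens
            if not filter or self.is_valid_token(tok)
        ]


class LowercaseAnalyzer(Analyzer):
    """Hypothetical subclass that only folds words to lower case."""

    name = "lowercase"

    def _normalize_word(self, word: str) -> str:
        return word.lower()


analyzer = LowercaseAnalyzer()
sentence = "The 42 quick Foxes, at it again!"
filtered = analyzer.tokenize_words(sentence)
unfiltered = analyzer.tokenize_words(sentence, filter=False)
print(filtered)    # ['the', 'quick', 'foxes', 'again']
print(unfiltered)  # ['the', '42', 'quick', 'foxes', ',', 'at', 'it', 'again', '!']
```

With filter=True, the number, the punctuation and the two-letter words are dropped because they fail the assumed validity rule; with filter=False every token is normalized and returned.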
annif.analyzer.estnltk module
EstNLTK analyzer for Annif, which uses EstNLTK for lemmatization.
- class annif.analyzer.estnltk.EstNLTKAnalyzer(param: str, **kwargs)
Bases: Analyzer
- static is_available() → bool
Return True if the analyzer is available for use, False if not.
- name = 'estnltk'
- tokenize_words(text: str, filter: bool = True) → list[str]
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (default), only return valid tokens (e.g. not punctuation, numbers or very short words).
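For analyzers that depend on an optional library, one common way to implement an is_available() check is to ask the import system whether the backing module can be found, without actually importing it. The sketch below illustrates that pattern only; whether Annif does exactly this is an assumption, and `module_available`, `OptionalAnalyzer` and `backend_module` are hypothetical names.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


class OptionalAnalyzer:
    """Hypothetical analyzer backed by an optional third-party library."""

    name = "optional"
    # Placeholder so the sketch runs anywhere; a real analyzer would name
    # the library it depends on here.
    backend_module = "json"

    @staticmethod
    def is_available() -> bool:
        return module_available(OptionalAnalyzer.backend_module)


print(OptionalAnalyzer.is_available())
print(module_available("no_such_module_xyz_123"))
```

Because the check is a static method, callers can test availability before constructing the analyzer, which matters when construction would load large models.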
annif.analyzer.simple module
Simple analyzer for Annif. Only folds words to lower case.
annif.analyzer.simplemma module
Simplemma analyzer for Annif, based on the simplemma lemmatizer.
annif.analyzer.snowball module
Snowball analyzer for Annif, based on the NLTK Snowball stemmer.
annif.analyzer.spacy module
spaCy analyzer for Annif, which uses spaCy for lemmatization.
- class annif.analyzer.spacy.SpacyAnalyzer(param: str, **kwargs)
Bases: Analyzer
- static is_available() → bool
Return True if the analyzer is available for use, False if not.
- name = 'spacy'
- tokenize_words(text: str, filter: bool = True) → list[str]
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (default), only return valid tokens (e.g. not punctuation, numbers or very short words).
annif.analyzer.voikko module
Voikko analyzer for Annif, based on the libvoikko library.
Module contents
Collection of language-specific analyzers and analyzer registry for Annif.
- annif.analyzer.register_analyzer(analyzer)
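The listing gives only the signature of register_analyzer, so the following is a rough sketch of how a name-keyed analyzer registry could work, not a description of Annif's internals. Only the `register_analyzer(analyzer)` signature comes from the documentation; the `_analyzers` dictionary, keying by the class's `name` attribute, returning the class, and the `lookup_analyzer` helper are all assumptions made for illustration.

```python
# Assumed storage: a module-level dict mapping analyzer names to classes.
_analyzers = {}


def register_analyzer(analyzer):
    """Stand-in: store an analyzer class under its `name` attribute."""
    _analyzers[analyzer.name] = analyzer
    return analyzer  # assumption: returning the class allows decorator use


def lookup_analyzer(name):
    """Hypothetical helper: fetch a registered analyzer class by name."""
    try:
        return _analyzers[name]
    except KeyError:
        raise ValueError(f"No such analyzer: {name}") from None


class DemoAnalyzer:
    """Hypothetical analyzer class used only to exercise the registry."""

    name = "demo"


register_analyzer(DemoAnalyzer)
found = lookup_analyzer("demo")
print(found.name)
```

Keying the registry by each class's name attribute means the identifiers documented above ('estnltk', 'spacy') could serve directly as lookup keys under this design.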