annif.analyzer package
Submodules
annif.analyzer.analyzer module
Common functionality for analyzers.
- class annif.analyzer.analyzer.Analyzer(**kwargs)
Bases: object
Base class for language-specific analyzers. Either tokenize_words or _normalize_word must be overridden in subclasses. Other methods may be overridden when necessary.
- is_valid_token(word)
Return True if the word is an acceptable token.
- name = None
- token_min_length = 3
- tokenize_sentences(text)
Tokenize a piece of text (e.g. a document) into sentences.
- tokenize_words(text, filter=True)
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (the default), return only valid tokens (e.g. no punctuation, numbers or very short words).
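The subclassing contract described above can be illustrated with a minimal, self-contained sketch. It does not import Annif: `SketchAnalyzer` and `LowercaseAnalyzer` are hypothetical stand-ins, and the regex tokenizer and the "long enough and contains a letter" validity rule are assumptions made for this example, not the library's implementation.

```python
import re
import unicodedata


class SketchAnalyzer:
    """Hypothetical stand-in mirroring the documented Analyzer interface."""

    name = None
    token_min_length = 3

    def is_valid_token(self, word):
        # Assumption: a token is valid when it is long enough and contains
        # at least one letter, so bare numbers and punctuation are rejected.
        if len(word) < self.token_min_length:
            return False
        return any(unicodedata.category(ch).startswith("L") for ch in word)

    def tokenize_words(self, text, filter=True):
        # Assumption: split into runs of word characters and single
        # punctuation marks with a simple regular expression.
        words = re.findall(r"\w+|[^\w\s]", text)
        return [
            self._normalize_word(word)
            for word in words
            if not filter or self.is_valid_token(word)
        ]

    def _normalize_word(self, word):
        raise NotImplementedError("subclasses must override _normalize_word")


class LowercaseAnalyzer(SketchAnalyzer):
    """Hypothetical subclass that only folds words to lower case."""

    name = "lowercase-sketch"

    def _normalize_word(self, word):
        return word.lower()


analyzer = LowercaseAnalyzer()
tokens = analyzer.tokenize_words("The 3 Quick foxes, at 10 pm!")
unfiltered = analyzer.tokenize_words("Hi, you!", filter=False)
print(tokens)
print(unfiltered)
```

In this sketch the subclass overrides only `_normalize_word`, and the inherited `tokenize_words` handles splitting and filtering. A subclass that needs its own tokenizer would override `tokenize_words` instead.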
annif.analyzer.simple module
Simple analyzer for Annif. Only folds words to lower case.
- class annif.analyzer.simple.SimpleAnalyzer(param, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'simple'
annif.analyzer.simplemma module
Simplemma analyzer for Annif, based on the simplemma lemmatizer.
- class annif.analyzer.simplemma.SimplemmaAnalyzer(param, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'simplemma'
annif.analyzer.snowball module
Snowball analyzer for Annif, based on the NLTK Snowball stemmer.
- class annif.analyzer.snowball.SnowballAnalyzer(param, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'snowball'
annif.analyzer.spacy module
spaCy analyzer for Annif, which uses spaCy for lemmatization.
- class annif.analyzer.spacy.SpacyAnalyzer(param, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'spacy'
- tokenize_words(text, filter=True)
Tokenize a piece of text (e.g. a sentence) into words. If filter=True (the default), return only valid tokens (e.g. no punctuation, numbers or very short words).
annif.analyzer.voikko module
Voikko analyzer for Annif, based on the libvoikko library.
- class annif.analyzer.voikko.VoikkoAnalyzer(param, **kwargs)
Bases: annif.analyzer.analyzer.Analyzer
- name = 'voikko'
Module contents
Collection of language-specific analyzers and the analyzer registry for Annif.
- annif.analyzer.get_analyzer(analyzerspec)
- annif.analyzer.register_analyzer(analyzer)
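A name-keyed analyzer registry of this kind could be sketched as follows. The signatures above do not describe the behavior, so everything here is an assumption for illustration: the `name(param)` spec format, the `_analyzers` dictionary, the error handling and the `UpperAnalyzer` class are hypothetical, and the code does not import Annif.

```python
import re

# Hypothetical registry mapping an analyzer's name to its class.
_analyzers = {}


def register_analyzer(analyzer):
    """Store an analyzer class under its `name` attribute."""
    _analyzers[analyzer.name] = analyzer


def get_analyzer(analyzerspec):
    """Build an analyzer from a spec such as "name" or "name(param)".

    The spec format is an assumption made for this sketch.
    """
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", analyzerspec)
    if match is None:
        raise ValueError(f"Invalid analyzer specification: {analyzerspec}")
    name, param = match.groups()
    try:
        analyzer_class = _analyzers[name]
    except KeyError:
        raise ValueError(f"No such analyzer: {name}") from None
    return analyzer_class(param)


class UpperAnalyzer:
    """Hypothetical analyzer class used only to exercise the registry."""

    name = "upper"

    def __init__(self, param):
        self.param = param


register_analyzer(UpperAnalyzer)
analyzer = get_analyzer("upper(english)")
print(type(analyzer).__name__, analyzer.param)
```

Keying the registry on each class's `name` attribute means a new analyzer only has to set `name` and be registered once before a specification string can select it.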