annif.analyzer package

Submodules

annif.analyzer.analyzer module

Common functionality for analyzers.

class annif.analyzer.analyzer.Analyzer(**kwargs)

Bases: object

Base class for language-specific analyzers. Either tokenize_words or _normalize_word must be overridden in subclasses. Other methods may be overridden when necessary.

is_valid_token(word: str) → bool

Return True if the word is an acceptable token.
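
What counts as "acceptable" is not spelled out here. The sketch below is one plausible reading, assuming a token must reach a minimum length (3, mirroring the documented token_min_length default) and contain at least one letter. It is a standalone illustration, not Annif's implementation, and the function is hypothetical.

```python
import unicodedata

TOKEN_MIN_LENGTH = 3  # assumption: mirrors the documented token_min_length default


def is_valid_token(word: str, min_length: int = TOKEN_MIN_LENGTH) -> bool:
    """Hypothetical check: long enough and contains at least one letter."""
    if len(word) < min_length:
        return False
    # Unicode general categories that start with "L" are letters
    return any(unicodedata.category(ch).startswith("L") for ch in word)
```

Under these assumptions, very short words, bare numbers and punctuation are rejected, while a mixed token such as "mp3" passes because it contains letters.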

name = None

token_min_length = 3

tokenize_sentences(text: str) → list[str]

Tokenize a piece of text (e.g. a document) into sentences.

tokenize_words(text: str, filter: bool = True) → list[str]

Tokenize a piece of text (e.g. a sentence) into words. If filter=True (default), only return valid tokens (e.g. not punctuation, numbers or very short words).
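
The effect of the filter flag can be illustrated with a small standalone sketch. The class below is hypothetical and does not import Annif: it assumes a regex-based word splitter, a validity check based on minimum length and the presence of a letter, and lower-casing as the normalization step.

```python
import re
import unicodedata


class SketchAnalyzer:
    """Hypothetical stand-in for an Analyzer subclass; not Annif's code."""

    token_min_length = 3

    def is_valid_token(self, word: str) -> bool:
        # Assumption: valid means long enough and containing a letter
        if len(word) < self.token_min_length:
            return False
        return any(unicodedata.category(ch).startswith("L") for ch in word)

    def _normalize_word(self, word: str) -> str:
        # Assumption: normalization only folds the word to lower case
        return word.lower()

    def tokenize_words(self, text: str, filter: bool = True) -> list[str]:
        # Assumption: a regex splitter that keeps punctuation as separate tokens
        raw_tokens = re.findall(r"\w+|[^\w\s]", text)
        return [
            self._normalize_word(tok)
            for tok in raw_tokens
            if not filter or self.is_valid_token(tok)
        ]


analyzer = SketchAnalyzer()
sentence = "In 2024, Annif indexed 42 documents."
filtered = analyzer.tokenize_words(sentence)
# → ['annif', 'indexed', 'documents']
unfiltered = analyzer.tokenize_words(sentence, filter=False)
# → ['in', '2024', ',', 'annif', 'indexed', '42', 'documents', '.']
```

With filter=False every token is normalized and returned, so callers that need punctuation or numbers (for example to reconstruct text) can still get them.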

annif.analyzer.simple module

Simple analyzer for Annif. Only folds words to lower case.

class annif.analyzer.simple.SimpleAnalyzer(param: None, **kwargs)

Bases: Analyzer

name = 'simple'

annif.analyzer.simplemma module

Simplemma analyzer for Annif, based on the simplemma lemmatizer.

class annif.analyzer.simplemma.SimplemmaAnalyzer(param: str, **kwargs)

Bases: Analyzer

name = 'simplemma'

annif.analyzer.snowball module

Snowball analyzer for Annif, based on the NLTK Snowball stemmer.

class annif.analyzer.snowball.SnowballAnalyzer(param: str, **kwargs)

Bases: Analyzer

name = 'snowball'

annif.analyzer.spacy module

spaCy analyzer for Annif, which uses spaCy for lemmatization.

class annif.analyzer.spacy.SpacyAnalyzer(param: str, **kwargs)

Bases: Analyzer

name = 'spacy'

tokenize_words(text: str, filter: bool = True) → list[str]

Tokenize a piece of text (e.g. a sentence) into words. If filter=True (default), only return valid tokens (e.g. not punctuation, numbers or very short words).

annif.analyzer.voikko module

Voikko analyzer for Annif, based on the libvoikko library.

class annif.analyzer.voikko.VoikkoAnalyzer(param: str, **kwargs)

Bases: Analyzer

name = 'voikko'

Module contents

Collection of language-specific analyzers and an analyzer registry for Annif.

annif.analyzer.get_analyzer(analyzerspec: str) → Analyzer
annif.analyzer.register_analyzer(analyzer)
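
The two functions above are undocumented here, so the following is only a sketch of how a name-based registry of this kind could work. It assumes that a specification string is either a bare analyzer name or a name followed by one parenthesized parameter, such as "snowball(english)". All names in the block (_registry, register, get_by_spec, LowercaseSketch) are hypothetical and are not part of Annif.

```python
import re

_registry = {}  # hypothetical mapping from analyzer name to analyzer class


def register(cls):
    """Store an analyzer class under its `name` attribute."""
    _registry[cls.name] = cls
    return cls


def get_by_spec(spec: str):
    """Parse a spec such as "name" or "name(param)" and build an instance."""
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", spec.strip())
    if match is None:
        raise ValueError(f"Invalid analyzer specification {spec!r}")
    name, param = match.group(1), match.group(2)
    if name not in _registry:
        raise ValueError(f"No such analyzer {name!r}")
    # param is None when the spec has no parenthesized part
    return _registry[name](param)


@register
class LowercaseSketch:
    name = "lowercase"  # hypothetical analyzer name

    def __init__(self, param=None):
        self.param = param


plain = get_by_spec("lowercase")
with_param = get_by_spec("lowercase(english)")
```

Keying the registry on each class's name attribute matches the name = '...' values listed for the analyzers above, which is presumably how a specification string is resolved to a class.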