annif.backend package
Submodules
annif.backend.backend module
Common functionality for backends.
- class annif.backend.backend.AnnifBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: object
Base class for Annif backends that perform analysis. The non-implemented methods should be overridden in subclasses.
- DEFAULT_PARAMETERS = {'limit': 100}
- debug(message: str) → None
Log a debug message from this backend
- default_params() → dict[str, Any]
- info(message: str) → None
Log an info message from this backend
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- property is_trained: bool
- property modification_time: datetime | None
- name = None
- property params: dict[str, Any]
- suggest(texts: list[str], params: dict[str, Any] | None = None) → SuggestionBatch
Suggest subjects for the input documents and return a list of subject sets represented as a list of SubjectSuggestion objects.
- train(corpus: DocumentCorpus, params: dict[str, Any] | None = None, jobs: int = 0) → None
Train the model on the given document or subject corpus.
- warning(message: str) → None
Log a warning message from this backend
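To make the interface above concrete, the following is a minimal standalone sketch, not the actual Annif implementation, of a backend-like class that follows the same pattern: class-level DEFAULT_PARAMETERS, an initialize() hook, and a suggest() method that maps a batch of texts to scored subject suggestions. The class name, the placeholder scoring logic, and the simplified constructor are illustrative assumptions:

    from typing import Any


    class SketchBackend:
        """Illustrative stand-in for an AnnifBackend subclass (not real Annif code)."""

        DEFAULT_PARAMETERS = {"limit": 100}
        name = "sketch"

        def __init__(self, backend_id: str, config_params: dict[str, Any]) -> None:
            self.backend_id = backend_id
            # explicit configuration overrides the class-level defaults
            self.params = {**self.DEFAULT_PARAMETERS, **config_params}

        def initialize(self, parallel: bool = False) -> None:
            # a real backend would pre-load its model or other data here
            self.initialized = True

        def suggest(self, texts: list[str]) -> list[list[tuple[str, float]]]:
            # one list of (subject, score) pairs per input document,
            # truncated to the configured limit
            limit = int(self.params["limit"])
            return [[("placeholder-subject", 0.5)][:limit] for _ in texts]


    backend = SketchBackend("sketch", {"limit": 10})
    backend.initialize()
    print(backend.suggest(["An example document about machine learning."]))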
- class annif.backend.backend.AnnifLearningBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
Base class for Annif backends that can perform online learning
- learn(corpus: DocumentCorpus, params: dict[str, Any] | None = None) → None
Further train the model on the given document or subject corpus.
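As a rough standalone illustration of the online-learning idea (again not Annif's own code), a learning backend adds a learn() step that updates an already trained model with additional documents; here the "update" is only recording the documents:

    class SketchLearningBackend:
        """Illustrative stand-in for an AnnifLearningBackend (not real Annif code)."""

        def __init__(self) -> None:
            self.extra_documents: list[str] = []

        def learn(self, corpus: list[str]) -> None:
            # a real backend would update its already trained model incrementally;
            # here we only record the additional training documents
            self.extra_documents.extend(corpus)


    backend = SketchLearningBackend()
    backend.learn(["A newly annotated document used for further training."])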
annif.backend.dummy module
Dummy backend for testing basic interaction of projects and backends
- class annif.backend.dummy.DummyBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifLearningBackend
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- initialized = False
- is_trained = True
- modification_time = None
- name = 'dummy'
- subject_id = 0
annif.backend.ensemble module
Ensemble backend that combines results from multiple projects
- class annif.backend.ensemble.BaseEnsembleBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
Base class for ensemble backends
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- class annif.backend.ensemble.EnsembleBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: BaseEnsembleBackend, AnnifHyperoptBackend
Ensemble backend that combines results from multiple projects
- get_hp_optimizer(corpus: DocumentCorpus, metric: str) → EnsembleOptimizer
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- property is_trained: bool
- property modification_time: datetime | None
- name = 'ensemble'
- class annif.backend.ensemble.EnsembleOptimizer(backend: EnsembleBackend, corpus: DocumentCorpus, metric: str)
Bases: HyperparameterOptimizer
Hyperparameter optimizer for the ensemble backend
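The core idea of the ensemble backends, combining the suggestion scores of several source projects, can be sketched as a weighted average of the scores each source assigns to a subject. This is a standalone illustration with toy data; the real backend operates on Annif suggestion objects and its weights can be tuned by the hyperparameter optimizer above:

    def combine_scores(
        source_scores: list[dict[str, float]], weights: list[float]
    ) -> dict[str, float]:
        """Weighted average of per-subject scores from several sources."""
        total_weight = sum(weights)
        combined: dict[str, float] = {}
        for scores, weight in zip(source_scores, weights):
            for subject, score in scores.items():
                combined[subject] = combined.get(subject, 0.0) + weight * score
        return {subject: score / total_weight for subject, score in combined.items()}


    # two hypothetical source projects with different weights
    print(combine_scores(
        [{"cats": 0.9, "dogs": 0.2}, {"cats": 0.5, "birds": 0.4}],
        weights=[1.0, 2.0],
    ))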
annif.backend.fasttext module
Annif backend using the fastText classifier
- class annif.backend.fasttext.FastTextBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: ChunkingBackend, AnnifBackend
fastText backend for Annif
- DEFAULT_PARAMETERS = {'dim': 100, 'epoch': 5, 'loss': 'hs', 'lr': 0.25}
- FASTTEXT_PARAMS = {'bucket': <class 'int'>, 'dim': <class 'int'>, 'epoch': <class 'int'>, 'loss': <class 'str'>, 'lr': <class 'float'>, 'lrUpdateRate': <class 'int'>, 'maxn': <class 'int'>, 'minCount': <class 'int'>, 'minn': <class 'int'>, 'neg': <class 'int'>, 'pretrainedVectors': <class 'str'>, 't': <class 'float'>, 'thread': <class 'int'>, 'wordNgrams': <class 'int'>, 'ws': <class 'int'>}
- MODEL_FILE = 'fasttext-model'
- TRAIN_FILE = 'fasttext-train.txt'
- default_params() → dict[str, Any]
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'fasttext'
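The FASTTEXT_PARAMS mapping above associates each fastText hyperparameter with its expected type. Since configuration values arrive as strings, a natural use of such a mapping is type coercion before the values are handed to the classifier. A standalone sketch of that pattern (the subset of parameters and the helper name are illustrative, not Annif's exact mechanism):

    # subset of the parameter/type mapping, for illustration only
    PARAM_TYPES = {"dim": int, "epoch": int, "lr": float, "loss": str}

    def coerce_params(raw_params: dict[str, str]) -> dict[str, object]:
        """Convert string config values to the types expected by the classifier."""
        return {
            name: PARAM_TYPES[name](value)
            for name, value in raw_params.items()
            if name in PARAM_TYPES
        }

    print(coerce_params({"dim": "100", "epoch": "5", "lr": "0.25", "loss": "hs"}))
    # -> {'dim': 100, 'epoch': 5, 'lr': 0.25, 'loss': 'hs'}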
annif.backend.http module
HTTP/REST client backend that makes calls to a web service and returns the results
- class annif.backend.http.HTTPBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
- property headers: dict[str, str]
- property is_trained: bool | None
- property modification_time: datetime | None
- name = 'http'
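The HTTP backend delegates suggestion to an external web service. A hedged sketch of such a round trip using the requests library is shown below; the endpoint URL, the payload field name and the response format are illustrative assumptions, not the exact protocol this backend speaks:

    import requests

    def suggest_via_http(text: str, endpoint: str) -> list[dict]:
        """POST a document to a hypothetical suggestion service and return its results."""
        response = requests.post(endpoint, data={"text": text}, timeout=30)
        response.raise_for_status()
        # assume the service returns a JSON list of {"uri": ..., "label": ..., "score": ...}
        return response.json()

    # hypothetical endpoint; an Annif REST instance is one possible target
    results = suggest_via_http(
        "An example document.", "http://localhost:5000/v1/projects/dummy-en/suggest"
    )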
annif.backend.hyperopt module
Hyperparameter optimization functionality for backends
- class annif.backend.hyperopt.AnnifHyperoptBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
Base class for Annif backends that can perform hyperparameter optimization
- abstract get_hp_optimizer(corpus: DocumentCorpus, metric: str)
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- class annif.backend.hyperopt.HPRecommendation(lines, score)
Bases: tuple
- lines
Alias for field number 0
- score
Alias for field number 1
- class annif.backend.hyperopt.HyperparameterOptimizer(backend: AnnifBackend, corpus: DocumentCorpus, metric: str)
Bases: object
Base class for hyperparameter optimizers
- optimize(n_trials: int, n_jobs: int, results_file: LazyFile | None) → HPRecommendation
Find the optimal hyperparameters by testing up to the given number of hyperparameter combinations
- class annif.backend.hyperopt.TrialWriter(results_file: LazyFile, normalize_func: Callable)
Bases: object
Object that writes hyperparameter optimization trial results into a TSV file.
- write(study: Study, trial: Trial) → None
Write the results of one trial into the results file. On the first run, write the header line first.
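The Study and Trial types referenced above are Optuna types, the library Annif uses for hyperparameter optimization. A rough standalone sketch of how a trial-writing callback of this shape fits into an Optuna study; the objective function, the "weight" parameter and the output file name are invented for illustration:

    import optuna

    def objective(trial: optuna.Trial) -> float:
        # hypothetical one-parameter objective standing in for an evaluation metric
        weight = trial.suggest_float("weight", 0.0, 1.0)
        return 1.0 - abs(weight - 0.42)

    def write_trial(study: optuna.Study, trial: optuna.trial.FrozenTrial) -> None:
        # append one TSV row per completed trial, similar in spirit to TrialWriter.write
        with open("trials.tsv", "a", encoding="utf-8") as results_file:
            print(trial.number, trial.value, trial.params, sep="\t", file=results_file)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10, n_jobs=1, callbacks=[write_trial])
    print(study.best_params)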
annif.backend.mixins module
Annif backend mixins that can be used to implement features
annif.backend.mllm module
Maui-like Lexical Matching backend
- class annif.backend.mllm.MLLMBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifHyperoptBackend
Maui-like Lexical Matching backend for Annif
- DEFAULT_PARAMETERS = {'max_leaf_nodes': 1000, 'max_samples': 0.9, 'min_samples_leaf': 20, 'use_hidden_labels': False}
- MODEL_FILE = 'mllm-model.gz'
- TRAIN_FILE = 'mllm-train.gz'
- get_hp_optimizer(corpus: DocumentCorpus, metric: str) → MLLMOptimizer
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'mllm'
- class annif.backend.mllm.MLLMOptimizer(backend: AnnifBackend, corpus: DocumentCorpus, metric: str)
Bases: HyperparameterOptimizer
Hyperparameter optimizer for the MLLM backend
annif.backend.nn_ensemble module
Neural network based ensemble backend that combines results from multiple projects.
- class annif.backend.nn_ensemble.LMDBSequence(txn, batch_size)
Bases: PyDataset
A sequence of samples stored in an LMDB database.
- add_sample(inputs: ndarray, targets: ndarray) → None
- class annif.backend.nn_ensemble.MeanLayer(*args, **kwargs)
Bases: Layer
Custom Keras layer that calculates mean values along the 2nd axis.
- call(inputs: EagerTensor) → EagerTensor
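A minimal standalone sketch of a Keras layer like the one above, averaging over axis 2 of the input. The assumption that the input is shaped (batch, subjects, sources), so the mean is taken over the ensemble sources, is illustrative and not taken from the Annif code:

    import tensorflow as tf

    class MeanSketchLayer(tf.keras.layers.Layer):
        """Standalone illustration of a layer averaging over axis 2 (not Annif's own class)."""

        def call(self, inputs):
            return tf.reduce_mean(inputs, axis=2)

    # batch of 1 document, 3 subjects, 2 ensemble sources -> mean over the sources
    scores = tf.constant([[[0.9, 0.5], [0.2, 0.0], [0.0, 0.4]]])
    print(MeanSketchLayer()(scores))  # shape (1, 3)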
- class annif.backend.nn_ensemble.NNEnsembleBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifLearningBackend, BaseEnsembleBackend
Neural network ensemble backend that combines results from multiple projects
- DEFAULT_PARAMETERS = {'dropout_rate': 0.2, 'epochs': 10, 'learn-epochs': 1, 'lmdb_map_size': 1073741824, 'nodes': 100, 'optimizer': 'adam'}
- LMDB_FILE = 'nn-train.mdb'
- MODEL_FILE = 'nn-model.keras'
- get_model_metadata(model_filename: str) → dict | None
Read metadata from Keras model files.
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'nn_ensemble'
- annif.backend.nn_ensemble.idx_to_key(idx: int) → bytes
Convert an integer index to a binary key for use in LMDB
- annif.backend.nn_ensemble.key_to_idx(key: memoryview | bytes) → int
Convert a binary LMDB key to an integer index
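One straightforward way to implement such a conversion pair, shown here only as a standalone sketch (the encoding Annif actually uses may differ), is a fixed-width big-endian integer, which keeps LMDB's lexicographic key order consistent with numeric order:

    import struct

    def idx_to_key(idx: int) -> bytes:
        """Encode an integer index as an 8-byte big-endian key."""
        return struct.pack(">Q", idx)

    def key_to_idx(key: bytes) -> int:
        """Decode an 8-byte big-endian key back into an integer index."""
        return struct.unpack(">Q", bytes(key))[0]

    assert key_to_idx(idx_to_key(123456)) == 123456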
annif.backend.omikuji module
Annif backend using the Omikuji classifier
- class annif.backend.omikuji.OmikujiBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: TfidfVectorizerMixin, AnnifBackend
Omikuji based backend for Annif
- DEFAULT_PARAMETERS = {'cluster_balanced': True, 'cluster_k': 2, 'collapse_every_n_layers': 0, 'max_depth': 20, 'min_df': 1, 'ngram': 1}
- MODEL_FILE = 'omikuji-model'
- TRAIN_FILE = 'omikuji-train.txt'
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'omikuji'
annif.backend.pav module
PAV ensemble backend that combines results from multiple projects and uses the PAV algorithm (a.k.a. isotonic regression) to learn which concept suggestions from each backend are trustworthy, turning the raw scores returned by individual backends into probabilities.
- class annif.backend.pav.PAVBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: BaseEnsembleBackend
PAV ensemble backend that combines results from multiple projects
- DEFAULT_PARAMETERS = {'min-docs': 10}
- MODEL_FILE_PREFIX = 'pav-model-'
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'pav'
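The calibration step described in the module docstring, turning raw backend scores into probabilities with isotonic regression, can be sketched with scikit-learn. The toy scores and relevance labels below are invented; the real backend applies the same idea to the suggestions of its source projects:

    from sklearn.isotonic import IsotonicRegression

    # raw scores from a source backend and whether the suggestion was actually correct
    raw_scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
    was_correct = [0, 0, 1, 0, 1, 1]

    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    calibrator.fit(raw_scores, was_correct)

    # monotonically calibrated probabilities for new raw scores
    print(calibrator.predict([0.2, 0.5, 0.95]))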
annif.backend.stwfsa module
- class annif.backend.stwfsa.StwfsaBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
- DEFAULT_PARAMETERS = {'concept_type_uri': 'http://www.w3.org/2004/02/skos/core#Concept', 'expand_abbreviation_with_punctuation': True, 'expand_ampersand_with_spaces': True, 'extract_any_case_from_braces': False, 'extract_upper_case_from_braces': True, 'handle_title_case': True, 'remove_deprecated': True, 'simple_english_plural_rules': False, 'sub_thesaurus_type_uri': 'http://www.w3.org/2004/02/skos/core#Collection', 'thesaurus_relation_is_specialisation': True, 'thesaurus_relation_type_uri': 'http://www.w3.org/2004/02/skos/core#member', 'use_txt_vec': False}
- MODEL_FILE = 'stwfsa_predictor.zip'
- STWFSA_PARAMETERS = {'concept_type_uri': <class 'str'>, 'expand_abbreviation_with_punctuation': <function boolean>, 'expand_ampersand_with_spaces': <function boolean>, 'extract_any_case_from_braces': <function boolean>, 'extract_upper_case_from_braces': <function boolean>, 'handle_title_case': <function boolean>, 'remove_deprecated': <function boolean>, 'simple_english_plural_rules': <function boolean>, 'sub_thesaurus_type_uri': <class 'str'>, 'thesaurus_relation_is_specialisation': <function boolean>, 'thesaurus_relation_type_uri': <class 'str'>, 'use_txt_vec': <class 'bool'>}
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'stwfsa'
annif.backend.svc module
Annif backend using an SVM classifier
- class annif.backend.svc.SVCBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: TfidfVectorizerMixin, AnnifBackend
Support vector classifier backend for Annif
- DEFAULT_PARAMETERS = {'min_df': 1, 'ngram': 1}
- MODEL_FILE = 'svc-model.gz'
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'svc'
annif.backend.tfidf module
Backend that returns the most similar subjects based on similarity in a sparse TF-IDF normalized bag-of-words vector space
- class annif.backend.tfidf.SubjectBuffer(tempdir: str, subject_id: int)
Bases: object
A file-backed buffer to store and retrieve subject text.
- BUFFER_SIZE = 100
- flush() → None
- read() → str
- write(text: str) → None
- class annif.backend.tfidf.TFIDFBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: TfidfVectorizerMixin, AnnifBackend
TF-IDF vector space similarity based backend for Annif
- INDEX_FILE = 'tfidf-index'
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'tfidf'
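The similarity search described in the module docstring can be sketched with scikit-learn: build TF-IDF vectors from one aggregated text per subject, then rank subjects by cosine similarity to the query document. A standalone illustration with toy subject texts, not Annif's own implementation:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import linear_kernel

    # one aggregated text per subject (toy data)
    subject_texts = {
        "cats": "cat cats kitten feline purring whiskers",
        "dogs": "dog dogs puppy canine barking leash",
    }

    vectorizer = TfidfVectorizer()
    subject_matrix = vectorizer.fit_transform(subject_texts.values())

    query = vectorizer.transform(["A document about a purring kitten."])
    # cosine similarity of L2-normalized TF-IDF vectors reduces to a dot product
    similarities = linear_kernel(query, subject_matrix)[0]

    for subject, score in zip(subject_texts, similarities):
        print(subject, round(float(score), 3))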
annif.backend.yake module
Annif backend using Yake keyword extraction
- class annif.backend.yake.YakeBackend(backend_id: str, config_params: dict[str, Any] | SectionProxy, project: AnnifProject)
Bases: AnnifBackend
Yake based backend for Annif
- DEFAULT_PARAMETERS = {'deduplication_algo': 'levs', 'deduplication_threshold': 0.9, 'features': None, 'label_types': ['prefLabel', 'altLabel'], 'max_ngram_size': 4, 'num_keywords': 100, 'remove_parentheses': False, 'window_size': 1}
- INDEX_FILE = 'yake-index'
- initialize(parallel: bool = False) → None
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- property is_trained
- property label_types: list[URIRef]
- name = 'yake'
Module contents
Registry of backend types for Annif
- annif.backend.get_backend(backend_id: str) → Type[AnnifBackend]
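A short usage sketch of the registry function; the id 'tfidf' corresponds to the TFIDFBackend documented above, and constructing or training the returned class still requires a configured Annif project, which is omitted here:

    from annif.backend import get_backend

    backend_class = get_backend("tfidf")   # -> annif.backend.tfidf.TFIDFBackend
    print(backend_class.name)              # 'tfidf'
    # an unknown id is an error; the exact exception type depends on Annif internals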