annif.backend package¶
Submodules¶
annif.backend.backend module¶
Common functionality for backends.
- class annif.backend.backend.AnnifBackend(backend_id, config_params, project)¶
Bases:
object
Base class for Annif backends that perform analysis. Methods that are not implemented here should be overridden in subclasses.
- DEFAULT_PARAMETERS = {'limit': 100}¶
- debug(message)¶
Log a debug message from this backend
- default_params()¶
- info(message)¶
Log an info message from this backend
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- property is_trained¶
- property modification_time¶
- name = None¶
- needs_subject_index = False¶
- property params¶
- suggest(text, params=None)¶
Suggest subjects for the input text and return them as a list of SubjectSuggestion objects.
- train(corpus, params=None, jobs=0)¶
Train the model on the given document or subject corpus.
- warning(message)¶
Log a warning message from this backend
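The subclassing contract described above can be sketched roughly as follows. To keep the example runnable without Annif installed, it uses a simplified stand-in for the base class rather than the real AnnifBackend. The names SketchBackend and KeywordBackend, the keyword table, and the (uri, score) tuple result format are all hypothetical; a real backend would return SubjectSuggestion objects, and the way params merges defaults with configured values is an assumption.

```python
class SketchBackend:
    """Simplified stand-in for AnnifBackend (hypothetical, illustration only)."""

    name = None
    needs_subject_index = False
    DEFAULT_PARAMETERS = {'limit': 100}

    def __init__(self, backend_id, config_params, project):
        self.backend_id = backend_id
        self.config_params = config_params
        self.project = project

    def default_params(self):
        # Return a copy so callers cannot mutate the class-level defaults.
        return dict(self.DEFAULT_PARAMETERS)

    @property
    def params(self):
        # Assumption: configured values override the defaults.
        merged = self.default_params()
        merged.update(self.config_params)
        return merged

    def initialize(self, parallel=False):
        pass  # nothing to pre-load in this sketch

    def suggest(self, text, params=None):
        raise NotImplementedError  # subclasses are expected to override this


class KeywordBackend(SketchBackend):
    """Toy backend: suggests a subject when its keyword occurs in the text."""

    name = 'keyword'
    KEYWORDS = {
        'cat': 'http://example.org/cat',
        'dog': 'http://example.org/dog',
    }

    def suggest(self, text, params=None):
        effective = self.params
        effective.update(params or {})
        words = text.lower().split()
        hits = [(uri, 1.0) for kw, uri in self.KEYWORDS.items() if kw in words]
        # Honour the 'limit' parameter by truncating the result list.
        return hits[:int(effective['limit'])]


backend = KeywordBackend('kw', {'limit': 1}, project=None)
suggestions = backend.suggest('A cat chased a dog')
```

Here the configured limit of 1 overrides the default of 100, so only the first matching subject is returned.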
- class annif.backend.backend.AnnifLearningBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifBackend
Base class for Annif backends that can perform online learning
- learn(corpus, params=None)¶
Further train the model on the given document or subject corpus.
annif.backend.dummy module¶
Dummy backend for testing basic interaction of projects and backends
- class annif.backend.dummy.DummyBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifLearningBackend
- default_params()¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- initialized = False¶
- is_trained = True¶
- label = 'dummy'¶
- modification_time = None¶
- name = 'dummy'¶
- uri = 'http://example.org/dummy'¶
annif.backend.ensemble module¶
Ensemble backend that combines results from multiple projects
- class annif.backend.ensemble.BaseEnsembleBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifBackend
Base class for ensemble backends
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- class annif.backend.ensemble.EnsembleBackend(backend_id, config_params, project)¶
Bases:
annif.backend.ensemble.BaseEnsembleBackend, annif.backend.hyperopt.AnnifHyperoptBackend
Ensemble backend that combines results from multiple projects
- get_hp_optimizer(corpus, metric)¶
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- property is_trained¶
- property modification_time¶
- name = 'ensemble'¶
- class annif.backend.ensemble.EnsembleOptimizer(backend, corpus, metric)¶
Bases:
annif.backend.hyperopt.HyperparameterOptimizer
Hyperparameter optimizer for the ensemble backend
annif.backend.fasttext module¶
annif.backend.http module¶
HTTP/REST client backend that makes calls to a web service and returns the results
- class annif.backend.http.HTTPBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifBackend
- property is_trained¶
- property modification_time¶
- name = 'http'¶
annif.backend.hyperopt module¶
Hyperparameter optimization functionality for backends
- class annif.backend.hyperopt.AnnifHyperoptBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifBackend
Base class for Annif backends that can perform hyperparameter optimization
- abstract get_hp_optimizer(corpus, metric)¶
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- class annif.backend.hyperopt.HPRecommendation(lines, score)¶
Bases:
tuple
- lines¶
Alias for field number 0
- score¶
Alias for field number 1
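Since HPRecommendation derives from tuple and exposes lines and score as aliases for fields 0 and 1, it behaves like a named tuple. The sketch below recreates an equivalent type with collections.namedtuple to show how such a value can be read; the example contents of lines and the score value are made up for illustration.

```python
from collections import namedtuple

# Equivalent stand-in type: field 0 is 'lines', field 1 is 'score'.
HPRecommendation = namedtuple('HPRecommendation', 'lines score')

# Hypothetical recommendation; real values come from an optimizer run.
rec = HPRecommendation(lines=['limit=50'], score=0.42)

# Fields can be read by name, by index, or by unpacking.
lines, score = rec
```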
annif.backend.mixins module¶
Annif backend mixins that can be used to implement features
annif.backend.mllm module¶
Maui-like Lexical Matching backend
- class annif.backend.mllm.MLLMBackend(backend_id, config_params, project)¶
Bases:
annif.backend.hyperopt.AnnifHyperoptBackend
Maui-like Lexical Matching backend for Annif
- DEFAULT_PARAMETERS = {'max_leaf_nodes': 1000, 'max_samples': 0.9, 'min_samples_leaf': 20, 'use_hidden_labels': False}¶
- MODEL_FILE = 'mllm-model.gz'¶
- TRAIN_FILE = 'mllm-train.gz'¶
- default_params()¶
- get_hp_optimizer(corpus, metric)¶
Get a HyperparameterOptimizer object that can look for optimal hyperparameter combinations for the given corpus, measured using the given metric
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'mllm'¶
- needs_subject_index = True¶
- class annif.backend.mllm.MLLMOptimizer(backend, corpus, metric)¶
Bases:
annif.backend.hyperopt.HyperparameterOptimizer
Hyperparameter optimizer for the MLLM backend
annif.backend.nn_ensemble module¶
Neural network based ensemble backend that combines results from multiple projects.
- class annif.backend.nn_ensemble.LMDBSequence(txn, batch_size)¶
Bases:
keras.utils.data_utils.Sequence
A sequence of samples stored in an LMDB database.
- add_sample(inputs, targets)¶
- class annif.backend.nn_ensemble.MeanLayer(*args, **kwargs)¶
Bases:
keras.engine.base_layer.Layer
Custom Keras layer that calculates mean values along the 2nd axis.
- call(inputs)¶
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or the build() method that is called automatically before call() executes the first time.
- Args:
- inputs: Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
  - inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in inputs get cast as tensors.
  - Keras mask metadata is only collected from inputs.
  - Layers are built (build(input_shape) method) using shape info from inputs only.
  - input_spec compatibility is only checked against inputs.
  - Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using inputs only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
- *args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
- **kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
  - mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class annif.backend.nn_ensemble.NNEnsembleBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifLearningBackend, annif.backend.ensemble.BaseEnsembleBackend
Neural network ensemble backend that combines results from multiple projects
- DEFAULT_PARAMETERS = {'dropout_rate': 0.2, 'epochs': 10, 'learn-epochs': 1, 'lmdb_map_size': 1073741824, 'nodes': 100, 'optimizer': 'adam'}¶
- LMDB_FILE = 'nn-train.mdb'¶
- MODEL_FILE = 'nn-model.h5'¶
- default_params()¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'nn_ensemble'¶
- annif.backend.nn_ensemble.idx_to_key(idx)¶
Convert an integer index to a binary key for use in LMDB.
- annif.backend.nn_ensemble.key_to_idx(key)¶
Convert a binary LMDB key to an integer index.
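One plausible way to implement this pair of conversions is a zero-padded decimal encoding, sketched below. The 8-digit width is an assumption made for illustration and may differ from what the module actually uses. Because LMDB orders keys lexicographically by their bytes, zero-padding keeps the keys in the same order as the integer indexes.

```python
def idx_to_key(idx):
    # Assumed encoding: fixed-width, zero-padded decimal digits as bytes.
    return b'%08d' % idx


def key_to_idx(key):
    # int() accepts ASCII digit bytes directly.
    return int(key)


keys = [idx_to_key(i) for i in (2, 10, 100)]
```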
annif.backend.omikuji module¶
Annif backend using the Omikuji classifier
- class annif.backend.omikuji.OmikujiBackend(backend_id, config_params, project)¶
Bases:
annif.backend.mixins.TfidfVectorizerMixin, annif.backend.backend.AnnifBackend
Omikuji based backend for Annif
- DEFAULT_PARAMETERS = {'cluster_balanced': True, 'cluster_k': 2, 'collapse_every_n_layers': 0, 'max_depth': 20, 'min_df': 1, 'ngram': 1}¶
- MODEL_FILE = 'omikuji-model'¶
- TRAIN_FILE = 'omikuji-train.txt'¶
- default_params()¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'omikuji'¶
- needs_subject_index = True¶
annif.backend.pav module¶
PAV ensemble backend that combines results from multiple projects. It learns which concept suggestions from each backend are trustworthy, using the PAV algorithm (a.k.a. isotonic regression) to turn the raw scores returned by individual backends into probabilities.
- class annif.backend.pav.PAVBackend(backend_id, config_params, project)¶
Bases:
annif.backend.ensemble.BaseEnsembleBackend
PAV ensemble backend that combines results from multiple projects
- DEFAULT_PARAMETERS = {'min-docs': 10}¶
- MODEL_FILE_PREFIX = 'pav-model-'¶
- default_params()¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'pav'¶
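To illustrate the idea behind the PAV (pool adjacent violators) algorithm mentioned in the module description, here is a minimal plain-Python sketch. It is not taken from Annif; the function name pav_fit and the sample data are hypothetical. The input is a list of observed outcomes (1 if a suggestion was correct, 0 if not) ordered by increasing raw score, and the output is the closest non-decreasing sequence, which can be read as an estimated probability for each score.

```python
def pav_fit(values):
    """Return the non-decreasing least-squares fit to values."""
    blocks = []  # each block is [sum_of_values, count]
    for value in values:
        blocks.append([value, 1])
        # Pool adjacent blocks while the earlier mean exceeds the later one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every member of a pooled block gets the block mean.
        fitted.extend([total / count] * count)
    return fitted


# Outcomes sorted by raw score: the second and third violate monotonicity.
outcomes = [0, 1, 0, 1, 1]
probabilities = pav_fit(outcomes)
```

In this example the violating pair (1, 0) is pooled into two values of 0.5, so mid-range raw scores map to a probability of 0.5 while the highest scores map to 1.0.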
annif.backend.stwfsa module¶
- class annif.backend.stwfsa.StwfsaBackend(backend_id, config_params, project)¶
Bases:
annif.backend.backend.AnnifBackend
- DEFAULT_PARAMETERS = {'concept_type_uri': 'http://www.w3.org/2004/02/skos/core#Concept', 'expand_abbreviation_with_punctuation': True, 'expand_ampersand_with_spaces': True, 'extract_any_case_from_braces': False, 'extract_upper_case_from_braces': True, 'handle_title_case': True, 'remove_deprecated': True, 'simple_english_plural_rules': False, 'sub_thesaurus_type_uri': 'http://www.w3.org/2004/02/skos/core#Collection', 'thesaurus_relation_is_specialisation': True, 'thesaurus_relation_type_uri': 'http://www.w3.org/2004/02/skos/core#member', 'use_txt_vec': False}¶
- MODEL_FILE = 'stwfsa_predictor.zip'¶
- STWFSA_PARAMETERS = {'concept_type_uri': <class 'str'>, 'expand_abbreviation_with_punctuation': <function boolean>, 'expand_ampersand_with_spaces': <function boolean>, 'extract_any_case_from_braces': <function boolean>, 'extract_upper_case_from_braces': <function boolean>, 'handle_title_case': <function boolean>, 'remove_deprecated': <function boolean>, 'simple_english_plural_rules': <function boolean>, 'sub_thesaurus_type_uri': <class 'str'>, 'thesaurus_relation_is_specialisation': <function boolean>, 'thesaurus_relation_type_uri': <class 'str'>, 'use_txt_vec': <class 'bool'>}¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'stwfsa'¶
- needs_subject_index = True¶
annif.backend.svc module¶
Annif backend using an SVM classifier
- class annif.backend.svc.SVCBackend(backend_id, config_params, project)¶
Bases:
annif.backend.mixins.TfidfVectorizerMixin, annif.backend.backend.AnnifBackend
Support vector classifier backend for Annif
- DEFAULT_PARAMETERS = {'min_df': 1, 'ngram': 1}¶
- MODEL_FILE = 'svc-model.gz'¶
- default_params()¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'svc'¶
- needs_subject_index = True¶
annif.backend.tfidf module¶
Backend that returns the most similar subjects based on similarity in a sparse, TF-IDF-normalized bag-of-words vector space
- class annif.backend.tfidf.SubjectBuffer(tempdir, subject_id)¶
Bases:
object
A file-backed buffer to store and retrieve subject text.
- BUFFER_SIZE = 100¶
- flush()¶
- read()¶
- write(text)¶
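A file-backed buffer with this interface could work roughly as sketched below: texts accumulate in memory and are appended to a per-subject file whenever BUFFER_SIZE items are pending. This is an illustrative stand-in, not the actual SubjectBuffer code; the class name BufferSketch, the file naming scheme, and the choice to flush inside read() are assumptions.

```python
import os
import tempfile


class BufferSketch:
    """Hypothetical stand-in for a file-backed subject text buffer."""

    BUFFER_SIZE = 100

    def __init__(self, tempdir, subject_id):
        # Assumed naming scheme: one text file per subject ID.
        self._path = os.path.join(tempdir, '{:08d}.txt'.format(subject_id))
        self._pending = []

    def write(self, text):
        self._pending.append(text)
        # Spill to disk once enough texts have accumulated in memory.
        if len(self._pending) >= self.BUFFER_SIZE:
            self.flush()

    def flush(self):
        # Append mode creates the file on first use and extends it afterwards.
        with open(self._path, 'a', encoding='utf-8') as handle:
            for text in self._pending:
                handle.write(text + '\n')
        self._pending = []

    def read(self):
        # Write out anything still in memory, then return the whole file.
        self.flush()
        with open(self._path, encoding='utf-8') as handle:
            return handle.read()


with tempfile.TemporaryDirectory() as tempdir:
    buf = BufferSketch(tempdir, subject_id=7)
    for i in range(150):
        buf.write('text {}'.format(i))
    pending_after_writes = len(buf._pending)
    contents = buf.read()
```

After 150 writes, the first 100 texts have already been flushed to disk and 50 remain in memory until read() is called.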
- class annif.backend.tfidf.TFIDFBackend(backend_id, config_params, project)¶
Bases:
annif.backend.mixins.TfidfVectorizerMixin, annif.backend.backend.AnnifBackend
TF-IDF vector space similarity based backend for Annif
- INDEX_FILE = 'tfidf-index'¶
- initialize(parallel=False)¶
This method can be overridden by backends. It should cause the backend to pre-load all data it needs during operation. If parallel is True, the backend should expect to be used for parallel operation.
- name = 'tfidf'¶
- needs_subject_index = True¶
annif.backend.yake module¶
Module contents¶
Registry of backend types for Annif
- annif.backend.get_backend(backend_id)¶
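A registry of backend types can be pictured as a mapping from each backend's name attribute to its class, with get_backend performing the lookup. The sketch below shows that pattern under stated assumptions: the register_backend decorator, the _backend_types dictionary, the DummySketch class, and the use of ValueError for unknown IDs are all hypothetical and not confirmed by this reference.

```python
_backend_types = {}  # hypothetical registry: backend name -> backend class


def register_backend(cls):
    # Index the class by its 'name' attribute, as the backends above define.
    _backend_types[cls.name] = cls
    return cls


def get_backend(backend_id):
    try:
        return _backend_types[backend_id]
    except KeyError:
        # Assumed error type for an unknown backend ID.
        raise ValueError('No such backend type {}'.format(backend_id))


@register_backend
class DummySketch:
    name = 'dummy'


backend_class = get_backend('dummy')
```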