annif package¶
Subpackages¶
- annif.analyzer package
- annif.backend package
- Submodules
- annif.backend.backend module
- annif.backend.dummy module
- annif.backend.ensemble module
- annif.backend.fasttext module
- annif.backend.http module
- annif.backend.hyperopt module
- annif.backend.mixins module
- annif.backend.mllm module
- annif.backend.nn_ensemble module
- annif.backend.omikuji module
- annif.backend.pav module
- annif.backend.stwfsa module
- annif.backend.svc module
- annif.backend.tfidf module
- annif.backend.yake module
- Module contents
- annif.corpus package
- annif.lexical package
- annif.transform package
Submodules¶
annif.cli module¶
Definitions for command-line (Click) commands for invoking Annif operations and printing the results to console.
- annif.cli.backend_param_option(f)¶
Decorator to add an option for CLI commands to override backend parameters
- annif.cli.common_options(f)¶
Decorator to add common options for all CLI commands
- annif.cli.generate_filter_batches(subjects)¶
- annif.cli.get_project(project_id)¶
Helper function to get a project by ID and bail out if it doesn’t exist
- annif.cli.open_documents(paths, docs_limit)¶
Helper function to open a document corpus from a list of pathnames, each of which is either a TSV file or a directory of TXT files. The corpus will be returned as an instance of DocumentCorpus or LimitingDocumentCorpus.
- annif.cli.parse_backend_params(backend_param, project)¶
Parse a list of backend parameters given with the --backend-param option into a nested dict structure
- annif.cli.set_project_config_file_path(ctx, param, value)¶
Override the default path or the path given in env by CLI option
- annif.cli.validate_backend_params(backend, beparam, project)¶
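The nested dict structure that parse_backend_params is described as returning can be illustrated with a minimal sketch. The `<backend>.<key>=<value>` syntax, the function name `parse_backend_params_sketch`, and the sample parameter names are assumptions made for illustration; validation against the project is omitted.

```python
from collections import defaultdict


def parse_backend_params_sketch(backend_param):
    """Turn ['tfidf.limit=100', ...] into {'tfidf': {'limit': '100'}, ...}.

    The '<backend>.<key>=<value>' syntax is an assumption of this sketch.
    """
    params = defaultdict(dict)
    for item in backend_param:
        backend, rest = item.split(".", 1)  # split off the backend ID
        key, value = rest.split("=", 1)     # values are kept as strings
        params[backend][key] = value
    return dict(params)


parsed = parse_backend_params_sketch(
    ["tfidf.limit=100", "fasttext.dim=200", "tfidf.min_df=2"]
)
```

Parameters that share a backend ID end up in the same inner dict, so repeated options accumulate per backend.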
annif.config module¶
Configuration file handling
- class annif.config.AnnifConfigCFG(filename)¶
Bases:
object
Class for reading configuration in CFG/INI format
- property project_ids¶
- class annif.config.AnnifConfigDirectory(directory)¶
Bases:
object
Class for reading configuration from directory
- property project_ids¶
- class annif.config.AnnifConfigTOML(filename)¶
Bases:
object
Class for reading configuration in TOML format
- property project_ids¶
- annif.config.check_config(projects_config_path)¶
- annif.config.find_config()¶
- annif.config.parse_config(projects_config_path)¶
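As a rough illustration of what a CFG/INI reader with a project_ids property could look like, the sketch below uses the standard-library configparser. The class name `IniProjectConfig`, the assumption that each INI section corresponds to one project, and the sample keys are hypothetical and not taken from Annif itself.

```python
import configparser
import os
import tempfile


class IniProjectConfig:
    """Hypothetical stand-in for a CFG/INI reader; not Annif's class."""

    def __init__(self, filename):
        self._config = configparser.ConfigParser()
        with open(filename, encoding="utf-8") as handle:
            self._config.read_file(handle)

    @property
    def project_ids(self):
        # Assumption: one INI section per project, section name = project ID
        return self._config.sections()


# Sample content with made-up project IDs and keys
sample = (
    "[tfidf-en]\nbackend = tfidf\nlanguage = en\n\n"
    "[dummy-fi]\nbackend = dummy\nlanguage = fi\n"
)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "projects.cfg")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    ids = IniProjectConfig(path).project_ids
```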
annif.datadir module¶
Mixin class for types that need a data directory
annif.default_config module¶
A configuration module, where “Config” is a default configuration and the other classes are different configuration profiles overriding default settings.
- class annif.default_config.Config¶
Bases:
object
- DATADIR = 'data'¶
- DEBUG = False¶
- INITIALIZE_PROJECTS = False¶
- PROJECTS_CONFIG_PATH = ''¶
- TESTING = False¶
- class annif.default_config.DevelopmentConfig¶
Bases:
annif.default_config.Config
- DEBUG = True¶
- class annif.default_config.ProductionConfig¶
Bases:
annif.default_config.Config
- INITIALIZE_PROJECTS = True¶
- class annif.default_config.TestingConfig¶
Bases:
annif.default_config.Config
- DATADIR = 'tests/data'¶
- PROJECTS_CONFIG_PATH = 'tests/projects.cfg'¶
- TESTING = True¶
- class annif.default_config.TestingDirectoryConfig¶
Bases:
annif.default_config.TestingConfig
- PROJECTS_CONFIG_PATH = 'tests/projects.d'¶
- class annif.default_config.TestingInitializeConfig¶
Bases:
annif.default_config.TestingConfig
- INITIALIZE_PROJECTS = True¶
- class annif.default_config.TestingInvalidProjectsConfig¶
Bases:
annif.default_config.TestingConfig
- PROJECTS_CONFIG_PATH = 'tests/projects_invalid.cfg'¶
- class annif.default_config.TestingNoProjectsConfig¶
Bases:
annif.default_config.TestingConfig
- PROJECTS_CONFIG_PATH = 'tests/notfound.cfg'¶
- class annif.default_config.TestingTOMLConfig¶
Bases:
annif.default_config.TestingConfig
- PROJECTS_CONFIG_PATH = 'tests/projects.toml'¶
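The profile mechanism relies on ordinary class inheritance: a profile only lists the settings it changes and inherits the rest. The sketch below copies three of the classes listed above; the helper `effective_settings` is hypothetical and only there to show the resolved values.

```python
class Config:
    DATADIR = "data"
    DEBUG = False
    INITIALIZE_PROJECTS = False
    PROJECTS_CONFIG_PATH = ""
    TESTING = False


class TestingConfig(Config):
    DATADIR = "tests/data"
    PROJECTS_CONFIG_PATH = "tests/projects.cfg"
    TESTING = True


class TestingTOMLConfig(TestingConfig):
    PROJECTS_CONFIG_PATH = "tests/projects.toml"


def effective_settings(profile):
    """Collect upper-case attributes, including inherited ones."""
    return {name: getattr(profile, name) for name in dir(profile) if name.isupper()}


settings = effective_settings(TestingTOMLConfig)
```

Here TestingTOMLConfig overrides only PROJECTS_CONFIG_PATH, takes DATADIR and TESTING from TestingConfig, and takes DEBUG and INITIALIZE_PROJECTS from Config.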
annif.eval module¶
Evaluation metrics for Annif
- class annif.eval.EvaluationBatch(subject_index)¶
Bases:
object
A class for evaluating batches of results using all available metrics. The evaluate() method is called once per document in the batch. Final results can be queried using the results() method.
- evaluate(hits, gold_subjects)¶
- output_result_per_subject(y_true, y_pred, results_file)¶
Write results per subject (non-aggregated) to the output file results_file
- results(metrics=[], results_file=None, warnings=False)¶
Evaluate a set of selected subjects against a gold standard using different metrics. If metrics is empty, use all available metrics. If results_file (a file object) is given, write results per subject to it.
- annif.eval.dcg_score(y_true, y_pred, limit=None)¶
return the discounted cumulative gain (DCG) score for the selected labels vs. relevant labels
- annif.eval.false_negatives(y_true, y_pred)¶
calculate the number of false negatives using bitwise operations, emulating the way sklearn evaluation metric functions work
- annif.eval.false_positives(y_true, y_pred)¶
calculate the number of false positives using bitwise operations, emulating the way sklearn evaluation metric functions work
- annif.eval.filter_pred_top_k(preds, limit)¶
filter a 2D prediction vector, retaining only the top K suggestions for each individual prediction; the rest will be set to zeros
- annif.eval.ndcg_score(y_true, y_pred, limit=None)¶
return the normalized discounted cumulative gain (nDCG) score for the selected labels vs. relevant labels
- annif.eval.precision_at_k_score(y_true, y_pred, limit)¶
calculate the precision at K, i.e. the proportion of relevant items among the top K predicted ones
- annif.eval.true_positives(y_true, y_pred)¶
calculate the number of true positives using bitwise operations, emulating the way sklearn evaluation metric functions work
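The metrics above can be sketched in plain Python for a single document. This is not the module's implementation: it uses lists instead of NumPy arrays, handles one row at a time, and assumes the common DCG discount of log2(rank + 2) with zero-scored subjects ignored. All function names are local to the sketch.

```python
import math


def true_positives(y_true, y_pred):
    # relevant AND predicted
    return sum(t & p for t, p in zip(y_true, y_pred))


def false_positives(y_true, y_pred):
    # NOT relevant AND predicted
    return sum((not t) & p for t, p in zip(y_true, y_pred))


def false_negatives(y_true, y_pred):
    # relevant AND NOT predicted
    return sum(t & (not p) for t, p in zip(y_true, y_pred))


def filter_top_k(scores, limit):
    # keep the `limit` highest scores, set the rest to zero
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:limit])
    return [s if i in keep else 0.0 for i, s in enumerate(scores)]


def precision_at_k(y_true, scores, limit):
    flags = [s > 0 for s in filter_top_k(scores, limit)]
    n_pred = sum(flags)
    return true_positives(y_true, flags) / n_pred if n_pred else 0.0


def dcg(y_true, scores, limit=None):
    # rank subjects with a non-zero score by descending score
    order = sorted(
        (i for i, s in enumerate(scores) if s > 0),
        key=lambda i: scores[i],
        reverse=True,
    )
    if limit is not None:
        order = order[:limit]
    return sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))


def ndcg(y_true, scores, limit=None):
    # normalize by the DCG of a perfect ranking
    ideal = dcg(y_true, [float(t) for t in y_true], limit)
    return dcg(y_true, scores, limit) / ideal if ideal > 0 else 0.0


y_true = [True, False, True, False]   # gold standard subjects
scores = [0.9, 0.8, 0.1, 0.0]         # predicted scores per subject
y_pred = [s > 0 for s in scores]

tp = true_positives(y_true, y_pred)
fp = false_positives(y_true, y_pred)
fn = false_negatives(y_true, y_pred)
p_at_2 = precision_at_k(y_true, scores, 2)
dcg_value = dcg(y_true, scores)
ndcg_value = ndcg(y_true, scores)
```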
annif.exception module¶
Custom exceptions used by Annif
- exception annif.exception.AnnifException(message, project_id=None, backend_id=None)¶
Bases:
click.exceptions.ClickException
Base Annif exception. We define this as a subclass of ClickException so that the CLI can automatically handle exceptions. This exception cannot be instantiated directly; subclasses should be used instead.
- format_message()¶
- prefix = None¶
- exception annif.exception.ConfigurationException(message, project_id=None, backend_id=None)¶
Bases:
annif.exception.AnnifException
Exception raised when a project or backend is misconfigured.
- prefix = 'Misconfigured'¶
- exception annif.exception.NotInitializedException(message, project_id=None, backend_id=None)¶
Bases:
annif.exception.AnnifException
Exception raised when attempting to use a project or backend that cannot be initialized, most likely because it lacks a vocabulary or has not been trained yet.
- prefix = "Couldn't initialize"¶
- exception annif.exception.NotSupportedException(message, project_id=None, backend_id=None)¶
Bases:
annif.exception.AnnifException
Exception raised when an operation is not supported by a project or backend.
- prefix = 'Not supported'¶
- exception annif.exception.OperationFailedException(message, project_id=None, backend_id=None)¶
Bases:
annif.exception.AnnifException
Exception raised when an operation fails for some unknown reason.
- prefix = 'Operation failed'¶
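The combination of a per-subclass prefix and a format_message() method can be sketched without Click as follows. The class names and the exact message layout are assumptions made for illustration; the real base class derives from click.exceptions.ClickException.

```python
class SketchAnnifError(Exception):
    """Hypothetical stand-in for a prefixed base exception (no Click dependency)."""

    prefix = None  # subclasses supply a human-readable prefix

    def __init__(self, message, project_id=None, backend_id=None):
        super().__init__(message)
        self.message = message
        self.project_id = project_id
        self.backend_id = backend_id

    def format_message(self):
        # Assumed layout: "<prefix> [project|backend '<id>']: <message>"
        if self.project_id is not None:
            return f"{self.prefix} project '{self.project_id}': {self.message}"
        if self.backend_id is not None:
            return f"{self.prefix} backend '{self.backend_id}': {self.message}"
        return f"{self.prefix}: {self.message}"


class SketchNotSupported(SketchAnnifError):
    prefix = "Not supported"


text = SketchNotSupported(
    "learning is not available", backend_id="tfidf"
).format_message()
```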
annif.parallel module¶
Parallel processing functionality for Annif
- class annif.parallel.BaseWorker¶
Bases:
object
Base class for workers that implement tasks executed via multiprocessing. The init method can be used to store data objects that are necessary for the operation. They will be stored in a class attribute that is accessible to the static worker method. The storage solution is inspired by this blog post: https://thelaziestprogrammer.com/python/multiprocessing-pool-a-global-solution
- args = None¶
- classmethod init(args)¶
- class annif.parallel.ProjectSuggestMap(registry, project_ids, backend_params, limit, threshold)¶
Bases:
object
A utility class that can be used to wrap one or more projects and provide a mapping method that converts Document objects to suggestions. Intended to be used with the multiprocessing module.
- suggest(doc)¶
- annif.parallel.get_pool(n_jobs)¶
return a suitable multiprocessing pool class, and the correct jobs argument for its constructor, for the given number of parallel jobs
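The storage pattern described for BaseWorker can be sketched like this: the pool initializer stores shared data in a class attribute, and a static worker method reads it. The subclass `ScaleWorker` is hypothetical. To keep the example runnable in any environment it uses the thread-backed `multiprocessing.dummy.Pool`, which has the same constructor arguments as a process pool.

```python
from multiprocessing.dummy import Pool  # thread-backed pool with the Pool API


class BaseWorker:
    args = None  # shared data, filled in by init() when a worker starts

    @classmethod
    def init(cls, args):
        cls.args = args


class ScaleWorker(BaseWorker):
    """Hypothetical worker that multiplies numbers by a stored factor."""

    @staticmethod
    def scale(value):
        (factor,) = ScaleWorker.args  # read the data stored by init()
        return value * factor


# initargs is a tuple of arguments for init(); here a single tuple (10,)
with Pool(2, initializer=ScaleWorker.init, initargs=((10,),)) as pool:
    results = pool.map(ScaleWorker.scale, [1, 2, 3])
```

With a process pool the initializer runs once in each worker process, so large objects are passed to each worker once instead of being sent along with every task.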
annif.project module¶
Project management functionality for Annif
- class annif.project.Access(value)¶
Bases:
enum.IntEnum
Enumeration of access levels for projects
- private = 1¶
- public = 3¶
- class annif.project.AnnifProject(project_id, config, datadir, registry)¶
Bases:
annif.datadir.DatadirMixin
Class representing the configuration of a single Annif project.
- DEFAULT_ACCESS = 'public'¶
- property analyzer¶
- property backend¶
- dump()¶
return this project as a dict
- hyperopt(corpus, trials, jobs, metric, results_file)¶
optimize the hyperparameters of the project using a validation corpus against a given metric
- initialize(parallel=False)¶
Initialize this project and its backend so that they are ready to be used. If parallel is True, expect that the project will be used for parallel processing.
- initialized = False¶
- property is_trained¶
- learn(corpus, backend_params=None)¶
further train the project using documents from a metadata source
- property modification_time¶
- remove_model_data()¶
remove the data of this project
- property subjects¶
- suggest(text, backend_params=None)¶
Suggest subjects for the given text by passing it to the backend. Returns a list of SubjectSuggestion objects ordered by decreasing score.
- train(corpus, backend_params=None, jobs=0)¶
train the project using documents from a metadata source
- property transform¶
- property vocab¶
annif.registry module¶
Registry that keeps track of Annif projects
- class annif.registry.AnnifRegistry(projects_config_path, datadir, init_projects)¶
Bases:
object
Class that keeps track of the Annif projects
- get_project(project_id, min_access=Access.private)¶
return the definition of a single Project by project_id
- get_projects(min_access=Access.private)¶
Return the available projects as a dict of project_id -> AnnifProject. The min_access parameter may be used to set the minimum access level required for the returned projects.
- annif.registry.get_project(project_id, min_access=Access.private)¶
return the definition of a single Project by project_id
- annif.registry.get_projects(min_access=Access.private)¶
Return the available projects as a dict of project_id -> AnnifProject. The min_access parameter may be used to set the minimum access level required for the returned projects.
- annif.registry.initialize_projects(app)¶
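Because Access is an IntEnum, a minimum access level can be enforced with a plain comparison. The sketch below uses the two values listed for annif.project.Access; the helper `visible_projects`, the project IDs, and the simplification of mapping IDs directly to access levels are assumptions for illustration.

```python
from enum import IntEnum


class Access(IntEnum):
    private = 1
    public = 3


def visible_projects(projects, min_access=Access.private):
    """projects: dict of project_id -> Access level (a simplification)."""
    return {pid: level for pid, level in projects.items() if level >= min_access}


projects = {"internal-test": Access.private, "yso-en": Access.public}
public_only = visible_projects(projects, min_access=Access.public)
everything = visible_projects(projects)
```

With the default min_access of Access.private every project passes the check, which matches the default shown in the signatures above.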
annif.rest module¶
Definitions for REST API operations. These are wired via Connexion to methods defined in the Swagger specification.
- annif.rest.learn(project_id, documents)¶
learn from documents and return an empty 204 response if successful
- annif.rest.list_projects()¶
return a dict with projects formatted according to Swagger spec
- annif.rest.project_not_found_error(project_id)¶
return a Connexion error object when a project is not found
- annif.rest.server_error(err)¶
return a Connexion error object when there is a server error (project or backend problem)
- annif.rest.show_project(project_id)¶
return a single project formatted according to Swagger spec
- annif.rest.suggest(project_id, text, limit, threshold)¶
suggest subjects for the given text and return a dict with results formatted according to Swagger spec
annif.suggestion module¶
Representing suggested subjects.
- class annif.suggestion.LazySuggestionResult(construct)¶
Bases:
annif.suggestion.SuggestionResult
SuggestionResult implementation that wraps another SuggestionResult which is initialized lazily only when it is actually accessed. Method calls will be proxied to the wrapped SuggestionResult.
- as_list(subject_index)¶
Return the hits as an ordered sequence of SubjectSuggestion objects, highest scores first.
- as_vector(subject_index, destination=None)¶
Return the hits as a one-dimensional score vector where the indexes match the given subject index. If destination array is given (not None) it will be used, otherwise a new array will be created.
- filter(subject_index, limit=None, threshold=0.0)¶
Return a subset of the hits, filtered by the given limit and score threshold, as another SuggestionResult object.
- class annif.suggestion.ListSuggestionResult(hits)¶
Bases:
annif.suggestion.SuggestionResult
SuggestionResult implementation based primarily on lists of hits.
- as_list(subject_index)¶
Return the hits as an ordered sequence of SubjectSuggestion objects, highest scores first.
- as_vector(subject_index, destination=None)¶
Return the hits as a one-dimensional score vector where the indexes match the given subject index. If destination array is given (not None) it will be used, otherwise a new array will be created.
- classmethod create_from_index(hits, subject_index)¶
- filter(subject_index, limit=None, threshold=0.0)¶
Return a subset of the hits, filtered by the given limit and score threshold, as another SuggestionResult object.
- class annif.suggestion.SubjectSuggestion(uri, label, notation, score)¶
Bases:
tuple
- label¶
Alias for field number 1
- notation¶
Alias for field number 2
- score¶
Alias for field number 3
- uri¶
Alias for field number 0
- class annif.suggestion.SuggestionFilter(subject_index, limit=None, threshold=0.0)¶
Bases:
object
A reusable filter for filtering SubjectSuggestion objects.
- class annif.suggestion.SuggestionResult¶
Bases:
object
Abstract base class for a set of hits returned by an analysis operation.
- abstract as_list(subject_index)¶
Return the hits as an ordered sequence of SubjectSuggestion objects, highest scores first.
- abstract as_vector(subject_index, destination=None)¶
Return the hits as a one-dimensional score vector where the indexes match the given subject index. If destination array is given (not None) it will be used, otherwise a new array will be created.
- abstract filter(subject_index, limit=None, threshold=0.0)¶
Return a subset of the hits, filtered by the given limit and score threshold, as another SuggestionResult object.
- class annif.suggestion.VectorSuggestionResult(vector)¶
Bases:
annif.suggestion.SuggestionResult
SuggestionResult implementation based primarily on NumPy vectors.
- as_list(subject_index)¶
Return the hits as an ordered sequence of SubjectSuggestion objects, highest scores first.
- as_vector(subject_index, destination=None)¶
Return the hits as a one-dimensional score vector where the indexes match the given subject index. If destination array is given (not None) it will be used, otherwise a new array will be created.
- filter(subject_index, limit=None, threshold=0.0)¶
Return a subset of the hits, filtered by the given limit and score threshold, as another SuggestionResult object.
- property subject_order¶
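The relationship between SubjectSuggestion tuples, filtering, and score vectors can be sketched with a named tuple using the field order given above (uri, label, notation, score). The helper functions, the example URIs, and the rule that a hit must score at least the threshold and above zero are assumptions of this sketch; a plain list of URIs stands in for the subject index.

```python
from collections import namedtuple

SubjectSuggestion = namedtuple(
    "SubjectSuggestion", ["uri", "label", "notation", "score"]
)


def filter_hits(hits, limit=None, threshold=0.0):
    """Return hits ordered by decreasing score, cut by threshold and limit."""
    ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)
    kept = [hit for hit in ranked if hit.score >= threshold and hit.score > 0.0]
    return kept if limit is None else kept[:limit]


def as_vector(hits, uris):
    """One score per position in `uris`; subjects without a hit get 0.0."""
    by_uri = {hit.uri: hit.score for hit in hits}
    return [by_uri.get(uri, 0.0) for uri in uris]


hits = [
    SubjectSuggestion("http://example.org/cats", "cats", None, 0.4),
    SubjectSuggestion("http://example.org/dogs", "dogs", None, 0.9),
    SubjectSuggestion("http://example.org/birds", "birds", None, 0.05),
]
top = filter_hits(hits, limit=2, threshold=0.1)
vector = as_vector(
    hits,
    [
        "http://example.org/birds",
        "http://example.org/cats",
        "http://example.org/dogs",
        "http://example.org/fish",
    ],
)
```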
annif.util module¶
Utility functions for Annif
- annif.util.atomic_save(obj, dirname, filename, method=None)¶
Save the given object (which must have a .save() method, unless the method parameter is given) into the given directory with the given filename, using a temporary file and renaming the temporary file to the final name.
- annif.util.boolean(val)¶
Convert the given value to a boolean True/False value, if it isn’t already. True values are ‘1’, ‘yes’, ‘true’, and ‘on’ (case insensitive), everything else is False.
- annif.util.cleanup_uri(uri)¶
remove angle brackets from a URI, if any
- annif.util.identity(x)¶
Identity function: return the given argument unchanged
- annif.util.merge_hits(weighted_hits, subject_index)¶
Merge hits from multiple sources. Input is a sequence of WeightedSuggestion objects. A SubjectIndex is needed to convert between subject IDs and URIs. Returns a SuggestionResult object.
- annif.util.metric_code(metric)¶
Convert a human-readable metric name into an alphanumeric string
- annif.util.parse_args(param_string)¶
Parse a string of comma separated arguments such as ‘42,43,key=abc’ into a list of positional args [42, 43] and a dict of keyword args {key: abc}
- annif.util.parse_sources(sourcedef)¶
parse a source definition such as ‘src1:1.0,src2’ into a sequence of tuples (src_id, weight)
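The string-handling helpers above can be approximated as follows. This is a sketch under stated assumptions, not the module's code: sources without an explicit weight default to 1.0, weights are left as written (the description does not say whether they are normalized), and parsed argument values stay strings. The function names are local to the sketch.

```python
def to_boolean(val):
    # Only '1', 'yes', 'true' and 'on' (any case) count as True
    if isinstance(val, bool):
        return val
    return str(val).lower() in {"1", "yes", "true", "on"}


def parse_sources_sketch(sourcedef):
    # 'src1:1.0,src2' -> [('src1', 1.0), ('src2', 1.0)]
    sources = []
    for part in sourcedef.strip().split(","):
        src_id, _, weight = part.strip().partition(":")
        sources.append((src_id, float(weight) if weight else 1.0))
    return sources


def parse_args_sketch(param_string):
    # '42,43,key=abc' -> (['42', '43'], {'key': 'abc'})
    posargs, kwargs = [], {}
    if not param_string:
        return posargs, kwargs
    for part in param_string.split(","):
        key, sep, value = part.partition("=")
        if sep:
            kwargs[key] = value
        else:
            posargs.append(part)
    return posargs, kwargs


sources = parse_sources_sketch("src1:1.0,src2")
posargs, kwargs = parse_args_sketch("42,43,key=abc")
```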
annif.views module¶
- annif.views.home()¶
annif.vocab module¶
Vocabulary management functionality for Annif
- class annif.vocab.AnnifVocabulary(vocab_id, datadir, language)¶
Bases:
annif.datadir.DatadirMixin
Class representing a subject vocabulary which can be used by multiple Annif projects.
- as_graph()¶
return the vocabulary as an rdflib graph
- as_skos_file()¶
return the vocabulary as a file object, in SKOS/Turtle syntax
- load_vocabulary(subject_corpus, language, force=False)¶
Load subjects from a subject corpus and save them into a SKOS/Turtle file for later use. If force=True, replace the existing vocabulary completely.
- property skos¶
return the subject vocabulary from SKOS file
- property subjects¶
Module contents¶
- annif.create_app(config_name=None)¶