annif.transform package

Submodules

annif.transform.inputlimiter module

A simple transformation that truncates the text of input documents to a given character length.

class annif.transform.inputlimiter.InputLimiter(project: AnnifProject | None, input_limit: str)

Bases: BaseTransform

name = 'limit'
transform_fn(text: str) → str

Perform the text transformation.
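
The truncation described for this module can be illustrated with a minimal, standalone sketch. The class `InputLimiterSketch` below is a hypothetical stand-in, not the Annif implementation; it assumes only what the signature above shows, namely that the limit arrives as a string and that the text is cut to that many characters. The check for negative limits is also an assumption.

```python
class InputLimiterSketch:
    """Hypothetical stand-in for a truncating transform (not Annif's class)."""

    name = "limit"

    def __init__(self, input_limit: str):
        # The limit is given as a string, so convert it before use.
        self.input_limit = int(input_limit)
        # Assumption: a negative limit is treated as a configuration error.
        if self.input_limit < 0:
            raise ValueError("input_limit must not be negative")

    def transform_fn(self, text: str) -> str:
        # Keep at most input_limit characters from the start of the text.
        return text[: self.input_limit]


limiter = InputLimiterSketch("5")
truncated = limiter.transform_fn("hello world")
print(truncated)
```

Text that is already shorter than the limit passes through unchanged, because slicing past the end of a Python string is not an error.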

annif.transform.langfilter module

Transformation filtering out parts of a text that are in a language different from the language of the project.

class annif.transform.langfilter.LangFilter(project: AnnifProject, text_min_length: int | str = 500, sentence_min_length: int | str = 50, min_ratio: float = 0.5)

Bases: BaseTransform

name = 'filter_lang'
transform_fn(text: str) → str

Perform the text transformation.
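
A rough sketch of sentence-level language filtering follows. It is not the Annif implementation, and the exact meaning of the three parameters is not stated in this reference. The sketch assumes that texts shorter than `text_min_length` pass through unchanged, that sentences shorter than `sentence_min_length` are kept without detection, and that `min_ratio` is a threshold on a language score between 0 and 1. The function `filter_lang_sketch`, the naive sentence splitter, and the toy scorer are all hypothetical.

```python
import re
from typing import Callable


def filter_lang_sketch(
    text: str,
    target_score: Callable[[str], float],
    text_min_length: int = 500,
    sentence_min_length: int = 50,
    min_ratio: float = 0.5,
) -> str:
    # Assumption: short texts are returned unchanged.
    if len(text) < text_min_length:
        return text
    kept = []
    # Naive sentence splitter (assumption): break after ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Assumption: short sentences are kept without language detection.
        if len(sentence) < sentence_min_length:
            kept.append(sentence)
        elif target_score(sentence) >= min_ratio:
            kept.append(sentence)
    return " ".join(kept)


# Toy scorer for illustration only: fraction of words in a tiny word list.
ENGLISH_WORDS = {"the", "cat", "sat", "on", "mat"}


def toy_english_score(sentence: str) -> float:
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(word in ENGLISH_WORDS for word in words) / len(words)


sample = "The cat sat on the mat. Kissa istui matolla."
filtered = filter_lang_sketch(
    sample, toy_english_score, text_min_length=10, sentence_min_length=5
)
print(filtered)
```

In practice the scorer would be backed by a real language detection library; the injected callable keeps the sketch independent of any particular one.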

annif.transform.transform module

Common functionality for transforming text of input documents.

class annif.transform.transform.BaseTransform(project: AnnifProject | None)

Bases: object

Base class for text transformations, which need to implement the transform function.

name = None
abstract transform_fn(text)

Perform the text transformation.
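
To illustrate the contract described above, here is a minimal sketch of an abstract base class with one concrete subclass. `BaseTransformSketch` and `LowercaseTransform` are hypothetical names used for illustration; the sketch assumes the base class stores the project and leaves `transform_fn` abstract, which is all this reference states.

```python
import abc


class BaseTransformSketch(abc.ABC):
    """Hypothetical stand-in for a transform base class."""

    name = None

    def __init__(self, project=None):
        self.project = project

    @abc.abstractmethod
    def transform_fn(self, text):
        """Perform the text transformation."""


class LowercaseTransform(BaseTransformSketch):
    """Illustrative subclass; not part of Annif."""

    name = "lower"

    def transform_fn(self, text: str) -> str:
        return text.lower()


lowered = LowercaseTransform().transform_fn("Hello World")
print(lowered)
```

Because `transform_fn` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement the method can be created.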

class annif.transform.transform.IdentityTransform(project: AnnifProject | None)

Bases: BaseTransform

Transform that does not modify text but simply passes it through.

name = 'pass'
transform_fn(text: str) → str

Perform the text transformation.

class annif.transform.transform.TransformChain(transform_classes: list[Type[BaseTransform]], args: list[tuple[list, dict]], project: AnnifProject | None)

Bases: object

Class instantiating and holding the transformation objects performing the actual text transformation.

transform_corpus(corpus: DocumentCorpus) → TransformingDocumentCorpus
transform_text(text: str) → str
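
The way a chain instantiates its transforms and applies them can be sketched as follows. This is an illustration under stated assumptions, not the Annif implementation: it assumes each transform class takes the project as its first constructor argument (as in the signatures in this reference), that each entry in `args` is a (positional arguments, keyword arguments) pair, and that transforms are applied in list order. All class names below are hypothetical.

```python
class StripSketch:
    """Hypothetical transform that trims surrounding whitespace."""

    def __init__(self, project):
        self.project = project

    def transform_fn(self, text: str) -> str:
        return text.strip()


class LimitSketch:
    """Hypothetical transform that truncates text to a character limit."""

    def __init__(self, project, input_limit):
        self.project = project
        self.input_limit = int(input_limit)

    def transform_fn(self, text: str) -> str:
        return text[: self.input_limit]


class TransformChainSketch:
    """Instantiates the given classes and applies them in order."""

    def __init__(self, transform_classes, args, project=None):
        # Pair each class with its (posargs, kwargs) tuple and instantiate it.
        self.transforms = [
            cls(project, *posargs, **kwargs)
            for cls, (posargs, kwargs) in zip(transform_classes, args)
        ]

    def transform_text(self, text: str) -> str:
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            text = transform.transform_fn(text)
        return text


chain = TransformChainSketch([StripSketch, LimitSketch], [([], {}), (["5"], {})])
result = chain.transform_text("  hello world  ")
print(result)
```

Order matters here: stripping first and truncating second yields a different result than truncating the padded text and stripping afterwards.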

Module contents

Functionality for obtaining a text transformation from a string specification.

annif.transform.get_transform(transform_specs: str, project: AnnifProject | None) → TransformChain
annif.transform.parse_specs(transform_specs: str) → list[tuple[str, list, dict]]

Parse a transformation specification string into a list of (name, positional arguments, keyword arguments) tuples. For example, 'transf_1(x),transf_2(y=42),transf_3' is parsed to [(transf_1, [x], {}), (transf_2, [], {y: 42}), (transf_3, [], {})].
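
The parsing described above can be approximated with a short regular-expression sketch. The function `parse_specs_sketch` is hypothetical and is not the Annif implementation. It assumes that names, positional arguments, and keyword values all stay plain strings with no type conversion, and that a malformed part raises `ValueError`.

```python
import re


def parse_specs_sketch(transform_specs: str) -> list[tuple[str, list, dict]]:
    parsed = []
    # Split on commas that are not inside parentheses.
    parts = re.split(r",\s*(?![^()]*\))", transform_specs)
    for part in parts:
        # A part is a name, optionally followed by a parenthesized argument list.
        match = re.fullmatch(r"\s*(\w+)(?:\((.*)\))?\s*", part)
        if match is None:
            raise ValueError(f"Invalid transform specification: {part!r}")
        name, arg_string = match.group(1), match.group(2)
        posargs, kwargs = [], {}
        if arg_string:
            for arg in arg_string.split(","):
                key, sep, value = arg.partition("=")
                if sep:
                    # key=value becomes a keyword argument (value kept as str).
                    kwargs[key.strip()] = value.strip()
                else:
                    posargs.append(arg.strip())
        parsed.append((name, posargs, kwargs))
    return parsed


specs = parse_specs_sketch("transf_1(x),transf_2(y=42),transf_3")
print(specs)
```

Leaving argument values as strings defers conversion to each transform's constructor, which fits signatures such as `input_limit: str` and `int | str` elsewhere in this reference.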