annif.transform package
Submodules
annif.transform.inputlimiter module
A simple transformation that truncates the text of input documents to a given character length.
- class annif.transform.inputlimiter.InputLimiter(project: AnnifProject | None, input_limit: str)
Bases: BaseTransform
- name = 'limit'
- transform_fn(text: str) → str
Perform the text transformation.
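The truncation that InputLimiter performs can be approximated in a few lines. This is a standalone sketch, not Annif's source. The class name `InputLimiterSketch` is hypothetical. Converting `input_limit` from `str` to `int` and rejecting negative values are assumptions based on the `input_limit: str` annotation.

```python
class InputLimiterSketch:
    """Hypothetical stand-in for a character-limit transform (not Annif's code)."""

    name = "limit"

    def __init__(self, project, input_limit: str) -> None:
        # Assumption: the limit arrives as a string (e.g. from "limit(5000)"),
        # so convert it to int and reject negative values.
        self.project = project
        self.input_limit = int(input_limit)
        if self.input_limit < 0:
            raise ValueError("input_limit must be a non-negative integer")

    def transform_fn(self, text: str) -> str:
        # Keep only the first input_limit characters of the text.
        return text[: self.input_limit]


limiter = InputLimiterSketch(project=None, input_limit="5")
print(limiter.transform_fn("abcdefghij"))  # prints "abcde"
```

Slicing never raises when the limit exceeds the text length, so short texts pass through unchanged.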
annif.transform.langfilter module
A transformation that filters out the parts of a text written in a language other than the project's language.
- class annif.transform.langfilter.LangFilter(project: AnnifProject, text_min_length: int | str = 500, sentence_min_length: int | str = 50, min_ratio: float = 0.5)
Bases: BaseTransform
- name = 'filter_lang'
- transform_fn(text: str) → str
Perform the text transformation.
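The filtering idea can be sketched as follows, under several assumptions. The function `filter_lang` is hypothetical. Sentence splitting uses a naive regular expression. Language detection is a caller-supplied callable that returns the share of a sentence written in the target language. The roles of the three thresholds are inferred from the parameter names rather than taken from Annif's implementation: texts shorter than `text_min_length` are passed through, sentences shorter than `sentence_min_length` are kept unchecked, and longer sentences are kept only if their share reaches `min_ratio`.

```python
import re
from typing import Callable

# Assumed detector signature: (sentence, language) -> share in [0.0, 1.0].
Detector = Callable[[str, str], float]


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def filter_lang(
    text: str,
    lang: str,
    detect: Detector,
    text_min_length: int = 500,
    sentence_min_length: int = 50,
    min_ratio: float = 0.5,
) -> str:
    # Short texts are returned unchanged; detection is unreliable on them.
    if len(text) < text_min_length:
        return text
    kept = []
    for sentence in split_sentences(text):
        # Short sentences are kept without detection; longer ones are kept
        # only if enough of the sentence is in the target language.
        if len(sentence) < sentence_min_length or detect(sentence, lang) >= min_ratio:
            kept.append(sentence)
    return " ".join(kept)


def toy_detect(sentence: str, lang: str) -> float:
    # Toy detector for the demo only: sentences containing "the" count as English.
    return 1.0 if lang == "en" and " the " in f" {sentence.lower()} " else 0.0


text = "The cat sat on the mat. Der Hund schläft. The dog sleeps on the sofa."
print(filter_lang(text, "en", toy_detect, text_min_length=10, sentence_min_length=5))
# prints "The cat sat on the mat. The dog sleeps on the sofa."
```

In a real setting `toy_detect` would be replaced by a proper language-detection library.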
annif.transform.transform module
Common functionality for transforming the text of input documents.
- class annif.transform.transform.BaseTransform(project: AnnifProject | None)
Bases: object
Base class for text transformations. Subclasses must implement transform_fn.
- name = None
- abstract transform_fn(text)
Perform the text transformation.
- class annif.transform.transform.IdentityTransform(project: AnnifProject | None)
Bases: BaseTransform
Transform that does not modify text but simply passes it through.
- name = 'pass'
- transform_fn(text: str) → str
Perform the text transformation.
- class annif.transform.transform.TransformChain(transform_classes: list[Type[BaseTransform]], args: list[tuple[list, dict]], project: AnnifProject | None)
Bases: object
Instantiates and holds the transformation objects that perform the actual text transformation.
- transform_corpus(corpus: DocumentCorpus) → TransformingDocumentCorpus
- transform_text(text: str) → str
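The classes in this module fit together in a simple pattern: each transform subclasses a base class, sets a `name`, and implements `transform_fn`. A chain instantiates each class with the project plus its own arguments and pipes the text through them in order. The sketch below illustrates that pattern under stated assumptions. It is not Annif's source. `BaseTransformSketch`, `UpperTransform`, `LimitTransform` and `TransformChainSketch` are hypothetical names. Passing the project as the first constructor argument follows the signatures above, and the `(positional args, keyword args)` pairing follows the `args: list[tuple[list, dict]]` parameter.

```python
import abc
from typing import Any


class BaseTransformSketch(abc.ABC):
    """Hypothetical transform base class; subclasses implement transform_fn."""

    name = None  # the name used to refer to the transform in a spec string

    def __init__(self, project: Any) -> None:
        self.project = project

    @abc.abstractmethod
    def transform_fn(self, text: str) -> str:
        """Perform the text transformation."""


class UpperTransform(BaseTransformSketch):
    # Hypothetical transform, used only for this demo.
    name = "upper"

    def transform_fn(self, text: str) -> str:
        return text.upper()


class LimitTransform(BaseTransformSketch):
    # Hypothetical character-limit transform; its argument arrives as a string.
    name = "limit"

    def __init__(self, project: Any, input_limit: str) -> None:
        super().__init__(project)
        self.input_limit = int(input_limit)

    def transform_fn(self, text: str) -> str:
        return text[: self.input_limit]


class TransformChainSketch:
    """Instantiates each transform class with its arguments and applies them in order."""

    def __init__(
        self,
        transform_classes: list[type[BaseTransformSketch]],
        args: list[tuple[list, dict]],
        project: Any,
    ) -> None:
        self.project = project
        # Pair each class with its (positional args, keyword args) tuple;
        # every transform receives the project as its first argument.
        self.transforms = [
            cls(project, *posargs, **kwargs)
            for cls, (posargs, kwargs) in zip(transform_classes, args)
        ]

    def transform_text(self, text: str) -> str:
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            text = transform.transform_fn(text)
        return text


chain = TransformChainSketch(
    [UpperTransform, LimitTransform], [([], {}), (["5"], {})], project=None
)
print(chain.transform_text("hello world"))  # prints "HELLO"
```

Because `transform_fn` is abstract, instantiating the base class directly raises `TypeError`, which enforces that every concrete transform supplies its own implementation.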
Module contents
Functionality for obtaining a text transformation from a string specification.
- annif.transform.get_transform(transform_specs: str, project: AnnifProject | None) → TransformChain
- annif.transform.parse_specs(transform_specs: str) → list[tuple[str, list, dict]]
Parse a transformation specification into a list of tuples, e.g. ‘transf_1(x),transf_2(y=42),transf_3’ is parsed to [(transf_1, [x], {}), (transf_2, [], {y: 42}), (transf_3, [], {})].
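The two functions above can be approximated as follows. One plausible reading of their signatures is this: the spec string is parsed into `(name, posargs, kwargs)` tuples, and each name is looked up in a registry keyed by the transforms' `name` attributes. The result is two parallel lists matching `TransformChain`'s `transform_classes` and `args` parameters. This is a sketch, not Annif's source. `parse_specs_sketch`, `resolve_specs`, `REGISTRY` and the placeholder classes are hypothetical. Unlike the `{y: 42}` shown in the docstring, this sketch keeps every argument value as a string and leaves conversion to each transform, which is consistent with parameter types such as `input_limit: str`.

```python
import re


class PassTransform:
    name = "pass"  # placeholder; a real transform would also define transform_fn


class LimitTransform:
    name = "limit"  # placeholder; a real transform would also define transform_fn


# Hypothetical registry mapping each transform's name to its class.
REGISTRY = {cls.name: cls for cls in (PassTransform, LimitTransform)}


def _split_top_level(specs: str) -> list[str]:
    # Split on commas that are not inside parentheses.
    parts, current, depth = [], [], 0
    for ch in specs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_specs_sketch(transform_specs: str) -> list[tuple[str, list, dict]]:
    parsed = []
    for part in _split_top_level(transform_specs):
        match = re.fullmatch(r"(\w+)\s*(?:\((.*)\))?", part)
        if match is None:
            raise ValueError(f"Invalid transform spec: {part!r}")
        name, argstr = match.group(1), match.group(2) or ""
        posargs, kwargs = [], {}
        for arg in (a.strip() for a in argstr.split(",") if a.strip()):
            if "=" in arg:
                key, value = arg.split("=", 1)
                kwargs[key.strip()] = value.strip()  # values stay strings
            else:
                posargs.append(arg)
        parsed.append((name, posargs, kwargs))
    return parsed


def resolve_specs(transform_specs: str) -> tuple[list[type], list[tuple[list, dict]]]:
    # Look up each parsed name; unknown names are rejected.
    transform_classes, args = [], []
    for name, posargs, kwargs in parse_specs_sketch(transform_specs):
        if name not in REGISTRY:
            raise ValueError(f"No such transform: {name!r}")
        transform_classes.append(REGISTRY[name])
        args.append((posargs, kwargs))
    return transform_classes, args


print(parse_specs_sketch("transf_1(x),transf_2(y=42),transf_3"))
# prints [('transf_1', ['x'], {}), ('transf_2', [], {'y': '42'}), ('transf_3', [], {})]
classes, args = resolve_specs("pass,limit(5000)")
print([cls.name for cls in classes], args)
# prints ['pass', 'limit'] [([], {}), (['5000'], {})]
```

Splitting only on top-level commas lets a single transform take several arguments, e.g. `filter_lang(text_min_length=100,min_ratio=0.4)`, without the inner commas being mistaken for separators between transforms.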