annif.vocab package
Submodules
annif.vocab.rules module
Support for exclude/include rules for subject vocabularies
- annif.vocab.rules.add_uris(graph: Graph, uris_func: callable, uris_set: set[str], vals: list[str], action: str) -> None
- annif.vocab.rules.kwargs_to_exclude_uris(vocab: AnnifVocabulary, kwargs: dict[str, str]) -> set[str]
- annif.vocab.rules.remove_uris(graph: Graph, uris_func: callable, uris_set: set[str], vals: list[str], action: str) -> None
- annif.vocab.rules.resolve_uri_or_curie(graph: Graph, value: str) -> URIRef
- annif.vocab.rules.uris_by_collection(graph: Graph, collection: str, action: str) -> list[str]
- annif.vocab.rules.uris_by_scheme(graph: Graph, scheme: str, action: str) -> list[str]
- annif.vocab.rules.uris_by_type(graph: Graph, type_: str, action: str) -> list[str]
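As a rough illustration of what resolving a "URI or CURIE" value can mean, the sketch below expands a compact URI such as `skos:Concept` against a prefix table and passes full URIs through unchanged. It is not Annif's implementation: the `PREFIXES` table and the `expand_uri_or_curie` helper are hypothetical, and the documented `resolve_uri_or_curie` takes an rdflib `Graph` and returns a `URIRef` rather than a plain string.

```python
# Hypothetical prefix table; a real graph would supply its own namespace bindings.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_uri_or_curie(value: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Return a full URI for either a full URI or a prefix:local CURIE."""
    prefix, sep, local = value.partition(":")
    # A known prefix followed by a local name is treated as a CURIE.
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Anything else (for example an http:// URI) is returned unchanged.
    return value
```

With this helper, `expand_uri_or_curie("skos:Concept")` yields the full SKOS URI, while a value that already starts with `http://` comes back as-is because `http` is not a registered prefix.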
annif.vocab.skos module
Classes for supporting vocabulary files in SKOS/RDF format
- class annif.vocab.skos.VocabFileSKOS(path: str)
Bases: VocabSource
A subject corpus that uses SKOS files.
- PREF_LABEL_PROPERTIES = (rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'))
- property concepts: Iterator[URIRef]
- get_concept_labels(concept: URIRef, label_types: Sequence[URIRef]) -> collections.defaultdict[str | None, list[str]]
Return all the labels of the given concept with the given label properties as a dict-like object, where the keys are language codes and the values are lists of labels in that language.
- static is_rdf_file(path: str) -> bool
Return True if the path looks like an RDF file that can be loaded as SKOS.
- property languages: set[str]
Provide the set of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- annif.vocab.skos.serialize_subjects_to_skos(subjects: Iterator, path: str) -> None
Create a SKOS representation of the given subjects and serialize it into a SKOS/Turtle file with the given path name.
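The return shape of `get_concept_labels` (a language code mapped to a list of labels, with `None` as a possible key) can be pictured with plain Python data. The sketch below is only an illustration of that shape: the `group_labels_by_language` helper and the `(text, language)` input pairs are hypothetical stand-ins for rdflib literals, not part of Annif.

```python
from collections import defaultdict
from typing import Iterable, Optional


def group_labels_by_language(
    labels: Iterable[tuple[str, Optional[str]]],
) -> defaultdict:
    """Group (text, language) pairs into {language: [text, ...]}."""
    by_lang = defaultdict(list)
    for text, lang in labels:
        # Labels without a language tag are collected under the key None.
        by_lang[lang].append(text)
    return by_lang


grouped = group_labels_by_language(
    [("cat", "en"), ("kissa", "fi"), ("feline", "en"), ("Felis catus", None)]
)
```

Because the result is a `defaultdict`, looking up a language that has no labels gives an empty list instead of raising `KeyError`.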
annif.vocab.subject_file module
Classes for supporting vocabulary files in CSV or TSV format
- class annif.vocab.subject_file.VocabFileCSV(path: str)
Bases: VocabSource
A multilingual subject vocabulary stored in a CSV file.
- static is_csv_file(path: str) -> bool
Return True if the path looks like a CSV file.
- property languages: list[str]
Provide a list of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- property subjects: Generator
Iterate through the vocabulary, yielding Subject objects.
- class annif.vocab.subject_file.VocabFileTSV(path: str, language: str)
Bases: VocabSource
A monolingual subject vocabulary stored in a TSV file.
- property languages: list[str]
Provide a list of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- property subjects: Generator
Iterate through the vocabulary, yielding Subject objects.
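To make the idea of a monolingual TSV source that yields Subject objects more concrete, here is a minimal sketch. It assumes a file layout that this page does not spell out: one subject per line, with a URI (optionally wrapped in angle brackets), a tab, a label, and an optional third column holding a notation. The `read_tsv_subjects` helper and the local `Subject` tuple are stand-ins written for this example, not Annif code.

```python
import csv
import io
from collections import namedtuple

# Stand-in for annif.vocab.types.Subject (same field order: uri, labels, notation).
Subject = namedtuple("Subject", "uri labels notation")


def read_tsv_subjects(stream, language):
    """Yield Subject tuples from tab-separated rows: URI, label, optional notation."""
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        uri = row[0].strip().strip("<>")  # tolerate <...> around the URI
        notation = row[2].strip() if len(row) > 2 and row[2].strip() else None
        # A monolingual source has exactly one label language per subject.
        yield Subject(uri=uri, labels={language: row[1].strip()}, notation=notation)


sample = io.StringIO(
    "<http://example.org/subj1>\tcat\n"
    "<http://example.org/subj2>\tdog\tD1\n"
)
subjects = list(read_tsv_subjects(sample, "en"))
```

The language has to be passed in separately because nothing in such a file records it, which matches the extra `language` argument in the `VocabFileTSV` constructor.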
annif.vocab.subject_index module
Subject index functionality for Annif
- class annif.vocab.subject_index.SubjectIndexFile
Bases: SubjectIndex
SubjectIndex implementation backed by a file.
- property active: list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- contains_uri(uri: str) -> bool
- property languages: list[str] | None
- classmethod load(path: str) -> SubjectIndex
Load a subject index from a CSV file and return it.
- load_subjects(vocab_source: VocabSource) -> None
Initialize the subject index from a subject corpus.
- save(path: str) -> None
Save this subject index into a file with the given path name.
- class annif.vocab.subject_index.SubjectIndexFilter(subject_index: SubjectIndex, exclude: set[str])
Bases: SubjectIndex
SubjectIndex implementation that filters another SubjectIndex based on a set of subject URIs to exclude.
- property active: list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- contains_uri(uri: str) -> bool
- property languages: list[str] | None
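The relationship between a file-backed index and a filtering wrapper can be sketched with two small stand-in classes. This is an illustration of the wrapping pattern only, under the assumption that excluded URIs behave as if they were not found; `ListIndex` and `ExcludingIndex` are hypothetical names and do not reproduce the Annif classes.

```python
import logging
from collections import namedtuple

logger = logging.getLogger(__name__)
Subject = namedtuple("Subject", "uri labels notation")


class ListIndex:
    """Toy index: the subject ID is the position in the list."""

    def __init__(self, subjects):
        self._subjects = list(subjects)
        self._uri_to_id = {s.uri: i for i, s in enumerate(self._subjects)}

    @property
    def active(self):
        return list(enumerate(self._subjects))

    def by_uri(self, uri, warnings=True):
        subject_id = self._uri_to_id.get(uri)
        if subject_id is None and warnings:
            logger.warning("Unknown subject URI <%s>", uri)
        return subject_id


class ExcludingIndex:
    """Wrap another index and hide the subjects whose URIs are excluded."""

    def __init__(self, index, exclude):
        self._index = index
        self._exclude = set(exclude)

    @property
    def active(self):
        return [(i, s) for i, s in self._index.active if s.uri not in self._exclude]

    def by_uri(self, uri, warnings=True):
        if uri in self._exclude:
            return None  # excluded URIs behave as if they were not found
        return self._index.by_uri(uri, warnings)


base = ListIndex(
    [
        Subject("http://example.org/a", {"en": "a"}, None),
        Subject("http://example.org/b", {"en": "b"}, None),
    ]
)
filtered = ExcludingIndex(base, {"http://example.org/a"})
```

In this sketch the remaining subjects keep the IDs they had in the wrapped index, so anything keyed by subject ID stays valid after filtering.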
annif.vocab.types module
Type declarations for vocabulary functionality
- class annif.vocab.types.Subject(uri, labels, notation)
Bases: tuple
- labels
Alias for field number 1
- notation
Alias for field number 2
- uri
Alias for field number 0
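The field numbering above is what a named tuple with the fields `uri`, `labels`, `notation` produces. The sketch below rebuilds such a tuple locally with `collections.namedtuple` rather than importing it from Annif; the shape of `labels` (a dict from language code to label) is an assumption made for the example.

```python
from collections import namedtuple

# Stand-in with the documented field order; not imported from Annif.
Subject = namedtuple("Subject", ["uri", "labels", "notation"])

subject = Subject(
    uri="http://example.org/subj1",
    labels={"en": "cat", "fi": "kissa"},  # assumed shape: language code -> label
    notation=None,
)

# Field access by name and by position refer to the same values.
uri, labels, notation = subject
```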
- class annif.vocab.types.SubjectIndex
Bases: object
Base class for an index that remembers the associations between integer subject IDs and their URIs and labels.
- abstract active() -> list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- abstract by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- abstract by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- abstract contains_uri(uri: str) -> bool
- abstract property languages: list[str] | None
- class annif.vocab.types.VocabSource
Bases: object
Abstract base class for vocabulary sources.
- abstract property languages
Provide a list of language codes supported by this vocabulary source.
- abstract save_skos(path)
Save the contents of the vocabulary source into a SKOS/Turtle file with the given path name.
- abstract property subjects
Iterate through the vocabulary, yielding Subject objects.
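A custom vocabulary source would need to supply the three abstract members listed above. The following sketch shows the pattern with Python's `abc` module; `VocabSourceSketch` and `InMemoryVocab` are hypothetical classes written for this example, and SKOS serialization is deliberately left unimplemented.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

Subject = namedtuple("Subject", "uri labels notation")


class VocabSourceSketch(ABC):
    """Stand-in mirroring the three abstract members listed above."""

    @property
    @abstractmethod
    def languages(self): ...

    @property
    @abstractmethod
    def subjects(self): ...

    @abstractmethod
    def save_skos(self, path): ...


class InMemoryVocab(VocabSourceSketch):
    """Hypothetical source that keeps its subjects in a list."""

    def __init__(self, subjects):
        self._subjects = list(subjects)

    @property
    def languages(self):
        # Collect every language code used in any label dict.
        return sorted({lang for s in self._subjects for lang in s.labels})

    @property
    def subjects(self):
        yield from self._subjects

    def save_skos(self, path):
        raise NotImplementedError("SKOS serialization is out of scope for this sketch")


vocab = InMemoryVocab(
    [Subject("http://example.org/subj1", {"en": "cat", "fi": "kissa"}, None)]
)
```

Instantiating `VocabSourceSketch` directly raises `TypeError`, which is how the abstract base enforces that a concrete source defines all three members.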
annif.vocab.vocab module
Vocabulary management functionality for Annif
- class annif.vocab.vocab.AnnifVocabulary(vocab_id: str, datadir: str)
Bases: DatadirMixin
Class representing a subject vocabulary that can be used by multiple Annif projects.
- INDEX_FILENAME_CSV = 'subjects.csv'
- INDEX_FILENAME_DUMP = 'subjects.dump.gz'
- INDEX_FILENAME_TTL = 'subjects.ttl'
- as_graph() -> Graph
Return the vocabulary as an rdflib graph.
- dump() -> dict[str, str | list | int | bool]
Return this vocabulary as a dict.
- property languages: list[str]
- load_vocabulary(vocab_source: VocabSource, force: bool = False) -> None
Load subjects from a subject corpus and save them into one or more subject index files as well as a SKOS/Turtle file for later use. If force=True, replace the existing subject index completely.
- property skos: VocabFileSKOS
Return the subject vocabulary from the SKOS file.
- property subjects: SubjectIndex
Module contents
Annif vocabulary functionality
- class annif.vocab.AnnifVocabulary(vocab_id: str, datadir: str)
Bases: DatadirMixin
Class representing a subject vocabulary that can be used by multiple Annif projects.
- INDEX_FILENAME_CSV = 'subjects.csv'
- INDEX_FILENAME_DUMP = 'subjects.dump.gz'
- INDEX_FILENAME_TTL = 'subjects.ttl'
- as_graph() -> Graph
Return the vocabulary as an rdflib graph.
- dump() -> dict[str, str | list | int | bool]
Return this vocabulary as a dict.
- property languages: list[str]
- load_vocabulary(vocab_source: VocabSource, force: bool = False) -> None
Load subjects from a subject corpus and save them into one or more subject index files as well as a SKOS/Turtle file for later use. If force=True, replace the existing subject index completely.
- property skos: VocabFileSKOS
Return the subject vocabulary from the SKOS file.
- property subjects: SubjectIndex
- class annif.vocab.Subject(uri, labels, notation)
Bases: tuple
- labels
Alias for field number 1
- notation
Alias for field number 2
- uri
Alias for field number 0
- class annif.vocab.SubjectIndex
Bases: object
Base class for an index that remembers the associations between integer subject IDs and their URIs and labels.
- abstract active() -> list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- abstract by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- abstract by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- abstract contains_uri(uri: str) -> bool
- abstract property languages: list[str] | None
- class annif.vocab.SubjectIndexFile
Bases: SubjectIndex
SubjectIndex implementation backed by a file.
- property active: list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- contains_uri(uri: str) -> bool
- property languages: list[str] | None
- classmethod load(path: str) -> SubjectIndex
Load a subject index from a CSV file and return it.
- load_subjects(vocab_source: VocabSource) -> None
Initialize the subject index from a subject corpus.
- save(path: str) -> None
Save this subject index into a file with the given path name.
- class annif.vocab.SubjectIndexFilter(subject_index: SubjectIndex, exclude: set[str])
Bases: SubjectIndex
SubjectIndex implementation that filters another SubjectIndex based on a set of subject URIs to exclude.
- property active: list[tuple[int, Subject]]
Return a list of (subject_id, Subject) tuples of all subjects that are available for use.
- by_label(label: str | None, language: str) -> int | None
Return the subject ID of a subject by its label in a given language.
- by_uri(uri: str, warnings: bool = True) -> int | None
Return the subject ID of a subject by its URI, or None if not found. If warnings=True, log a warning message if the URI cannot be found.
- contains_uri(uri: str) -> bool
- property languages: list[str] | None
- class annif.vocab.VocabFileCSV(path: str)
Bases: VocabSource
A multilingual subject vocabulary stored in a CSV file.
- static is_csv_file(path: str) -> bool
Return True if the path looks like a CSV file.
- property languages: list[str]
Provide a list of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- property subjects: Generator
Iterate through the vocabulary, yielding Subject objects.
- class annif.vocab.VocabFileSKOS(path: str)
Bases: VocabSource
A subject corpus that uses SKOS files.
- PREF_LABEL_PROPERTIES = (rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'))
- property concepts: Iterator[URIRef]
- get_concept_labels(concept: URIRef, label_types: Sequence[URIRef]) -> collections.defaultdict[str | None, list[str]]
Return all the labels of the given concept with the given label properties as a dict-like object, where the keys are language codes and the values are lists of labels in that language.
- static is_rdf_file(path: str) -> bool
Return True if the path looks like an RDF file that can be loaded as SKOS.
- property languages: set[str]
Provide the set of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- class annif.vocab.VocabFileTSV(path: str, language: str)
Bases: VocabSource
A monolingual subject vocabulary stored in a TSV file.
- property languages: list[str]
Provide a list of language codes supported by this vocabulary source.
- save_skos(path: str) -> None
Save the contents of the subject vocabulary into a SKOS/Turtle file with the given path name.
- property subjects: Generator
Iterate through the vocabulary, yielding Subject objects.
- class annif.vocab.VocabSource
Bases: object
Abstract base class for vocabulary sources.
- abstract property languages
Provide a list of language codes supported by this vocabulary source.
- abstract save_skos(path)
Save the contents of the vocabulary source into a SKOS/Turtle file with the given path name.
- abstract property subjects
Iterate through the vocabulary, yielding Subject objects.
- annif.vocab.kwargs_to_exclude_uris(vocab: AnnifVocabulary, kwargs: dict[str, str]) -> set[str]