annif.lexical package
Submodules
annif.lexical.mllm module
MLLM (Maui-like Lexical Matching) model for Annif
- class annif.lexical.mllm.Candidate(doc_length, subject_id, freq, is_pref, n_tokens, ambiguity, first_occ, last_occ, spread)
Bases: tuple
- ambiguity
Alias for field number 5
- doc_length
Alias for field number 0
- first_occ
Alias for field number 6
- freq
Alias for field number 2
- is_pref
Alias for field number 3
- last_occ
Alias for field number 7
- n_tokens
Alias for field number 4
- spread
Alias for field number 8
- subject_id
Alias for field number 1
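"Alias for field number N" is the docstring that CPython's collections.namedtuple gives each field, so Candidate fields can be read by name or by position. The sketch below rebuilds a local stand-in with the documented field order (it does not import Annif, and the values are made up) to show that both kinds of access hit the same slot.

```python
from collections import namedtuple

# Local stand-in mirroring the documented field order of Candidate.
Candidate = namedtuple(
    "Candidate",
    ["doc_length", "subject_id", "freq", "is_pref", "n_tokens",
     "ambiguity", "first_occ", "last_occ", "spread"],
)

cand = Candidate(doc_length=12, subject_id=42, freq=0.25, is_pref=1.0,
                 n_tokens=2.0, ambiguity=0.0, first_occ=0.0,
                 last_occ=0.5, spread=0.5)

# "freq" is field number 2, so attribute and index access agree.
assert cand.freq == cand[2]

# Tuples are immutable; _replace returns a modified copy instead.
updated = cand._replace(freq=0.5)
```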
- class annif.lexical.mllm.Feature(value)
Bases: IntEnum
An enumeration of the features computed for each candidate subject.
- ambiguity = 6
- broader = 11
- collection = 14
- doc_freq = 1
- doc_length = 10
- first_occ = 7
- freq = 0
- is_pref = 4
- last_occ = 8
- n_tokens = 5
- narrower = 12
- related = 13
- spread = 9
- subj_freq = 2
- tfidf = 3
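Because Feature is an IntEnum, each member is also a plain integer, which makes it usable as a column index into a feature row and lets an integer index be mapped back to a readable name. The sketch below is a local reconstruction, not an import from Annif; the member at index 13 ("related") is an assumption based on the neighbouring broader/narrower/collection members and on the fields of ModelData.

```python
from enum import IntEnum

# Local reconstruction of the documented column order (start=0 so that
# freq = 0 ... collection = 14). "related" at 13 is an assumption.
Feature = IntEnum(
    "Feature",
    "freq doc_freq subj_freq tfidf is_pref n_tokens ambiguity "
    "first_occ last_occ spread doc_length broader narrower related collection",
    start=0,
)

# IntEnum members are ints, so they can index a feature row directly...
row = [0.0] * len(Feature)
row[Feature.tfidf] = 0.8

# ...and an integer column index maps back to its feature name.
print(Feature(3).name)  # prints "tfidf"
```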
- class annif.lexical.mllm.MLLMCandidateGenerator
Bases: BaseWorker
- classmethod generate_candidates(doc_subject_set, text)
- class annif.lexical.mllm.MLLMFeatureConverter
Bases: BaseWorker
- classmethod candidates_to_features(candidates)
- class annif.lexical.mllm.MLLMModel
Bases: object
Maui-like Lexical Matching model
- prepare_train(corpus: DocumentCorpus, vocab: AnnifVocabulary, analyzer: Analyzer, params: dict[str, Any], n_jobs: int) → tuple[np.ndarray, np.ndarray]
- save(filename: str) → list[str]
- train(train_x: ndarray | list[tuple[int, int]], train_y: list[bool] | ndarray, params: dict[str, Any]) → None
- class annif.lexical.mllm.Match(subject_id, is_pref, n_tokens, pos, ambiguity)
Bases: tuple
- ambiguity
Alias for field number 4
- is_pref
Alias for field number 1
- n_tokens
Alias for field number 2
- pos
Alias for field number 3
- subject_id
Alias for field number 0
- class annif.lexical.mllm.ModelData(broader, narrower, related, collection, doc_freq, subj_freq, idf)
Bases: tuple
- broader
Alias for field number 0
- collection
Alias for field number 3
- doc_freq
Alias for field number 4
- idf
Alias for field number 6
- narrower
Alias for field number 1
- related
Alias for field number 2
- subj_freq
Alias for field number 5
- class annif.lexical.mllm.Term(subject_id, label, is_pref)
Bases: tuple
- is_pref
Alias for field number 2
- label
Alias for field number 1
- subject_id
Alias for field number 0
- annif.lexical.mllm.candidates_to_features(candidates: list[Candidate], mdata: ModelData) → ndarray
Convert a list of Candidates to a NumPy feature matrix.
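A minimal sketch of the matrix layout, assuming one row per candidate and one column per Feature member, with each candidate field written into the column of the same name. The function name candidates_to_features_sketch is hypothetical, and it deliberately omits the documented mdata argument: the corpus-level columns (doc_freq, subj_freq, tfidf, broader, narrower, related, collection) would presumably be derived from ModelData and are simply left at zero here.

```python
from collections import namedtuple
from enum import IntEnum

import numpy as np

# Local stand-ins mirroring the documented definitions; not imported from annif.
Candidate = namedtuple(
    "Candidate",
    "doc_length subject_id freq is_pref n_tokens ambiguity first_occ last_occ spread",
)
Feature = IntEnum(
    "Feature",
    "freq doc_freq subj_freq tfidf is_pref n_tokens ambiguity "
    "first_occ last_occ spread doc_length broader narrower related collection",
    start=0,
)

# Candidate fields that map directly onto a Feature column of the same name.
DIRECT_FIELDS = ("freq", "is_pref", "n_tokens", "ambiguity",
                 "first_occ", "last_occ", "spread", "doc_length")


def candidates_to_features_sketch(candidates):
    """Return a (n_candidates, n_features) float32 matrix, one row per candidate."""
    matrix = np.zeros((len(candidates), len(Feature)), dtype=np.float32)
    for row, cand in enumerate(candidates):
        for name in DIRECT_FIELDS:
            matrix[row, Feature[name]] = getattr(cand, name)
    # Corpus-level columns would need ModelData and stay at 0 in this sketch.
    return matrix
```

Indexing columns through Feature rather than bare integers keeps the column order in one place, so a model trained on these rows and the code that builds them cannot silently disagree about which column is which.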
- annif.lexical.mllm.generate_candidates(text: str, analyzer: Analyzer, vectorizer: CountVectorizer, index: TokenSetIndex) → list[Candidate]
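The field names of Match (one vocabulary hit at a position) and Candidate (one summary per subject) suggest that candidate generation ends by grouping matches per subject. The sketch below shows only that aggregation step under stated assumptions: pos is a sentence index, matches arrive in document order, and doc_length counts sentences. The function name conflate_matches and all the aggregation formulas (relative frequency, means, positions normalised by doc_length) are hypothetical illustrations, not a description of Annif's code.

```python
from collections import defaultdict, namedtuple
from statistics import mean

# Local stand-ins mirroring the documented field orders; not imported from annif.
Match = namedtuple("Match", "subject_id is_pref n_tokens pos ambiguity")
Candidate = namedtuple(
    "Candidate",
    "doc_length subject_id freq is_pref n_tokens ambiguity first_occ last_occ spread",
)


def conflate_matches(matches, doc_length):
    """Hypothetical: summarise all matches of each subject as one Candidate.

    Assumes `matches` is in document order and `pos` is a sentence index.
    """
    by_subject = defaultdict(list)
    for m in matches:
        by_subject[m.subject_id].append(m)

    candidates = []
    for subject_id, subj_matches in by_subject.items():
        first, last = subj_matches[0].pos, subj_matches[-1].pos
        candidates.append(Candidate(
            doc_length=doc_length,
            subject_id=subject_id,
            freq=len(subj_matches) / doc_length,          # share of sentences
            is_pref=mean(float(m.is_pref) for m in subj_matches),
            n_tokens=mean(m.n_tokens for m in subj_matches),
            ambiguity=mean(m.ambiguity for m in subj_matches),
            first_occ=first / doc_length,                 # normalised position
            last_occ=last / doc_length,
            spread=(last - first) / doc_length,
        ))
    return candidates
```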
annif.lexical.tokenset module
Index for fast matching of token sets.
- class annif.lexical.tokenset.TokenSet(tokens: ndarray, subject_id: int | None = None, is_pref: bool = False)
Bases: object
Represents a set of tokens (expressed as integer token IDs) that can be matched with another set of tokens. A TokenSet can optionally be associated with a subject from the vocabulary.
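A minimal sketch of the matching idea, assuming that a vocabulary term matches a piece of text when every one of the term's token IDs also occurs among the text's token IDs (a subset test). TokenSetSketch and its contained_in method are hypothetical names chosen for illustration; they are not the Annif TokenSet API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenSetSketch:
    """Hypothetical stand-in: integer token IDs plus an optional subject."""
    tokens: frozenset
    subject_id: Optional[int] = None
    is_pref: bool = False

    def contained_in(self, doc_tokens: frozenset) -> bool:
        # The term matches when all of its tokens appear in the other set.
        return self.tokens <= doc_tokens

    def __len__(self) -> int:
        return len(self.tokens)
```

A set (rather than a sequence) ignores word order, so a two-token term still matches when its tokens occur in a different order in the text.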
- class annif.lexical.tokenset.TokenSetIndex
Bases: object
A searchable index of TokenSets (representing vocabulary terms).
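One way such an index can make lookups fast, sketched under an assumed design: each term is stored under a single key token (here its smallest token ID). A term can only match if its key token occurs in the text, so a search inspects only the terms keyed by tokens the text actually contains, instead of testing every term in the vocabulary. The class and method names are hypothetical; the real TokenSetIndex may choose keys and post-process matches differently.

```python
from collections import defaultdict


class TokenSetIndexSketch:
    """Hypothetical index: maps one key token to the terms keyed by it."""

    def __init__(self):
        # key token -> list of (frozenset of token IDs, subject_id)
        self._index = defaultdict(list)

    def add(self, tokens, subject_id):
        tokens = frozenset(tokens)
        if tokens:  # an empty term could never be looked up
            # Store each term once, under its smallest token ID.
            self._index[min(tokens)].append((tokens, subject_id))

    def search(self, doc_tokens):
        """Return the subject IDs of all terms fully contained in doc_tokens."""
        doc_tokens = frozenset(doc_tokens)
        hits = []
        for token in doc_tokens:
            # .get avoids creating empty entries in the defaultdict.
            for tokens, subject_id in self._index.get(token, ()):
                if tokens <= doc_tokens:
                    hits.append(subject_id)
        return sorted(hits)
```

Keying on a single token is correct because any term that is a subset of the text necessarily has its smallest token in the text, so no match is missed and none is reported twice.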
annif.lexical.util module
Utility methods for lexical algorithms.
- annif.lexical.util.get_subject_labels(graph: Graph, uri: str, properties: list[URIRef], language: str) → list[str]
- annif.lexical.util.make_collection_matrix(graph: Graph, vocab: AnnifVocabulary) → csc_matrix
- annif.lexical.util.make_relation_matrix(graph: Graph, vocab: AnnifVocabulary, property: URIRef) → csc_matrix
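make_relation_matrix returns a subject-by-subject matrix for a single RDF property (for example skos:broader). The sketch below is a hypothetical stand-in that avoids rdflib and SciPy: it takes plain (subject, predicate, object) string triples and a URI-to-index mapping, and returns a dense boolean NumPy array. The orientation (row = subject URI, column = object URI) is an assumption. For a real vocabulary a sparse matrix, such as scipy.sparse.csc_matrix((data, (row, col)), shape=(n, n)), is the natural fit and matches the documented return type.

```python
import numpy as np


def relation_matrix_sketch(triples, uri_to_id, predicate):
    """Hypothetical: [i, j] is True when subject i --predicate--> subject j."""
    n = len(uri_to_id)
    matrix = np.zeros((n, n), dtype=bool)
    for subj, pred, obj in triples:
        if pred != predicate:
            continue
        i, j = uri_to_id.get(subj), uri_to_id.get(obj)
        # Skip relations that point outside the vocabulary.
        if i is not None and j is not None:
            matrix[i, j] = True
    return matrix
```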