annif.lexical package¶
Submodules¶
annif.lexical.mllm module¶
MLLM (Maui-like Lexical Matching) model for Annif
- class annif.lexical.mllm.Candidate(doc_length, subject_id, freq, is_pref, n_tokens, ambiguity, first_occ, last_occ, spread)¶
Bases:
tuple
- ambiguity¶
Alias for field number 5
- doc_length¶
Alias for field number 0
- first_occ¶
Alias for field number 6
- freq¶
Alias for field number 2
- is_pref¶
Alias for field number 3
- last_occ¶
Alias for field number 7
- n_tokens¶
Alias for field number 4
- spread¶
Alias for field number 8
- subject_id¶
Alias for field number 1
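The "Alias for field number N" entries above mean that each Candidate field can be read either by name or by position. The following is a minimal sketch of that behaviour, assuming only the documented field order. It builds a local stand-in with collections.namedtuple rather than importing annif, and every value in it is made up for illustration.

```python
from collections import namedtuple

# Local stand-in mirroring the documented field order; not imported from annif.
Candidate = namedtuple(
    "Candidate",
    ["doc_length", "subject_id", "freq", "is_pref", "n_tokens",
     "ambiguity", "first_occ", "last_occ", "spread"],
)

# Hypothetical values, chosen only to make the fields easy to tell apart.
cand = Candidate(doc_length=120, subject_id=42, freq=3, is_pref=True,
                 n_tokens=2, ambiguity=1, first_occ=0.05, last_occ=0.8,
                 spread=0.75)

# Named access and positional access refer to the same field.
print(cand.subject_id == cand[1])  # True
print(cand.ambiguity == cand[5])   # True
```

Because a namedtuple is still a tuple, such a record can also be unpacked or compared like any other tuple.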
- class annif.lexical.mllm.Feature(value)¶
Bases:
enum.IntEnum
An enumeration.
- ambiguity = 6¶
- broader = 11¶
- collection = 14¶
- doc_freq = 1¶
- doc_length = 10¶
- first_occ = 7¶
- freq = 0¶
- is_pref = 4¶
- last_occ = 8¶
- n_tokens = 5¶
- narrower = 12¶
- related = 13¶
- spread = 9¶
- subj_freq = 2¶
- tfidf = 3¶
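Since Feature is an IntEnum, its members behave like plain integers and can be used directly as positions in a feature row. The sketch below assumes that usage. It defines a local enum with a subset of the documented members and values instead of importing annif, and the row contents are hypothetical.

```python
from enum import IntEnum


class Feature(IntEnum):
    """Local stand-in with a subset of the documented members and values."""
    freq = 0
    doc_freq = 1
    subj_freq = 2
    tfidf = 3
    is_pref = 4


# One hypothetical feature row with a slot per member defined above.
row = [0.0] * len(Feature)
row[Feature.freq] = 3.0
row[Feature.is_pref] = 1.0

# IntEnum members compare equal to their integer values.
print(Feature.tfidf == 3)  # True
print(row[Feature.freq])   # 3.0
```

Indexing by member name keeps column positions readable without hard-coding integers throughout the code.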
- class annif.lexical.mllm.MLLMCandidateGenerator¶
Bases:
annif.parallel.BaseWorker
- classmethod generate_candidates(doc_subject_ids, text)¶
- class annif.lexical.mllm.MLLMFeatureConverter¶
Bases:
annif.parallel.BaseWorker
- classmethod candidates_to_features(candidates)¶
- class annif.lexical.mllm.MLLMModel¶
Bases:
object
Maui-like Lexical Matching model
- generate_candidates(text, analyzer)¶
- static load(filename)¶
- predict(candidates)¶
- prepare_train(corpus, vocab, analyzer, params, n_jobs)¶
- save(filename)¶
- train(train_x, train_y, params)¶
- class annif.lexical.mllm.Match(subject_id, is_pref, n_tokens, pos, ambiguity)¶
Bases:
tuple
- ambiguity¶
Alias for field number 4
- is_pref¶
Alias for field number 1
- n_tokens¶
Alias for field number 2
- pos¶
Alias for field number 3
- subject_id¶
Alias for field number 0
- class annif.lexical.mllm.ModelData(broader, narrower, related, collection, doc_freq, subj_freq, idf)¶
Bases:
tuple
- broader¶
Alias for field number 0
- collection¶
Alias for field number 3
- doc_freq¶
Alias for field number 4
- idf¶
Alias for field number 6
- narrower¶
Alias for field number 1
- related¶
Alias for field number 2
- subj_freq¶
Alias for field number 5
- class annif.lexical.mllm.Term(subject_id, label, is_pref)¶
Bases:
tuple
- is_pref¶
Alias for field number 2
- label¶
Alias for field number 1
- subject_id¶
Alias for field number 0
- annif.lexical.mllm.candidates_to_features(candidates, mdata)¶
Convert a list of Candidates to a NumPy feature matrix
- annif.lexical.mllm.conflate_matches(matches, doc_length)¶
- annif.lexical.mllm.generate_candidates(text, analyzer, vectorizer, index)¶
annif.lexical.tokenset module¶
Index for fast matching of token sets.
- class annif.lexical.tokenset.TokenSet(tokens, subject_id=None, is_pref=False)¶
Bases:
object
Represents a set of tokens (expressed as integer token IDs) that can be matched with another set of tokens. A TokenSet can optionally be associated with a subject from the vocabulary.
- contains(other)¶
Returns True iff the tokens in the other TokenSet are all included within this TokenSet.
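The containment test described above amounts to a subset check on token IDs. The following is a minimal sketch under that assumption. SimpleTokenSet is a hypothetical stand-in written for illustration, not annif's TokenSet, and the real class may store its tokens differently.

```python
class SimpleTokenSet:
    """Hypothetical stand-in for illustration; not annif's TokenSet."""

    def __init__(self, tokens, subject_id=None, is_pref=False):
        self.tokens = frozenset(tokens)
        self.subject_id = subject_id
        self.is_pref = is_pref

    def contains(self, other):
        # True iff every token of `other` is also present in this set.
        return other.tokens <= self.tokens


# Made-up token IDs: a document fragment and two vocabulary terms.
doc = SimpleTokenSet([1, 2, 3, 4])
term_a = SimpleTokenSet([2, 3], subject_id=7, is_pref=True)
term_b = SimpleTokenSet([3, 9], subject_id=8)

print(doc.contains(term_a))  # True
print(doc.contains(term_b))  # False
```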
- class annif.lexical.tokenset.TokenSetIndex¶
Bases:
object
A searchable index of TokenSets (representing vocabulary terms)
- add(tset)¶
Add a TokenSet into this index
- search(tset)¶
Return the TokenSets that are contained in the given TokenSet. The matches are returned as a list of (TokenSet, ambiguity) pairs where ambiguity is an integer indicating the number of other TokenSets that also match the same tokens.
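To make the (match, ambiguity) pairs concrete, here is a small sketch of one possible reading of the description above. It assumes that ambiguity counts the other matching entries with exactly the same tokens but a different subject; the library may count it differently. SimpleIndex is a hypothetical class that scans linearly and returns subject IDs rather than TokenSet objects.

```python
class SimpleIndex:
    """Hypothetical linear-scan index; not annif's TokenSetIndex."""

    def __init__(self):
        self._entries = []  # list of (frozenset of token IDs, subject_id)

    def add(self, tokens, subject_id):
        self._entries.append((frozenset(tokens), subject_id))

    def search(self, tokens):
        query = frozenset(tokens)
        # Keep only the entries whose tokens all occur in the query.
        matches = [(t, s) for t, s in self._entries if t <= query]
        results = []
        for t, s in matches:
            # Assumed definition: other matches with identical tokens.
            ambiguity = sum(1 for t2, s2 in matches if t2 == t and s2 != s)
            results.append((s, ambiguity))
        return results


index = SimpleIndex()
index.add([1, 2], subject_id=10)
index.add([1, 2], subject_id=11)  # same tokens, different subject
index.add([3], subject_id=12)
index.add([9], subject_id=13)     # not present in the query below

print(index.search([1, 2, 3]))  # [(10, 1), (11, 1), (12, 0)]
```

A linear scan keeps the sketch short; an index intended for fast matching would presumably avoid checking every entry on each search.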
annif.lexical.util module¶
Utility methods for lexical algorithms
- annif.lexical.util.get_subject_labels(graph, uri, properties, language)¶
- annif.lexical.util.make_collection_matrix(graph, vocab)¶
- annif.lexical.util.make_relation_matrix(graph, vocab, property)¶