annif.lexical package¶

Submodules¶

annif.lexical.mllm module¶

MLLM (Maui-like Lexical Matching) model for Annif

class annif.lexical.mllm.Candidate(doc_length, subject_id, freq, is_pref, n_tokens, ambiguity, first_occ, last_occ, spread)¶

Bases: tuple

ambiguity¶: Alias for field number 5

doc_length¶: Alias for field number 0

first_occ¶: Alias for field number 6

freq¶: Alias for field number 2

is_pref¶: Alias for field number 3

last_occ¶: Alias for field number 7

n_tokens¶: Alias for field number 4

spread¶: Alias for field number 8

subject_id¶: Alias for field number 1
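
Because Candidate is a named tuple, each field can be read either by name or by the field number listed above. The sketch below does not import Annif; it rebuilds a stand-in tuple with the same field order using collections.namedtuple, and the field values are invented purely for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as annif.lexical.mllm.Candidate.
Candidate = namedtuple(
    "Candidate",
    ["doc_length", "subject_id", "freq", "is_pref", "n_tokens",
     "ambiguity", "first_occ", "last_occ", "spread"],
)

# Illustrative values only; real candidates come from generate_candidates().
cand = Candidate(doc_length=120, subject_id=42, freq=3, is_pref=True,
                 n_tokens=2, ambiguity=1, first_occ=0.1, last_occ=0.8,
                 spread=0.7)

by_name = cand.subject_id   # access by field name
by_index = cand[1]          # same field, accessed by field number 1
```

The same pattern applies to the other named tuples in this module (Match, ModelData and Term).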

class annif.lexical.mllm.Feature(value)¶

Bases: enum.IntEnum

An enumeration.

ambiguity = 6¶

broader = 11¶

collection = 14¶

doc_freq = 1¶

doc_length = 10¶

first_occ = 7¶

freq = 0¶

is_pref = 4¶

last_occ = 8¶

n_tokens = 5¶

narrower = 12¶

related = 13¶

spread = 9¶

subj_freq = 2¶

tfidf = 3¶
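
Since Feature derives from enum.IntEnum, its members behave like integers. One plausible use, assumed here rather than confirmed by this page, is as column indices into a per-candidate feature row. The sketch below redefines a stand-in enum with the documented values and uses a plain list instead of a NumPy array.

```python
from enum import IntEnum


class Feature(IntEnum):
    """Stand-in mirroring the values documented for annif.lexical.mllm.Feature."""
    freq = 0
    doc_freq = 1
    subj_freq = 2
    tfidf = 3
    is_pref = 4
    n_tokens = 5
    ambiguity = 6
    first_occ = 7
    last_occ = 8
    spread = 9
    doc_length = 10
    broader = 11
    narrower = 12
    related = 13
    collection = 14


# One row with a slot per feature; IntEnum members can index a list directly.
row = [0.0] * len(Feature)
row[Feature.freq] = 3.0
row[Feature.tfidf] = 0.25

# Column names in index order, e.g. for labelling a feature matrix.
feature_names = [f.name for f in sorted(Feature)]
```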

class annif.lexical.mllm.MLLMCandidateGenerator¶

Bases: annif.parallel.BaseWorker

classmethod generate_candidates(doc_subject_ids, text)¶

class annif.lexical.mllm.MLLMFeatureConverter¶

Bases: annif.parallel.BaseWorker

classmethod candidates_to_features(candidates)¶

class annif.lexical.mllm.MLLMModel¶

Bases: object

Maui-like Lexical Matching model

generate_candidates(text, analyzer)¶

static load(filename)¶

predict(candidates)¶

prepare_train(corpus, vocab, analyzer, params, n_jobs)¶

save(filename)¶

train(train_x, train_y, params)¶

class annif.lexical.mllm.Match(subject_id, is_pref, n_tokens, pos, ambiguity)¶

Bases: tuple

ambiguity¶: Alias for field number 4

is_pref¶: Alias for field number 1

n_tokens¶: Alias for field number 2

pos¶: Alias for field number 3

subject_id¶: Alias for field number 0

class annif.lexical.mllm.ModelData(broader, narrower, related, collection, doc_freq, subj_freq, idf)¶

Bases: tuple

broader¶: Alias for field number 0

collection¶: Alias for field number 3

doc_freq¶: Alias for field number 4

idf¶: Alias for field number 6

narrower¶: Alias for field number 1

related¶: Alias for field number 2

subj_freq¶: Alias for field number 5

class annif.lexical.mllm.Term(subject_id, label, is_pref)¶

Bases: tuple

is_pref¶: Alias for field number 2

label¶: Alias for field number 1

subject_id¶: Alias for field number 0

annif.lexical.mllm.candidates_to_features(candidates, mdata)¶: Convert a list of Candidates to a NumPy feature matrix

annif.lexical.mllm.conflate_matches(matches, doc_length)¶
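
This page gives no description for conflate_matches, but its signature and the fields of Match and Candidate suggest that it merges the matches for each subject into one Candidate. The following is a minimal sketch of such an aggregation under stated assumptions: the function name conflate_matches_sketch is hypothetical, and the choice of averaging and of normalising positions by document length is an assumption, not necessarily what Annif does.

```python
from collections import defaultdict, namedtuple
from statistics import mean

# Stand-ins with the documented field orders.
Match = namedtuple("Match", ["subject_id", "is_pref", "n_tokens", "pos", "ambiguity"])
Candidate = namedtuple(
    "Candidate",
    ["doc_length", "subject_id", "freq", "is_pref", "n_tokens",
     "ambiguity", "first_occ", "last_occ", "spread"],
)


def conflate_matches_sketch(matches, doc_length):
    """Group matches by subject and summarise each group as one Candidate.

    Assumes doc_length > 0. The aggregation rules are illustrative guesses.
    """
    by_subject = defaultdict(list)
    for m in matches:
        by_subject[m.subject_id].append(m)

    candidates = []
    for subject_id, group in by_subject.items():
        positions = [m.pos for m in group]
        first, last = min(positions), max(positions)
        candidates.append(Candidate(
            doc_length=doc_length,
            subject_id=subject_id,
            freq=len(group),                              # number of matches
            is_pref=mean(float(m.is_pref) for m in group),
            n_tokens=mean(m.n_tokens for m in group),
            ambiguity=mean(m.ambiguity for m in group),
            first_occ=first / doc_length,                 # relative positions
            last_occ=last / doc_length,
            spread=(last - first) / doc_length,
        ))
    return candidates


matches = [
    Match(subject_id=7, is_pref=True, n_tokens=2, pos=10, ambiguity=0),
    Match(subject_id=7, is_pref=False, n_tokens=2, pos=60, ambiguity=1),
    Match(subject_id=9, is_pref=True, n_tokens=1, pos=30, ambiguity=0),
]
candidates = conflate_matches_sketch(matches, doc_length=100)
```

Here the two matches for subject 7 collapse into a single candidate, while subject 9 yields a candidate of its own.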

annif.lexical.mllm.generate_candidates(text, analyzer, vectorizer, index)¶

annif.lexical.tokenset module¶

Index for fast matching of token sets.

class annif.lexical.tokenset.TokenSet(tokens, subject_id=None, is_pref=False)¶

Bases: object

Represents a set of tokens (expressed as integer token IDs) that can be matched with another set of tokens. A TokenSet can optionally be associated with a subject from the vocabulary.

contains(other)¶: Returns True iff the tokens in the other TokenSet are all included within this TokenSet.
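
The containment test described for contains() amounts to a subset check. The sketch below illustrates the idea with a hypothetical TokenSetSketch class built on Python's built-in set; it is not Annif's implementation, and the token IDs and subject IDs are made up.

```python
class TokenSetSketch:
    """Hypothetical stand-in for TokenSet, backed by a plain Python set."""

    def __init__(self, tokens, subject_id=None, is_pref=False):
        self._tokens = set(tokens)
        self.subject_id = subject_id
        self.is_pref = is_pref

    def contains(self, other):
        # True iff every token of `other` is also present in this set.
        return other._tokens.issubset(self._tokens)


# A document expressed as token IDs, and two vocabulary terms.
doc = TokenSetSketch([3, 5, 8, 13])
term = TokenSetSketch([5, 13], subject_id=42, is_pref=True)
other_term = TokenSetSketch([5, 21], subject_id=43)

term_matches = doc.contains(term)          # all of {5, 13} occur in the document
other_matches = doc.contains(other_term)   # token 21 is missing from the document
```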

class annif.lexical.tokenset.TokenSetIndex¶

Bases: object

A searchable index of TokenSets (representing vocabulary terms)

add(tset)¶: Add a TokenSet into this index

search(tset)¶: Return the TokenSets that are contained in the given TokenSet. The matches are returned as a list of (TokenSet, ambiguity) pairs where ambiguity is an integer indicating the number of other TokenSets that also match the same tokens.
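
To make the shape of the search() result concrete, here is a minimal sketch of an index that returns (TokenSet, ambiguity) pairs. The classes TokenSetSketch and TokenSetIndexSketch are hypothetical, the linear scan is chosen for brevity rather than speed, and the ambiguity rule used here (counting the other matched sets that cover the same tokens) is one reading of the description above, not necessarily Annif's exact definition.

```python
class TokenSetSketch:
    """Hypothetical stand-in for TokenSet, backed by a plain Python set."""

    def __init__(self, tokens, subject_id=None, is_pref=False):
        self._tokens = set(tokens)
        self.subject_id = subject_id
        self.is_pref = is_pref

    def contains(self, other):
        return other._tokens.issubset(self._tokens)


class TokenSetIndexSketch:
    """Hypothetical index that scans all stored sets on every search."""

    def __init__(self):
        self._tsets = []

    def add(self, tset):
        self._tsets.append(tset)

    def search(self, tset):
        # Keep only the stored sets fully contained in the query set.
        matched = [t for t in self._tsets if tset.contains(t)]
        results = []
        for t in matched:
            # Assumed rule: count other matched sets covering the same tokens.
            ambiguity = sum(1 for o in matched if o is not t and o.contains(t))
            results.append((t, ambiguity))
        return results


index = TokenSetIndexSketch()
index.add(TokenSetSketch([5, 13], subject_id=1, is_pref=True))
index.add(TokenSetSketch([5, 13], subject_id=2))   # same tokens, other subject
index.add(TokenSetSketch([8], subject_id=3, is_pref=True))
index.add(TokenSetSketch([21], subject_id=4))      # not present in the document

doc = TokenSetSketch([3, 5, 8, 13])
results = index.search(doc)
summary = [(t.subject_id, ambiguity) for t, ambiguity in results]
```

In this example subjects 1 and 2 share the same tokens, so each is reported with an ambiguity of 1, while subject 3 is unambiguous and subject 4 does not match at all.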

annif.lexical.util module¶

Utility methods for lexical algorithms

annif.lexical.util.get_subject_labels(graph, uri, properties, language)¶

annif.lexical.util.make_collection_matrix(graph, vocab)¶

annif.lexical.util.make_relation_matrix(graph, vocab, property)¶
