tommy.model.corpus_model

Classes

class tommy.model.corpus_model.CorpusModel(derive_from: CorpusModel = None)

Bases: object

CorpusModel stores data about the documents in the input folder. It is accessible only through the CorpusController class. The raw corpus data is not stored because it would not fit in memory; only the processed corpus is kept, in the ProcessedCorpus class.

__init__(derive_from: CorpusModel = None)

Initialize the corpus model and create an empty ProcessedCorpus instance so that files can be added to the processed corpus after pre-processing.

dictionary: Dictionary = None
metadata: list[Metadata] = None
processed_corpus: ProcessedCorpus
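
The attribute layout above can be illustrated with a minimal sketch. It does not import tommy: Metadata, ProcessedCorpus, and CorpusModelSketch are hypothetical stand-ins written for this example, and the handling of derive_from (sharing metadata and dictionary with an existing model while starting from an empty processed corpus) is an assumption, since this page does not describe what that parameter does.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Metadata:
    """Hypothetical stand-in for the per-document metadata record."""
    title: str


class ProcessedCorpus:
    """Hypothetical stand-in: holds only pre-processed documents."""

    def __init__(self) -> None:
        self.documents: list[list[str]] = []


class CorpusModelSketch:
    """Illustrative stand-in that mirrors the documented attributes."""

    def __init__(self, derive_from: CorpusModelSketch | None = None) -> None:
        # Both fields start as None, matching the documented defaults.
        self.dictionary = None
        self.metadata: list[Metadata] | None = None
        # An empty processed corpus is always created, so documents can be
        # added once pre-processing has run.
        self.processed_corpus = ProcessedCorpus()
        if derive_from is not None:
            # ASSUMPTION: reuse the existing model's metadata and dictionary.
            # The real semantics of derive_from are not documented here.
            self.metadata = derive_from.metadata
            self.dictionary = derive_from.dictionary


base = CorpusModelSketch()
base.metadata = [Metadata(title="example.txt")]
base.processed_corpus.documents.append(["example", "tokens"])

derived = CorpusModelSketch(derive_from=base)
```

In this sketch the raw file contents are never held on the model; only metadata and processed tokens are, which reflects the memory constraint described in the class summary.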