tommy.model.corpus_model.CorpusModel
- class tommy.model.corpus_model.CorpusModel(derive_from: CorpusModel = None)
Bases: object
CorpusModel stores data about the documents in the input folder. It is accessible only through the CorpusController class. The raw corpus data is not stored because it would not fit in memory; the processed corpus is stored in the ProcessedCorpus class.
- __init__(derive_from: CorpusModel = None)
Initialize the corpus model and create an empty ProcessedCorpus instance so that files can be added to the processed corpus after pre-processing.
- dictionary: Dictionary = None
- processed_corpus: ProcessedCorpus
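The shape described above can be illustrated with a minimal sketch. It is not the tommy implementation: the ProcessedCorpus below is a hypothetical stand-in with an assumed `documents` list and `append` method, the Dictionary type is left as `Any`, and because this page does not say what `derive_from` does, the sketch accepts the argument and ignores it.

```python
from __future__ import annotations

from typing import Any, Optional


class ProcessedCorpus:
    """Hypothetical stand-in for the real ProcessedCorpus class."""

    def __init__(self) -> None:
        # Assumed storage: one token list per pre-processed file.
        self.documents: list[list[str]] = []

    def append(self, tokens: list[str]) -> None:
        # Assumed helper for adding a pre-processed file.
        self.documents.append(tokens)


class CorpusModel:
    """Sketch that mirrors only the documented attributes."""

    # Documented default: no dictionary until one is assigned.
    dictionary: Optional[Any] = None
    processed_corpus: ProcessedCorpus

    def __init__(self, derive_from: Optional[CorpusModel] = None) -> None:
        # derive_from matches the documented signature; its effect is not
        # described on this page, so this sketch deliberately ignores it.
        self.processed_corpus = ProcessedCorpus()


# The model starts with an empty processed corpus; files are added later.
model = CorpusModel()
model.processed_corpus.append(["topic", "model"])
```

Only processed tokens are kept on the model here, in line with the note that the raw corpus is never held in memory.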