tommy.controller.corpus_controller

Classes

class tommy.controller.corpus_controller.CorpusController

Bases: object

The corpus controller class is responsible for handling interactions with the corpus model.

__init__() → None

Initialize the corpus controller and the event handler for metadata.

change_config_model_refs(corpus_model: CorpusModel) → None

Sets the reference to the corpus model.

Parameters:

corpus_model – The corpus model

Returns:

None

corpus_version_id: int = -1

extract_and_store_metadata(input_folder_path: str) → None

Gets the metadata from all files in the directory specified by the project settings and stores it in the corpus model

Parameters:

input_folder_path – The new path to the input folder

Returns:

None

fileParsers: GenericFileImporter = <tommy.controller.file_import.generic_file_importer.GenericFileImporter object>

get_dictionary() → Dictionary

Get the dictionary corresponding to the bag-of-words representation of the pre-processed documents. It is only set after pre-processing has been completed.

Returns:

the dictionary of the pre-processed documents
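To illustrate what a bag-of-words dictionary maps, here is a minimal standard-library sketch. The helpers `build_dictionary` and `to_bow` are hypothetical names introduced for illustration; they are not part of tommy, and they only approximate the idea behind the `Dictionary` object returned here.

```python
from collections import Counter


def build_dictionary(documents: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to every distinct token, in order of first appearance.

    Hypothetical helper: a stand-in for a real bag-of-words dictionary.
    """
    token_to_id: dict[str, int] = {}
    for tokens in documents:
        for token in tokens:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id


def to_bow(tokens: list[str], token_to_id: dict[str, int]) -> list[tuple[int, int]]:
    """Convert one tokenized document into sorted (token id, count) pairs."""
    counts = Counter(token_to_id[t] for t in tokens if t in token_to_id)
    return sorted(counts.items())


# Two already pre-processed (tokenized) documents.
docs = [["topic", "model", "topic"], ["model", "corpus"]]
dictionary = build_dictionary(docs)
bow = to_bow(docs[0], dictionary)
print(dictionary)
print(bow)
```

Because the mapping is built from the pre-processed tokens, it can only exist once pre-processing has finished, which matches the constraint described above.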

get_metadata() → list[Metadata]

Gets the metadata from all files in the corpus model. This method assumes that extract_and_store_metadata has already been called.

Returns:

The metadata of the files in the corpus

get_processed_corpus() → ProcessedCorpus

Get an iterable of the processed corpus. Only works after pre-processing has been completed.

Returns:

The pre-processed files and a reference to their metadata

get_raw_bodies() → Generator[RawBody, None, None]

Get a generator that reads all the raw file contents from the input folder

Returns:

A generator for just the contents of the raw corpus, without the metadata

get_raw_files() → Generator[RawFile, None, None]

Get a generator that reads all the raw file contents and their metadata from the input folder

Returns:

A generator of the raw corpus
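The difference between the two generator methods can be sketched with the standard library alone. Everything below is an assumption made for illustration: `RawFileSketch`, `iter_raw_files`, and `iter_raw_bodies` are hypothetical stand-ins, not the tommy implementation, and real file importers would handle more formats than plain UTF-8 text.

```python
import os
import tempfile
from collections.abc import Generator
from dataclasses import dataclass


@dataclass
class RawFileSketch:
    """Hypothetical stand-in pairing a file's name (its metadata) with its text."""
    name: str
    body: str


def iter_raw_files(input_folder_path: str) -> Generator[RawFileSketch, None, None]:
    """Lazily yield one file at a time, in sorted name order."""
    for name in sorted(os.listdir(input_folder_path)):
        path = os.path.join(input_folder_path, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                yield RawFileSketch(name=name, body=handle.read())


def iter_raw_bodies(input_folder_path: str) -> Generator[str, None, None]:
    """Yield only the text of each file, dropping the metadata."""
    for raw_file in iter_raw_files(input_folder_path):
        yield raw_file.body


# Usage: write two small files to a temporary folder and read them back.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("b.txt", "second"), ("a.txt", "first")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    names = [raw_file.name for raw_file in iter_raw_files(folder)]
    bodies = list(iter_raw_bodies(folder))
```

Using generators means only one file's contents needs to be held in memory at a time, which matters for large corpora.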

metadata_available() → bool

Check if the metadata is available in the corpus model

Returns:

True if metadata is available, False otherwise

property metadata_changed_event

This event is triggered every time the metadata of the corpus changes, so that the UI can update itself to show the new metadata.

on_input_folder_path_changed(input_folder_path: str) → None

Gets the metadata from all files in the directory specified by the project settings and stores it in the corpus model and triggers the metadata-changed-event

Parameters:

input_folder_path – The new path to the input folder

Returns:

None
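The flow described for this method (extract metadata, store it, then trigger the metadata-changed event) can be sketched as follows. This is a simplified illustration under stated assumptions: `EventSketch`, its `subscribe` and `publish` methods, and `CorpusControllerSketch` are hypothetical names, and the extraction step is replaced by a placeholder. The actual event handler API in tommy may differ.

```python
from typing import Callable, Optional


class EventSketch:
    """Hypothetical minimal event: subscribers are called on every publish."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[list[str]], None]] = []

    def subscribe(self, callback: Callable[[list[str]], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, data: list[str]) -> None:
        for callback in self._subscribers:
            callback(data)


class CorpusControllerSketch:
    """Hypothetical stand-in for the controller, not the real class."""

    def __init__(self) -> None:
        self.metadata_changed_event = EventSketch()
        self._metadata: Optional[list[str]] = None

    def metadata_available(self) -> bool:
        return self._metadata is not None

    def on_input_folder_path_changed(self, input_folder_path: str) -> None:
        # Placeholder for real metadata extraction from the folder.
        self._metadata = [f"{input_folder_path}/example.txt"]
        # Notify listeners (for example, the UI) that metadata changed.
        self.metadata_changed_event.publish(self._metadata)


received: list[list[str]] = []
controller = CorpusControllerSketch()
controller.metadata_changed_event.subscribe(received.append)
available_before = controller.metadata_available()
controller.on_input_folder_path_changed("data")
available_after = controller.metadata_available()
```

The subscriber never polls the controller; it is simply called back whenever the input folder changes, which keeps the UI decoupled from the extraction logic.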

preprocess_corpus() → ProcessedCorpus

Preprocess the corpus and save it in the corpus model.

set_controller_refs(project_settings_controller: ProjectSettingsController, preprocessing_controller: PreprocessingController) → None

Sets the references to the project settings controller and the preprocessing controller, and subscribes to the publisher of project settings.

Parameters:

project_settings_controller – the project settings controller

preprocessing_controller – the preprocessing controller

Returns:

None

set_dictionary(dictionary: Dictionary) → None

Set the dictionary corresponding to the bag-of-words representation of the pre-processed documents.

Parameters:

dictionary – the dictionary (corpora.Dictionary) of the pre-processed documents

Returns:

None

set_model_refs(corpus_model: CorpusModel) → None

Sets the reference to the corpus model.

Parameters:

corpus_model – The corpus model

Returns:

None