tommy.controller.preprocessing_controller

Classes

class tommy.controller.preprocessing_controller.PreprocessingController

Bases: object

Preprocesses text using the Dutch spaCy pipeline.

apply_synonyms(tokens: list[str]) → list[str]

Applies synonyms to the given list of tokens.

Parameters:

tokens – The list of tokens

Returns:

The list of tokens with each token mapped to its synonym
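
As a rough illustration of the mapping described above, the sketch below replaces every token that has an entry in a synonym table and leaves all other tokens untouched. The standalone function and the `synonyms` dictionary are hypothetical stand-ins; how the real method obtains its mappings from the SynonymsModel is not shown in this reference.

```python
def apply_synonyms(tokens: list[str], synonyms: dict[str, str]) -> list[str]:
    """Map each token to its synonym when one is defined (hypothetical helper)."""
    # dict.get falls back to the original token when no mapping exists
    return [synonyms.get(token, token) for token in tokens]


# Hypothetical mapping: source word -> replacement word
synonyms = {"auto": "voertuig", "wagen": "voertuig"}
mapped = apply_synonyms(["auto", "rijdt", "wagen"], synonyms)
print(mapped)
```

A dictionary lookup keeps the operation linear in the number of tokens and preserves token order.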

filter_stopwords(tokens: list[str]) → list[str]

Removes all stopwords from the given list of tokens.

Parameters:

tokens – The list of tokens

Returns:

The list of tokens without stopwords
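
The filtering step can be pictured as a simple membership test. This is a minimal sketch, assuming the stopwords are available as a plain set of strings; the `stopwords` parameter and its contents are hypothetical, and the real method presumably consults the StopwordsModel instead.

```python
def filter_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Drop every token that appears in the stopword set (hypothetical helper)."""
    # Set membership is O(1) on average, so the filter is linear in len(tokens)
    return [token for token in tokens if token not in stopwords]


# Hypothetical Dutch stopwords, for illustration only
stopwords = {"de", "het", "een", "en"}
kept = filter_stopwords(["de", "kat", "en", "het", "huis"], stopwords)
print(kept)
```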

load_pipeline(language: SupportedLanguage) → None

process_text(text: str) → list[str]

Preprocesses the given text into a list of tokens.
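
To make the text-to-tokens step concrete, here is a sketch of one plausible ordering: tokenize, map synonyms, then remove stopwords. The order, the function name `process_text_sketch`, and the regex tokenizer (used here in place of the spaCy pipeline) are all assumptions and are not taken from the actual implementation.

```python
import re


def process_text_sketch(
    text: str, synonyms: dict[str, str], stopwords: set[str]
) -> list[str]:
    """Tokenize, map synonyms, then drop stopwords (assumed order, not confirmed)."""
    # Regex tokenization stands in for the spaCy pipeline in this sketch
    tokens = re.findall(r"\w+", text.lower())
    # Replace tokens that have a synonym entry; keep the rest unchanged
    tokens = [synonyms.get(token, token) for token in tokens]
    # Remove stopwords last, so that mapped tokens are also checked
    return [token for token in tokens if token not in stopwords]


processed = process_text_sketch(
    "De auto en de fiets.",
    synonyms={"auto": "voertuig"},
    stopwords={"de", "en"},
)
print(processed)
```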

process_tokens(doc: Doc) → list[str]

Processes the tokens produced by the spaCy pipeline.

Parameters:

doc – The tokens produced by the Dutch spaCy pipeline

Returns:

The processed tokens
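
The reference does not say which rules are applied to each token, so the following is only a sketch under assumed rules: keep alphabetic tokens and return their lowercased lemmas. `FakeToken` is a hypothetical stand-in that borrows the attribute names `lemma_` and `is_alpha` from spaCy tokens, which lets the example run without installing spaCy or a language model.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing two attribute names that spaCy tokens have."""

    lemma_: str
    is_alpha: bool


def process_tokens_sketch(doc: list[FakeToken]) -> list[str]:
    """Keep alphabetic tokens and return their lowercased lemmas (assumed rules)."""
    return [token.lemma_.lower() for token in doc if token.is_alpha]


doc = [
    FakeToken("Kat", True),
    FakeToken("lopen", True),
    FakeToken("42", False),  # numeric token, dropped
    FakeToken(".", False),   # punctuation, dropped
]
lemmas = process_tokens_sketch(doc)
print(lemmas)
```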

set_controller_refs(language_controller: LanguageController)

Sets the reference to the language controller.

set_model_refs(stopwords_model: StopwordsModel, synonyms_model: SynonymsModel) → None

split_into_sentences(text: str) → list[str]

Splits the given text into a list of sentences.
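
For a sense of the expected input and output shape, the sketch below splits text with a naive regular expression. This is a swapped-in technique, not the controller's method: the real implementation presumably relies on the loaded spaCy pipeline, which handles abbreviations and other edge cases that this regex does not. The name `split_into_sentences_sketch` is hypothetical.

```python
import re


def split_into_sentences_sketch(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (naive regex stand-in)."""
    # The lookbehind keeps the closing punctuation attached to its sentence
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Discard empty strings, which occur for empty or whitespace-only input
    return [part for part in parts if part]


sentences = split_into_sentences_sketch("Dit is een zin. Is dit er nog een? Ja!")
print(sentences)
```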