tommy.controller.preprocessing_controller.PreprocessingController
- class tommy.controller.preprocessing_controller.PreprocessingController
Bases: object
A class that preprocesses text using the Dutch spaCy pipeline.
- apply_synonyms(tokens: list[str]) → list[str]
Applies synonyms to the given list of tokens.
- Parameters:
tokens – The list of tokens
- Returns:
The list of tokens where tokens are mapped to their synonyms
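The mapping step described for apply_synonyms can be illustrated with a minimal sketch. It assumes the synonyms are available as a plain dictionary from source word to replacement word; the function name `apply_synonyms_sketch` and the example mapping are hypothetical, and the real method presumably reads its data from the SynonymsModel rather than from an argument.

```python
def apply_synonyms_sketch(tokens: list[str], synonyms: dict[str, str]) -> list[str]:
    """Replace each token that has a synonym entry; leave other tokens unchanged."""
    # dict.get falls back to the token itself when no synonym is defined.
    return [synonyms.get(token, token) for token in tokens]


# Hypothetical mapping, for illustration only.
synonyms = {"auto": "voertuig", "wagen": "voertuig"}
result = apply_synonyms_sketch(["de", "auto", "rijdt"], synonyms)
print(result)
```

Tokens without an entry pass through untouched, so the output list always has the same length and order as the input.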
- filter_stopwords(tokens: list[str]) → list[str]
Removes all stopwords from the given list of tokens.
- Parameters:
tokens – The list of tokens
- Returns:
The list of tokens without stopwords
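Stopword filtering of this kind can be sketched as a set-membership test. The function name `filter_stopwords_sketch` and the stopword set below are hypothetical; the real method presumably takes its stopwords from the StopwordsModel, and whether it compares case-sensitively is not stated in this page.

```python
def filter_stopwords_sketch(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only the tokens that do not appear in the stopword set."""
    # A set gives constant-time membership checks, which matters for long documents.
    return [token for token in tokens if token not in stopwords]


# Hypothetical stopword set, for illustration only.
stopwords = {"de", "het", "een"}
filtered = filter_stopwords_sketch(["de", "auto", "rijdt"], stopwords)
print(filtered)
```

This sketch compares tokens exactly as given, so "De" would not match "de" unless the tokens are lowercased first.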
- load_pipeline(language: SupportedLanguage) → None
- process_tokens(doc: Doc) → list[str]
Processes the tokens produced by the spaCy pipeline.
- Parameters:
doc – The spaCy Doc produced by running the Dutch spaCy pipeline on the text
- Returns:
The processed tokens
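This page does not say which processing steps process_tokens applies, so the following is only a sketch of one plausible approach: keep alphabetic tokens and return their lowercased lemmas. To stay runnable without a spaCy model installed, it uses a hypothetical stand-in class `FakeToken` that exposes two attributes spaCy's Token also provides (`lemma_` and `is_alpha`); the function name `process_tokens_sketch` and the chosen filter are assumptions, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in with two attributes that spaCy's Token also has."""
    lemma_: str
    is_alpha: bool


def process_tokens_sketch(doc) -> list[str]:
    """Return lowercased lemmas of alphabetic tokens (an assumed set of steps)."""
    # Iterating over a spaCy Doc yields Token objects; here a list of FakeToken is used.
    return [token.lemma_.lower() for token in doc if token.is_alpha]


doc = [
    FakeToken("De", True),
    FakeToken("auto", True),
    FakeToken("rijden", True),
    FakeToken(".", False),
]
processed = process_tokens_sketch(doc)
print(processed)
```

Because the function only relies on iteration and those two attributes, the same body would also accept a real spaCy Doc.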
- set_controller_refs(language_controller: LanguageController)
Sets the reference to the language controller.
- set_model_refs(stopwords_model: StopwordsModel, synonyms_model: SynonymsModel) → None