Version: 0.0.22

Functions

Functions and their descriptions are listed below.

1. Text preprocessing functions

from etnltk.lang.am import preprocessing
| Function | Description |
| --- | --- |
| remove_whitespaces | Remove extra spaces, tabs, and newlines from a text string |
| remove_links | Remove URLs from a text string |
| remove_tags | Remove HTML tags from a text string |
| remove_emojis | Remove emojis from a text string |
| remove_email | Remove email addresses from a text string |
| remove_digits | Remove all digits from a text string |
| remove_english_chars | Remove ASCII characters from a text string |
| remove_arabic_chars | Remove Arabic characters and numerals from a text string |
| remove_chinese_chars | Remove Chinese characters from a text string |
| remove_ethiopic_digits | Remove all Ethiopic digits from a text string |
| remove_ethiopic_punct | Remove Ethiopic punctuation marks from a text string |
| remove_non_ethiopic | Remove non-Ethiopic characters from a text string |
| remove_stopwords | Remove stop words |
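
The cleaning steps listed above can be pictured as small string-to-string functions applied one after another. The sketch below is a standard-library approximation of four of them. It does not call etnltk, and the function names (`strip_links`, `strip_ethiopic_digits`, `strip_ethiopic_punct`, `collapse_whitespace`, `run_pipeline`) are hypothetical stand-ins chosen for illustration; the library's own implementations and signatures may differ.

```python
import re

# Hypothetical stand-ins for illustration only; NOT etnltk's implementation.

def strip_links(text: str) -> str:
    """Drop http(s):// and www. style URLs."""
    return re.sub(r"https?://\S+|www\.\S+", "", text)

def strip_ethiopic_digits(text: str) -> str:
    """Drop Ethiopic digits and numbers (Unicode U+1369 to U+137C)."""
    return re.sub(r"[\u1369-\u137C]", "", text)

def strip_ethiopic_punct(text: str) -> str:
    """Replace Ethiopic punctuation (U+1360 to U+1368) with a space
    so that neighbouring words are not glued together."""
    return re.sub(r"[\u1360-\u1368]", " ", text)

def collapse_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def run_pipeline(text: str, steps) -> str:
    """Apply each cleaning step in order."""
    for step in steps:
        text = step(text)
    return text

sample = "ሰላም   ዓለም። https://example.com ፩፪፫"
cleaned = run_pipeline(
    sample,
    [strip_links, strip_ethiopic_digits, strip_ethiopic_punct, collapse_whitespace],
)
print(cleaned)
```

Whitespace collapsing runs last here because the earlier steps leave gaps behind; the order of steps is a design choice of this sketch, not something the page above specifies.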

2. Text normalization functions

from etnltk.lang.am.normalizer import (
    normalize_labialized,
    normalize_shortened,
    normalize_punct,
    normalize_char
)
| Function | Description | Example |
| --- | --- | --- |
| normalize_labialized | Labialized character normalization | ሞልቱዋል -> ሞልቷል |
| normalize_shortened | Short-form expansion | አ.አ -> አዲስ አበባ |
| normalize_punct | Punctuation normalization | :: -> ። |
| normalize_char | Character-level normalization | ጸሀይ -> ፀሐይ |
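
The four normalization steps above can be read as lookup-and-replace passes over the text. The sketch below reproduces only the examples listed on this page using plain Python; it does not call etnltk. The tables and the names `replace_all` and `normalize_sketch` are hypothetical, and the library's real mapping tables are presumably much larger.

```python
# Tiny illustrative tables built only from the examples on this page.
# They are assumptions for the sketch, NOT etnltk's actual data.
SHORT_FORMS = {"አ.አ": "አዲስ አበባ"}          # short-form expansion
LABIALIZED = {"ቱዋ": "ቷ"}                  # labialized character normalization
PUNCT = {"::": "።"}                       # punctuation normalization
CHARS = str.maketrans({"ጸ": "ፀ", "ሀ": "ሐ"})  # character-level normalization

def replace_all(text: str, table: dict) -> str:
    """Apply every substring replacement in the table."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text

def normalize_sketch(text: str) -> str:
    """Run the four passes in a fixed order."""
    text = replace_all(text, SHORT_FORMS)
    text = replace_all(text, LABIALIZED)
    text = replace_all(text, PUNCT)
    return text.translate(CHARS)

for word in ["ሞልቱዋል", "አ.አ", "::", "ጸሀይ"]:
    print(word, "->", normalize_sketch(word))
```

Short forms are expanded first in this sketch so that the dot inside an abbreviation is consumed before any punctuation pass sees it.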

3. Text tokenization functions

from etnltk.tokenize.am import (
    sent_tokenize,
    word_tokenize
)
| Function | Description |
| --- | --- |
| sent_tokenize | Splits the raw input text into sentences |
| word_tokenize | Splits the raw input text into words |
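
To make the difference between the two tokenizers concrete, here is a rough regular-expression approximation of sentence and word splitting for Ethiopic text. It does not call etnltk. The names `split_sentences` and `split_words` are hypothetical, and the choices of which marks end a sentence and of dropping punctuation from word tokens are assumptions of this sketch, not documented behavior of the library.

```python
import re

# Assumption: a sentence ends at the Ethiopic full stop (።), the Ethiopic
# question mark (፧), or at "?" / "!". The library may use different rules.
SENTENCE_PATTERN = re.compile(r"[^።፧?!]+[።፧?!]?")

# Assumption: words are runs of characters that are neither whitespace,
# Ethiopic punctuation (U+1360 to U+1368), nor common ASCII punctuation.
WORD_PATTERN = re.compile(r"[^\s\u1360-\u1368.,;:!?]+")

def split_sentences(text: str) -> list:
    """Return sentences with their closing punctuation kept."""
    parts = (m.strip() for m in SENTENCE_PATTERN.findall(text))
    return [p for p in parts if p]

def split_words(text: str) -> list:
    """Return word tokens with punctuation discarded."""
    return WORD_PATTERN.findall(text)

sample = "ሰላም። እንዴት ነህ?"
print(split_sentences(sample))
print(split_words(sample))
```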