Intelligent Functions Library

Translation suggestion functions

Quality-checking functions

Basic checking functionality

Language Identification Endpoint: An endpoint that takes a string as input and outputs the detected language. This could be useful for tasks where the language of the text isn’t known beforehand.

Use case: I have text in Yoruba and don’t know its code, but some endpoint requires a two-letter (ISO 639-1) or three-letter (ISO 639-3) code
Get the ISO 639-3 code for Yoruba
Allows LLMs to take a natural-language query (“... Yoruba …”) and retrieve the ISO code
Not necessarily trivial: Armenian, for example, has two ISO codes for sub-dialects, etc.
Input = string
Output = list of two-letter (ISO 639-1) codes and list of three-letter (ISO 639-3) codes
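The lookup half of this endpoint (language name in a query → code lists) could be sketched as below. The table `LANGUAGE_CODES` and the function `resolve_language_codes` are hypothetical and hand-filled with two entries; a real endpoint would load the full ISO 639-3 code tables. The Armenian entry illustrates the "not trivial" point: one ISO 639-1 code, but two ISO 639-3 codes (`hyw` is Western Armenian).

```python
import re

# Hypothetical, hand-filled table; a real endpoint would load the full ISO 639-3 tables.
LANGUAGE_CODES = {
    "yoruba": {"iso639_1": ["yo"], "iso639_3": ["yor"]},
    # One ISO 639-1 code, but ISO 639-3 also lists hyw (Western Armenian) next to hye.
    "armenian": {"iso639_1": ["hy"], "iso639_3": ["hye", "hyw"]},
}


def resolve_language_codes(query: str) -> tuple[list[str], list[str]]:
    """Find known language names in a free-text query and return
    (two-letter codes, three-letter codes)."""
    words = re.findall(r"[a-z]+", query.lower())
    two_letter: list[str] = []
    three_letter: list[str] = []
    for word in dict.fromkeys(words):  # de-duplicate while keeping query order
        codes = LANGUAGE_CODES.get(word)
        if codes:
            two_letter.extend(codes["iso639_1"])
            three_letter.extend(codes["iso639_3"])
    return two_letter, three_letter
```

For example, `resolve_language_codes("I have text in Yoruba")` returns `(["yo"], ["yor"])`. Detecting the language of unlabeled text is a separate step that would need a trained identifier.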

BLEU Scoring Endpoint: An endpoint that accepts a reference translation and a candidate translation and returns a BLEU score (via NLTK).
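A minimal sketch of the scoring function behind this endpoint, using NLTK's `sentence_bleu`. The wrapper name `bleu` and the choice of plain whitespace tokenization (which avoids downloading NLTK tokenizer data) are assumptions, not a settled design.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu


def bleu(reference: str, candidate: str) -> float:
    """Sentence-level BLEU of a candidate translation against one reference."""
    # Whitespace tokenization is an assumption; real input may need proper tokenizing.
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    # method1 smoothing avoids a zero score when some higher-order n-gram
    # precision is zero, which is common for short sentences.
    return sentence_bleu(
        [ref_tokens], cand_tokens, smoothing_function=SmoothingFunction().method1
    )
```

`sentence_bleu` accepts a list of references, so the endpoint could easily take several reference translations instead of one.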

Romanizing Text Endpoint: An endpoint that converts text in another script into Latin script. This would be particularly useful for languages with non-Latin scripts. See Ulf Hermjakob’s library uroman.
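To show the shape of the transformation only, here is a toy table-driven romanizer for Russian Cyrillic. It is not uroman and does not reflect uroman's rules or API; the table `CYRILLIC_TO_LATIN` and the function `romanize` are hypothetical. Characters outside the table pass through unchanged.

```python
# Hypothetical, deliberately small table covering the Russian Cyrillic alphabet only.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def romanize(text: str) -> str:
    """Transliterate Cyrillic characters to Latin, preserving capitalization."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # not in the table: keep as-is (Latin, digits, punctuation)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)
```

`romanize("Привет, мир")` gives `"Privet, mir"`. A general endpoint would delegate to a library such as uroman, which covers many scripts.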

Keyword List Retrieval Endpoint: An endpoint that extracts key content words and names from a text for use in further alignment tasks.
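A crude baseline for this endpoint might rank non-stopwords by frequency and treat capitalized, non-sentence-initial tokens as candidate names. Everything here is a hypothetical sketch (`STOPWORDS`, `extract_keywords`, `extract_names`), and the capitalization heuristic only makes sense for Latin-script languages that capitalize names.

```python
import re
from collections import Counter

# Hypothetical minimal English stopword list; real use needs per-language lists.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "for", "with",
             "is", "was", "he", "she", "it", "they"}


def extract_keywords(text: str, k: int = 5) -> list[str]:
    """Return up to k most frequent non-stopword tokens (lowercased)."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    counts = Counter(t for t in tokens if t not in STOPWORDS and not t.isdigit())
    return [word for word, _ in counts.most_common(k)]


def extract_names(text: str) -> list[str]:
    """Capitalized tokens not at a sentence start: a rough proxy for names."""
    names: list[str] = []
    for match in re.finditer(r"\b[A-Z]\w+", text):
        before = text[: match.start()].rstrip()
        # Skip the first word of the text and words right after . ! ?
        if before and before[-1] not in ".!?" and match.group() not in names:
            names.append(match.group())
    return names
```

For "Then Jesus went to Galilee. He taught there.", `extract_names` returns `["Jesus", "Galilee"]`.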

FastAlign Endpoint: An endpoint that aligns two sentences using a simple, efficient model such as FastAlign. This would likely require access to pretrained models (do these exist? Do we need to build them?) or the ability to train models on provided datasets.

Pretrained models
Endpoints for specific language pairs/sets
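An endpoint wrapping the fast_align command-line tool mostly needs to write its input format (tokenized source and target joined by ` ||| `, one pair per line) and parse its output (Pharaoh-style `i-j` links, 0-based source-target indices). The helper names and the default binary path `./fast_align` below are assumptions; the flags `-i -d -o -v` are the forward-direction example from the fast_align README.

```python
import subprocess


def to_fast_align_input(pairs):
    """Format tokenized (source, target) pairs as fast_align input:
    one pair per line, sides separated by ' ||| '."""
    return "".join(f"{' '.join(src)} ||| {' '.join(tgt)}\n" for src, tgt in pairs)


def parse_pharaoh(line):
    """Parse an output line such as '0-0 1-2' into (src_index, tgt_index) pairs."""
    return [tuple(int(x) for x in link.split("-")) for link in line.split()]


def run_fast_align(input_path, binary="./fast_align"):
    """Run fast_align in the forward direction and parse one alignment per line."""
    result = subprocess.run(
        [binary, "-i", input_path, "-d", "-o", "-v"],
        capture_output=True, text=True, check=True,
    )
    # Alignments are on stdout; progress messages go to stderr.
    return [parse_pharaoh(line) for line in result.stdout.splitlines()]
```

Because fast_align trains on whatever corpus it is given, "pretrained models" here could simply mean caching its learned parameters per language pair.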

Statistical Alignment Endpoint: An endpoint that takes two strings as input and aligns them statistically, using models such as IBM Models 1 to 4, HMM-based alignment, etc. Are there tools in SIL Machine for this?

Input: parallel sentences
Process: the function trains a model on them
Output: word-by-word alignment
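The input/train/output flow above can be illustrated with the simplest of the models named, IBM Model 1, trained by expectation-maximization. This is a from-scratch sketch, not SIL Machine's API; `train_ibm1` and `align` are hypothetical names, and the model omits the usual NULL source word for brevity.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """EM training of IBM Model 1: t[(tgt_word, src_word)] = p(tgt | src).
    Simplification: no NULL source word."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(tgt, src)
        total = defaultdict(float)  # expected counts c(src)
        for src, tgt in pairs:
            for e in tgt:
                # E-step: share this target word's mass over the source words
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts per source word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word: (src_i, tgt_j) pairs."""
    return [(max(range(len(src)), key=lambda i: t[(e, src[i])]), j)
            for j, e in enumerate(tgt)]


corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
model = train_ibm1(corpus)
```

On this toy corpus, `align(["das", "haus"], ["the", "house"], model)` yields `[(0, 0), (1, 1)]`: co-occurrence across sentences lets EM separate "the" from "house" even though each sentence alone is ambiguous.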

Syntax-Aware Alignment Endpoint: An endpoint that takes the syntax of the languages into account when aligning. We could leverage MACULA data for source-text syntax chunking.
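One hypothetical way to use source-side chunks is to project word links onto them and flag chunks whose target spans overlap, since those are candidates for alignment errors. The chunk representation (half-open `(start, end)` token spans) is an assumption and is not MACULA's actual data format; `project_chunks` and `overlapping_chunks` are illustrative names.

```python
def project_chunks(chunks, links):
    """chunks: list of (start, end) source-token spans, end exclusive (assumed format).
    links: (src_index, tgt_index) word links.
    Returns the (min, max) target span each chunk links to, or None if unlinked."""
    links = list(links)  # allow any iterable, since we scan it once per chunk
    spans = []
    for start, end in chunks:
        targets = [j for i, j in links if start <= i < end]
        spans.append((min(targets), max(targets)) if targets else None)
    return spans


def overlapping_chunks(spans):
    """Index pairs of chunks whose projected target spans overlap."""
    pairs = []
    for a in range(len(spans)):
        for b in range(a + 1, len(spans)):
            sa, sb = spans[a], spans[b]
            if sa and sb and sa[0] <= sb[1] and sb[0] <= sa[1]:
                pairs.append((a, b))
    return pairs
```

A stricter variant could reject or re-score links that would make two chunks overlap, rather than only reporting them.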

Multilingual Alignment Endpoint: An endpoint that takes parallel sentences in several languages and aligns them all together. We would need to specify the alignment method, or this could be an LLM-powered tool.
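One candidate method (an assumption, not a decision recorded here) is pivot alignment: align every language to a shared pivot sentence, such as the source text, then derive each pairwise alignment by composing links through the pivot. `pairwise_via_pivot` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_via_pivot(links):
    """links maps language -> set of (pivot_index, lang_index) word links.
    Returns {(lang_x, lang_y): set of (x_index, y_index)} for every language pair."""
    result = {}
    for x, y in combinations(sorted(links), 2):
        # Index y's links by pivot position
        by_pivot = defaultdict(set)
        for p, j in links[y]:
            by_pivot[p].add(j)
        # x word i and y word j are linked if both link to the same pivot word p
        result[(x, y)] = {(i, j) for p, i in links[x] for j in by_pivot[p]}
    return result
```

The trade-off is that errors in either pivot alignment propagate into every derived pair, and words with no pivot counterpart end up unaligned.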

Preprocessing before alignment: handling verse references (vrefs) that are merged in one language but not the other, etc.
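A sketch of the merged-vref step: when either translation folds a verse into the previous line, both sides are merged over the same verse span so they remain comparable. The placeholder `RANGE_MARKER = "<range>"` is an assumed convention for "this verse was merged into the previous line"; check it against the actual data files. The function name is hypothetical.

```python
RANGE_MARKER = "<range>"  # assumed placeholder for a verse folded into the previous line


def _join(lines):
    """Concatenate non-empty, non-placeholder lines."""
    return " ".join(line for line in lines if line and line != RANGE_MARKER)


def merge_verse_ranges(vrefs, text_a, text_b):
    """vrefs, text_a and text_b are parallel lists, one entry per verse reference.
    Returns (refs, merged_a, merged_b) groups covering identical verse spans."""
    groups = []
    i = 0
    while i < len(vrefs):
        j = i + 1
        # Extend while either translation has folded the next verse into this group
        while j < len(vrefs) and RANGE_MARKER in (text_a[j], text_b[j]):
            j += 1
        groups.append((vrefs[i:j], _join(text_a[i:j]), _join(text_b[i:j])))
        i = j
    return groups
```

Merging both sides, rather than trying to split the merged side, avoids guessing where one verse ends inside a combined translation.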

eBible endpoints: what services could we expose that are relevant to this data?

Endpoint to get multiple versions for a given verse or range
Resolve versification between versions (using Paratext “original” versification?)
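The two endpoints above could work together: expand the requested reference, map it to Original versification, then look it up in every version. This is a hypothetical sketch. The mapping table holds only one well-known difference (English Malachi 4:1-6 corresponds to Hebrew Malachi 3:19-24), whereas a real endpoint would load full versification files. The `BOOK C:V` and `BOOK C:V1-V2` reference format (single chapter only) and all function names are assumptions.

```python
import re

# Hypothetical English -> Original mapping, keyed by (book, chapter, verse).
# Only one known difference is included: English MAL 4:1-6 = Hebrew MAL 3:19-24.
ENGLISH_TO_ORIGINAL = {("MAL", 4, v): ("MAL", 3, v + 18) for v in range(1, 7)}

REF_PATTERN = re.compile(r"([A-Z0-9]{3}) (\d+):(\d+)(?:-(\d+))?")


def expand_ref(ref: str) -> list[tuple[str, int, int]]:
    """Expand 'BOOK C:V' or 'BOOK C:V1-V2' into individual verse keys."""
    m = REF_PATTERN.fullmatch(ref.strip())
    if m is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book, chapter, first = m.group(1), int(m.group(2)), int(m.group(3))
    last = int(m.group(4)) if m.group(4) else first
    return [(book, chapter, v) for v in range(first, last + 1)]


def get_verses(ref, versions, mapping):
    """Return {version_name: [verse texts]} for ref.
    Each version's texts are assumed to be keyed by Original versification."""
    keys = [mapping.get(key, key) for key in expand_ref(ref)]
    return {name: [verses.get(key, "") for key in keys]
            for name, verses in versions.items()}
```

Normalizing everything to one reference versification means each version needs only one mapping, instead of one for every pair of versions.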
