Simulated GAN evaluation

"Simulated GAN Evaluation" is a proposed method for assessing the quality of AI-generated translations. The process begins by prompting a model for a translation using 'n' example translation pairs and a directive to complete the next pair (see example). The entire prompt, together with the AI's completion, is then passed to an evaluator, whose task is to identify which of the translations was generated by the AI. Ideally an off-the-shelf model can act as the evaluator, though it may prove necessary to fine-tune a discriminator model on the gold-standard pairs.
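A minimal Python sketch of a single evaluation trial follows. It assumes the real pairs are shuffled together with the AI-completed pair before the evaluator sees them (otherwise the AI's completion would always sit in the last slot), and that the evaluator is any callable mapping a prompt string to a reply. `build_trial`, `run_trial`, and the prompt wording are hypothetical illustrations, not an existing API.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, translation)


@dataclass
class Trial:
    prompt: str    # full prompt shown to the evaluator
    ai_index: int  # ground-truth position of the AI-generated pair


def build_trial(real_pairs: List[Pair], ai_pair: Pair, rng: random.Random) -> Trial:
    """Hide the AI-completed pair among the real pairs (assumed shuffling step)."""
    pairs = list(real_pairs) + [ai_pair]
    order = list(range(len(pairs)))
    rng.shuffle(order)
    shuffled = [pairs[i] for i in order]
    # The AI pair was appended last, so find where that index landed.
    ai_index = order.index(len(pairs) - 1)
    lines = [f"{k}. {src} => {tgt}" for k, (src, tgt) in enumerate(shuffled)]
    prompt = (
        "One of the following translations was generated by an AI. "
        "Reply with its number only.\n" + "\n".join(lines)
    )
    return Trial(prompt, ai_index)


def run_trial(trial: Trial, evaluator: Callable[[str], str]) -> bool:
    """Return True if the evaluator picked the AI-generated pair."""
    reply = evaluator(trial.prompt).strip()
    try:
        return int(reply) == trial.ai_index
    except ValueError:
        return False  # an unparseable answer counts as a miss
```

While wiring this up, a stub such as `lambda prompt: "0"` can stand in for the LLM evaluator.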

The concept, inspired by Generative Adversarial Networks (GANs), involves an evaluator attempting to distinguish AI-drafted translations from a list of real translations. While it doesn't function exactly like a GAN, it borrows the same principle: the harder it is for the evaluator to pick out the AI's translation, the closer that translation is to the real ones.
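One way to read the evaluator's results through the GAN lens, offered as an assumption rather than part of the proposal: with n real pairs plus one AI pair, a random guess picks the AI pair with probability 1/(n+1), much as a GAN's discriminator falls to chance when the generator's output is indistinguishable from real data. The sketch compares the observed detection rate with that baseline and uses a one-sided binomial test as a rough check on whether the evaluator beats chance. The function names are hypothetical.

```python
from math import comb


def chance_rate(n_real: int) -> float:
    """Probability of picking the AI pair at random among n_real + 1 candidates."""
    return 1.0 / (n_real + 1)


def binomial_p_value(hits: int, trials: int, p0: float) -> float:
    """One-sided P(X >= hits) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def summarize(hits: int, trials: int, n_real: int) -> dict:
    """Detection rate versus the chance baseline for a batch of trials."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p0 = chance_rate(n_real)
    return {
        "detection_rate": hits / trials,
        "chance_rate": p0,
        # Small p-value: the evaluator reliably spots the AI translation.
        "p_value": binomial_p_value(hits, trials, p0),
    }
```

A detection rate near `chance_rate` (0.5 in a two-candidate setup) would suggest the AI translations are hard to tell apart from the real ones.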

The method could also be used to refine the prompts, or the models themselves, by having a separate AI act as the evaluator. The initial implementation will use LLMs to gauge whether the concept is viable. If it succeeds, the approach could be extended to training GANs for specific languages.

From Ben: Idea: Have the agent pick the machine translation either from the list of reordered translations, or from a pair consisting of the real translation and a machine translation. Also have it explain its rationale. The rationales from the trials where it picks correctly can then be used to improve the next iteration of the loop.
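Ben's variant can be sketched in the same way. The sketch below assumes the two-candidate form and asks the evaluator for a JSON reply holding its choice and rationale (an assumed format). It keeps only the rationales from correct picks and, as one possible reading of "improve the next iteration", folds them into the next translation prompt as giveaways to avoid. All function names and prompt wording are hypothetical.

```python
import json
import random
from typing import Callable, List, Tuple


def pairwise_round(
    items: List[Tuple[str, str, str]],  # (source, real translation, machine translation)
    evaluator: Callable[[str], str],
    rng: random.Random,
) -> List[str]:
    """Run one round of two-way trials; return rationales from correct picks."""
    useful = []
    for source, real, machine in items:
        options = [real, machine]
        rng.shuffle(options)  # hide which slot holds the machine translation
        machine_slot = options.index(machine)
        prompt = (
            f"Source: {source}\nA: {options[0]}\nB: {options[1]}\n"
            "Which is the machine translation? Answer as JSON: "
            '{"choice": "A" or "B", "rationale": "..."}'
        )
        try:
            answer = json.loads(evaluator(prompt))
            picked = {"A": 0, "B": 1}[answer["choice"]]
            rationale = answer["rationale"]
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed replies
        if picked == machine_slot:
            useful.append(rationale)
    return useful


def next_generation_prompt(base_prompt: str, rationales: List[str]) -> str:
    """Hypothetical feedback step: pass the evaluator's tells to the next draft."""
    if not rationales:
        return base_prompt
    tells = "\n".join(f"- {r}" for r in rationales)
    return f"{base_prompt}\n\nAvoid these giveaways noticed by a reviewer:\n{tells}"
```

The same rationales could instead be fed to the evaluator to sharpen it; the sketch only shows the generator-side option.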
