About the Role
Join Rex.zone to evaluate and improve AI system outputs using structured human-in-the-loop feedback. You will support RLHF workflows, prompt evaluation, and QA evaluation for large language models, helping ensure high-quality training data and measurable improvements in model performance across domains and languages.
What You Will Do
- Design and run LLM evaluation tasks (pairwise ranking, rubric-based grading, error categorization)
- Support RLHF-style preference data collection and analysis
- Audit data labeling outputs and QA evaluation results for compliance with annotation guidelines
- Perform prompt evaluation and regression testing across model versions
- Track quality metrics and reviewer calibration using spreadsheets and/or scripting
- Document failure modes and coordinate with stakeholders to resolve systematic issues
Core Workflows
- LLM training pipelines and evaluation h...