Hallucination Detection and Evaluation of Large Language Models
Published in arXiv preprint, 2025
Built a KnowHalu-style hallucination evaluation pipeline, replacing KnowHalu's judge with HHEM to accelerate evaluation (roughly 10-minute judging) while maintaining QA hallucination-detection performance (76.9% accuracy); a variant that skips fabrication checking reached 82.2% accuracy. Also improved summarization hallucination detection via segment-level verification.
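The segment-level verification idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: `consistency_score` here is a hypothetical lexical-overlap stand-in for an HHEM-style consistency judge, and the splitting/threshold choices are assumptions.

```python
import re

def consistency_score(source: str, segment: str) -> float:
    # Hypothetical stand-in for an HHEM-style judge: fraction of the
    # segment's words that also appear in the source document.
    src_words = set(re.findall(r"\w+", source.lower()))
    seg_words = re.findall(r"\w+", segment.lower())
    if not seg_words:
        return 1.0
    return sum(w in src_words for w in seg_words) / len(seg_words)

def flag_hallucinated_segments(source: str, summary: str, threshold: float = 0.5):
    """Split a summary into sentence-level segments and flag those whose
    consistency with the source falls below the threshold."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    return [
        (seg, consistency_score(source, seg))
        for seg in segments
        if consistency_score(source, seg) < threshold
    ]

source = "The cat sat on the mat. It purred quietly."
summary = "The cat sat on the mat. The dog barked loudly at strangers."
flagged = flag_hallucinated_segments(source, summary)
```

Scoring each segment independently, rather than judging the whole summary at once, localizes which sentence introduced unsupported content; in the example above only the second sentence is flagged.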
Advisor: Prof. Yuan Tian (UCLA).
Recommended citation: Hallucination Detection and Evaluation of Large Language Models. arXiv:2512.22416, 2025.