Leveraging AI for competency assessments

Stefanos Orfanos; Guy-Marcel Kopoin

doi:10.17161/cberj.v3no2.24319

Authors

Stefanos Orfanos, Georgia State University
Guy-Marcel Kopoin, Georgia State University

DOI:

https://doi.org/10.17161/cberj.v3no2.24319

Keywords:

actuarial education, artificial intelligence, competency-based assessment, generalizability theory, reliability tests, validity tests

Abstract

Strong business skills (such as communication, professional judgment, and stakeholder management) have become a key differentiator for actuarial trainees entering the workplace and are correlated with future success. Although case studies have long proven effective at developing these skills, existing resources are limited and are typically structured as multi-week team projects that are difficult to scale, individualize, or align with specific competencies. To address this gap, this paper examines whether AI models can (i) efficiently transform a small set of comprehensive actuarial cases into many brief, single-competency, individual assessments; and (ii) score these assessments with adequate psychometric quality. Using 144 AI-generated assessments covering the Society of Actuaries' eight core competencies, we achieve strong reliability (G = 0.719 and G = 0.740 for optimized three- and four-grader panels, respectively), with the panels selected through Generalizability Theory analysis. Our experiments show that iterative prompt refinement improves assessment quality: later prompts outperform initial versions, and the difference is a medium-sized effect. However, we document critical challenges. All AI graders exhibit in-group bias, systematically favoring assessments generated by their own model family despite anonymization. Additionally, graders may engage in algorithmic gaming, producing low-entropy scoring patterns with strong halo effects that bear no relationship to actual assessment quality. The exclusion of unreliable graders from a model family partially explains the apparent underperformance of assessments from that same family, illustrating how grader selection can inadvertently create bias. We propose a hybrid approach that combines carefully selected AI grader panels with human moderators to address these documented biases while retaining the efficiency gains of automated assessment.
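To make the two psychometric quantities in the abstract concrete, the following is a minimal sketch, not the authors' analysis. It assumes the simplest possible design, in which every grader scores every assessment (a fully crossed assessments-by-graders layout with one score per cell); the paper's actual Generalizability Theory design may include further facets such as prompt version or model family. The function names and the toy score table are hypothetical. The sketch estimates variance components from two-way ANOVA mean squares, projects the relative G coefficient for a panel of a given size, and computes the Shannon entropy of a single grader's scores as a rough indicator of the low-entropy scoring patterns mentioned above.

```python
from collections import Counter
from math import log2
from statistics import mean


def variance_components(scores):
    """Estimate assessment and residual variance for a crossed design.

    scores[i][j] is the score grader j gave to assessment i
    (assumption: fully crossed, one score per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(scores[i][j] for i in range(n_p)) for j in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in row_means)   # assessments
    ss_r = n_p * sum((m - grand) ** 2 for m in col_means)   # graders
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r                          # interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # negative estimates set to zero
    return var_p, var_res


def g_coefficient(var_p, var_res, n_graders):
    """Relative G coefficient for the mean score of an n_graders panel."""
    return var_p / (var_p + var_res / n_graders)


def score_entropy(grader_scores):
    """Shannon entropy (bits) of one grader's score distribution."""
    n = len(grader_scores)
    counts = Counter(grader_scores)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical toy data: 4 assessments scored by 3 graders.
    toy = [[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
    var_p, var_res = variance_components(toy)
    for n in (3, 4):
        print(n, "graders:", round(g_coefficient(var_p, var_res, n), 3))
    print("entropy of a constant grader:", score_entropy([4, 4, 4, 4]))
```

Under these assumptions, adding graders shrinks the residual term in the denominator, which is why a four-grader panel can reach a slightly higher G than a three-grader panel built from the same variance components. A grader who assigns nearly the same score to everything has entropy near zero regardless of assessment quality.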


