Publication: Designing life science assessments in the era of generative artificial intelligence

Apr 6

I am excited to share a new paper published in PLOS ONE from the education side of the lab.

There is a lot of anxiety that genAI tools will jeopardize learning gains from homework assignments. Using Bloom's Taxonomy, we analyzed where genAI succeeds and where it struggles.

We compared the performance of ChatGPT-4o and Harvard PhD students in designing molecular biology experiments. We initially predicted that ChatGPT would perform worse at higher levels of Bloom's Taxonomy. However, that is not what we observed.

The main performance gap we observed was at the "apply" level of Bloom's. Students (mean = 87%) performed two letter grades better than ChatGPT (mean = 67%) on experimental design worksheets, and all of that gap can be explained by performance at the "apply" level. We posit that apply-level questions require multi-step, compositional reasoning, for which large language models are not well suited. Therefore, this finding is likely broadly applicable across LLMs.

At the "create" level, genAI consistently generated reasonable hypotheses for given scenarios. However, the hypotheses often lacked the specificity and testability we would expect of PhD-level work.

We also observed that ChatGPT-4o struggled to interpret data presented graphically. Anecdotally, ChatGPT-5.4-Thinking has dramatically improved at data interpretation. This is one of many examples emphasizing how educators must be mindful of the rapidly developing genAI landscape.

We offer a few suggestions to educators. One that I've used in my teaching is increasing my expectation for precision in students' answers. GenAI is great at producing reasonable-sounding answers, but precision can be lacking. Use rubrics that reward careful justification for answers.

The class we describe in the paper uses "chalk talks" as a key assessment. Oral exams are valuable assessment tools that maximize student ownership of their answers. The class is also an example of a "flipped classroom," which gives frequent opportunities for formative assessments.

All in all, many examples of "good" pedagogy (e.g., active learning, flipped classrooms, universal design) are also beneficial for maintaining learning gains in the era of genAI.