Assessment redesign for generative AI: A taxonomy of options and their viability
with Sarah Howard and Professor Jaclyn Broadbent
Since the seemingly sudden emergence of ChatGPT at the end of 2022, there has been significant debate surrounding the impact of text-based generative AI in education. Many jurisdictions initially attempted to ban access to these tools, citing concerns that students would use them to cheat on assessments. Text-based generative AI tools can generate plausible artefacts that pass evaluations without any actual learning. All students need to do is provide a suitable prompt.
In higher education, extensive research has been conducted on cheating, including its prevalence, methods, and motivations. It's important to note that many, if not most, students genuinely want to learn and avoid cheating. However, there will always be a proportion of students who look for shortcuts. Forensic psychology suggests that cheating often stems from a combination of motive, means, and opportunity. With the introduction of ChatGPT and other generative AI, the opportunity and, particularly, the means to cheat have become omnipresent, while the level of risk and effort required to cheat has significantly decreased.
Therefore, it's crucial to explore various strategies for modifying assessment tasks. Some of these approaches will address the motives behind cheating, while others will focus on eliminating the opportunity. By offering a taxonomy of assessment redesign options, our goal is to lay the groundwork for further discussion on the potential and feasibility of these methods. In doing so, we hope to shift the conversation beyond simply banning or policing new technology (i.e. focussing only on the means), and towards more constructive and innovative solutions.
Through conversations with and the work of learned colleagues (thanks in particular to Kelly Matthews, Dominic McGrath, Christine Slade, Amy Hubbell, Jacques-Olivier Perche and Danny Liu), we have distilled the options on the table into six categories. They are: ignore, ban, invigilate, embrace, design around, and rethink.
We provide here a brief overview of each of these categories, recognising that there are complexities associated with each. We will unpack these complexities elsewhere (watch this space). The purpose of this article is to provide a foundation to stimulate further conversation.
Ignore
The first option is to simply ignore this development and hope it goes away. Some colleagues believe that generative AI will not significantly impact education. They argue that, given the long history of hype surrounding various educational technologies such as tablet computers, electronic whiteboards, and massive open online courses, generative AI may not have a lasting impact. However, this approach seems unlikely to be viable in the long term. By most accounts, generative AI is poised to have a significant impact on education. It seems as though 'this time is different'.
Ban
The second option on the table is to attempt to ban these technologies. As previously discussed, banning was a common initial reaction to the introduction of ChatGPT. Students almost immediately found ways to bypass these bans. Setting aside debates about the validity and reliability of various policing methods, there are already numerous YouTube channels and popular websites that teach students how to cheat effectively and avoid AI detection tools. Generative AI will also feature in core productivity applications such as word processing, slideware, and spreadsheets. As a result, attempting to ban ChatGPT and similar large language model-based tools seems futile, especially in the medium and long term. By way of analogy, imagine trying to ban spellcheck or autocomplete.
Invigilate
The third option is to design assessments that circumvent the use of AI. One obvious approach is to revert to traditional exam settings where students are monitored as they produce written artefacts (as was the immediate response in many jurisdictions and institutions). While this may be necessary for certain subjects and contexts, it is unlikely to be a widespread solution. Written exams have their place, but they should not be the default assessment approach in all circumstances. Other alternatives include oral examinations and ongoing reflection activities. Although these methods can help ensure that students genuinely learn the material, they are not infallible and must be well designed, appropriate to the specifics of the context, and implemented fairly to be effective.
Embrace
The fourth option is to embrace generative AI in assessments. This could range from allowing or requiring students to use AI in specific tasks to having them critique, update, or assess AI-generated artefacts themselves. A wide range of options is emerging (Ethan Mollick is worth following to keep up to date on what is happening in this space). As generative AI is likely to increasingly impact how people work and live, it seems important to embrace these tools in the classroom over the medium to long term. While there are opportunities to embrace AI, there are also concerns surrounding ethics, fairness, and equity, particularly in terms of privacy, access to advanced AI technologies, and the varying abilities of students to use them effectively.
Design around
The fifth option is to design assessments around the limitations of generative AI. This approach involves exploiting the weaknesses of AI technologies. I (JL) have previously described how the tasks I assign my students do exactly this. However, as AI becomes increasingly sophisticated, this strategy is likely to become riskier and less effective. The introduction of GPT-4 brought substantial improvements in the plausibility and accuracy of responses to prompts that GPT-3.5 seemed to struggle with. This improvement was clearly evident when the same assessment tasks were tested using the newer model. These models are only going to improve from here, as will the capability of students to prompt them. As such, while designing around generative AI seemed like a promising approach early on, that promise has evaporated, and this is unlikely to be a viable option in the short to medium term.
Rethink
Lastly, there is the option to rethink assessment entirely. This challenging approach requires asking how and why students are assessed in the first place. If assessments feel like chores, do not encourage creativity or inspire actual learning, or impose substantial time pressure, students have greater motivation to cut corners. Further, if assessment tasks are not designed to align with learning as a developmental process, and instead continue to view this process through snapshots provided by the production of artefacts, the methods of assessment need a rethink. None of this will be straightforward, but it is increasingly necessary, if it wasn't already. This is perhaps the set of options that will require the most emphasis over the medium to long term.
Viability into the future
Based on this taxonomy, we provide here our sense of the viability of the six types of assessment redesign responses to generative AI over the short, medium, and long term (represented by traffic-light colouration: red = likely not viable, orange = care needed, green = seems most viable):
We fully recognise that there is much complexity associated with any of the options we describe here and that the situation is constantly evolving. Ultimately, a combination of these options may be necessary to address the challenges posed by generative AI in education. The successful rethinking of assessment approaches for the age of AI is also likely to require the combined and coordinated efforts of researchers, teachers, design professionals, learning technologists, and policymakers.
When exploring these assessment redesign options, it will also become increasingly critical to consider how learning as a developmental process occurs and what motivates students to seek to take shortcuts through this process. We hope that the taxonomy we offer here will be helpful to continue to make sense of the developments in generative AI and consider how assessment design can be adapted accordingly.
Acknowledgement: ChatGPT was used to edit this article for clarity