AI-Powered Assessment and Feedback: What's Working and What's Hype

By Dr. Janette Camacho | October 8, 2024

Walk the exhibit hall of any education technology conference in 2024 and you will encounter a pattern: nearly every assessment vendor has added "AI-powered" to its product descriptions. Automated grading. Instant feedback. Predictive analytics. Adaptive questioning. The promises are bold, and in some cases they are legitimate. In others, they are little more than a marketing veneer over traditional automation.

After 28 years in the classroom and a year of intensive experimentation with AI assessment tools, I want to offer an honest evaluation of where this technology genuinely advances learning, where it falls short, and where the hype is actively misleading educators.

What Is Actually Working

1. Formative Feedback on Writing Drafts

This is the single most impactful application of AI in assessment that I have encountered. Tools like Writable (acquired by Houghton Mifflin Harcourt in 2023), Grammarly's education tier, and even ChatGPT with GPT-4, used without any education-specific wrapper, can provide substantive, specific feedback on student writing drafts within seconds.

The key word is "drafts." I am not talking about summative grading - the final evaluation of a finished product. I am talking about the iterative feedback loop during the writing process. When a student submits a rough draft of a persuasive essay at 9:00 AM and receives detailed commentary on argument structure, evidence integration, and transitional logic by 9:02 AM, they can revise meaningfully before the next class period. Previously, that student would wait three to five days for my handwritten feedback, by which point the cognitive context of the assignment had faded.

I have been using AI-generated formative feedback in my classroom since February 2024. My workflow: students submit drafts through a shared platform, I run them through an AI tool with a custom rubric prompt, and the AI generates criterion-specific feedback. I then review the AI feedback, make corrections or additions where the AI missed something important, and return the annotated feedback to students. The review takes about 30 seconds per student, compared to 8 to 12 minutes for full manual feedback.
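For readers wondering what a "custom rubric prompt" might look like, here is a minimal Python sketch. It is an illustration under assumptions, not my exact prompt: the three criteria are borrowed from the feedback areas mentioned above, and the names RUBRIC and build_feedback_prompt are hypothetical. It only assembles the text you would paste or send to whichever AI tool you use; it does not call any tool.

```python
# Hypothetical rubric: the criteria and guiding questions are illustrative only.
RUBRIC = {
    "Argument structure": "Is the claim clear, and do the reasons build toward it?",
    "Evidence integration": "Is each piece of evidence introduced and explained?",
    "Transitional logic": "Do transitions show how each paragraph connects to the next?",
}


def build_feedback_prompt(draft: str, rubric: dict[str, str]) -> str:
    """Assemble a criterion-by-criterion formative feedback request for one draft."""
    criteria = "\n".join(
        f"{i}. {name}: {question}"
        for i, (name, question) in enumerate(rubric.items(), start=1)
    )
    return (
        "You are giving formative feedback on a student's rough draft.\n"
        "Do not assign a grade or score.\n"
        "For each criterion, name one strength and one specific revision step.\n\n"
        f"Criteria:\n{criteria}\n\n"
        f"Student draft:\n{draft}\n"
    )


# Example use with a placeholder draft.
prompt = build_feedback_prompt("School should start later because...", RUBRIC)
print(prompt)
```

Telling the tool not to assign a grade keeps the output in formative territory, and asking for one concrete revision step per criterion makes the result quick for a teacher to review.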

The time savings are substantial: a roughly 60% reduction in the feedback cycle. But the quality improvement matters more than the efficiency gain. Students now receive feedback on every draft, not just the final submission. The increase in revision quality has been visible and measurable.

2. Diagnostic Item Analysis

Several platforms - notably Edulastic and Illuminate DNA - now use AI to analyze patterns in student assessment responses and flag specific misconceptions rather than simply reporting percentage correct. When 40% of my students select answer choice C on a particular question, AI can identify that this specific wrong answer reflects a confusion between area and perimeter rather than a random error, and suggest targeted reteaching activities.
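The core of this kind of analysis can be sketched in a few lines of Python. This is a simplified illustration, not how Edulastic or Illuminate DNA work internally. The response data, the 30% threshold, and the misconception note are all hypothetical, and in this sketch the link between a wrong answer and a misconception is written by the teacher rather than inferred by AI.

```python
from collections import Counter

# Hypothetical data: each student's choice on one multiple-choice item.
responses = ["C", "B", "C", "A", "B", "C", "B", "C", "B", "D"]
correct = "B"

# Hypothetical teacher-written notes on what each wrong answer suggests.
misconceptions = {
    "C": "Computed perimeter instead of area",
}


def flag_distractors(responses, correct, threshold=0.30):
    """Return wrong choices picked by at least `threshold` of students."""
    counts = Counter(responses)
    total = len(responses)
    return {
        choice: n / total
        for choice, n in counts.items()
        if choice != correct and n / total >= threshold
    }


flagged = flag_distractors(responses, correct)
for choice, share in flagged.items():
    note = misconceptions.get(choice, "No note on file; review the item")
    print(f"Choice {choice}: {share:.0%} of students. {note}")
# prints: Choice C: 40% of students. Computed perimeter instead of area
```

A wrong answer that attracts a large share of the class is a signal worth reading: random errors spread across the choices, while a shared misconception piles up on one.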

This kind of analysis was technically possible before AI, but it required statistical expertise that most classroom teachers do not possess and do not have time to acquire. AI democratizes item analysis, making the insights accessible through natural language summaries rather than statistical tables.

3. Adaptive Practice and Spaced Repetition

AI-driven adaptive learning platforms have matured significantly in the past year. Khan Academy's Khanmigo, IXL's SmartScore system, and DreamBox all use AI to adjust problem difficulty and sequence based on student performance in real time. The educational research base for adaptive practice is strong - a 2024 meta-analysis published in Educational Research Review found moderate positive effects (d = 0.37) for AI-adaptive platforms compared to non-adaptive digital practice.
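For readers unfamiliar with the "d" statistic: Cohen's d is the difference between two group means divided by their pooled standard deviation, so d = 0.37 means the average student in the adaptive group scored about 0.37 standard deviations higher than the average student in the comparison group. The Python sketch below shows the calculation on hypothetical post-test scores (the numbers are invented for illustration and are not from the meta-analysis), and then converts 0.37 to a percentile under the added assumption that scores are normally distributed.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical post-test scores for two small groups.
adaptive = [78, 85, 69, 91, 74, 88]
non_adaptive = [72, 80, 65, 90, 70, 79]
print(round(cohens_d(adaptive, non_adaptive), 2))

# Assuming normally distributed scores: where the average student in the
# adaptive group would fall within the comparison group's distribution.
percentile = NormalDist().cdf(0.37)
print(f"{percentile:.0%}")
```

Under that normality assumption, an effect of 0.37 places the average adaptive-practice student at roughly the 64th percentile of the comparison group: a real gain, but a moderate one.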

What makes these tools work is the underlying learning science, not the AI itself. Spaced repetition, interleaving, and desirable difficulty are well-established principles. AI simply implements them at a scale and speed that no human instructor could manage for 150 individual students.
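To show how simple the underlying principle can be, here is a sketch of the Leitner system, a classic paper-flashcard version of spaced repetition. It is not the algorithm used by Khanmigo, IXL, or DreamBox, whose internals are not described here. The box intervals are hypothetical values chosen for illustration.

```python
# Days to wait before the next review, by box number (hypothetical values).
INTERVALS = {1: 1, 2: 3, 3: 7, 4: 14}


def next_box(box: int, answered_correctly: bool) -> int:
    """Leitner rule: promote on a correct answer, reset to box 1 on a miss."""
    if answered_correctly:
        return min(box + 1, max(INTERVALS))
    return 1


# Follow one item through four practice attempts.
box = 1
for correct in [True, True, False, True]:
    box = next_box(box, correct)
    print(f"correct={correct} -> box {box}, review again in {INTERVALS[box]} day(s)")
```

A teacher could run this rule by hand for one student with a shoebox of index cards. Doing it for every skill and every one of 150 students, every day, is where software earns its place.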

What Is Not Working (Yet)

1. Automated Essay Scoring for Summative Purposes

Despite decades of development - automated essay scoring dates back to the 1960s with Ellis Page's PEG system - AI still cannot reliably score complex writing for high-stakes purposes. The fundamental problem is that current AI evaluates surface features (vocabulary sophistication, sentence length variation, structural markers) more effectively than it evaluates depth of thought, originality of argument, or authenticity of voice.

I tested three AI scoring platforms against my own scoring of 60 student essays using a six-trait rubric. Agreement with my scores was highest for conventions (87%) and organization (79%), and lowest for voice (52%) and ideas (61%). The traits that matter most for meaningful writing - the quality of thinking and the distinctiveness of expression - are precisely where AI scoring is weakest.
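Anyone can run the same kind of comparison on their own essays. The Python sketch below computes a per-trait agreement rate, assuming "agreement" means the AI score exactly matches the teacher's score on each essay; counting adjacent scores as agreement would give higher numbers. The scores shown are hypothetical, on an assumed 1-6 scale, and are not my data.

```python
def exact_agreement(human, ai):
    """Share of essays where the AI score equals the human score."""
    if len(human) != len(ai):
        raise ValueError("score lists must be the same length")
    matches = sum(h == a for h, a in zip(human, ai))
    return matches / len(human)


# Hypothetical 1-6 scores for five essays on two traits.
scores = {
    "conventions": {"human": [4, 5, 3, 6, 4], "ai": [4, 5, 3, 5, 4]},
    "voice": {"human": [4, 5, 3, 6, 4], "ai": [3, 5, 4, 5, 4]},
}

for trait, s in scores.items():
    print(f"{trait}: {exact_agreement(s['human'], s['ai']):.0%}")
# prints: conventions: 80%
#         voice: 40%
```

Breaking agreement out by trait matters: a single overall figure would hide the fact that the tool does well on mechanics and poorly on the traits that carry the thinking.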

For formative purposes, this level of accuracy is acceptable because the feedback initiates a revision process rather than rendering a final judgment. For summative purposes, I consider it insufficient, and I am concerned about districts that adopt AI scoring to reduce labor costs without acknowledging the validity trade-off.

2. Automated Grading of Complex Performance Tasks

Project-based assessments, lab reports with original experimental design, creative portfolios - these resist AI grading because they require evaluating the integration of multiple competencies in context. An AI can assess whether a lab report includes a hypothesis, methodology, results, and conclusion. It cannot assess whether the student's experimental design actually tests the stated hypothesis in a logically valid way, or whether the student's interpretation of anomalous data demonstrates genuine scientific reasoning.

Performance tasks are, almost by definition, the assessments where human judgment is most essential. AI can assist - flagging structural gaps, checking citation formatting, identifying potential plagiarism - but the evaluative core remains a human responsibility.

3. AI Detection Tools

I include these in the "not working" category because they have been marketed as assessment tools - specifically, tools that assess academic integrity. Turnitin's AI detection feature, GPTZero, and similar platforms have been deployed widely in 2024, and the results have been troubling.

Multiple studies, including a widely cited July 2024 analysis from Stanford's Graduate School of Education, have found that AI detection tools exhibit significantly higher false positive rates for non-native English speakers. The tools tend to flag writing that uses simpler sentence structures and more common vocabulary - precisely the characteristics of competent writing by English Language Learners. Using these tools for enforcement decisions creates an equity problem that I consider unacceptable.

I tell teachers in my workshops: use AI detection results as a conversation starter, never as a conviction. If you suspect a student has submitted AI-generated work, the pedagogical response is an oral defense or a revision conference, not an algorithm's probability score.

The Hype to Watch Out For

"AI Will Replace Grading"

No, it will not. It will change grading - shifting teacher time from marking mechanics to evaluating thinking, from scoring routine assignments to designing more meaningful assessments. But the replacement narrative is both inaccurate and corrosive. It devalues the professional judgment that is the core of assessment literacy, and it feeds an administrative fantasy that class sizes can increase because AI is "doing the grading."

"Personalized Learning at Scale"

This phrase appears in virtually every EdTech pitch deck. In its strongest form - every student receiving a truly individualized learning pathway adapted in real time to their needs, interests, and goals - it remains aspirational rather than operational. What currently exists is adaptive practice within constrained domains (mathematics, vocabulary acquisition, factual recall). AI cannot yet personalize a Socratic seminar, a collaborative inquiry project, or a community-based learning experience.

"Predictive Analytics Will Identify At-Risk Students"

Early warning systems powered by AI are real and increasingly common. But the prediction is only as good as the intervention it triggers. A dashboard that flags a student as "75% likely to fail" is worse than useless if no counselor, mentor, or support system responds to that flag. I have seen schools invest heavily in predictive analytics while cutting the human support staff who would act on the predictions. The technology without the infrastructure is surveillance, not support.

What I Recommend

For teachers entering the 2024-2025 school year, my recommendation is focused pragmatism. Adopt AI for formative feedback on writing and for adaptive practice in skill-based domains. Use AI item analysis to inform your reteaching. Resist pressure to use AI for summative scoring of complex work. Advocate loudly against AI detection tools being used as sole evidence of academic dishonesty.

And above all, remember that assessment is a relationship. The most powerful feedback I have ever given a student was not the most analytically precise - it was the comment that landed at the right moment, in the right tone, addressing the specific struggle that student was facing that week. AI can generate the words. It cannot read the room.

Dr. Janette Camacho is a Google for Education Certified Trainer & Coach, Google Certified Educator Level 1 & 2, Adobe Creative Educator, Apple Teacher, and FETC 2024 Featured Presenter with 28+ years of K-12 classroom experience. She is the founder of iTeachAI.