BYLINE: Joshua Shay Kricheli, Visiting AI Researcher, Learning Sciences Lab, USC ICT, and Computer Science Ph.D. Student at Syracuse University, New York
Artificial Intelligence today sits at a crossroads between dazzling adaptability and pressing questions of trust. Large Language Models (LLMs) and their agentic frameworks have proven remarkably effective at generating human-like text, responding to new prompts, and flexibly adapting to varied tasks. Yet as they step into high-stakes domains such as education, reliability becomes not just a desirable feature, but a non-negotiable requirement. A system that adapts beautifully but produces errors or inconsistencies risks undermining both the learner’s confidence and the educator’s credibility.
My Experience at USC ICT: Learning, Adapting, and Facing Limitations
During my time as a Visiting Researcher in the Learning Sciences group at the Institute for Creative Technologies (ICT), University of Southern California, I had the opportunity to confront this tension directly. My project involved exploring Microsoft’s AutoGen framework, an open-source library that enables the design of multi-agent LLM systems. In AutoGen, different AI “agents” can be assigned distinct roles—one might serve as a content generator, another as a reviewer, and a third as a moderator or fact-checker. By simulating collaboration, these systems promise greater adaptability than a single model working in isolation.
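For readers unfamiliar with the framework, the sketch below shows roughly how such a trio of agents is wired together. It is a minimal illustration assuming the pyautogen 0.2-style API (class names and parameters vary across AutoGen versions), and the model configuration is a placeholder rather than the setup I actually used at ICT.

```python
import autogen

# Placeholder model configuration; substitute your own model name and credentials.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# One agent drafts instructional content, a second reviews it for accuracy and clarity.
generator = autogen.AssistantAgent(
    name="content_generator",
    system_message="Draft instructional content adapted to the requested learner level.",
    llm_config=llm_config,
)
reviewer = autogen.AssistantAgent(
    name="reviewer",
    system_message="Check drafts for factual accuracy, completeness, and pedagogical clarity.",
    llm_config=llm_config,
)

# A non-LLM proxy acts as the moderator that kicks off and relays the conversation.
moderator = autogen.UserProxyAgent(
    name="moderator",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# The group chat lets the agents take turns until a round limit is reached.
group_chat = autogen.GroupChat(agents=[moderator, generator, reviewer], messages=[], max_round=6)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

moderator.initiate_chat(
    manager,
    message="Draft a short lesson on Newton's first law for a 9th-grade student.",
)
```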
The framework is compelling, but in practice it reveals familiar cracks. When tasked with generating instructional content for different learner levels, AutoGen agents often succeeded in adapting tone and complexity. Yet when I probed for consistency—such as asking multiple agents to agree on a standardized quiz format or to ensure that a checklist contained all required steps—the cracks widened. Lists sometimes contained duplicates. Explanations occasionally contradicted one another. The same concept, expressed differently by two agents, could confuse rather than clarify.
This inconsistency is not surprising. LLMs generate responses based on statistical patterns, not logical guarantees. Even when multiple agents interact, they are bound by the same probabilistic underpinnings. If one agent generates an error, another may overlook it—or worse, reinforce it. Without a governing structure to enforce rules or validate outcomes, adaptability quickly erodes into unpredictability.
This realization underscored for me why my doctoral work in NeuroSymbolic AI is so relevant to applied problems in education technology. NeuroSymbolic methods combine the strengths of neural networks—pattern recognition, adaptability, fluency—with the rigor of symbolic logic systems. Symbolic reasoning provides explicit rules, formal structures, and guarantees of consistency that statistical models alone cannot deliver. Neural components can “propose” or “improvise,” but symbolic modules “check” and “constrain,” ensuring that outputs remain within well-defined bounds.
A concrete example from my ICT experiments illustrates this. Suppose the task is to generate a multi-step instruction set for a science experiment. A neural agent might produce the following:
- Gather your materials.
- Mix the solution.
- Record your results.
- Dispose of the chemicals.
On the surface, this looks fine. But a closer inspection reveals a missing step: calibrating the measuring instrument. For a learner unfamiliar with laboratory processes, this omission is not trivial—it undermines the accuracy of the entire experiment. A symbolic module, however, could flag this error by cross-referencing the generated checklist with a formal ontology of required procedures. The missing calibration step would trigger an alert, prompting the system either to regenerate the list or to insert the missing element.
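To make the idea of cross-referencing against an ontology concrete, here is a minimal sketch, not the code I ran at ICT: a hand-written REQUIRED_STEPS table stands in for a formal ontology, and simple keyword matching stands in for a proper concept matcher.

```python
# Hypothetical ontology fragment: each required step is tagged with keywords
# that should appear somewhere in a valid generated checklist.
REQUIRED_STEPS = {
    "gather_materials":     ["gather", "materials"],
    "calibrate_instrument": ["calibrat"],   # matches "calibrate", "calibration"
    "mix_solution":         ["mix"],
    "record_results":       ["record"],
    "dispose_chemicals":    ["dispose"],
}

def missing_steps(checklist: list[str]) -> list[str]:
    """Return the required steps with no matching line in the generated checklist."""
    text = " ".join(line.lower() for line in checklist)
    return [step for step, keywords in REQUIRED_STEPS.items()
            if not any(kw in text for kw in keywords)]

generated = [
    "Gather your materials.",
    "Mix the solution.",
    "Record your results.",
    "Dispose of the chemicals.",
]

# Prints ['calibrate_instrument'], which would trigger regeneration or insertion.
print(missing_steps(generated))
```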
Similarly, in quiz generation tasks, symbolic reasoning could enforce constraints such as:
- Every question must have exactly four answer options.
- No duplicate correct answers are permitted.
- The distribution of difficulty must match the predefined rubric.
These are rules that symbolic systems can apply rigorously, ensuring the integrity of the generated material. Neural agents, meanwhile, provide the linguistic fluency to phrase questions naturally and adapt explanations to different learners. Together, the hybrid system promises adaptability without sacrificing reliability.
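A hedged sketch of what such a rule layer might look like follows; the Question structure and the rubric format are assumptions made for illustration, not part of any existing AutoGen or ICT codebase.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Question:
    prompt: str
    options: list[str]
    correct_index: int
    difficulty: str   # e.g. "easy", "medium", "hard"

def validate_quiz(questions: list[Question], rubric: dict[str, int]) -> list[str]:
    """Return a list of constraint violations; an empty list means the quiz passes."""
    violations = []
    for i, q in enumerate(questions):
        if len(q.options) != 4:
            violations.append(f"Q{i+1}: expected exactly 4 options, got {len(q.options)}")
        if len(set(q.options)) != len(q.options):
            violations.append(f"Q{i+1}: duplicate answer options")
        if not 0 <= q.correct_index < len(q.options):
            violations.append(f"Q{i+1}: correct answer index out of range")
    # Example rubric: {"easy": 2, "medium": 2, "hard": 1} for a five-question quiz.
    if Counter(q.difficulty for q in questions) != Counter(rubric):
        violations.append("difficulty distribution does not match the rubric")
    return violations
```

Unlike a prompt that merely asks the model to follow the rules, a validator like this either passes or fails; the neural agents can then be asked to regenerate only the items that failed.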
The importance of this work is amplified by the mission of ICT itself. As part of the Learning Sciences group, I was tasked not just with building clever AI agents, but with evaluating how they might be used to support real learning environments. Education is a domain that demands accountability. Unlike entertainment or casual conversation, errors in instructional content carry long-term consequences. A learner misled by an inaccurate explanation may not only fail to acquire knowledge but also lose trust in the system altogether.
Reflecting on my earlier work at Ben-Gurion University of the Negev in my home country of Israel, where I co-founded the Intelligent Robotics Lab, I see continuity in this challenge. There, we tackled problems in robotics and control theory—domains where precision is paramount. A robot that adapts dynamically but cannot guarantee safety is virtually unusable. Likewise, in multi-objective optimization and game theory models, flexibility must always be disciplined by mathematically stated rules. My background in mechanical engineering and control systems has made me particularly sensitive to this balance. Systems must not only work, but work reliably, within well-defined boundaries.
That perspective shaped how I approached the AutoGen experiments. Rather than accepting the framework’s outputs at face value, I tested them against explicit criteria derived from learning sciences. Did the generated explanations align with educational best practices? Did quizzes adhere to fair assessment standards? Could multi-agent dialogues maintain coherence across multiple turns without drifting off-topic? In many cases, the answers highlighted weaknesses—weaknesses that NeuroSymbolic integration could potentially address.
For example, consider a scenario in which an “instructor” agent explained Newton’s laws to a high school student, while a “reviewer” agent was tasked with evaluating the explanation for accuracy and clarity. The instructor performed admirably at first, tailoring the explanation to a student’s level. Yet when the reviewer flagged an inconsistency—confusing “inertia” with “momentum”—the instructor simply rephrased the original error rather than correcting it. The two agents reinforced a flawed explanation, creating the illusion of reliability without the substance. In a NeuroSymbolic system, the reviewer would not rely solely on probabilistic patterns; it would have access to a symbolic knowledge base of physics principles, allowing it to identify the precise logical contradiction and demand correction.
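The sketch below illustrates, in a deliberately simplified form, the kind of check such a reviewer could run. A real system would use a logical reasoner over a curated physics ontology; the phrase-matching rules here are only a stand-in I wrote for illustration.

```python
# Toy symbolic knowledge fragment: definitions the reviewer can cite.
KNOWLEDGE_BASE = {
    "inertia":  "the tendency of an object to resist changes in its state of motion",
    "momentum": "the product of an object's mass and velocity",
}

# Phrases that, if attributed to a concept, conflict with its definition.
CONFLICTS = {
    "inertia":  ["product of mass and velocity", "mass times velocity"],
    "momentum": ["tendency to resist changes in motion"],
}

def find_contradictions(explanation: str) -> list[str]:
    """Flag sentences that attribute a conflicting property to a known concept."""
    flags = []
    for sentence in explanation.lower().split("."):
        for concept, bad_phrases in CONFLICTS.items():
            for phrase in bad_phrases:
                if concept in sentence and phrase in sentence:
                    flags.append(
                        f"'{concept}' described as '{phrase}', contradicting its "
                        f"definition: {KNOWLEDGE_BASE[concept]}"
                    )
    return flags

# Flags the conflation instead of letting a second LLM rephrase it.
print(find_contradictions("Inertia is the product of mass and velocity."))
```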
The promise of this approach extends well beyond education. In any high-stakes field, whether medicine, law, or engineering, AI must be both adaptive and reliable. A medical assistant system that flexibly adapts to a patient's symptoms but produces contradictory treatment plans is dangerous. A legal reasoning agent that improvises persuasive arguments but overlooks statutory requirements is unreliable. The need for NeuroSymbolic approaches is universal: blending the creativity of neural methods with the accountability of symbolic systems.
What excited me most about ICT’s mission was the possibility of bringing these ideas into scalable, real-world systems. Imagine a learning platform where every generated lesson, quiz, or explanation is both adaptive to the learner’s needs and grounded in formal educational standards. Imagine AI tutors that not only converse fluently but can justify their reasoning in transparent, rule-based terms. Such systems would not replace teachers but augment them, offering personalized support while maintaining the rigor that education demands.
This vision is ambitious, and the technical challenges are significant. Symbolic systems often struggle with scalability and brittleness, while neural systems can be opaque and error-prone. Yet the convergence of these paradigms is underway, and my work aims to accelerate it. At Syracuse University, I continue to develop algorithms that embed symbolic reasoning modules directly within neural architectures. These include methods for translating symbolic constraints into differentiable functions, allowing neural networks to “learn” while respecting explicit rules.
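As one illustration of that general idea (a sketch of the technique, not my actual research code), a symbolic rule such as "exactly one answer option is correct" can be relaxed into a differentiable penalty and added to the training loss:

```python
import torch

def exactly_one_correct_penalty(option_logits: torch.Tensor) -> torch.Tensor:
    """Soft, differentiable version of the rule 'exactly one correct option'.

    option_logits: shape (batch, num_options), independent per-option scores.
    """
    probs = torch.sigmoid(option_logits)           # per-option "is correct" probabilities
    expected_correct = probs.sum(dim=-1)           # expected number of correct options
    return ((expected_correct - 1.0) ** 2).mean()  # zero only when the rule holds in expectation

# In a training loop the penalty is weighted and added to the ordinary task loss,
# so the network learns from data while being pushed toward rule-consistent outputs:
# loss = task_loss + lambda_constraint * exactly_one_correct_penalty(option_logits)
```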
The research is still evolving, but the direction is clear. NeuroSymbolic AI provides not just a conceptual bridge between adaptability and reliability, but a practical framework for building systems that educators, learners, and society at large can trust. My experience at ICT has reinforced this trajectory, offering a testing ground where the limits of current agentic frameworks are visible and the need for hybrid solutions is undeniable.
As I reflect on this journey, I return to a simple but enduring lesson: adaptability without reliability is fragility, and reliability without adaptability is brittleness. To build the next generation of AI—particularly in sensitive domains like education—we must refuse that trade-off. We must design systems that adapt and endure, that learn and reason, that serve human needs without sacrificing human trust. NeuroSymbolic AI, I believe, holds the architecture for the future of Artificial General Intelligence (AGI).