Managing AI Managing AI: A New Frontier for Agencies, Appointees, and Agents
Imagine the edge afforded to trusted Federal Agents: clear rules for using GenAI, plus the latitude to use their best judgment to help set up that AI to manage other AIs.
Part 1 of the Responsible Autonomy Series: Imagining a Federal Workforce Entrusted to Manage AI Agents
Back to Work
The election is decided. With many feelings and thoughts swirling around the result, we now have work to do. In this first post of my newly renamed Agentic Edge Substack, I’ll introduce the concept of Responsible Autonomy as the human foundation for using GenAI and LLMs to create agents that augment human data collection, analysis, information validation, and decision-making. I’ll offer this as a kind of ideal and highlight how the next administration can use it as a framework to start (and continue) thinking about aligning AI agents to the evolving work of Federal agents.
My research and experience tell me—perhaps counterintuitively in this moment of potential democratic peril—that entrusting U.S. Federal Agents with this work is one of the best things we can do, as they have a sacred duty to follow clear rules and use the latitude their role affords them in activities and judgment calls that maintain accountability, transparency, and ethical standards, even in very difficult moments.
This post will focus on the current landscape in federal agencies, drawing from the principles outlined here, where governance, transparency, and ethical oversight anchor the use of AI. This includes the management of AI agents that critique, oversee, and even initiate other AI in what are called agentic workflows.
Starting Definitions:
Understanding Responsible Autonomy: Defining what it means to empower AI with autonomy, balanced by stringent governance.
Federal Agency Leadership: Exploring how leaders like Dr. Stacey Dixon and Gen. Dimitri Henry are guiding intelligence agencies through AI integration by setting standards for governance and provenance.
Guiding Principles: Breaking down the foundational principles that allow agencies to set up clear, self-service management frameworks, ensuring trust in AI decisions.
The Complexity of AI Critiquing AI
I view it as inevitable in this business-friendly administration that Federal agencies will deploy large language models (LLMs) in dramatically successful ways, along with some failures. I also think we are on track to task LLMs not only with generating insights but with evaluating the reliability and alignment of other AI models. Imagine a scenario where one LLM reviews the ethical implications of another model’s output or assesses its accuracy against security standards. This capability is powerful, yet it opens a Pandora’s box of challenges:
Trust and Verification: How can agencies trust the critiques generated by one LLM about another? Can these models be verified without human oversight, or will human operators always be required to mediate this “AI-on-AI” analysis?
Bias Detection and Correction: If LLMs are critiquing each other, who ensures that critiques themselves are free from bias? Without careful management, LLMs could amplify biases instead of mitigating them, especially when dealing with high-stakes data in defense or policy contexts.
Ethical Alignment: How can agencies ensure that AI critiques uphold ethical standards? By what principles will they align their judgments about their AI? For instance, if one LLM flags another for suggesting policies that may have ethical implications, who determines the moral framework that informs these critiques?
These are questions that a cautious, compliance-driven approach alone won’t solve. Instead, they demand a blend of trust, transparency, and an informed willingness to take calculated risks—the very essence of Responsible Autonomy.
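Before laying out those steps, here is a minimal sketch of the scenario above: one LLM critiquing another’s output against a human-written rubric. The `complete()` helper, the model names, and the rubric wording are my own placeholders, not any agency’s actual pipeline.

```python
# Minimal sketch of an "AI-on-AI" critique step. complete() is a stand-in
# for whatever accredited LLM endpoint an agency actually uses; the model
# names, prompts, and rubric below are illustrative assumptions only.

def complete(model: str, prompt: str) -> str:
    # Placeholder: wire this to an agency-approved LLM service.
    return f"[{model} response to: {prompt[:60]}...]"

def generate_analysis(task: str) -> str:
    # First model produces a draft analysis.
    return complete("analyst-llm", f"Produce an analysis for: {task}")

def critique_analysis(draft: str) -> str:
    # Second model reviews the draft against an explicit, human-written rubric.
    rubric = (
        "Assess this draft for factual accuracy, potential bias, and "
        "compliance with agency security and ethics guidance. "
        "Flag anything that requires human review."
    )
    return complete("critic-llm", f"{rubric}\n\nDRAFT:\n{draft}")

if __name__ == "__main__":
    draft = generate_analysis("summarize open-source reporting on topic X")
    review = critique_analysis(draft)
    # Both artifacts go to a human operator; neither model acts on its own.
    print(draft, review, sep="\n---\n")
```

Even in this toy version, the critic only flags issues; nothing downstream happens until a person has read both artifacts.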
Responsible Autonomy: Initial Steps for Federal Agencies
Federal agencies committed to LLM adoption face a unique opportunity to set standards for how AI critiques are managed and evaluated. Here is where Responsible Autonomy can be spelled out as an ideal, and where applying it pragmatically requires a leap of faith. This requires nuanced thinking: agencies must commit to innovation while remaining vigilant against risks that emerge when AI critiques AI.
Sworn Federal Agents are essential to the planning process. Though they are typically not the most technical, they have a high bar of ethicality to uphold, plus a level of latitude to accomplish their mission as they think best. This gives them a far better sense of ground truth than Agency leaders and IT teams, and that situational awareness has produced the very intuitions that AI needs to perform its given duties optimally, including the management of other AI agents.
That is a sea change in planning, for sure. It leads to the creation and management of new agentic workflows that support the Federal workforce. Now, in the swirl of post-election openness and heightened awareness, is an ideal time for us to flesh out that new way of working, out in the open, with that high bar of ethicality in mind for all of us.
This level of engagement is what Responsible Autonomy really amounts to, for us as citizens/voters and professionals/workers. It is living out American values and being the “better angels of our nature,” or even “the best version of ourselves.” Whatever you want to call it, the point is doing it. Below are some initial steps for thinking ahead about beefing up the role of Federal Agents to manage AI managing AI, building very much on my takeaways from this panel at DoDIIS last week.
1. Establish Robust Governance and Provenance Systems
Dr. Dixon and Gen. Henry emphasized governance and data provenance—two principles that anchor the responsible use of AI. For LLMs critiquing other LLMs, governance structures are indispensable. Many already exist, and the principles behind those governance documents can be folded into trusted AI models to power the agentic workforce. Agencies can pivot from closed to open cultures in principle (openness as the default, secrecy as the exception) by ensuring transparency at every level and establishing new systems, overseen by Federal agents, to track the origin, processing, and evaluation of AI critiques according to the value of their outputs.
That is governance. For provenance, Agency decision-makers and technical teams must reimagine a part of their jobs as verifying the AI outputs (recommendations, intelligence reports, courses of action, etc.), the critiques of AI by AI, the principle documents, and the entire analytical process used to generate the inferences and insights.
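To picture what that verification trail might look like, here is a hypothetical provenance record for a single AI-on-AI critique, sketched in Python. The field names, the hashing choice, and the referenced guidance document are assumptions for illustration, not an existing standard.

```python
# Hypothetical provenance record for one AI critique event. Field names,
# the hashing scheme, and the referenced guidance doc are illustrative only.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class CritiqueProvenance:
    source_model: str          # model that produced the original output
    critic_model: str          # model that critiqued it
    guidance_docs: list[str]   # principle documents the critique was grounded in
    output_digest: str         # hash of the reviewed output, for later verification
    critique_digest: str       # hash of the critique itself
    reviewed_by: str           # Federal agent accountable for this record
    timestamp: str             # when the critique was logged

def record_critique(output: str, critique: str, agent: str) -> CritiqueProvenance:
    digest = lambda text: hashlib.sha256(text.encode()).hexdigest()
    return CritiqueProvenance(
        source_model="analyst-llm",
        critic_model="critic-llm",
        guidance_docs=["agency-ai-ethics-principles.pdf"],
        output_digest=digest(output),
        critique_digest=digest(critique),
        reviewed_by=agent,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Records like this, appended to an auditable log, let an agent trace the
# origin, processing, and evaluation of every critique after the fact.
print(json.dumps(asdict(record_critique("draft text", "critique text", "agent-042")), indent=2))
```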
Leap of Faith: This commitment to governance and provenance requires an upfront investment of time and resources by Agency and IT leaders to empower their Federal Agents to participate in setting up rigorous checks and balances, both in and on the loop. This leap, while resource-intensive, is essential for establishing the trustworthiness and long-term viability of LLMs in Federal scenarios.
2. Define the Scope and Ethical Parameters of AI Critiques
For LLMs critiquing each other, another leap is required to align them with Federal values. Agency leaders are on the hook to define clear ethical parameters for LLMs. This includes establishing guidelines for acceptable critique subjects, identifying the limits of AI autonomy, and determining when human oversight is mandatory. Ethical alignment is a dynamic, ongoing process that partially adapts as LLM capabilities evolve, but also stands firm on our core values (here’s a refresher).
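One hypothetical way to make those parameters concrete is to express them as a declarative policy that the critique pipeline consults before any autonomous step. The subjects, limits, and escalation triggers below are placeholders an agency would define for itself.

```python
# Illustrative critique policy: scope, autonomy limits, and mandatory-oversight
# triggers expressed as plain data, so they can be reviewed like any other
# guidance document. Every value here is a placeholder assumption.
CRITIQUE_POLICY = {
    "allowed_subjects": ["factual accuracy", "sourcing", "bias", "security compliance"],
    "prohibited_subjects": ["personnel decisions", "individual adjudications"],
    "max_autonomous_rounds": 1,  # the critic may run once before a human looks
    "human_review_required_if": [
        "ethical_concern_flagged",
        "classification_boundary_touched",
        "confidence_below_threshold",
    ],
}

def requires_human_review(flags: set[str]) -> bool:
    # Any policy trigger present among the critique's flags escalates to a human.
    return bool(flags & set(CRITIQUE_POLICY["human_review_required_if"]))

print(requires_human_review({"ethical_concern_flagged"}))  # True
print(requires_human_review({"style_suggestion"}))         # False
```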
Leap: Federal agencies can take a calculated risk here by entrusting AI models to conduct limited, supervised critiques autonomously. This creates room to engage Federal Agents in the educational process of learning how LLMs use parameters to “move information” toward outcomes, and how the AI process differs from the human process of using their latitude and intuition to make judgment calls in the line of duty. In this interaction, the humans can increasingly trust that the initial critiques of AI by AI are robust yet agile enough to refine their parameters as new questions and ethical dilemmas inevitably emerge. This is where their human intuition intersects with the power of LLMs to generate insights tuned to their situation and information needs.
3. Encourage Collaboration Across Departments for Bias Identification
Cross-departmental collaboration is essential in spotting biases that may arise in LLM critiques. AI biases can be subtle, and without diverse perspectives, they risk going unnoticed. Agency leaders can plan now for interagency collaboration, empowering Agents, IT, operations, compliance, and other functional leaders to engage with the results of the first runs. That means looking closely at how the first LLM generated its output and how the critiquing LLM interacted with the first LLM.
There is a lot more to discuss here about bias, and I plan to do that in future posts. For now, the key point is that this collaboration over the LLM outputs and the LLM critiques creates a new organizational venue for deeper human interaction. This venue affords the team the edge of establishing agentic workflows that are fair, representative, and contextually aware of the needs of everyone involved. Team members can then start to bring their principle documents and tacit knowledge to the table to refine the models and transparently identify biases as they emerge, always referring back to our core values and the agency’s mission in support of them.
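As a rough illustration of what that shared review venue could capture, here is a hypothetical record in which reviewers from different offices annotate the same generation-and-critique pair. The office names, fields, and tallying logic are invented for the sketch.

```python
# Sketch of a cross-departmental bias review: the first LLM's output, the
# critiquing LLM's assessment, and human annotations from several offices.
# Office names and fields are assumptions made up for this illustration.
from collections import Counter

run_review = {
    "draft_output": "<what the first LLM produced>",
    "ai_critique": "<what the critiquing LLM said about it>",
    "human_annotations": [
        {"office": "operations",      "bias_flag": False, "note": "matches field reporting"},
        {"office": "compliance",      "bias_flag": True,  "note": "over-weights one source"},
        {"office": "civil-liberties", "bias_flag": True,  "note": "framing of a sensitive group"},
    ],
}

# A simple tally surfaces disagreement across offices; it is the disagreement,
# not an average score, that gets escalated for a fuller human discussion.
tally = Counter(a["bias_flag"] for a in run_review["human_annotations"])
print(tally)  # Counter({True: 2, False: 1})
```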
Leap: Collaboration requires vulnerability—a willingness to invite feedback that may challenge ingrained biases within agencies themselves. Leaders should leap first into the collaborative transparency that is critical for establishing credibility both in and on the loop of their AI-driven critiques.
Where AI Ends and Human Judgment Begins
While LLMs offer profound analytical capabilities, no critique operates in a vacuum. Ultimately, the judgment of when to act on the outputs of any AI rests with human decision-makers. This boundary—the line between AI evaluation and human discernment—represents a core aspect of Responsible Autonomy in Federal AI adoption.
Agencies adopting LLMs must remember that human oversight is not a sign of the inadequacy of humans (or AI), but a central pillar of the value and usefulness of AI. This framework offers an ethical integration of human and agentic workflows. By acknowledging that human intuition, ethical reasoning, and lived experience play roles that AI can never replicate, agencies can create a framework that keeps LLM critiques aligned with broader values.
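As a rough sketch of where that boundary could be enforced in software, the snippet below routes every critique through a person: in an “in-the-loop” mode nothing happens without approval, and in an “on-the-loop” mode the action proceeds but a person reviews it and can intervene. The function names and the two modes as coded here are my assumptions, not any agency’s doctrine.

```python
# Sketch of the line between AI evaluation and human discernment.
# "in_the_loop": a person approves every action before it happens.
# "on_the_loop": the action proceeds but is logged for a person to review and reverse.
# Function names and both modes as coded here are illustrative assumptions.

def human_decision(output: str, critique: str) -> bool:
    # Placeholder for the real review step (a console, a ticket, a briefing).
    print("Needs human judgment:", critique)
    return False  # default to not acting until a person says otherwise

def act_on(output: str) -> None:
    print("Action taken on:", output)

def decide(output: str, critique: str, mode: str = "in_the_loop") -> None:
    if mode == "in_the_loop":
        if human_decision(output, critique):
            act_on(output)
    elif mode == "on_the_loop":
        act_on(output)                    # proceeds automatically...
        human_decision(output, critique)  # ...but a person reviews and can intervene
    else:
        raise ValueError("oversight mode must be 'in_the_loop' or 'on_the_loop'")

decide("recommended course of action", "critic flagged a sourcing gap")
```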
Pathways into Responsible Autonomy with AI Adoption
Trust and Transparency in Governance: Agencies can take steps to build trust by making AI critique processes transparent and open to scholarly and public review, baking democratic accountability into the processes by which our representatives in the Executive Branch make trustworthy decisions using AI.
Ethical Courage in Defining AI Roles: Leaders should embrace the courage to define clear ethical boundaries for AI critiques, even as LLMs advance. This doesn’t mean abandoning tacit ethical frameworks that exist and hold our institutions together. It means amplifying those values honed in the tribal groups of those sworn to uphold the Constitution and execute certain duties on its (our) behalf. These principles exist in documents and in our best traditions, and they can be implemented at scale for increasing responsibility and autonomy.
Commitment to Human Oversight: Agencies must commit to active stakeholder engagement by interpreting and validating AI critiques and outputs. Agency leaders can play a unique role in articulating how the principles that currently sit in static form in guidance documents and emails can be shifted into LLMs. They can engage the political appointees, Congress, and think tank folks who swirl around near the white buildings of Washington, while also engaging a far-flung workforce of analysts, investigators, protectors of civil liberties, financial managers, consultants, SMEs, and contractors.
The balance we are seeking between AI-driven analysis and human discernment is essential, I argue, not only to maintaining security and integrity in AI deployments, but also to navigating this new phase of democracy under a newly empowered and likely emboldened Executive Branch.
The journey into AI integration is fraught with technical and organizational challenges, complicated now by deploying AI to critique other AI. Yet, with a framework of governance, ethical clarity, and the willingness to take calculated risks (leaps?), Federal agencies can lead the way in adopting LLMs that are both innovative and accountable. We can confidently enter this new frontier because America’s afforded edge lies not in technological prowess alone but in the wisdom to integrate it responsibly, under a set of core values that are the true protectors of our democracy, as we embody them.
This sets up our next step—defining how we want to empower Federal agents to provide both “in-the-loop” and “on-the-loop” oversight of their AI agents. That strategic shift is afforded to the U.S. by having the world’s most mature enterprise software and IT industries to build upon. Agency leaders can use that foundation to embody Responsible Autonomy for the nation as agents of our enduring values, guiding and supporting a rejuvenated Federal workforce.


