Is Human-in-the-loop the ultimate AI control? Spoiler alert, it isn’t.

Denis Henderson | April 16, 2024

Inspiration sometimes comes from the most unexpected sources. A colleague recently shared an amusing yet thought-provoking Indian chocolate commercial showing the failure of Human-in-the-loop (HITL) controls over AI. Whilst the ad made me chuckle, the underlying problem is anything but amusing. It is a serious issue that needs to be addressed as AI becomes more prevalent in our day-to-day lives.

So, what exactly is HITL? At its core, it's a mechanism requiring human input, oversight, or intervention in an AI system's decision-making process. It has swiftly become a cornerstone of AI governance, particularly in high-stakes or safety-critical applications. It's easy to understand the comfort this type of control offers: the reassuring belief that AI merely advises, never truly decides. Indeed, surveys show that only 1% of people trust AI to make significant decisions independently, with two-thirds trusting AI to provide insights but not to make decisions.

However, the usefulness of HITL is highly context-dependent, and I believe that for many AI use cases HITL controls will be neither effective nor appropriate.

Effective AI Governance within organisations requires selecting the right controls, commensurate with risk to bolster business processes and earn the trust of your customers and the public.  So where does HITL fit?

The advantages of HITL

So why are HITL controls popular with businesses and regulators alike? As AI becomes more prevalent in our lives, trust is paramount and, in many cases, HITL provides a measure of trust for businesses, their customers and the public.

  1. Safety and Risk Mitigation
  • Failsafe Mechanism: Humans serve as the ultimate last line of defence, capable of preventing catastrophic failures, especially when facing "black swan" edge cases that even the most advanced AI might miss. For example, NASA mandates human oversight in all autonomous mission-critical operations.
  • Moral Accountability: HITL frameworks are crucial for preserving human moral agency, particularly in ethically charged decisions, such as those with life-or-death outcomes.
  • Dynamic Adaptability: Human judgment possesses an unparalleled ability to adapt to novel or ambiguous contexts that AI systems often struggle to generalise effectively.
  2. Trust and Public Legitimacy
  • Transparency & Assurance: Human oversight can signal responsibility and help build public and/or customer trust in AI systems.
  • Auditability: Human intervention provides a record of discretionary decisions that can be audited post hoc, unlike fully automated systems.
  3. Legal and Regulatory Compliance
  • Due Process: In heavily regulated sectors (e.g., finance, healthcare, justice), human oversight is non-negotiable and ensures alignment with procedural fairness requirements.
  • Current Norms and Laws: The majority of emerging AI regulations, notably the EU AI Act and the well-established OECD principles, explicitly mandate HITL for high-risk AI systems.

The limitations of HITL

While HITL offers a comforting sense of control, we must realistically acknowledge its inherent constraints. We are increasingly witnessing real-world use cases where HITL is simply not practical or effective. Consider the blistering pace of real-time fraud detection within global financial systems: AI analyses billions of daily payment authorisations, flagging suspicious transactions in milliseconds. Human reviewers, however, simply cannot keep pace with the sheer volume of transactions, let alone the instantaneous response times required.

So, what are these fundamental constraints?

  1. Human Factors Challenges
  • Automation Bias: The reality is that operators frequently defer to algorithmic recommendations, especially when under pressure. Studies show that up to 88% of users tend to over-rely on AI suggestions, even when evidence points to an AI error, effectively undermining critical human oversight.
  • Complacency and Deskilling: Repeated exposure to highly accurate AI systems can dangerously erode operator vigilance and degrade their intrinsic decision-making skills. For instance, in aviation, over-reliance on autopilot has led to a significant decrease in manual flying hours, impacting pilot proficiency in emergencies.
  • Cognitive Overload: In complex, fast-moving environments (think drone surveillance or high-frequency trading), humans can be overwhelmed, struggling to effectively assess AI decisions in real time. The average human attention span has reportedly dropped to just 8 seconds, making sustained, high-volume AI oversight incredibly difficult.

A great example of this human factor was the recent security failure at an AFL game at the MCG. AI-powered security scanners were used to scan supporters entering the stadium and correctly flagged two individuals carrying firearms. However, human security staff failed to perform a thorough manual follow-up, allowing two armed men to enter.

  2. Scalability and Feasibility
  • High Operational Costs: Embedding meaningful human oversight at scale is incredibly resource-intensive, particularly in applications like vast content filtering operations or global fraud detection. Estimates suggest that human review for complex AI tasks can cost hundreds of dollars per hour, quickly becoming unsustainable.
  • Latency Constraints: HITL can introduce unacceptable delays in systems where real-time response is absolutely critical, such as autonomous vehicles. Even a millisecond of delay in braking decisions in a self-driving car can have catastrophic consequences.
  3. The Illusion of Control
  • Formal Oversight is not the same as Effective Oversight: Simply having a human "in the loop" doesn’t ensure meaningful control. Oversight can often be nominal or ritualistic.
  • Rubber-Stamping Risk: In low-discretion, high-throughput settings, humans may routinely approve AI outputs without genuine scrutiny, essentially becoming a mere stamp of approval rather than a genuine check. Research indicates that in some environments, humans "approve" AI decisions over 95% of the time, irrespective of accuracy, highlighting a critical flaw; a simple way to monitor for this is sketched below.
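
One way to make rubber-stamping visible is to track not just how often reviewers agree with the AI, but how often they catch it when it is wrong. The sketch below is a minimal, hypothetical example; the Review fields and the idea of a labelled review log are assumptions for illustration, not any particular product's API.

```python
from dataclasses import dataclass

@dataclass
class Review:
    ai_decision: str      # what the AI recommended
    human_decision: str   # what the reviewer finally approved
    ai_was_correct: bool  # established later, e.g. from audit or outcome data

def rubber_stamp_signals(reviews: list[Review]) -> dict:
    """Compute simple indicators of nominal (rubber-stamp) oversight."""
    agreed = [r for r in reviews if r.human_decision == r.ai_decision]
    ai_errors = [r for r in reviews if not r.ai_was_correct]
    caught = [r for r in ai_errors if r.human_decision != r.ai_decision]
    return {
        # Close to 1.0 on its own is ambiguous: the AI may simply be good.
        "approval_rate": len(agreed) / len(reviews),
        # Close to 0.0 is the warning sign: reviewers are missing AI mistakes.
        "error_catch_rate": len(caught) / len(ai_errors) if ai_errors else None,
    }
```

A high approval rate proves little by itself; the telling figure is the error catch rate, which shows whether reviewers actually add value on the cases where the AI gets it wrong.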

Recommendations and Best Practices

Where HITL controls are implemented, we need to ensure that they are well designed and able to operate effectively. This should involve:

  1. Risk-based deployment: Prioritise HITL in genuinely high-stakes domains (e.g., criminal justice, healthcare, warfare) where errors carry severe consequences. Conversely, HITL may be unnecessary for low-risk applications.
  2. Continuous evaluation: Human oversight systems should never be assumed effective by design. They should be audited and stress-tested to confirm operating effectiveness.
  3. Clear roles and escalation paths: Avoid vague oversight mandates. Instead, clearly articulate how and when humans are expected to intervene, defining clear escalation protocols (a simple routing sketch follows this list).
  4. Tooling and UX support: Invest in intuitive interfaces and decision aids that empower humans to interpret and challenge AI recommendations effectively.
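
To make risk-based deployment and clear escalation paths concrete, here is a minimal sketch of routing AI decisions by risk. The risk score, thresholds and route names are hypothetical assumptions for illustration, not a reference implementation.

```python
import logging
from enum import Enum

# Illustrative thresholds only - in practice they come from your risk appetite and testing.
AUTO_APPROVE_BELOW = 0.20
ESCALATE_ABOVE = 0.80

class Route(Enum):
    AUTO_APPROVE = "auto_approve"            # low risk: no human in the loop
    HUMAN_REVIEW = "human_review"            # medium risk: reviewer checks the AI recommendation
    SENIOR_ESCALATION = "senior_escalation"  # high risk: escalate per the defined protocol

def route_decision(case_id: str, risk_score: float) -> Route:
    """Route an AI decision by risk score and log the choice for audit."""
    if risk_score < AUTO_APPROVE_BELOW:
        route = Route.AUTO_APPROVE
    elif risk_score < ESCALATE_ABOVE:
        route = Route.HUMAN_REVIEW
    else:
        route = Route.SENIOR_ESCALATION
    logging.info("case=%s risk=%.2f route=%s", case_id, risk_score, route.value)
    return route
```

The point is not the particular thresholds but that the escalation path is explicit, logged and auditable, rather than left to ad hoc judgement in the moment.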

Where HITL controls are not appropriate, we need to ensure that the overall governance framework is well designed and operated, and that proper consideration is given to roles and accountabilities.

Risk management processes should consider specific scenarios for HITL failure within the risk taxonomy; compensating controls for cognitive bias and interface flaws; and stress-testing protocols that simulate human oversight breakdowns.

Conclusion

Human-in-the-loop controls are a valuable governance mechanism - but they are not a panacea. Tried and tested risk management principles still apply: designing effective and appropriate controls depends on understanding the risks inherent in your business processes and use cases. AI risks are no different.

The usefulness of HITL controls depends on thoughtful design, domain-specific constraints, and recognition of human limitations. The future likely lies in dynamic, context-sensitive oversight models that blend human judgment with machine precision - augmented by institutional accountability mechanisms that go beyond individual decision-makers.

Ready to Take the First Step?

Let's design the governance framework your AI strategy deserves.
