a project with Alive Engine
IMT-Atlantique · BRAIN
Around twenty researchers making AI more accessible: less energy, less data, fewer priors on the data. A broad international network, and years of work on large language models and now on agentic AI.
Alive Engine
A platform to build, deploy, and supervise persistent AI teammates that learn continuously, accumulate expertise, and work alongside human teams. On-prem for sensitive data, sandboxed, and fully auditable.
How an autonomous AI runs inside the hospital, on its own data, and what that solves.
Equip a model with the tools of the service: databases, files, a shell, search. It stops chatting and starts doing the work.
The clinician UI, gateway, model, agents, and hospital data all run on the hospital network. Cloud inference is not in the loop.
The agent may pull outside knowledge in. It cannot send patient context out to a third-party model.
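The one-way rule above (knowledge in, patient context never out) can be sketched as an egress check. This is a minimal illustration, not the platform's actual mechanism: the `OutboundRequest` type, the `PATIENT_FIELDS` names, and the host allow-list are all assumptions made for the example, and a real deployment would need content inspection beyond key names.

```python
from dataclasses import dataclass

# Hypothetical field names that mark a payload as carrying patient context.
PATIENT_FIELDS = {"patient_id", "name", "dob", "clinical_note"}


@dataclass(frozen=True)
class OutboundRequest:
    host: str      # destination outside the hospital network
    method: str    # HTTP-style verb
    payload: dict  # data leaving the perimeter (query parameters or body)


def is_allowed(req: OutboundRequest, allowed_hosts: set[str]) -> bool:
    """Allow read-only fetches from approved hosts; block anything carrying patient fields."""
    if req.host not in allowed_hosts:
        return False
    if req.method != "GET":
        return False
    # Even a GET can carry context in its parameters, so inspect the payload keys.
    return PATIENT_FIELDS.isdisjoint(req.payload)
```

With this shape, a literature lookup passes while the same request carrying a patient identifier is refused at the boundary, whatever the destination.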
Clinicians can reach the agent from a secure messaging app. The far end of the thread terminates inside the hospital network.
phone interface, encrypted thread
agent and server stay inside the hospital perimeter
A central agent orchestrates focused sub-agents, each with its own context and tools.
Acute stroke in the emergency room, where every minute is brain tissue.
The deck's stroke use case is about one measurable clock: time from hospital arrival to thrombolysis.
Figures from the stroke-pathway literature (Bulmer et al., Front. Neurol. 2021).
Patients who arrive by ambulance get a head start. Self-presenting patients wait for triage. And a system delay sits between imaging and the neuro read.
On a suspected stroke, the central agent spawns an intake sub-agent. By ambulance? It structures the handover. By private vehicle? It opens a secure thread with the patient or a companion, before triage.
Intake agent · collects onset, symptoms, contraindications before the patient is triaged
Records agent · pulls the patient's medical history from the hospital database and cross-checks it against intake
A records agent queries the hospital databases and returns a structured brief while the patient is still arriving.
An imaging agent drafts a preliminary read immediately, raises urgency, then the neurologist confirms or overrides.
Each contact point is a separate sub-agent with a bounded context. Information passes up as structured summaries, never raw context.
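A minimal sketch of that hand-off pattern, assuming each sub-agent keeps its raw context private and reports only a whitelisted set of fields. The class names, field names, and sample values are hypothetical and chosen for illustration; they do not describe the platform's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Summary:
    """The only object a sub-agent hands back to the central agent."""
    agent: str
    findings: dict[str, str]


@dataclass
class SubAgent:
    name: str
    allowed_keys: frozenset[str]                       # fields this agent may report upward
    _context: list[str] = field(default_factory=list)  # raw context, stays with the agent

    def observe(self, raw: str) -> None:
        self._context.append(raw)

    def summarize(self, extracted: dict[str, str]) -> Summary:
        # Keep only whitelisted fields; raw context never leaves the agent.
        kept = {k: v for k, v in extracted.items() if k in self.allowed_keys}
        return Summary(self.name, kept)


def orchestrate(summaries: list[Summary]) -> dict[str, dict[str, str]]:
    """Central agent view: one structured brief per sub-agent."""
    return {s.agent: s.findings for s in summaries}
```

The design choice is that the bound is enforced at the return type: the central agent only ever receives `Summary` objects, so its context grows with the number of sub-agents rather than with the volume of raw conversation.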
Agents attack the pre-triage gap, retrieve history in parallel, and cut the imaging-to-read dead time. Minutes saved from hospital arrival to thrombolysis become neurons saved.
Anchor: in simulation, combining process changes alone cut median arrival-to-thrombolysis time by up to 26.7% (Bulmer et al., 2021). The agentic gains shown are a projected concept, not a measured result.
A funded research project on reliability, traceability, drift monitoring, and validation for medical agents.
The research project starts from a simple risk: in medicine, autonomous systems must be supervised for what they do, not only for what they say.
The research question is continuous trust supervision: reliability, reasoning coherence, traceability, and controlled evolution over time.
The project frames trust as a longitudinal problem: a medical agent must remain reliable, coherent, and traceable as it learns.
How do we verify that an autonomous medical agent still behaves correctly after tool use, memory updates, and self-improvement cycles?
agents plan task sequences, call tools, and act without direct human intervention
errors are not cosmetic: reasoning failures can affect patient decisions
persistent agents change through memory, profiling, and introspection cycles
A six-month maturation project with IMT-Atlantique BRAIN and Alive Engine, focused on supervision modules for learning medical agents.
what already works, what fails, and which evaluation gaps matter for medical agents
catch capability loss before an updated agent reaches clinical workflow
make recommendations inspectable, replayable, and attributable to evidence
track data, behavior, and confidence shifts over time
test with clinicians: thresholds, overrides, and sign-off workflow
The platform used in the project to create, deploy, and supervise persistent agents that learn continuously.
The project uses Alive Engine as the experimental substrate: continuous tasks, persistent state, introspection cycles, sandboxed execution, and real-time supervision.
agents consolidate what they learn instead of resetting every run
tools, sandboxes, and model calls remain inside the deployment boundary
every action, evidence source, and handoff can be replayed
thresholds, approvals, overrides, and halt states are first-class controls
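One common way to make a trail of actions, evidence sources, and handoffs replayable is a hash-chained log, sketched below. This is an assumption about how such a trail could be built, not a description of Alive Engine's internals; the event fields are hypothetical.

```python
import hashlib
import json


def append_event(log: list[dict], actor: str, action: str, evidence: list[str]) -> dict:
    """Append an event whose hash covers its content and the previous event's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"actor": actor, "action": action, "evidence": evidence, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    event = {**body, "hash": digest}
    log.append(event)
    return event


def verify(log: list[dict]) -> bool:
    """Replay the chain and confirm no event was altered or reordered."""
    prev_hash = "0" * 64
    for event in log:
        body = {k: event[k] for k in ("actor", "action", "evidence", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if event["prev"] != prev_hash or event["hash"] != digest:
            return False
        prev_hash = event["hash"]
    return True
```

Because each hash depends on the one before it, editing any past event invalidates every later link, which is what lets an auditor trust a replay.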
The state of the art does not yet supervise whether an autonomous medical agent remains stable after introspection and self-improvement.
It can reduce hallucinations in aggregate, while increasing major omissions in some clinical summarisation settings.
Small updates accumulate. Without a regression protocol, an agent can drift away from its original objective unnoticed.
Guardrails and traces mostly monitor outputs. They rarely verify the agent’s self-evaluation cycles.
Strong medical evaluation datasets exist, but none evaluate whether a learning agent stays reliable over time.
The first half of the project turns agent learning into something testable and replayable.
Run periodic regression suites after introspection cycles to verify that mastered medical tasks did not degrade.
Attach every medical conclusion to sources, logical steps, and decision points, using knowledge graphs for provenance.
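The regression idea above can be sketched as a gate that compares the agent before and after an introspection cycle on a fixed task suite. This is a simplified illustration under stated assumptions: agents are modelled as plain functions from prompt to answer, and correctness is exact match, which real medical tasks would replace with a clinically validated scorer.

```python
from typing import Callable

# A task name maps to (prompt, expected answer); an agent maps a prompt to an answer.
Suite = dict[str, tuple[str, str]]
Agent = Callable[[str], str]


def pass_set(agent: Agent, suite: Suite) -> set[str]:
    """Names of the tasks the agent answers exactly as expected."""
    return {name for name, (prompt, expected) in suite.items() if agent(prompt) == expected}


def regression_gate(before: Agent, after: Agent, suite: Suite) -> tuple[bool, set[str]]:
    """Block promotion if any task mastered before the cycle now fails."""
    regressed = pass_set(before, suite) - pass_set(after, suite)
    return (not regressed, regressed)
```

The gate looks only at tasks that were passing before the cycle, so a task the agent never mastered does not block an update, while any loss of a mastered task does.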
The second half of the project checks whether supervision works on a concrete medical-agent use case.
Detect unwanted changes in bias, tone, inter-agent trust, and decision orientation over interactions and learning cycles.
Deploy a medical agent on Alive Engine and evaluate differential diagnosis, drug interaction detection, and literature synthesis.
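Drift detection of the kind described above can be sketched as a rolling comparison against a baseline. The score here is a hypothetical per-cycle metric (for example, agreement with a fixed reference set); the metric, window size, and tolerance are assumptions for illustration, not project parameters.

```python
from statistics import mean


def drift_alerts(scores: list[float], baseline_n: int, window: int, tolerance: float) -> list[int]:
    """Return the cycle indices where the rolling mean leaves the baseline band."""
    if baseline_n < 1 or window < 1 or len(scores) < baseline_n:
        raise ValueError("need at least baseline_n scores and positive window sizes")
    baseline = mean(scores[:baseline_n])
    alerts = []
    # Windows start after the baseline period so the two never overlap.
    for end in range(baseline_n + window, len(scores) + 1):
        recent = mean(scores[end - window:end])
        if abs(recent - baseline) > tolerance:
            alerts.append(end - 1)  # index of the last cycle in the window
    return alerts
```

A rolling mean trades reaction speed for robustness: a single noisy cycle does not raise an alert, but a sustained shift does.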
The maturation project should produce reusable supervision components, experimental evidence, and the basis for a longer collaboration and further work.
periodic tests for learning agents after introspection cycles
traceable conclusions with source and decision provenance
monitoring of personality, bias, tone, and inter-agent trust
Alive Engine deployment evaluated on open medical datasets
Longer ambition: a formal trust-supervision framework for learning multi-agent systems in critical domains.
Questions welcome.