OS for Agents - Peiqi Tang

A post-app world for wearables

Apps are not designed to consume much real-time context. They always land you on the home page with the same set of content regardless of what's going on in your life.

Apps are a set of capabilities packaged at design time by companies and creators, based on user research and assumptions about what the user wants, as well as what makes sense for their business.

They have to include all the capabilities to cover all their users and all the use cases. For any particular user, this means most of the functions are noise at that particular moment.
Creators have to impose an info architecture so people can find what they need. The reality is that people spend too much time in the app just navigating to the actual function or info they need.

The number of apps is overwhelming thanks to the growing ecosystem of the last decade. Managing apps has officially become a task in itself.

Apps are reflections of business models, not a user's daily flow. They are siloed, individual islands in the sea of data. Navigating your digital and physical life requires you to switch back and forth among different apps.

The download model assumes all apps are equal. But naturally some apps are used less frequently than others. A rarely used app still occupies your device the same way as your frequently used apps, until you decide to delete it. And then later you find out you need it again just for this one specific time, so you have to download it again.

Mobile phones can endure all of this because they have a big enough screen, but at the cost of taking people out of the moment.

Wearables, however, promise to fix the problem mobile phones created. Instead of taking us out of the moment, they can assist us in a way that keeps us focused on the moment and the life around us.

Diagram comparing the app model to a wearable OS and agent model.

Agents as the successors of apps on wearables

If our OS hosts agents instead of apps, a lot of the problems mentioned above will naturally be resolved.

Current Apps	Agents
Not designed to consume much real-time context; assume the same starting point each time.	Use context to start at a different point of the flow each time.
Capabilities packaged at design time.	Capabilities acquired based on the jobs-to-be-done at that moment.
Users have to manually manage a large number of apps.	The OS can manage agents for users. Agents can also manage themselves.
Siloed; lots of friction to switch between apps.	Proactive. Created based on jobs-to-be-done to minimize switching.
The way the OS organizes and presents apps assumes equal frequency of usage across all apps.	The OS and agents can understand and anticipate usage patterns based on context.

With users' permission, agents take the initiative and proactively engage in three different ways:

Interact with users to acquire goals and ways of working

Interact with the world (physical and digital) to gather context

Interact with tools to execute tasks

Diagram showing how agents interact with users, the physical and digital world, and tools to execute tasks.

One of the big reasons agents work better than apps on wearables is the specific, real-time context made available through the sensors on wearables. Smartphones and laptops don't have access to context that rich or that granular.

A Few Possible Wearable Shell Models for the World of Agents

At the heart of a shell model is the kind of relationship users want with the OS and the agents. One of the most critical debates is how much autonomy the system has in managing and deploying agents, and how much control and transparency the user wants in the process. A more centralized model offloads most of the decisions to the system, while a more distributed model relies more on users to do the orchestration.

The Dora Pocket shell model illustration.

The dora pocket - a black box of agents

Users form a long-term relationship with the OS, which serves as a meta-agent managing all other agents.

The OS decides which agent gets called based on the context.

Users don't know which agents are in the OS's pocket, how many there are, or how they are organized. The pocket is a black box for users.

The OS can pick another agent from the pocket if things don't work out.

Agents could be reusable or disposable, and the OS makes that decision.

Users can make requests and complain. But most of the time the OS takes the initiative in choosing agents.
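The Dora pocket behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design from any real OS: the `Agent` and `PocketOS` names, the `can_handle` and `run` callables, and the `disposable` flag are all hypothetical. It shows the OS choosing an agent from context, falling back to another agent when one fails, and keeping the pocket hidden from the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    # Hypothetical: returns True if this agent fits the given context.
    can_handle: Callable[[dict], bool]
    # Hypothetical: runs the job and returns True on success.
    run: Callable[[dict], bool]
    # Hypothetical: the OS discards disposable agents after one use.
    disposable: bool = False


class PocketOS:
    """Meta-agent that keeps its pocket of agents hidden from the user."""

    def __init__(self, agents: list[Agent]) -> None:
        self._pocket = list(agents)  # private: the user never sees this list

    def handle(self, context: dict) -> Optional[str]:
        """Try matching agents in order; pick another one if things don't work out."""
        for agent in list(self._pocket):  # iterate over a copy so removal is safe
            if not agent.can_handle(context):
                continue
            succeeded = agent.run(context)
            if agent.disposable:
                self._pocket.remove(agent)  # the OS, not the user, decides this
            if succeeded:
                return agent.name
        return None  # nothing in the pocket could do the job
```

The user only ever calls `handle` with a request or context. Which agent ran, and whether it was kept afterwards, stays inside the OS.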

My Personal Council shell model illustration.

My personal council - lifelong partnership with a small group of agents

Users and the council know each other really well. Each agent in the council is specialized in one foundational pursuit of life that persists through time.

The council can individually or collectively form opinions, give advice and execute plans.

They might have conflicting goals or characteristics, so that they can bring you varied perspectives and handle all the different aspects of life.

Agents in the council can employ and manage other agents (with the user's permission). Each agent could be equipped with different "skills" and context depending on the task at hand. Users can therefore manage a fairly small list of agents aligned with core life-level goals that won't change much over time, while enabling them to accomplish a potentially limitless variety of tasks.

Users can rely on "stand-ups", "check-ins" and "reviews" to make sure agents stay aligned with goals and on track with plans.

The OS fades into the background to power the relationship and connection, manage permissions for context, and organize the inventory of "skills".

Agent Marketplace

The OS provides and manages a marketplace of agents.

Users find and hire an agent from the marketplace based on the job at hand.

The user-agent relationship ends when the job ends. It's temporary and transactional.

Users can keep hiring the same agent or choose a different one for the job next time if the previous experience was not satisfying.

Attention and Intents

I personally prefer the personal council as the shell model. But regardless of how centralized or distributed the model is in managing agents, the classic mixed-initiative interaction design problem is still valid in the age of AI agents: when both the human and the AI agent can take initiative in the same task, what should the dance look like?

I got inspired while talking to Amy Karlson and started to think about this mixed-initiative space as a quadrant diagram, with human attention and human intent as the two perpendicular axes.

The attention axis lays out the amount of attention the user pays to the agent and the job while it is being done. The intent axis shows a spectrum of intent density the user has for a specific job.

Quadrant diagram mapping attention and intent density for mixed initiative agent interaction.

Low attention and low intent density: Delegate to get it done

It's not an important task to you. You don't enjoy doing it but need it to be done. You trust the AI agent so you delegate the task to it after providing a simple requirement.

Low attention and high intent density: Hand off with a detailed Scope of Work

You have detailed requirements and want things to be done in a specific way. The planning and execution stages of this kind of task are clearly separated. You can therefore align with an AI agent in the planning stage, then have the agent execute the task on its own while you focus on other things.

High attention and low intent density: Monitor to keep things on track

The cost of making mistakes is too high, or the info being handled in the task is too sensitive. Even though you don't have much intent to inject, you would like to pay close attention to make sure things stay on track, and fix problems immediately when they emerge.

High attention and high intent density: Work together hand in hand

You want to inject or preserve your PoV on even the smallest details. Planning and execution are mixed together. You come up with the plan and make decisions as you are doing the work. You therefore want to be involved every step of the way.
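The four quadrants can be read as a simple lookup from two scores to an interaction mode. The sketch below is only one way to express that. The 0 to 1 scale, the `threshold` parameter, and the `interaction_mode` function name are assumptions made for illustration; the quadrant itself describes two continuous spectrums rather than a hard cutoff.

```python
from enum import Enum


class Mode(Enum):
    DELEGATE = "Delegate to get it done"
    HAND_OFF = "Hand off with a detailed Scope of Work"
    MONITOR = "Monitor to keep things on track"
    COLLABORATE = "Work together hand in hand"


def interaction_mode(attention: float, intent_density: float,
                     threshold: float = 0.5) -> Mode:
    """Map attention and intent density (assumed to range 0..1) to a quadrant."""
    high_attention = attention >= threshold  # assumed cutoff, not from the essay
    high_intent = intent_density >= threshold
    if high_attention and high_intent:
        return Mode.COLLABORATE
    if high_attention:
        return Mode.MONITOR
    if high_intent:
        return Mode.HAND_OFF
    return Mode.DELEGATE
```

For example, a sensitive task that needs little personal input might be scored as `interaction_mode(0.9, 0.2)`, which lands in the monitoring quadrant.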