If there’s one idea the global artificial intelligence industry seems to agree on, it’s this: intelligent agents are the next phase of AI. More than just chatbots, these tools are designed to plan, execute and assist across multiple domains. From OpenAI to China’s Manus AI, developers are working toward smart AI assistants that can handle tasks autonomously — sometimes with minimal prompting.

But while much of the conversation in the West centers on technical benchmarks and API integrations, China’s largest tech platforms are pursuing a different path. There, the rise of “screen-aware” agents — AI assistants that can view, interpret and act on what’s displayed on screen, typically after asking the user to enable the device’s accessibility functions — represents a shift in how AI is being embedded into everyday computing.

One of the most notable examples is ByteDance’s Doubao AI, a consumer-facing assistant that introduced a Screen Sharing Call feature last month. With the user’s permission, the tool gains visibility over everything currently shown on the desktop during a voice interaction. It can read WeChat conversations, solve math problems on-screen, identify active applications and summarize video clips — all without switching context.

This design echoes Microsoft’s now-infamous Recall feature, which was similarly built to help users remember on-screen activities by capturing snapshots. Microsoft faced strong public backlash last year and delayed the rollout twice, ultimately limiting the feature to local data storage and making it opt-in when it announced its rerelease earlier this month.

Despite Recall’s rocky start, the screen-aware approach signals a broader trend in Chinese AI development — one focused on integrating agents directly into daily workflows by giving them wide-reaching visual context. While the technology promises to make assistants significantly more useful, it also raises serious and urgent questions about how AI developers are addressing privacy, consent and transparency.

Smart AI Assistants Expand Capabilities and Access

The key technical enabler behind Doubao AI’s feature is access to a system-level setting known as accessibility services. Originally designed to help users with disabilities, this permission grants apps the ability to view and interact with nearly all visual elements on the screen.
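The mechanism is easier to picture with a sketch. The following is a simplified, hypothetical Python model of the element tree that an accessibility service exposes; it is not ByteDance’s implementation, but real platform APIs (for example, Android’s AccessibilityNodeInfo) present an analogous tree of on-screen elements that a permitted app can walk to read nearly all visible text:

```python
# Illustrative model only: role names and the tree shape are invented here
# to show what screen-reading via accessibility permissions looks like.
from dataclasses import dataclass, field

@dataclass
class UINode:
    """One element in the accessibility tree: a window, list, or text view."""
    role: str
    text: str = ""
    children: list["UINode"] = field(default_factory=list)

def visible_text(node: UINode) -> list[str]:
    """Depth-first walk collecting every piece of text the service can read."""
    texts = [node.text] if node.text else []
    for child in node.children:
        texts.extend(visible_text(child))
    return texts

# A toy screen: a chat window containing two messages and an input field.
screen = UINode("window", children=[
    UINode("list", children=[
        UINode("message", "Hi, are we still on for 3pm?"),
        UINode("message", "Yes, see you then."),
    ]),
    UINode("input", "typing..."),
])

print(visible_text(screen))
```

The point of the sketch is the breadth of access: once the permission is granted, every node in the tree is readable, with no per-app or per-window scoping unless the assistant imposes one itself.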

In practice, this means that once a user grants Doubao AI this permission, it can continuously observe what’s happening on the desktop during a voice call. While the feature does request user consent, the language used is often technical, and the implications are not always clear. Unlike screen sharing in video calls — where users deliberately choose what to show — Doubao AI sees everything unless the user manually closes apps or disables the function.

What’s more, the information interpreted by Doubao AI is uploaded to ByteDance’s servers for processing. This introduces a second layer of consideration: What happens to that data, how is it stored and what safeguards are in place to prevent misuse or unauthorized access?

So far, Doubao hasn’t seen the same level of scrutiny that Recall did — at least not publicly. That may reflect differences in user expectations or regulatory maturity, but it also highlights how quickly powerful features can become normalized without broad public discussion.

An Ecosystem-Level Shift

Doubao isn’t an outlier. Other Chinese developers are rolling out screen-aware capabilities as well. Zhipu AI’s GLM-PC can carry out cross-application tasks on the desktop, while AutoGLM is focused on browser-based actions. Smartphone manufacturers like Xiaomi, OPPO, Honor and Vivo are embedding similar AI functions at the system level, often relying on accessibility permissions to parse visual content.

In some cases, these integrations happen with minimal user notification. OPPO’s ColorOS privacy policy, for instance, notes that its Xiaobu assistant can use accessibility services to scan screen elements — without explicitly prompting the user or offering an opt-out. This low-friction approach increases usability, but it also shifts more responsibility onto users to understand what they’ve enabled.

From a product standpoint, it’s easy to see the appeal. AI assistants that can “see” the user’s screen are able to provide more relevant, contextual help. They can summarize a document that’s open, highlight unread messages or offer quick actions tied to what’s currently displayed. In environments where fast interaction matters — such as mobile operating systems or productivity tools — screen awareness can be a significant feature advantage.

But it also raises structural questions about how digital consent is designed. If an AI assistant sees more than the user expects — or retains data longer than necessary — the balance between functionality and control can quickly shift.

Diverging Philosophies

The differing responses to screen-aware AI in China and the West reflect deeper contrasts in how AI is being developed and governed.

In the U.S. and Europe, AI assistants are typically sandboxed using formal APIs, explicit tool integrations and clear scopes of action. Developers emphasize consent mechanisms and often shape deployment timelines around public feedback — as Microsoft’s Recall experience shows.

In China, the reliance on accessibility services and GUI-based interpretation allows for quicker integration across apps and platforms, but often with less granular control. It’s a practical workaround to API fragmentation, and it accelerates product development. But the trade-off is that users may not fully understand what permissions they’ve granted — or how their data is being handled.

There are some signs of industry self-regulation. Xiaomi CEO Lei Jun has called for establishing standards for AI terminals, including a grading system and unified privacy frameworks. And the China Software Industry Association recently released guidelines on accessibility permissions, recommending limits on data access and greater transparency for users.

Still, these measures are early-stage and largely advisory. As more assistants adopt screen-sharing or screen-reading features, the need for clearer norms and accountability will only grow.

Balancing Promise and Risk

It’s worth emphasizing: screen-aware assistants are not inherently problematic. Used carefully, they can enhance productivity, improve accessibility and simplify digital tasks. There are legitimate use cases for users who want AI support that adapts in real time to what’s on screen.

But like any powerful feature, the impact depends on how it’s implemented — and how well users understand it. The key challenges include:

  • Transparency: Are users clearly informed when the assistant has screen access?
  • Consent: Is permission requested in meaningful, understandable language?
  • Boundaries: Are there built-in limits on what the assistant can see or do?
  • Data Handling: Is the screen content processed locally or sent to the cloud? How is it stored and for how long?

Without clear answers to these questions, screen-sharing by default could become normalized before its implications are fully understood.

Looking Ahead

As AI assistants evolve into intelligent agents capable of multistep reasoning and cross-platform actions, visual context will almost certainly become part of their toolkit. Whether through screenshot-based models, accessibility hooks or integrated APIs, screen awareness adds a layer of intelligence that static inputs can’t replicate.

The question is not whether this trend will continue — it will — but whether the industry builds the right infrastructure to ensure it’s done responsibly.

China’s deployment of these tools offers a preview of what’s technically possible and where frictionless design can lead. It also provides a cautionary tale: powerful AI features introduced without transparency risk undermining trust just as AI enters more personal spaces.

Developers, regulators and users alike would benefit from treating screen-aware agents not as a niche product tweak but as a pivotal shift in how we interact with machines. If the smart AI assistant of the future is to be a true partner, it must be built on a foundation of clarity, respect and informed choice.