Apple's iconic App Store was recently updated to feature AI-generated summaries of user reviews, and now we know how it all works.

In October 2024, an unlisted App Store article revealed that Apple wanted to summarize user application reviews with the help of artificial intelligence. Months later, in March 2025, the feature became available to the general public with the release of iOS 18.4.

While we already had a few details about Apple's AI-generated review summaries, a new post on Apple's Machine Learning blog explains the intricacies and specifics of the feature.

The characteristics and goals of AI-generated review summaries

The ultimate goal of these summaries is to provide users with a clear picture of an app's reviews, so that they may more easily decide whether or not to purchase or install a particular application. In summarizing user reviews, however, Apple had to make sure that the AI output was up to date and that it didn't include off-topic or offensive information.

App Store applications often receive updates, and changes such as new features, bug fixes, or in-app items often influence user reviews. App reviews themselves also vary by style, length, and even relevance. Apple's AI summarization needed to account for all of these factors, so the company implemented a multi-step process.

How Apple's AI summarizes user reviews

First, reviews containing spam or profanity are filtered out. Eligible reviews are then passed through a series of large language models (LLMs), which extract key insights from them. After that, common themes are aggregated, and user sentiment is balanced. The result is an AI-generated summary that reflects broad user sentiment, with a length of 100 to 300 characters.
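To make the order of those stages concrete, here is a minimal Python sketch of the flow. It is an illustration only, not Apple's implementation: the blocklist, the function names, and the three pluggable stages are all hypothetical, and the LLM steps are replaced with simple stand-ins supplied by the caller.

```python
import re
from typing import Callable, List

# Hypothetical blocklist. Apple's actual filtering criteria are not public.
BLOCKED_TERMS = {"damn", "freemoney", "clickhere"}


def is_eligible(review: str) -> bool:
    """Return True for a non-empty review that contains no blocked term."""
    words = set(re.findall(r"[a-z0-9']+", review.lower()))
    return bool(words) and not (words & BLOCKED_TERMS)


def summarize_reviews(
    reviews: List[str],
    extract_insights: Callable[[str], List[str]],
    select_insights: Callable[[List[str]], List[str]],
    write_summary: Callable[[List[str]], str],
) -> str:
    """Run the stages in order: filter, extract, select, summarize.

    The three callables are assumed stand-ins for the LLM-driven steps.
    """
    eligible = [r for r in reviews if is_eligible(r)]
    insights = [i for r in eligible for i in extract_insights(r)]
    chosen = select_insights(insights)
    return write_summary(chosen)


def demo() -> str:
    """Toy run with rule-based stand-ins instead of language models."""
    reviews = [
        "Great sync feature. Crashes on launch.",
        "clickhere for freemoney",  # dropped by the filter
        "",                         # dropped: empty
    ]
    return summarize_reviews(
        reviews,
        extract_insights=lambda r: [s.strip() for s in r.split(".") if s.strip()],
        select_insights=lambda xs: xs[:2],
        write_summary=lambda xs: "; ".join(xs),
    )
```

Keeping each stage as a separate function mirrors the multi-step design described above: the filter runs before any model sees the text, and each later stage can be swapped out or evaluated on its own.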

During the first phase of the process, known as "Insight Extraction," user reviews are boiled down to distinct insights. Apple says that these insights encapsulate "one specific aspect of the review, articulated in standardized, natural language, and confined to a single topic and sentiment."

"Dynamic Topic Modeling" lets Apple's AI compare relevant topics across different reviews, so that the software can identify the most prominent topics discussed. The approach and terminology bear some resemblance to Apple's AI test applications, which we outlined in 2024.

For each app, a set of topics, along with the "most representative" insights for those topics, is used by the AI to create the summary. Specially designed LLMs ensure that user sentiment is balanced and that the summaries maintain the required form and length.
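The selection step can be pictured with a short Python sketch. Everything here is an assumption made for illustration: the `Insight` fields, the `support` count used to rank topics, and the rule of keeping one positive and one negative insight per topic are stand-ins, since the article does not spell out Apple's actual criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Insight:
    topic: str      # single topic, as in the "Insight Extraction" phase
    sentiment: str  # "positive" or "negative"
    text: str
    support: int    # hypothetical: how many reviews expressed this insight


def select_for_summary(insights: List[Insight], max_topics: int = 3) -> List[Insight]:
    """Pick the most prominent topics, then one insight per sentiment per topic."""
    by_topic: Dict[str, List[Insight]] = defaultdict(list)
    for ins in insights:
        by_topic[ins.topic].append(ins)

    # Rank topics by total support; break ties alphabetically for determinism.
    ranked = sorted(
        by_topic,
        key=lambda t: (-sum(i.support for i in by_topic[t]), t),
    )

    chosen: List[Insight] = []
    for topic in ranked[:max_topics]:
        for sentiment in ("positive", "negative"):
            same = [i for i in by_topic[topic] if i.sentiment == sentiment]
            if same:
                # Treat the most widely supported insight as "most representative".
                chosen.append(max(same, key=lambda i: i.support))
    return chosen
```

Keeping both a positive and a negative insight for a topic, when both exist, is one simple way to stop a summary from tilting toward whichever side happens to be louder.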

During development, Apple's AI-generated summaries were evaluated for characteristics such as groundedness, composition, helpfulness, and more. This part of the process involved human reviewers, which serves as an indication of how seriously Apple took its AI summary development.

Apple's blog details all of the steps mentioned here, with more specific information on the technology used during each part of the process. All in all, the iPhone maker's approach ensures that AI-generated summaries of user reviews are accurate, helpful, spam-free, and up to date.

1 Comment

mpantone 19 Years · 2406 comments

About 2 days ago

AppleInsider said:
During development, Apple's AI-generated summaries were evaluated for characteristics such as groundedness, composition, helpfulness, and more. This part of the process involved human reviewers, which serves as an indication of how seriously Apple took its AI summary development.
It's not like Apple had a choice in the matter. The end product (AI review summaries) is intended to be consumed by humans. If it's unappealing, uninformative or useless, it's not helpful.
Apple's blog details all of the steps mentioned here, with more specific information on the technology used during each part of the process. All in all, the iPhone maker's approach ensures that AI-generated summaries of user reviews are accurate, helpful, spam-free, and up to date.
Ultimately it's the human at the end who decides whether or not the AI-generated output is meaningful. The machines are working for us. And not just in this use case, but in all uses of machine learning/artificial intelligence.

That said, Apple engineers probably looked at existing AI-generated summaries. Amazon is full of them these days and even search engines have result/hit summaries. And we already know of several situations, like the bungled AI-generated news headline summaries, that put egg on Apple's face. So there really had better be some scrutiny on the output.

If it generates user review summaries that are only useful about 2/3 of the time, that's probably insufficient. It really needs to exceed 99% usefulness for everyone visiting the store. I simply don't have the time and interest to start reading some AI-generated review summary only to end up deciding it's worthless. If that's the case, I'm probably better off reading 2-3 reviews with the top "This was helpful" scores. And at least there's likely a source (username like JaneyBrown, 350+ reviews, 15 years on the App Store) given. If I see a review written by someone who just registered today, I will take a hard pass.

All of this consumer-facing LLM-generated AI output is pretty marginal in usage here in April 2025. It'll really be years before it's widely accepted. If I read sixty review summaries and twenty are lame, how much trust and confidence am I going to put into the next one I read? If you had human assistants researching and writing these summaries, would you be happy with a 66% success rate? Or would you think that a third of their time is being wasted?

I'm glad to see Apple trying to make an effort but

TRUST IS EARNED.