Collectors

On day 2, you learned that the Stream API can help you work with collections in a declarative manner. We looked at collect, a terminal operation that collects the result of a stream pipeline into a List. collect is a reduction operation that reduces a stream to a value, which could be a Collection, a Map, or a value object. You can use collect to achieve the following:

  1. Reducing a stream to a single value: The result of the stream execution can be reduced to a single value. That value could be a Collection, a numeric value such as an int or a double, or a custom value object.

  2. Grouping elements in a stream: For example, group all the tasks in a stream by TaskType. This results in a Map<TaskType, List<Task>> in which each entry contains a TaskType and its associated tasks. You can use any other Collection instead of a List as well. If you don't need all the tasks associated with a TaskType, you can instead produce a Map<TaskType, Task>, for example by grouping tasks by type and keeping only the first created task of each type.

  3. Partitioning elements in a stream: You can partition a stream into two groups, for example due and completed tasks.
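To make the three uses concrete, here is a minimal, self-contained sketch that applies each of them to a plain list of words instead of the Task domain from day 2. The class CollectorUsesSketch and its sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectorUsesSketch {

    // Hypothetical sample data standing in for tasks.
    static final List<String> WORDS = Arrays.asList("stream", "map", "filter", "collect", "sum");

    // 1. Reduce the stream to a single value: the total number of characters.
    static int totalLength(List<String> words) {
        return words.stream().collect(Collectors.summingInt(String::length));
    }

    // 2. Group elements: word length -> words having that length.
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // 3. Partition elements into two groups: short words (true) and longer words (false).
    static Map<Boolean, List<String>> partitionShort(List<String> words) {
        return words.stream().collect(Collectors.partitioningBy(w -> w.length() <= 3));
    }

    public static void main(String[] args) {
        System.out.println(totalLength(WORDS));
        System.out.println(groupByLength(WORDS));
        System.out.println(partitionShort(WORDS));
    }
}
```

Only the Collector passed to collect changes between the three methods. The stream itself is built the same way each time.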

Collector in Action

To feel the power of Collector, let us look at an example where we have to group tasks by their type. In Java 8, we can group tasks by TaskType with the code shown below. Please refer to the day 2 blog, where we introduced the example domain used throughout this series.

    private static Map<TaskType, List<Task>> groupTasksByType(List<Task> tasks) {
        return tasks.stream().collect(Collectors.groupingBy(task -> task.getType()));
    }

The code shown above uses the groupingBy Collector defined in the Collectors utility class. It creates a Map whose keys are TaskTypes and whose values are lists containing all the tasks with that TaskType. To achieve the same in Java 7, you would have to write the following.

    public static void main(String[] args) {
        List<Task> tasks = getTasks();
        Map<TaskType, List<Task>> allTasksByType = new HashMap<>();
        for (Task task : tasks) {
            List<Task> existingTasksByType = allTasksByType.get(task.getType());
            if (existingTasksByType == null) {
                List<Task> tasksByType = new ArrayList<>();
                tasksByType.add(task);
                allTasksByType.put(task.getType(), tasksByType);
            } else {
                existingTasksByType.add(task);
            }
        }
        for (Map.Entry<TaskType, List<Task>> entry : allTasksByType.entrySet()) {
            System.out.println(String.format("%s =>> %s", entry.getKey(), entry.getValue()));
        }
    }

Collectors: Common reduction operations

The Collectors utility class provides a lot of static utility methods for creating collectors for most common use cases like accumulating elements into a Collection, grouping and partitioning elements, or summarizing elements according to various criteria. We will cover the most common Collectors in this blog.

Reducing to a single value

As discussed above, collectors can be used to collect stream output to a Collection or produce a single value.

Collecting data into a List

Let's write our first example: given a list of Tasks, we want to collect all the titles into a List.

    import static java.util.stream.Collectors.toList;

    public class Example2_ReduceValue {
        public List<String> allTitles(List<Task> tasks) {
            return tasks.stream().map(Task::getTitle).collect(toList());
        }
    }

The toList collector uses the List's add method to add elements to the resulting List. In the JDK implementation it uses ArrayList, although the Javadoc makes no guarantees about the type, mutability, or thread-safety of the returned List.

Collecting data into a Set

If we want to make sure only unique titles are returned and we don't care about order, we can use the toSet collector.

    import static java.util.stream.Collectors.toSet;

    public Set<String> uniqueTitles(List<Task> tasks) {
        return tasks.stream().map(Task::getTitle).collect(toSet());
    }

In the JDK implementation, toSet stores the result in a HashSet. As with toList, the Javadoc does not guarantee which Set implementation is returned.

Collecting data into a Map

You can convert a stream to a Map by using the toMap collector. The toMap collector takes two mapper functions that extract the key and the value for each Map entry. In the code shown below, Task::getTitle is a Function that takes a task and returns its title, which becomes the key. The lambda expression task -> task simply returns its argument, so each task becomes the value.

    import static java.util.stream.Collectors.toMap;

    private static Map<String, Task> taskMap(List<Task> tasks) {
        return tasks.stream().collect(toMap(Task::getTitle, task -> task));
    }

We can make the code cleaner and convey the developer's intent better by using the static identity method of the Function interface, as shown below.

    import static java.util.function.Function.identity;

    private static Map<String, Task> taskMap(List<Task> tasks) {
        return tasks.stream().collect(toMap(Task::getTitle, identity()));
    }

The code that creates a Map from the stream throws an IllegalStateException when duplicate keys are present. You will get an error like the one shown below.

Exception in thread "main" java.lang.IllegalStateException: Duplicate key Task{title='Read Version Control with Git book', type=READING} at java.util.stream.Collectors.lambda$throwingMerger$105(Collectors.java:133) 

You can handle duplicates by using another variant of toMap that lets you specify a merge function. The merge function lets the client decide how to resolve collisions between values associated with the same key. In the code shown below, we simply keep the newer value, but you could equally write a smarter algorithm to resolve collisions.

    private static Map<String, Task> taskMap_duplicates(List<Task> tasks) {
        // on a key collision, keep the task encountered later (t2)
        return tasks.stream().collect(toMap(Task::getTitle, identity(), (t1, t2) -> t2));
    }
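The sketch below shows both behaviors side by side, using hypothetical word data instead of tasks. The two-argument toMap fails on a repeated key, while the three-argument version hands both values to the merge function. Here the merge function concatenates the two values instead of picking one, as an example of a custom resolution strategy. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapDuplicatesSketch {

    // Two-argument toMap: key = first letter, value = the word.
    // A repeated key makes the collect call throw IllegalStateException.
    static Map<Character, String> strict(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w.charAt(0), w -> w));
    }

    // Three-argument toMap: the merge function receives the value already in the map
    // and the newly encountered one; here it keeps both, joined by a comma.
    static Map<Character, String> merged(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                w -> w.charAt(0),
                w -> w,
                (existing, incoming) -> existing + "," + incoming));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("git", "java", "gradle");
        System.out.println(merged(words));
        try {
            strict(words);
        } catch (IllegalStateException e) {
            System.out.println("Duplicate key rejected: " + e.getMessage());
        }
    }
}
```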

You can use any other Map implementation by using the third variant of toMap, which additionally takes a Supplier that creates the Map used to store the result.

    public Map<String, Task> collectToMap(List<Task> tasks) {
        return tasks.stream().collect(toMap(Task::getTitle, identity(), (t1, t2) -> t2, LinkedHashMap::new));
    }

Similar to the toMap collector, there is also a toConcurrentMap collector, which produces a ConcurrentMap instead of a HashMap.
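As a small illustration, the map-supplier variant with LinkedHashMap::new keeps the keys in the order in which the stream first encountered them. The plain toMap does not promise any particular Map type or ordering. The class and data below are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapSupplierSketch {

    // Four-argument toMap: the last argument chooses the Map implementation.
    // LinkedHashMap keeps keys in the order the stream first encountered them.
    static Map<String, Integer> lengthsInOrder(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                Function.identity(),       // key: the word itself
                String::length,            // value: its length
                (first, second) -> first,  // keep the first value for a repeated key
                LinkedHashMap::new));      // map supplier
    }

    public static void main(String[] args) {
        System.out.println(lengthsInOrder(Arrays.asList("zebra", "apple", "mango", "apple")));
    }
}
```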

Using other collections

The specific collectors like toList and toSet do not allow you to specify the underlying List or Set implementation. You can use the toCollection collector when you want to collect the result to other types of collections, as shown below.

    private static LinkedHashSet<Task> collectToLinkedHashSet(List<Task> tasks) {
        return tasks.stream().collect(toCollection(LinkedHashSet::new));
    }
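toCollection accepts any Supplier of a Collection. For instance, a TreeSet both removes duplicates and sorts the elements. Here is a minimal sketch on plain strings; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ToCollectionSketch {

    // toCollection accepts any Supplier of a Collection; a TreeSet removes
    // duplicates and keeps the elements in their natural (alphabetical) order.
    static TreeSet<String> sortedUnique(List<String> titles) {
        return titles.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique(Arrays.asList("Read", "Write", "Code", "Read")));
    }
}
```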

Finding Task with longest title

    public Task taskWithLongestTitle(List<Task> tasks) {
        return tasks.stream()
                .collect(collectingAndThen(
                        maxBy((t1, t2) -> t1.getTitle().length() - t2.getTitle().length()),
                        Optional::get));
    }
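maxBy returns an Optional because the stream might be empty, and collectingAndThen applies a finishing function (here Optional::get) to that result. The sketch below works on plain strings and uses a hypothetical class name. It replaces the hand-written subtraction comparator with Comparator.comparingInt, and it shows that the Optional::get finisher throws NoSuchElementException for an empty input.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class LongestTitleSketch {

    // maxBy yields an Optional because the stream may be empty.
    static Optional<String> longest(List<String> titles) {
        return titles.stream().collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    // collectingAndThen applies a finishing function to the collector's result;
    // Optional::get unwraps it but throws NoSuchElementException for an empty list.
    static String longestOrFail(List<String> titles) {
        return titles.stream().collect(Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingInt(String::length)),
                Optional::get));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Read Java 8 Lambdas book", "Write a blog on Java 8 Streams");
        System.out.println(longestOrFail(titles));
        System.out.println(longest(Collections.<String>emptyList()).isPresent());
    }
}
```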

Count total number of tags

    public int totalTagCount(List<Task> tasks) {
        return tasks.stream().collect(summingInt(task -> task.getTags().size()));
    }

Generate summary of Task titles

    public String titleSummary(List<Task> tasks) {
        return tasks.stream().map(Task::getTitle).collect(joining(";"));
    }
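Two details of these small reductions are easy to miss. summingInt returns 0 for an empty stream, and joining also has an overload that takes a prefix and a suffix. Here is a minimal sketch on plain strings; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class SmallReductionsSketch {

    // summingInt adds up an int extracted from each element; an empty stream sums to 0.
    static int totalLength(List<String> titles) {
        return titles.stream().collect(Collectors.summingInt(String::length));
    }

    // joining(delimiter, prefix, suffix) wraps the joined text; an empty stream
    // produces just prefix + suffix.
    static String bracketed(List<String> titles) {
        return titles.stream().collect(Collectors.joining("; ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Read", "Write", "Code");
        System.out.println(totalLength(titles));
        System.out.println(bracketed(titles));
        System.out.println(bracketed(Collections.<String>emptyList()));
    }
}
```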

Grouping Collectors

One of the most common use cases of Collector is grouping elements. Let's look at various examples to understand how to perform grouping.

Example 1: Grouping tasks by type

Let's look at the example shown below, where we want to group all the tasks based on their TaskType. You can very easily perform this task by using the groupingBy Collector of the Collectors utility class. You can make it more succinct by using method references and static imports.

    import static java.util.stream.Collectors.groupingBy;

    private static Map<TaskType, List<Task>> groupTasksByType(List<Task> tasks) {
        return tasks.stream().collect(groupingBy(Task::getType));
    }

It will produce the output shown below.

{CODING=[Task{title='Write a mobile application to store my tasks', type=CODING, createdOn=2015-07-03}], WRITING=[Task{title='Write a blog on Java 8 Streams', type=WRITING, createdOn=2015-07-04}], READING=[Task{title='Read Version Control with Git book', type=READING, createdOn=2015-07-01}, Task{title='Read Java 8 Lambdas book', type=READING, createdOn=2015-07-02}, Task{title='Read Domain Driven Design book', type=READING, createdOn=2015-07-05}]} 

Example 2: Grouping by tags

    private static Map<String, List<Task>> groupingByTag(List<Task> tasks) {
        return tasks.stream()
                .flatMap(task -> task.getTags().stream().map(tag -> new TaskTag(tag, task)))
                .collect(groupingBy(TaskTag::getTag, mapping(TaskTag::getTask, toList())));
    }

    private static class TaskTag {
        final String tag;
        final Task task;

        public TaskTag(String tag, Task task) {
            this.tag = tag;
            this.task = task;
        }

        public String getTag() {
            return tag;
        }

        public Task getTask() {
            return task;
        }
    }
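The TaskTag helper only exists to carry a (tag, task) pair through the pipeline. As an alternative, a java.util.AbstractMap.SimpleEntry can play the same role. In the sketch below, the Item class is a hypothetical, trimmed-down stand-in for Task that keeps only a title and its tags. The sample data is also hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByTagSketch {

    // Hypothetical, trimmed-down stand-in for the series' Task: a title and its tags.
    static final class Item {
        final String title;
        final List<String> tags;

        Item(String title, String... tags) {
            this.title = title;
            this.tags = Arrays.asList(tags);
        }
    }

    static final List<Item> SAMPLE = Arrays.asList(
            new Item("Read Git book", "git", "reading"),
            new Item("Write blog", "java8", "blog"),
            new Item("Read Java 8 Lambdas book", "java8", "reading"));

    // flatMap emits one (tag, title) entry per tag; groupingBy collects the entries
    // by tag, and mapping(...) keeps only the title of each entry in the group.
    static Map<String, List<String>> titlesByTag(List<Item> items) {
        return items.stream()
                .flatMap(item -> item.tags.stream().map(tag -> new SimpleEntry<>(tag, item.title)))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        titlesByTag(SAMPLE).forEach((tag, titles) -> System.out.println(tag + " -> " + titles));
    }
}
```

Using an entry avoids a dedicated class. A named helper like TaskTag can still be easier to read when the pair carries more meaning.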

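Example 3: Grouping tasks by tag and counting them

This example combines a classifier with a downstream collector: counting() replaces the default toList() downstream, so each tag maps to the number of tasks carrying it.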

    private static Map<String, Long> tagsAndCount(List<Task> tasks) {
        return tasks.stream()
                .flatMap(task -> task.getTags().stream().map(tag -> new TaskTag(tag, task)))
                .collect(groupingBy(TaskTag::getTag, counting()));
    }
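groupingBy also has a three-argument overload that takes a map factory between the classifier and the downstream collector. The sketch below uses TreeMap::new so that the tag counts come back sorted by tag. The class name is hypothetical, and each task is reduced to its list of tags.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TagCountSketch {

    // Hypothetical sample: each inner list holds the tags of one task.
    static final List<List<String>> SAMPLE = Arrays.asList(
            Arrays.asList("java8", "reading"),
            Arrays.asList("java8", "blog"),
            Arrays.asList("reading"));

    // Three-argument groupingBy: classifier, map factory, downstream collector.
    // TreeMap::new keeps the tags sorted; counting() yields a Long per tag.
    static Map<String, Long> countTags(List<List<String>> tagsPerTask) {
        return tagsPerTask.stream()
                .flatMap(List::stream) // one element per tag occurrence
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countTags(SAMPLE));
    }
}
```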

Example 4: Grouping by TaskType and createdOn

    private static Map<TaskType, Map<LocalDate, List<Task>>> groupTasksByTypeAndCreationDate(List<Task> tasks) {
        return tasks.stream().collect(groupingBy(Task::getType, groupingBy(Task::getCreatedOn)));
    }

Partitioning

There are times when you want to split a dataset into two datasets based on a predicate. For example, we can partition tasks with a predicate that checks whether a task is due after today. Tasks due after today end up under the true key, and all the others (due today or earlier) end up under the false key.

    private static Map<Boolean, List<Task>> partitionOldAndFutureTasks(List<Task> tasks) {
        return tasks.stream().collect(partitioningBy(task -> task.getDueOn().isAfter(LocalDate.now())));
    }
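partitioningBy also accepts a downstream collector, just like groupingBy. The minimal sketch below, on plain strings with a hypothetical class name, counts the elements on each side. Note that the JDK always returns entries for both true and false, even when one side is empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionSketch {

    // Splits the words into "at least minLength characters" (true) and "shorter" (false)
    // and counts each side with the counting() downstream collector.
    static Map<Boolean, Long> countLong(List<String> words, int minLength) {
        return words.stream()
                .collect(Collectors.partitioningBy(w -> w.length() >= minLength, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countLong(Arrays.asList("map", "filter", "collect"), 5));
        // both keys are present even though no word has 10 or more characters
        System.out.println(countLong(Arrays.asList("map"), 10));
    }
}
```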

Generating statistics

Another very useful group of collectors produces statistics. These collectors work on the primitive data types int, long, and double, and can produce statistics like those shown below.

    IntSummaryStatistics summaryStatistics = tasks.stream()
            .map(Task::getTitle)
            .collect(summarizingInt(String::length));
    System.out.println(summaryStatistics.getAverage()); // 32.4
    System.out.println(summaryStatistics.getCount());   // 5
    System.out.println(summaryStatistics.getMax());     // 44
    System.out.println(summaryStatistics.getMin());     // 24
    System.out.println(summaryStatistics.getSum());     // 162

There are matching collectors for the other primitive types: summarizingLong returns a LongSummaryStatistics and summarizingDouble returns a DoubleSummaryStatistics.

You can also merge one IntSummaryStatistics into another using the combine method, which updates the receiver in place.

    firstSummaryStatistics.combine(secondSummaryStatistics);
    System.out.println(firstSummaryStatistics);
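Here is a minimal sketch of combine, with a hypothetical class name and data. The two summaries are computed separately, and combine folds the second into the first, so the receiver ends up describing all the values. combine returns void.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class CombineStatsSketch {

    // Summary statistics over the lengths of the given words.
    static IntSummaryStatistics lengths(List<String> words) {
        return words.stream().collect(Collectors.summarizingInt(String::length));
    }

    // combine() folds the second summary into the first; it returns void and
    // mutates `first`, which afterwards describes the values of both lists.
    static IntSummaryStatistics combined(List<String> a, List<String> b) {
        IntSummaryStatistics first = lengths(a);
        first.combine(lengths(b));
        return first;
    }

    public static void main(String[] args) {
        System.out.println(combined(Arrays.asList("ab", "abcd"), Arrays.asList("abcdef")));
    }
}
```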

Joining all titles

    private static String allTitles(List<Task> tasks) {
        return tasks.stream().map(Task::getTitle).collect(joining(", "));
    }

Writing a custom Collector

    import com.google.common.collect.HashMultiset;
    import com.google.common.collect.Multiset;
    import java.util.Collections;
    import java.util.EnumSet;
    import java.util.Set;
    import java.util.function.BiConsumer;
    import java.util.function.BinaryOperator;
    import java.util.function.Function;
    import java.util.function.Supplier;
    import java.util.stream.Collector;

    public class MultisetCollector<T> implements Collector<T, Multiset<T>, Multiset<T>> {

        // creates the empty, mutable result container
        @Override
        public Supplier<Multiset<T>> supplier() {
            return HashMultiset::create;
        }

        // adds one occurrence of each stream element to the container
        @Override
        public BiConsumer<Multiset<T>, T> accumulator() {
            return (set, e) -> set.add(e, 1);
        }

        // merges two partial results (used when a parallel stream splits the work)
        @Override
        public BinaryOperator<Multiset<T>> combiner() {
            return (set1, set2) -> {
                set1.addAll(set2);
                return set1;
            };
        }

        // the container is already the final result
        @Override
        public Function<Multiset<T>, Multiset<T>> finisher() {
            return Function.identity();
        }

        @Override
        public Set<Characteristics> characteristics() {
            return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
        }
    }

    import com.google.common.collect.Multiset;
    import java.util.Arrays;
    import java.util.List;

    public class MultisetCollectorExample {
        public static void main(String[] args) {
            List<String> names = Arrays.asList("shekhar", "rahul", "shekhar");
            Multiset<String> set = names.stream().collect(new MultisetCollector<>());
            // elementSet() visits each distinct name once; iterating the multiset itself would visit duplicates
            set.elementSet().forEach(str -> System.out.println(str + ":" + set.count(str)));
        }
    }
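For simple cases you don't need a dedicated class: Collector.of builds a collector from a supplier, an accumulator, and a combiner. The sketch below is an alternative to MultisetCollector that counts into a plain HashMap instead of Guava's Multiset, so it needs no third-party dependency. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

public class CountingCollectorSketch {

    // Collector.of(supplier, accumulator, combiner) builds a collector without a
    // dedicated class; this overload marks the collector IDENTITY_FINISH, so the
    // container built by the supplier is returned as the result.
    static <T> Collector<T, ?, Map<T, Integer>> toCounts() {
        return Collector.<T, Map<T, Integer>>of(
                HashMap::new,                                    // supplier: empty counts
                (counts, e) -> counts.merge(e, 1, Integer::sum), // accumulator: count one occurrence
                (left, right) -> {                               // combiner: merge partial counts
                    right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                    return left;
                });
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("shekhar", "rahul", "shekhar");
        Map<String, Integer> counts = names.stream().collect(toCounts());
        counts.forEach((name, count) -> System.out.println(name + ":" + count));
    }
}
```

In the JDK, the combiner is used when a parallel stream merges the partial maps built for different chunks of the input.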

Word Count in Java 8

We will end this section by writing the famous word count example in Java 8 using Streams and Collectors.

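    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.AbstractMap.SimpleEntry;
    import java.util.Arrays;
    import java.util.Map;
    import java.util.stream.Stream;

    public static void wordCount(Path path) throws IOException {
        // try-with-resources closes the file handle opened by Files.lines
        try (Stream<String> lines = Files.lines(path)) {
            Map<String, Long> wordCount = lines
                    .parallel()
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s")))
                    .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
                    .filter(word -> word.length() > 0)
                    .map(word -> new SimpleEntry<>(word, 1))
                    .collect(groupingBy(SimpleEntry::getKey, counting()));
            wordCount.forEach((k, v) -> System.out.println(String.format("%s ==>> %d", k, v)));
        }
    }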
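The SimpleEntry step is not strictly needed. Grouping the words themselves, with Function.identity() as the classifier, gives the same counts. The sketch below takes a Stream<String> of lines so that it can be tried without a file; Files.lines(path) returns exactly such a stream. It also splits on the regex \s+ and uses a TreeMap so that the output is sorted by word. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {

    // Counts words in the given lines; Files.lines(path) yields a suitable Stream<String>.
    static Map<String, Long> wordCount(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))  // split on runs of whitespace
                .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase()) // keep letters only
                .filter(word -> !word.isEmpty())                              // drop empty tokens
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        wordCount(Stream.of("The cat", "the dog, the end."))
                .forEach((word, count) -> System.out.println(word + " ==>> " + count));
    }
}
```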
