Is the Java Stream API intended to replace loops?

Question

I mean the question in the sense of: Should the occurrence of simple loops on collections in code in Java 8 and higher be regarded as code smell (except in justified exceptions)?

When it came to Java 8, I assumed that it would be good to treat everything with Stream API now wherever possible. I thought, especially when I use parallelStream() wherever I know that order doesn't matter, this gives the JVM the ability to optimize the execution of my code.

Teammates think differently here. They think lambda transforms are hard to read and we don't use streams much now. I agree that streams are hard to read if the code formatter forces you to write them like this:

return projects.parallelStream().filter(lambda -> lambda.getGeneratorSource() != null) .flatMap(lambda -> lambda.getFolders().prallelStream().map(mu -> Pair.of(mu, lambda.getGeneratorSource()))) .filter(lambda -> !lambda.getLeft().equals(lambda.getRight())).map(Pair::getLeft) .filter(lambda -> lambda.getDerivative().isPresent() || lambda.getDpi().isPresent() || lambda.getImageScale().isPresent() || lambda.getImageSize().isPresent()) .findAny().isPresent();

I would rather prefer to write them like this:

return projects.parallelStream() // skip all projects that don’t declare a generator source .filter(λ -> λ.getGeneratorSource() != null ) /* We need to remove the folders which are the source folders of their * project. To do so, we create pairs of each folder with the source * folder … */ .flatMap(λ -> λ.getFolders().stream() .map(μ -> Pair.of(μ, λ.getGeneratorSource()) // Pair<Folder, Folder> ) ) // … and drop all folders that ARE the source folders .filter(λ -> !λ.getLeft().equals(λ.getRight()) ) /* For the further processing, we only need the folders, so we can unbox * them now */ .map(Pair::getLeft) // only look for folders that declare a method to generate images .filter(λ -> λ.getDerivative().isPresent() || λ.getDpi().isPresent() || λ.getImageScale().isPresent() || λ.getImageSize().isPresent() ) // return whether there is any .findAny().isPresent();

Yes, return is at the top, and it looks quite differently than:

for (Project project : projects) { if (Objects.nonNull(project.getGeneratorSource())) { continue; } for (Folder folder : project.getFolders()) { if (folder.equals(project.getGeneratorSource())) { continue; } else if (folder.getDerivative().isPresent() || folder.getDpi().isPresent() || folder.getImageScale().isPresent() || folder.getImageSize().isPresent()) { return true; } } } return false;

But isn’t that just a matter of habit? Therefore my question is more a question of assessment:

Should Stream be something basic to use (should Java beginners learn for loops at all/first, or should they first learn to use streams), or is that too much premature optimization, and one should use streams rather after it is shown that a code makes a bottleneck.

(Less opinion-based: How do modern Java teaching books (are there still books in IT training?) handle this?)

Edit 1: To prevent the question from losing focus: My question is: Is Stream is intended as a basic programming paradigm every Java programmer should use often, or as a performance feature for senior software enhancers, that should be avoided? You can also look at it internally: Can the JVM decide whether to use a stream construction or process the objects sequentially (e.g. may it only do that if it detects a hotspot?), or does it have to set up complicated parallelization logic in the background each time, with multiple threads, so that this could produce a lot of overhead? In the second case that would be a clear argument to me against using Stream all and everywhere.

Your last code example is far easier to read than the others. — Robert Harvey, CommentedMay 27, 2020 at 15:38
lambda expressions should have single-character (like "s", "t" or "_") identifiers. "lambda" is a bad choice. — Thorbjørn Ravn Andersen, CommentedOct 11, 2023 at 21:41

JimmyJames supports Canada · Accepted Answer · 2020-05-26 18:55:57Z

Is Stream is intended as a basic programming paradigm every Java programmer should use often, or as a performance feature for senior software enhancers, that should be avoided?

I think there's another option that you are not seeing. The primary benefit of Streams (in my humble opinion) is that it's a paradigm for concurrent and parallel algorithms that is much easier to get right.

That is, it's not intended to replace loops in general (although this is a somewhat popular opinion.) But it's not really intended only for experts either. It's intended to allow average developers implement approaches that in the past, only experts could manage.

To this point, a lot of people will use parallel stream to implement multi-threading. This can be fine and it might even lead to an optimal solution but it's actually much less sophisticated than the existing concurrency libraries. And I don't mean just that the code is simpler to write. I mean that the capabilities of the concurrency libraries offer a vast array of design options beyond what the streams API provides. The problem is that you need to have a really strong understanding of concurrency, the Java memory model, and an extensive library.

So ultimately, I don't see streams as an advanced option but one designed for beginners.

Having said that, there are a lot of functional capabilities in the streams API that are better than their for-loop equivalent. Mapping and filtering are much cleaner and provide much more reuse options than e.g. if statements buried in a loop. Unfortunately, Java doesn't make streams and loops work well together by default. There's a reason for that but it tends to lead people on a path of 'streams or bust' and IMO, reduces the readability of code in many circumstances.

Caleth · Accepted Answer · 2020-05-26 10:29:35Z

A reason to prefer Streams over for is that for does everything. You get to use different names for different operations, rather than having to recognise a pattern spread across tens of lines.

The problem with both versions is doing everything in one method. You are missing functions like

Stream<T> <T> notNull(Stream<T> objects) { return objects.filter(o -> o != null); } Stream<Pair<Folder, Folder>> getFolderPairs(Project project) { return project.getFolders().stream() .map(folder -> Pair.of(folder, project.getGeneratorSource()) ); } boolean isNotSourceFolder(Pair<Folder, Folder> folders) { return !folders.getLeft().equals(folders.getRight()); } Stream<Folder> subfoldersFor(Stream<Project> projects) { return notNull(projects) .flatMap(getFolderPairs) .filter(isNotSourceFolder) .map(Pair::getLeft) } boolean canGenerateImages(Folder folder) { return (folder.getDerivative().isPresent() || folder.getDpi().isPresent() || folder.getImageScale().isPresent() || folder.getImageSize().isPresent()) }

Note that I use sensible names for the parameters of my lambda expressions, rather than lambda everywhere.

Using those, your original function becomes much simpler, even with a formatter that wants things together.

return subfoldersFor(projects.parallelStream()) .anyMatch(canGenerateImages);

This is close to what we ended up with, but my question is more in the direction of whether Stream is intended as a basic programming paradigm or as a performance feature for senior software enhancers. — Matthias Ronge, CommentedMay 26, 2020 at 10:09

Ralf Kleberhoff · Accepted Answer · 2020-05-27 11:56:44Z

Streams are a fine addition to the Java ecosystem, but they cannot replace the traditional for loop.

Limited access to enclosing variables

The main limitation is that code inside lambdas cannot fully access variables of the enclosing method. And that cannot be overcome as lambdas are in fact shorthands for instances of anonymous functional classes. Code in a lambda expression technically does not reside in the enclosing scope, but in a completely different class.

Readability

I hav come to associate streams-based code with poor readability, and that drawback mainly comes from the habit of chaining all stream operations into one, huge expression.

In traditional, non-streams programming, we regard such a chaining of many method calls a code smell, but with streams we happily (?) write expressions occupying lines and lines of code.

Why don't we split these expressions into intellegible parts, giving the intermediate streams meaningful names as local variables, with the benefit of seeing the stream type in the variable declarations? Or group together a few streams operations into a method with a meaningful name? There's nothing forcing us to write streams expressions as single, monolithic blocks.

The reason probably is that most tutorials lead us into that direction, never showing streams expressions broken into manageable pieces.

Debugging

And then there's debugging. Streams-based code is much harder to debug:

The actual execution is triggered by the final collecting step of the operations chain, so e.g. single-stepping through the code is virtually impossible.
Stack traces don't resemble the code structure. There are lots of synthetic method calls between the enclosing method and the business logic inside the stream operations. This not only affects debugging sessions, but also production-time error reporting. The highly-appreciated stack trace logging loses a lot of its value.

Performance

And finally, we have a performance issue. Even with the best optimizers, the overhead of chaining together lots of distinct functional-interface instances has its price. It might not be significant in most situations, but it's there.

Conclusion

While the readability issue can be overcome by adopting a different coding style, there remain enough drawbacks, so streams can't adequately replace traditional for loops.

Jamy Spencer · Accepted Answer · 2023-10-10 17:02:04Z

No streams are not a drop in replacement for loops. Streams, especially when used with parallelStream obviously create multiple threads to process the data on. This is an expensive process that should only be used when needed...and its probably a bigger number than you think. It depends a lot on the type of transform being done in the stream, whether its processing intensive and/or io intensive but more likely than not the collection being operated on needs to be > 1k for it to not be SLOWER to use parallelStream. The limited testing I've done showed that in that use case less than 1000 items were always slower to use parallelStream over loops and it was 10k items before any meaningful performance increase was seen. But its super dependent on the data you have and architecture its deployed on. A huge monolith with 50 cores at your disposal will be different than on a micro ec2 instance.

But lets change focus really to why streams. Processing amounts of data that will possibly not fit in memory. Thats why they were created and thats where they should be used. Have a query that could return a TB of data, you definitely need streams, and have to know how to use them.

There is something to be said for using the stream api in most cases to help future-proof the application when it becomes apparent that the data invloved is so large that parallelizing the processing is needed and for the whole team to become more familiar with the api. But unless there is really at least one senior dev that understands the api and has the time for code reviews and giving helpful guidance it really just gives devs the chance to learn anti-patterns.

ANTI_PATTERNS:

Storing intermediate transforms in non-stream collections(this now stores the whole thing in memory, eliminating the main benefit of allowing processing of large amounts of data that may not fit in memory)
Streaming and especially parallelizing inherently limited datasets. I saw someone throw a single item...and something that would always be a single item in a list and stream, parallelize it...
Its already been mentioned but just for completeness, and I've seen a major bug in prod from this, the results are unsorted, and if you are in the big-data mindset where you should probably be if you are heavily using streams, you need to work around this completely because if you can't load all the data in memory, you can't sort it...at least in the context of doing transforms on data in motion which is what we are talking about with the streams api.

Stack Exchange Network

Is the Java Stream API intended to replace loops?

4 Answers 4

Linked

Hot Network Questions

Is the Java Stream API intended to replace loops?

4 Answers 4

Linked

Related

Hot Network Questions