Beware Of findFirst() And findAny()

After filtering a Java 8 Stream it is common to use findFirst() or findAny() to get the element that survived the filter. But that might not do what you really meant and subtle bugs can ensue.

So What’s Wrong With findFirst() And findAny()?

As we can see from their Javadoc, findAny() returns an arbitrary element from the stream. So does findFirst(), unless the stream has an encounter order, in which case it returns the first element. Easy.

A simple example looks like this:
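A minimal sketch of such a lookup. The Customer class, its accessors, and the names FindFirstLookup and findCustomer are assumptions made for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical domain class, used only for illustration.
class Customer {
    private final String id;
    private final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

class FindFirstLookup {

    // Returns the first customer (in encounter order) with the given ID, if any.
    static Optional<Customer> findCustomer(List<Customer> customers, String customerId) {
        return customers.stream()
                .filter(customer -> customer.getId().equals(customerId))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("42", "Alice"),
                new Customer("42", "Bob")); // duplicate ID
        // prints "Alice"; the second customer with ID 42 is silently ignored
        findCustomer(customers, "42")
                .map(Customer::getName)
                .ifPresent(System.out::println);
    }
}
```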

Of course this is just the fancy version of the good old for-each-loop:
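The loop variant might look roughly like this. Again, Customer, LoopLookup, and findCustomer are hypothetical names for this sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical domain class, used only for illustration.
class Customer {
    private final String id;
    private final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

class LoopLookup {

    // Returns the first customer with the given ID and stops looking.
    static Optional<Customer> findCustomer(List<Customer> customers, String customerId) {
        for (Customer customer : customers) {
            if (customer.getId().equals(customerId)) {
                return Optional.of(customer); // later matches are never inspected
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("1", "Alice"),
                new Customer("2", "Bob"));
        // prints "Bob"
        findCustomer(customers, "2")
                .map(Customer::getName)
                .ifPresent(System.out::println);
    }
}
```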

But both variants contain the same potential bug: they are built on the implicit assumption that there can only be one customer with any given ID.

Now, this might be a very reasonable assumption. Maybe this is a known invariant, guarded by dedicated parts of the system, relied upon by others. In that case this is totally fine.

But in many cases I see out in the wild, it is not. Maybe the customers were just loaded from an external source that makes no guarantees about the uniqueness of their IDs. Maybe an existing bug allowed two books with the same ISBN. Maybe the search term allows surprisingly many unforeseen matches (did anyone say regular expressions?).

Often the code’s correctness relies on the assumption that there is a unique element matching the criteria but it does nothing to enforce or assert this.

Worse, the misbehavior is entirely data-driven, which might hide it during testing. Unless we have this scenario in mind, we might simply overlook it until it manifests in production.

Even worse, it fails silently! If the assumption that there is only one such element proves to be wrong, we won’t notice this directly. Instead the system will misbehave subtly for a while before the effects are observed and the cause can be identified.

So of course there is nothing inherently wrong with findFirst() and findAny(). But it is easy to use them in a way that leads to bugs within the modeled domain logic.

Failing Fast

So let’s fix this! Say we’re pretty sure that there’s at most one matching element and we would like the code to fail fast if there is more than one. With a loop we have to manage some ugly state and it would look as follows:
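A sketch of such a loop, assuming the same hypothetical Customer class as before. IllegalStateException is a stand-in here; a domain-specific exception would work just as well:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical domain class, used only for illustration.
class Customer {
    private final String id;
    private final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

class FailFastLoopLookup {

    // Returns the only customer with the given ID; throws if there is a second one.
    static Optional<Customer> findOnlyCustomer(List<Customer> customers, String customerId) {
        boolean foundCustomer = false;
        Customer resultCustomer = null;
        for (Customer customer : customers) {
            if (customer.getId().equals(customerId)) {
                if (!foundCustomer) {
                    foundCustomer = true;
                    resultCustomer = customer;
                } else {
                    // assumption: any unchecked exception would do here
                    throw new IllegalStateException(
                            "Multiple customers with ID " + customerId);
                }
            }
        }
        return foundCustomer ? Optional.of(resultCustomer) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("42", "Alice"),
                new Customer("42", "Bob"));
        try {
            findOnlyCustomer(customers, "42");
        } catch (IllegalStateException ex) {
            System.out.println("Failed fast: " + ex.getMessage());
        }
    }
}
```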

Now, streams give us a much nicer way. We can use the often neglected reduce, about which the documentation says:

Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any. This is equivalent to:

but is not constrained to execute sequentially.
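Written out as a compilable method, the sequential loop that the documentation describes looks roughly like this. The names SequentialReduce and reduce are made up for this sketch, and an Iterable stands in for the stream:

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.BinaryOperator;

class SequentialReduce {

    // Sequential equivalent of Stream.reduce(BinaryOperator), applied to an Iterable.
    static <T> Optional<T> reduce(Iterable<T> elements, BinaryOperator<T> accumulator) {
        boolean foundAny = false;
        T result = null;
        for (T element : elements) {
            if (!foundAny) {
                // the first element becomes the initial result
                foundAny = true;
                result = element;
            } else {
                // the accumulator is only called from the second element on
                result = accumulator.apply(result, element);
            }
        }
        return foundAny ? Optional.of(result) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints "Optional[6]"
        System.out.println(reduce(Arrays.asList(1, 2, 3), Integer::sum));
    }
}
```

Note that the accumulator is never invoked for a stream with zero or one element, which is exactly the property we can exploit.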

Doesn’t that look similar to our loop above?! Crazy coincidence…

So all we need is an accumulator that throws the desired exception as soon as it is called:
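A sketch of what that could look like, with the usual hypothetical Customer class and IllegalStateException standing in for whatever exception fits the domain:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical domain class, used only for illustration.
class Customer {
    private final String id;
    private final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

class ThrowingAccumulatorLookup {

    static Optional<Customer> findOnlyCustomer(List<Customer> customers, String customerId) {
        return customers.stream()
                .filter(customer -> customer.getId().equals(customerId))
                // reduce only calls the accumulator once a second element shows up,
                // so throwing here means "more than one match"
                .reduce((element, otherElement) -> {
                    throw new IllegalStateException(
                            "Multiple customers with ID " + customerId);
                });
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("1", "Alice"),
                new Customer("2", "Bob"));
        // prints "Alice"
        findOnlyCustomer(customers, "1")
                .map(Customer::getName)
                .ifPresent(System.out::println);
    }
}
```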

This looks a little strange but it does what we want. To make it more readable, we should put it into a Stream utility class and give it a nice name:
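One way such a utility might look. The class name StreamUtils and the choice of IllegalStateException are assumptions of this sketch; toOnlyElement is the name used later in this post:

```java
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

// Hypothetical utility class; the name is made up for this sketch.
class StreamUtils {

    private StreamUtils() {
        // no instances
    }

    // An accumulator for Stream.reduce that throws as soon as it is called,
    // which only happens if the stream contains a second element.
    static <T> BinaryOperator<T> toOnlyElement() {
        return (element, otherElement) -> {
            throw new IllegalStateException(
                    "Expected at most one element but found several.");
        };
    }

    public static void main(String[] args) {
        // prints "Optional[a]"
        System.out.println(Stream.of("a").reduce(toOnlyElement()));
    }
}
```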

Now we can call it as follows:
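A sketch of the call site. So that the example compiles on its own, it repeats a hypothetical StreamUtils.toOnlyElement() helper and the illustrative Customer class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical domain class, used only for illustration.
class Customer {
    private final String id;
    private final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

// Hypothetical utility class holding the throwing accumulator.
class StreamUtils {
    static <T> BinaryOperator<T> toOnlyElement() {
        return (element, otherElement) -> {
            throw new IllegalStateException("Expected at most one element.");
        };
    }
}

class OnlyElementLookup {

    static Optional<Customer> findOnlyCustomer(List<Customer> customers, String customerId) {
        return customers.stream()
                .filter(customer -> customer.getId().equals(customerId))
                .reduce(StreamUtils.toOnlyElement());
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("1", "Alice"),
                new Customer("2", "Bob"));
        // prints "Bob"
        findOnlyCustomer(customers, "2")
                .map(Customer::getName)
                .ifPresent(System.out::println);
    }
}
```

With a static import of toOnlyElement, the last step of the pipeline reads simply as reduce(toOnlyElement()).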

How is that for intention revealing code?

It should be noted that, unlike findFirst() and findAny(), this is not a short-circuiting operation. If there is indeed at most one matching element, the entire stream is traversed to confirm it. Processing stops early only when a second matching element is encountered, because that is when the accumulator throws.

Context

There is actually a question on StackOverflow about this precise use case and it has a lot of interesting answers. The alternative I like the most is a collector that throws when it encounters a second element. With it the code would look as follows:
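The following is not the code from any particular StackOverflow answer, only a sketch of how such a collector could be built with Collector.of. The names OnlyElementCollector and onlyElement are hypothetical, and like Optional itself the sketch does not support null elements:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Stream;

class OnlyElementCollector {

    // Collects at most one element into an Optional and throws on a second one.
    static <T> Collector<T, ?, Optional<T>> onlyElement() {
        return Collector.<T, List<T>, Optional<T>>of(
                ArrayList::new,
                (list, element) -> {
                    if (!list.isEmpty()) {
                        throw new IllegalStateException("Found a second element: " + element);
                    }
                    list.add(element);
                },
                (left, right) -> {
                    // only relevant for parallel streams: two partial results
                    if (!left.isEmpty() && !right.isEmpty()) {
                        throw new IllegalStateException("Found more than one element.");
                    }
                    return left.isEmpty() ? right : left;
                },
                list -> list.isEmpty() ? Optional.empty() : Optional.of(list.get(0)));
    }

    public static void main(String[] args) {
        // prints "Optional[a]"
        System.out.println(Stream.of("a").collect(onlyElement()));
    }
}
```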

It is also worth noting that Guava has similar functionality for iterables, namely Iterables::getOnlyElement, which returns the only element from the specified Iterable. It behaves differently for an empty iterable, though, where it throws a NoSuchElementException instead of returning an empty result. (.NET’s Enumerable::Single also throws on an empty sequence, by the way.)

Reflection

We have seen how findFirst() and findAny() do not suffice to express the assumption that there is at most one element left in the stream. If we want to express that assumption and make sure the code fails fast if it is violated, we need to reduce(toOnlyElement()).

You can find the code on GitHub and use it as you like – it is in the public domain.

Thanks to Boris Terzic for making me aware of this intention mismatch in the first place.
