Java 8 – Streams

Introduction

In the first article we learned about lambdas, functional interfaces and method references introduced in Java 8. In the previous article we saw some of the new methods added to the Collections hierarchy. In this article we look at what is probably the most important addition to Java since generics: Streams. Streams make working with collections easier and make parallel programming on collections ridiculously easy.

What are streams?

Let's start with an example. We create a list of genres and then count the genre names that start with an ‘r’. A single line of code (line 2) does that. We will explain how it works shortly, but this gives an idea of the power of Streams.

List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae"));
long a = genre.stream().filter(s -> s.startsWith("r")).count();
System.out.println(a);

So what are streams?
A stream is a sequence of objects or primitive values. Operations can be performed on its elements sequentially or in parallel. Let us look at the lifecycle of a stream.
[Figure: Java 8 Streams]
A stream is created from a source, which can be an array, a collection, an I/O channel, etc. Once you create a stream you can perform aggregate operations on it. We will look at all the major functions below. In our example above we performed a filter operation on the stream. The last step in the lifecycle is called a terminal operation. This step produces a result or a side effect. The stream can no longer be used after the terminal operation (hence the name). The terminal operation can be, for example, a count operation or a collect operation. Certain operations are called short-circuit terminal operations; the anyMatch example below explains them.

Remember: Streams have a lifecycle that consists of creation, intermediate operations and a terminal operation.

Streams can be created from Collections using the default Stream<E> stream() method in the Collection interface. For arrays use the Arrays.stream(T[] array) method.
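As a minimal sketch of stream creation, the snippet below builds a stream from a List and from an array, and, as a third option not mentioned above, from literal values with Stream.of. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamCreation {

    // Stream from a Collection: Collection.stream()
    static long countFromList(List<String> genre) {
        return genre.stream().count();
    }

    // Stream from an array: Arrays.stream(T[])
    static long countFromArray(String[] genre) {
        return Arrays.stream(genre).count();
    }

    // Stream from individual values: Stream.of(T...)
    static long countFromValues() {
        return Stream.of("rock", "pop", "jazz", "reggae").count();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("rock", "pop", "jazz", "reggae");
        String[] array = {"rock", "pop", "jazz"};
        System.out.println(countFromList(list));   // 4
        System.out.println(countFromArray(array)); // 3
        System.out.println(countFromValues());     // 4
    }
}
```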

Stream Operations

In this section we look at the common stream operations. For the examples we use a List called genre, which stores music genres:

List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae"));

The examples below also explain the concepts of short-circuiting, non-interference, statelessness, reduction, etc. Here are the operations:

allMatch: returns true if all values in the stream satisfy the passed lambda expression (predicate).

genre.stream().allMatch(s -> !s.isEmpty())

This returns true since none of the genre strings is empty. This is a short-circuiting terminal operation. Look at the anyMatch example to understand short-circuit operations.

anyMatch: returns true if any value in the stream satisfies the passed lambda expression (predicate). Not all elements may need to be examined.

genre.stream().anyMatch(s -> s.indexOf("r") == 0)

This returns true since the first element begins with ‘r’ (even though not all of them begin with ‘r’). This too is a short-circuiting terminal operation. To understand what that means, let's write an example.

System.out.println(genre.stream().peek(s -> System.out.println(s)).anyMatch(s -> s.indexOf("r") == 0));
System.out.println(genre.stream().peek(s -> System.out.println(s)).count());

We have used the peek operation here. For each element it performs the action specified by the lambda expression (a Consumer); in this case it just prints the genre. On Java 8, line 2 does what we expect, i.e. it prints “rock”, “pop”, “jazz”, “reggae” on separate lines, followed by the count. (From Java 9 onwards, count() may skip the pipeline, including peek, when it can compute the size directly from the source.) But line 1 prints only “rock” and “true”. What has happened is that anyMatch found a match in the first element (rock starts with r) and terminated the operation. The intermediate operation (peek) had its stream terminated too. To put it in words: ‘a short-circuit operation makes only those parts of the stream available to its predecessor operations that it needs to process’. In our example anyMatch needs to process only one element, and so only one element is available to the peek operation.
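The short-circuit behaviour can also be checked without reading console output. The sketch below records what peek sees into a list, once with anyMatch and once with forEach, which consumes the whole stream. ShortCircuitDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShortCircuitDemo {

    // Returns the elements that peek saw before anyMatch stopped the traversal.
    static List<String> seenByAnyMatch(List<String> genre) {
        List<String> seen = new ArrayList<>();
        genre.stream()
             .peek(seen::add)
             .anyMatch(s -> s.startsWith("r"));
        return seen;
    }

    // Returns the elements that peek saw when forEach consumed the whole stream.
    static List<String> seenByForEach(List<String> genre) {
        List<String> seen = new ArrayList<>();
        genre.stream()
             .peek(seen::add)
             .forEach(s -> { });
        return seen;
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(seenByAnyMatch(genre)); // [rock]
        System.out.println(seenByForEach(genre));  // [rock, pop, jazz, reggae]
    }
}
```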

distinct: returns a stream that contains only the distinct elements. We modify genre to contain duplicates:

List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae", "pop"));

System.out.println(genre.stream().distinct().count()); // prints 4

This is a stateful intermediate operation, which means that the operation maintains state while moving through the stream. In our example the operation needs to know that “pop” has appeared in the stream before.
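To see which elements survive rather than just how many, here is a small sketch that collects the distinct genres into a list. It uses Collectors.toList(), a ready-made collector from java.util.stream.Collectors; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctDemo {

    // Drops repeated elements; for an ordered source the first occurrence is kept.
    static List<String> uniqueGenres(List<String> genre) {
        return genre.stream()
                    .distinct()
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae", "pop");
        System.out.println(uniqueGenres(genre)); // [rock, pop, jazz, reggae]
    }
}
```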

count: returns the number of elements in the stream. This is a specialized reduction operation and a terminal operation that produces a result without any side effect.

filter: this is an intermediate operation that keeps those elements in the stream that match the predicate.

System.out.println(genre.stream().filter(s -> s.length() <= 4).count());

All genres except reggae pass the filter (including the duplicate “pop” added above) and hence this prints 4. The operation is stateless in the sense that processing an element does not depend on any previously seen element. It is also ‘non-interfering’ since it does not modify the original data source (the list genre does not change).
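A short sketch of the non-interference point: the filtered result is a new list, and the source list keeps all of its elements. The helper name shortNames and its maxLength parameter are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterDemo {

    // Keeps only names of at most maxLength characters; the source list is not touched.
    static List<String> shortNames(List<String> genre, int maxLength) {
        return genre.stream()
                    .filter(s -> s.length() <= maxLength)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae", "pop");
        System.out.println(shortNames(genre, 4)); // [rock, pop, jazz, pop]
        System.out.println(genre);                // [rock, pop, jazz, reggae, pop]
    }
}
```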

forEach: calls the Consumer lambda expression for each element of the stream. The order in which elements are visited is not guaranteed, since this operation also works on a parallel stream. The elements are not modified.

genre.stream().forEach(System.out::println);
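If the visiting order matters, the Stream API also offers forEachOrdered, which respects the encounter order even on a parallel stream. The sketch below is only an illustration of that variant; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForEachOrderDemo {

    // forEachOrdered processes one element at a time, in encounter order,
    // so appending to a plain ArrayList is safe here even on a parallel stream.
    static List<String> visitInOrder(List<String> genre) {
        List<String> visited = new ArrayList<>();
        genre.parallelStream().forEachOrdered(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(visitInOrder(genre)); // [rock, pop, jazz, reggae]
    }
}
```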

Optional<String> combinedgenre = genre.stream().reduce((b, c) -> b.concat(",").concat(c));

reduce: combinedgenre contains a comma-separated list of the genre strings (wrapped in an Optional, since the stream could be empty). This is a reduction operation since it reduces all the elements to a single summary result by applying the lambda expression repeatedly. If the function is associative, the reduction works correctly on a parallel stream too. An associative function obeys this:

f(a, f(b, c)) = f(f(a, b), c)
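To make the associativity condition concrete, the sketch below checks it for one triple of values: the comma-joining function from the example passes, while subtraction does not. The helper holdsFor is hypothetical and only tests a single triple; it is not a proof of associativity.

```java
import java.util.function.BinaryOperator;

class AssociativityCheck {

    // Checks f(a, f(b, c)) == f(f(a, b), c) for one particular triple of values.
    static <T> boolean holdsFor(BinaryOperator<T> f, T a, T b, T c) {
        T left = f.apply(a, f.apply(b, c));
        T right = f.apply(f.apply(a, b), c);
        return left.equals(right);
    }

    public static void main(String[] args) {
        BinaryOperator<String> join = (b, c) -> b.concat(",").concat(c);
        BinaryOperator<Integer> minus = (b, c) -> b - c;

        System.out.println(holdsFor(join, "rock", "pop", "jazz")); // true
        System.out.println(holdsFor(minus, 10, 3, 2));             // false: 9 vs 5
    }
}
```

A function like minus should not be passed to reduce on a parallel stream, because the way the elements are grouped would change the result.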

int d = genre.stream().reduce(0, (b, c) -> b + c.length(), (b, c) -> b + c);

In this example we calculate the total length of all words in the genre List. This is the generalised form of the reduce function. It takes three arguments. The first argument is the identity value for the combiner; think of it as the initial value. The second, the accumulator, folds each element into the running result (as in the previous example). The third, the combiner, is best understood in the context of parallel operation: it combines the partial results from the parallel substreams.
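Here is the same reduction as a sketch on a parallel stream, where the combiner actually gets used, with each argument labelled. The second method shows an alternative using mapToInt and sum, which is not from the example above but gives the same total. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class TotalLength {

    // Three-argument reduce on a parallel stream.
    static int totalLength(List<String> genre) {
        return genre.parallelStream().reduce(
                0,                                  // identity: starting value of every partial sum
                (sum, word) -> sum + word.length(), // accumulator: folds one element into a partial sum
                (left, right) -> left + right);     // combiner: merges two partial sums
    }

    // Alternative: map to a primitive IntStream and use its built-in sum().
    static int totalLengthWithMapToInt(List<String> genre) {
        return genre.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(totalLength(genre));             // 17
        System.out.println(totalLengthWithMapToInt(genre)); // 17
    }
}
```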

HashSet<String> genreSet = genre.stream().collect(() -> new HashSet<String>(), (b, c) -> b.add(c), (b, c) -> b.addAll(c));
						

collect: the collect function puts the elements of a stream into a mutable container such as a HashSet. In this example we create a stream from the genre ArrayList and then collect the result in a HashSet. This is a mutable reduction operation since the results are accumulated in a mutable container. In the earlier reduce operation the result was a String, which is immutable. The collect function takes three arguments. The first function (the supplier) creates the container, in our case the HashSet. The second function (the accumulator) adds an element to the container. The last function (the combiner) merges two containers produced by parallel substreams.
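For common containers you rarely need to spell out the three functions yourself: java.util.stream.Collectors provides ready-made collectors. The sketch below uses Collectors.toSet() and Collectors.joining(); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CollectorsDemo {

    // Collects the elements into a Set, which drops duplicates.
    static Set<String> toSet(List<String> genre) {
        return genre.stream().collect(Collectors.toSet());
    }

    // Joins the elements into one comma-separated String.
    static String joined(List<String> genre) {
        return genre.stream().collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae", "pop");
        System.out.println(toSet(genre).size()); // 4
        System.out.println(joined(genre));       // rock,pop,jazz,reggae,pop
    }
}
```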

map: the map method applies the function to each element of the stream. The output is a stream of the transformed elements.

genre.stream().map(String::toUpperCase).forEach(System.out::println);

In the above example we convert all strings in the genre list to uppercase.
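The mapping function may also change the element type. As a small sketch, the method below maps each genre name to its length, turning a stream of String into a stream of Integer. MapDemo and lengths are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapDemo {

    // Maps every String to its length, so the result is a List<Integer>.
    static List<Integer> lengths(List<String> genre) {
        return genre.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(lengths(genre)); // [4, 3, 4, 6]
    }
}
```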

flatMap: a flatMap takes a stream, applies a function that converts each element to another stream, and then merges the substreams (one for each element of the main stream) into a single stream. It is more easily demonstrated by an example. Let's say that we have a map that gives us the list of artists for each genre, and we want to create a stream that merges all the artists from all the genres into a single stream.

Map<String, List<String>> artists = new HashMap<String, List<String>>();
artists.put("rock", new ArrayList<String>(Arrays.asList("rockArtistA", "rockArtistB")));
artists.put("pop", new ArrayList<String>(Arrays.asList("popArtistA", "popArtistB")));
artists.put("jazz", new ArrayList<String>(Arrays.asList("jazzArtistA", "jazzArtistB")));
artists.put("reggae", new ArrayList<String>(Arrays.asList("reggaeArtistA", "reggaerockArtistB")));
genre.stream().flatMap(s -> artists.get(s).stream()).forEach(s -> System.out.print(" " + s));
// prints rockArtistA rockArtistB popArtistA popArtistB jazzArtistA jazzArtistB
//reggaeArtistA reggaerockArtistB popArtistA popArtistB
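A variation on the same idea, as a sketch: collect the flattened stream into a list instead of printing it, and use Map.getOrDefault so that a genre missing from the map contributes nothing instead of throwing a NullPointerException. The class name, the method name and the "blues" entry are assumptions for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMapDemo {

    // Looks up the artists of every genre and flattens them into one list.
    static List<String> artistsFor(List<String> genre, Map<String, List<String>> artists) {
        return genre.stream()
                    .flatMap(g -> artists.getOrDefault(g, Collections.emptyList()).stream())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> artists = new HashMap<>();
        artists.put("rock", Arrays.asList("rockArtistA", "rockArtistB"));
        artists.put("pop", Arrays.asList("popArtistA"));

        // "blues" has no entry in the map and is simply skipped.
        List<String> genre = Arrays.asList("rock", "pop", "blues");
        System.out.println(artistsFor(genre, artists)); // [rockArtistA, rockArtistB, popArtistA]
    }
}
```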
						

Before we wrap up, let's formally introduce the concept of lazy evaluation.

Lazy evaluation

Streams are lazily evaluated. The intermediate operations that you saw are not evaluated until a terminal operation is called. So let's say that you have a chain with multiple intermediate operations. If you create a stream, call some intermediate operations on it, and call the terminal operation somewhere later in the code, then nothing happens until that terminal operation is called.

Remember: The intermediate operations are not evaluated until a terminal operation is called.
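Laziness is easy to observe. In the sketch below, peek records every element it sees; the record stays empty after the pipeline is built and fills up only once the terminal operation runs. LazyDemo and seenBeforeAndAfter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {

    // Returns how many elements peek had seen {before, after} the terminal operation ran.
    static int[] seenBeforeAndAfter(List<String> genre) {
        List<String> seen = new ArrayList<>();

        // Building the pipeline does not process any element yet.
        Stream<String> pipeline = genre.stream()
                                       .peek(seen::add)
                                       .filter(s -> s.startsWith("r"));
        int before = seen.size();

        // The terminal operation pulls the elements through peek and filter.
        pipeline.collect(Collectors.toList());
        int after = seen.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] result = seenBeforeAndAfter(Arrays.asList("rock", "pop", "jazz", "reggae"));
        System.out.println(result[0] + " " + result[1]); // 0 4
    }
}
```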

This concludes our introduction to streams. We have covered the major functions here; there are many more, but most of them are variations of the above. Most of them can be performed in parallel as well: instead of calling stream() on the collection, call parallelStream(). The power of streams lies in making parallel programming trivial. However, note that parallel programming introduces overhead and can cause synchronization problems when operations touch shared mutable state. Use it only if you are sure about it.
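As a closing sketch of a parallel pipeline that avoids shared mutable state, the method below upper-cases the genres on a parallel stream and lets collect assemble the result. The class and method names are hypothetical, and for a list this small a sequential stream would most likely be faster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelDemo {

    // collect builds the result without any shared mutable state
    // and keeps the encounter order of the source list.
    static List<String> upperCaseInParallel(List<String> genre) {
        return genre.parallelStream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    // By contrast, adding to a shared ArrayList from parallelStream().forEach(...)
    // would be a data race, because ArrayList is not thread-safe.

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(upperCaseInParallel(genre)); // [ROCK, POP, JAZZ, REGGAE]
    }
}
```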
