mapply and by functions in R

In the previous tutorial we looked at the apply group of functions. In this tutorial we look at the mapply and by functions.

mapply
It’s a bit difficult to explain the mapply function in words, so we jump directly into an example and provide a definition later on.

> mapply(function(x,y){x^y},x=c(2,3),y=c(3,4))
[1]  8 81

Requires explanation, doesn’t it? So here’s how it goes: the first argument is the function FUN, which takes two parameters, x and y. The values of x come from the second argument (x=c(2,3)) and the values of y come from the third argument (y=c(3,4)). x and y both have two values, so the function is called twice. The first time it’s called with the first values of x and y (x=2 and y=3, which gives 2^3 = 8). The second time it’s called with the second values of x and y (x=3 and y=4, which gives 3^4 = 81).
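To make the element-by-element pairing concrete, here is a minimal sketch that reproduces the same result with an explicit loop. The names pow, xs, ys and manual are made up for this illustration and are not part of mapply.

```r
# Hypothetical helper and variable names, used only for this sketch
pow <- function(x, y) x^y
xs <- c(2, 3)
ys <- c(3, 4)

# What mapply does conceptually: call pow once per index
manual <- numeric(length(xs))
for (i in seq_along(xs)) {
  manual[i] <- pow(xs[i], ys[i])
}

via_mapply <- mapply(pow, xs, ys)

print(manual)      # 8 81
print(via_mapply)  # 8 81
```

The loop and the mapply call should agree; mapply just saves you from writing the indexing yourself.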

Definition of mapply function

As promised, here is the formal definition: mapply calls a function FUN over several vectors or lists, one index at a time. In other words, the function is first called with the elements at index 1 of all the vectors or lists, then with the elements at index 2, and so on.
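The definition is not limited to two numeric vectors. The sketch below, using made-up variable names, passes three vectors in one call and mixes a list with a vector in another.

```r
# Three vectors: FUN receives one element of each per call
three <- mapply(function(x, y, z) x + y + z, 1:3, 4:6, 7:9)
print(three)   # 12 15 18

# A list and a vector: each list element (itself a vector) is paired
# with the element of the second argument at the same index
mixed <- mapply(function(v, w) sum(v) * w, list(1:3, 4:6), c(10, 100))
print(mixed)   # 60 1500
```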

The arguments x and y are recycled if they are of different lengths. However, their lengths have to be either all zero or all non-zero.

# the values in y are recycled. 
# i.e. for both the values in x the same value (4) of y is used.
> mapply(function(x,y){x^y},x=c(2,3),y=c(4))
[1] 16 81
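Recycling also applies when the shorter argument has more than one element. A small sketch, with lengths chosen so that the longer one is a multiple of the shorter one:

```r
# y = c(2, 3) is recycled to 2, 3, 2, 3 to match the four values of x
recycled <- mapply(function(x, y) x^y, x = 1:4, y = c(2, 3))
print(recycled)   # 1 8 9 64
```

The four calls are 1^2, 2^3, 3^2 and 4^3.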

You can’t do this, though, because c() is NULL, which has length zero:

> mapply(function(x,y){x^y},x=c(2,3,6),y=c())
Error in mapply(function(x, y) { : 
  zero-length inputs cannot be mixed with those of non-zero length

It’s not necessary to specify the argument names

> mapply(function(x,y){x^y},c(2,3),c(3,4))
[1]  8 81

We can give names to each index. If the first vector argument has names, those names are used for the result.

> mapply(function(x,y){x^y},c(a=2,b=3),c(A=3,B=4))
 a  b 
 8 81

unless you specifically ask R not to use names

> mapply(function(x,y){x^y},c(a=2,b=3),c(A=3,B=4),USE.NAMES=FALSE)
[1]  8 81

If the function needs more arguments that remain the same for all iterations of FUN, pass them as a list through the “MoreArgs” argument

> mapply(function(x,y,z,k){(x+k)^(y+z)},c(a=2,b=3),c(A=3,B=4),MoreArgs=list(1,2))
   a    b 
 256 3125 

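The values of z and k are 1 and 2 respectively: the elements of MoreArgs are unnamed here, so they are matched by position to the remaining parameters. So the first evaluation of the function gives (2+2)^(3+1) = 256 and the second gives (3+2)^(4+1) = 3125.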
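Relying on position in MoreArgs is easy to get wrong, so naming the elements is usually safer. A minimal sketch, where f is just a hypothetical name for the same function:

```r
# Same function as above, stored under a hypothetical name
f <- function(x, y, z, k) (x + k)^(y + z)

# Named elements are matched to parameters by name ...
named <- mapply(f, c(a = 2, b = 3), c(A = 3, B = 4),
                MoreArgs = list(z = 1, k = 2))

# ... so their order inside the list no longer matters
swapped <- mapply(f, c(a = 2, b = 3), c(A = 3, B = 4),
                  MoreArgs = list(k = 2, z = 1))

print(named)     # a = 256, b = 3125
print(swapped)   # a = 256, b = 3125
```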

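As with sapply, mapply reduces the result to a vector, matrix or array where possible. This is controlled by the SIMPLIFY argument (note the upper case), which defaults to TRUE; set SIMPLIFY=FALSE to always get a list.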
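A short sketch of the difference, using a made-up function f that returns two values per call:

```r
# Hypothetical function returning a named vector of length 2
f <- function(x, y) c(sum = x + y, product = x * y)

# Default (SIMPLIFY = TRUE): results are bound into a 2 x 3 matrix,
# one column per call
as_matrix <- mapply(f, 1:3, 4:6)
print(as_matrix)

# SIMPLIFY = FALSE: results stay in a list of three vectors
as_list <- mapply(f, 1:3, 4:6, SIMPLIFY = FALSE)
print(as_list)
```

Keeping the list form can be handy when the calls may return results of different lengths.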

by

The by function is similar to tapply, but works on a data frame (or matrix): it splits the rows into groups defined by one or more factors and applies a function to each group. We first create a data frame for this example.

# the data frame df contains two columns a and b
> df=data.frame(a=c(1:15),b=c(1,1,2,2,2,2,3,4,4,4,5,5,6,7,7))

We use the by function to apply sum to the rows of df grouped by the values of b. Note that sum receives the whole subset of the data frame, so each result adds up the values of both a and b in that group. For b=1, for example, it is (1+2) + (1+1) = 5.

> by(df,factor(df$b),sum)

The by function takes three main arguments. The first is the data frame. The second is the factor (or list of factors) that defines the groups; its length should equal the number of rows of the data frame. The third is the function to apply to each group. This is what it produces

factor(df$b): 1
[1] 5
------------------------------------------------------------ 
factor(df$b): 2
[1] 26
------------------------------------------------------------ 
factor(df$b): 3
[1] 10
------------------------------------------------------------ 
factor(df$b): 4
[1] 39
------------------------------------------------------------ 
factor(df$b): 5
[1] 33
------------------------------------------------------------ 
factor(df$b): 6
[1] 19
------------------------------------------------------------ 
factor(df$b): 7
[1] 43
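If you want the sum of column a only, rather than of every column in the group, one option is to pick the column inside the function. The sketch below also shows tapply as an alternative for the single-column case; sums_a and via_tapply are names made up for this example.

```r
df <- data.frame(a = 1:15,
                 b = c(1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7, 7))

# The function receives each group's rows as a data frame;
# sum only its column a
sums_a <- by(df, df$b, function(d) sum(d$a))
print(sums_a)

# tapply gives the same per-group sums for a single vector
via_tapply <- tapply(df$a, df$b, sum)
print(via_tapply)   # 3 18 7 27 23 13 29 for b = 1..7
```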

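The function also works when the data frame has more columns. Here sum adds up every column in each group (a, k and b), so for b=1 the result is (1+2) + (1+2) + (1+1) = 8.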

> df=data.frame(a=c(1:15),k=c(1:15),b=c(1,1,2,2,2,2,3,4,4,4,5,5,6,7,7))
> by(df,factor(df$b),sum)
factor(df$b): 1
[1] 8
------------------------------------------------------------ 
factor(df$b): 2
[1] 44
------------------------------------------------------------ 
..... [truncated]
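With several columns, a per-column total is often more useful than one grand total per group. A minimal sketch, assuming you want the sums of a and k separately; col_sums and sums_matrix are hypothetical names.

```r
df <- data.frame(a = 1:15, k = 1:15,
                 b = c(1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7, 7))

# Pass only the columns to be summed; colSums returns one total per column
col_sums <- by(df[, c("a", "k")], df$b, colSums)

# Stack the per-group results into a matrix: one row per value of b
sums_matrix <- do.call(rbind, col_sums)
print(sums_matrix)
```

Each row of sums_matrix corresponds to one value of b, with the totals of a and k in separate columns.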
