Basic Data Structures in R

The basic data structures in R can be divided into two types.

Types of data structures

Homogeneous
All the elements in the data structure are of the same type (character, numeric, logical, etc.).
Heterogeneous
The elements in the data structure can be of mixed types.
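The difference can be seen in a minimal sketch. The variable names v and l are illustrative and not part of the original text.

```r
# An atomic vector is homogeneous: mixing types forces one common type.
v <- c(1, "two", TRUE)
typeof(v)            # "character"

# A list is heterogeneous: each element keeps its own type.
l <- list(1, "two", TRUE)
sapply(l, typeof)    # "double" "character" "logical"
```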

Basic Data structures in R

Here are the basic data structures. We will go through each of them in detail in subsequent tutorials.

Data structure   Type            Dimensions
Vector           Homogeneous     1
List             Heterogeneous   1
Matrix           Homogeneous     2
Array            Homogeneous     any
Data frame       Heterogeneous   2
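As a rough sketch, each structure in the table can be created as follows. The object names and values are made up for illustration. Note that dim() returns NULL for plain vectors and lists, even though they are conceptually one-dimensional.

```r
v  <- c(1, 2, 3)                                  # vector
l  <- list(1, "a", TRUE)                          # list
m  <- matrix(1:6, nrow = 2)                       # matrix: 2 rows x 3 columns
ar <- array(1:24, dim = c(2, 3, 4))               # array with 3 dimensions
df <- data.frame(id = 1:2, name = c("x", "y"))    # data frame: 2 rows x 2 columns

dim(v)    # NULL
dim(m)    # 2 3
dim(ar)   # 2 3 4
dim(df)   # 2 2
```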

Note that there are no scalars in R. A scalar can be thought of as a vector of length 1.
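A short check illustrates this. The variable x is only an example name.

```r
x <- 5
length(x)      # 1
is.vector(x)   # TRUE
x[1]           # 5: a "scalar" can be indexed like any other vector
```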

Let's look at these data structures and understand how they differ.

Vector (atomic)

An atomic vector is the simplest data structure and can be thought of as an ordered sequence of elements. All elements in the vector have to be of the same type (homogeneous). The type can be logical (boolean), integer, numeric (double), complex, character or raw. When a vector is created with vector(), logical elements are initialized to FALSE, integer and numeric elements to 0, complex elements to 0+0i, character elements to the empty string "", and raw elements to nul bytes (00). Like lists, atomic vectors can have names and other attributes. Let's look at some examples.

> vector("logical",2)
[1] FALSE FALSE
> vector("integer",2)
[1] 0 0
> vector("numeric",2)
[1] 0 0
> vector("double",2)
[1] 0 0
> vector("character",2)
[1] "" ""
> vector("complex",2)
[1] 0+0i 0+0i
> vector("raw",2)
[1] 00 00

An easy way to create a vector is to use the c() (combine) function. When elements of different types are combined, c() coerces all of them to a single type based on this hierarchy: NULL < raw < logical < integer < double < complex < character

> a=c(1,2)
> is.atomic(a)
[1] TRUE
> typeof(a)
[1] "double"

> a=c("1",2)
> typeof(a)
[1] "character"
> a
[1] "1" "2"
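A few more combinations, sketched here with illustrative variable names, show the coercion hierarchy at work.

```r
t1 <- typeof(c(TRUE, 2L))       # "integer":   logical is promoted to integer
t2 <- typeof(c(2L, 3.5))        # "double":    integer is promoted to double
t3 <- typeof(c(3.5, 1+2i))      # "complex":   double is promoted to complex
t4 <- typeof(c(TRUE, "a"))      # "character": everything can become character

# TRUE and FALSE become 1 and 0 when promoted to a numeric type
mixed <- c(TRUE, FALSE, 2L)     # 1 0 2
```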

Use the L suffix to create integers instead of doubles.

> typeof(c(1L,2L))
[1] "integer"
> typeof(c(1,2))
[1] "double"

To test whether a vector contains elements of a particular type, use the is.xxx() functions. The major functions are
is.character(), is.logical(), is.numeric(), is.integer(),
is.complex(), is.raw(), is.double() and the more general is.atomic(). Note that is.numeric() returns TRUE for both integer and double vectors.

> is.character(vector("character",2))
[1] TRUE
> is.atomic(vector("character",2))
[1] TRUE
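The following sketch contrasts the general tests with the type-specific ones. The names i and d are illustrative.

```r
i <- c(1L, 2L)   # integer vector
d <- c(1, 2)     # double vector

is.numeric(i)    # TRUE
is.numeric(d)    # TRUE
is.integer(d)    # FALSE: d is stored as double
is.double(i)     # FALSE: i is stored as integer

is.atomic(list(1, 2))   # FALSE: a list is a vector, but not an atomic one
```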

The typeof() function that we saw earlier determines the internal storage type of an object. You can explicitly convert the elements of a vector from one type to another using the as.xxx() functions.

> a=c(1,2)
> typeof(a)
[1] "double"
> b=as.character(a)
> typeof(b)
[1] "character"
> b
[1] "1" "2"
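Conversions can also go in the other direction, and a value that cannot be converted becomes NA. The sketch below uses made-up inputs; suppressWarnings() is used only to silence the coercion warning R would otherwise print.

```r
ints  <- as.integer(c("1", "2"))      # 1 2
flags <- as.logical(c(0, 1, 5))       # FALSE TRUE TRUE: any non-zero number is TRUE
nums  <- as.numeric(c(TRUE, FALSE))   # 1 0

# "abc" has no integer representation, so the result is NA (with a warning)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                            # TRUE
```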

Let's look at ways to subset or retrieve elements from an atomic vector.

[ vs [[

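[ ] can be used to retrieve a subset of the vector. It returns another vector; even if you retrieve a single element, the result is a vector of length 1. [[ ]] can be used to retrieve a single element. It cannot return more than one element. In the example below, the index 1-3 is evaluated as -2, which excludes the second element; [ ] accepts this, while [[ ]] fails (the exact error text may differ between R versions).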

> a=c(1,2,3)
> a[1]
[1] 1
> a[[1]]
[1] 1
> a[1-3]
[1] 1 3
> a[[1-3]]
Error in a[[1 - 3]] : attempt to select more than one element
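With a named vector the difference between the two operators becomes visible: [ ] keeps the name of the selected element, while [[ ]] drops it. This is a small sketch with an illustrative vector x.

```r
x <- c(first = 10, second = 20, third = 30)

one  <- x["second"]     # vector of length 1, name kept
bare <- x[["second"]]   # the single value, name dropped

names(one)    # "second"
names(bare)   # NULL

both <- x[c("first", "third")]   # [ ] can select several elements at once
```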

Subset using integer

In the previous example we saw how to subset using an integer. Let's look at some more examples. We create a character vector a. We can retrieve any element using an integer index.

> a=c('a','b','c','d','e')
> a[3]
[1] "c"
We can retrieve a subset by specifying multiple indexes
> a[c(1,4)]
[1] "a" "d"
We can subset using a range
> a[1:3]
[1] "a" "b" "c"
We can give it a negative index to exclude that element. This excludes the third element:
> a[-3]
[1] "a" "b" "d" "e"
You can't mix negative and positive numbers in the same index
> a[c(-2,1)]
Error in a[c(-2, 1)] : only 0's may be mixed with negative subscripts
You can also give it a mathematical expression. A non-integer result is truncated towards zero, so 4/3 is treated as 1
> a[4/2]
[1] "b"
> a[4/3]
[1] "a"
A missing value in the index produces a missing value in the output
> a[c(1,NA)]
[1] "a" NA 	
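A few further cases of integer subsetting are sketched below, using the same vector a. These are additions for illustration and not from the original examples.

```r
a <- c("a", "b", "c", "d", "e")

zero    <- a[0]             # character(0): a zero index selects nothing
beyond  <- a[10]            # NA: an index past the end gives a missing value
skipped <- a[c(0, 2)]       # "b": zeros are ignored when mixed with other indexes
repeats <- a[c(2, 2, 1)]    # "b" "b" "a": indexes can repeat and come in any order
dropped <- a[-c(1, 2)]      # "c" "d" "e": several elements excluded at once
```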

Subset using boolean

We can also use logical (boolean) values to subset. Let's look at some examples.

> a[c(TRUE,FALSE,TRUE,TRUE,FALSE)]
[1] "a" "c" "d"
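This returns the elements at the TRUE positions.
You can also use the shortcuts T and F. These are ordinary variables that can be reassigned, so TRUE and FALSE are safer in scripts.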
> a[c(TRUE,FALSE,TRUE,T,F)]
[1] "a" "c" "d"
Logical index vectors are recycled when they are shorter than the vector being subset
> a[c(T,F)]
[1] "a" "c" "e"
Here c(T,F) is recycled to T,F,T,F,T.
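In practice the logical index usually comes from a comparison rather than being typed by hand. The following is a minimal sketch with a made-up numeric vector x.

```r
x <- c(5, 12, 7, 20, 3)

big <- x > 6                   # FALSE TRUE TRUE TRUE FALSE
kept <- x[big]                 # 12 7 20
positions <- which(big)        # 2 3 4: integer positions of the TRUE values
middle <- x[x > 6 & x < 15]    # 12 7: conditions can be combined with &
```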

List

A list allows elements of different types. (Note, however, that a list is itself a kind of vector, often called a generic vector.) The easiest way to create a list is to use the list() function.

> a=list(1,TRUE,T,'a',2.3)
> str(a)
List of 5
 $ : num 1
 $ : logi TRUE
 $ : logi TRUE
 $ : chr "a"
 $ : num 2.3

Some things to note:

  • we have created a list containing elements of different types.
  • logical elements can be represented as TRUE as well as T
  • the str() function can be used to compactly display the internals of an R object.

A list can contain other lists.

> b=list(a,5,6)
> str(b)
List of 3
 $ :List of 5
  ..$ : num 1
  ..$ : logi TRUE
  ..$ : logi TRUE
  ..$ : chr "a"
  ..$ : num 2.3
 $ : num 5
 $ : num 6

Something very interesting happens when we convert the list to an atomic vector with the unlist() function. It 'unfolds' all inner lists recursively and returns a single atomic vector, coercing the elements to a common type (character in this example).

> unlist(b)
[1] "1"    "TRUE" "TRUE" "a"    "2.3"  "5"    "6" 
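The result type depends on what the list contains, and the flattening can be limited to one level. The sketch below uses illustrative names (inner, outer) and the recursive argument of unlist().

```r
inner <- list(1, TRUE)
outer <- list(inner, 5)

flat <- unlist(outer)        # 1 1 5: no strings here, so TRUE is coerced to 1
typeof(flat)                 # "double"

# recursive = FALSE removes only one level of nesting; the result is still a list
partial <- unlist(outer, recursive = FALSE)
is.list(partial)             # TRUE
length(partial)              # 3
```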

Note that as.vector() does not convert a list to an atomic vector, since a list is already a vector (with mode 'list'). You can convert other data structures to a list using the as.list() function.
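Both points can be checked with a short sketch. The objects l and v are illustrative.

```r
l <- list(1, 2)
is.vector(l)               # TRUE: a list already counts as a vector
is.list(as.vector(l))      # TRUE: as.vector() leaves it a list

v <- c(a = 1, b = 2)
as_l <- as.list(v)         # each element becomes a list element, names are kept
names(as_l)                # "a" "b"
```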

Lists (as well as atomic vectors) allow names for their elements. Let's look at an example.

> a=list(apple=1,orange=2)
> str(a)
List of 2
 $ apple : num 1
 $ orange: num 2

apple and orange are names. The names can be retrieved using names(a), or together with any other attributes using the function attributes(). The names are stored as the names attribute of the list.

> attributes(a)
$names
[1] "apple"  "orange"

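Setting the attributes to NULL removes the names from a list. Note that this removes every other attribute as well; names(a) <- NULL removes only the names.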

> attributes(a) <- NULL
> str(a)
List of 2
 $ : num 1
 $ : num 2
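The difference between the two approaches shows up when the list carries an extra attribute. This sketch adds a hypothetical type attribute with attr(), which is described in the Attributes section; the copies b and d are illustrative.

```r
a <- list(apple = 1, orange = 2)
attr(a, "type") <- "fruit"

b <- a
names(b) <- NULL          # removes only the names
attributes(b)             # the type attribute is still there

d <- a
attributes(d) <- NULL     # removes names and every other attribute
attributes(d)             # NULL
```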

You can use is.list() to check whether an object is a list. R also has the concept of a pairlist, which is internally represented as a linked list instead of a vector.

Attributes

Lists, like other R objects, can have attributes. Attributes are additional information that can be stored with the object. Let's look at an example.

> a=list(apple=1,orange=2)
> attr(a,"type") <- 'fruit'
> a
$apple
[1] 1

$orange
[1] 2

attr(,"type")
[1] "fruit"

Here we have added an attribute called type to the list, recording that the list contains elements of type fruit. The attr() function can be used to get or set an attribute.
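Reading the attribute back works the same way. One caveat, sketched below, is that subsetting with [ ] keeps the names but drops custom attributes such as type.

```r
a <- list(apple = 1, orange = 2)
attr(a, "type") <- "fruit"

attr(a, "type")           # "fruit"
names(attributes(a))      # "names" "type"

sub <- a[1]               # subsetting keeps names but drops the custom attribute
names(sub)                # "apple"
attr(sub, "type")         # NULL
```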

See the section on subsetting atomic vectors above for the common subsetting operations. Here we describe operations specific to a list.

We use the following list for the examples
> a=c('a','b','c','d','e')
> b=list('alphabets'=a,'fruits'=c('apple'=1,'orange'=2))
> b
$alphabets
[1] "a" "b" "c" "d" "e"

$fruits
 apple orange 
     1      2 

Subset using $

You can use $ to extract an element of a list by its name.

> b$alphabets
[1] "a" "b" "c" "d" "e"
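For comparison, the sketch below sets $ next to [[ ]] and [ ] on a similar list (shortened here for illustration). Note that $ accepts a unique partial name, while [[ ]] requires an exact match by default.

```r
b <- list(alphabets = c("a", "b", "c"), fruits = c(apple = 1, orange = 2))

by_dollar  <- b$fruits          # the element itself (a named numeric vector)
by_double  <- b[["fruits"]]     # same result as $
by_single  <- b["fruits"]       # a list of length 1 containing the element

apple <- b$fruits[["apple"]]    # 1: subsetting can be chained

partial <- b$alpha              # partial matching finds "alphabets"
exact   <- b[["alpha"]]         # NULL: [[ ]] matches names exactly
missing_el <- b$vegetables      # NULL: a name that does not exist
```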
