# Basic Data Structures in R

The basic data structures in R can be divided into two types.

## Types of data structures

- Homogeneous
  - All the elements in the data structure are of the same type (string, number, boolean, etc.).
- Heterogeneous
  - The elements in the data structure can be of mixed types.
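As a rough illustration of the difference, the sketch below builds a vector and a list from the same three values. The variable names `v` and `l` are only for illustration.

```r
# A vector is homogeneous: mixing types forces one common type.
v <- c(1, "a", TRUE)
typeof(v)            # "character"

# A list is heterogeneous: each element keeps its own type.
l <- list(1, "a", TRUE)
sapply(l, typeof)    # "double" "character" "logical"
```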

## Basic Data structures in R

Here are the basic data structures. We will go through each of them in detail in subsequent tutorials.

Data structure | Type | Dimensions |
---|---|---|
Vector | Homogeneous | 1 |
List | Heterogeneous | 1 |
Matrix | Homogeneous | 2 |
Array | Homogeneous | any |
Data frame | Heterogeneous | 2 |
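To make the table concrete, here is a minimal sketch that creates one object of each kind and inspects its dimensions with `dim()`. The object names and values are made up for this example.

```r
v   <- c(1, 2, 3)                              # vector
l   <- list(1, "a")                            # list
m   <- matrix(1:6, nrow = 2)                   # matrix with 2 rows, 3 columns
arr <- array(1:24, dim = c(2, 3, 4))           # 3-dimensional array
df  <- data.frame(id = 1:2, name = c("x", "y"))  # data frame

dim(v)     # NULL: plain vectors and lists carry no dim attribute
dim(l)     # NULL
dim(m)     # 2 3
dim(arr)   # 2 3 4
dim(df)    # 2 2
```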

Note that there are no scalars in R. A scalar can be thought of as a vector of length 1.
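A quick way to see this is to check that a single number behaves like a vector. This is only a small sketch, with `x` as an illustrative name.

```r
x <- 5
is.vector(x)         # TRUE
length(x)            # 1
x[1]                 # 5
identical(x, c(5))   # TRUE: c(5) builds the same length-1 vector
```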

Let's look at these data structures and understand how they differ.

### Vector (atomic)

The `vector()` function creates a vector of a given type and length, filled with the default value for that type.

    > vector("logical",2)
    [1] FALSE FALSE
    > vector("integer",2)
    [1] 0 0
    > vector("numeric",2)
    [1] 0 0
    > vector("double",2)
    [1] 0 0
    > vector("character",2)
    [1] "" ""
    > vector("complex",2)
    [1] 0+0i 0+0i
    > vector("raw",2)
    [1] 00 00

An easy way to create a vector is to use the ‘c’ (combine) function. When elements of different types are passed to the combine function, it coerces all of them to a single type based on this hierarchy: NULL < raw < logical < integer < double < complex < character.

    > a=c(1,2)
    > is.atomic(a)
    [1] TRUE
    > typeof(a)
    [1] "double"
    > a=c("1",2)
    > typeof(a)
    [1] "character"
    > a
    [1] "1" "2"
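The hierarchy can be checked one step at a time. The sketch below combines pairs of values from adjacent levels and reports the resulting type. The values are arbitrary.

```r
typeof(c(TRUE, 1L))      # "integer":   logical is promoted to integer
typeof(c(1L, 2.5))       # "double":    integer is promoted to double
typeof(c(2.5, 1+0i))     # "complex":   double is promoted to complex
typeof(c(1+0i, "a"))     # "character": everything can become character

mixed <- c(TRUE, 2.5)    # TRUE becomes 1 when coerced to double
mixed                    # 1.0 2.5
```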

Use the ‘L’ suffix to create integers instead of doubles.

    > typeof(c(1L,2L))
    [1] "integer"
    > typeof(c(1,2))
    [1] "double"

To test whether a vector is of a particular type, use the *is.xxx()* functions. The major functions are

`is.character(), is.logical(), is.numeric(), is.integer(),`

`is.complex(), is.raw(), is.double()` and the more general `is.atomic()`

    > is.character(vector("character",2))
    [1] TRUE
    > is.atomic(vector("character",2))
    [1] TRUE
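One detail worth checking by hand is how these tests treat integers and doubles. The following sketch shows that `is.numeric()` accepts both, while `is.integer()` and `is.double()` look at the storage type.

```r
is.numeric(1L)       # TRUE:  integers count as numeric
is.numeric(1.5)      # TRUE:  so do doubles
is.integer(1)        # FALSE: a plain 1 is stored as double
is.double(1L)        # FALSE: 1L is stored as integer
is.numeric("1")      # FALSE: a character string is not numeric
is.atomic(list(1))   # FALSE: a list is not an atomic vector
```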

The `typeof()` function that we saw earlier determines the internal storage type of an object. You can explicitly convert the elements of a vector from one type to another using the *as.xxx()* functions.

    > a=c(1,2)
    > typeof(a)
    [1] "double"
    > b=as.character(a)
    > typeof(b)
    [1] "character"
    > b
    [1] "1" "2"
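Explicit conversion does not always succeed. As a small sketch, the example below converts values that cannot all be represented in the target type. `suppressWarnings()` is used only to keep the output quiet, since R warns when it introduces `NA`.

```r
x <- c("1", "2.5", "abc")
n <- suppressWarnings(as.numeric(x))   # "abc" cannot be parsed as a number
n                                      # 1.0 2.5 NA

as.integer(2.9)                        # 2: the fractional part is dropped
as.logical(0:2)                        # FALSE TRUE TRUE: zero is FALSE, non-zero is TRUE
as.logical(c("TRUE", "T", "yes"))      # TRUE TRUE NA: "yes" is not recognised
```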

Let's look at ways to subset, or retrieve elements from, an atomic vector.

*‘[‘ vs ‘[[‘*

‘[ ]’ can be used to retrieve a subset of the vector. It returns another vector, even if you retrieve only a single element. ‘[[ ]]’ can be used to retrieve a single element. It cannot return more than one element.

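    > a=c(1,2,3)
    > a[1]
    [1] 1
    > a[[1]]
    [1] 1
    > a[1-3]
    [1] 1 3
    > a[[1-3]]
    Error in a[[1 - 3]] : attempt to select more than one element

Note that `1-3` is evaluated as arithmetic and gives -2, so `a[1-3]` drops the second element. `a[[1-3]]` fails because dropping one element of a three-element vector would leave more than one. The exact wording of the error message depends on the R version.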

*Subset using integer*

In the previous example we saw how to subset using an integer. Let's look at some more examples. We create a character vector `a`. We can retrieve any element using an integer index.

    > a=c('a','b','c','d','e')
    > a[3]
    [1] "c"

We can retrieve a subset by specifying multiple indexes.

    > a[c(1,4)]
    [1] "a" "d"

We can subset using a range.

    > a[1:3]
    [1] "a" "b" "c"

We can give it a negative index to exclude that element. This excludes the third element.

    > a[-3]
    [1] "a" "b" "d" "e"

You can't mix negative and positive numbers.

    > a[c(-2,1)]
    Error in a[c(-2, 1)] : only 0's may be mixed with negative subscripts

You could also give it a mathematical expression. A non-integer result is truncated toward zero, so `4/3` selects the first element.

    > a[4/2]
    [1] "b"
    > a[4/3]
    [1] "a"

A missing value in the index produces a missing value in the output.

    > a[c(1,NA)]
    [1] "a" NA

*Subset using boolean*

We can use boolean (logical) values to subset. Let's look at some examples.

    > a[c(TRUE,FALSE,TRUE,TRUE,FALSE)]
    [1] "a" "c" "d"

This returns the elements at the TRUE positions. You can also use the shortcuts T and F.

    > a[c(TRUE,FALSE,TRUE,T,F)]
    [1] "a" "c" "d"

Logical vectors are recycled. Here T,F is recycled to T,F,T,F,T.

    > a[c(T,F)]
    [1] "a" "c" "e"
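In practice the logical index is usually produced by a comparison rather than typed by hand. The sketch below, using a made-up numeric vector `x`, shows that pattern.

```r
x <- c(5, 12, 7, 20)

x > 6                # FALSE TRUE TRUE TRUE
x[x > 6]             # 12 7 20: keeps the elements where the test is TRUE
which(x > 6)         # 2 3 4:   the integer positions of the TRUE values
x[x > 6 & x < 15]    # 12 7:    conditions can be combined with &
```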

### List

A list is created using the *list()* function.

    > a=list(1,TRUE,T,'a',2.3)
    > str(a)
    List of 5
     $ : num 1
     $ : logi TRUE
     $ : logi TRUE
     $ : chr "a"
     $ : num 2.3

Some things to note:

- We have created a list containing elements of different types.
- Logical elements can be written as TRUE as well as T.
- The `str()` function can be used to compactly display the internal structure of an R object.

A list can contain other lists.

    > b=list(a,5,6)
    > str(b)
    List of 3
     $ :List of 5
      ..$ : num 1
      ..$ : logi TRUE
      ..$ : logi TRUE
      ..$ : chr "a"
      ..$ : num 2.3
     $ : num 5
     $ : num 6

Something very interesting happens when we convert the list to an atomic vector. The conversion ‘unfolds’ all inner lists recursively and returns a single atomic vector. Use the function `unlist()` to convert a list to an atomic vector.

    > unlist(b)
    [1] "1"    "TRUE" "TRUE" "a"    "2.3"  "5"    "6"
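The result above is a character vector because the list contained a string, so the usual coercion hierarchy applies. As a sketch with made-up lists, `unlist()` keeps a numeric type when all elements are numeric, and it builds names for nested elements.

```r
nums <- list(1, list(2, 3), 4L)
unlist(nums)             # 1 2 3 4
typeof(unlist(nums))     # "double": integer and double combine to double

named <- list(a = 1, b = list(c = 2, d = 3))
names(unlist(named))     # "a" "b.c" "b.d": inner names are prefixed with the outer name
```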

Note that as.vector() does not convert a list to an atomic vector, since a list is already a vector (with mode ‘list’). You can convert any other data structure to a list using the as.list() function.

Lists (as well as atomic vectors) allow names for all elements. Let's look at an example.

    > a=list(apple=1,orange=2)
    > str(a)
    List of 2
     $ apple : num 1
     $ orange: num 2

apple and orange are names. The names of a list are stored in its `names` attribute, so they can be retrieved using the function attributes(), or directly using names(). (For pairlists, described below, names are not stored as attributes but are reported as if they were.)

    > attributes(a)
    $names
    [1] "apple"  "orange"

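This is how you remove the names from a list. Note that assigning NULL to attributes() removes every attribute of the object, not just the names.

    > attributes(a) <- NULL
    > str(a)
    List of 2
     $ : num 1
     $ : num 2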
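If only the names should go, a more targeted alternative is the `names()` replacement form or `unname()`. This is a sketch of that alternative, not the approach used in the example above.

```r
a <- list(apple = 1, orange = 2)
names(a)             # "apple" "orange"
names(a) <- NULL     # removes only the names
names(a)             # NULL

# The same works for atomic vectors.
v <- c(x = 10, y = 20)
names(v)             # "x" "y"
v <- unname(v)       # returns a copy without names
names(v)             # NULL
```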

You can use is.list() to check whether an object is a list. R also has the concept of a pairlist. Pairlists are internally represented as a linked list instead of a vector.

#### Attributes

Lists can have attributes. Attributes are additional information that can be stored with the object. Let's look at an example.

    > a=list(apple=1,orange=2)
    > attr(a,"type") <- 'fruit'
    > a
    $apple
    [1] 1

    $orange
    [1] 2

    attr(,"type")
    [1] "fruit"

Here we have added an attribute called type to the list. We say that this list contains elements of type fruit. The `attr()` function can be used to get or set an attribute.
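The sketch below shows both directions: setting an attribute, reading it back, and removing it again. The attribute name `origin` is hypothetical and is used only to show what happens when an attribute does not exist.

```r
a <- list(apple = 1, orange = 2)
attr(a, "type") <- "fruit"    # set
attr(a, "type")               # get: "fruit"
names(attributes(a))          # "names" "type": names is an attribute too

attr(a, "origin")             # NULL: no such attribute has been set
attr(a, "type") <- NULL       # assigning NULL removes the attribute
names(attributes(a))          # "names"
```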

Look at subsetting in atomic vectors for common subsetting operations. Here we describe operations specific to a list

We use the following list for the examples.

    > a=c('a','b','c','d','e')
    > b=list('alphabets'=a,'fruits'=c('apple'=1,'orange'=2))
    > b
    $alphabets
    [1] "a" "b" "c" "d" "e"

    $fruits
     apple orange
         1      2

*Subset using $*

You can use `$` to extract an element of a list by its name.

    > b$alphabets
    [1] "a" "b" "c" "d" "e"
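For comparison, here is a sketch of how `$` relates to ‘[[ ]]’ and ‘[ ]’ on the same list. It rebuilds `a` and `b` from the example above so that it can be run on its own.

```r
a <- c("a", "b", "c", "d", "e")
b <- list(alphabets = a, fruits = c(apple = 1, orange = 2))

identical(b$fruits, b[["fruits"]])   # TRUE: both return the element itself
class(b["fruits"])                   # "list": single brackets return a sub-list
b$fruits[["apple"]]                  # 1: extraction can be chained

# `$` matches names partially on lists, while [[ ]] requires an exact name.
b$alpha                              # same as b$alphabets
b[["alpha"]]                         # NULL
```

Because of the partial matching, ‘[[ ]]’ with the full name is the safer choice in scripts.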