Basic Data Structures in R

The basic data structures in R can be divided into two types.

Types of data structures

Homogeneous
All the elements in the data structure are of the same type (string, number, boolean, etc.)
Heterogeneous
The elements in the data structure can be of mixed type.

Basic Data structures in R

Here are the basic data structures. We will go through each of them in detail in subsequent tutorials.

Data structure   Type            Dimensions
Vector           Homogeneous     1
List             Heterogeneous   1
Matrix           Homogeneous     2
Array            Homogeneous     any
Data frame       Heterogeneous   2

Note that there are no scalars in R. Scalars can be thought of as vectors of length 1.
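A quick way to see this for yourself is to inspect a single number. This is a minimal sketch; the variable name x is just for illustration.

```r
# A single number is simply a vector with one element
x <- 5
is.vector(x)        # TRUE
length(x)           # 1
identical(x, c(5))  # TRUE: 5 and c(5) are the same object
```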

Let's look at the data structures and understand how they differ.

Vector (atomic)

An atomic vector is the simplest data structure and can be thought of as an ordered sequence of elements. All elements in the vector have to be of the same type (homogeneous). They can be logical (boolean), integer, numeric (double), complex, character or raw. When an empty vector is created with vector(), logical elements are initialized to FALSE, numeric elements to 0, character elements to "" and raw elements to nul bytes. Atomic vectors can have names (just like lists) as well as other attributes. Let's look at some examples.
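The default initial values can be checked with the vector() function, which creates a vector of a given mode and length. The variable names below are only for illustration.

```r
# vector(mode, length) creates a vector filled with the default value for that mode
lg <- vector("logical", 3)    # FALSE FALSE FALSE
nm <- vector("numeric", 3)    # 0 0 0
ch <- vector("character", 2)  # "" ""
rw <- vector("raw", 2)        # 00 00 (nul bytes)

lg; nm; ch; rw
```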

An easy way to create a vector is to use the c() (combine) function. When elements of different types are passed to c(), it coerces them all to a single type, the highest one present in this hierarchy: NULL < raw < logical < integer < double < complex < character
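The coercion rule is easiest to see by mixing two types at a time. The following sketch uses made-up values and checks the result with typeof().

```r
# Each call mixes two types; the result takes the higher type in the hierarchy
mixed_int <- c(TRUE, 2L)    # logical + integer   -> integer
mixed_dbl <- c(2L, 3.5)     # integer + double    -> double
mixed_chr <- c(3.5, "x")    # double  + character -> character

typeof(mixed_int)  # "integer"
typeof(mixed_dbl)  # "double"
typeof(mixed_chr)  # "character"
mixed_chr          # "3.5" "x"
```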

Use the L suffix to create integers instead of doubles.
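A short illustration of the difference, with example values chosen here:

```r
# Plain number literals are doubles; the L suffix makes them integers
d <- c(1, 2, 3)
i <- c(1L, 2L, 3L)

typeof(d)  # "double"
typeof(i)  # "integer"
```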

To test whether a vector contains elements of a particular type, use the is.xxx() functions. The major ones are
is.character(), is.logical(), is.numeric(), is.integer(),
is.complex(), is.raw(), is.double() and the generic is.atomic().
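As a small sketch, here is how a few of these tests behave on a double vector (the values are arbitrary):

```r
v <- c(1.5, 2, 3)

is.numeric(v)    # TRUE
is.double(v)     # TRUE
is.integer(v)    # FALSE
is.character(v)  # FALSE
is.atomic(v)     # TRUE

# is.numeric() is TRUE for both integer and double vectors
is.numeric(1:3)  # TRUE
```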

The typeof() function determines the internal storage type of an object. You can explicitly convert the elements of a vector from one type to another using the as.xxx() functions, for example as.integer() or as.character().
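The sketch below shows typeof() together with a few explicit conversions. The values are invented for illustration.

```r
n <- c(1.9, 2.1, 3)
typeof(n)               # "double"

as.integer(n)           # 1 2 3 (the fractional part is dropped)
as.character(n)         # "1.9" "2.1" "3"
as.logical(c(0, 1, 2))  # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
```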

Let's look at ways to subset or retrieve elements from an atomic vector.

'[' vs '[['

[ ] can be used to retrieve a subset of the vector. It returns another vector; even if you retrieve a single element, the result is a vector (of length 1). [[ ]] can be used to retrieve a single element. It cannot return more than one element.
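A minimal sketch of the difference on a small numeric vector. The tryCatch() wrapper is only there so the failing call does not stop the script.

```r
v <- c(10, 20, 30)

v[2]         # 20
v[c(1, 3)]   # 10 30: [ ] can return several elements
v[[2]]       # 20:    [[ ]] returns exactly one element

# Asking [[ ]] for more than one element of an atomic vector is an error
res <- tryCatch(v[[c(1, 3)]], error = function(e) e)
inherits(res, "error")  # TRUE
```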

Subset using integer

An integer index selects elements by position (R indexes from 1). Let's look at some more examples. We create a vector a of characters. We can retrieve any element using an integer index.
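For instance, with a character vector a whose contents are made up for this sketch:

```r
a <- c("a", "b", "c", "d", "e")

a[1]        # "a"
a[c(2, 4)]  # "b" "d"
a[2:4]      # "b" "c" "d"
a[-1]       # "b" "c" "d" "e": a negative index drops that position
a[10]       # NA: indexing past the end gives a missing value
```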

Subset using boolean

We can also use boolean (logical) values to subset. Let's look at some examples.
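A short sketch of logical subsetting, using invented vectors a and n. Elements at TRUE positions are kept, and a logical vector shorter than the data is recycled.

```r
a <- c("a", "b", "c", "d")

a[c(TRUE, FALSE, TRUE, FALSE)]  # "a" "c"
a[c(TRUE, FALSE)]               # "a" "c": the logical index is recycled

# A comparison produces a logical vector, which can be used directly
n <- c(5, 12, 8, 20)
n[n > 9]                        # 12 20
```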

List

A list can hold elements of different types. (Note, however, that lists are internally stored as vectors.) The easiest way to create a list is to use the list() function.
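As an illustration, the following sketch builds a list with elements of several types and inspects it with str(). The contents are chosen arbitrarily, and the last element uses the T shorthand, which assumes T has not been reassigned in your session.

```r
# A list mixing integer, double, character and logical elements
x <- list(1L, 2.5, "three", TRUE, T)

str(x)        # compact summary: one line per element with its type and value
length(x)     # 5
typeof(x)     # "list"
```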

Some things to note:

  • We have created a list containing elements of different types.
  • Logical elements can be written as TRUE as well as T (T is an ordinary variable that can be reassigned, so TRUE is the safer choice).
  • The str() function can be used to compactly display the internals of an R object.

A list can contain other lists.

Something very interesting happens when we convert the list to an atomic vector. It 'unfolds' all inner lists recursively and returns a single atomic vector. Use the function unlist() to convert a list to an atomic vector.
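Here is a small sketch with a nested list (the values are made up). Note that when the elements have different types, the usual coercion rules apply to the flattened result.

```r
# A list containing a list containing a list
nested <- list(1, list(2, list(3, 4)), 5)

flat <- unlist(nested)
flat             # 1 2 3 4 5
is.atomic(flat)  # TRUE

# Mixed types are coerced to the highest type, here character
mixed <- unlist(list(1, list("a", TRUE)))
mixed            # "1" "a" "TRUE"
```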

Note that as.vector() does not convert a list to an atomic vector, since a list is already a vector (with mode 'list'). You can convert any other data structure to a list using the as.list() function.
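This can be verified directly. The list l below is just an example.

```r
l <- list(1, "a", TRUE)

is.vector(l)             # TRUE: a list is already a vector
mode(l)                  # "list"
is.list(as.vector(l))    # TRUE: as.vector() leaves it a list
is.atomic(as.vector(l))  # FALSE

# Going the other way: an atomic vector becomes a list of its elements
as.list(1:3)
```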

Lists (as well as atomic vectors) allow names for all elements. Let's look at an example.
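A minimal sketch of a named list and a named atomic vector. The element values 3 and 5 are placeholders chosen here, not taken from the original example.

```r
# Names are given with name = value inside list() or c()
fruits <- list(apple = 3, orange = 5)
fruit_counts <- c(apple = 3, orange = 5)

fruits        # printed with $apple and $orange headings
fruit_counts  # printed with the names above the values
```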

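apple and orange are names. All the names can be retrieved using the function attributes() or, more directly, names(). For an ordinary list the names are stored as the names attribute; it is pairlists whose names are not actually stored as attributes but are just reported as one.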
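Both ways of retrieving the names can be sketched as follows, again with placeholder values:

```r
fruits <- list(apple = 3, orange = 5)

attributes(fruits)  # a list with one component, $names
names(fruits)       # "apple" "orange"
```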

This is how you remove the names from a list:
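Two common ways are sketched below, using a placeholder list. Assigning NULL to names() modifies the list in place, while unname() returns a copy without names.

```r
fruits <- list(apple = 3, orange = 5)

# Option 1: return an unnamed copy, leaving the original untouched
bare <- unname(fruits)
names(bare)    # NULL

# Option 2: strip the names from the list itself
names(fruits) <- NULL
names(fruits)  # NULL
```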

You can use is.list() to check whether an object is a list. R also has the concept of a pairlist. Pairlists are internally represented as a linked list instead of a vector.
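A brief sketch of these checks. Note that is.list() also returns TRUE for a pairlist, so is.pairlist() or typeof() is needed to tell the two apart.

```r
is.list(list(1, "a"))  # TRUE
is.list(c(1, 2))       # FALSE: an atomic vector is not a list

p <- pairlist(apple = 3, orange = 5)
is.pairlist(p)         # TRUE
is.list(p)             # TRUE
typeof(p)              # "pairlist"
typeof(list(1))        # "list"
```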

Attributes

Lists can have attributes. Attributes are additional information that can be stored with the object. Let's look at an example.
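One way to attach such information is sketched below. The list contents are placeholders, and the attribute is set here with attr(), which may differ from how the original example did it.

```r
fruits <- list(apple = 3, orange = 5)

# Attach a custom attribute named "type" to the list
attr(fruits, "type") <- "fruit"

attr(fruits, "type")  # "fruit"
attributes(fruits)    # shows both $names and $type
```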

Here we have added an attribute called type to the list. We say that this list contains elements of type fruit. The attr() function can be used to get or set an attribute.

See the section on subsetting atomic vectors for the common subsetting operations. Here we describe operations specific to a list.

Subset using $

You can use $ to extract an element of a list by its name.
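A minimal sketch with a placeholder list, comparing $ with the bracket operators:

```r
fruits <- list(apple = 3, orange = 5)

fruits$apple       # 3
fruits[["apple"]]  # 3: equivalent to $ with an exact name
fruits["apple"]    # a list of length 1 that still carries the name
fruits$banana      # NULL: a name that does not exist

# $ also accepts a unique prefix of a name (partial matching)
fruits$app         # 3
```

Because of partial matching, [[ ]] with the full name is often the safer choice inside scripts and functions.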
