3 Data types and data structures

Learning objectives

  1. Understand the differences between classes, objects and data types in R
  2. Create objects of different types
  3. Subset and index objects
  4. Learn and use vectorized operations

3.1 Data Types

3.2 Data Structures

3.3 Vectors

Key points:
- Can only contain elements of the same type (e.g. all numbers or all characters)
- Most basic type of R object
- Single values are vectors too: x <- 5 creates a numeric vector of length 1

3.3.1 Numeric

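Numeric vectors store numbers as doubles (double-precision floating-point numbers), which can include a decimal part. The term "double" refers to double precision: each value takes 8 bytes (64 bits) of memory, twice as much as a single-precision number. A double is accurate to about 15-16 significant digits.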

Creating a numeric vector using c()

x <- c(0.3, 0.1)
x
## [1] 0.3 0.1
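Because doubles are stored in binary with finite precision, some decimal values cannot be represented exactly. A small sketch of what that means in practice (the variable name d is only for illustration):

```r
d <- c(0.3, 0.1)
typeof(d)                          # "double": numeric vectors are stored as doubles
print(1/3, digits = 17)            # only about 15-16 of these digits are reliable
0.1 + 0.2 == 0.3                   # FALSE: tiny rounding errors break exact comparison
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compare doubles with a tolerance instead
```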

Using the vector() function

x <- vector(mode = "numeric",length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0

Using the numeric() function

x <- numeric(length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0

Creating a numeric vector with a sequence of numbers using seq(from, to, by)

x <- seq(1,10,2)
x
## [1] 1 3 5 7 9

Creating a numeric vector by repeating a value with rep()

x <- rep(2,10)
x
##  [1] 2 2 2 2 2 2 2 2 2 2
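seq() and rep() accept a few other arguments that are often handy. A short sketch using the standard length.out argument of seq() and the times and each arguments of rep():

```r
seq(0, 1, length.out = 5)   # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
rep(c(1, 2), times = 3)     # repeat the whole vector: 1 2 1 2 1 2
rep(c(1, 2), each = 3)      # repeat each element: 1 1 1 2 2 2
```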

Check the length of a vector with length()

x
##  [1] 2 2 2 2 2 2 2 2 2 2
length(x)
## [1] 10
y <- rep(2,5)
y
## [1] 2 2 2 2 2
length(y)
## [1] 5
length(x) == length(y)
## [1] FALSE

3.3.2 Integer

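Integer vectors store whole numbers, that is, numbers without a decimal component. To create an integer directly, add the suffix L to the number (e.g. 5L); without it, R stores the number as a double.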

Creating an integer vector using c()

x <- c(1L,2L,3L,4L,5L)  
x
## [1] 1 2 3 4 5

Creating an integer vector from a sequence of numbers with the colon operator (:)

x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
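A whole number typed without the L suffix is still stored as a double. typeof() makes the difference visible:

```r
typeof(5)                # "double": no L suffix
typeof(5L)               # "integer"
typeof(1:10)             # "integer": the colon operator returns integers here
is.integer(c(1, 2, 3))   # FALSE, even though the values are whole numbers
.Machine$integer.max     # largest integer R can store: 2147483647
```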

3.3.3 Logical

Creating a logical vector with c()

x <- c(TRUE,FALSE,T,F)
x
## [1]  TRUE FALSE  TRUE FALSE
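Logical vectors usually come from comparisons. In arithmetic, TRUE counts as 1 and FALSE as 0, so sum() counts the matches and mean() gives their proportion. Note that T and F are ordinary variables that can be overwritten, so writing TRUE and FALSE in full is safer. A sketch with a made-up vector v:

```r
v <- c(3, 8, 1, 12)
v > 5          # FALSE  TRUE FALSE  TRUE
sum(v > 5)     # 2: number of elements greater than 5
mean(v > 5)    # 0.5: proportion of elements greater than 5
```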

Creating a logical vector with vector()

x <- vector(mode = "logical",length = 5)
x
## [1] FALSE FALSE FALSE FALSE FALSE

Creating a logical vector using logical()

x <- logical(length = 10)
x
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

3.3.4 Character

x<-c("a","b","c")
x
## [1] "a" "b" "c"
x<-vector(mode = "character",length=10)
x
##  [1] "" "" "" "" "" "" "" "" "" ""
x<-character(length = 3)
x
## [1] "" "" ""

Some useful functions to modify strings

tolower(LETTERS)
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
toupper(letters)
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
paste(letters,1:length(letters),sep="_") # Note the implicit coercion
##  [1] "a_1"  "b_2"  "c_3"  "d_4"  "e_5"  "f_6"  "g_7"  "h_8"  "i_9"  "j_10" "k_11" "l_12" "m_13" "n_14" "o_15"
## [16] "p_16" "q_17" "r_18" "s_19" "t_20" "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"
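A few more base R string functions, sketched on a made-up character vector s:

```r
s <- c("apple", "banana", "cherry")
nchar(s)                   # number of characters: 5 6 6
substr(s, 1, 3)            # first three characters: "app" "ban" "che"
paste0(s, "_fruit")        # paste() with sep = "": "apple_fruit" "banana_fruit" "cherry_fruit"
paste(s, collapse = ", ")  # join into one string: "apple, banana, cherry"
```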

3.3.5 Vector attributes

The elements of a vector can have names

x<-1:5
names(x)<-c("one","two","three","four","five")
x
##   one   two three  four  five 
##     1     2     3     4     5
x<-logical(length = 4)
names(x)<-c("F1","F2","F3","F4")
x
##    F1    F2    F3    F4 
## FALSE FALSE FALSE FALSE

3.3.6 Built-in functions

To inspect the contents of a vector

is.vector(x) # Check if it is a vector
## [1] TRUE
is.na(x) # Check which elements are missing (NA)
##    F1    F2    F3    F4 
## FALSE FALSE FALSE FALSE
is.null(x) # Check if it is NULL
## [1] FALSE
is.numeric(x) # Check if it is numeric
## [1] FALSE
is.logical(x) # Check if it is logical
## [1] TRUE
is.character(x) # Check if it is character
## [1] FALSE

To know what kind of vector you are working with

class(x) # Class of the object (e.g. "numeric", "matrix", "data.frame")
## [1] "logical"
typeof(x) # Internal storage type (e.g. "double", "integer", "list")
## [1] "logical"
str(x) # Compact summary of the structure and contents
##  Named logi [1:4] FALSE FALSE FALSE FALSE
##  - attr(*, "names")= chr [1:4] "F1" "F2" "F3" "F4"
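For a plain vector, class() and typeof() often agree, but they answer different questions: typeof() reports how R stores the data internally, while class() reports what kind of object it is. The difference shows up with other structures (m_demo and df_demo are illustrative names):

```r
class(1.5);      typeof(1.5)       # "numeric" vs "double"
m_demo <- matrix(1:4, nrow = 2)
class(m_demo);   typeof(m_demo)    # "matrix" "array" vs "integer"
df_demo <- data.frame(a = 1:2)
class(df_demo);  typeof(df_demo)   # "data.frame" vs "list"
```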

To know more about the data contained in the vector

Mathematical operations

sum(x)
## [1] 0
min(x) 
## [1] 0
max(x)
## [1] 0
x <- seq(1,10,1)
mean(x) 
## [1] 5.5
median(x) 
## [1] 5.5
sd(x)
## [1] 3.02765
log(x) 
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415 2.1972246 2.3025851
exp(x)
##  [1]     2.718282     7.389056    20.085537    54.598150   148.413159   403.428793  1096.633158  2980.957987
##  [9]  8103.083928 22026.465795
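These summary functions return NA if the vector contains a missing value. Most of them accept na.rm = TRUE to drop NAs before computing. A sketch with a made-up vector v:

```r
v <- c(4, NA, 10)
mean(v)                 # NA: a single missing value propagates
mean(v, na.rm = TRUE)   # 7: mean of the non-missing values
sum(is.na(v))           # 1: count the missing values
```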

Other operations

length(x)
## [1] 10
table(x)
## x
##  1  2  3  4  5  6  7  8  9 10 
##  1  1  1  1  1  1  1  1  1  1
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00

3.3.7 Vector Operations

x<-1:10
y<-11:20
x*2
##  [1]  2  4  6  8 10 12 14 16 18 20
x+y
##  [1] 12 14 16 18 20 22 24 26 28 30
x*y
##  [1]  11  24  39  56  75  96 119 144 171 200
x^y
##  [1] 1.000000e+00 4.096000e+03 1.594323e+06 2.684355e+08 3.051758e+10 2.821110e+12 2.326305e+14 1.801440e+16
##  [9] 1.350852e+18 1.000000e+20
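These operators are vectorized: they work element by element over the whole vector at once. Written out with an explicit loop, x + y does roughly the following (a sketch for illustration only; the vectorized form is shorter and faster):

```r
x <- 1:10
y <- 11:20
result <- numeric(length(x))   # pre-allocate an output vector of zeros
for (i in seq_along(x)) {
  result[i] <- x[i] + y[i]     # add matching elements one at a time
}
result
all(result == x + y)           # TRUE: same values as the vectorized version
```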

3.3.8 Recycling

If one vector is shorter than the other, the operation still works: R recycles (repeats) the shorter vector until it matches the length of the longer one. IMPORTANT: if the length of the longer vector is NOT a multiple of the length of the shorter vector, recycling still occurs, but the last repetition is cut off at the length of the longer vector and R issues a warning.

x<-1:10
y<-c(1,2,3)
x+y
## Warning in x + y: longer object length is not a multiple of shorter object length
##  [1]  2  4  6  5  7  9  8 10 12 11
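When the longer length is a multiple of the shorter one, recycling happens silently, with no warning. The x * 2 example above uses the same mechanism: the single value 2 is recycled across all of x.

```r
1:6 + c(10, 20)   # c(10, 20) is recycled to 10 20 10 20 10 20
# [1] 11 22 13 24 15 26
```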

3.3.8.1 Exercise

Calculate the sum of the following sequence of fractions:

x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)

# n=100
sum(1/(1:100)^2)
## [1] 1.634984
# n=10000
sum(1/(1:10000)^2)
## [1] 1.644834
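As n grows, this sum approaches a known limit, pi^2/6 (about 1.644934). Comparing against it is a quick sanity check on the vectorized computation:

```r
n <- 10000
approx_sum <- sum(1 / (1:n)^2)
approx_sum - pi^2 / 6   # small negative gap, roughly -1/n
```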

3.3.9 Indexing and subsetting

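For this example, let's create a vector of 15 random integers between 1 and 100, sampled without replacement. Because sample() is random, your numbers will differ from the ones shown here; call set.seed() first if you need reproducible results.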

x<-sample(x = 1:100,size = 15,replace = F) 
x
##  [1] 83 49 98 26 97 54 37 33  2 15 52 14 38 30 10

Using the index/position

x[1] # Get the first element
## [1] 83
x[13] # Get the thirteenth element
## [1] 38

Using a vector of indices

x[1:12] # The first 12 numbers
##  [1] 83 49 98 26 97 54 37 33  2 15 52 14
x[c(1,5,6,8,9,13)] # Specific positions only
## [1] 83 97 54 33  2 38

Using names

names(x) <- letters[1:length(x)] # Name the elements a, b, c, ...

x[c('a','c','d')]
##  a  c  d 
## 83 98 26

Using a logical vector

# A comparison returns a logical vector: TRUE where the condition holds
x<10
##     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o 
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# Only numbers that are greater than 95
x[x>95] 
##  c  e 
## 98 97
# Only numbers that are less than or equal to 10
x[x<=10]
##  i  o 
##  2 10

Skipping elements using indices

x[c(-1, -5)]
##  b  c  d  f  g  h  i  j  k  l  m  n  o 
## 49 98 26 54 37 33  2 15 52 14 38 30 10

Skipping elements using names

x<-1:10
names(x)<-letters[1:10]
x[names(x) != "a"]
##  b  c  d  e  f  g  h  i  j 
##  2  3  4  5  6  7  8  9 10
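Two related helpers: which() returns the positions where a logical vector is TRUE, and %in% tests membership, which is handy for skipping several names at once. A self-contained sketch that rebuilds the named x from above:

```r
x <- 1:10
names(x) <- letters[1:10]
which(x > 7)                    # named positions where the condition is TRUE: h i j -> 8 9 10
x[!names(x) %in% c("a", "b")]   # drop the elements named "a" and "b"
```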

3.3.9.1 Exercise

Find all the odd numbers in x

3.4 Lists

Key points:
- Can contain objects of multiple classes
- Especially powerful when combined with built-in functions such as lapply(), which apply an operation to every element

Creating lists with different data types

l <- list(1:10, list("hello",'hi'), TRUE)
l
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
## [[2]][[1]]
## [1] "hello"
## 
## [[2]][[2]]
## [1] "hi"
## 
## 
## [[3]]
## [1] TRUE

Assigning names as we create the list

l<-list(title = "Numbers", 
        numbers = 1:10, 
        logic = TRUE )
l
## $title
## [1] "Numbers"
## 
## $numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $logic
## [1] TRUE
names(l)
## [1] "title"   "numbers" "logic"
l$numbers
##  [1]  1  2  3  4  5  6  7  8  9 10

3.4.1 Indexing and subsetting

Using [[ ]] to extract a single element (single brackets [ ] would instead return a list containing that element)

l[[1]]
## [1] "Numbers"

Using $ for named lists

l$logic
## [1] TRUE
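To see how [ ] and [[ ]] differ, compare the classes of what they return. A self-contained sketch that rebuilds l:

```r
l <- list(title = "Numbers", numbers = 1:10, logic = TRUE)
class(l[1])               # "list": a list containing only the first element
class(l[[1]])             # "character": the first element itself
l[c("title", "logic")]    # [ ] can also select several elements at once
```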

3.4.2 Built-in functions

l<-list(sample(1:100,10),
        sample(1:100,10),
        sample(1:100,10))
names(l)<-c("r1","r2","r3")
l
## $r1
##  [1] 38 78 46 34 30 40 98 70 48 59
## 
## $r2
##  [1] 70 10 17 41 78 82  3 99 68 27
## 
## $r3
##  [1] 79 12 67 39 43 94 26 13 64 53

Performing operations on all elements of the list using lapply

lsums<-lapply(l,sum)
lsums
## $r1
## [1] 541
## 
## $r2
## [1] 495
## 
## $r3
## [1] 490
lsums <- lapply(l,function(a){
  sum(a)^2
})
lsums
## $r1
## [1] 292681
## 
## $r2
## [1] 245025
## 
## $r3
## [1] 240100
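sapply() works like lapply() but simplifies the result to a vector (or matrix) when it can, which is often more convenient. A sketch on a small, fixed list l2 (a hypothetical name) so the result is predictable:

```r
l2 <- list(r1 = 1:3, r2 = 4:6)
lapply(l2, sum)   # a list: $r1 is 6, $r2 is 15
sapply(l2, sum)   # a named vector: r1 = 6, r2 = 15
```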

3.5 Factors

Key points:

  • Useful for categorical data
  • Can have an implicit order (e.g. small < medium < large), if needed
  • Each element takes one of a fixed set of values, called levels
  • They are important in statistical modelling and plotting with ggplot2
  • Some operations behave differently on factors

Creating factors with factor()

cols<-factor(x = c(rep("red",4),rep("blue",5),rep("green",2)),
             levels = c("red","blue","green"))
cols
##  [1] red   red   red   red   blue  blue  blue  blue  blue  green green
## Levels: red blue green
samples <- c("case", "control", "control", "case")
samples
## [1] "case"    "control" "control" "case"
samples_factor <- factor(samples, levels = c("control", "case"))
samples_factor
## [1] case    control control case   
## Levels: control case
str(samples_factor)
##  Factor w/ 2 levels "control","case": 2 1 1 2
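levels() lists the categories and table() counts them. When the categories have a natural order, ordered = TRUE makes comparisons meaningful. A sketch with made-up values:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
levels(sizes)     # "small" "medium" "large"
table(sizes)      # small: 2, medium: 1, large: 1
sizes < "large"   # TRUE FALSE TRUE TRUE: comparisons follow the level order
```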

3.6 Exercise

See what happens when you convert a factor to a numeric in the code chunk below. What do you get?

# Take the samples_factor variable and convert it to numeric
# What function do you need to do this? (Hint: we used it a few chunks ago!)

3.6.1 Built-in functions

Grouping elements in a vector by factor level using tapply() (the measurements are random, so your values will differ)

measurements<-sample(1:1000,6)
samples<-factor(c(rep("case",3),rep("control",3)), levels = c("control", "case"))
tapply(measurements, samples, mean)
##  control     case 
## 396.3333 463.6667

3.7 Matrices

Creating a matrix full of zeros with matrix()

m<-matrix(0, ncol=6, nrow=3)
m
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0
class(m)
## [1] "matrix" "array"
typeof(m)
## [1] "double"

Creating a matrix from a vector of numbers (the 5 values are recycled to fill all 10 cells)

m<-matrix(1:5, ncol=2, nrow=5)
m
##      [,1] [,2]
## [1,]    1    1
## [2,]    2    2
## [3,]    3    3
## [4,]    4    4
## [5,]    5    5

3.7.1 Attributes

Names of each dimension

colnames(m)<-letters[1:2]
rownames(m)<-LETTERS[1:5]
m
##   a b
## A 1 1
## B 2 2
## C 3 3
## D 4 4
## E 5 5
str(m)
##  int [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:5] "A" "B" "C" "D" ...
##   ..$ : chr [1:2] "a" "b"

3.7.2 Built-in functions

To know the size of the matrix

dim(m)
## [1] 5 2
ncol(m)
## [1] 2
nrow(m)
## [1] 5
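Matrices are indexed with [row, column], using either positions or the dimension names. A self-contained sketch that rebuilds the named matrix m from above:

```r
m <- matrix(1:5, ncol = 2, nrow = 5)
colnames(m) <- letters[1:2]
rownames(m) <- LETTERS[1:5]
m["C", "a"]   # a single element: 3
m[2, ]        # second row as a named vector: a = 2, b = 2
m[, "b"]      # column b: 1 2 3 4 5, named A to E
colSums(m)    # column totals: a = 15, b = 15
```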

3.7.2.1 Exercise

What do you think length(m) will return?

3.8 Data frames

Key points:

  • Columns in data frames are vectors
  • Each column can be of a different data type
  • A data frame is essentially a list of vectors of equal length

Creating a data frame using data.frame()

df<-data.frame(numbers=1:10,
               low_letters=letters[1:10],
               logical_values=rep(c(T,F),each=5))
df
##    numbers low_letters logical_values
## 1        1           a           TRUE
## 2        2           b           TRUE
## 3        3           c           TRUE
## 4        4           d           TRUE
## 5        5           e           TRUE
## 6        6           f          FALSE
## 7        7           g          FALSE
## 8        8           h          FALSE
## 9        9           i          FALSE
## 10      10           j          FALSE
class(df)
## [1] "data.frame"
typeof(df)
## [1] "list"
str(df)
## 'data.frame':    10 obs. of  3 variables:
##  $ numbers       : int  1 2 3 4 5 6 7 8 9 10
##  $ low_letters   : chr  "a" "b" "c" "d" ...
##  $ logical_values: logi  TRUE TRUE TRUE TRUE TRUE FALSE ...

Renaming columns

colnames(df)[2]<-"lowercase"
head(df)
##   numbers lowercase logical_values
## 1       1         a           TRUE
## 2       2         b           TRUE
## 3       3         c           TRUE
## 4       4         d           TRUE
## 5       5         e           TRUE
## 6       6         f          FALSE
View(df) # Opens the data frame in a spreadsheet-style viewer (e.g. in RStudio)

3.8.1 Indexing and subsetting

df$numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
df["numbers"]
##    numbers
## 1        1
## 2        2
## 3        3
## 4        4
## 5        5
## 6        6
## 7        7
## 8        8
## 9        9
## 10      10
df[1,]
##   numbers lowercase logical_values
## 1       1         a           TRUE
df[,1]
##  [1]  1  2  3  4  5  6  7  8  9 10
df[3,3]
## [1] TRUE
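Rows can also be selected with a logical condition on a column, combining the comparison and indexing ideas from the vector section. A self-contained sketch on a rebuilt copy of df (this assumes R 4.0 or later, where strings are not converted to factors by default):

```r
df <- data.frame(numbers = 1:10,
                 lowercase = letters[1:10],
                 logical_values = rep(c(TRUE, FALSE), each = 5))
df[df$numbers > 7, ]                 # rows where numbers is greater than 7
df[df$logical_values, "lowercase"]   # letters in rows where logical_values is TRUE
```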

3.9 Coercion

Converting between data types and structures with the as.*() functions (e.g. as.list(), as.data.frame())

x<-1:10
as.list(x)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5
## 
## [[6]]
## [1] 6
## 
## [[7]]
## [1] 7
## 
## [[8]]
## [1] 8
## 
## [[9]]
## [1] 9
## 
## [[10]]
## [1] 10
l<-list(numbers=1:10,
        lowercase=letters[1:10])
l
## $numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $lowercase
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
typeof(l)
## [1] "list"
df<-as.data.frame(l)
df
##    numbers lowercase
## 1        1         a
## 2        2         b
## 3        3         c
## 4        4         d
## 5        5         e
## 6        6         f
## 7        7         g
## 8        8         h
## 9        9         i
## 10      10         j
typeof(df)
## [1] "list"
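The as.*() functions also convert between atomic types, and R coerces implicitly when types are mixed in c(), roughly in the order logical, integer, double, character. Conversions that make no sense produce NA with a warning. A short sketch:

```r
as.numeric("3.14")    # 3.14
as.character(1:3)     # "1" "2" "3"
as.numeric("abc")     # NA, with the warning "NAs introduced by coercion"
typeof(c(TRUE, 2L))   # "integer": the logical is promoted to integer
typeof(c(1L, 2.5))    # "double"
typeof(c(1, "a"))     # "character": everything becomes text
```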

3.10 Hands on: Data types

  • Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Did the matrix function fill your matrix by column, or by row, as its default behavior?
  • Create a list of length two containing a character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
  • There are several subtly different ways to access variables (columns), observations (rows) and elements of a data frame. Try them all and discuss with your team what they return. (Hint: use the function typeof().)
  • Take the list you created in the second exercise and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”.