3 Data types and data structures

Learning objectives

  1. Understand the differences between classes, objects and data types in R
  2. Create objects of different types
  3. Subset and index objects
  4. Learn and use vectorized operations

3.1 Data Types

3.2 Atomic Classes

Atomic classes are the fundamental data types in R. All other data structures store entries of one or more atomic classes.

3.2.1 Numeric

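Numeric values are stored as double-precision floating-point numbers, so they can hold a decimal component. The term double refers to double precision: each value occupies 8 bytes (64 bits), twice the size of a single-precision number. A double is accurate to roughly 15 to 16 significant digits.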

3.2.2 Integer

Integers store numbers that can be written without a decimal component. Adding an L after a whole number tells R to store it as an integer instead of a numeric.

3.2.3 Logical

Logicals store the outputs of logical statements: TRUE or FALSE. They can be converted to integers, where TRUE = 1 and FALSE = 0.

3.2.4 Character

Characters represent text, whether a single character, a word or a whole sentence.

3.2.5 Missing Value

R uses the missing value NA to indicate a missing data entry. It is useful when manipulating data sets where missing entries are common.
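As a quick check of the atomic classes described above, the sketch below uses class() to report how R stores a few literal values. The variable names are illustrative and not part of the original material.

```r
# Illustrative values, one per atomic class
num_value <- 3.14      # numeric (stored as a double)
int_value <- 3L        # integer, because of the L suffix
lgl_value <- TRUE      # logical
chr_value <- "mango"   # character
na_value  <- NA        # missing value (logical unless placed in another vector)

class(num_value)   # "numeric"
class(int_value)   # "integer"
class(lgl_value)   # "logical"
class(chr_value)   # "character"
class(na_value)    # "logical"

# A whole number written without the L suffix is still stored as a double
class(3)           # "numeric"

# Logical values convert to integers: TRUE becomes 1 and FALSE becomes 0
as.integer(c(TRUE, FALSE))   # 1 0

# Missing values are detected with is.na()
is.na(na_value)    # TRUE
```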

3.3 Arithmetic Operations

# Addition
2+100000
## [1] 100002
# Subtraction  
3-5
## [1] -2
# Multiplication  
71*9
## [1] 639
# Division  
90/((3*5) + 4)
## [1] 4.736842
# Power  
2^3
## [1] 8

3.4 Logical operators

# First create two numeric variables  
var1 <- 35  
var2 <- 27
# Equal to  
var1 == var2
## [1] FALSE
# Less than or equal to  
var1 <= var2   
## [1] FALSE
# Not equal to  
var1 != var2
## [1] TRUE
# They also work with other classes  
var1 <- "mango" 
var2 <- "mangos"
var1 == var2
## [1] FALSE

Strings are compared character by character until they are not equal or there are no more characters left to compare. When one string is a prefix of the other, the shorter string sorts first.

var1 < var2
## [1] TRUE

We can test whether a value is contained in another object:

"c" %in% letters  
## [1] TRUE
"c" %in% LETTERS
## [1] FALSE
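The %in% operator also works when the left-hand side holds several values: it returns one TRUE or FALSE per value. A minimal sketch, where the queries vector is a made-up example:

```r
# Hypothetical query vector; letters is R's built-in lowercase alphabet
queries <- c("c", "C", "z", "1")

found <- queries %in% letters
found              # TRUE FALSE TRUE FALSE

# Keep only the queries that were found
queries[found]     # "c" "z"
```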

3.5 Exercise

  1. Write a piece of code that stores a number in a variable and then checks if it is greater than 5. Try to use comments!
  2. Bonus: Is there a way to store the result of checking if the number is greater than 5?

3.6 Data Structures

3.7 Vectors

Key points:
- Can only contain objects of the same class
- Most basic type of R object
- Variables are vectors: even a single value is a vector of length one

3.7.1 Numeric

Creating a numeric vector using c()

x <- c(0.3, 0.1)
x
## [1] 0.3 0.1

Using the vector() function

x <- vector(mode = "numeric",length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0

Using the numeric() function

x <- numeric(length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0

Creating a numeric vector with a sequence of numbers

# x <- seq(1,10,1)
# x

x <- seq(1,10,2)
x
## [1] 1 3 5 7 9
x <- rep(2,10)
x
##  [1] 2 2 2 2 2 2 2 2 2 2

Check length of vector with length()

x
##  [1] 2 2 2 2 2 2 2 2 2 2
length(x)
## [1] 10
y <- rep(2,5)
y
## [1] 2 2 2 2 2
length(y)
## [1] 5
length(x) == length(y)
## [1] FALSE

3.7.2 Integer

Creating an integer vector using c()

x <- c(1L,2L,3L,4L,5L)  
x
## [1] 1 2 3 4 5

Creating an integer vector from a sequence of numbers

x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

3.7.3 Logical

Creating a logical vector with c()

x <- c(TRUE,FALSE,T,F)
x
## [1]  TRUE FALSE  TRUE FALSE

Creating a logical vector with vector()

x <- vector(mode = "logical",length = 5)
x
## [1] FALSE FALSE FALSE FALSE FALSE

Creating a logical vector using logical()

x <- logical(length = 10)
x
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

3.7.4 Character

x<-c("a","b","c")
x
## [1] "a" "b" "c"
x<-vector(mode = "character",length=10)
x
##  [1] "" "" "" "" "" "" "" "" "" ""
x<-character(length = 3)
x
## [1] "" "" ""

Some useful functions to modify strings

tolower(LETTERS)
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
toupper(letters)
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
paste(letters,1:length(letters),sep="_") # Note the implicit coercion
##  [1] "a_1"  "b_2"  "c_3"  "d_4"  "e_5"  "f_6"  "g_7"  "h_8"  "i_9"  "j_10" "k_11" "l_12" "m_13" "n_14"
## [15] "o_15" "p_16" "q_17" "r_18" "s_19" "t_20" "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"

3.7.5 Vector attributes

The elements of a vector can have names

x<-1:5
names(x)<-c("one","two","three","four","five")
x
##   one   two three  four  five 
##     1     2     3     4     5
x<-logical(length = 4)
names(x)<-c("F1","F2","F3","F4")
x
##    F1    F2    F3    F4 
## FALSE FALSE FALSE FALSE

3.8 Factors

Key points:

  • Useful for categorical data
  • Can have implicit order, if needed
  • Each element has a label or level
  • They are important in statistical modelling and plotting with ggplot
  • Some operations behave differently on factors

Creating factors with factor()

cols<-factor(x = c(rep("red",4),
                   rep("blue",5),
                   rep("green",2)),              
             levels = c("red","blue","green"))
cols
##  [1] red   red   red   red   blue  blue  blue  blue  blue  green green
## Levels: red blue green
samples <- c("case", "control", "control", "case") 
samples 
## [1] "case"    "control" "control" "case"
samples_factor <- factor(samples, levels = c("control", "case")) 
samples_factor 
## [1] case    control control case   
## Levels: control case
str(samples_factor)
##  Factor w/ 2 levels "control","case": 2 1 1 2
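To illustrate the "implicit order" key point, the sketch below builds an ordered factor and inspects it. The sizes data are invented for illustration; levels(), nlevels() and table() are base R functions.

```r
sizes <- c("small", "large", "medium", "small")

# ordered = TRUE tells R that the levels have a meaningful order
sizes_factor <- factor(sizes,
                       levels = c("small", "medium", "large"),
                       ordered = TRUE)

levels(sizes_factor)     # "small" "medium" "large"
nlevels(sizes_factor)    # 3
table(sizes_factor)      # counts per level: small 2, medium 1, large 1

# Ordered factors support comparisons based on the level order
sizes_factor < "large"   # TRUE FALSE TRUE TRUE
```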

3.9 Exercise

See what happens when you convert a factor to a numeric in the code chunk below. What do you get?

#Take the samples_factor variable and convert it to a numeric 

#What function do you need to do this? (Hint: as.character() converts elements to character types.)

3.9.1 Built-in functions

To inspect the contents of a vector

is.vector(x) # Check if it is a vector
## [1] TRUE
is.na(x) # Check which elements are missing (NA)
##    F1    F2    F3    F4 
## FALSE FALSE FALSE FALSE
is.null(x) # Check if it is NULL
## [1] FALSE
is.numeric(x) # Check if it is numeric
## [1] FALSE
is.logical(x) # Check if it is logical
## [1] TRUE
is.character(x) # Check if it is character
## [1] FALSE
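The vector x above has no missing entries, so is.na() returns FALSE everywhere. The sketch below shows the same checks on a made-up vector that does contain NA values.

```r
# Hypothetical measurements with two missing entries
readings <- c(4.2, NA, 5.1, NA, 3.9)

is.na(readings)               # FALSE TRUE FALSE TRUE FALSE
anyNA(readings)               # TRUE
sum(is.na(readings))          # 2 (number of missing entries)

# Most summary functions return NA unless told to drop missing values
mean(readings)                # NA
mean(readings, na.rm = TRUE)  # 4.4
```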

To know what kind of vector you are working with

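class(x) # Class of the object (the atomic class for a plain vector; "matrix", "data.frame"... for other structures)
## [1] "logical"
typeof(x) # Internal storage type ("logical", "integer", "double", "character", "list"...)
## [1] "logical"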
str(x)
##  Named logi [1:4] FALSE FALSE FALSE FALSE
##  - attr(*, "names")= chr [1:4] "F1" "F2" "F3" "F4"
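For a plain vector class() and typeof() often agree, so the difference is easier to see on other objects. A small sketch with illustrative objects (the names are not from the original material):

```r
int_vector <- 1:6
int_matrix <- matrix(int_vector, nrow = 2)
a_list     <- list(1, "a")

class(int_vector)    # "integer"
typeof(int_vector)   # "integer"

class(int_matrix)    # "matrix" "array"
typeof(int_matrix)   # "integer": the storage type of the entries

class(a_list)        # "list"
typeof(a_list)       # "list"

class(2.5)           # "numeric"
typeof(2.5)          # "double"
```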

To know more about the data contained in the vector

Mathematical operations

sum(x)
## [1] 0
min(x) 
## [1] 0
max(x)
## [1] 0
x <- seq(1,10,1)
mean(x) 
## [1] 5.5
median(x) 
## [1] 5.5
sd(x)
## [1] 3.02765
log(x) 
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415 2.1972246 2.3025851
exp(x)
##  [1]     2.718282     7.389056    20.085537    54.598150   148.413159   403.428793  1096.633158  2980.957987
##  [9]  8103.083928 22026.465795

Other operations

length(x)
## [1] 10
table(x)
## x
##  1  2  3  4  5  6  7  8  9 10 
##  1  1  1  1  1  1  1  1  1  1
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00

Grouping elements in a vector using tapply

measurements<-sample(1:1000,6) 
samples<-factor(c(rep("case",3),rep("control",3)), 
                levels = c("control", "case"))
tapply(measurements, samples, mean)
##  control     case 
## 731.0000 266.3333
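Because sample() draws random numbers, the means above change on every run. The sketch below repeats the same tapply() call on fixed, made-up measurements so the result can be checked by hand.

```r
# Fixed illustrative data: three case values followed by three control values
fixed_measurements <- c(10, 20, 30, 40, 50, 60)
groups <- factor(c(rep("case", 3), rep("control", 3)),
                 levels = c("control", "case"))

# Apply mean() to the measurements within each level of groups
group_means <- tapply(fixed_measurements, groups, mean)
group_means
# control    case
#      50      20

group_means["case"]   # 20
```

The result follows the order of the factor levels, which is why control is listed first.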

3.9.2 Vector Operations

x<-1:10
y<-11:20
x*2
##  [1]  2  4  6  8 10 12 14 16 18 20
x+y
##  [1] 12 14 16 18 20 22 24 26 28 30
x*y
##  [1]  11  24  39  56  75  96 119 144 171 200
x^y
##  [1] 1.000000e+00 4.096000e+03 1.594323e+06 2.684355e+08 3.051758e+10 2.821110e+12 2.326305e+14 1.801440e+16
##  [9] 1.350852e+18 1.000000e+20

3.9.3 Recycling

If one of the vectors is shorter than the other, operations are still possible. R will recycle (repeat) the shorter vector until it matches the length of the longer one.

IMPORTANT: if the length of the longer vector is NOT a multiple of the length of the shorter vector, recycling still occurs but stops at the length of the longer vector, and R issues a warning.

x<-1:10
y<-c(1,2,3)
x+y
## Warning in x + y: longer object length is not a multiple of shorter object length
##  [1]  2  4  6  5  7  9  8 10 12 11
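For comparison, the sketch below shows recycling when the longer length is an exact multiple of the shorter one, in which case no warning is raised. The vectors are made-up examples.

```r
long_vec  <- 1:6
short_vec <- c(10, 20)

# short_vec is repeated three times: 10 20 10 20 10 20
recycled_sum <- long_vec + short_vec
recycled_sum              # 11 22 13 24 15 26

# A single number is a vector of length one, recycled across every element
long_vec * 2              # 2 4 6 8 10 12

# Equivalent explicit form of the recycling that R performs
explicit_sum <- long_vec + rep(short_vec, length.out = length(long_vec))
explicit_sum              # 11 22 13 24 15 26
```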

3.9.3.1 Exercise

Calculate the sum of the following sequence of fractions:

x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)

# n=100

# n=10000

3.9.4 Indexing and subsetting

For this example, let's create a vector of 15 random numbers between 1 and 100.

x<-sample(x = 1:100,size = 15,replace = F) 
x
##  [1] 19 47 20 43 92 91 56 49 17 54 78 79 86 33 30

Using the index/position

x[1] # Get the first element
## [1] 19
x[13] # Get the thirteenth element
## [1] 86

Using a vector of indices

x[1:12] # The first 12 numbers
##  [1] 19 47 20 43 92 91 56 49 17 54 78 79
x[c(1,5,6,8,9,13)] # Specific positions only
## [1] 19 92 91 49 17 86
names(x) <- letters[1:length(x)]

x[c('a','c','d')]
##  a  c  d 
## 19 20 43

Using a logical vector

# Which numbers are less than 10? (returns a logical vector)
x<10
##     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o 
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x[x>95] # Only numbers greater than 95 (none here, so the result is empty)
## named integer(0)
# 
# # Only even numbers 
# x%%2 == 0
# x[x%%2 == 0]
x<=10 # Which numbers are less than or equal to 10?
##     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o 
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x[x<=10] # Only numbers that are less than or equal to 10
## named integer(0)
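Since the random x above happens to contain no small values, the logical subsets come back empty. The sketch below uses a fixed, made-up vector so that the logical indexing returns visible results.

```r
scores <- c(19, 47, 20, 43, 92, 91, 56)

# A comparison returns one TRUE/FALSE per element
scores > 40                         # FALSE TRUE FALSE TRUE TRUE TRUE TRUE

# Using that logical vector as an index keeps only the TRUE positions
scores[scores > 40]                 # 47 43 92 91 56

# Conditions can be combined with & (and) and | (or)
scores[scores > 40 & scores < 90]   # 47 43 56

# which() returns the positions instead of the values
which(scores > 90)                  # 5 6

# TRUE counts as 1, so sum() counts the matches
sum(scores > 40)                    # 5
```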

Skipping elements using indices

x[c(-1, -5)]
##  b  c  d  f  g  h  i  j  k  l  m  n  o 
## 47 20 43 91 56 49 17 54 78 79 86 33 30

Skipping elements using names

x<-1:10
names(x)<-letters[1:10]
x[names(x) != "a"]
##  b  c  d  e  f  g  h  i  j 
##  2  3  4  5  6  7  8  9 10

3.9.4.1 Exercise

Find all the odd numbers in x

Hint: 3 %% 2 = 1 and 4 %% 2 = 0

3.10 Lists

Key points:
- Can contain objects of multiple classes
- Extremely powerful when combined with some R built-in functions

Creating lists with different data types

l <- list(1:10, list("hello",'hi'), TRUE)
l
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
## [[2]][[1]]
## [1] "hello"
## 
## [[2]][[2]]
## [1] "hi"
## 
## 
## [[3]]
## [1] TRUE

Assigning names as we create the list

l<-list(title = "Numbers", 
        numbers = 1:10, 
        logic = TRUE )
l
## $title
## [1] "Numbers"
## 
## $numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $logic
## [1] TRUE
names(l)
## [1] "title"   "numbers" "logic"
l$numbers
##  [1]  1  2  3  4  5  6  7  8  9 10

3.10.1 Indexing and subsetting

Using [[]] instead of []: double brackets return the element itself, while single brackets return a list containing it

l[[1]]
## [1] "Numbers"

Using $ for named lists

l$logic
## [1] TRUE
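The difference between single and double brackets can be checked with class(). A minimal sketch, assuming a list named record that mirrors the list l above:

```r
record <- list(title = "Numbers", numbers = 1:10, logic = TRUE)

sub_list <- record["numbers"]     # single brackets: a list of length one
element  <- record[["numbers"]]   # double brackets: the integer vector itself

class(sub_list)      # "list"
class(element)       # "integer"

# Once extracted, the element can be indexed like any vector
element[3]           # 3
record$numbers[3]    # 3

# Single brackets can select several elements at once
two_items <- record[c("title", "logic")]
names(two_items)     # "title" "logic"
```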

3.10.2 Built-in functions

l<-list(sample(1:100,10),
        sample(1:100,10),
        sample(1:100,10))
names(l)<-c("r1","r2","r3")
l
## $r1
##  [1] 97 65 79  7 35 80 45 61 28 10
## 
## $r2
##  [1]  86  99  43  48  34 100   7   8  36  74
## 
## $r3
##  [1]  9 70 69 99 46 11 31 49 14 47

Performing operations on all elements of the list using lapply

lsums<-lapply(l,sum)
lsums
## $r1
## [1] 507
## 
## $r2
## [1] 535
## 
## $r3
## [1] 445
lsums <- lapply(l,function(a){
  sum(a)^2
})
lsums
## $r1
## [1] 257049
## 
## $r2
## [1] 286225
## 
## $r3
## [1] 198025
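lapply() always returns a list. When every result is a single value, the related base R function sapply() simplifies the result to a named vector. A sketch with fixed, made-up data (the list above is random, so its sums will differ between runs):

```r
runs <- list(r1 = c(1, 2, 3), r2 = c(10, 20), r3 = 100)

run_sums_list <- lapply(runs, sum)   # a list: r1 = 6, r2 = 30, r3 = 100
class(run_sums_list)                 # "list"

run_sums <- sapply(runs, sum)        # a named numeric vector
run_sums
# r1  r2  r3
#  6  30 100
class(run_sums)                      # "numeric"

# Any function that takes a vector works, for example length()
run_lengths <- sapply(runs, length)
run_lengths                          # r1 = 3, r2 = 2, r3 = 1
```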

3.11 Matrices

Creating a matrix full of zeros with matrix()

m<-matrix(0, ncol=6, nrow=3)
m
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0
class(m)
## [1] "matrix" "array"
typeof(m)
## [1] "double"

Creating a matrix from a vector of numbers

m<-matrix(1:5, ncol=2, nrow=5)
m
##      [,1] [,2]
## [1,]    1    1
## [2,]    2    2
## [3,]    3    3
## [4,]    4    4
## [5,]    5    5

3.11.1 Attributes

Names of each dimension

colnames(m)<-letters[1:2]
rownames(m)<-LETTERS[1:5]
m
##   a b
## A 1 1
## B 2 2
## C 3 3
## D 4 4
## E 5 5
str(m)
##  int [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:5] "A" "B" "C" "D" ...
##   ..$ : chr [1:2] "a" "b"
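Matrices are indexed with [row, column], using either positions or the dimension names set above. The sketch below uses an illustrative matrix named grid filled with 1:10 so that every entry is distinct.

```r
grid <- matrix(1:10, nrow = 5, ncol = 2)
colnames(grid) <- c("a", "b")
rownames(grid) <- LETTERS[1:5]

grid[2, 1]        # 2 (row 2, column 1)
grid["C", "b"]    # 8 (row C, column b)
grid[, "b"]       # column b as a named vector: 6 7 8 9 10
grid[1:2, ]       # first two rows, still a matrix

# Leaving one side of the comma empty selects everything on that dimension
dim(grid[1:2, ])  # 2 2
```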

3.11.2 Built-in functions

To know the size of the matrix

dim(m)
## [1] 5 2
ncol(m)
## [1] 2
nrow(m)
## [1] 5

3.11.2.1 Exercise

What do you think length(m) will return?

3.12 Data frames

Key points:

  • Columns in data frames are vectors
  • Each column can be of a different data type
  • A data frame is essentially a list of vectors

Creating a data frame using data.frame()

df<-data.frame(numbers=1:10,
               low_letters=letters[1:10],
               logical_values=rep(c(T,F),each=5))
df
##    numbers low_letters logical_values
## 1        1           a           TRUE
## 2        2           b           TRUE
## 3        3           c           TRUE
## 4        4           d           TRUE
## 5        5           e           TRUE
## 6        6           f          FALSE
## 7        7           g          FALSE
## 8        8           h          FALSE
## 9        9           i          FALSE
## 10      10           j          FALSE
class(df)
## [1] "data.frame"
typeof(df)
## [1] "list"
str(df)
## 'data.frame':    10 obs. of  3 variables:
##  $ numbers       : int  1 2 3 4 5 6 7 8 9 10
##  $ low_letters   : chr  "a" "b" "c" "d" ...
##  $ logical_values: logi  TRUE TRUE TRUE TRUE TRUE FALSE ...

Renaming columns

colnames(df)[2]<-"lowercase"
head(df)
##   numbers lowercase logical_values
## 1       1         a           TRUE
## 2       2         b           TRUE
## 3       3         c           TRUE
## 4       4         d           TRUE
## 5       5         e           TRUE
## 6       6         f          FALSE
View(df)

3.12.1 Indexing and subsetting

df$numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
df["numbers"]
##    numbers
## 1        1
## 2        2
## 3        3
## 4        4
## 5        5
## 6        6
## 7        7
## 8        8
## 9        9
## 10      10
df[1,]
##   numbers lowercase logical_values
## 1       1         a           TRUE
df[,1]
##  [1]  1  2  3  4  5  6  7  8  9 10
df[3,3]
## [1] TRUE
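Rows of a data frame can also be selected with a logical condition, in the same way as the elements of a vector. A minimal sketch, assuming a data frame named people built like df above:

```r
people <- data.frame(numbers = 1:10,
                     lowercase = letters[1:10],
                     logical_values = rep(c(TRUE, FALSE), each = 5))

# Rows where the condition is TRUE, all columns
big_numbers <- people[people$numbers > 7, ]
nrow(big_numbers)        # 3
big_numbers$lowercase    # "h" "i" "j"

# A logical column can be used directly as the row filter
people[people$logical_values, "lowercase"]   # "a" "b" "c" "d" "e"

# Several columns at once: the result is still a data frame
two_cols <- people[, c("numbers", "lowercase")]
dim(two_cols)            # 10 2
```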

3.13 Coercion

Converting between data types with the as.*() functions

x<-1:10
as.list(x)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5
## 
## [[6]]
## [1] 6
## 
## [[7]]
## [1] 7
## 
## [[8]]
## [1] 8
## 
## [[9]]
## [1] 9
## 
## [[10]]
## [1] 10
l<-list(numbers=1:10,
        lowercase=letters[1:10])
l
## $numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $lowercase
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
typeof(l)
## [1] "list"
df<-as.data.frame(l)
df
##    numbers lowercase
## 1        1         a
## 2        2         b
## 3        3         c
## 4        4         d
## 5        5         e
## 6        6         f
## 7        7         g
## 8        8         h
## 9        9         i
## 10      10         j
typeof(df)
## [1] "list"
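The as.*() functions also convert between atomic classes. The sketch below shows a few common conversions on made-up values, including what happens when a value cannot be converted.

```r
as.numeric("3.14")        # 3.14
as.integer(3.9)           # 3 (the decimal part is dropped, not rounded)
as.character(1:3)         # "1" "2" "3"
as.logical(c(0, 1, 2))    # FALSE TRUE TRUE (zero is FALSE, anything else is TRUE)
as.logical("yes")         # NA (only strings such as "TRUE", "true" or "T" are recognised)

# Values that cannot be converted become NA, and R raises a warning.
# suppressWarnings() is used here only to keep the example output clean.
converted <- suppressWarnings(as.numeric(c("10", "ten")))
converted                 # 10 NA
```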

3.14 Hands on: Data types

  • Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Did the matrix function fill your matrix by column, or by row, as its default behavior?
  • Create a list of length two containing a character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
  • There are several subtly different ways to call variables, observations and elements from data frames. Try them all and discuss with your team what they return. (Hint: use the function typeof().)
  • Take the list you created in the second task and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”.