What Is a Matrix and How to Use One - R Basics

In this post, we talk about matrices. What are they? How do we use them? Can we dodge bullets in slow motion?

In the previous post, we learnt the basics of vectors and what we can do with them. Now it's time to turn it up a notch with MATRICES. A matrix is essentially a 2-dimensional vector: its elements are arranged in rows and columns (bear with my explanation, it will make sense soon). Not to be confused with The Matrix movie, where you can dodge bullets in slow motion (sorry to disappoint).

Creating a Matrix

First, let's create a vector holding the numbers 1 to 10.

> v1 <- 1:10
> v1
 [1]  1  2  3  4  5  6  7  8  9 10

As you can see, the numbers are all printed on a single line, because a vector has only one dimension: there are no rows or columns. To create a matrix, we simply use the matrix() function.

> matrix(v1)
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10

In this example, matrix(v1) turned our vector v1 into a matrix with 10 rows and just 1 column. On the left, you see numbers in square brackets such as [1,] or [4,]. These are the row numbers of the matrix, and you can use them to index / slice a specific section of the matrix. The same goes for the numbers at the top such as [,1], which are the column numbers. I'll get into this soon.

To split the matrix into the number of rows you want, just add an 'nrow = xx' argument to your matrix() call. In the 2nd example, I told R that I want my matrix to have 2 rows, so now it has 2 rows and 5 columns.

> matrix(v1,nrow=2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
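
As a small sketch that goes slightly beyond the example above: matrix() also accepts an 'ncol' argument if you would rather fix the number of columns, and dim() tells you the shape of the result. The names m.rows and m.cols are made up for illustration.

```r
# Build the same vector as in the post
v1 <- 1:10

# Ask for 2 rows; R works out that 5 columns are needed
m.rows <- matrix(v1, nrow = 2)

# Ask for 2 columns instead; R works out that 5 rows are needed
m.cols <- matrix(v1, ncol = 2)

# dim() returns the number of rows followed by the number of columns
print(dim(m.rows))  # 2 5
print(dim(m.cols))  # 5 2
```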

R fills a matrix column by column by default. If you want R to fill it row by row instead, just add a 'byrow = TRUE' argument. You can see in the 3rd example that my matrix still has 2 rows, but is now filled up along the rows 1st, then the columns.

#Let's assign this to a variable
> v1m <- matrix(v1,nrow=2,byrow = TRUE)
#An assignment prints nothing, so type the name to see the result
> v1m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10

Some of you might be curious and wondering... what happens if the number of elements does not divide evenly by my matrix's number of rows? The best way to learn and remember is to try it out yourself ;)

We can also create matrices from vectors and give names to the rows and columns by using colnames() and rownames() for easier reference.

#2 FAKE Stock Prices for the week
> abcd <- c(150,153,152,151,151)
> ffgg <- c(101,95,98,103,100)

> stocks <- c(abcd,ffgg)
#byrow = TRUE fills the matrix row by row, so each stock gets its own row
> stocks.matrix <- matrix(stocks,nrow=2,byrow=TRUE)
> stocks.matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]  150  153  152  151  151
[2,]  101   95   98  103  100

#Hmmmm... not sure what these numbers mean... time to add some names!
> days <- c('Mon','Tue','Wed','Thu','Fri')
> stock.names <- c('abcd','ffgg')

> colnames(stocks.matrix) <- days
> rownames(stocks.matrix) <- stock.names
> stocks.matrix
     Mon Tue Wed Thu Fri
abcd 150 153 152 151 151
ffgg 101  95  98 103 100
#Now this looks a lot more understandable

Similar to vectors, you can perform basic arithmetic on matrices (addition, subtraction, multiplication, division, exponents) and use comparison operators in the same way; each operation is applied element by element. I won't get into that, you can just play around in RStudio to get an understanding. You can also add a matrix and a vector together, but you will realise that R lines the vector up with the matrix going down the columns 1st, repeating the vector if it runs out. Just a little additional information. Making it line up along the rows 1st gets a little complicated and I won't get into that. If you are really curious, google for your solutions. The reality of programming is 90% googling and finding solutions that fit your needs.

Matrix Operations

Continuing on from our stocks.matrix example, we also have a few functions that might suit your needs.

#colSums() will return the sum of every column within your matrix
> colSums(stocks.matrix)
Mon Tue Wed Thu Fri 
251 248 250 254 251 

#rowSums() will return the sum of every row within your matrix
> rowSums(stocks.matrix)
abcd ffgg 
 757  497 

#rowMeans() will return the mean (average) of each row
> rowMeans(stocks.matrix)
 abcd  ffgg 
151.4  99.4 

#And lastly, colMeans() will return the mean (average) of each column
> colMeans(stocks.matrix)
  Mon   Tue   Wed   Thu   Fri 
125.5 124.0 125.0 127.0 125.5 

There are 2 more very fun functions that you will definitely need in your future and those are rbind() and cbind(). Just as the names suggest, rbind() binds a new row onto an existing matrix while cbind() binds a new column onto it.

#Using our stocks.matrix again, let's say we want to add another stock into the matrix, we would need another row
#Our new stock
> xyxy <- c(75.35,73.21,77.15,73,72.5)
> stocks.matrix <- rbind(stocks.matrix,xyxy)
> stocks.matrix
        Mon    Tue    Wed Thu   Fri
abcd 150.00 153.00 152.00 151 151.0
ffgg 101.00  95.00  98.00 103 100.0
xyxy  75.35  73.21  77.15  73  72.5

#Now we want to add an average price column into our matrix
> average <- rowMeans(stocks.matrix)
> stocks.matrix <- cbind(stocks.matrix,average)
> stocks.matrix
        Mon    Tue    Wed Thu   Fri average
abcd 150.00 153.00 152.00 151 151.0 151.400
ffgg 101.00  95.00  98.00 103 100.0  99.400
xyxy  75.35  73.21  77.15  73  72.5  74.242
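
For the curious, here is a minimal sketch of the matrix-plus-vector arithmetic mentioned earlier. It uses a small made-up matrix m rather than stocks.matrix, and sweep() from base R is just one possible way to apply a vector along the rows; other approaches exist.

```r
# A small 2 x 3 matrix, filled row by row
m <- matrix(1:6, nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Arithmetic with a single number is applied to every element
m.doubled <- m * 2

# Adding a vector of length 2: the vector is recycled down each column,
# so 10 ends up added to every element of row 1 and 20 to row 2
by.col <- m + c(10, 20)

# To lay a vector of length 3 along each row instead, sweep() with
# MARGIN = 2 applies one value per column
by.row <- sweep(m, MARGIN = 2, STATS = c(100, 200, 300), FUN = "+")

print(by.col)
#      [,1] [,2] [,3]
# [1,]   11   12   13
# [2,]   24   25   26

print(by.row)
#      [,1] [,2] [,3]
# [1,]  101  202  303
# [2,]  104  205  306
```

Note that * here multiplies element by element; it is not the matrix product from linear algebra.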

Matrix Indexing and Slicing

Time for some slicing and dicing again~ Actually there's nothing much to talk about here. If you are already familiar with slicing and indexing with vectors, then you should have no problem here. The only additional thing is that, instead of indexing/slicing with one dimension, you do it with two.

Referring back to our v1m matrix, we have a matrix with 2 rows, labelled [x,], and 5 columns, labelled [,x]. To pick out a specific element within our matrix, just use matrix_name[rownum,colnum]. So if you want to pick out the number 5, you would do the following:

#The number 5 is on the 1st row and 5th column
> v1m[1,5]
[1] 5

You can also use the ":" operator to pick out a slice of the matrix with a sequence.

> v1m[1,2:4]
[1] 2 3 4
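
A few more indexing patterns, sketched here on the assumption that they behave the same way as with vectors. The variable names and the column names 'a' to 'e' are made up for illustration.

```r
# Rebuild the v1m matrix from earlier in the post
v1m <- matrix(1:10, nrow = 2, byrow = TRUE)

# Leave the row index blank to take a whole column (here, column 3)
col3 <- v1m[, 3]    # 3 8

# Leave the column index blank to take a whole row (here, row 2)
row2 <- v1m[2, ]    # 6 7 8 9 10

# Slice a block: both rows, columns 2 to 4
block <- v1m[1:2, 2:4]

# A negative index drops that row or column (here, the 1st column)
no.first.col <- v1m[, -1]

# Once a matrix has names, you can index by name as well
colnames(v1m) <- c('a', 'b', 'c', 'd', 'e')  # hypothetical column names
by.name <- v1m[1, 'c']

print(block)
print(no.first.col)
print(by.name)
```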

Factors and Categorical Variables

Our final section for tonight, woohoo! Great job if you made it this far. Take a breather or keep pushing on if you like. You know yourself the best and I can't stop you even if I should, but just remember to understand the knowledge and take some time to play around for a bit. You might find some things that I didn't know or didn't cover. If you're ready to continue, then read on!

Now I'm going to talk about factoring. For people who don't know what factoring is (like me, before this), factoring essentially means categorizing. In R, if you provide a vector or a row/column of a matrix made up of character strings, R just treats each string as plain text with no notion of categories. Let's say you had a vector of answers where you asked people whether they felt hot, cold, warm or chilly.

> answers <- c('hot','cold','cold','chilly','warm','warm','hot','warm')

When I try to get a summary of the vector using summary(), R only tells me that it is a character vector of length 8.

> summary(answers)
   Length     Class      Mode 
        8 character character

After factoring, R groups these strings into categories (called levels) based on their unique values, like you see here.

> fact.ans <- factor(answers)
> fact.ans
[1] hot    cold   cold   chilly warm   warm   hot    warm  
Levels: chilly cold hot warm

Now if we run summary() on the factored vector, it gives a helpful count of the number of answers in each category / level.

> summary(fact.ans)
chilly   cold    hot   warm 
     1      2      2      3
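
To peek under the hood, here is a small sketch using levels(), as.integer() and table() from base R. The names lv, codes and counts are made up for illustration.

```r
answers <- c('hot','cold','cold','chilly','warm','warm','hot','warm')
fact.ans <- factor(answers)

# The unique categories, in alphabetical order by default
lv <- levels(fact.ans)         # "chilly" "cold" "hot" "warm"

# Underneath, each answer is stored as its position within levels()
codes <- as.integer(fact.ans)  # 3 2 2 1 4 4 3 4

# table() counts each unique value and also works directly
# on the original character vector
counts <- table(answers)

print(lv)
print(codes)
print(counts)
```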

For normal categorical variables, this would be fine as it is. But for this scenario, I want my categories ordered from cold to hot. By default, R sorts the levels in alphabetical order, so we need to tell R the order that we would like to use. We can do this by adding the "ordered" and "levels" arguments to our factor() call. There are 2 ways of doing this:

#Passing a vector directly to the levels argument (easier to follow for beginners)
> fact.ans <- factor(answers, ordered=TRUE, levels=c('cold','chilly','warm','hot'))
> summary(fact.ans)
  cold chilly   warm    hot 
     2      1      3      2 

#Assigning the ordered vector into a variable
#Then using that variable in our levels argument (easier readability)
> order.ans <- c('cold','chilly','warm','hot')
> fact.ans <- factor(answers, ordered=TRUE, levels=order.ans)
> summary(fact.ans)
  cold chilly   warm    hot 
     2      1      3      2
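
One payoff of an ordered factor, sketched below, is that comparison operators and min() / max() follow the order you gave rather than the alphabet. The names is.warmer, n.warmer, coldest and hottest are made up for illustration.

```r
answers <- c('hot','cold','cold','chilly','warm','warm','hot','warm')
order.ans <- c('cold','chilly','warm','hot')
fact.ans <- factor(answers, ordered = TRUE, levels = order.ans)

# Comparisons now respect cold < chilly < warm < hot
is.warmer <- fact.ans > 'chilly'

# Count how many people answered something warmer than 'chilly'
n.warmer <- sum(is.warmer)   # 5 (3 x warm + 2 x hot)

# min() and max() pick the lowest and highest level that appears
coldest <- min(fact.ans)
hottest <- max(fact.ans)

print(n.warmer)
print(coldest)
print(hottest)
```

These comparisons only work when ordered = TRUE; on a plain factor, R returns NA with a warning instead.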

Congratulations!!! You finished the blog post and read (hopefully learnt) the basics of matrices. Take some time off to absorb and digest what you just learnt, don't rush into the next one. Most importantly, play around with imaginary scenarios and type them in R, or just write them down in a blog or notebook. Writing this post definitely did help me remember matrices a lot better. Thank you for reading and see you again next time!