Functions & The 'Apply' Family - R Basics

In this post, we talk about functions, a fundamental concept in programming, and meet the 'Apply' family.

Hello again everyone! Hope you are having a wonderful day! First question I have is, how are you? Before we start, I just want you to reflect on yourself a bit... clear your mind of any worries or problems and focus on what you are doing now. Take a while to slow down and take it in... Are we good? If yes, then let's get started!

Today, the first thing on our agenda is functions and how to use them. Functions are one of the main building blocks when learning R and they will be used a lot when we are solving larger and more complex problems. They essentially allow us to repeat certain actions without having to write the same code over and over again.

Here's a fun fact: most of the stuff that we previously learnt is actually made of functions as well, for example, sum, mean, median, min, max. These are built-in functions in R. Sometimes R doesn't have a function that serves our needs, and after this post we will be able to build one ourselves.
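If you want to check this for yourself, here is a small sketch. It uses is.function(), a base R helper that this post doesn't otherwise cover, to ask R whether something is a function.

```r
# Built-in helpers such as sum and mean are ordinary function objects
sum_is_function <- is.function(sum)    # TRUE
mean_is_function <- is.function(mean)  # TRUE

# A plain number is not a function
number_is_function <- is.function(42)  # FALSE

# We call them the same way we will call our own functions later on
total <- sum(c(1, 2, 3))     # 6
average <- mean(c(1, 2, 3))  # 2
```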

Functions

Creating a function

To make a function, here's the syntax for R:

function_name <- function(input1,input2,input3......){
    #Execute this block of code when function is called
}

The simplest function we can make is an empty function with 0 inputs needed. For example, let's make a "Hello world!" function.

helloworld <- function(){
  print('Hello world!')
}

helloworld()
[1] "Hello world!"

Please remember the parentheses "()" after your function name. Without them, R doesn't call the function; it just prints the function's definition. Now, let's proceed to make a function that has ONE input; we'll slowly ramp up so you get the overall understanding. For this example, we will create a function that prints "Hello" followed by whatever name we pass in.

helloname <- function(name){
  print(paste('Hello',name))
}

helloname('Alice')
[1] "Hello Alice"

helloname('Brand')
[1] "Hello Brand"

Starting to get the hang of it? Cool! Now, let's do an example with TWO inputs. For this example, we will do a simple addition between two input numbers.

addition <- function(a,b){
  print(a+b)
}

addition(10,5)
[1] 15

addition(135,978)
[1] 1113

So far so good right? Some of you might be thinking that this is way too simple... and yeah, you're right, because what's important is the concept behind functions. Trust me, functions can get out of hand REALLY fast so it's important to get the basics down first.

Default values

Sometimes, there might not be an input available when you call your function. For example, let's look back at our helloname function.

helloname()
Error in paste("Hello", name) : 
  argument "name" is missing, with no default

An error pops up, letting us know that there is no default for the "name" argument, so the script stops running. Calling the function again without an input gives the same error. It's safer to have default values in your functions as anything could happen, especially with user inputs.

So how do we put in a default value? To do this, all you have to do is add "= something" after your argument. For example, let's say we want our helloname function to default to "Blank" if no input is given.

helloname <- function(name = "Blank"){
  print(paste('Hello',name))
}

helloname()
[1] "Hello Blank"

helloname('Sandy')
[1] "Hello Sandy"
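Defaults also work when a function has more than one input. The sketch below uses a made-up function called greet (it is not part of the examples above) where both arguments have a default, and shows that you can override just one of them by naming it in the call.

```r
# greet() is a hypothetical function made up for this illustration
greet <- function(name = "Blank", greeting = "Hello") {
  paste(greeting, name)
}

# No inputs: both defaults are used
both_defaults <- greet()                  # "Hello Blank"

# One input: it fills the first argument, greeting keeps its default
one_input <- greet("Sandy")               # "Hello Sandy"

# Naming the argument lets us override only the second one
named_input <- greet(greeting = "Goodbye")  # "Goodbye Blank"

# Two inputs: both defaults are replaced, in order
two_inputs <- greet("Sandy", "Welcome")   # "Welcome Sandy"
```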

Returning Values

Now we are going to talk about the return() function. return() hands a value back from a function so that you can assign it to a variable. For those coming from another programming language (like Python), you may be familiar with return already, but if R is your first ever programming language, I have some good news for you! R doesn't need an explicit return (woohoo!). By default, an R function returns the value of the last expression it evaluates. It is still good practice to use return in your R script when it makes the code easier to read. Let me show you what happens when you leave out return in R and in Python to better illustrate this.

# In R
add_one <- function(x){
  #No return statement
  x + 1
}

print(add_one(5))
[1] 6

# In Python
def add_one(x):
  #No return statement
  x + 1

print(add_one(5))
None
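To see where an explicit return() earns its place, here is a minimal sketch. The function safe_divide is made up for this illustration, and it uses an if statement to leave the function early when the second input is zero. Otherwise it falls through to the last line, whose value is returned automatically.

```r
# safe_divide() is a hypothetical function made up for this illustration
safe_divide <- function(a, b) {
  if (b == 0) {
    # Leave the function right away; the division below never runs
    return(NA)
  }
  # Last expression evaluated, so this is what the function returns
  a / b
}

# The returned value can be stored in a variable and reused
result <- safe_divide(10, 2)     # 5
doubled <- result * 2            # 10

# Dividing by zero takes the early return instead
by_zero <- safe_divide(1, 0)     # NA
```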

Now that we finished learning functions, it's time to meet the 'Apply' family!

The 'Apply' Family

When I was talking about functions, I mentioned how they are used to repeat blocks of code without having to type them out over and over again. This is where apply comes in. The apply functions allow us to apply a function to every element of an object such as a vector or list. In the base package of R, there are 8 members in the 'Apply' family:

  • apply
  • eapply
  • lapply √
  • mapply
  • rapply
  • sapply √
  • tapply
  • vapply

But for the basics, we will just cover 2 of them as you will be using these 2 most of the time. Starting off with...

lapply

The lapply() function will run our function on each element of a vector or list and (most importantly) it will always return a list of the same length as the input. Let's look at an example first:

#Let's create a vector
v <- c(5,5,5,5,5,5)

#Now we create a function to generate a random number
#Then we add that number to our variable
randomadd <- function(a){
  random <- sample(x=1:10,1)
  a + random
}

lapply(v,randomadd)
[[1]]
[1] 15

[[2]]
[1] 12

[[3]]
[1] 13

[[4]]
[1] 12

[[5]]
[1] 7

[[6]]
[1] 12

As you can see, by using lapply, the function runs through for every element within our vector and then it generates a list for us.
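Because randomadd gives different numbers on every run, here is a second sketch with a predictable function so you can check the results yourself. The names square, numbers and words are made up for this illustration. It also shows unlist(), a base R function that flattens a list back into a vector.

```r
# square() is a hypothetical helper made up for this illustration
square <- function(x) {
  x^2
}

numbers <- c(1, 2, 3, 4)
squares <- lapply(numbers, square)

# The result is a list with one entry per input element
result_is_list <- is.list(squares)                      # TRUE
same_length <- length(squares) == length(numbers)       # TRUE

# Double square brackets pull a single entry out of the list
third_square <- squares[[3]]                            # 9

# unlist() flattens the list back into a plain vector
flat <- unlist(squares)                                 # 1 4 9 16

# lapply also accepts a list as its input
words <- list("apple", "kiwi", "banana")
word_lengths <- lapply(words, nchar)                    # a list holding 5, 4, 6
```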

sapply

If you don't want R to return your results in a list but want a vector or matrix instead, then we need to use sapply(). It works like lapply() but simplifies the result where it can. It can also be used to return an array but I haven't learnt this myself so I won't go into it.

sapply(v,randomadd)
[1] 11  6 14 13  6 10
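So when do you get a vector and when do you get a matrix? As a rough sketch: if the function gives back one value per element, sapply returns a vector, and if it gives back several values per element, sapply stacks them as the columns of a matrix. The functions half and around below are made up for this illustration.

```r
# half() and around() are hypothetical helpers made up for this illustration
half <- function(x) {
  x / 2
}
around <- function(x) {
  c(below = x - 1, above = x + 1)
}

inputs <- c(10, 20, 30)

# One value per element: the result is a plain vector
halves <- sapply(inputs, half)      # 5 10 15

# Two values per element: the result is a matrix with
# one column per input element and one row per returned value
ranges <- sapply(inputs, around)
matrix_shape <- dim(ranges)         # 2 rows, 3 columns
last_above <- ranges["above", 3]    # 31
```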

Now, you might be wondering, what happens if I DON'T use lapply or sapply? What if I just placed our 'v' variable into the function itself?

randomadd(v)
[1] 8 8 8 8 8 8

randomadd(v)
[1] 9 9 9 9 9 9

If you noticed, what R essentially did was that it generated a random number ONCE, then added that same number to every element in our vector. That's the main difference of using a function with and without lapply or sapply.

Let me repeat myself: when using lapply or sapply, the function runs once for every element in our vector (meaning it generates a new random number for each element), whereas when you just put your vector into your function, the function runs once (generating a single random number) and then adds that number to every element in your vector.
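If you would like to see this without relying on random numbers, one way is to count how many times the function actually runs. This is only a sketch: counted_add and calls are names made up for this illustration, and it uses the <<- operator (not covered in this post) to update a variable that lives outside the function.

```r
# calls and counted_add() are hypothetical names made up for this illustration
calls <- 0
counted_add <- function(a) {
  # <<- updates the calls variable defined outside the function
  calls <<- calls + 1
  a + 1
}

v <- c(5, 5, 5, 5, 5, 5)

# Passing the whole vector in: the function runs a single time
direct <- counted_add(v)
calls_direct <- calls        # 1

# Reset the counter, then let sapply call the function per element
calls <- 0
per_element <- sapply(v, counted_add)
calls_sapply <- calls        # 6, one call for each element of v

# Here the results match because nothing random is involved
same_result <- identical(direct, per_element)  # TRUE
```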

For now, just knowing lapply() and sapply() is good enough for you to proceed with your learning journey as you will be using these 2 most of the time. However, if you are curious about the other apply functions and their applications, feel free to read the R documentation by typing ??apply in the RStudio console or read through the answer in this StackOverflow post which serves as an excellent simple guide.

Anonymous functions

I mentioned how complicated functions can potentially get, but sometimes we just need a very simple, one-line function to finish the job. This is where anonymous functions come in. Up until now, I have given a name to every function I wrote, but we can also use a function without naming it and write its whole body directly inside our lapply() or sapply() call. Let's convert our random number addition function into an anonymous function.

sapply(v, function(a){a+sample(x=1:10,1)})
[1]  9  7 10 15 12  7

As you can see, it definitely works; however, cramming the whole function into one call creates a dense line of code that might be hard to read at first glance, even for something as simple as this. So be careful about when you use this.
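To convince yourself that a named function and an anonymous one really do the same job, here is a small sketch without any randomness. The names add_ten and numbers are made up for this illustration.

```r
# add_ten() and numbers are hypothetical names made up for this illustration
numbers <- c(1, 2, 3, 4, 5)

# Named version: define the function first, then pass its name
add_ten <- function(a) {
  a + 10
}
named_result <- sapply(numbers, add_ten)                     # 11 12 13 14 15

# Anonymous version: write the function directly inside the call
anonymous_result <- sapply(numbers, function(a) { a + 10 })  # 11 12 13 14 15

# Both approaches give exactly the same vector
same_result <- identical(named_result, anonymous_result)     # TRUE
```

A rule of thumb that might help: if the function fits comfortably on one line and is only needed in one place, an anonymous function is fine; otherwise, giving it a name usually makes the script easier to read.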

Okay, that's all for today! You are another step further along in your learning journey and you should be proud of yourself. Remember that learning something new is a marathon; it's the journey and the experience we gain that matter. The destination is only important as an end goal for motivation. Don't rush to it, otherwise you might burn out before you even get there. Stay safe everyone and see you next time!