R Functions

https://github.com/RUMgroup/R-Functions/

Martin Herrerias Azcue

ResearchIT, University of Manchester

2023-11-28

Function Syntax

Functions are blocks of code that process input arguments and return some output.

Unlike scripts, they have their own environment and clearly defined inputs and outputs.

R/function_syntax.R
# multiply two numbers, and say hello [documentation]
multiply <- function(a, b) { # [header]: [name] <- function([arguments])

  # [body]
  answer <- a * b
  print("hello") # [side effects]

  return(answer) # [return value]
}

Function Syntax (cont.)

source("R/function_syntax.R")
result <- multiply(6, 7) # [function call]
[1] "hello"
result
[1] 42
ls()
[1] "multiply" "result"  

Note

  • Functions are objects stored in variables, so the usual variable naming rules apply to function names
  • The code in the function body runs when the function is called, not when it’s sourced
  • Variables inside a function are private (e.g. a, b, and answer exist only inside multiply)
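The three notes above can be checked directly in a fresh session. The sketch below is only illustrative: `shout` and `loud` are made-up names, not part of the original example.

```r
# Defining a function only creates an object; the body does not run yet.
shout <- function(text) {
  loud <- toupper(text) # `loud` exists only while the function runs
  return(paste0(loud, "!"))
}

class(shout) # a function is an ordinary object bound to a name

# The body runs only when the function is called.
greeting <- shout("hello")
greeting

# Local variables are not visible from outside the function.
exists("loud", inherits = FALSE)
```

Here `class(shout)` reports "function", `greeting` holds "HELLO!", and the final `exists()` call is FALSE because `loud` was never created in the calling environment.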

Motivation

After years of research, you’ve come up with the following script:

R/bread_script.R
# An example script that defines a bread recipe

flour_weight <- 1.0
water_weight <- 0.7 * flour_weight
bread_weight <- 0.9 * (flour_weight + water_weight)
cat("Bread weight:", bread_weight, "kg", "\n")
source("R/bread_script.R")
Bread weight: 1.53 kg 
ls()
[1] "bread_weight" "flour_weight" "water_weight"

How NOT to write code variations

R/worst_idea.R
# How NOT to write variations of the same code

flour_weight <- 1.0
water_weight <- 0.7 * flour_weight
bread_weight <- 0.9 * (flour_weight + water_weight)
cat("Bread weight:", bread_weight, "kg", "\n")

flour_weight_2 <- 2.0
water_weight_2 <- 0.8 * flour_weight_2
bread_weight_2 <- 0.9 * (flour_weight_2 + water_weight_2)
cat("Bread weight:", bread_weight_2, "kg", "\n")
source("R/worst_idea.R")
Bread weight: 1.53 kg 
Bread weight: 3.24 kg 

Problems

ls()
[1] "bread_weight"   "bread_weight_2" "flour_weight"   "flour_weight_2"
[5] "water_weight"   "water_weight_2"
  • Code duplication (errors have to be fixed in multiple places)
  • Proliferation of variables (hard to keep using meaningful names)
  • Verbosity (hard to grasp the “big picture”)

Better, but not quite there yet

R/parametric_script.R
# Parametric bread recipe script.
# Requires: flour_weight, hydration, water_loss

water_weight <- hydration * flour_weight
bread_weight <- (1 - water_loss) * (flour_weight + water_weight)
cat("Bread weight:", bread_weight, "kg", "\n")
R/still_bad.R
# Calls R/parametric_script.R with different parameters

rm(list = ls()) # clear the workspace

flour_weight <- 1.0
hydration <- 0.7
water_loss <- 0.1

source("R/parametric_script.R")
bread_weight_1 <- bread_weight

flour_weight <- 2.0
hydration <- 0.8

source("R/parametric_script.R")
bread_weight_2 <- bread_weight
source("R/still_bad.R")
Bread weight: 1.53 kg 
Bread weight: 3.24 kg 

Problems

  • Code duplication
  • Proliferation of variables
  • Verbosity
  • Global variables (hard to track which script changes what)
# Variable used for something else
water_weight <- 10000

# ... many lines later
source("R/parametric_script.R")
Bread weight: 3.24 kg 
water_weight  # forgot my script also uses `water_weight`
[1] 1.6

Using a function

R/bread_function.R
# Function to calculate bread weight
get_bread_weight <- function(flour_weight, hydration = 0.7, water_loss = 0.1) {

  water_weight <- hydration * flour_weight
  bread_weight <- (1 - water_loss) * (flour_weight + water_weight)
  cat("Bread weight:", bread_weight, "kg", "\n")

  return(bread_weight)
}
R/main.R
# Calls get_bread_weight with different parameters

rm(list = ls()) # clear the workspace

# loads get_bread_weight function
source("R/bread_function.R")

bread_weight_1 <- get_bread_weight(flour_weight = 1.0, hydration = 0.7)
bread_weight_2 <- get_bread_weight(flour_weight = 2.0, hydration = 0.8)
source("R/main.R")
Bread weight: 1.53 kg 
Bread weight: 3.24 kg 
ls()
[1] "bread_weight_1"   "bread_weight_2"   "get_bread_weight"
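Because the function has its own environment, the name clash that affected the parametric script no longer matters. A minimal sketch, repeating the function definition so that it runs on its own:

```r
get_bread_weight <- function(flour_weight, hydration = 0.7, water_loss = 0.1) {
  water_weight <- hydration * flour_weight
  bread_weight <- (1 - water_loss) * (flour_weight + water_weight)
  cat("Bread weight:", bread_weight, "kg", "\n")
  return(bread_weight)
}

# Variable used for something else
water_weight <- 10000

bread_weight_2 <- get_bread_weight(flour_weight = 2.0, hydration = 0.8)

# Unchanged: the `water_weight` inside the function is a separate variable
water_weight
```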

Arguments

# Arguments can be passed by name or position
multiply <- function(a, b) {
  return(a * b)
}
multiply(a = 6, b = 7)
[1] 42
multiply(6, 7)
[1] 42
# No arguments
give_me_five <- function() {
  return(5)
}
give_me_five()
[1] 5
# Unspecified number of arguments
printer <- function(...) {
  cat("arguments:", ..., "\n")
}
printer(1, 2, "testing")
arguments: 1 2 testing 
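Two further details of argument matching may be useful here: named arguments can appear in any order, and `...` can be forwarded to another function. The sketch below uses hypothetical helpers (`divide`, `total`) that are not part of the original examples.

```r
divide <- function(numerator, denominator) {
  return(numerator / denominator)
}

# Named arguments can be given in any order
divide(denominator = 4, numerator = 2)

# Named arguments are matched first; the rest are matched by position
divide(denominator = 4, 2)

# `...` can be passed on to another function unchanged
total <- function(...) {
  return(sum(...))
}
total(1, 2, 3)
```

Both calls to `divide` give 0.5, and `total(1, 2, 3)` gives 6.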

Arguments (defaults)

# Default arguments
raise_to_power <- function(base, exponent = 2) {
  return(base^exponent)
}
raise_to_power(base = 3)
[1] 9
raise_to_power(base = 3, exponent = 3)
[1] 27
# Defaults can refer to other arguments (they are evaluated when first used)
box <- function(a, b = a, c = b) {
  return(a * b * c)
}
box(2)
[1] 8
box(2, 3)
[1] 18
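Defaults combine naturally with named arguments: naming a later argument lets you skip an earlier one and keep its default. A short sketch reusing the `box` function from above (repeated here so the block runs on its own):

```r
box <- function(a, b = a, c = b) {
  return(a * b * c)
}

# Skip `b` by naming `c`: b falls back to a, so this is 2 * 2 * 5
box(2, c = 5)

# Supplying every argument means no default is used: 2 * 3 * 1
box(2, b = 3, c = 1)
```

The first call returns 20 and the second returns 6.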

Return values

return is optional

# without explicit `return`, the last expression is returned
multiply <- function(a, b) {
  a * b
}
multiply(6, 7)
[1] 42
# (this can easily lead to errors: here the last expression is cat(), which returns NULL)
surprise <- function(a, b) {
  result <- a * b
  cat("The answer is", result, "\n")
}
answer <- surprise(6, 7)
The answer is 42 
answer
NULL
# better...
no_surprise <- function(a, b) {
  result <- a * b
  cat("The answer is", result, "\n")
  return(result)
}
answer <- no_surprise(6, 7)
The answer is 42 
answer
[1] 42
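Two points behind these examples can be checked in isolation: `cat()` itself returns NULL, and `return()` leaves the function immediately. The helper `describe_sign` below is a hypothetical example, not part of the original material.

```r
# cat() prints its arguments and returns NULL, which is what `surprise` passes on
printed <- cat("The answer is", 42, "\n")
is.null(printed)

# return() exits the function immediately
describe_sign <- function(x) {
  if (x < 0) {
    return("negative") # nothing below this line runs for negative x
  }
  return("non-negative")
}

describe_sign(-3)
describe_sign(5)
```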

Multiple return values

# use a named list for multiple return values
rectangle <- function(a, b) {

  area <- a * b
  perimeter <- 2 * (a + b)

  return(list(area = area, perimeter = perimeter))
}
result <- rectangle(2, 3)
result$area
[1] 6
result$perimeter
[1] 10

See also the zeallot package, and base::list2env.
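As a rough illustration of the base R options, the sketch below unpacks the returned list with `with()` and with `list2env()`. It sticks to base R so that nothing needs installing; the zeallot package is not used here.

```r
rectangle <- function(a, b) {
  list(area = a * b, perimeter = 2 * (a + b))
}

# Evaluate an expression using the list elements as variables
ratio <- with(rectangle(2, 3), area / perimeter)
ratio

# Copy the list elements into the current environment as variables
invisible(list2env(rectangle(2, 3), envir = environment()))
area
perimeter
```

Note that `list2env()` silently overwrites any existing variables with the same names, so the `with()` form is usually the safer of the two.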

Environments

  • Variables inside a function are private
  • The same name can be used for different things inside and outside
rm(list = ls())

print_vars <- function(a, b) {
  cat("a =", a, "b =", b, "\n")
}

b <- 1000
print_vars(a = 2, b = 3)
a = 2 b = 3 
b
[1] 1000

Environments (cont.)

  • Variables outside can (but shouldn’t!) be viewed from inside
  • Variables outside can (but shouldn’t!) be modified from inside
rm(list = ls())

bad_example <- function(x = 1) {

  cat("variables inside:", ls(), "\n")

  # search in parent environments (AVOID!)
  cat("outside_c = ", outside_c, "\n") 

  # assignment in parent environment (AVOID!)
  outside_d <<- 42 
}

outside_c <- 2
outside_d <- 3
bad_example()
variables inside: x 
outside_c =  2 
outside_d
[1] 42
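One way to avoid both patterns is to pass in whatever the function needs as an argument and let the caller store the result. A minimal sketch, where `good_example` and `c_value` are illustrative names only:

```r
good_example <- function(c_value) {
  cat("c_value = ", c_value, "\n") # received as an argument, not looked up outside
  return(42) # the caller decides where to store the result
}

outside_c <- 2
outside_d <- 3

# The change to outside_d is now visible at the call site
outside_d <- good_example(outside_c)
outside_d
```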

Documentation

Describe what the function does, each of the arguments, and the return value.

You might want to use roxygen2 style comments.

R/bread_function.R
#' Calculates bread weight
#'
#' @param flour_weight Weight of flour in kg
#' @param hydration Water content per unit flour [0, 1]
#' @param water_loss Water loss fraction during baking [0, 1]
#' @return Weight of baked bread in kg
#' @export
get_bread_weight <- function(flour_weight,
                             hydration = 0.7,
                             water_loss = 0.1) {
    # ...
}
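For a complete, runnable illustration, the `multiply` function from the first example could be documented in the same way. The wording of the descriptions is only a suggestion. roxygen2 comments are ordinary R comments, so the file can still be sourced as usual; they are turned into help pages only when the function is part of a package.

```r
#' Multiply two numbers
#'
#' @param a A numeric value
#' @param b A numeric value
#' @return The product of `a` and `b`
#' @examples
#' multiply(6, 7)
multiply <- function(a, b) {
  return(a * b)
}

multiply(6, 7)
```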

Final notes

Advantages of functions

Reusability, Parametrization, Readability, Maintainability, Encapsulation, …

Modularity, Collaboration, Testing, Debugging (base::debugonce).
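As a small illustration of the testing and debugging points, the sketch below checks a simplified copy of `get_bread_weight` (without the printed message) using `stopifnot()`. The `debugonce()` call is left commented out because it only makes sense in an interactive session.

```r
get_bread_weight <- function(flour_weight, hydration = 0.7, water_loss = 0.1) {
  water_weight <- hydration * flour_weight
  return((1 - water_loss) * (flour_weight + water_weight))
}

# A minimal test: stopifnot() raises an error if any condition is not TRUE.
# all.equal() compares numbers with a small tolerance for rounding.
stopifnot(
  isTRUE(all.equal(get_bread_weight(1.0), 1.53)),
  isTRUE(all.equal(get_bread_weight(2.0, hydration = 0.8), 3.24))
)

# Debugging (interactive sessions only): step through the next call line by line
# debugonce(get_bread_weight)
# get_bread_weight(1.0)
```

Because the function depends only on its arguments, the checks can run anywhere without any prior setup of the workspace.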

Where to go next

Join the R User Group!

RUM space on CADIR (MS Teams): https://bit.ly/RUserGroup