5  Statistical Functions

5.1 Introduction

R provides built-in functions for generating random numbers, evaluating probability density and cumulative distribution functions, and determining quantiles/percentiles. These functions are available for a variety of probability distributions.

5.2 Statistical functions

R has several built-in statistical distributions. For each distribution there are four functions, each named with a one-letter prefix that indicates its purpose. These prefixes are:

  • r: random number generator,
  • d: density function,
  • p: cumulative distribution function,
  • q: quantile function.

Each of these prefixes is followed by a short abbreviation of the name of the distribution. In Table 5.1, we summarize some of the most commonly used distributions in R along with their corresponding function name abbreviations.

Table 5.1: Distributions available in R
R function  Distribution         R function  Distribution         R function  Distribution
beta        Beta                 logis       Logistic             binom       Binomial
nbinom      Negative binomial    cauchy      Cauchy               norm        Normal
exp         Exponential          pois        Poisson              chisq       Chi-squared
f           Fisher’s F           t           Student’s t          gamma       Gamma
unif        Uniform              geom        Geometric            weibull     Weibull
hyper       Hypergeometric       mvnorm      Multivariate normal  wilcox      Wilcoxon

Note: the mvnorm functions are not part of base R; they are provided by the mvtnorm package.

Below, we provide some illustrative examples of how to use these functions in R.

# Density function of normal distribution evaluated at 1.96
dnorm(1.96, mean=0, sd=1) 
[1] 0.05844094
# Cumulative distribution of normal distribution evaluated at 1.96
pnorm(1.96, mean=0, sd=1) 
[1] 0.9750021
# Upper-tail probability of the normal distribution, P(X > 1.96)
pnorm(1.96, mean=0, sd=1, lower.tail=FALSE)
[1] 0.0249979
# The 97.5th percentile (or the 0.975 quantile) of the normal distribution
qnorm(0.975, mean=0, sd=1) 
[1] 1.959964
# Generate 5 random numbers from the normal distribution
rnorm(5, mean=0, sd=1) 
[1] -1.2741114  0.3322055  1.3369019  1.7190191 -0.1015594
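
The same four prefixes apply to the other distributions in Table 5.1. As a minimal sketch, the code below uses the binomial distribution; the parameter values (10 trials, success probability 0.5) and the variable names are assumptions chosen for illustration, not values from the text. It also checks that, for a continuous distribution, the q function undoes the p function.

```r
# Binomial distribution with 10 trials and success probability 0.5
# (illustrative parameter values, assumed for this example)
n_trials <- 10
p_success <- 0.5

# d: probability of exactly 3 successes, P(X = 3)
prob_eq_3 <- dbinom(3, size = n_trials, prob = p_success)

# p: probability of at most 3 successes, P(X <= 3)
prob_le_3 <- pbinom(3, size = n_trials, prob = p_success)

# For a discrete distribution, the p value is the sum of the d values up to x
all.equal(prob_le_3, sum(dbinom(0:3, size = n_trials, prob = p_success)))

# q: the median, i.e. the smallest x with P(X <= x) >= 0.5
median_x <- qbinom(0.5, size = n_trials, prob = p_success)

# r: five random draws (the values vary from run to run)
draws <- rbinom(5, size = n_trials, prob = p_success)

# For a continuous distribution, q is the inverse of p
z <- qnorm(pnorm(1.96))
all.equal(z, 1.96)
```

Note that for discrete distributions the d function returns a probability, P(X = x), rather than a density.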

5.3 Setting the seed for random number generation

In simulation studies, it is often important to ensure that the results are reproducible: if the same analysis is run multiple times, it should produce the same results each time. One way to achieve this is to set a seed for the random number generator. The simplest way to specify the initial state, or seed, is to call set.seed(seed), where the argument seed is a single integer value; different seeds give different pseudo-random values. If we call set.seed with the same seed and then repeat the same sequence of calls exactly, we obtain the same results. If no seed is specified, R creates one from the current time and the process ID the first time random numbers are needed.

set.seed(17632)
rnorm(5)
[1] -1.8399628 -0.7903303  0.7797193 -0.6107181 -0.2786504

In the example above, we set the seed to 17632 before generating 5 random numbers from the standard normal distribution. If we run this code multiple times, we will get the same set of random numbers each time.

set.seed(17632)
rnorm(5)
[1] -1.8399628 -0.7903303  0.7797193 -0.6107181 -0.2786504

If we change the seed value, we will get a different set of random numbers.

set.seed(12345)
rnorm(5)
[1]  0.5855288  0.7094660 -0.1093033 -0.4534972  0.6058875
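
These comparisons can also be made programmatically rather than by eye. The following sketch (the variable names x1, x2, x3, and extra are introduced here for illustration) uses identical() to confirm that resetting the seed reproduces the draws. It also shows why the sequence of calls must be repeated exactly: one extra call to the generator after set.seed changes everything that follows.

```r
# Same seed, same sequence of calls: the draws match exactly
set.seed(17632)
x1 <- rnorm(5)
set.seed(17632)
x2 <- rnorm(5)
identical(x1, x2)   # TRUE

# Same seed, but one extra draw is taken first, so the generator's
# state has already advanced when rnorm is called
set.seed(17632)
extra <- runif(1)
x3 <- rnorm(5)
identical(x1, x3)   # FALSE
```

In practice this means a seed only guarantees reproducibility when the code that follows it is left unchanged.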