User Interaction Sketches

Raphael Sonabend edited this page Mar 30, 2019 · 6 revisions

On this page we look at different ways in which a user may interact with and use distr6. The use cases are ordered by increasing complexity, and the final two link to other projects that will involve distr6.

Any ideas or suggestions are welcome.

For each of these we will use R6 notation for OOP method calls and S3 notation for dispatch, for example to calculate the mean from a Binomial distribution:

B <- Binomial$new(size = 10, prob = 0.5)
B$mean() # R6 method call
mean(B) # S3 dispatch
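
One way such dual access could work is sketched below in base R, with a classed list standing in for an R6 object. The names new_binomial_sketch and BinomialSketch are hypothetical and are not part of distr6; the sketch only illustrates that both call styles can reach the same underlying method.

```r
# Hypothetical stand-in for an R6 class: a list of fields and closures
# with an S3 class attribute attached.
new_binomial_sketch <- function(size, prob) {
  obj <- list(
    size = size,
    prob = prob,
    mean = function() size * prob  # mean of Binomial(size, prob)
  )
  class(obj) <- "BinomialSketch"
  obj
}

# S3 method that forwards the generic mean() to the object's own method.
mean.BinomialSketch <- function(x, ...) x$mean()

B <- new_binomial_sketch(size = 10, prob = 0.5)
B$mean()  # method-call style
mean(B)   # S3 dispatch style
```

Both calls return the same value because the S3 method simply delegates to the closure stored on the object.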

Note that the variable and class names used in these sketches are for guidance only and are not the final names.


Use Case: Finding distributions by property and/or trait


Brief: A user who is unsure which distribution to use can list all distributions, optionally filtered by property or trait.

Basic flow:

  1. Call to listDistributions(), properties or traits can be specified
  2. System returns a data-frame of all distributions satisfying the requirements (all if none specified)
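
The flow above can be sketched in base R as a filter over a data frame. The registry contents, the column names (value_support, variate_form) and the function name list_distributions_sketch are all assumptions made for illustration; they do not describe the final listDistributions() interface.

```r
# Hypothetical registry of distributions; entries are illustrative only.
registry <- data.frame(
  name          = c("Binomial", "Normal", "Exponential"),
  value_support = c("discrete", "continuous", "continuous"),
  variate_form  = c("univariate", "univariate", "univariate"),
  stringsAsFactors = FALSE
)

# Return the rows matching every column/value pair in `filter`
# (all rows if `filter` is empty).
list_distributions_sketch <- function(filter = list()) {
  keep <- rep(TRUE, nrow(registry))
  for (column in names(filter)) {
    keep <- keep & registry[[column]] == filter[[column]]
  }
  registry[keep, , drop = FALSE]
}

list_distributions_sketch()                                     # everything
list_distributions_sketch(list(value_support = "continuous"))   # filtered
```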

Use Case: Statistical analysis


Brief: Constructing a distribution in distr6 is equivalent to defining a random variable following that distribution. Then one can evaluate that random variable's density, distribution, quantile or other statistical features.

Basic flow:

  1. Construct a given distribution
  2. Use method calls or dispatch to call the relevant statistical function

Pseudo-code:

B <- Binomial$new(size = 10, prob = 0.2) # Binomial(10, 0.2)
B$mean() # mean(B)
B$var() # var(B)
B$pdf(2) # pdf(B, 2)
B$cdf(2) # cdf(B, 2)
B$quantile(0.4) # quantile(B, 0.4)
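
For comparison, the same quantities for Binomial(10, 0.2) can be computed with the base R stats functions. This is only a reference for what the method calls above would be expected to return, not a statement about how distr6 implements them.

```r
size <- 10
prob <- 0.2

binom_mean <- size * prob               # mean of Binomial(size, prob)
binom_var  <- size * prob * (1 - prob)  # variance of Binomial(size, prob)
binom_pdf  <- dbinom(2, size = size, prob = prob)    # P(X = 2)
binom_cdf  <- pbinom(2, size = size, prob = prob)    # P(X <= 2)
binom_q    <- qbinom(0.4, size = size, prob = prob)  # smallest x with P(X <= x) >= 0.4
```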

Use Case: Simulation


Brief: Simulating numbers from a given distribution.

Basic flow:

  1. Construct a given distribution
  2. Via R6 or S3 simulate x numbers from the distribution

Pseudo-code:

B <- Binomial$new(size = 10, prob = 0.2)
B$rand(100) # rand(B, 100)
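
As a point of reference, the equivalent simulation in base R uses rbinom(). The seed below is an arbitrary choice, included only so that the draws are reproducible.

```r
set.seed(1)  # fixed seed so the draws are reproducible
draws <- rbinom(100, size = 10, prob = 0.2)  # 100 draws from Binomial(10, 0.2)

length(draws)  # number of simulated values
range(draws)   # smallest and largest simulated value
```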

Use Case: Plotting distributions


Brief: Plotting specified functions from a given distribution.

Basic flow:

  1. Construct a given distribution
  2. Call plot() on the distribution, either specifying function or cycling through all, and either specifying range or over (reasonable) support

Pseudo-code:

N <- Normal$new(mean = 0, sd = 1)
# Line plot of the hazard function over (-1, 1)
plot(N, rep = "hazard", range = c(-1, 1), type = "l")
# Line plots (suggested default for AbsContDist) of the first two possible
# representations (e.g. pdf and cdf), then 'Press ENTER to continue' to see more plots
plot(N, plots = 2)
qqplot(N)
hist(N)
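
To make the hazard example concrete, the sketch below computes and plots the hazard function of a standard Normal over (-1, 1) in base R, using h(x) = f(x) / (1 - F(x)). The helper name normal_hazard is hypothetical; a distr6 plot method would presumably derive the curve from the distribution object instead.

```r
# Hazard function of a Normal(mean, sd): h(x) = f(x) / (1 - F(x)).
normal_hazard <- function(x, mean = 0, sd = 1) {
  dnorm(x, mean, sd) / pnorm(x, mean, sd, lower.tail = FALSE)
}

x <- seq(-1, 1, length.out = 201)
h <- normal_hazard(x)

# Draw only in an interactive session so that scripts run without a device.
if (interactive()) {
  plot(x, h, type = "l", xlab = "x", ylab = "hazard")
}
```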

Use Case: Estimating model parameters


Brief: Given a constructed distribution and a data-frame-like object of empirical data, estimate the specified parameter using a given inference method.

Basic flow:

  1. Construct a distribution
  2. Create or load a dataset of empirical data
  3. Call Estimate() with distribution, data and specified method as arguments

Pseudo-code:

N <- Normal$new(sd=1)
x <- read.table("data.dat") # read the empirical data into a data frame
Estimate(method = "mle", distr = N, param = "mean", data = x)
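
A minimal sketch of what the "mle" method could do for this example, written in base R: the mean of a Normal with known sd = 1 is estimated by numerically minimising the negative log-likelihood. The data are simulated here as a stand-in for an empirical dataset, and nothing below describes how Estimate() will actually be implemented.

```r
set.seed(42)
x <- rnorm(200, mean = 3, sd = 1)  # simulated stand-in for empirical data

# Negative log-likelihood of the mean, with sd fixed at 1.
neg_log_lik <- function(mu) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

# One-dimensional minimisation over the observed range of the data.
fit <- optimize(neg_log_lik, interval = range(x))
mle_mean <- fit$minimum
```

For this model the maximum likelihood estimate has a closed form, the sample mean, so mle_mean should agree with mean(x) up to the optimiser's tolerance.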

Note: See design project for open questions regarding how best to implement this.


Use Case: Convolution of Random Variables


Brief: Given two random variables (instances of distributions), we can 'add' these together, assuming independence, to construct a new distribution object: their convolution.

Basic flow:

  1. Construct an instance of distribution X
  2. Construct an instance of distribution Y
  3. Calculate the convolution X+Y

Pseudo-code:

N <- Normal$new()
E <- Exponential$new()
convNE <- N + E 
convNE$pdf(1)
convNE$cdf(1)
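
The values that convNE$pdf(1) and convNE$cdf(1) would be expected to return can be approximated in base R by numerical integration. The sketch assumes that Normal$new() defaults to a standard Normal and Exponential$new() to rate 1, and that N and E are independent; the function names conv_pdf and conv_cdf are hypothetical.

```r
# Density of N + E: integrate f_N(z - y) * f_E(y) over the support of E.
conv_pdf <- function(z) {
  integrand <- function(y) dnorm(z - y) * dexp(y)
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Distribution function of N + E: integrate P(N <= z - y) * f_E(y).
conv_cdf <- function(z) {
  integrand <- function(y) pnorm(z - y) * dexp(y)
  integrate(integrand, lower = 0, upper = Inf)$value
}

conv_pdf(1)
conv_cdf(1)
```

Under these assumptions the sum follows an exponentially modified Gaussian distribution, whose density is exp(0.5 - z) * pnorm(z - 1), which gives a closed form to check the numerical result against.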

Note: Similarly for N*E etc. We extend the distr design: X+X denotes the convolution of two i.i.d. random variables constructed from the same distribution, whereas 2*X denotes scalar multiplication of a single random variable.


Use Case: Conditioning


Brief: Given two random variables X and Y, construct the probability distribution of X|Y

Basic flow:

  1. Construct instance of distribution X
  2. Construct instance of distribution Y
  3. Construct conditional sub-class X|Y

Pseudo-code:

N <- Normal$new()
E <- Exponential$new()
NgE <- Conditional$new(N, E) # N | E
NgE$mean()
NgE$sd()

Note: See the design project for how conditional classes should be implemented; the same applies to truncated and decomposed distributions.


Use Case: Mixing Distributions


Brief: Given two or more random variables, construct a mixing distribution based on these and either given probabilities or assumed uniform probabilities.

Basic flow:

  1. Construct instance of distribution X_1
  2. Construct instance of distribution X_2
  3. ...
  4. Construct instance of distribution X_n
  5. Construct mixing sub-class from instances

Pseudo-code:

N <- Normal$new()
E <- Exponential$new()
B <- Binomial$new()
mixNEB <- Mixing$new(N, E, B, weights = c(0.1, 0.2, 0.7))
mixNEB <- Mixing$new(N, E, B) # Weights assumed to be 1/3 each
mixNEB$mean()
mixNEB$sd()
mixNEB$cdf(2)
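
The weighted case can be sketched in base R: the distribution function and the mean of a mixture are the weighted sums of the component distribution functions and means. The component parameters (standard Normal, rate-1 Exponential, Binomial(10, 0.5)) are hypothetical choices, since the pseudo-code relies on unspecified defaults.

```r
weights <- c(0.1, 0.2, 0.7)
stopifnot(isTRUE(all.equal(sum(weights), 1)))  # weights must sum to one

# Component cdfs; the parameters are hypothetical choices for illustration.
component_cdfs <- list(
  function(x) pnorm(x, mean = 0, sd = 1),
  function(x) pexp(x, rate = 1),
  function(x) pbinom(x, size = 10, prob = 0.5)
)
component_means <- c(0, 1, 10 * 0.5)

# Mixture cdf at a single point: weighted sum of the component cdfs.
mix_cdf <- function(x) {
  values <- vapply(component_cdfs, function(cdf) cdf(x), numeric(1))
  sum(weights * values)
}

# Mixture mean: weighted sum of the component means.
mix_mean <- sum(weights * component_means)

mix_cdf(2)
mix_mean
```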

Use Case: Creating new distributions


Brief: Define a new distribution via a density/mass/distribution function and internally other representations are automatically generated.

Basic flow:

  1. Construct an instance of either ContDist or DiscreteDist, supplying either a pdf/pmf or a cdf
  2. An internal verification checks that this is a valid distribution
  3. Other statistical functions are calculated internally

Pseudo-code:

newDist <- DiscreteDist$new(supp = c(1,5,7,21), prob = c(0.1,0.1,0.6,0.2)) # Example taken from distr
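
The three steps can be sketched in base R for the discrete case: validate the support and probabilities, then derive the other representations from them. The constructor name new_discrete_sketch and the returned list of closures are assumptions for illustration and do not reflect the eventual class design.

```r
new_discrete_sketch <- function(supp, prob) {
  # Validity checks: matching lengths, non-negative masses, total mass 1.
  stopifnot(
    length(supp) == length(prob),
    all(prob >= 0),
    isTRUE(all.equal(sum(prob), 1))
  )
  # Sort by support point so the cdf accumulates in order.
  ord <- order(supp)
  supp <- supp[ord]
  prob <- prob[ord]
  list(
    # Mass at x; zero for points outside the support.
    pdf = function(x) {
      i <- match(x, supp)
      ifelse(is.na(i), 0, prob[i])
    },
    # P(X <= x), derived from the masses.
    cdf = function(x) vapply(x, function(v) sum(prob[supp <= v]), numeric(1)),
    # Expectation, derived from the masses.
    mean = function() sum(supp * prob)
  )
}

newDist <- new_discrete_sketch(supp = c(1, 5, 7, 21), prob = c(0.1, 0.1, 0.6, 0.2))
newDist$pdf(7)
newDist$cdf(7)
newDist$mean()
```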

Note: This will require careful planning about which classes should be abstract, as the original designs had DiscreteDist as an abstract class with all implemented distributions as sub-classes. Clearly, however, this will not work in cases such as this one, where DiscreteDist is instantiated directly.


Use Case: Describing data


Brief: Given that data follow a particular distribution (known a priori or via inference/simulation), use a generic data container to describe the data. See xdataframe project for more details.


Use Case: Probabilistic supervised learning


Brief: Given a modelling interface that is trained on a data-frame of distributions or that makes use of statistical inference in some other way, return a predicted model as a distribution object. See pslr project for more details.