User Interaction Sketches
On this page we look at different ways in which a user may interact with and use distr6. The sketches are ordered by increasing complexity, and the final two link to other projects that will involve distr6.
Any ideas or suggestions are welcome.
For each sketch we use R6 notation for object-oriented method calls and S3 notation for dispatch. For example, to calculate the mean of a Binomial distribution:
B <- Binomial$new(size = 10, prob = 0.5)
B$mean() # R6 method call
mean(B) # S3 dispatch
Note that the variable and class names used in these sketches are for guidance only and are not the final names.
Brief: A user, possibly unsure about which distribution to interact with, can list all distributions by properties or traits.
Basic flow:
- Call listDistributions(), optionally specifying properties or traits
- System returns a data-frame of all distributions satisfying the requirements (all if none specified)
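A minimal base R sketch of the filtering this flow describes. The registry data frame, its trait columns and the helper filter_distributions() are hypothetical stand-ins; they are not distr6's listDistributions() or its actual output.

```r
# Hypothetical registry of implemented distributions and some of their traits.
registry <- data.frame(
  ShortName    = c("Binom", "Norm", "Exp", "Unif"),
  ClassName    = c("Binomial", "Normal", "Exponential", "Uniform"),
  ValueSupport = c("discrete", "continuous", "continuous", "continuous"),
  Symmetric    = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every row matching all supplied traits,
# or the whole registry if no traits are given.
filter_distributions <- function(registry, ...) {
  filters <- list(...)
  keep <- rep(TRUE, nrow(registry))
  for (trait in names(filters)) {
    stopifnot(trait %in% names(registry))  # unknown traits are an error
    keep <- keep & registry[[trait]] == filters[[trait]]
  }
  registry[keep, , drop = FALSE]
}

filter_distributions(registry, ValueSupport = "continuous", Symmetric = TRUE)
```

Returning a data frame keeps the result easy to inspect or filter further with ordinary R tools.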
Brief: Constructing a distribution in distr6 is equivalent to defining a random variable following that distribution. One can then evaluate that random variable's density, distribution function, quantile function or other statistical features.
Basic flow:
- Construct a given distribution
- Use method calls or dispatch to call the relevant statistical function
Pseudo-code:
B <- Binomial$new(size = 10, prob = 0.2) # Binomial(10, 0.2)
B$mean() # mean(B)
B$var() # var(B)
B$pdf(2) # pdf(B, 2)
B$cdf(2) # cdf(B, 2)
B$quantile(0.4) # quantile(B, 0.4)
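For reference, the values these calls should return for Binomial(10, 0.2) can be cross-checked with base R's stats functions. This sketch uses only base R; the mapping in the comments to the pseudo-code methods is illustrative.

```r
size <- 10
prob <- 0.2

summary_values <- c(
  mean     = size * prob,                           # B$mean(): n * p
  var      = size * prob * (1 - prob),              # B$var(): n * p * (1 - p)
  pdf2     = dbinom(2, size = size, prob = prob),   # B$pdf(2): P(X = 2)
  cdf2     = pbinom(2, size = size, prob = prob),   # B$cdf(2): P(X <= 2)
  quant0.4 = qbinom(0.4, size = size, prob = prob)  # B$quantile(0.4)
)
summary_values
```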
Brief: Simulating numbers from a given distribution.
Basic flow:
- Construct a given distribution
- Simulate n numbers from the distribution via R6 or S3
Pseudo-code:
B <- Binomial$new(size = 10, prob = 0.2)
B$rand(100) # rand(B, 100)
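The S3 form rand(B, 100) relies on a generic that dispatches on the class of B. A minimal sketch of that mechanism, assuming a plain list-based object in place of an R6 distribution; binomial_dist() and this rand() generic are hypothetical, and sampling is delegated to base R's rbinom().

```r
# Hypothetical constructor: a list tagged with S3 classes.
binomial_dist <- function(size, prob) {
  structure(list(size = size, prob = prob),
            class = c("Binomial", "Distribution"))
}

# Generic: rand(x, n) dispatches on class(x).
rand <- function(x, n, ...) UseMethod("rand")

# Method for Binomial objects: draw n samples with base R.
rand.Binomial <- function(x, n, ...) {
  rbinom(n, size = x$size, prob = x$prob)
}

set.seed(42)  # reproducible draws
B <- binomial_dist(size = 10, prob = 0.2)
draws <- rand(B, 100)
mean(draws)   # should be close to size * prob = 2
```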
Brief: Plotting specified functions from a given distribution.
Basic flow:
- Construct a given distribution
- Call plot() on the distribution, either specifying a function or cycling through all of them, and either specifying a range or plotting over a (reasonable) support
Pseudo-code:
N <- Normal$new(mean = 0, sd = 1)
plot(N, rep = "hazard", range = c(-1, 1), type = "l") # Line plot of the hazard function over (-1, 1)
plot(N, plots = 2) # Line plots (suggested default for AbsContDist) of the first two representations (e.g. pdf and cdf), then 'Press ENTER to continue' to see more plots
qqplot(N)
hist(N)
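The hazard function plotted above is h(x) = f(x) / (1 - F(x)). A base R sketch of what the first plot() call would draw for N(0, 1); normal_hazard() is a hypothetical helper, not part of distr6.

```r
# Hazard of a Normal distribution: density divided by survival function.
normal_hazard <- function(x, mean = 0, sd = 1) {
  dnorm(x, mean = mean, sd = sd) /
    pnorm(x, mean = mean, sd = sd, lower.tail = FALSE)
}

# Line plot of the N(0, 1) hazard over (-1, 1).
curve(normal_hazard(x), from = -1, to = 1, type = "l",
      xlab = "x", ylab = "h(x)", main = "Hazard function of N(0, 1)")
```

Using lower.tail = FALSE for the survival function avoids the loss of precision that 1 - pnorm(x) suffers for large x.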
Brief: Given a constructed distribution and a data-frame type object of empirical data, perform statistical inference with a given method to estimate the specified parameter.
Basic flow:
- Construct a distribution
- Create or load a dataset of empirical data
- Call Estimate() with the distribution, data and specified method as arguments
Pseudo-code:
N <- Normal$new(sd = 1)
x <- read.table("data.dat") # load empirical data from a file
Estimate(method = "mle", distr = N, param = "mean", data = x)
Note: See design project for open questions regarding how best to implement this.
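As an illustration of what method = "mle" would compute for this example, here is a minimal base R sketch that estimates the mean of a Normal with known sd = 1 by numerically minimising the negative log-likelihood. Simulated data stand in for data.dat, and negloglik() is a hypothetical helper; for this model the MLE is the sample mean, which serves as a check.

```r
set.seed(1)
x <- rnorm(200, mean = 3, sd = 1)  # simulated stand-in for the loaded data

# Negative log-likelihood of the mean, with sd held fixed.
negloglik <- function(mu, data, sd = 1) {
  -sum(dnorm(data, mean = mu, sd = sd, log = TRUE))
}

# One-dimensional numerical minimisation over the range of the data.
fit <- optimize(negloglik, interval = range(x), data = x)
fit$minimum  # numerical MLE
mean(x)      # closed-form MLE for comparison
```

A generic Estimate() would need the numerical route, since closed forms exist only for some distribution and parameter combinations.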
Brief: Given two random variables (instances of distributions), we can 'add' them together, assuming independence, to construct a new distribution object: their convolution.
Basic flow:
- Construct an instance of distribution X
- Construct an instance of distribution Y
- Calculate the convolution X+Y
Pseudo-code:
N <- Normal$new()
E <- Exponential$new()
convNE <- N + E
convNE$pdf(1)
convNE$cdf(1)
Note: Similarly for other operations on N and E. We extend the distr design: X + X denotes the convolution of two i.i.d. random variables constructed from the same distribution, and 2X denotes the scalar multiplication of a single random variable.
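The convolution N + E above can be evaluated numerically. A minimal sketch, assuming N is N(0, 1) and E is Exponential with rate 1 (the defaults are an assumption here): the density is f(z) = ∫ φ(z - y) e^(-y) dy over y ≥ 0, and conditioning on Y gives the CDF as F(z) = ∫ Φ(z - y) e^(-y) dy. For this pair a closed form exists (an exponentially modified Gaussian), which serves as a cross-check. conv_pdf(), conv_cdf() and emg_pdf() are hypothetical helpers.

```r
# Density of Z = X + Y, X ~ N(0, 1), Y ~ Exp(1), via the convolution integral.
conv_pdf <- function(z) {
  integrate(function(y) dnorm(z - y) * dexp(y), lower = 0, upper = Inf)$value
}

# CDF of Z by conditioning on Y: P(Z <= z) = E[Phi(z - Y)].
conv_cdf <- function(z) {
  integrate(function(y) pnorm(z - y) * dexp(y), lower = 0, upper = Inf)$value
}

# Closed-form density for this particular pair, used as a cross-check:
# f(z) = exp(1/2 - z) * Phi(z - 1); the CDF is then Phi(z) - f(z).
emg_pdf <- function(z) exp(0.5 - z) * pnorm(z - 1)

conv_pdf(1)  # compare with emg_pdf(1)
conv_cdf(1)  # compare with pnorm(1) - emg_pdf(1)
```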
Brief: Given two random variables X and Y, construct the probability distribution of X|Y
Basic flow:
- Construct instance of distribution X
- Construct instance of distribution Y
- Construct conditional sub-class X|Y
Pseudo-code:
N <- Normal$new()
E <- Exponential$new()
NgE <- Conditional$new(N, E) # N | E
NgE$mean()
NgE$sd()
Note: See design project about how conditional classes should be implemented, also for truncated and decomposed distributions.
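One concrete case of conditioning, related to the truncated distributions mentioned in the note above, is conditioning a random variable on an event about itself. A minimal base R sketch for X ~ N(0, 1) given X > a, whose density is f(x) / P(X > a) for x > a. The helpers trunc_pdf(), trunc_mean() and trunc_var() are hypothetical and only illustrate the quantities a Conditional object would need to provide.

```r
# Density of X | X > a for X ~ N(0, 1).
trunc_pdf <- function(x, a = 0) {
  ifelse(x > a, dnorm(x) / pnorm(a, lower.tail = FALSE), 0)
}

# Conditional mean E[X | X > a] by numerical integration.
trunc_mean <- function(a = 0) {
  integrate(function(x) x * trunc_pdf(x, a), lower = a, upper = Inf)$value
}

# Conditional variance Var[X | X > a].
trunc_var <- function(a = 0) {
  m <- trunc_mean(a)
  integrate(function(x) (x - m)^2 * trunc_pdf(x, a),
            lower = a, upper = Inf)$value
}

trunc_mean(0)        # approx. sqrt(2 / pi)
sqrt(trunc_var(0))   # conditional standard deviation
```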
Brief: Given two or more random variables, construct a mixture distribution from them, using either given weights or uniform weights.
Basic flow:
- Construct instance of distribution X_1
- Construct instance of distribution X_2
- ...
- Construct instance of distribution X_n
- Construct mixing sub-class from instances
Pseudo-code:
N <- Normal$new()
E <- Exponential$new()
B <- Binomial$new()
mixNEB <- Mixing$new(N, E, B, weights = c(0.1, 0.2, 0.7))
mixNEB <- Mixing$new(N, E, B) # Weights assumed uniform, i.e. 1/3 each
mixNEB$mean()
mixNEB$sd()
mixNEB$cdf(2)
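The mixture's summary statistics follow directly from its components: the mean is Σ wᵢμᵢ, the variance is Σ wᵢ(σᵢ² + μᵢ²) − μ², and the CDF is Σ wᵢFᵢ. A minimal base R sketch, assuming the default-constructed components are N(0, 1), Exponential(rate = 1) and Binomial(10, 0.5) (these parameter values are assumptions). The component list layout and mixture() are hypothetical, not the proposed Mixing class.

```r
# Hypothetical components: each supplies its mean, variance and CDF.
components <- list(
  normal      = list(mean = 0, var = 1,   cdf = function(q) pnorm(q)),
  exponential = list(mean = 1, var = 1,   cdf = function(q) pexp(q)),
  binomial    = list(mean = 5, var = 2.5,
                     cdf = function(q) pbinom(q, size = 10, prob = 0.5))
)

mixture <- function(components, weights = NULL) {
  k <- length(components)
  if (is.null(weights)) weights <- rep(1 / k, k)  # uniform weights by default
  stopifnot(length(weights) == k, all(weights >= 0),
            isTRUE(all.equal(sum(weights), 1)))
  means <- vapply(components, function(d) d$mean, numeric(1))
  vars  <- vapply(components, function(d) d$var, numeric(1))
  mu <- sum(weights * means)
  second_moment <- sum(weights * (vars + means^2))  # E[X^2] of the mixture
  list(
    mean = function() mu,
    sd   = function() sqrt(second_moment - mu^2),
    # Weighted sum of component CDFs (scalar q).
    cdf  = function(q) sum(weights * vapply(components,
                                            function(d) d$cdf(q), numeric(1)))
  )
}

mixNEB <- mixture(components, weights = c(0.1, 0.2, 0.7))
mixNEB$mean()  # 3.7 (up to floating point)
mixNEB$sd()
mixNEB$cdf(2)
```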
Brief: Define a new distribution via a density/mass/distribution function and internally other representations are automatically generated.
Basic flow:
- Construct an instance of either ContDist or DiscreteDist with either a pdf/pmf or a cdf as argument
- Internal verification is performed to check that this is a valid distribution
- Internal calculation of the other statistical functions
Pseudo-code:
newDist <- DiscreteDist$new(supp = c(1,5,7,21), prob = c(0.1,0.1,0.6,0.2)) # Example taken from distr
Note: This will require careful planning about which classes should be abstract, as the original designs had DiscreteDist as an abstract class with all implemented distributions as sub-classes. This clearly will not work when DiscreteDist is instantiated directly, as above.
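A minimal sketch of the "define one representation, derive the rest" flow for the example above, in base R. discrete_dist() is a hypothetical constructor, not the proposed DiscreteDist class: it validates the pmf, then derives the CDF, quantile function, mean and variance from the support points and probabilities.

```r
discrete_dist <- function(supp, prob) {
  # Internal verification that this is a valid distribution.
  stopifnot(length(supp) == length(prob), all(prob >= 0),
            isTRUE(all.equal(sum(prob), 1)), !anyDuplicated(supp))
  ord <- order(supp)
  supp <- supp[ord]
  prob <- prob[ord]
  cum <- cumsum(prob)
  mu <- sum(supp * prob)
  list(
    pmf = function(x) ifelse(x %in% supp, prob[match(x, supp)], 0),
    cdf = function(q) vapply(q, function(v) sum(prob[supp <= v]), numeric(1)),
    # Smallest support point with F(x) >= p (small tolerance for rounding).
    quantile = function(p) {
      supp[vapply(p, function(v) which(cum >= v - 1e-12)[1], integer(1))]
    },
    mean = function() mu,
    var  = function() sum((supp - mu)^2 * prob)
  )
}

newDist <- discrete_dist(supp = c(1, 5, 7, 21), prob = c(0.1, 0.1, 0.6, 0.2))
newDist$mean()        # 9 (up to floating point)
newDist$cdf(7)
newDist$quantile(0.5) # 7
```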
Brief: Given data that follow a particular distribution (known a priori or via inference/simulation), use a generic data container to describe the data. See xdataframe project for more details.
Brief: Given a modelling interface that is trained on a data-frame of distributions or that makes use of statistical inference in some other way, return a predicted model as a distribution object. See pslr project for more details.