--- title: "cusumcharter" author: "John MacKintosh" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{cusumcharter} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/", dpi = 300, fig.width = 5, fig.height = 3 ) ``` ```{r setup} library(cusumcharter) library(ggplot2) ``` ## Rationale Basic CUSUM statistics are fairly straightforward to construct in a spreadsheet, if you only need to compare to a target value, or the mean of your data. However, turning this into a control chart takes a lot more work, especially if you have several metrics to analyse. The goal of this package is to allow you to pass simple vectors in, and receive data back that contains all the information you need to create faceted CUSUM control charts - (there is also a function that takes care of this for you). ## A journey through the package functions ```{r} test_vec <- c(1,1,2,3,5,7,11,7,5,7,8,9,5) test_vec ``` Here is our data, nothing too exciting is it? *Or is it?* Let's find out.. ## cusum_single ```{r baseplot, fig.width=5, fig.height=3} test_vec <- c(1,1,2,3,5,7,11,7,5,7,8,9,5) variances <- cusum_single(test_vec) p <- qplot(y = variances) p <- p + geom_line() p + geom_hline(yintercept = 0) p ``` This bare bones plot tells us quite a lot. Our measure of interest started off with small variances, resulting in points below the centre line at 0, but the variances increased, the points crossed over the centre line. ## cusum_single_df We input the same vector - this time using the `cusum_single_df` function. This returns : - our original vector - the target value (if not supplied, the mean is calculated) - the variance of each data point from the target - the cumulative sum of these variances - the cumulative sum of these variances, centred on the target value ( by adding the target to the cumulative sum, i.e. adding `cusum_target` to `cusumx`) ```{r singledf} variance_df <- cusum_single_df(test_vec) variance_df ``` ```{r baseplot2,fig.width=5, fig.height=3} p <- qplot(y = variance_df$x) p <- p + geom_line() p <- p + geom_hline(yintercept = variance_df$target) p ``` ## cusum_control Here we input the same vector, and receive even more information back :\ - our original vector (x)\ - target (supplied or calculated)\ - variance of x from target\ - std_dev - you can supply a standard deviation (if known), or it is calculated for you using the screened moving range of x\ - cusum is the cumulative sum of the variance\ - cplus and cneg depend on the target and `k` values, and also relate to the current x value and previous `cplus` or `cneg` value. These are the values that get plotted on the control chart, hence why you only have to supply an argument for the `x` axis. `- cum_nplus` and `cum_nneg` are also iterative values that calculate the number of consecutively rising or declining points. \- ucl and lcl are derived from the provided `h` value (usually between 4 and 5, defaults to 4). This is multiplied by the `std_dev` to calculate the upper and lower control limits (`ucl` and `lcl` respectively). \- the centre value is always 0 \- obs provides a reference for each value of x, for ease of plotting. ```{r cusumcontrol} cs_data <- cusum_control(test_vec) cs_data ``` ## cusum_control_plot *The `cusum_control` function is the most complex and important function in the package* The outputs are not much to look at however, so we harness the power of `ggplot2` to enjoy the benefits If we assume that we don't want to be outside our upper control limit, we can see that we now have a problem: ```{r cusumcontrolplot,fig.width=5, fig.height=3} cusum_control_plot(cs_data,xvar = obs) ``` We can also highlight points below the lower control limit using `show_below` ```{r controlplotshow,fig.width=5, fig.height=3} cusum_control_plot(cs_data,xvar = obs, show_below = TRUE) ``` ## Faceted displays and flexible x axis We can plot many charts at once by using `dplyr` or \``data.table` to group the data ```{r facetcontrolplots,fig.width=5, fig.height=3} library(dplyr) library(ggplot2) library(cusumcharter) testdata <- tibble::tibble( N = c(-15L,2L,-11L,3L,1L,1L,-11L,1L,1L, 2L,1L,1L,1L,10L,7L,9L,11L,9L), metric = c("metric1","metric1","metric1","metric1","metric1", "metric1","metric1","metric1","metric1","metric2", "metric2","metric2","metric2","metric2","metric2", "metric2","metric2","metric2")) datecol <- as.Date(c("2021-01-01","2021-01-02", "2021-01-03", "2021-01-04" , "2021-01-05", "2021-01-06","2021-01-07", "2021-01-08", "2021-01-09")) testres <- testdata %>% dplyr::group_by(metric) %>% dplyr::mutate(cusum_control(N)) %>% dplyr::ungroup() %>% dplyr::group_by(metric) %>% dplyr::mutate(report_date = datecol) %>% ungroup() p5 <- cusum_control_plot(testres, xvar = report_date, show_below = TRUE, facet_var = metric, title_text = "Highlights above and below control limits") p5 ```