In this vignette, we show how to run phenotypeDiagnostics(). We will
use the following packages and mock data:
library(CohortConstructor)
library(PhenotypeR)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
cdm

Note that we have included achilles tables in our cdm reference, which will be used to speed up some of the analyses.
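Before relying on that speed-up, it can be worth checking that the achilles tables are actually part of the cdm reference. The following is a minimal sketch in base R: hasAchillesTables() is a hypothetical helper written for this illustration (it is not part of PhenotypeR or CDMConnector), and the three table names are the standard Achilles result tables, assumed here to be the names exposed in the cdm reference.

```r
# Standard Achilles result tables (assumed names in the cdm reference).
achillesTables <- c("achilles_analysis", "achilles_results", "achilles_results_dist")

# Hypothetical helper: TRUE only if every Achilles table name is present.
hasAchillesTables <- function(tableNames) {
  all(achillesTables %in% tableNames)
}

# Usage with the cdm reference created above (needs an open connection):
# hasAchillesTables(names(cdm))
```

If the check returns FALSE, the achillesSchema argument passed to cdmFromCon() is the first thing to review.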
First, we are going to use the package CohortConstructor to generate three cohorts of warfarin, acetaminophen and morphine users.
# Create a codelist
codes <- list("warfarin" = c(1310149, 40163554),
              "acetaminophen" = c(1125315, 1127078, 1127433, 40229134, 40231925, 40162522, 19133768),
              "morphine" = c(1110410, 35605858, 40169988))
# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes,
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")

Now we will proceed to run phenotypeDiagnostics(). This function can run the following analyses: database diagnostics, codelist diagnostics, cohort diagnostics and population diagnostics.
We can specify which analyses we want to perform by listing them in the diagnostics argument:
result <- phenotypeDiagnostics(
  cohort = cdm$my_cohort,
  diagnostics = c("databaseDiagnostics", "codelistDiagnostics",
                  "cohortDiagnostics", "populationDiagnostics"),
  cohortSample = 20000,
  matchedSample = 1000,
  populationSample = 1e+06,
  populationDateRange = as.Date(c(NA, NA))
)
result |> glimpse()

Notice that we have four additional arguments:
populationSample: The number of people that will be randomly sampled from the CDM to perform the Population diagnostics analysis. If NULL, all the participants in the CDM will be included. Sampling reduces computation time and is particularly useful when the outcomes of interest are relatively common. When they are rarer, we may wish to maximise statistical power and calculate estimates for the dataset as a whole, in which case we would set this argument to NULL.

populationDateRange: The time period over which we want to perform the Population diagnostics analysis.

cohortSample: This argument takes a random sample of people from our cohort and performs the cohortDiagnostics on this sample (notice that the attrition and the cohort counts will be calculated from the original cohorts, not the sampled ones). If the sample specified is bigger than the number of individuals in the cohort, no sampling will be performed. We recommend using this option when there are cohorts bigger than 20,000 individuals.

matchedSample: Similar to populationSample, this argument takes a random sample of people from our cohort and performs the matched analysis on this sample. If the sample specified is bigger than the size of the cohort, no sampling will be performed. If we have specified a cohortSample, the sampling will be performed on top of the sampled cohorts. If we do not want to create matched cohorts, we can set matchedSample = 0.

To save the results, we can use the exportSummarisedResult() function from the omopgenerics R package.
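As a minimal sketch of that export step, assuming result is the object returned by phenotypeDiagnostics() above: the output folder, the file name and the minCellCount value are illustrative choices for this example, not values taken from the original vignette.

```r
library(omopgenerics)

# Assumes `result` is the summarised result returned by phenotypeDiagnostics().
# Illustrative output location; any writable folder would do.
outDir <- file.path(tempdir(), "phenotype_results")
dir.create(outDir, showWarnings = FALSE, recursive = TRUE)

exportSummarisedResult(
  result,
  minCellCount = 5,                        # illustrative suppression threshold
  fileName = "phenotype_diagnostics.csv",  # illustrative file name
  path = outDir
)
```

The exported file is written with counts below minCellCount suppressed, so it is the version intended for sharing.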
Once we get our Phenotype diagnostics result, we can
use shinyDiagnostics to easily create a shiny app and
visualise our results:
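A minimal sketch of such a call, assuming result is the object returned by phenotypeDiagnostics() above. The directory and the minCellCount value are illustrative assumptions; only the argument names minCellCount and open are mentioned in the text.

```r
library(PhenotypeR)

# Assumes `result` comes from the phenotypeDiagnostics() call above.
# Illustrative folder where the shiny app files will be created.
shinyDir <- tempdir()

shinyDiagnostics(
  result = result,
  directory = shinyDir,
  minCellCount = 2,  # illustrative suppression threshold shown in the app
  open = TRUE        # launch the app in a new R session
)
```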
Notice that we have specified the minimum cell count (minCellCount)
below which counts are suppressed in the shiny app, and also that we
want the shiny app to be launched in a new R session (open). You can
see the shiny app generated for this example here.

See the Shiny diagnostics vignette for a full explanation of the shiny app.