| Title: | Data Management Tools for Pharmacometrics |
|---|---|
| Description: | Tools and functions to efficiently create datasets used in pharmacometric analysis. Additional functionality is added to create documentation and prepare files for submission and quality control purposes. |
| Authors: | Richard Hooijmaijers [aut, cre, cph], LAPP Consultants [fnd, cph] |
| Maintainer: | Richard Hooijmaijers <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.2.1 |
| Built: | 2026-05-26 06:57:52 UTC |
| Source: | https://github.com/leidenadvancedpkpd/amp.dm |
This function adds data attributes available in a list to a data frame. Additional checks verify that the attributes are in a valid and usable format.
attr_add(data, attrl, attrib = c("label", "format", "remark"), verbose = TRUE)
data |
the data frame for which the attributes should be set |
attrl |
named list with the attributes for the dataset (see details) |
attrib |
a vector of attributes that should be set for data (currently 'label', 'format' and 'remark' are applicable) |
verbose |
a logical indicating if data checks should be printed to the console |
This function adds attributes available in a list to a data frame. The list should have a specific structure: the named items in the list correspond to the variables in the data frame. For each item, the content of the 'label', 'format' and 'remark' elements is added as attributes to the matching variable in the dataset. For an example of the list format, see for instance attr_xls.
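The list structure described above can be sketched in base R. This is a minimal illustration, assuming each item is itself a list with 'label', 'format' and 'remark' elements. The helper `add_attrs` and the example values are hypothetical and are not the package's implementation, which also performs validity checks.

```r
# Hypothetical attribute list: one named item per variable in the data frame
attrl <- list(
  ID  = list(label = "Subject identifier", format = NULL, remark = NULL),
  SEX = list(label = "Sex", format = c("0" = "Male", "1" = "Female"),
             remark = "Self-reported")
)

dat <- data.frame(ID = 1:4, SEX = c(0, 1, 1, 0))

# Illustrative helper (not part of amp.dm): copy the requested elements of
# each list item onto the matching column as attributes
add_attrs <- function(data, attrl, attrib = c("label", "format", "remark")) {
  for (v in intersect(names(attrl), names(data))) {
    for (a in attrib) {
      if (!is.null(attrl[[v]][[a]])) attr(data[[v]], a) <- attrl[[v]][[a]]
    }
  }
  data
}

dat <- add_attrs(dat, attrl)
attr(dat$SEX, "label")
attr(dat$SEX, "format")
```

Elements that are NULL in the list are skipped here, so a variable without a format simply carries no 'format' attribute.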
dataframe with the attributes added
Richard Hooijmaijers
xmpl <- system.file("example/Attr.Template.xlsx",package = "amp.dm")
attrl <- attr_xls(xmpl)
data <- read.csv(system.file("example/NM.theoph.V1.csv",package = "amp.dm"), na.strings = ".")
attr_add(data,attrl) |> str()
This function extracts attributes available in a data frame and creates a structured list
attr_extract(dfrm)
dfrm |
data frame containing the attributes |
named list with the attributes
Richard Hooijmaijers
attrl <- attr_xls(system.file("example/Attr.Template.xlsx",package = "amp.dm"))
nm <- read.csv(system.file("example/NM.theoph.V1.csv",package = "amp.dm"))
nmf <- attr_add(nm, attrl, verbose = FALSE)
attrl2 <- attr_extract(nmf)
all.equal(attrl,attrl2)
This function searches for the 'format' attribute within a data frame. If found, it applies the format to that variable. The resulting variable will be a factor, useful for plotting and reporting.
attr_factor(data, verbose = TRUE, largestfirst = FALSE)
data |
the data.frame for which factors should be created |
verbose |
a logical indicating if data checks should be printed to the console |
largestfirst |
either a logical or a character vector indicating if the largest group should be the first level (see details) |
For this function to work, the 'format' attribute should be present as a named vector (e.g. attr(data$GENDER,'format') <- c('0' = 'Male', '1' = 'Female')). If the attribute is found, the variable is overwritten with the format applied to it. Be aware that the original levels defined in the format could be lost in this process.
The 'largestfirst' argument can be set to a logical indicating whether, for all variables in the dataset, the largest group should be the first level. It can also be a character vector naming the variables for which the largest group should be the first level. To set a specific order, define it directly in the format attribute, e.g. attr(data$VAR,'format') <- c('2' = 'level 1', '1' = 'Level 2')
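The mapping from a named 'format' vector to a factor can be sketched in base R as follows. This is only an illustration of the idea, assuming the names of the format hold the raw values and the elements hold the labels. It is not the package's code, and the frequency-based reordering is one possible reading of 'largest group first'.

```r
x <- c(0, 1, 1, 0, 1)
attr(x, "format") <- c("0" = "Male", "1" = "Female")

fmt <- attr(x, "format")
# Names of the format are the raw values, the elements are the labels;
# the level order follows the order in which the format is defined
f <- factor(as.character(x), levels = names(fmt), labels = unname(fmt))
levels(f)

# Sketch of 'largest group first': reorder levels by decreasing frequency
f_largest <- factor(f, levels = names(sort(table(f), decreasing = TRUE)))
levels(f_largest)
```

Because the level order follows the order of the format vector, defining the format as c('2' = ..., '1' = ...) is enough to control the order explicitly.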
data frame with the formats assigned
Richard Hooijmaijers
dfrm <- data.frame(GENDER=rep(c(0,1),4),RESULT=rnorm(8))
attr(dfrm$GENDER,'format') <- c('0' = 'Male', '1' = 'Female')
dfrm <- attr_factor(dfrm)
str(dfrm$GENDER)
This function reads in attributes available in an excel file and creates a structured list
attr_xls(xls, sepfor = "\n", nosort = FALSE)
xls |
character with the name of the excel file containing the attributes |
sepfor |
character of length 1 indicating what the separator for formats should be |
nosort |
logical indicating if sorting of variables should be omitted (otherwise variables are sorted by the 'no.' column in the Excel file) |
named list with the attributes
Richard Hooijmaijers
xmpl <- system.file("example/Attr.Template.xlsx",package = "amp.dm")
head(attr_xls(xmpl),3)
This function reports information for the categories, mainly the frequencies, proportions and missing values
check_cat(x, missing = c(-999, NA), detail = 5, threshold = c(NA, NA))
x |
numeric vector with the categories |
missing |
vector with the values that represent missing information |
detail |
numeric with the level of detail to print (see below for details) |
threshold |
numeric vector with the threshold numbers and proportions (see details) |
The detail argument can be used to print certain information:
5: All possible information is printed
4: Only the table with frequencies and proportions
3: Only information regarding missing data
2: Only a warning in case number of missing is above threshold (see below)
1: A named vector with the available categories that can be used in num_lump
The threshold argument holds the absolute number (first value) and the proportion (second value) of missing values to check. If either the number or the proportion of missing values is above its threshold, a warning is given. This can be convenient to decide whether or not a category should be used during analyses.
Nothing is returned; information is printed on screen
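The threshold check described above can be sketched in base R. This is a hedged illustration and not the package's implementation: it assumes that proportions are expressed as fractions and that an NA threshold means the corresponding check is skipped. The threshold value used here is chosen only to trigger the check.

```r
x <- c(rep(1:5, 10), -999)
missing <- c(-999, NA)
threshold <- c(NA, 0.01)   # no check on the count, 1% on the proportion

is_missing <- x %in% missing      # %in% also matches NA values
n_missing <- sum(is_missing)
p_missing <- mean(is_missing)

# Frequencies and proportions of the non-missing categories
freq <- table(x[!is_missing])
prop <- prop.table(freq)

# Compare count and proportion against their thresholds; NA entries drop out
above <- c(n_missing, p_missing) > threshold
exceeds <- any(above, na.rm = TRUE)
if (exceeds) message("Missing values above threshold")
```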
Richard Hooijmaijers
dfrm <- data.frame(cat1 = c(rep(1:5,10),-999), cat2 = c(rep(letters[1:5],10),-999))
check_cat(dfrm$cat1)
check_cat(dfrm$cat2, detail=1)
check_cat(dfrm$cat1,detail=2,threshold = c(NA,0.1))
This function checks for common errors or mistakes within a NONMEM dataset and reports the results to the console, as a table or as a data frame.
check_nmdata(x, type = 1, ret = "tbl", capt = NULL, align = NULL, size = "\\footnotesize", ...)
x |
either a path to a CSV file or a data frame with the NONMEM data that should be checked |
type |
integer with the type of checks. Currently 1 can be used for checks that should all pass for a valid analysis and 2 for checks that trigger further investigation |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
size |
character with font size as for the table general_tbl |
... |
additional arguments passed to general_tbl |
the checks are either printed, returned as dataframe or placed in a PDF
Richard Hooijmaijers
chkf <- system.file("example/NM.theoph.V1.csv",package = "amp.dm")
check_nmdata(chkf)
Adds a comment regarding assumptions or points that need special attention to the package environment. The comment can be used in code chunks and easily printed after a code chunk.
cmnt(string = "", bold = FALSE, verbose = TRUE)
string |
character of length one with the comment to add |
bold |
logical indicating if the string should be printed in bold to emphasize importance |
verbose |
logical indicating if the comment should be printed when function is called |
no return value, called for side effects
Richard Hooijmaijers
cmnt("Exclude time points > 12h")
cmnt("Subject 6 deviates and is excluded in the analysis", TRUE)
# Markdown syntax can be used for comments:
cmnt("We can use **bold** and *italic* or `code`")
# we can print the contents of the comments with
get_log()$cmnt_nfo
Prints the results in markdown format to be used directly in inline coding
cmnt_print(clean = TRUE)
clean |
logical indicating if the comments should be deleted after printing (see details) |
The function returns a text string with the comments given up to the point it was called. When clean is set to TRUE (default), the stored comments are cleared after printing, so that comments are not repeated each time the function is called.
character string with the comments
Richard Hooijmaijers
cmnt("Comment to print")
cmnt_print()
This function creates a LaTeX table or data frame with the number of records, subjects and variables of one or multiple data frames.
contents_df(dfv, subject = NULL, ret = "tbl", capt = "Information multiple data frames", align = "lllp{8cm}", ...)
dfv |
a character vector with data frame(s) in global environment for which the overview should be created |
subject |
character string that identifies the subject variable within the data frame |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
... |
additional arguments passed to general_tbl |
This function can be used to create a table with the most important information of a data frame for documentation. The function lists the number of records, subjects and variables of each data frame within dfv. It is especially useful to show the differences between similar data frames or to give an overview of all data frames within a working environment.
a data frame, code for table or nothing in case a PDF file is created
Richard Hooijmaijers
Theoph1 <- subset(Theoph,Subject!=1)
Theoph2 <- subset(Theoph,Subject!=2)
contents_df(c('Theoph1','Theoph2'),subject='Subject',ret='dfrm')
This function calculates and reports counts and frequencies stratified by one or more variables within a data frame
counts_df(data, by, id = NULL, style = 1, ret = "tbl", capt = "Information multiple data frames", align = NULL, size = "\\footnotesize", ...)
data |
data frame for which the table should be created |
by |
character identifying by variables within the data frame for stratification |
id |
character identifying the ID variable within the data frame (see details) |
style |
numeric with the type of output to return (see details) |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
size |
character with font size as for the table general_tbl |
... |
additional arguments passed to general_tbl |
This function generates frequency tables, by default for the number of observations per stratum. In case the id argument is used, the function also reports the number and frequencies of distinct IDs. By default the observations and percentages are reported in separate columns (convenient for further processing). In case style is set to 2, a single column is created that holds the observations and percentages in a formatted way (convenient for tabulating).
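The kind of output described above can be approximated in base R. This is a sketch under assumptions and not the package's implementation: the column names (Nobs, PERCobs, Nid, PERCid, obs_fmt) are hypothetical and the data are dummy values.

```r
dat <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3, 3),
  trt = c(1, 1, 1, 1, 2, 2, 2)
)

# Observations per stratum, with percentages of the total
n_obs <- as.data.frame(table(trt = dat$trt), responseName = "Nobs")
n_obs$PERCobs <- 100 * n_obs$Nobs / sum(n_obs$Nobs)

# Distinct IDs per stratum: reduce to one row per id/stratum first
ids <- unique(dat[, c("id", "trt")])
n_id <- as.data.frame(table(trt = ids$trt), responseName = "Nid")
res <- merge(n_obs, n_id, by = "trt")
res$PERCid <- 100 * res$Nid / sum(res$Nid)

# Single formatted column, similar in spirit to style = 2
res$obs_fmt <- sprintf("%d (%.1f%%)", res$Nobs, res$PERCobs)
res
```

Separate numeric columns are easier to post-process, while the formatted column is ready for a report table.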
a data frame, code for table or nothing in case a PDF file is created
Richard Hooijmaijers
data("Theoph")
Theoph$trt <- ifelse(as.numeric(Theoph$Subject)<6,1,2)
Theoph$sex <- ifelse(as.numeric(Theoph$Subject)<4,1,0)
counts_df(data=Theoph, by=c("trt","sex"),id="Subject", ret="dfrm")
counts_df(data=Theoph, by="sex",id="Subject", ret="dfrm")
counts_df(data=Theoph, by=c("trt","sex"),id="Subject", style=2, ret="dfrm")
This function determines whether the time between subsequent dose records is exactly equal to the tau value. If this is the case, it counts the number of times this occurs, sets the applicable number of additional doses (ADDL) and removes the rows that are no longer needed.
create_addl(data, datetime, id, dose, tau, evid = NULL)
data |
data frame to perform the action on |
datetime |
character identifying the date/time variable (POSIXct) within the data frame |
id |
character identifying the subject ID within the data frame |
dose |
character identifying the dose within the data frame (ADDL can only be set for equal doses) |
tau |
character identifying the tau (or II) within the data frame |
evid |
character identifying the event ID (EVID) within the data frame. This is used to distinguish observations from dosing records, e.g. 0 for observations |
a data frame with ADDL records added
Richard Hooijmaijers
dts <- c(Sys.time(),Sys.time() + (24*60*60),Sys.time() + (48*60*60),Sys.time() + (96*60*60))
data <- data.frame(id=1,dt=dts,dose=10,tau=24)
create_addl(data=data, datetime="dt", id="id", dose="dose", tau="tau")
data2 <- rbind(cbind(data,evid=1),data.frame(id=1,dt=dts[4]+60,dose=10,tau=24,evid=0))
create_addl(data=data2, datetime="dt", id="id", dose="dose", tau="tau", evid="evid")
This function creates the define.pdf file necessary for esubmission.
define_tbl(attr = NULL, ret = "dfrm", capt = "Dataset define form", align = "lp{3cm}lp{8cm}", outnm = NULL, orientation = "portrait", size = "\\footnotesize", src = NULL, ...)
attr |
list with datasets attributes |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table |
align |
alignment of the table passed to general_tbl |
outnm |
character with the name of the tex file to generate and compile (e.g. define.tex) |
orientation |
character with the page orientation in case a file is to be returned (can be either 'portrait' or 'landscape') |
size |
character with font size as for the table general_tbl |
src |
object that holds information regarding the source (e.g. |
... |
additional arguments passed to general_tbl |
a data frame, code for table or nothing in case a PDF file is created
Richard Hooijmaijers
xmpl <- system.file("example/Attr.Template.xlsx",package = "amp.dm")
attrl <- attr_xls(xmpl)
define_tbl(attrl)
This function calculates Estimated Glomerular Filtration Rate (EGFR) values based on most commonly used formulas
egfr(scr = NULL, sex = NULL, age = NULL, race = NULL, ht = NULL, bun = NULL, scys = NULL, prem = NULL, bsa = NULL, formula = "CKD-EPI")
scr |
vector with Serum creatinine values in mg/dL |
sex |
vector with SEX values (where female is defined as a value of 1) |
age |
vector with AGE values in years |
race |
vector with RACE values (where caucasian is defined as 1, black as and Japanese as > 2) |
ht |
vector with HEIGHT values in cm |
bun |
vector with Blood urea nitrogen in mg/dL |
scys |
vector with Serum cystatin C in mg/L |
prem |
vector with PREM (premature) values (where PREM is defined as value of 1) |
bsa |
vector with BSA values in m2; provide these in case a BSA correction should be applied (see details) |
formula |
character with the formula to be used for the EGFR calculations (see details) |
Currently there are different formulas available for calculations:
"CKD-EPI": EGFR according to the Chronic Kidney Disease Epidemiology (CKD-EPI) study formula (Levey):
where $\min$ indicates the minimum of $\mathrm{Scr}/\kappa$ or 1; $\max$ indicates the maximum of $\mathrm{Scr}/\kappa$ or 1.
Scaling parameter $\kappa$ is 0.9 for males and 0.7 for females and scaling parameter $\alpha$ is -0.411 for males and -0.329 for females.
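For orientation, the 2009 CKD-EPI creatinine equation as it is commonly published can be sketched in R using the sex-specific scaling parameters listed above. The function name `ckd_epi_2009` and the logical arguments `female` and `black` are hypothetical. This is not the package's `egfr` code and it does not follow the package's SEX and RACE coding.

```r
# Sketch of the published 2009 CKD-EPI creatinine equation
# (result in mL/minute/1.73m2). Argument names and coding are assumptions.
ckd_epi_2009 <- function(scr, age, female, black = FALSE) {
  kappa <- ifelse(female, 0.7, 0.9)        # scaling of serum creatinine
  alpha <- ifelse(female, -0.329, -0.411)  # exponent below kappa
  141 *
    pmin(scr / kappa, 1)^alpha *
    pmax(scr / kappa, 1)^-1.209 *
    0.993^age *
    ifelse(female, 1.018, 1) *
    ifelse(black, 1.159, 1)
}

# Dummy values, vectorised over subjects
ckd_epi_2009(scr = c(0.8, 1.2), age = c(40, 60), female = c(TRUE, FALSE))
```

Using pmin and pmax keeps the function vectorised, so it can be applied to whole columns of a data frame.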
"CKD-EPI-ignore-race": EGFR according to the Chronic Kidney Disease Epidemiology (CKD-EPI) refit without race study formula (Delgado):
where $\min$ indicates the minimum of $\mathrm{Scr}/\kappa$ or 1; $\max$ indicates the maximum of $\mathrm{Scr}/\kappa$ or 1.
Scaling parameter $\kappa$ is 0.9 for males and 0.7 for females and scaling parameter $\alpha$ is -0.302 for males and -0.241 for females.
"CKD-EPI-Scys", EGFR according to the Chronic Kidney Disease Epidemiology study formula (Inker):
where $\min$ indicates the minimum of $\mathrm{Scys}/0.8$ or 1; $\max$ indicates the maximum of $\mathrm{Scys}/0.8$ or 1.
"CKD-EPI-Scr-Scys", EGFR according to the Chronic Kidney Disease Epidemiology study formula (Inker):
where $\min$ indicates the minimum of $\mathrm{Scr}/\kappa$ or 1; $\max$ indicates the maximum of $\mathrm{Scr}/\kappa$ or 1,
and where $\min$ indicates the minimum of $\mathrm{Scys}/0.8$ or 1; $\max$ indicates the maximum of $\mathrm{Scys}/0.8$ or 1.
Scaling parameter k is 1 for males and 0.969 for females, scaling parameter l is 1 if White/Caucasian and 1.08 if Black/African American,
scaling parameter $\kappa$ is 0.9 for males and 0.7 for females and scaling parameter $\alpha$ is -0.207 for males and -0.248 for females.
"CKD-EPI-Scr-Scys-ignore-race", EGFR according to the Chronic Kidney Disease Epidemiology 2021 refit without race study formula (Delgado):
where $\min$ indicates the minimum of $\mathrm{Scr}/\kappa$ or 1; $\max$ indicates the maximum of $\mathrm{Scr}/\kappa$ or 1,
and where $\min$ indicates the minimum of $\mathrm{Scys}/0.8$ or 1; $\max$ indicates the maximum of $\mathrm{Scys}/0.8$ or 1.
Scaling parameter k is 1 for males and 0.963 for females, scaling parameter $\kappa$ is 0.9 for males and 0.7 for females and
scaling parameter $\alpha$ is -0.144 for males and -0.219 for females.
"CKD-EPI-Japan", EGFR in Japanese adults based on a Japanese coefficient-modified CKD-EPI equation (Horio):
where $\min$ indicates the minimum of $\mathrm{Scr}/\kappa$ or 1; $\max$ indicates the maximum of $\mathrm{Scr}/\kappa$ or 1.
Scaling parameter l is 1 for White/Caucasian, 1.159 for Black/African American, 0.813 for Japanese, scaling parameter $\kappa$ is 0.9 for males and 0.7 for females and
scaling parameter $\alpha$ is -0.411 for males and -0.329 for females.
"CKD-MDRD", EGFR according to the abbreviated Modification of Diet in Renal Disease study formula (Levey):
"CKD-MDRD2", EGFR according to the re-expressed Modification of Diet in Renal Disease (MDRD) study formula (Levey2007):
"Schwartz-original", EGFR in children, according to the original Schwartz formula (Schwartz1987):
where k = 0.33 in pre-term infants up to 1 year, k = 0.45 in full-term infants up to 1 year, k = 0.55 in children 1 year to 13 years, k = 0.55 in girls >13 and <18 years and k = 0.70 in boys >13 and <18 years.
"Schwartz-CKiD", EGFR in children, according to the Chronic Kidney Disease in Children (CKiD) revised Schwartz formula (Schwartz2012):
Scaling parameter k is 1 for males and 1.076 for females.
"Schwartz-1B", EGFR in children, according to the Chronic Kidney Disease in Children (CKiD) 1B Schwartz formula (Schwartz2009):
"Schwartz", EGFR in children, according to the updated ('bedside') Schwartz formula (Schwartz2009):
This equation is not meant for patients < 1 year of age.
"Mayo-Quadratic", EGFR according to the Quadratic Mayo Clinic formula (Rule).
If Scr < 0.8 mg/dL, a value of 0.8 is used in the equation.
"Matsuo-Japan", EGFR in Japanese adults, according to Matsuo:
For all of the calculation methods described above, the reported EGFR values are in the units "mL/minute/1.73m2". This means that the value is referenced to a body surface area (BSA) value of 1.73m2. When a value is provided for BSA, the final outcome is corrected for the BSA value and the units become "mL/minute". This is done by multiplying the eGFR (referenced to a BSA of 1.73m2) by the individual's BSA and dividing by 1.73. It is the user's responsibility to provide BSA values that are calculated using the appropriate formula. Additional information regarding this can be found in an FDA guidance document.
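The BSA correction described above is a simple rescaling. As a worked sketch with dummy numbers (the variable names are illustrative only):

```r
egfr_indexed <- c(90, 60)   # mL/minute/1.73m2 (dummy values)
bsa <- c(2.0, 1.6)          # individual BSA in m2 (dummy values)

# De-index: multiply by the individual's BSA and divide by 1.73
egfr_abs <- egfr_indexed * bsa / 1.73   # mL/minute
egfr_abs
```

A subject with a BSA above 1.73 m2 gets an absolute value above the indexed one, and a subject with a smaller BSA gets a lower value.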
a vector with EGFR values
Richard Hooijmaijers
# dataset with dummy numbers!
crea <- data.frame(id=c(1,1,2),Scr=runif(3),SEX=c(1,1,0),AGE=runif(3),RACE=c(1,1,2))
egfr(crea$Scr,crea$SEX,crea$AGE,crea$RACE, formula="CKD-EPI")
# example for use in dplyr
crea |> dplyr::mutate(EGFR = egfr(Scr,SEX, AGE, RACE, formula="CKD-EPI"))
This function expands ADDL and II records. This is done by placing each ADDL record on a separate line. This is convenient in case of individual dose calculations
expand_addl_ii(data, evid = NULL, del_iiaddl = TRUE)
data |
data frame to perform the expansion on |
evid |
character identifying the event ID (EVID) within the data frame. This is used to distinguish observations from dosing records, e.g. 0 for observations |
del_iiaddl |
logical identifying if the ADDL and II variables can be deleted from output |
The function expects that certain variables are present in the data (at least ID, TIME, ADDL and II)
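The expansion itself can be illustrated in base R. This is a minimal sketch of the idea, not the package's implementation: each dose record is repeated ADDL + 1 times and TIME is shifted by multiples of II. Handling of observation records via EVID is left out.

```r
dfrm <- data.frame(ID = 1, TIME = 0, II = 12, ADDL = 5, AMT = 10)

# Repeat each row ADDL + 1 times (the original dose plus ADDL extra doses)
idx <- rep(seq_len(nrow(dfrm)), dfrm$ADDL + 1)
out <- dfrm[idx, ]

# Within each original record count 0, 1, ..., ADDL and shift TIME by k * II
k <- sequence(dfrm$ADDL + 1) - 1
out$TIME <- out$TIME + k * out$II

# Drop ADDL and II, since every dose now has its own row
out$ADDL <- NULL
out$II <- NULL
out <- out[order(out$ID, out$TIME), ]
rownames(out) <- NULL
out
```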
a data frame with expanded dose records
Richard Hooijmaijers
dfrm <- data.frame(ID=c(1,1), TIME=c(0,12),II=c(12,0),ADDL=c(5,0),AMT=c(10,0),EVID=c(1,0))
expand_addl_ii(dfrm,evid="EVID")
This function can be used in case a start and end date is known for dosing. This function fills down the dates so each date between start and end is placed on a separate row. Subsequently the dataset can be used to merge with available date information and impute missing dates.
fill_dates(data, start, end, tau = 1, repdat = 1)
data |
data frame for which the dates should be filled down |
start |
character identifying the start date (as date format) within the data frame |
end |
character identifying the end date (as date format) within the data frame |
tau |
integer with the tau in days (e.g. 2 for dosing every other day) |
repdat |
integer with repeats per day (e.g. 2 in case of twice daily dosing) |
a data frame with filled out dates
Richard Hooijmaijers
dfrm <- data.frame(ID=1:2,first=as.Date(c("2016-10-01","2016-12-01"), "%Y-%m-%d"), last=as.Date(c("2016-10-03","2016-12-02"), "%Y-%m-%d"))
fill_dates(dfrm, "first", "last")
fill_dates(dfrm, "first", "last", 2, 3)
This function is a wrapper around dplyr::filter. Additional actions are performed in the background to log the information of the filter action, and information regarding the step is printed.
filterr(.data, ..., comment = "")
.data |
the data frame for which the filter should be created |
... |
arguments passed to dplyr::filter |
comment |
character with the reason of filtering used in log file |
The function can be used to keep track of records that are omitted in the data management process. In general one would like to keep all records from the source database (and use flags to exclude data instead), but in cases where this is not possible it is important to know which records are omitted and for what reason. Every time the function is used, it creates a record in a log file which can be used in the documentation.
a filtered data frame
Richard Hooijmaijers
# For full trace-ability of source data, no pipes
# are preferred
dat1 <- filterr(Theoph,Subject==1)
dat2 <- Theoph |> filterr(Subject==2)
# Show what is being logged
get_log()$filterr_nfo
This function creates a flag identifying the outliers in a vector
flag_outliers(var, type = "boxstat")
var |
numeric vector that should be checked for outliers |
type |
character with the type of test to perform for outliers (currently only "boxstat" is available, which uses the boxplot method) |
a numeric vector the same length as var with either 0 (no outlier) or 1 (outlier)
Richard Hooijmaijers
dfrm <- data.frame(a = 1:10, b = c(1:9,50))
flag_outliers(dfrm$a)
flag_outliers(dfrm$b)
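The boxplot method mentioned for the type argument can be sketched in base R with grDevices::boxplot.stats. The helper name `flag_box` is hypothetical and the package's own implementation may differ in details, such as the handling of missing values.

```r
# Illustrative helper (not part of amp.dm): flag values that boxplot.stats()
# reports as lying more than 1.5 times the hinge spread beyond the hinges
flag_box <- function(var) {
  as.numeric(var %in% grDevices::boxplot.stats(var)$out)
}

flag_box(1:10)
flag_box(c(1:9, 50))
```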
Returns one or more data frames with log information related to functions such as filterr, left_joinr, cmnt, srce and read_data
get_log()
a named list of dataframes
Richard Hooijmaijers
xldat <- readxl::readxl_example("datasets.xlsx")
xlin <- read_data(xldat, comment="read test")
get_log()
Get the current script name (either interactive RStudio, markdown or batch script)
get_script(base = TRUE, noext = TRUE)
base |
logical indicating if the basename should be returned (without path) |
noext |
logical indicating if the file extension should be omitted |
character with the current script name
Richard Hooijmaijers
The function will impute all NA values with either a given statistic (e.g. median) or with the largest group
impute_covar(var, uniques = NULL, type = "median", verbose = FALSE)
var |
vector with the items that should be imputed |
uniques |
vector that defines unique records to enable calculation of stats on non duplicate values |
type |
character of length one defining the type of statistics to perform for imputation (see details) |
verbose |
logical indicating if additional information should be given |
The function can be used to impute continuous or categorical covariates. In case of continuous covariates, the type argument should be a statistic like median or mean. In case of a categorical covariate, the type should be set to 'largest', in which case the category that occurs most often is used. In case multiple categories occur equally often, the last one encountered is used.
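Both imputation types can be sketched in base R. This is a hedged illustration and not the package's implementation. It assumes that uniques is used to keep one value per unique record before the statistic is calculated. For ties, it takes the last of the tied categories in table order, which is only one possible reading of the tie rule.

```r
# Continuous covariate: median over one value per subject, so that subjects
# with many records do not dominate the statistic
wt <- c(70, 70, NA, 80, 80, 80, 90)
id <- c(1, 1, 2, 3, 3, 3, 4)
stat <- median(wt[!duplicated(id)], na.rm = TRUE)
wt_imp <- ifelse(is.na(wt), stat, wt)

# Categorical covariate: impute with the most frequent category
sex <- c(1, 1, 0, NA, 1)
tab <- table(sex)
largest <- as.numeric(names(tab)[max(which(tab == max(tab)))])
sex_imp <- ifelse(is.na(sex), largest, sex)

wt_imp
sex_imp
```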
a vector where missing values are imputed
Richard Hooijmaijers
dfrm <- data.frame(num1 = c(NA,110))
impute_covar(dfrm$num1,type="median")
This function imputes dose records by looking at the missing doses between available records based on a given II value
impute_dose(data, id, datetime, amt = "AMT", ii = "II", thr = 50, back = TRUE)
data |
data frame to perform the calculations on |
id |
character identifying the id variable within the data frame |
datetime |
character identifying the date/time variable (POSIXct) within the data frame |
amt |
character identifying the AMT (amount) variable within the data frame |
ii |
character identifying the II (Interdose Interval) variable within the data frame |
thr |
numeric indicating the threshold percentage for imputation (see details) |
back |
logical indicating if the time for imputed doses should be taken from the previous record (TRUE) or the next (FALSE) |
The function fills in the doses by looking at the time difference and II between all records. Be aware that this can result in unexpected results in a few cases, so results should always be handled with care. The function will determine if a dose should be imputed based on the 'thr' value. For each dose, the function determines the percentage difference from the previous dose based on the II value. In case the expected difference is above the threshold value, imputation will be done.
a data frame with imputed dose records
Richard Hooijmaijers
dfrm <- data.frame(id=c(1,1), dt=c(Sys.time(),Sys.time()+ 864120), II=c(24,24),AMT=c(10,10))
impute_dose(dfrm,"id","dt")
This function is a wrapper around dplyr::left_join. Additional actions are performed in the background to log the information of the join action, and information regarding the step is printed.
left_joinr(x, y, by = NULL, ..., comment = "", keepids = FALSE)
x, y |
a pair of data frames used for joining |
by |
character vector of variables to join by |
... |
additional arguments passed to dplyr::left_join |
comment |
information for the reason of merging |
keepids |
logical indicating if merge identifiers should be available in output data (for checking purposes) |
The function can be used to keep track of records that are available after a join in the data management process. Joining data could lead to unexpected results, e.g. the creation of a Cartesian product or the loss of data. Therefore it is important to know what the result of a join step is. Every time the function is used, it creates a record in a log file which can be used in the documentation.
a joined data frame
Richard Hooijmaijers
dose <- data.frame(Subject = unique(Theoph$Subject), dose = sample(1:3,length(unique(Theoph$Subject)), replace = TRUE))
dose2 <- dose[dose$Subject%in%1:6,]
# Preferred to explicitly list by
dat1 <- left_joinr(Theoph, dose, by="Subject")
# The base R pipe is preferred for better logging of source data
dat2 <- Theoph |> left_joinr(dose, by="Subject")
# Avoid long pipes before function for readability in log. e.g dont:
dat3 <- Theoph |> dplyr::mutate(ID=3) |> dplyr::bind_cols(X=3) |> left_joinr(dose, by="Subject")
# Show what is being logged
get_log()$joinr_nfo
This function creates a table including information on functions that log information such as filterr, left_joinr and read_data.
log_df(log, type, coding = FALSE, ret = "dfrm", capt = NULL, align = NULL, size = "\\footnotesize", ...)
log |
list with logged information typically obtained with get_log |
type |
character with the type of info that should be taken from log (either "filterr_nfo","joinr_nfo" or "read_nfo") |
coding |
logical indicating if the coding (within |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
size |
character with font size as for the table general_tbl |
... |
additional arguments passed to general_tbl |
This function generates information for functions that log information. It attempts to find a good alignment and caption, mainly for outputting to a table. It is possible to set your own caption and alignment, but take into account that the alignment differs per type and changes when the coding argument is changed.
function creates a PDF file or returns a data frame
Richard Hooijmaijers
dat1 <- filterr(Theoph,Subject==1)
dat2 <- Theoph |> filterr(Subject==2)
log_df(get_log(), "filterr_nfo")
This function will change the file attributes so only read access is set
make_readonly(x)
x |
character of length 1 with the path that contains files or character vector with filenames to be set to read-only |
This function attempts to set the read-only attribute on files. This is done through system commands such as attrib on Windows and chmod (mode 444) on Linux. With the latter, take into account possible issues with (sudo) rights on files.
In case x is a directory, the function sets the read-only attribute for all files in the folder (and recurses into all subfolders!).
nothing is returned, only system commands are issued
Richard Hooijmaijers
## Not run:
tmpf <- tempfile(fileext = ".txt")
cat("test",file=tmpf)
make_readonly(tmpf)
## End(Not run)
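The effect of mode 444 mentioned for make_readonly can be tried out on a scratch file with base R. The sketch below uses Sys.chmod instead of the system commands that the package issues, so it illustrates the permission change only and is not the package's implementation. It assumes a Unix-alike system where permission bits are reported by file.info.

```r
# Create a scratch file and make it read-only with base R
tmpf <- tempfile(fileext = ".txt")
cat("test", file = tmpf)

# Mode 444 = read permission for owner, group and others
Sys.chmod(tmpf, mode = "0444", use_umask = FALSE)

# Inspect the resulting permission bits
mode_after <- format(file.info(tmpf)$mode)
mode_after

# Restore write permission so the temporary file can be removed
Sys.chmod(tmpf, mode = "0644", use_umask = FALSE)
unlink(tmpf)
```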
This function is mainly a wrapper for forcats::fct_lump, but applied to numeric variables. Furthermore, the uniques argument can be used to determine small categories on, for instance, the individual level.
num_lump(x, lumpcat = 99, uniques = NULL, prop = NULL, min = NULL, ...)
x |
numeric vector with the items that should be lumped |
lumpcat |
the category in which the lumped levels should be added (see details) |
uniques |
vector that defines unique records to enable lumping on non duplicate values |
prop |
numeric with the threshold proportions for lumping |
min |
numeric with the min number of times a level should appear to not lump |
... |
additional arguments passed to forcats::fct_lump_min and/or forcats::fct_lump_prop |
The argument lumpcat is the level in which lumped values should appear and can be one of the following:
numeric with the category number to set the levels to
character specifying "largest" to select the largest category (selected before lumping)
named vector to set the 'algorithm' for instance: c('5'='3', '4'='6') to set category 5 to 3 and 4 to 6 when these categories need lumping
vector with the lumping applied
Richard Hooijmaijers
dfrm <- data.frame(id = 1:30, cat = c(rep(1,8),rep(2,13), rep(3,4),rep(4,5)))
num_lump(x=dfrm$cat, lumpcat=99, prop=0.15)
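The idea of lumping small numeric categories into a single category, as described for num_lump with a numeric lumpcat and a prop threshold, can be sketched in base R. The helper lump_numeric_sketch is hypothetical and is not part of amp.dm. It applies a plain "share below threshold" rule and does not reproduce the edge-case handling of forcats, so its results may differ from those of num_lump.

```r
# Hypothetical helper (not part of amp.dm): recode numeric categories whose
# share of records is below `prop` to a single lump category
lump_numeric_sketch <- function(x, lumpcat = 99, prop = 0.1) {
  share <- table(x) / length(x)                      # proportion per category
  small <- as.numeric(names(share)[share < prop])    # categories to lump
  ifelse(x %in% small, lumpcat, x)
}

cat_var <- c(rep(1, 10), rep(2, 7), rep(3, 2), rep(4, 1))
lumped  <- lump_numeric_sketch(cat_var, lumpcat = 99, prop = 0.15)
table(lumped)
```

Here categories 3 and 4 (10% and 5% of the records) fall below the 15% threshold and are both recoded to 99.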
This function exports data for NONMEM modeling analyses including options that are frequently necessary to adapt.
output_data(x, csv = NULL, xpt = NULL, attr = NULL, verbose = TRUE, maxdig = 6, tonum = TRUE, firstesc = NULL, readonly = FALSE, overwrite = TRUE, ...)
x |
data frame to be exported. |
csv |
character with the name of the csv file to generate |
xpt |
character with the name of the xpt file to generate |
attr |
character with the name of the rds file to generate |
verbose |
logical indicating if additional information should be written to the console |
maxdig |
numeric with the maximum number of decimals for numeric variables to be in output (see details) |
tonum |
logical indicating if all variables should be made numeric (standard for NONMEM input files) |
firstesc |
character with escape character for first variable, used to exclude row in NONMEM |
readonly |
logical indicating if the output csv file should be made readonly |
overwrite |
logical indicating if (all) output should be overwritten or not |
... |
Arguments passed to |
In case tonum is TRUE, all variables will be made numeric and Inf values will be
set to NA (all NA values will be set to a dot). The rounding set in maxdig
will only be done in case tonum is set to TRUE. For xpt files, the name of the object to export is used as the name
of the dataset inside the xpt file (e.g. output_data(dfrm,xpt='dataset.xpt') will result in an xpt file named
'dataset.xpt' with one dataset named 'dfrm').
a data frame is written to disk
Richard Hooijmaijers
data(Theoph)
out_file <- tempfile(fileext = ".csv")
output_data(Theoph, csv = out_file, tonum = FALSE)
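The steps described in the details for output_data with tonum = TRUE (make all variables numeric, set Inf to NA, round to maxdig decimals, write NA as a dot) can be approximated in base R. The helper prep_nonmem_sketch is hypothetical and is not part of amp.dm. How the package converts non-numeric variables is not documented here, so the conversion below is an assumption.

```r
# Hypothetical helper (not part of amp.dm) mimicking the described steps
prep_nonmem_sketch <- function(x, maxdig = 6) {
  x[] <- lapply(x, function(col) {
    # assumption: non-numeric columns are converted via their text representation
    num <- if (is.numeric(col)) col else suppressWarnings(as.numeric(as.character(col)))
    num[is.infinite(num)] <- NA      # Inf values become missing
    round(num, maxdig)               # limit the number of decimals
  })
  x
}

dfrm <- data.frame(ID = c("1", "2", "3"),
                   DV = c(1.23456789, Inf, NA),
                   stringsAsFactors = FALSE)
out <- prep_nonmem_sketch(dfrm, maxdig = 3)

out_file <- tempfile(fileext = ".csv")
# Missing values are written as a dot
write.csv(out, out_file, na = ".", quote = FALSE, row.names = FALSE)
csv_lines <- readLines(out_file)
csv_lines
```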
This function creates histograms for numeric values and bar charts for character or factor variables. In case there are more than 10 unique values, it will list the first 10 unique values in a 'text' plot.
plot_vars(dfrm, vars = names(dfrm), ppp = 16, ...)
dfrm |
data frame that should be plotted |
vars |
character vector with the variables for which plots should be generated |
ppp |
number of plots per page |
... |
additional arguments passed to |
a ggplot/patchwork object with all variables visualized
Richard Hooijmaijers
plot_vars(Theoph)
This function reads external data, with support for the file types that are most commonly used in clinical and pre-clinical data, and accepts a manually specified function for less common types.
read_data(file, manfunc = NULL, comment = "", verbose = TRUE, ascii_check = "xls", ...)
file |
character with the name of the file to read (see details for more information) |
manfunc |
character with the manual function to use to read data (can have the form "package::function") |
comment |
character with comment/information for the data that was read-in |
verbose |
logical indicating if information regarding the data that is read is printed in console |
ascii_check |
character of length one defining if the data that has been read in should be checked for valid ASCII characters (see details) |
... |
Additional arguments passed to the read data functions. This can be used to add arguments to for instance read.table or read_excel or for the function defined in manfunc |
The function reads in data, and uses the file extension to select the applicable function for reading. Below is a list of extensions that are recognized and the corresponding function that is used to read the data:
sas7bdat: haven::read_sas
xpt: haven::read_xpt
xls/xlsx: readxl::read_excel
prn/par: read.table
csv: read.csv
The prn and par file formats are basically space delimited files but with some specifics for modeling software
(e.g. prn is NONMEM input file with header starting with '#' and par is NONMEM output file as defined in $TABLE).
This function can be used to read any type of data by using the manfunc
argument. Any function available in R can be used here and even user written functions (see example section).
This argument has precedence over the recognition of extensions. This means for instance that a CSV file can
also be read-in using a different function (e.g. using data.table::fread).
This flexibility is built in to ensure all possible data can be read in using this single function. This is mainly
important for documentation purposes, to ensure all used data can be logged and documented.
The data can be checked for valid ASCII characters using the ascii_check argument. By default this is done for
Excel files with extension xls or xlsx (ascii_check="xls"). Other options are "none" to never perform a check,
or "all" to perform a check regardless of the way the data is read in. The default is chosen because Excel files
are likely to be created manually and could therefore include non-ASCII characters, and because
the check otherwise adds overhead to the function.
data frame containing a representation of the data in the file
Richard Hooijmaijers
# For a known filetype you can use:
dat <- read_data(paste0(R.home(),"/doc/CRAN_mirrors.csv"))
# We can use the arguments from the underlying package that does the reading
xmpl <- system.file("example/Attr.Template.xlsx",package = "amp.dm")
dat <- read_data(xmpl, range="A2:B3")
# In case we get a file format not directly supported by the function
# we can use the manfunc to use another function
sav <- system.file("examples", "iris.sav", package = "haven")
dat <- read_data(sav,manfunc = "haven::read_sav")
# It is also possible to write your own function that reads data
# (as long as it returns a data.frame or tibble), e.g.:
read_nd <- function(x) read.csv(x) |> dplyr::distinct(ID, .keep_all = TRUE)
xmpl <- system.file("example/NM.theoph.V1.csv", package = "amp.dm")
dat <- read_data(xmpl,manfunc = "read_nd")
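The selection logic described for read_data (a manually specified function takes precedence, otherwise the file extension decides which reader is used) can be sketched in base R. The helper read_any_sketch is hypothetical and is not part of amp.dm. It covers only csv files, omits logging and the ASCII check, and the way it resolves a "package::function" string is an assumption about one possible approach.

```r
# Hypothetical helper (not part of amp.dm): pick a reader from the file
# extension unless a manual function is supplied
read_any_sketch <- function(file, manfunc = NULL, ...) {
  if (!is.null(manfunc)) {
    # manfunc takes precedence; accept "function" or "package::function"
    parts  <- strsplit(manfunc, "::", fixed = TRUE)[[1]]
    reader <- if (length(parts) == 2) {
      getExportedValue(parts[1], parts[2])
    } else {
      get(manfunc, mode = "function")
    }
    return(reader(file, ...))
  }
  ext <- tolower(tools::file_ext(file))
  switch(ext,
         csv = read.csv(file, ...),
         # further extensions would be added here in the same way
         stop("no reader defined for extension '", ext, "'"))
}

tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = 1:3, DV = c(0.5, 1.2, 0.9)), tmp, row.names = FALSE)

by_ext <- read_any_sketch(tmp)                               # dispatch on ".csv"
by_fun <- read_any_sketch(tmp, manfunc = "utils::read.csv")  # manual function
identical(by_ext, by_fun)
```

Routing every read through one entry point is what makes it possible to log each data source in a single place, which is the motivation given in the details.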
This function creates a latex table or data frame with information from the R session (sessionInfo() and Sys.info())
session_tbl(ret = "tbl", capt = "Session info", align = "lp{8cm}", size = "\\footnotesize", incscript = FALSE, ...)
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
size |
character with font size as for the table general_tbl |
incscript |
logical indicating if the name of the script should be included (using get_script) |
... |
additional arguments passed to general_tbl |
This function can be used to create a table with the most important information on an R session, the user that is running the R session and the current date/time.
a data frame, code for table or nothing in case a PDF file is created
Richard Hooijmaijers
session_tbl()
Adds the source of variables to the package environment. The function can be used in code chunks at the applicable locations, and the recorded sources can easily be added to documentation afterwards.
srce(var, source, type = "c")
var |
unquoted string with the variable for which the source should be defined |
source |
unquoted strings with the source(s) used for var (see example) |
type |
character with the type of variable; can be either 'c' (copied) or 'd' (derived) |
no return value, called for side effects
Richard Hooijmaijers
# variable AMT copied from Dose variable in Theoph data frame
srce(AMT,Theoph.Dose)
# variable BMI derived from WEIGHT variable in wt data frame
# and HEIGHT variable in ht data frame
srce(BMI,c(wt.WEIGHT,ht.HEIGHT),'d')
get_log()$srce_nfo
This function creates a latex table or data frame with the basic statistics of a data frame.
stats_df(data, missingval = -999, ret = "tbl", capt = "Statistics data frame", align = "p{2cm}p{1cm}p{1cm}p{4cm}p{1.7cm}p{1.7cm}p{0.8cm}p{1.3cm}", size = "\\footnotesize", ...)
data |
a data frame for which the overview should be created |
missingval |
numeric with the value that indicates missing values, if NULL no missings will be recorded |
ret |
a character vector to define what kind of output should be returned (either "dfrm", "tbl", "file") |
capt |
character with the caption of the table (not used in case data frame is returned) |
align |
alignment of the table passed to general_tbl (not used in case data frame is returned) |
size |
character with font size as for the table general_tbl |
... |
additional arguments passed to general_tbl |
This function can be used to create a table with basic statistics of a data frame. The function will list the min, max, number of NA/missing values, number of unique categories and type of variable of all data items within a data frame. In case a data item has less than 10 unique categories, it will list the unique values. The main reason to use this function is to create a structured table with statistics of a data frame to be included in documentation.
either tex code for a table or a data frame
Richard Hooijmaijers
stats_df(Theoph)
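The kind of per-variable overview described for stats_df (min, max, number of NA and missing values, number of unique values and variable type) can be sketched in base R. The helper stats_sketch is hypothetical and is not part of amp.dm. The column names and the way the missing value code is counted are assumptions, and no table formatting is attempted.

```r
# Hypothetical helper (not part of amp.dm): one row of basic statistics per column
stats_sketch <- function(data, missingval = -999) {
  rows <- lapply(names(data), function(nm) {
    col    <- data[[nm]]
    is_num <- is.numeric(col)
    data.frame(
      variable = nm,
      min      = if (is_num) min(col, na.rm = TRUE) else NA_real_,
      max      = if (is_num) max(col, na.rm = TRUE) else NA_real_,
      n_na     = sum(is.na(col)),
      # assumption: the missing value code is counted separately from NA
      n_miss   = if (is_num) sum(col == missingval, na.rm = TRUE) else 0L,
      n_unique = length(unique(col)),
      type     = class(col)[1],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

dfrm <- data.frame(ID = c(1, 1, 2, 2), DV = c(0.5, -999, NA, 2.1),
                   SEX = c("F", "F", "M", "M"), stringsAsFactors = FALSE)
overview <- stats_sketch(dfrm)
overview
```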
This function calculates the most important time variables based on multiple variables in a data frame.
time_calc(data, datetime, evid = NULL, addl = NULL, ii = NULL, amt = "AMT", id = "ID", dig = 2)
data |
data frame to perform the calculations on |
datetime |
character identifying the date/time variable (POSIXct) within the data frame |
evid |
character identifying the event ID (EVID) within the data frame |
addl |
character identifying the number of additional doses (ADDL) within the data frame |
ii |
character identifying the interdose interval (II) within the data frame |
amt |
character identifying the amount variable (only needed if |
id |
character identifying the ID or subject variable |
dig |
numeric indicating with how many digits the resulting times should be available |
The function calculates the TIME, TALD (time after last dose) and TAFD (time after first dose). The different time variables are calculated in hours, where a POSIXct for the datetime variable is expected. The function takes into account ADDL and II records when provided, to correctly identify the TALD. One must be cautious however when having ADDL/II and a complex dosing schedule (e.g. alternating dose schedules, more than 1 dose per day, multiple compounds administration). In general it is good practice to double check the output for multiple subjects in case of complex designs including ADDL/II.
a data frame with the calculated times
Richard Hooijmaijers
dfrm <- data.frame(ID=rep(1,5), dt=Sys.time() + c(0,8e+5,1e+6,2e+6,3e+6),
                   AMT=c(NA,10,NA,0,NA), II=rep(24,5),EVID=c(0,1,0,1,0))
time_calc(dfrm,"dt","EVID")
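The time variables described for time_calc can be illustrated for a single subject in base R. The helper time_sketch is hypothetical and is not part of amp.dm. It assumes records sorted by time, dosing records flagged with EVID equal to 1, TIME counted from the first record, and no ADDL/II records, so it may differ from what time_calc returns.

```r
# Hypothetical helper (not part of amp.dm); assumes one subject's records
# sorted by time, dosing records flagged with EVID == 1, no ADDL/II
time_sketch <- function(datetime, evid, dig = 2) {
  hrs <- function(a, b) as.numeric(difftime(a, b, units = "hours"))
  dose_times <- datetime[evid == 1]
  # index of the most recent dose at or before each record (0 = none yet)
  last_idx <- findInterval(as.numeric(datetime), as.numeric(dose_times))
  tald     <- rep(NA_real_, length(datetime))
  has_dose <- last_idx > 0
  tald[has_dose] <- hrs(datetime[has_dose], dose_times[last_idx[has_dose]])
  data.frame(
    TIME = round(hrs(datetime, datetime[1]), dig),    # hours since first record
    TAFD = round(hrs(datetime, dose_times[1]), dig),  # hours since first dose
    TALD = round(tald, dig)                           # hours since latest dose
  )
}

t0 <- as.POSIXct("2024-01-01 08:00:00", tz = "UTC")
dt <- t0 + 3600 * c(0, 1, 5, 25, 30)   # records at 0, 1, 5, 25 and 30 hours
times <- time_sketch(dt, evid = c(0, 1, 0, 1, 0))
times
```

In this sketch a record before the first dose gets a negative TAFD and a missing TALD. Whether the package treats pre-dose records the same way is not stated in the documentation.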
This function calculates different variables based on weight and height and conversion from or to kilograms
weight_height(wt = NULL, ht = NULL, sex = NULL, bmi = NULL, type = "bmi")
wt |
vector with weight values, in either kg or lb depending on the type (see details) |
ht |
vector with height values in cm (see details) |
sex |
vector with SEX values (Where female is defined as a value of 1) |
bmi |
vector with BMI values (see details) |
type |
character with the type to be used for the calculations (see details) |
Currently the following types are defined within the function:
"kg-lb": Converts units from kg to lb
"lb-kg": Converts units from lb to kg
"bmi": Calculates body mass index (BMI) using the standard formula (Quetelet, 1842)
"bsa": Body Surface Area, according to Gehan and George
"bsa2": Body Surface Area, according to DuBois and DuBois
"bsam": Body Surface Area, according to Mosteller
"bsah": Body Surface Area, according to Haycock
"bsal": Body Surface Area in normal-weight and obese adults up to 250 kg, according to Livingston
"ffmj": Fat free mass, according to Janmahasatian, with sex-specific constants of 6680 for males and 8780 for females, and of 216 for males and 244 for females
"ffms": Fat free mass in Indian patients, according to Sinha, with sex-specific constants of 6680 for males and 8780 for females, and of 0.77 for males and 0.70 for females
"lbmb": Calculates lean body mass (LBM), according to Boer, with sex-specific constants of 0.407 for males and 0.252 for females, of 0.267 for males and 0.473 for females, and of 19.2 for males and 48.3 for females
"lbmj": Calculates lean body mass (LBM), according to James, with sex-specific constants of 1.10 for males and 1.07 for females, and of 128 for males and 148 for females
"lbmp": Calculates lean body mass (LBM) for children up to 14 years, according to Peters
"pnw": Calculates the Predicted Normal Weight for obese patients, according to Duffull, with sex-specific constants of 1.57 for males and 1.75 for females, of 0.0183 for males and 0.0242 for females, and of 10.5 for males and 12.6 for females
a vector with calculated values
Richard Hooijmaijers
tmp <- data.frame(id=1,WT=runif(3,70,120),HT=runif(3,160,220))
weight_height(wt=tmp$WT,ht=tmp$HT,type="bmi")
# example for use in dplyr
tmp |> dplyr::mutate(BMI = weight_height(wt=WT,ht=HT,type="bmi"))
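A few of the types listed for weight_height can be written out in base R using the commonly published forms of the equations. The helper body_size_sketch is hypothetical and is not part of amp.dm. The equations below are the standard literature versions (weight in kg, height in cm, female coded as 1) and are assumed, not confirmed, to match what the package computes.

```r
# Hypothetical helper (not part of amp.dm); wt in kg, ht in cm,
# sex coded as 1 = female and any other value = male
body_size_sketch <- function(wt, ht = NULL, sex = NULL, type = "bmi") {
  bmi    <- wt / (ht / 100)^2
  female <- if (is.null(sex)) NA else sex == 1
  switch(type,
    "kg-lb" = wt * 2.20462,                       # 1 kg = 2.20462 lb
    "bmi"   = bmi,                                # kg/m^2
    "bsa2"  = 0.007184 * wt^0.425 * ht^0.725,     # DuBois and DuBois
    "bsam"  = sqrt(wt * ht / 3600),               # Mosteller
    "ffmj"  = 9270 * wt /                         # Janmahasatian
                (ifelse(female, 8780, 6680) + ifelse(female, 244, 216) * bmi),
    "lbmb"  = ifelse(female,                      # Boer
                     0.252 * wt + 0.473 * ht - 48.3,
                     0.407 * wt + 0.267 * ht - 19.2),
    stop("type not covered in this sketch")
  )
}

body_size_sketch(wt = 70, ht = 175, type = "bmi")
body_size_sketch(wt = 70, ht = 175, type = "bsam")
body_size_sketch(wt = c(70, 60), ht = c(175, 165), sex = c(0, 1), type = "ffmj")
```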