Computational Statistics & Data Analysis

Kevin He, Jian Kang, Hyokyoung G Hong, Ji Zhu, Yanming Li, Huazhen Lin, Han Xu, Yi Li
Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to detect signals weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss weak signals. By incorporating the inter-feature dependence, a covariance-insured screening approach is proposed to identify predictors that are jointly informative but marginally weakly associated with outcomes...
April 2019: Computational Statistics & Data Analysis
Kan Li, Sheng Luo
A multivariate functional joint model framework is proposed which enables the repeatedly measured functional outcomes, scalar outcomes, and survival process to be modeled simultaneously while accounting for association among the multiple (functional and scalar) longitudinal and survival processes. This data structure is increasingly common across medical studies of neurodegenerative diseases and is exemplified by the motivating Alzheimer's Disease Neuroimaging Initiative (ADNI) study, in which serial brain imaging, clinical and neuropsychological assessments are collected to measure the progression of Alzheimer's disease (AD)...
January 2019: Computational Statistics & Data Analysis
Sy Han Chiou, Jing Qian, Elizabeth Mormino, Rebecca A Betensky
Truncated survival data arise when the event time is observed only if it falls within a subject-specific region, known as the truncation set. Left-truncated data arise when there is delayed entry into a study, such that subjects are included only if their event time exceeds some other time. Quasi-independence of truncation and failure refers to factorization of their joint density in the observable region. Under quasi-independence, standard methods for survival data such as the Kaplan-Meier estimator and Cox regression can be applied after simple adjustments to the risk sets...
December 2018: Computational Statistics & Data Analysis
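The risk-set adjustment mentioned in the left-truncation abstract above can be illustrated with a minimal sketch. It assumes the common convention that a subject with entry (truncation) time L and observed time X counts as at risk at time t only when L < t <= X, and that every subject enters before its observed time. The function name `km_left_truncated` and the toy data are hypothetical and are not taken from the paper.

```python
def km_left_truncated(entry, time, event):
    """Kaplan-Meier estimate with delayed entry (left truncation).

    entry: entry (truncation) times, assumed strictly less than `time`
    time:  observed event or censoring times
    event: 1 if the event was observed, 0 if censored
    Returns a list of (event_time, survival_estimate) pairs.
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, d in zip(time, event) if d})
    surv = 1.0
    curve = []
    for t in event_times:
        # Adjusted risk set: subjects who have entered and not yet left.
        at_risk = sum(1 for l, x in zip(entry, time) if l < t <= x)
        deaths = sum(1 for x, d in zip(time, event) if d and x == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Toy data: the third subject enters the study late, at time 2.
print(km_left_truncated(entry=[0, 0, 2], time=[1, 3, 4], event=[1, 1, 1]))
# prints [(1, 0.5), (3, 0.25), (4, 0.0)]
```

Without the entry-time condition, the late entrant would be counted as at risk at time 1, which is the bias the risk-set adjustment is meant to avoid.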
David Lenis, Benjamin Ackerman, Elizabeth A Stuart
Model misspecification is a potential problem for any parametric-model based analysis. However, the measurement and consequences of model misspecification have not been well formalized in the context of causal inference. A measure of model misspecification is proposed, and the consequences of model misspecification in non-experimental causal inference methods are investigated. The metric is then used to explore which estimators are more sensitive to misspecification of the outcome and/or treatment assignment model...
December 2018: Computational Statistics & Data Analysis
Sixia Chen, David Haziza
A novel jackknife empirical likelihood method for constructing confidence intervals for multiply robust estimators is proposed in the context of missing data. Under mild regularity conditions, the proposed jackknife empirical likelihood ratio has been shown to converge to a standard chi-square distribution. A simulation study supports the findings and shows the benefits of the proposed method. The latter has also been applied to 2016 National Health Interview Survey data.
November 2018: Computational Statistics & Data Analysis
Xingjie Shi, Yuan Huang, Jian Huang, Shuangge Ma
Penalization is a popular tool for multi- and high-dimensional data. Most of the existing computational algorithms have been developed for convex loss functions. Nonconvex loss functions can sometimes generate more robust results and have important applications. Motivated by the BLasso algorithm, this study develops the Forward and Backward Stagewise (Fabs) algorithm for nonconvex loss functions with the adaptive Lasso (aLasso) penalty. It is shown that each point along the Fabs paths is a δ-approximate solution to the aLasso problem and the Fabs paths converge to the stationary points of the aLasso problem when δ goes to zero, given that the loss function has second-order derivatives bounded from above...
August 2018: Computational Statistics & Data Analysis
Dewei Wang, Christopher S McMahan, Joshua M Tebbs, Christopher R Bilder
Screening procedures for infectious diseases, such as HIV, often involve pooling individual specimens together and testing the pools. For diseases with low prevalence, group testing (or pooled testing) can be used to classify individuals as diseased or not while providing considerable cost savings when compared to testing specimens individually. The pooling literature is replete with group testing case identification algorithms including Dorfman testing, higher-stage hierarchical procedures, and array testing...
June 2018: Computational Statistics & Data Analysis
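The cost savings claimed for pooled testing in the abstract above can be made concrete for the simplest case, two-stage Dorfman testing. The sketch below assumes a perfect assay (no false positives or negatives) and independent individuals with a common prevalence p, so a pool of size k costs one test plus k retests whenever the pool is positive. The function names are hypothetical, and the sketch does not cover the higher-stage or array procedures the paper also discusses.

```python
def dorfman_tests_per_person(p, k):
    """Expected number of tests per individual under Dorfman testing.

    p: disease prevalence (probability an individual is positive)
    k: pool size, k >= 2
    Assumes a perfect assay and independent individuals.
    """
    # One pooled test shared by k people, plus an individual retest
    # for everyone whenever the pool contains at least one positive.
    prob_pool_positive = 1 - (1 - p) ** k
    return 1 / k + prob_pool_positive


def best_pool_size(p, max_k):
    """Pool size in 2..max_k that minimises expected tests per person."""
    return min(range(2, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))


# At 1% prevalence, pools of 10 need about 0.2 tests per person,
# compared with 1 test per person for individual testing.
print(round(dorfman_tests_per_person(0.01, 10), 4))  # prints 0.1956
print(best_pool_size(0.01, 50))                      # prints 11
```

As prevalence rises, pools test positive more often and the advantage shrinks, which is why group testing is usually recommended for low-prevalence diseases.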
Unkyung Lee, Yanqing Sun, Thomas H Scheike, Peter B Gilbert
The cumulative incidence function quantifies the probability of failure over time due to a specific cause for competing risks data. The generalized semiparametric regression models for the cumulative incidence functions with missing covariates are investigated. The effects of some covariates are modeled as non-parametric functions of time while others are modeled as parametric functions of time. Different link functions can be selected to add flexibility in modeling the cumulative incidence functions. The estimation procedures based on the direct binomial regression and the inverse probability weighting of complete cases are developed...
June 2018: Computational Statistics & Data Analysis
Sebastian J Teran Hidalgo, Michael C Wu, Stephanie M Engel, Michael R Kosorok
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics available for them is small compared with their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well...
June 2018: Computational Statistics & Data Analysis
So Young Park, Luo Xiao, Jayson D Willbur, Ana-Maria Staicu, N L'ntshotsholé Jumbe
A joint design for sampling functional data is proposed to achieve optimal prediction of both functional data and a scalar outcome. The motivating application is fetal growth, where the objective is to determine the optimal times to collect ultrasound measurements in order to recover fetal growth trajectories and to predict child birth outcomes. The joint design is formulated using an optimization criterion and implemented in a pilot study. Performance of the proposed design is evaluated via simulation study and application to fetal ultrasound data...
June 2018: Computational Statistics & Data Analysis
Hua Ma, Andriy I Bandos, David Gur
Assessing performance of diagnostic markers is a necessary step for their use in decision making regarding various conditions of interest in diagnostic medicine and other fields. Globally useful markers could, however, have ranges of values that are "diagnostically non-informative". This paper demonstrates that the presence of marker values from diagnostically non-informative ranges could lead to a loss in statistical efficiency during nonparametric evaluation and shows that grouping non-informative values provides a natural resolution to this problem...
January 2018: Computational Statistics & Data Analysis
Sheila Gaynor, Eric Bair
Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods generally do not identify such subgroups, particularly when there are a large number of high-variance features in the data set. Conventional methods may identify clusters associated with these high-variance features when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest...
December 2017: Computational Statistics & Data Analysis
Feipeng Zhang, Qunhua Li
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency...
December 2017: Computational Statistics & Data Analysis
Keunbaik Lee, Changryong Baek, Michael J Daniels
In longitudinal studies, serial dependence of repeated outcomes must be taken into account to make correct inferences on covariate effects. As such, care must be taken in modeling the covariance matrix. However, estimation of the covariance matrix is challenging because there are many parameters in the matrix and the estimated covariance matrix should be positive definite. To overcome these limitations, two Cholesky decomposition approaches have been proposed: modified Cholesky decomposition for autoregressive (AR) structure and moving average Cholesky decomposition for moving average (MA) structure, respectively...
November 2017: Computational Statistics & Data Analysis
Chang Yu, Daniel Zelterman
Microarray studies generate a large number of p-values from many gene expression comparisons. The estimate of the proportion of the p-values sampled from the null hypothesis draws broad interest. The two-component mixture model is often used to estimate this proportion. If the data are generated under the null hypothesis, the p-values follow the uniform distribution. What is the distribution of p-values when data are sampled from the alternative hypothesis? The distribution is derived for the chi-squared test...
October 2017: Computational Statistics & Data Analysis
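The uniformity of null p-values noted in the abstract above is what makes the null proportion estimable. The sketch below uses a simple tail-counting estimator of the kind often attributed to Storey, which is a substitute for illustration and not the mixture model or the derived alternative distribution from the paper. It assumes that p-values from the alternative rarely exceed the cutoff `lam`, so the p-values above `lam` are treated as coming from the null. The function name and the toy p-values are hypothetical.

```python
def estimate_null_proportion(pvalues, lam=0.5):
    """Tail-counting estimate of the proportion of true null hypotheses.

    Under the null, p-values are uniform on [0, 1], so a fraction
    (1 - lam) of them is expected above the cutoff `lam`. Assuming
    alternative p-values rarely exceed `lam`, the null proportion is
    roughly #{p > lam} / (m * (1 - lam)), capped at 1.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1 - lam)))


# Toy example: six small p-values and two large ones.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.6, 0.9]
print(estimate_null_proportion(pvals))  # prints 0.5
```

The estimate is conservative when some alternative p-values do fall above the cutoff, which is one motivation for modeling the alternative distribution explicitly.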
Zheyu Wang, Krisztian Sebestyen, Sarah E Monsell
A model-based clustering method is proposed to address two research aims in Alzheimer's disease (AD): to evaluate the accuracy of imaging biomarkers in AD prognosis, and to integrate biomarker information and standard clinical test results into the diagnoses. One challenge in such biomarker studies is that it is often desired or necessary to conduct the evaluation without relying on clinical diagnoses or some other standard references. This is because (1) biomarkers may provide prognostic information long before any standard reference can be acquired; (2) these references are often based on or provide unfair advantage to standard tests...
September 2017: Computational Statistics & Data Analysis
S Faye Williamson, Peter Jacko, Sofía S Villar, Thomas Jaki
Development of treatments for rare diseases is challenging due to the limited number of patients available for participation. Learning about treatment effectiveness with a view to treat patients in the larger outside population, as in the traditional fixed randomised design, may not be a plausible goal. An alternative goal is to treat the patients within the trial as effectively as possible. Using the framework of finite-horizon Markov decision processes and dynamic programming (DP), a novel randomised response-adaptive design is proposed which maximises the total number of patient successes in the trial and penalises if a minimum number of patients are not recruited to each treatment arm...
September 2017: Computational Statistics & Data Analysis
Andrew G Chapple, Marina Vannucci, Peter F Thall, Steven Lin
A variable selection procedure is developed for a semi-competing risks regression model with three hazard functions that uses spike-and-slab priors and stochastic search variable selection algorithms for posterior inference. A rule is devised for choosing the threshold on the marginal posterior probability of variable inclusion based on the Deviance Information Criterion (DIC) that is examined in a simulation study. The method is applied to data from esophageal cancer patients from the MD Anderson Cancer Center, Houston, TX, where the most important covariates are selected in each of the hazards of effusion, death before effusion, and death after effusion...
August 2017: Computational Statistics & Data Analysis
Hongxiao Zhu, Jeffrey S Morris, Fengrong Wei, Dennis D Cox
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors...
July 2017: Computational Statistics & Data Analysis
Hao Hu, Weixin Yao, Yichao Wu
Finite mixture of regression (FMR) models can be reformulated as incomplete data problems and they can be estimated via the expectation-maximization (EM) algorithm. The main drawback is the strong parametric assumption, such as normally distributed residuals in FMR models. The estimation might be biased if the model is misspecified. To relax the parametric assumption about the component error densities, a new method is proposed to estimate the mixture regression parameters by only assuming that the components have log-concave error densities but the specific parametric family is unknown...
July 2017: Computational Statistics & Data Analysis