Journal of Computational and Graphical Statistics

Hui Xu, Xiangdong Gu, Mahlet G Tadesse, Raji Balasubramanian
We present an ensemble tree-based algorithm for variable selection in high dimensional datasets, in settings where a time-to-event outcome is observed with error. The proposed methods are motivated by self-reported outcomes collected in large-scale epidemiologic studies, such as the Women's Health Initiative. The proposed methods equally apply to imperfect outcomes that arise in other settings such as data extracted from electronic medical records. To evaluate the performance of our proposed algorithm, we present results from simulation studies, considering both continuous and categorical covariates...
2018: Journal of Computational and Graphical Statistics
Jean Morrison, Noah Simon
Confidence interval procedures used in low dimensional settings are often inappropriate for high dimensional applications. When many parameters are estimated, marginal confidence intervals associated with the most significant estimates have very low coverage rates: They are too small and centered at biased estimates. The problem of forming confidence intervals in high dimensional settings has previously been studied through the lens of selection adjustment. In that framework, the goal is to control the proportion of non-covering intervals formed for selected parameters...
2018: Journal of Computational and Graphical Statistics
Brian R Gaines, Juhyun Kim, Hua Zhou
We compare alternative computing strategies for solving the constrained lasso problem. As its name suggests, the constrained lasso extends the widely-used lasso to handle linear constraints, which allow the user to incorporate prior information into the model. In addition to quadratic programming, we employ the alternating direction method of multipliers (ADMM) and also derive an efficient solution path algorithm. Through both simulations and benchmark data examples, we compare the different algorithms and provide practical recommendations in terms of efficiency and accuracy for various sizes of data...
2018: Journal of Computational and Graphical Statistics
Kris Sankaran, Susan Holmes
We introduce methods for visualization of data structured along trees, especially hierarchically structured collections of time series. To this end, we identify questions that often emerge when working with hierarchical data and provide an R package to simplify their investigation. Our key contribution is the adaptation of the visualization principles of focus-plus-context and linking to the study of tree-structured data. Our motivating application is to the analysis of bacterial time series, where an evolutionary tree relating bacteria is available a priori...
2018: Journal of Computational and Graphical Statistics
Eric F Lock
We propose a framework for the linear prediction of a multi-way array (i.e., a tensor) from another multi-way array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. We describe an approach that exploits the multiway structure of both the predictors and the outcomes by restricting the coefficients to have reduced CP-rank. We propose a general and efficient algorithm for penalized least-squares estimation, which allows for a ridge (L2) penalty on the coefficients...
2018: Journal of Computational and Graphical Statistics
Jie Zhou, Jiajia Zhang, Wenbin Lu
For semiparametric survival models with interval censored data and a cure fraction, it is often difficult to derive nonparametric maximum likelihood estimation due to the challenge in maximizing the complex likelihood function. In this paper, we propose a computationally efficient EM algorithm, facilitated by a gamma-Poisson data augmentation, for maximum likelihood estimation in a class of generalized odds rate mixture cure (GORMC) models with interval censored data. The gamma-Poisson data augmentation greatly simplifies the EM estimation and enhances the convergence speed of the EM algorithm...
2018: Journal of Computational and Graphical Statistics
Janet S Kim, Ana-Maria Staicu, Arnab Maity, Raymond J Carroll, David Ruppert
We study additive function-on-function regression where the mean response at a particular time point depends on the time point itself, as well as the entire covariate trajectory. We develop a computationally efficient estimation methodology based on a novel combination of spline bases with an eigenbasis to represent the trivariate kernel function. We discuss prediction of a new response trajectory, propose an inference procedure that accounts for total variability in the predicted response curves, and construct pointwise prediction intervals...
2018: Journal of Computational and Graphical Statistics
Min Lu, Saad Sadiq, Daniel J Feaster, Hemant Ishwaran
Estimation of individual treatment effect in observational data is complicated due to the challenges of confounding and selection bias. A useful inferential framework to address this is the counterfactual (potential outcomes) model, which takes the hypothetical stance of asking what if an individual had received both treatments. Making use of random forests (RF) within the counterfactual framework we estimate individual treatment effects by directly modeling the response. We find that accurate estimation of individual treatment effects is possible even in complex heterogeneous settings but that the type of RF approach plays an important role in accuracy...
2018: Journal of Computational and Graphical Statistics
Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün
The use of a finite mixture of normal distributions in model-based clustering allows us to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and in general either achieved by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior, where the hyperparameters are carefully selected such that they are reflective of the cluster structure aimed at...
April 3, 2017: Journal of Computational and Graphical Statistics
Jingnan Xue, Faming Liang
Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this paper, we introduce a new feature screening method and establish its sure independence screening property under the ultrahigh-dimensional setting. The proposed method works based on the nonparanormal transformation and Henze-Zirkler's test; that is, it first transforms the response variable and features to Gaussian random variables using the nonparanormal transformation and then tests the dependence between the response variable and features using the Henze-Zirkler's test...
2017: Journal of Computational and Graphical Statistics
Jonathan Fintzi, Xiang Cui, Jon Wakefield, Vladimir N Minin
Stochastic epidemic models describe the dynamics of an epidemic as a disease spreads through a population. Typically, only a fraction of cases are observed at a set of discrete times. The absence of complete information about the time evolution of an epidemic gives rise to a complicated latent variable problem in which the state space size of the epidemic grows large as the population size increases. This makes analytically integrating over the missing data infeasible for populations of even moderate size. We present a data augmentation Markov chain Monte Carlo (MCMC) framework for Bayesian estimation of stochastic epidemic model parameters, in which measurements are augmented with subject-level disease histories...
2017: Journal of Computational and Graphical Statistics
Aditya Mishra, Dipak K Dey, Kun Chen
In multivariate regression models, a sparse singular value decomposition of the regression component matrix is appealing for reducing dimensionality and facilitating interpretation. However, the recovery of such a decomposition remains very challenging, largely due to the simultaneous presence of orthogonality constraints and co-sparsity regularization. By delving into the underlying statistical data generation mechanism, we reformulate the problem as a supervised co-sparse factor analysis, and develop an efficient computational procedure, named sequential factor extraction via co-sparse unit-rank estimation (SeCURE), that completely bypasses the orthogonality requirements...
2017: Journal of Computational and Graphical Statistics
Stuart Lipsitz, Garrett Fitzmaurice, Debajyoti Sinha, Nathanael Hevelone, Jim Hu, Louis L Nguyen
Medical studies increasingly involve a large sample of independent clusters, where the cluster sizes are also large. Our motivating example from the 2010 Nationwide Inpatient Sample (NIS) has 8,001,068 patients and 1049 clusters, with average cluster size of 7627. Consistent parameter estimates can be obtained naively assuming independence, which are inefficient when the intra-cluster correlation (ICC) is high. Efficient generalized estimating equations (GEE) incorporate the ICC and sum all pairs of observations within a cluster when estimating the ICC...
2017: Journal of Computational and Graphical Statistics
Philip T Reiss, David L Miller, Pei-Shien Wu, Wen-Yu Hua
A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models...
2017: Journal of Computational and Graphical Statistics
Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts...
2017: Journal of Computational and Graphical Statistics
John A Kamm, Jonathan Terhorst, Yun S Song
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on...
2017: Journal of Computational and Graphical Statistics
Danjie Zhang, Ming-Hui Chen, Joseph G Ibrahim, Mark E Boye, Wei Shen
Joint models for longitudinal and survival data are routinely used in clinical trials or other studies to assess a treatment effect while accounting for longitudinal measures such as patient-reported outcomes (PROs). In the Bayesian framework, the deviance information criterion (DIC) and the logarithm of the pseudo marginal likelihood (LPML) are two well-known Bayesian criteria for comparing joint models. However, these criteria do not provide separate assessments of each component of the joint model. In this paper, we develop a novel decomposition of DIC and LPML to assess the fit of the longitudinal and survival components of the joint model, separately...
2017: Journal of Computational and Graphical Statistics
J T Gaskins, M J Daniels
The estimation of the covariance matrix is a key concern in the analysis of longitudinal data. When data consists of multiple groups, it is often assumed the covariance matrices are either equal across groups or are completely distinct. We seek methodology to allow borrowing of strength across potentially similar groups to improve estimation. To that end, we introduce a covariance partition prior which proposes a partition of the groups at each measurement time. Groups in the same set of the partition share dependence parameters for the distribution of the current measurement given the preceding ones, and the sequence of partitions is modeled as a Markov chain to encourage similar structure at nearby measurement times...
January 2, 2016: Journal of Computational and Graphical Statistics
Haochang Shou, Russell T Shinohara, Han Liu, Daniel S Reich, Ciprian M Crainiceanu
This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject and a series of images are acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify the enhancing lesion locations at the subject level and lesion enhancement patterns at the population level. We analyze a total of 20 subjects scanned at 63 visits (∼30Gb), the largest population of such clinical brain images...
2016: Journal of Computational and Graphical Statistics
Chiranjit Mukherjee, Abel Rodriguez
Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation...
2016: Journal of Computational and Graphical Statistics