Statistics and its Interface

Thomas Nemmers, Anjana Narayan, Sudipto Banerjee
This article presents a simple and easily implementable Bayesian approach to model and quantify uncertainty in small descriptive social networks. While statistical methods for analyzing networks have seen burgeoning activity over the last decade or so, ranging from social sciences to genetics, such methods usually involve sophisticated stochastic models whose estimation requires substantial structure and information in the networks. At the other end of the analytic spectrum, there are purely descriptive methods based upon quantities and axioms in computational graph theory...
2019: Statistics and its Interface
Michelle DeVeaux, Michael J Kane, Daniel Zelterman
We introduce a discrete distribution suggested by curtailed sampling rules common in early-stage clinical trials. We derive the distribution of the smallest number of independent and identically distributed Bernoulli trials needed to observe either s successes or t failures. This report provides closed-form expressions for the mass function and moment generating function, and draws connections to other standard distributions.
2018: Statistics and its Interface
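As a rough illustration of the stopping rule described in the abstract above, the mass function can be sketched from standard negative binomial reasoning: the trial count N equals n either because the s-th success lands on trial n with fewer than t failures before it, or because the t-th failure lands on trial n with fewer than s successes before it. This is not taken from the paper; the function name `stopping_time_pmf` and the success probability parameter `p` are assumptions made for this sketch.

```python
from math import comb


def stopping_time_pmf(n: int, s: int, t: int, p: float) -> float:
    """P(N = n), where N is the first trial at which either the s-th success
    or the t-th failure has been observed (i.i.d. Bernoulli(p) trials).

    Hypothetical helper for illustration; not the paper's notation.
    """
    q = 1.0 - p
    prob = 0.0
    # Stop on the s-th success at trial n: the first n-1 trials contain
    # exactly s-1 successes and n-s failures, with n-s at most t-1.
    if n >= s and n - s <= t - 1:
        prob += comb(n - 1, s - 1) * p**s * q ** (n - s)
    # Stop on the t-th failure at trial n: the first n-1 trials contain
    # exactly t-1 failures and n-t successes, with n-t at most s-1.
    if n >= t and n - t <= s - 1:
        prob += comb(n - 1, t - 1) * q**t * p ** (n - t)
    return prob


if __name__ == "__main__":
    s, t, p = 3, 4, 0.3
    # The support runs from min(s, t) to s + t - 1.
    for n in range(min(s, t), s + t):
        print(n, stopping_time_pmf(n, s, t, p))
```

The two branches are mutually exclusive events, so their probabilities add, and the values over the support min(s, t), ..., s + t - 1 sum to one.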
Yun Li, Sijian Wang, Peter X-K Song, Naisyin Wang, Ling Zhou, Ji Zhu
The linear mixed-effects model (LMM) is widely used in the analysis of clustered or longitudinal data. This paper aims to address analytic challenges arising from estimation and selection in the application of the LMM to high-dimensional longitudinal data. We develop a doubly regularized approach in the LMM to simultaneously select fixed and random effects. On the theoretical front, we establish large sample properties for the proposed method under the high-dimensional setting, allowing both numbers of fixed effects and random effects to be much larger than the sample size...
2018: Statistics and its Interface
Janet S Kim, Arnab Maity, Ana-Maria Staicu
We propose a flexible regression model to study the association between a functional response and multiple functional covariates that are observed on the same domain. Specifically, we relate the mean of the current response to current values of the covariates by a sum of smooth unknown bivariate functions, where each of the functions depends on the current value of the covariate and the time point itself. In this framework, we develop estimation methodology that accommodates realistic scenarios where the covariates are sampled with or without error on a sparse and irregular design, and prediction that accounts for unknown model correlation structure...
2018: Statistics and its Interface
Jingxiang Chen, Chong Zhang, Michael R Kosorok, Yufeng Liu
Learning in the Reproducing Kernel Hilbert Space (RKHS) has been widely used in many scientific disciplines. Because an RKHS can be very flexible, it is common to impose a regularization term in the optimization to prevent overfitting. Standard RKHS learning employs the squared norm penalty of the learning function. Despite its success, many challenges remain. In particular, one cannot directly use the squared norm penalty for variable selection or data extraction. Therefore, when there exist noise predictors, or the underlying function has a sparse representation in the dual space, the performance of standard RKHS learning can be suboptimal...
2018: Statistics and its Interface
Esra Kürüm, John Hughes, Runze Li, Saul Shiffman
We propose a copula-based joint modeling framework for mixed longitudinal responses. Our approach permits all model parameters to vary with time, and thus will enable researchers to reveal dynamic response-predictor relationships and response-response associations. We call the new class of models TIMECOP because we model dependence using a time-varying copula. We develop a one-step estimation procedure for the TIMECOP parameter vector, and also describe how to estimate standard errors. We investigate the finite sample performance of our procedure via three simulation studies, one of which shows that our procedure performs well under ignorable missingness...
2018: Statistics and its Interface
Guanglei Yu, Liang Zhu, Jianguo Sun, Leslie L Robison
This paper discusses regression analysis of a type of incomplete mixed data arising from event history studies with the proportional rates model. By mixed data, we mean that each study subject may be observed continuously during the whole study period, continuously over some study periods and at some time points, or only at some discrete time points. Therefore, we have combined recurrent event and panel count data. For this problem, we present a multiple imputation-based estimation procedure. One advantage of the proposed marginal model approach is that it can be easily implemented...
2018: Statistics and its Interface
William L Leão, Carlos A Abanto-Valle, Ming-Hui Chen
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor's 500 index (S&P 500)...
2017: Statistics and its Interface
Christian E Galarza, Victor H Lachos, Dipankar Bandyopadhyay
This paper develops a likelihood-based approach to analyze quantile regression (QR) models for continuous longitudinal data via the asymmetric Laplace distribution (ALD). Compared to the conventional mean regression approach, QR can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. Exploiting the nice hierarchical representation of the ALD, our classical approach follows a Stochastic Approximation of the EM (SAEM) algorithm in deriving exact maximum likelihood estimates of the fixed-effects and variance components...
2017: Statistics and its Interface
Christopher Bryant, Hongtu Zhu, Mihye Ahn, Joseph Ibrahim
The aim of this article is to develop a Bayesian random graph mixture model (RGMM) to detect the latent class network (LCN) structure of brain connectivity networks and estimate the parameters governing this structure. The use of conjugate priors for unknown parameters leads to efficient estimation, and a well-known nonidentifiability issue is avoided by a particular parameterization of the stochastic block model (SBM). Posterior computation proceeds via an efficient Markov Chain Monte Carlo algorithm. Simulations demonstrate that LCN outperforms several other competing methods for community detection in weighted networks, and we apply our RGMM to estimate the latent community structures in the functional resting brain networks of 185 subjects from the ADHD-200 sample...
2017: Statistics and its Interface
Baolin Wu, James S Pankow
More and more large cohort studies have conducted or are conducting genome-wide association studies (GWAS) to reveal the genetic components of many complex human diseases. These large cohort studies often collected a broad array of correlated phenotypes that reflect common physiological processes. By jointly analyzing these correlated traits, we can gain more power by aggregating multiple weak effects and shed light on the mechanisms underlying complex human diseases. The majority of existing multi-trait association test methods are based on jointly modeling the multivariate traits conditional on the genotype as covariate, and can readily accommodate the imputed SNPs by using their imputed dosage as a covariate...
2017: Statistics and its Interface
Thaddeus Tarpey, Eva Petkova, Liangyu Zhu
Understanding heterogeneity in phenotypical characteristics, symptom manifestations and response to treatment of subjects with psychiatric illnesses is a continuing challenge in mental health research. A long-standing goal of medical studies is to identify groups of subjects characterized by a particular trait or quality and to distinguish them from other subjects in a clinically relevant way. This paper develops and illustrates a novel approach to this problem based on a method of optimal-partitioning (clustering) of functional data...
July 1, 2016: Statistics and its Interface
Anastasia Ivanova, Allison M Deal
Many oncology phase II trials are single arm studies designed to screen novel treatments based on efficacy outcome. Efficacy is often assessed as an ordinal variable based on a level of response of solid tumors with four categories: complete response, partial response, stable disease and progression. We describe a two-stage design for a single-arm phase II trial where the primary objective is to test the rate of tumor response defined as complete plus partial response, and the secondary objective is to estimate the rate of disease control defined as tumor response plus stable disease...
2016: Statistics and its Interface
Yang Li, Yanqing Sun
Longitudinal data frequently arise in many fields such as medical follow-up studies focusing on specific longitudinal responses. In such situations, the responses are recorded only at discrete observation times. Most existing approaches for longitudinal data analysis assume that the observation or follow-up times are independent of the underlying response process, either completely or given some known covariates. We present a joint analysis approach in which possible correlations among the responses, observation and follow-up times can be characterized by time-dependent random effects...
2016: Statistics and its Interface
Kun Chen
Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding...
2016: Statistics and its Interface
Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, Jun Yan
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data...
2016: Statistics and its Interface
Wan-Min Tsai, Heping Zhang, Eugenia Buta, Stephanie O'Malley, Ralitza Gueorguieva
The tree-based methodology has been widely applied to identify predictors of health outcomes in medical studies. However, the classical tree-based approaches do not pay particular attention to treatment assignment and thus do not consider prediction in the context of treatment received. In recent years, attention has been shifting from average treatment effects to identifying moderators of treatment response, and tree-based approaches to identify subgroups of subjects with enhanced treatment responses are emerging...
2016: Statistics and its Interface
Taoyun Cao, Xueqin Wang, Heping Zhang
This paper introduces Energy Bagging Tree (EBT) for multivariate nonparametric regression problems. The EBT makes use of a measure of dispersion constructed from a generalized Gini's mean difference as node impurity, and the tree split function therefore corresponds to the product of energy distance and descendants' proportions. As a non-parametric extension of the between-sample variation in the analysis of variance, this measure of dispersion serves well for EBT in understanding certain complex data. Extensive simulation studies indicate that EBT is highly competitive with existing regression tree methods...
2016: Statistics and its Interface
Jiwei Zhao, Heping Zhang
The need for analysis of multiple responses arises from many applications. In behavioral science, for example, comorbidity is a common phenomenon where multiple disorders occur in the same person. The advantage of jointly analyzing multiple correlated responses has been examined and documented. Due to the difficulties of modeling multiple responses, nonparametric tests such as generalized Kendall's Tau have been developed to assess the association between multiple responses and risk factors. These procedures have been applied to genomewide association studies of multiple complex traits...
2016: Statistics and its Interface
Qingrun Zhang, Chris Tyler-Smith, Quan Long
To identify evolutionary events from the footprints left in the patterns of genetic variation in a population, people use many statistical frameworks, including neutrality tests. In datasets from current high throughput sequencing and genotyping platforms, it is common to have missing data and low-confidence SNP calls at many segregating sites. However, the traditional statistical framework for neutrality tests does not allow for these possibilities; therefore the usual way of treating missing data is to ignore segregating sites with missing/low confidence calls, regardless of the good SNP calls at these sites in other individuals...
October 1, 2015: Statistics and its Interface