Papers in the journal Statistics and its Interface (Page 3)

#41

JOURNAL ARTICLE

Model diagnostics in reduced-rank estimation.

Kun Chen

Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding...

28003860

2016: Statistics and its Interface

#42

JOURNAL ARTICLE

Statistical methods and computing for big data.

Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, Jun Yan

Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data...

27695593

2016: Statistics and its Interface
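The abstract names online updating for stream data and divide and conquer. Both ideas can be illustrated with the simplest case: least squares with one covariate, where five running sums are sufficient statistics.

- **Online updating:** each chunk of the stream updates the sums and can then be discarded.
- **Divide and conquer:** sums computed on separate blocks are simply added.

This is a minimal sketch under that assumption, not a method taken from the article. The class name OnlineSimpleOLS is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnlineSimpleOLS:
    """Running sufficient statistics for the fit y = intercept + slope * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Online updating: fold one chunk of the stream into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def merge(self, other):
        """Divide and conquer: combine statistics computed on two separate blocks."""
        return OnlineSimpleOLS(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def coefficients(self):
        """Solve the 2x2 normal equations from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Usage: the exact line y = 1 + 2x, fed to the stream as two chunks.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]
stream = OnlineSimpleOLS()
stream.update(xs[:3], ys[:3])
stream.update(xs[3:], ys[3:])
print(stream.coefficients())  # (1.0, 2.0)
```

Raw power sums are simple but can lose precision on very long streams with large values. Centred (Welford-style) updates are the usual remedy.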

#43

JOURNAL ARTICLE

A modified classification tree method for personalized medicine decisions.

Wan-Min Tsai, Heping Zhang, Eugenia Buta, Stephanie O'Malley, Ralitza Gueorguieva

The tree-based methodology has been widely applied to identify predictors of health outcomes in medical studies. However, the classical tree-based approaches do not pay particular attention to treatment assignment and thus do not consider prediction in the context of treatment received. In recent years, attention has been shifting from average treatment effects to identifying moderators of treatment response, and tree-based approaches to identify subgroups of subjects with enhanced treatment responses are emerging...

26770292

2016: Statistics and its Interface

#44

JOURNAL ARTICLE

Energy bagging tree.

Taoyun Cao, Xueqin Wang, Heping Zhang

This paper introduces Energy Bagging Tree (EBT) for multivariate nonparametric regression problems. The EBT makes use of a measure of dispersion constructed from a generalized Gini's mean difference as node impurity, and the tree split function therefore corresponds to the product of energy distance and descendants' proportions. As a non-parametric extension of the between-sample variation in the analysis of variance, this measure of dispersion serves well for EBT in understanding certain complex data. Extensive simulation studies indicate that EBT is highly competitive with existing regression tree methods...

26594301

2016: Statistics and its Interface
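The split criterion described in the EBT abstract can be sketched using Euclidean distances between (possibly multivariate) responses. The sketch makes three assumptions:

- averages are V-statistics taken over all pairs, self-pairs included;
- a candidate split is scored by p_L * p_R * (energy distance between the two child nodes);
- thresholds are searched on a single covariate.

The bagging layer and the article's exact impurity definition are not reproduced, and the function names are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all (x, y) pairs, self-pairs included (V-statistic)."""
    return sum(math.dist(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance: 2E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


def split_score(left, right):
    """Product of the children's proportions and their energy distance."""
    n = len(left) + len(right)
    return (len(left) / n) * (len(right) / n) * energy_distance(left, right)


def best_split(x, responses):
    """Exhaustive threshold search on one covariate; returns (threshold, score)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_threshold, best_score = None, -math.inf
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # tied covariate values cannot be separated
        left = [responses[i] for i in order[:k]]
        right = [responses[i] for i in order[k:]]
        score = split_score(left, right)
        if score > best_score:
            best_threshold, best_score = (lo + hi) / 2, score
    return best_threshold, best_score


# Usage: 1-D responses given as 1-tuples; the gap between 1.0 and 5.0 is found.
print(best_split([1, 2, 3, 4], [(0.0,), (1.0,), (5.0,), (6.0,)]))  # (2.5, 2.25)
```

With these averages, the parent node's dispersion decomposes exactly. Its dispersion, N/2 times the mean pairwise distance, equals the children's within-node dispersions plus N/2 * p_L * p_R * (energy distance). Maximizing the split score is therefore the same as minimizing within-node dispersion, a nonparametric analogue of the ANOVA between/within decomposition.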

#45

JOURNAL ARTICLE

Modeling Multiple Responses via Bootstrapping Margins with an Application to Genetic Association Testing.

Jiwei Zhao, Heping Zhang

The need for analysis of multiple responses arises from many applications. In behavioral science, for example, comorbidity is a common phenomenon where multiple disorders occur in the same person. The advantage of jointly analyzing multiple correlated responses has been examined and documented. Due to the difficulties of modeling multiple responses, nonparametric tests such as generalized Kendall's Tau have been developed to assess the association between multiple responses and risk factors. These procedures have been applied to genomewide association studies of multiple complex traits...

26543519

2016: Statistics and its Interface

#46

JOURNAL ARTICLE

An extended Tajima's D neutrality test incorporating SNP calling and imputation uncertainties.

Qingrun Zhang, Chris Tyler-Smith, Quan Long

To identify evolutionary events from the footprints left in the patterns of genetic variation in a population, researchers use many statistical frameworks, including neutrality tests. In datasets from current high-throughput sequencing and genotyping platforms, it is common to have missing data and low-confidence SNP calls at many segregating sites. However, the traditional statistical framework for neutrality tests does not allow for these possibilities; therefore the usual way of treating missing data is to ignore segregating sites with missing or low-confidence calls, regardless of the good SNP calls at these sites in other individuals...

26681995

October 1, 2015: Statistics and its Interface
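For reference, the classical Tajima's D that the article extends compares two estimates:

- pi, the mean number of pairwise differences;
- Watterson's estimate S/a1, based on the number of segregating sites S.

The sketch below also applies the conventional treatment the abstract describes: every site where any individual has a missing call is dropped, with missing calls assumed to be coded as N. It does not implement the article's extension for calling and imputation uncertainty, and the function names are hypothetical.

```python
import math
from itertools import combinations


def drop_sites_with_missing(seqs, missing="N"):
    """Conventional treatment: discard every site where any individual has a missing call."""
    keep = [j for j in range(len(seqs[0])) if all(s[j] != missing for s in seqs)]
    return ["".join(s[j] for j in keep) for s in seqs]


def tajimas_d(seqs):
    """Classical Tajima's D for n aligned sequences (no handling of call uncertainty)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    S = sum(1 for j in range(len(seqs[0])) if len({s[j] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: mean number of pairwise differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalizing constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Usage: the last site has one missing call, so it is dropped for everyone.
cleaned = drop_sites_with_missing(["AAAAN", "AAATA", "AATTA", "ATTTA"])
print(cleaned, round(tajimas_d(cleaned), 4))  # ['AAAA', 'AAAT', 'AATT', 'ATTT'] 0.1677
```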

#47

Estimating the Sizes of Populations At Risk of HIV Infection From Multiple Data Sources Using a Bayesian Hierarchical Model.

Le Bao, Adrian E Raftery, Amala Reddy

In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prevalence and designing effective interventions. We present a Bayesian hierarchical model for estimating the sizes of local and national HIV key affected populations...

26015851

April 1, 2015: Statistics and its Interface

#48

JOURNAL ARTICLE

A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data.

Francesco C Stingo, Michael D Swartz, Marina Vannucci

Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data...

28989562

2015: Statistics and its Interface

#49

JOURNAL ARTICLE

Single-gene negative binomial regression models for RNA-Seq data with higher-order asymptotic inference.

Yanming Di

We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach where such NB regression models are fitted to individual genes separately and, in particular, the NB dispersion parameter is estimated from each gene separately without assuming commonalities between genes. This single-gene approach contrasts with the more widely used dispersion-modeling approach where the NB dispersion is modeled as a simple function of the mean or other measures of read abundance, and then estimated from a large number of genes combined...

28042360

2015: Statistics and its Interface
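The abstract contrasts estimating the NB dispersion phi gene by gene with borrowing strength across genes. A minimal moment-based sketch shows both, assuming the NB2 parameterization Var(Y) = mu + phi * mu^2 and no covariates (an intercept-only model):

- a per-gene estimate of phi;
- as a crude stand-in for the dispersion-modeling approach, a single pooled phi shared by all genes, obtained by summing the per-gene moment equations.

Neither is the article's likelihood-based method or its higher-order asymptotic inference, and the function names are hypothetical.

```python
from statistics import mean, variance


def nb_dispersion_single_gene(counts):
    """Moment estimate of phi from one gene's counts, using Var(Y) = mu + phi * mu**2."""
    mu = mean(counts)
    if mu == 0:
        raise ValueError("phi is not identifiable for an all-zero gene")
    s2 = variance(counts)  # sample variance with n - 1 denominator
    return max((s2 - mu) / mu ** 2, 0.0)  # under-dispersion is truncated to phi = 0


def nb_dispersion_pooled(genes):
    """One phi for all genes: sum the moment equations s2 - mu = phi * mu**2 across genes."""
    num = sum(variance(g) - mean(g) for g in genes)
    den = sum(mean(g) ** 2 for g in genes)
    if den == 0:
        raise ValueError("phi is not identifiable when every gene is all-zero")
    return max(num / den, 0.0)


# Usage: two hypothetical genes with four samples each.
genes = [[2, 4, 6, 8], [0, 10, 20, 30]]
print([round(nb_dispersion_single_gene(g), 4) for g in genes])  # [0.0667, 0.6741]
print(round(nb_dispersion_pooled(genes), 4))  # 0.6133
```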

#50

A penalized likelihood approach for robust estimation of isoform expression.

Hui Jiang, Julia Salzman

Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at individual isoform level. However, systematic biases introduced during the sequencing and mapping processes as well as incompleteness of the transcript annotation databases may cause the estimates of isoform abundances to be unreliable, and in some cases, highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads...

27239250

2015: Statistics and its Interface

#51

Variable selection in strong hierarchical semiparametric models for longitudinal data.

Xianbin Zeng, Shuangge Ma, Yichen Qin, Yang Li

In this paper, we consider the variable selection problem in semiparametric additive partially linear models for longitudinal data. Our goal is to identify relevant main effects and corresponding interactions associated with the response variable. Meanwhile, we enforce the strong hierarchical restriction on the model, that is, an interaction can be included in the model only if both the associated main effects are included. Based on a B-spline basis approximation for the nonparametric components, we propose an iterative estimation procedure for the model by penalizing the likelihood with a partial group minimax concave penalty (MCP), and use BIC to select the tuning parameter...

27076867

2015: Statistics and its Interface
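Two ingredients named in the abstract are easy to state concretely.

- **Scalar minimax concave penalty (MCP):** lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, and the constant gamma*lam^2/2 beyond that. Large coefficients therefore stop being shrunk. A group version applies the penalty to a norm of each coefficient group, which this sketch does not show.
- **Strong-hierarchy rule:** can be checked as a predicate on the selected terms.

The function names are hypothetical, and the article's partial group MCP and iterative procedure are not reproduced.

```python
def mcp_penalty(t, lam, gamma):
    """Scalar minimax concave penalty (MCP); assumes lam >= 0 and gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2 * gamma)
    return gamma * lam * lam / 2  # flat beyond gamma * lam: no further shrinkage


def satisfies_strong_hierarchy(selected_mains, selected_interactions):
    """True if every selected interaction (j, k) has both main effects j and k selected."""
    mains = set(selected_mains)
    return all(j in mains and k in mains for j, k in selected_interactions)


print(round(mcp_penalty(1.0, 1.0, 3.0), 4), mcp_penalty(5.0, 1.0, 3.0))  # 0.8333 1.5
print(satisfies_strong_hierarchy({"x1", "x2"}, [("x1", "x2")]))  # True
print(satisfies_strong_hierarchy({"x1"}, [("x1", "x2")]))  # False
```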

#52

JOURNAL ARTICLE

Quantile regression for censored mixed-effects models with applications to HIV studies.

Victor H Lachos, Ming-Hui Chen, Carlos A Abanto-Valle, Caio L N Azevedo

HIV RNA viral load measures are often subject to upper and lower detection limits depending on the quantification assays. Hence, the responses are either left- or right-censored. Linear/nonlinear mixed-effects models, with slight modifications to accommodate censoring, are routinely used to analyze this type of data. Usually, the inference procedures are based on normality (or elliptical distribution) assumptions for the random terms. However, those analyses might not provide robust inference when the distribution assumptions are questionable...

26753050

2015: Statistics and its Interface

#53

JOURNAL ARTICLE

Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT).

Eugene Urrutia, Seunggeun Lee, Arnab Maity, Ni Zhao, Judong Shen, Yun Li, Michael C Wu

Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings...

26740853

2015: Statistics and its Interface

#54

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences.

Dennis Kostka, Tara Friedrich, Alisha K Holloway, Katherine S Pollard

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain...

26709360

2015: Statistics and its Interface

#55

Bayesian Case-deletion Model Complexity and Information Criterion.

Hongtu Zhu, Joseph G Ibrahim, Qingxia Chen

We establish a connection between Bayesian case influence measures for assessing the influence of individual observations and Bayesian predictive methods for evaluating the predictive performance of a model and comparing different models fitted to the same dataset. Based on such a connection, we formally propose a new set of Bayesian case-deletion model complexity (BCMC) measures for quantifying the effective number of parameters in a given statistical model. Their properties in linear models are explored. Adding some functions of BCMC to a conditional deviance function leads to a Bayesian case-deletion information criterion (BCIC) for comparing models...

26180578

October 1, 2014: Statistics and its Interface

#56

A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

Qingxia Chen, Joseph G Ibrahim

Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR...

25309677

July 1, 2014: Statistics and its Interface

#57

JOURNAL ARTICLE

Linear mixed models for multiple outcomes using extended multivariate skew-t distributions.

Binbing Yu, A James O'Malley, Pulak Ghosh

Multivariate outcomes with heavy skewness and thick tails often arise from clustered experiments or longitudinal studies. Linear mixed models with multivariate skew-t (MST) distributions for the random effects and the error terms are a popular tool for robust modeling of such outcomes. However, the usual MST distribution allows only a common degree of freedom for all marginal distributions, which is only appropriate when each marginal has the same amount of tail heaviness. In this paper, we introduce a new class of extended MST distributions, which allow different degrees of freedom and thereby can accommodate heterogeneity in tail-heaviness across outcomes...

28435512

2014: Statistics and its Interface

#58

JOURNAL ARTICLE

A New Bayesian Lasso.

Himel Mallick, Nengjun Yi

Park and Casella (2008) provided the Bayesian lasso for linear models by assigning scale mixture of normal (SMN) priors on the parameters and independent exponential priors on their variances. In this paper, we propose an alternative Bayesian analysis of the lasso problem. A different hierarchical formulation of Bayesian lasso is introduced by utilizing the scale mixture of uniform (SMU) representation of the Laplace density. We consider a fully Bayesian treatment that leads to a new Gibbs sampler with tractable full conditional posterior distributions...

27570577

2014: Statistics and its Interface
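The scale-mixture-of-uniforms (SMU) representation mentioned in the abstract can be checked directly. Suppose u ~ Gamma(shape 2, rate lam) and beta | u ~ Uniform(-u, u). Integrating (1/(2u)) * lam^2 * u * exp(-lam*u) over u > |beta| gives (lam/2) * exp(-lam*|beta|), which is the Laplace density.

The sketch below samples from that hierarchy and compares Monte Carlo summaries with their Laplace values. It illustrates the representation only, not the article's Gibbs sampler, and the function names are hypothetical.

```python
import math
import random


def laplace_density(beta, lam):
    """Laplace prior density (lam / 2) * exp(-lam * |beta|)."""
    return lam / 2 * math.exp(-lam * abs(beta))


def sample_beta_smu(lam, rng):
    """One draw via the SMU hierarchy: u ~ Gamma(2, rate=lam), beta | u ~ Uniform(-u, u)."""
    u = rng.gammavariate(2.0, 1.0 / lam)  # random.gammavariate takes (shape, scale)
    return rng.uniform(-u, u)


rng = random.Random(2014)
lam = 2.0
draws = [sample_beta_smu(lam, rng) for _ in range(100_000)]
# Under a Laplace law with rate lam, |beta| is exponential: mean 1 / lam, P(|beta| > 1) = exp(-lam).
print(sum(abs(b) for b in draws) / len(draws), 1 / lam)
print(sum(abs(b) > 1 for b in draws) / len(draws), math.exp(-lam))
```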

#59

JOURNAL ARTICLE

Approaches to retrospective sampling for longitudinal transition regression models.

Sally Hunsberger, Paul S Albert, Marie Thoma

For binary diseases that relapse and remit, it is often of interest to estimate the effect of covariates on the transition process between disease states over time. The transition process can be characterized by modeling the probability of the binary event given the individual's history. Designing studies that examine the impact of time-varying covariates over time can lead to the collection of extensive amounts of data. Sometimes it may be possible to collect and store tissue, blood or images and retrospectively analyze this covariate information...

27239249

2014: Statistics and its Interface

#60

JOURNAL ARTICLE

Genotype-based association models of complex diseases to detect gene-gene and gene-environment interactions.

Iryna Lobach, Ruzong Fan, Prashiela Manga

A central problem in genetic epidemiology is to identify and rank genetic markers involved in a disease. Complex diseases, such as cancer, hypertension and diabetes, are thought to be caused by an interaction of a panel of genetic factors, identifiable by markers, that modulate environmental factors. Moreover, the effect of each genetic marker may be small. Hence, the association signal may be missed unless a large sample is considered or a priori biomedical data are used. Recent advances have generated a vast variety of a priori information, including linkage maps and information about gene regulatory dependence assembled into curated pathway databases...

26191336

2014: Statistics and its Interface
