
Journal of Machine Learning Research : JMLR

https://read.qxmd.com/read/38264325/selective-inference-for-k-means-clustering
#1
JOURNAL ARTICLE
Yiqun T Chen, Daniela M Witten
We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of ...
May 2023: Journal of Machine Learning Research: JMLR
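The inflated Type I error mentioned in this abstract can be seen in a small simulation. The sketch below is not the selective inference test proposed in the paper. It only illustrates the problem that motivates it: data are drawn from a single Gaussian (so there are no true clusters), split with a hand-written 2-means (Lloyd's algorithm) in one dimension, and the two clusters are then compared with a naive Wald z-test that ignores the fact that the clusters were estimated from the same data. The function names, sample sizes, and the choice of a z-test are illustrative assumptions.

```python
import math

import numpy as np


def kmeans_1d_two_clusters(x, n_iter=50):
    """Lloyd's algorithm with k = 2 on a 1-D sample; returns 0/1 cluster labels."""
    centers = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([x[labels == j].mean() for j in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_z_test_pvalue(a, b):
    """Two-sided Wald z-test for a difference in means, ignoring how the groups arose."""
    se = math.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = (a.mean() - b.mean()) / se
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_rejection_rate(n_sims=200, n=100, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n)  # one population: the null of equal means is true
        labels = kmeans_1d_two_clusters(x)
        p = naive_z_test_pvalue(x[labels == 0], x[labels == 1])
        rejections += int(p < alpha)
    return rejections / n_sims


if __name__ == "__main__":
    print(naive_rejection_rate())
```

A valid level-0.05 test would reject in roughly 5% of these null data sets; the naive test rejects far more often, because k-means chooses the split that makes the two means look different.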
https://read.qxmd.com/read/37484701/inference-for-gaussian-processes-with-mat%C3%A3-rn-covariogram-on-compact-riemannian-manifolds
#2
JOURNAL ARTICLE
Didong Li, Wenpin Tang, Sudipto Banerjee
Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data...
March 2023: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38500567/surrogate-assisted-semi-supervised-inference-for-high-dimensional-risk-prediction
#3
JOURNAL ARTICLE
Jue Hou, Zijian Guo, Tianxi Cai
Risk modeling with electronic health records (EHR) data is challenging due to the lack of direct observations of the disease outcome and the high dimensionality of the predictors. In this paper, we develop a surrogate-assisted semi-supervised learning approach, leveraging a small labeled dataset with annotated outcomes and extensive unlabeled data of outcome surrogates and high-dimensional predictors. We propose to impute the unobserved outcomes by constructing a sparse imputation model with outcome surrogates and high-dimensional predictors...
2023: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38249291/conditional-distribution-function-estimation-using-neural-networks-for-censored-and-uncensored-data
#4
JOURNAL ARTICLE
Bingqing Hu, Bin Nan
Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumptions, we consider a loss function that is based on the full likelihood where the conditional hazard function is the only unknown nonparametric parameter, for which unconstrained optimization methods can be applied...
2023: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37701522/inference-for-a-large-directed-acyclic-graph-with-unspecified-interventions
#5
JOURNAL ARTICLE
Chunlin Li, Xiaotong Shen, Wei Pan
Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables...
2023: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37588020/learning-optimal-group-structured-individualized-treatment-rules-with-many-treatments
#6
JOURNAL ARTICLE
Haixu Ma, Donglin Zeng, Yufeng Liu
Data-driven individualized decision-making problems have received a lot of attention in recent years. In particular, decision makers aim to determine the optimal Individualized Treatment Rule (ITR) so that the expected specified outcome, averaged over heterogeneous patient-specific characteristics, is maximized. Many existing methods deal with binary or a moderate number of treatment arms and may not take potential treatment effect structure into account. However, the effectiveness of these methods may deteriorate when the number of treatment arms becomes large...
2023: Journal of Machine Learning Research: JMLR
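The "expected outcome" that an ITR maximizes (the value of the rule) is often estimated from randomized data by inverse probability weighting. The sketch below shows that standard estimator, not the group-structured method of this paper: for a rule d, the value E[Y(d(X))] is estimated by averaging Y · 1{A = d(X)} / P(A | X). The simulated trial (three uniformly randomized arms, with arm 0 best when x < 0 and arm 1 best otherwise) and all names are hypothetical.

```python
import numpy as np


def ipw_value(y, a, x, rule, propensity):
    """Inverse-probability-weighted estimate of the value E[Y(rule(X))].

    propensity may be a scalar (e.g. uniform randomization) or an array holding
    P(A = a_i | X = x_i) for each observation.
    """
    follows = (a == rule(x)).astype(float)  # 1 if the assigned arm matches the rule
    return float(np.mean(y * follows / propensity))


# Hypothetical randomized trial with three arms assigned with equal probability.
rng = np.random.default_rng(0)
n, n_arms = 20000, 3
x = rng.standard_normal(n)
a = rng.integers(0, n_arms, size=n)
best = np.where(x < 0, 0, 1)  # assumed truth: arm 0 best for x < 0, arm 1 otherwise
y = (a == best).astype(float) + 0.1 * rng.standard_normal(n)

optimal_value = ipw_value(y, a, x, lambda v: np.where(v < 0, 0, 1), 1.0 / n_arms)
constant_value = ipw_value(y, a, x, lambda v: np.full(len(v), 2), 1.0 / n_arms)
```

Comparing `optimal_value` with `constant_value` shows how a value estimate ranks candidate rules; with many arms, each rule is matched by only a small fraction of patients, which is one reason such estimates become noisy as the number of treatment arms grows.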
https://read.qxmd.com/read/37206375/bayesian-data-selection
#7
JOURNAL ARTICLE
Eli N Weinstein, Jeffrey W Miller
Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic...
2023: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37102181/generalized-matrix-factorization-efficient-algorithms-for-fitting-generalized-linear-latent-variable-models-to-large-data-arrays
#8
JOURNAL ARTICLE
Łukasz Kidziński, Francis K C Hui, David I Warton, Trevor Hastie
Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses...
November 2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38264536/tree-based-node-aggregation-in-sparse-graphical-models
#9
JOURNAL ARTICLE
Ines Wilms, Jacob Bien
High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes...
September 2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38481523/tree-values-selective-inference-for-regression-trees
#10
JOURNAL ARTICLE
Anna C Neufeld, Lucy L Gao, Daniela M Witten
We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38105917/bayesian-subset-selection-and-variable-importance-for-interpretable-prediction-and-classification
#11
JOURNAL ARTICLE
Daniel R Kowal
Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model ℳ, we extract a family of near-optimal subsets of variables for linear prediction or classification...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/38098839/estimation-and-inference-on-high-dimensional-individualized-treatment-rule-in-observational-data-using-split-and-pooled-de-correlated-score
#12
JOURNAL ARTICLE
Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao
With the increasing adoption of electronic health records, there is growing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for such rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37974910/prior-adaptive-semi-supervised-learning-with-application-to-ehr-phenotyping
#13
JOURNAL ARTICLE
Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai
Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite their potential, EHR data are currently underutilized for discovery research because of a major limitation: the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37873545/generalized-sparse-additive-models
#14
JOURNAL ARTICLE
Asad Haris, Noah Simon, Ali Shojaie
We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37799290/bayesian-covariate-dependent-gaussian-graphical-models-with-varying-structure
#15
JOURNAL ARTICLE
Yang Ni, Francesco C Stingo, Veerabhadran Baladandayuthapani
We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37645242/the-importance-of-being-correlated-implications-of-dependence-in-joint-spectral-inference-across-multiple-networks
#16
JOURNAL ARTICLE
Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski
Spectral inference on multiple networks is a rapidly developing subfield of graph statistics. Recent work has demonstrated that joint, or simultaneous, spectral embedding of multiple independent networks can deliver more accurate estimation than individual spectral decompositions of those same networks. Such inference procedures typically rely heavily on independence assumptions across the multiple network realizations, and even in this case, little attention has been paid to the induced network correlation that can be a consequence of such joint embeddings...
2022: Journal of Machine Learning Research: JMLR
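One widely used joint spectral embedding is the omnibus embedding: for m graphs on the same n vertices, build an mn × mn block matrix whose (i, j) block is (A_i + A_j)/2, take its adjacency spectral embedding (top-d eigenvectors by eigenvalue magnitude, scaled by the square root of the absolute eigenvalues), and read off one n × d block per graph. The sketch below assumes this construction as a representative example; it is not necessarily the exact procedure studied in the paper, and the two-block random graph generator is a hypothetical test input.

```python
import numpy as np


def adjacency_spectral_embedding(a, d):
    """Top-d eigenvectors of a symmetric matrix (by |eigenvalue|), scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(a)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def omnibus_embedding(adjs, d):
    """Jointly embed m graphs on the same n vertices via the omnibus block matrix."""
    m, n = len(adjs), adjs[0].shape[0]
    omni = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            # Block (i, j) averages the two adjacency matrices.
            omni[i * n:(i + 1) * n, j * n:(j + 1) * n] = (adjs[i] + adjs[j]) / 2.0
    z = adjacency_spectral_embedding(omni, d)
    # Rows i*n..(i+1)*n - 1 give the embedding of graph i.
    return [z[i * n:(i + 1) * n] for i in range(m)]


def sample_two_block_graph(n, rng):
    """Hypothetical undirected two-block stochastic block model graph without self-loops."""
    labels = np.repeat([0, 1], n // 2)
    probs = np.array([[0.5, 0.1], [0.1, 0.4]])[labels][:, labels]
    upper = np.triu(rng.random((n, n)) < probs, 1)
    return (upper | upper.T).astype(float)
```

Because every graph is embedded through one shared eigendecomposition, the per-graph embeddings are coupled, which is the kind of induced correlation across embeddings that the abstract highlights.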
https://read.qxmd.com/read/37576335/non-asymptotic-properties-of-individualized-treatment-rules-from-sequentially-rule-adaptive-trials
#17
JOURNAL ARTICLE
Daiqi Gao, Yufeng Liu, Donglin Zeng
Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulating evidence when patients enter the trials sequentially. It is also ethically important that future patients have a high probability of being treated optimally based on the knowledge updated so far...
2022: Journal of Machine Learning Research: JMLR
https://read.qxmd.com/read/37234236/interpretable-classification-of-categorical-time-series-using-the-spectral-envelope-and-optimal-scalings
#18
JOURNAL ARTICLE
Zeda Li, Scott A Bruce, Tian Cai
This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency...
2022: Journal of Machine Learning Research: JMLR
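The definition in this abstract can be made concrete. A common way to compute the spectral envelope of a series with k categories is to encode each observation as an indicator vector for the first k − 1 categories (the last one serves as a reference), let V be the covariance of those vectors, and at each frequency ω take the largest eigenvalue of V^{-1/2} Re f(ω) V^{-1/2}; the corresponding optimal scalings are V^{-1/2} times the leading eigenvector. The sketch below assumes that formulation and, for simplicity, replaces the spectral density f(ω) with the raw (unsmoothed) periodogram at the Fourier frequencies; the classification features built on top of it in the paper are not reproduced.

```python
import numpy as np


def spectral_envelope(series, categories):
    """Periodogram-based spectral envelope and optimal scalings of a categorical series."""
    n, k = len(series), len(categories)
    # Indicator vectors for the first k-1 categories; the last category is the reference.
    y = np.array([[1.0 if s == c else 0.0 for c in categories[:-1]] for s in series])
    yc = y - y.mean(axis=0)
    v = yc.T @ yc / n
    # Symmetric inverse square root of V via its eigendecomposition.
    w, q = np.linalg.eigh(v)
    v_inv_sqrt = q @ np.diag(1.0 / np.sqrt(w)) @ q.T

    freqs = np.arange(1, n // 2 + 1) / n
    t = np.arange(n)
    envelope = np.empty(len(freqs))
    scalings = np.empty((len(freqs), k - 1))
    for idx, omega in enumerate(freqs):
        # Discrete Fourier transform of the centered indicator vectors at frequency omega.
        dft = (yc * np.exp(-2j * np.pi * omega * t)[:, None]).sum(axis=0) / np.sqrt(n)
        periodogram = np.real(np.outer(dft, dft.conj()))  # Re I(omega), symmetric
        vals, vecs = np.linalg.eigh(v_inv_sqrt @ periodogram @ v_inv_sqrt)
        envelope[idx] = vals[-1]                     # largest eigenvalue = envelope
        scalings[idx] = v_inv_sqrt @ vecs[:, -1]     # scalings attaining that power
    return freqs, envelope, scalings
```

For instance, a series that cycles A, B, C, A, B, C, ... has all of its envelope concentrated at frequency 1/3, matching its period of three.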
https://read.qxmd.com/read/37205013/extensions-to-the-proximal-distance-method-of-constrained-optimization
#19
JOURNAL ARTICLE
Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange
The current paper studies the problem of minimizing a loss f(x) subject to constraints of the form Dx ∈ S, where S is a closed set, convex or not, and D is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives ...
2022: Journal of Machine Learning Research: JMLR
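A common form of the proximal distance principle minimizes the penalized objective f(x) + (ρ/2) dist(Dx, S)² while gradually increasing ρ, and handles the distance term by majorization: at the current iterate x_m, dist(Dx, S)² ≤ ‖Dx − P_S(Dx_m)‖², where P_S is a projection onto S. The sketch below assumes the simplest instance, f(x) = ½‖x − y‖², D = I, and S the (nonconvex) set of vectors with at most k nonzero entries, so each majorize-minimize step has the closed form x ← (y + ρ P_S(x_m)) / (1 + ρ). This toy problem is chosen only because its answer (the k largest-magnitude entries of y) is easy to check; it is not the paper's general algorithm for arbitrary D and f, and the schedule parameters are hypothetical.

```python
import numpy as np


def project_sparse(x, k):
    """Euclidean projection onto {x : at most k nonzeros}: keep the k largest |x_i|."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out


def proximal_distance_sparse(y, k, rho=1.0, rho_growth=1.2, n_iter=200):
    """Minimize 0.5*||x - y||^2 + (rho/2) * dist(x, S)^2 with rho increased each step."""
    x = y.astype(float).copy()
    for _ in range(n_iter):
        p = project_sparse(x, k)             # anchor point P_S(x_m) for the majorizer
        x = (y + rho * p) / (1.0 + rho)      # exact minimizer of the surrogate
        rho *= rho_growth                    # tighten the penalty toward the constraint
    return x
```

Each update is a weighted average of the data y and the projected point, so as ρ grows the iterates are pulled onto the constraint set; the same pattern applies when D is a fusion matrix, except that the x-update then involves solving a linear system.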
https://read.qxmd.com/read/35983506/d-gcca-decomposition-based-generalized-canonical-correlation-analysis-for-multi-view-high-dimensional-data
#20
JOURNAL ARTICLE
Hai Shu, Zhe Qu, Hongtu Zhu
Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA)...
2022: Journal of Machine Learning Research: JMLR