Most recent papers in the Journal of Machine Learning Research: JMLR

#1

JOURNAL ARTICLE

Selective inference for k-means clustering.

Yiqun T Chen, Daniela M Witten

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly-tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of...

38264325

May 2023: Journal of Machine Learning Research: JMLR
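
The inflated Type I error rate mentioned in the abstract above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: the data are one-dimensional draws from a single N(0, 1) population (so there are no true clusters), the clustering is a plain Lloyd's algorithm with k = 2, and the test is a naive two-sided z-test with known variance. The function names `two_means_1d`, `naive_p_value`, and `naive_rejection_rate` are hypothetical and chosen for illustration only.

```python
import math
import random


def two_means_1d(x, n_iter=100):
    """Lloyd's algorithm with k = 2 on 1-D data; returns a list of 0/1 labels.

    Assumes x contains at least two distinct values.
    """
    c0, c1 = min(x), max(x)  # initialize the centers at the extremes
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [1 if abs(v - c1) < abs(v - c0) else 0 for v in x]
        left = [v for v, lab in zip(x, labels) if lab == 0]
        right = [v for v, lab in zip(x, labels) if lab == 1]
        new_c0, new_c1 = sum(left) / len(left), sum(right) / len(right)
        if new_c0 == c0 and new_c1 == c1:
            break  # assignments are stable
        c0, c1 = new_c0, new_c1
    return labels


def naive_p_value(x, labels, sigma=1.0):
    """Two-sided z-test p-value that ignores how the clusters were chosen."""
    a = [v for v, lab in zip(x, labels) if lab == 0]
    b = [v for v, lab in zip(x, labels) if lab == 1]
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / (sigma * math.sqrt(1 / len(a) + 1 / len(b)))
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n_sim=200, n=30, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true clusters
        labels = two_means_1d(x)
        if naive_p_value(x, labels) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("naive rejection rate under the null:", naive_rejection_rate())
```

Because the same data are used both to form the clusters and to test them, the naive rejection rate under the null lands far above the nominal level `alpha`. A selective test is meant to correct for this.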

#2

JOURNAL ARTICLE

Inference for Gaussian Processes with Matérn Covariogram on Compact Riemannian Manifolds.

Didong Li, Wenpin Tang, Sudipto Banerjee

Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data...

37484701

March 2023: Journal of Machine Learning Research: JMLR

#3

JOURNAL ARTICLE

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.

Jue Hou, Zijian Guo, Tianxi Cai

Risk modeling with electronic health records (EHR) data is challenging because the disease outcome is not directly observed and the predictors are high-dimensional. In this paper, we develop a surrogate assisted semi-supervised learning approach, leveraging small labeled data with annotated outcomes and extensive unlabeled data of outcome surrogates and high-dimensional predictors. We propose to impute the unobserved outcomes by constructing a sparse imputation model with outcome surrogates and high-dimensional predictors...

38500567

2023: Journal of Machine Learning Research: JMLR

#4

JOURNAL ARTICLE

Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data.

Bingqing Hu, Bin Nan

Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumptions, we consider a loss function that is based on the full likelihood where the conditional hazard function is the only unknown nonparametric parameter, for which unconstrained optimization methods can be applied...

38249291

2023: Journal of Machine Learning Research: JMLR

#5

JOURNAL ARTICLE

Inference for a Large Directed Acyclic Graph with Unspecified Interventions.

Chunlin Li, Xiaotong Shen, Wei Pan

Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables...

37701522

2023: Journal of Machine Learning Research: JMLR

#6

JOURNAL ARTICLE

Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments.

Haixu Ma, Donglin Zeng, Yufeng Liu

Data-driven individualized decision-making problems have received a lot of attention in recent years. In particular, decision makers aim to determine the optimal Individualized Treatment Rule (ITR) so that the expected specified outcome averaging over heterogeneous patient-specific characteristics is maximized. Many existing methods deal with binary or a moderate number of treatment arms and may not take potential treatment effect structure into account. However, the effectiveness of these methods may deteriorate when the number of treatment arms becomes large...

37588020

2023: Journal of Machine Learning Research: JMLR

#7

JOURNAL ARTICLE

Bayesian Data Selection.

Eli N Weinstein, Jeffrey W Miller

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic...

37206375

2023: Journal of Machine Learning Research: JMLR

#8

JOURNAL ARTICLE

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

Łukasz Kidziński, Francis K C Hui, David I Warton, Trevor Hastie

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses...

37102181

November 2022: Journal of Machine Learning Research: JMLR

#9

JOURNAL ARTICLE

Tree-based Node Aggregation in Sparse Graphical Models.

Ines Wilms, Jacob Bien

High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes...

38264536

September 2022: Journal of Machine Learning Research: JMLR

#10

JOURNAL ARTICLE

Tree-Values: Selective Inference for Regression Trees.

Anna C Neufeld, Lucy L Gao, Daniela M Witten

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage...

38481523

2022: Journal of Machine Learning Research: JMLR

#11

JOURNAL ARTICLE

Bayesian subset selection and variable importance for interpretable prediction and classification.

Daniel R Kowal

Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model ℳ, we extract a family of near-optimal subsets of variables for linear prediction or classification...

38105917

2022: Journal of Machine Learning Research: JMLR

#12

JOURNAL ARTICLE

Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score.

Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao

With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for such rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data...

38098839

2022: Journal of Machine Learning Research: JMLR

#13

JOURNAL ARTICLE

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.

Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai

Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review...

37974910

2022: Journal of Machine Learning Research: JMLR

#14

JOURNAL ARTICLE

Generalized Sparse Additive Models.

Asad Haris, Noah Simon, Ali Shojaie

We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met...

37873545

2022: Journal of Machine Learning Research: JMLR

#15

JOURNAL ARTICLE

Bayesian Covariate-Dependent Gaussian Graphical Models with Varying Structure.

Yang Ni, Francesco C Stingo, Veerabhadran Baladandayuthapani

We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties...

37799290

2022: Journal of Machine Learning Research: JMLR

#16

JOURNAL ARTICLE

The Importance of Being Correlated: Implications of Dependence in Joint Spectral Inference across Multiple Networks.

Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski

Spectral inference on multiple networks is a rapidly-developing subfield of graph statistics. Recent work has demonstrated that joint, or simultaneous, spectral embedding of multiple independent networks can deliver more accurate estimation than individual spectral decompositions of those same networks. Such inference procedures typically rely heavily on independence assumptions across the multiple network realizations, and even in this case, little attention has been paid to the induced network correlation that can be a consequence of such joint embeddings...

37645242

2022: Journal of Machine Learning Research: JMLR

#17

JOURNAL ARTICLE

Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials.

Daiqi Gao, Yufeng Liu, Donglin Zeng

Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulative evidence when patients enter the trials sequentially. It is also ethically important that future patients should have a high probability to be treated optimally based on the updated knowledge so far...

37576335

2022: Journal of Machine Learning Research: JMLR

#18

JOURNAL ARTICLE

Interpretable Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings.

Zeda Li, Scott A Bruce, Tian Cai

This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency...

37234236

2022: Journal of Machine Learning Research: JMLR

#19

JOURNAL ARTICLE

Extensions to the Proximal Distance Method of Constrained Optimization.

Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange

The current paper studies the problem of minimizing a loss f(x) subject to constraints of the form Dx ∈ S, where S is a closed set, convex or not, and D is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives...

37205013

2022: Journal of Machine Learning Research: JMLR

#20

JOURNAL ARTICLE

D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data.

Hai Shu, Zhe Qu, Hongtu Zhu

Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA)...

35983506

2022: Journal of Machine Learning Research: JMLR
