Publications

You can also find my articles on my Google Scholar profile.

Modern Adaptive Experiment Design: Machine Learning Perspective

Mutny, Mojmir, "Modern Adaptive Experiment Design: Machine Learning Perspective", 2024. PhD Thesis, ETH Zurich

Abstract: A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Optimal experimental design is a branch of statistics addressing the most informative allocation of experiments in order to infer an unknown quantity of interest. In this thesis, we fundamentally revisit and extend the methods and approaches of optimal design of experiments when our unknown quantity is a member of a reproducing kernel Hilbert space (RKHS). Estimation in RKHS is a versatile yet sample-efficient non-parametric statistical method that allows us to capture non-linear natural phenomena using the concept of a similarity measure, also known as the kernel. The process of setting up a statistical experiment design pipeline begins by establishing a mathematical model and defining the experiment’s goal as a utility. We consider the available experiments and adapt to uncertain outcomes. This thesis specifically tackles adaptive experiment design, where previous experiment outcomes inform future utility design and selection strategies. We first outline a general methodology for adaptive experiment design, addressing uncertainty and combinatorial issues through relaxation techniques. We introduce a master algorithm and its linearized form based on the Frank-Wolfe algorithm, applicable to general experimental design problems. This algorithm is demonstrated in applications involving various utilities, including reward minimization and information gathering. Subsequent sections delve into the impact of structural assumptions about the unknown quantity in RKHS, such as additivity and projection pursuit structures. We analyze these models and additionally introduce novel mathematical structures in RKHS, aiming to enhance the richness and resource efficiency of experiment design. Adaptive design requires statistical estimation and confidence estimates for the unknown quantity in the presence of randomness. We thoroughly analyze this issue from a worst-case perspective, developing confidence sets for probability distributions from parametrized families. Departing from the worst-case perspective, we provide likelihood-based confidence sets for experiments with well-specified likelihood models, which yield significant improvements over the worst-case analysis. Additionally, we explore complex experiment design scenarios, where the design involves executing a policy in a sequence of steps generating trajectories. We provide a tractable reformulation of this problem using Markov chains, along with examples and modifications of traditional experimental design methods. Lastly, we demonstrate the application and versatility of modern adaptive experiment design in various domains, including enzyme optimization, adaptive Poisson sensing for spatio-temporal events, learning differential equations, and classical problems related to inferring functionals of unknown quantities.

Use Google Scholar for full citation
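
The thesis's master algorithm linearizes a design utility and optimizes the experiment allocation with the Frank-Wolfe algorithm. As a hedged illustration (our own minimal sketch, not code from the thesis), the snippet below runs Frank-Wolfe on a relaxed D-optimal (log-det) design over a finite pool of candidate experiments; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))              # 50 candidate experiments in R^5
n, d = X.shape
w = np.full(n, 1.0 / n)                   # relaxed allocation on the simplex

for t in range(200):
    A = X.T @ (w[:, None] * X) + 1e-6 * np.eye(d)    # information matrix
    Ainv = np.linalg.inv(A)
    grad = np.einsum('ij,jk,ik->i', X, Ainv, X)      # d/dw_i log det = x_i^T A^{-1} x_i
    s = np.zeros(n)
    s[np.argmax(grad)] = 1.0              # linear maximization over the simplex
    gamma = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
    w = (1 - gamma) * w + gamma * s

A = X.T @ (w[:, None] * X) + 1e-6 * np.eye(d)
print("support size:", int((w > 1e-3).sum()), "log det:", np.linalg.slogdet(A)[1])
```

Each Frank-Wolfe step only needs a linear maximization, here an argmax over leverage scores, which is what keeps the combinatorial allocation problem tractable.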

Transition Constrained Bayesian Optimization via Markov Decision Processes

Folch, Jose Pablo and Tsay, Calvin and Lee, Robert M and Shafei, Behrang and Ormaniec, Weronika and Krause, Andreas and van der Wilk, Mark and Misener, Ruth and Mutny, Mojmir, "Transition Constrained Bayesian Optimization via Markov Decision Processes", 2024. arXiv:2402.08406

Abstract: Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends Bayesian optimization via the framework of Markov Decision Processes, iteratively solving a tractable linearization of our objective using reinforcement learning to obtain a policy that plans ahead over long horizons. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, machine calibration, and other synthetic examples.

Full text here
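
To make the planning idea concrete, here is a small, hedged sketch (ours, not the authors' implementation): a linearized acquisition, here a stand-in UCB score, is plugged into finite-horizon dynamic programming on a chain of states where queries may only move one step at a time.

```python
import numpy as np

n_states, horizon = 30, 8
mu = np.sin(np.linspace(0, 3, n_states))       # stand-in posterior mean
sigma = np.full(n_states, 0.5)                 # stand-in posterior std
ucb = mu + 2.0 * sigma                         # linearized acquisition as reward

V = np.zeros(n_states)
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):             # backward dynamic programming
    Q = np.full((n_states, 3), -np.inf)
    for a, step in enumerate((-1, 0, 1)):      # local-movement (transition) constraint
        nxt = np.clip(np.arange(n_states) + step, 0, n_states - 1)
        Q[:, a] = ucb[nxt] + V[nxt]
    policy[t] = Q.argmax(axis=1)
    V = Q.max(axis=1)

s, path = 0, [0]
for t in range(horizon):                       # roll out the planned policy
    s = int(np.clip(s + (policy[t, s] - 1), 0, n_states - 1))
    path.append(s)
print("planned query path:", path)
```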

Submodular Reinforcement Learning

Prajapat, Manish and Mutny, Mojmir and Zeilinger, Melanie N and Krause, Andreas, "Submodular Reinforcement Learning", 2024. ICLR 2024

Abstract: In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are independent of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose submodular RL (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state and action spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.

Full text here
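
A minimal sketch of the greedy principle the paper builds on (illustrative; SubPO itself is a policy-gradient method): at each step the agent picks the transition with the largest marginal gain of a submodular coverage reward on a grid. All quantities below are toy assumptions.

```python
import numpy as np

def coverage(visited, radius=1, n=10):
    """Submodular reward: number of grid cells within `radius` of any visit."""
    covered = set()
    for (x, y) in visited:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                if 0 <= x + dx < n and 0 <= y + dy < n:
                    covered.add((x + dx, y + dy))
    return len(covered)

state, visited = (0, 0), [(0, 0)]
for t in range(15):
    best_gain, best_next = -1, state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):        # MDP transitions
        nxt = (state[0] + dx, state[1] + dy)
        if not (0 <= nxt[0] < 10 and 0 <= nxt[1] < 10):
            continue
        gain = coverage(visited + [nxt]) - coverage(visited)  # marginal gain
        if gain > best_gain:
            best_gain, best_next = gain, nxt
    state = best_next
    visited.append(state)

print("covered cells:", coverage(visited))
```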

Sequential Experimental Design and Optimization with Structural Assumptions: Additivity and Subspaces

Mutny, Mojmir and Krause, Andreas, "Sequential Experimental Design and Optimization with Structural Assumptions: Additivity and Subspaces", 2024. (in preparation for JMLR)

Abstract: Experiment design in high-dimensional problems is an important practical problem in many fields of science. Without further structural assumptions, experiment design algorithms based on reproducing kernel Hilbert spaces with classical kernels suffer from the curse of dimensionality. In this work, we explore the additional structural assumptions of additive structure or variability in a low-dimensional linear subspace. We present prior works in a unified framework and analyze projection pursuit regression models in the context of experiment design, which strictly generalize additive models. We combine them with a potentially non-linear effective dimension via a manifold projection pursuit regression model. We contrast our approach with other developed methods and address practical issues such as acquisition function optimization.

Use Google Scholar for full citation
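
A minimal sketch of the structural assumptions discussed (our illustration, not the paper's code): an additive kernel sums one-dimensional kernels over coordinates, and a projection pursuit kernel applies them along learned orthogonal directions W, which strictly generalizes the additive case.

```python
import numpy as np

def rbf1d(a, b, ell=0.5):
    # One-dimensional squared-exponential kernel.
    return np.exp(-(a - b) ** 2 / (2 * ell ** 2))

def additive_kernel(x, y):
    # k(x, y) = sum_i k_i(x_i, y_i): each coordinate contributes independently.
    return sum(rbf1d(xi, yi) for xi, yi in zip(x, y))

def projection_pursuit_kernel(x, y, W):
    # One-dimensional kernels along (here orthonormal) directions W.
    return sum(rbf1d(w @ x, w @ y) for w in W)

x, y = np.array([0.1, 0.7, 0.3]), np.array([0.2, 0.5, 0.9])
W = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))[0][:2]
print(additive_kernel(x, y), projection_pursuit_kernel(x, y, W))
```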

Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning

Vornholt, Tobias and Mutny, Mojmir and Schmidt, Gregor and Schellhaas, Christian and Tachibana, Ryo and Panke, Sven and Ward, Thomas R and Krause, Andreas and Jeschek, Markus, "Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning", 2024. preprint

Abstract: Tailored enzymes hold great potential to accelerate the transition to a sustainable bioeconomy. Yet, enzyme engineering remains challenging as it relies largely on serendipity and is, therefore, highly laborious and prone to failure. The efficiency and success rates of engineering campaigns may be improved substantially by applying machine learning to construct a comprehensive representation of the sequence-activity landscape from small sets of experimental data. However, it often proves challenging to reliably model a large protein sequence space while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning and model-guided library design. We applied this strategy to efficiently engineer an artificial metalloenzyme (ArM) catalysing a new-to-nature hydroamination reaction. By combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds according to user-defined objectives. Crucial characteristics of our enhanced enzyme engineering pipeline include i) the cost-effective generation of information-rich experimental data sets, ii) the integration of an explorative round to improve the performance of the model, as well as iii) the consideration of experimental noise during modelling. Our approach led to an order-of-magnitude boost in the hit rate of screening while making efficient use of experimental resources. Smart search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.

Full text here

Data Summarization via Bilevel Optimization

Borsos, Zalan and Mutny, Mojmir and Tagliasacchi, Marco and Krause, Andreas, "Data Summarization via Bilevel Optimization", 2024. JMLR

Abstract: The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is to operate on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and k-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice differentiable model, including neural networks. We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.

Use Google Scholar for full citation

Optimistic Games for Combinatorial Bayesian Optimization with Applications to Protein Design

Bal, Melis Ilayda and Sessa, Pier Giuseppe and Mutny, Mojmir and Krause, Andreas, "Optimistic Games for Combinatorial Bayesian Optimization with Applications to Protein Design", 2023. 3rd ReALML Workshop at NeurIPS 2023

Abstract: Bayesian optimization (BO) is a powerful framework to optimize black-box expensive-to-evaluate functions via sequential interactions. In several important problems (e.g. drug discovery, circuit design, neural architecture search, etc.), though, such functions are defined over combinatorial and unstructured spaces. This makes existing BO algorithms infeasible due to the intractable maximization of the acquisition function to find informative evaluation points. To address this issue, we propose GameOpt, a novel game-theoretical approach to combinatorial BO. GameOpt establishes a cooperative game between the different optimization variables and computes informative points to be game equilibria of the acquisition function. These are stable configurations from which no variable has an incentive to deviate, analogous to local optima in continuous domains. Crucially, this allows us to efficiently break down the complexity of the combinatorial domain into individual decision sets, making GameOpt scalable to large combinatorial spaces. We demonstrate the application of GameOpt to the challenging protein design problem and validate its performance on two real-world protein datasets. Each protein can take up to $20^{X}$ possible configurations, where $X$ is the length of a protein, making standard BO methods unusable. Instead, our approach iteratively selects informative protein configurations and very quickly discovers highly active protein variants compared to other baselines.

Full text here
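
The equilibrium computation can be sketched as best-response dynamics on the acquisition function; the snippet below is our hedged toy, with a made-up additive-plus-pairwise score standing in for a real acquisition. It sweeps sequence positions until no single position can improve by deviating.

```python
import numpy as np

rng = np.random.default_rng(0)
L, A = 8, 20                                   # sequence length, alphabet size
theta = rng.normal(size=(L, A))                # made-up per-position scores
pair = 0.1 * rng.normal(size=(L, L, A, A))     # made-up pairwise interactions

def acq(seq):
    # Stand-in acquisition function over discrete sequences.
    val = sum(theta[i, seq[i]] for i in range(L))
    val += sum(pair[i, j, seq[i], seq[j]]
               for i in range(L) for j in range(i + 1, L))
    return val

seq = rng.integers(A, size=L)
changed = True
while changed:                                 # best-response sweeps
    changed = False
    for i in range(L):
        vals = [acq(np.concatenate([seq[:i], [a], seq[i + 1:]]))
                for a in range(A)]
        best = int(np.argmax(vals))
        if vals[best] > vals[seq[i]] + 1e-12:  # deviate only if strictly better
            seq[i] = best
            changed = True
print("equilibrium sequence:", seq, "acquisition value:", round(acq(seq), 3))
```

Because every accepted deviation strictly increases the shared objective on a finite space, the sweeps terminate at a stable configuration.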

Likelihood Ratio Confidence Sets for Sequential Decision Making

Emmenegger, Nicolas and Mutny, Mojmir and Krause, Andreas, "Likelihood Ratio Confidence Sets for Sequential Decision Making", 2023. Proc. Neural Information Processing Systems (NeurIPS)

Abstract: Certifiable, adaptive uncertainty estimates for unknown quantities are an essential ingredient of sequential decision-making algorithms. Standard approaches rely on problem-dependent concentration results and are limited to a specific combination of parameterization, noise family, and estimator. In this paper, we revisit the likelihood-based inference principle and propose to use likelihood ratios to construct any-time valid confidence sequences without requiring specialized treatment in each application scenario. Our method is especially suitable for problems with well-specified likelihoods, and the resulting sets always maintain the prescribed coverage in a model-agnostic manner. The size of the sets depends on a choice of estimator sequence in the likelihood ratio. We discuss how to provably choose the best sequence of estimators and shed light on connections to online convex optimization with algorithms such as Follow-the-Regularized-Leader. To counteract the initially large bias of the estimators, we propose a reweighting scheme that also opens up deployment in non-parametric settings such as RKHS function classes. We provide a non-asymptotic analysis of the size of the likelihood ratio confidence sets for generalized linear models, using insights from convex duality and online learning. We showcase the practical strength of our method on generalized linear bandit problems, survival analysis, and bandits with various additive noise distributions.

Full text here
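
A minimal sketch of the construction for a Gaussian mean with unit variance (our simplification of the general recipe): the likelihood ratio between a running plug-in estimator and each candidate parameter forms a martingale, and thresholding it at 1/alpha gives an anytime-valid confidence sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, alpha = 1.3, 0.05
grid = np.linspace(-2, 4, 601)        # candidate parameters
log_lr = np.zeros_like(grid)          # running log likelihood ratio per candidate
est, n = 0.0, 0                       # plug-in estimator built from past data only

for t in range(200):
    y = theta_true + rng.normal()
    # Numerator: density under the predictable estimator; denominator: candidate.
    log_lr += -(y - est) ** 2 / 2 + (y - grid) ** 2 / 2
    n += 1
    est += (y - est) / n              # update running mean AFTER using it

conf = grid[log_lr <= np.log(1 / alpha)]   # Ville's inequality gives coverage
print(f"anytime-valid 95% set after 200 rounds: [{conf.min():.3f}, {conf.max():.3f}]")
```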

Active Exploration via Experiment Design in Markov Chains

Mutny, Mojmir and Janik, Tadeusz and Krause, Andreas, "Active Exploration via Experiment Design in Markov Chains", 2023. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS)

Abstract: A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a Markov chain, and we can only choose them by selecting a policy controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm, Markov-Design, that efficiently selects policies whose measurement allocation provably converges to the optimal one. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.

Full text here

Experimental Design of Linear Functionals in Reproducing Kernel Hilbert Spaces

Mutny, Mojmir and Krause, Andreas, "Experimental Design of Linear Functionals in Reproducing Kernel Hilbert Spaces", 2022. Proc. Neural Information Processing Systems (NeurIPS)

Abstract: Optimal experimental design seeks to determine the most informative allocation of experiments to infer an unknown statistical quantity. In this work, we investigate optimal design of experiments for estimation of linear functionals in reproducing kernel Hilbert spaces (RKHSs). This problem has been extensively studied in the linear regression setting under an estimability condition, which allows estimating parameters without bias. We generalize this framework to RKHSs, and allow for the linear functional to be only approximately inferred, i.e., with a fixed bias. This scenario captures many important modern applications such as estimation of gradient maps, integrals and solutions to differential equations. We provide algorithms for constructing bias-aware designs for linear functionals. We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise, enabling us to certify estimation with bounded error with high probability.

Full text here
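
As a hedged illustration of the estimation step (ours, with toy choices of kernel and functional): fit kernel ridge regression and evaluate the functional on the fitted representer expansion, so that estimating an integral reduces to a weighted sum of kernel means.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x) + 2 * x                  # unknown function
k = lambda a, b, ell=0.2: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

X = rng.uniform(0, 1, 25)                                    # design points
y = f(X) + 0.05 * rng.normal(size=25)                        # noisy evaluations
alpha = np.linalg.solve(k(X, X) + 0.05 * np.eye(25), y)      # kernel ridge weights

z = np.linspace(0, 1, 2001)                                  # dense grid on [0, 1]
kernel_means = k(X, z).mean(axis=1)                          # ≈ ∫ k(x_i, x) dx
print("functional estimate:", alpha @ kernel_means, "truth:", f(z).mean())
```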

Tuning particle accelerators with safety constraints using Bayesian optimization

Kirschner, Johannes and Mutny, Mojmir and Krause, Andreas and Coello de Portugal, Jaime and Hiller, Nicole and Snuverink, Jochem, "Tuning particle accelerators with safety constraints using Bayesian optimization", 2022. Phys. Rev. Accel. Beams

Abstract: Tuning machine parameters of particle accelerators is a repetitive and time-consuming task that is challenging to automate. While many off-the-shelf optimization algorithms are available, in practice their use is limited because most methods do not account for safety-critical constraints in each iteration, such as loss signals or step-size limitations. One notable exception is safe Bayesian optimization, which is a data-driven tuning approach for global optimization with noisy feedback. We propose and evaluate a step-size limited variant of safe Bayesian optimization on two research facilities of PSI: (a) the SwissFEL and (b) HIPA. We report promising experimental results on both machines, tuning up to 16 parameters subject to 224 constraints.

Full text here

Sensing Cox Processes via Posterior Sampling and Positive Bases

Mutny, Mojmir and Krause, Andreas, "Sensing Cox Processes via Posterior Sampling and Positive Bases", 2022. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS)

Abstract: We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function, and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how the minimal description positive basis can be adapted to the covariance kernel and to non-stationarity, and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles. With the latter, the difference between samples serves as a surrogate for the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach.

Full text here

Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes

Nava, Elvis and Mutny, Mojmir and Krause, Andreas, "Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes", 2022. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS)

Abstract: In this work we introduce DPP-BBO, a natural and easily applicable framework for enhancing batch diversity in black-box optimization (BBO) algorithms. It works in more settings than previous diversification strategies: it is directly applicable to the continuous-domain case, to settings where, due to approximation and non-standard models, we are unable to compute hallucinations or confidence intervals (as in the Cox process example), and more generally in combination with any randomized BBO sampling scheme or arbitrary diversity kernel. Moreover, for DPP-TS we show improved theoretical guarantees and strong practical performance on simple regret.

Full text here
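
A hedged sketch of the diversification idea (ours; the paper's DPP-TS actually samples from the DPP rather than maximizing it): greedily build a batch that maximizes the log-determinant of a quality-weighted kernel submatrix, trading off acquisition value against diversity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 2))                        # candidate query points
quality = np.exp(-((X - 0.5) ** 2).sum(axis=1))        # stand-in acquisition scores
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 0.02)   # similarity kernel
L = quality[:, None] * K * quality[None, :]            # quality-weighted L-ensemble

batch = []
for _ in range(5):                                     # greedy log-det maximization
    best, best_val = None, -np.inf
    for i in range(len(X)):
        if i in batch:
            continue
        idx = batch + [i]
        val = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
        if val > best_val:
            best, best_val = i, val
    batch.append(best)
print("diverse high-quality batch:", batch)
```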

No-regret Algorithms for Capturing Events in Poisson Point Processes

Mutny, Mojmir and Krause, Andreas, "No-regret Algorithms for Capturing Events in Poisson Point Processes", 2021. Proc. International Conference for Machine Learning (ICML)

Abstract: Inhomogeneous Poisson point processes are widely used models of event occurrences. We address adaptive sensing of Poisson point processes, namely, maximizing the number of captured events subject to sensing costs. We encode prior assumptions on the rate function by modeling it as a member of a known reproducing kernel Hilbert space (RKHS). By partitioning the domain into separate small regions, and using heteroscedastic linear regression, we propose a tractable estimator of Poisson process rates for two feedback models: count-record, where exact locations of events are observed, and histogram feedback, where only counts of events are observed. We derive provably accurate anytime confidence estimates for our estimators for sequentially acquired Poisson count data. Using these, we formulate algorithms based on optimism that provably incur sublinear count-regret. We demonstrate the practicality of the method on problems from crime modeling, revenue maximization, as well as environmental monitoring.

Full text here
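
A minimal, hedged sketch of the optimistic sensing loop with histogram feedback (our simplification; the paper works in an RKHS rather than with independent per-region rates): Poisson counts per region are fit with variance-aware estimates, and the region with the highest upper confidence bound is sensed next.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 20                                               # regions of the domain
true_rate = 2 + 1.5 * np.sin(np.linspace(0, 3, R))   # unknown intensity per region
counts, visits = np.zeros(R), np.zeros(R)

for t in range(300):
    mean_est = counts / np.maximum(visits, 1)
    # Poisson variance equals the mean, so the bonus shrinks like sqrt(rate/visits);
    # unvisited regions get an infinite bonus so each is tried at least once.
    ucb = np.where(visits == 0, np.inf,
                   mean_est + 2 * np.sqrt(np.maximum(mean_est, 1) / np.maximum(visits, 1)))
    r = int(np.argmax(ucb))                          # optimistic region choice
    counts[r] += rng.poisson(true_rate[r])           # histogram feedback
    visits[r] += 1

print("most-sensed region:", int(np.argmax(visits)),
      "| true best region:", int(np.argmax(true_rate)))
```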

Learning Controllers for Unstable Linear Quadratic Regulators from a Single Trajectory

Treven, Lenart and Curi, Sebastian and Mutny, Mojmir and Krause, Andreas, "Learning Controllers for Unstable Linear Quadratic Regulators from a Single Trajectory", 2021. Proceedings of the Learning for Dynamics & Control Conference (L4DC)

Abstract: We present the first approach for learning, from a single trajectory, a linear quadratic regulator (LQR), even for unstable systems, without knowledge of the system dynamics and without requiring an initial stabilizing controller. Our central contribution is an efficient algorithm, eXploration, that quickly identifies a stabilizing controller. Our approach utilizes robust System Level Synthesis (SLS), and we prove that it succeeds in a constant number of iterations. Our approach can be used to initialize existing algorithms that require a stabilizing controller as input. When used in this way, it yields a method for learning LQRs from a single trajectory, even for unstable systems, while suffering at most sublinear regret.

Full text here

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Jourdan, Marc and Mutny, Mojmir and Kirschner, Johannes and Krause, Andreas, "Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback", 2021. Proceedings of the 32nd International Conference on Algorithmic Learning Theory (ALT)

Abstract: Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

Full text here

MakeSense: Automated Sensor Design for Proprioceptive Soft Robots

Tapia, Javier and Knoop, Espen and Mutny, Mojmir and Otaduy, Miguel A and Bacher, Moritz, "MakeSense: Automated Sensor Design for Proprioceptive Soft Robots", 2020. Soft Robotics

Abstract: Soft robots have applications in safe human–robot interactions, manipulation of fragile objects, and locomotion in challenging and unstructured environments. In this article, we present a computational method for augmenting soft robots with proprioceptive sensing capabilities. Our method automatically computes a minimal stretch-receptive sensor network for user-provided soft robotic designs, optimized to perform well under a set of user-specified deformation-force pairs. The sensorized robots are able to reconstruct their full deformation state under interaction forces. We cast our sensor design as a subselection problem, selecting a minimal set of sensors from a large set of fabricable ones, which minimizes the error when sensing specified deformation-force pairs. Unique to our approach is the use of an analytical gradient of our reconstruction performance measure with respect to selection variables. We demonstrate our technique on a bending bar and gripper example, illustrating more complex designs with a simulated tentacle.

Full text here
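
The subselection problem can be illustrated with a hedged toy (ours; the paper optimizes a relaxation with analytical gradients rather than greedy search): pick, one by one, the candidate sensor whose readings most reduce the least-squares error of reconstructing training deformation states.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_dofs, n_poses = 60, 5, 40
A = rng.normal(size=(n_sensors, n_dofs))     # sensor j reads a_j . deformation
D = rng.normal(size=(n_poses, n_dofs))       # training deformation states
Y = D @ A.T                                  # sensor readings for each pose

def recon_error(sel):
    # Least-squares reconstruction D ≈ Y[:, sel] @ W from the chosen readings.
    W, *_ = np.linalg.lstsq(Y[:, sel], D, rcond=None)
    return float(np.linalg.norm(Y[:, sel] @ W - D))

selected = []
while len(selected) < 6:                     # greedy forward subselection
    err, j = min((recon_error(selected + [j]), j)
                 for j in range(n_sensors) if j not in selected)
    selected.append(j)
print("chosen sensors:", selected, "| reconstruction error:", round(err, 6))
```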

Experimental Design for Optimization of Orthogonal Projection Pursuit Models

Mutny, Mojmir and Kirschner, Johannes and Krause, Andreas, "Experimental Design for Optimization of Orthogonal Projection Pursuit Models", 2020. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI)

Abstract: Bayesian optimization and kernelized bandit algorithms are widely used techniques for sequential black-box function optimization with applications in parameter tuning, control, and robotics, among many others. To be effective in high-dimensional settings, previous approaches make additional assumptions, for example on low-dimensional subspaces or an additive structure. In this work, we go beyond the additivity assumption and use an orthogonal projection pursuit regression model, which strictly generalizes additive models. We present a two-stage algorithm motivated by experimental design to first decorrelate the additive components. Subsequently, the bandit optimization benefits from the statistically efficient additive model. Our method provably decorrelates the fully additive model and achieves optimal sublinear simple regret in terms of the number of function evaluations. To prove the rotation recovery, we derive novel concentration inequalities for linear regression on subspaces. In addition, we specifically address the issue of acquisition function optimization and present two domain-dependent efficient algorithms. We validate the algorithm numerically on synthetic as well as real-world optimization problems.

Full text here

Coresets via Bilevel Optimization for Continual Learning and Streaming

Borsos, Zalan and Mutny, Mojmir and Krause, Andreas, "Coresets via Bilevel Optimization for Continual Learning and Streaming", 2020. Proc. Neural Information Processing Systems (NeurIPS)

Abstract: Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple models such as k-means and logistic regression. In this work, we propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.

Full text here

Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling

Mutny, Mojmir and Derezinski, Michal and Krause, Andreas, "Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling", 2020. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)

Abstract: We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian over-approximation matrix $\mathbf{M}$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $\mathbf{M}$. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of the matrix $\mathbf{M}$ and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of $\mathbf{M}$. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.

Full text here
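
A hedged sketch of the analyzed sampler on a small quadratic (ours; exact determinantal sampling is done here by brute-force enumeration, which only works for tiny problems): blocks are drawn with probability proportional to det(M_SS) and a Newton step is taken on the sampled coordinates.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
Q = rng.normal(size=(n, n))
M = Q @ Q.T + np.eye(n)                      # SPD Hessian over-approximation
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ M @ x - b @ x

subsets = list(itertools.combinations(range(n), k))
probs = np.array([np.linalg.det(M[np.ix_(S, S)]) for S in subsets])
probs /= probs.sum()                         # P(S) proportional to det(M_SS)

x = np.zeros(n)
for t in range(300):
    S = list(subsets[rng.choice(len(subsets), p=probs)])
    grad_S = M[S] @ x - b[S]                 # block of the full gradient
    x[S] -= np.linalg.solve(M[np.ix_(S, S)], grad_S)   # exact block Newton step

print("suboptimality:", f(x) - f(np.linalg.solve(M, b)))
```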

Bayesian Optimization for Fast and Safe Parameter Tuning of SwissFEL

Kirschner, Johannes and Nonnenmacher, Manuel and Mutny, Mojmir and Hiller, Nicole and Adelmann, Andreas and Ischebeck, Rasmus and Krause, Andreas, "Bayesian Optimization for Fast and Safe Parameter Tuning of SwissFEL", 2019. Proc. International Free-Electron Laser Conference (FEL2019)

Abstract: Parameter tuning is a notoriously time-consuming task in accelerator facilities. As a tool for global optimization with noisy evaluations, Bayesian optimization was recently shown to outperform alternative methods. By learning a model of the underlying function using all available data, the next evaluation can be chosen carefully to find the optimum with as few steps as possible and without violating any safety constraints. However, the per-step computation time increases significantly with the number of parameters, and the generality of the approach can lead to slow convergence on functions that are easier to optimize. To overcome these limitations, we divide the global problem into sequential subproblems that can be solved efficiently using safe Bayesian optimization. This allows us to trade off local and global convergence and to adapt to additional structure in the objective function. Further, we provide slice plots of the function as user feedback during the optimization. We showcase how we use our algorithm to tune up the FEL output of SwissFEL with up to 40 parameters simultaneously, reaching convergence within reasonable tuning times on the order of 30 minutes (< 2000 steps).

Full text here

Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces

Kirschner, Johannes and Mutny, Mojmir and Hiller, Nicole and Ischebeck, Rasmus and Krause, Andreas, "Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces", 2019. Proc. International Conference for Machine Learning (ICML)

Abstract: Bayesian optimization is known to be difficult to scale to high dimensions, because the acquisition step requires solving a non-convex optimization problem in the same search space. In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems that can be solved efficiently. We show that our algorithm converges globally and obtains a fast local rate when the function is strongly convex. Further, if the objective has an invariant subspace, our method automatically adapts to the effective dimension without changing the algorithm. When combined with the SafeOpt algorithm to solve the sub-problems, we obtain the first safe Bayesian optimization algorithm with theoretical guarantees applicable in high-dimensional settings. We evaluate our method on multiple synthetic benchmarks, where we obtain competitive performance. Further, we deploy our algorithm to optimize the beam intensity of the Swiss Free Electron Laser with up to 40 parameters while satisfying safe operation constraints.

Use Google Scholar for full citation
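
A minimal sketch of the algorithmic skeleton (ours; real LineBO solves each one-dimensional sub-problem with a GP-based and optionally safe BO routine, not grid search on the true function): restrict each round to a random line through the incumbent and optimize there.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
f = lambda x: -np.sum((x - 0.3) ** 2)        # toy black-box objective

x_best = np.zeros(d)
for _ in range(50):
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)    # random one-dimensional subspace
    ts = np.linspace(-1.0, 1.0, 101)
    candidates = x_best + ts[:, None] * direction   # points along the line
    vals = np.array([f(c) for c in candidates])
    if vals.max() > f(x_best):                # keep the incumbent if no gain
        x_best = candidates[vals.argmax()]

print("best value found:", f(x_best), "(optimum is 0.0)")
```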

Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features

Mutny, Mojmir and Krause, Andreas, "Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features", 2018. Neural Information Processing Systems (NeurIPS)

Abstract: We develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimization of black-box functions in high dimensions. We assume a generalized additive model with possibly overlapping variable groups. When the groups do not overlap, we are able to provide the first provably no-regret polynomial-time (in the number of evaluations of the acquisition function) algorithm for solving high-dimensional BO. To make the optimization efficient and feasible, we introduce a novel deterministic Fourier Features approximation based on numerical integration with detailed analysis for the squared exponential kernel. The error of this approximation decreases exponentially with the number of features, and allows for a precise approximation of both posterior mean and variance. In addition, the kernel matrix inversion improves in its complexity from cubic to essentially linear in the number of data points measured in basic arithmetic operations.

Full text here
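
The feature construction can be sketched in a few lines (ours, for the one-dimensional squared-exponential kernel; the paper develops the general analysis): Gauss-Hermite quadrature of the kernel's Gaussian spectral measure yields deterministic cosine/sine features whose inner product approximates the kernel.

```python
import numpy as np

ell, m = 0.5, 10                                    # lengthscale, quadrature nodes
nodes, weights = np.polynomial.hermite.hermgauss(m)
omegas = np.sqrt(2.0) * nodes / ell                 # frequencies of N(0, 1/ell^2)
coef = np.sqrt(weights / np.sqrt(np.pi))            # split quadrature weights

def features(x):
    # Deterministic 2m-dimensional feature map: phi(x) . phi(y) ≈ k(x, y).
    x = np.atleast_1d(x)
    return np.concatenate([coef * np.cos(np.outer(x, omegas)),
                           coef * np.sin(np.outer(x, omegas))], axis=1)

x, y = 0.3, 1.1
exact = np.exp(-(x - y) ** 2 / (2 * ell ** 2))
approx = (features(x) @ features(y).T)[0, 0]
print(f"exact kernel {exact:.10f} | quadrature features {approx:.10f}")
```

Unlike random Fourier features, the nodes and weights here are deterministic, which is the source of the exponentially decaying approximation error highlighted in the abstract.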

Parallel Stochastic Newton Method

Mutny, Mojmir and Richtarik, Peter, "Parallel Stochastic Newton Method", 2018. Journal of Computational Mathematics

Abstract: We propose a parallel stochastic Newton method (PSN) for minimizing unconstrained smooth convex functions. We analyze the method in the strongly convex case and give conditions under which acceleration can be expected when compared to its serial counterpart. We show how PSN can be applied to large quadratic function minimization in general, as well as to empirical risk minimization problems. We demonstrate the practical efficiency of the method through numerical experiments and models of simple matrix classes.

Full text here
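
A hedged toy of the method's spirit (ours, simulating the parallel workers serially; the paper's aggregation and step-size conditions are more refined): each worker solves an exact Newton step on a random coordinate block of a strongly convex quadratic, and the averaged update keeps the objective non-increasing by convexity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, block, workers = 40, 5, 4
Q = rng.normal(size=(n, n))
M = Q @ Q.T + np.eye(n)                       # strongly convex quadratic, f = x'Mx/2 - b'x
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ M @ x - b @ x

x = np.zeros(n)
for t in range(400):
    update = np.zeros(n)
    for _ in range(workers):                  # workers simulated serially
        S = rng.choice(n, size=block, replace=False)
        g_S = M[S] @ x - b[S]
        update[S] += np.linalg.solve(M[np.ix_(S, S)], g_S)
    # Averaging the workers' steps averages their iterates, so f cannot increase.
    x -= update / workers

print("suboptimality:", f(x) - f(np.linalg.solve(M, b)))
```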