Tuning machine parameters of particle accelerators is a repetitive and time-consuming task that is challenging to automate. While many off-the-shelf optimization algorithms are available, in practice their use is limited because most methods do not account for safety-critical constraints in each iteration, such as loss signals or step-size limitations. One notable exception is safe Bayesian optimization, which is a data-driven tuning approach for global optimization with noisy feedback. We propose and evaluate a step-size limited variant of safe Bayesian optimization on two research facilities of the Paul Scherrer Institut (PSI): a) the Swiss Free Electron Laser (SwissFEL) and b) the High-Intensity Proton Accelerator (HIPA). We report promising experimental results on both machines, tuning up to 16 parameters subject to 224 constraints.

Optimal experimental design seeks to determine the most informative allocation of experiments to infer an unknown statistical quantity. In this work, we investigate optimal design of experiments for em estimation of linear functionals in reproducing kernel Hilbert spaces (RKHSs). This problem has been extensively studied in the linear regression setting under an estimability condition, which allows estimating parameters without bias. We generalize this framework to RKHSs, and allow for the linear functional to be only approximately inferred, i.e., with a fixed bias. This scenario captures many important modern applications such as estimation of gradient maps, integrals and solutions to differential equations. We provide algorithms for constructing bias-aware designs for linear functionals. We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise, enabling us to certify estimation with bounded error with high probability.

In this work we introduced DPP-BBO, a natural and easily applicable framework for enhancing batch diversity in BBO algorithms which works in more settings than previous diversification strategies: it is directly applicable to the continuous domain case, when due to approximation and non-standard models we are unable to compute hallucinations or confidence intervals (as in the Cox process example), or more generally when used in combination with any randomized BBO sampling scheme or arbitrary diversity kernel. Moreover, for DPP-TS we show improved theoretical guarantees and strong practical performance on simple regret.

A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a em Markov chain, and we can only choose them by selecting a em policy controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm – textscmarkov-design – that efficiently selects policies whose measurement allocation emphprovably converges to the optimal one. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.

Inhomogeneous Poisson point processes are widely used models of event occurrences. We address emphadaptive sensing of Poisson Point processes, namely, maximizing the number of captured events subject to sensing costs. We encode prior assumptions on the rate function by modeling it as a member of a known emphreproducing kernel Hilbert space (RKHS). By partitioning the domain into separate small regions, and using heteroscedastic linear regression, we propose a tractable estimator of Poisson process rates for two feedback models: emphcount-record, where exact locations of events are observed, and emphhistogram feedback, where only counts of events are observed. We derive provably accurate anytime confidence estimates for our estimators for sequentially acquired Poisson count data. Using these, we formulate algorithms based on optimism that provably incur sublinear count-regret. We demonstrate the practicality of the method on problems from crime modeling, revenue maximization as well as environmental monitoring.

We present the first approach for learning–from a single trajectory–a linear quadratic regulator (LQR), even for unstable systems, without knowledge of the system dynamics and without requiring an initial stabilizing controller. Our central contribution is an efficient algorithm–emph eXploration–that quickly identifies a stabilizing controller. Our approach utilizes robust System Level Synthesis (SLS), and we prove that it succeeds in a constant number of iterations. Our approach can be used to initialize existing algorithms that require a stabilizing controller as input. When used in this way, it yields a method for learning LQRs from a single trajectory and even for unstable systems, while suffering at most regret.

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is to operate on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and k-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice differentiable model, including neural networks. We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.

Soft robots have applications in safe human–robot interactions, manipulation of fragile objects, and locomotion in challenging and unstructured environments. In this article, we present a computational method for augmenting soft robots with proprioceptive sensing capabilities. Our method automatically computes a minimal stretch-receptive sensor network to user-provided soft robotic designs, which is optimized to perform well under a set of user-specified deformation-force pairs. The sensorized robots are able to reconstruct their full deformation state, under interaction forces. We cast our sensor design as a subselection problem, selecting a minimal set of sensors from a large set of fabricable ones, which minimizes the error when sensing specified deformation-force pairs. Unique to our approach is the use of an analytical gradient of our reconstruction performance measure with respect to selection variables. We demonstrate our technique on a bending bar and gripper example, illustrating more complex designs with a simulated tentacle.

We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how the emphminimal description positive basis can be adapted to the covariance kernel, to non-stationarity and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (textscCox-Thompson) and top-two posterior sampling (textscTop2) principles. With latter, the difference between samples serves as a surrogate to the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach.

Bayesian optimization and kernelized bandit algorithms are widely used techniques for sequential black box function optimization with applications in parameter tuning, control, robotics among many others. To be effective in high dimensional settings, previous approaches make additional assumptions, for example on low-dimensional subspaces or an additive structure. In this work, we go beyond the additivity assumption and use an orthogonal projection pursuit regression model, which strictly generalizes additive models. We present a two-stage algorithm motivated by experimental design to first decorrelate the additive components. Subsequently, the bandit optimization benefits from the statistically efficient additive model. Our method provably decorrelates the fully additive model and achieves optimal sublinear simple regret in terms of the number of function evaluations. To prove the rotation recovery, we derive novel concentration inequalities for linear regression on subspaces. In addition, we specifically address the issue of acquisition function optimization and present two domain dependent efficient algorithms. We validate the algorithm numerically on synthetic as well as real-world optimization problems.

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple models such as k-means and logistic regression. In this work, we propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.

We analyze the convergence rate of the randomized Newton-like method introduced by Qu et. al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $퐌$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $퐌$. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix $퐌$, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of $퐌$. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior or on par with uniform sampling.

Parameter tuning is a notoriously time-consuming task in accelerator facilities. As tool for global optimization with noisy evaluations, Bayesian optimization was recently shown to outperform alternative methods. By learning a model of the underlying function using all available data, the next evaluation can be chosen carefully to find the optimum with as few steps as possible and without violating any safety constraints. However, the per-step computation time increases significantly with the number of parameters and the generality of the approach can lead to slow convergence on functions that are easier to optimize. To overcome these limitations, we divide the global problem into sequential subproblems that can be solved efficiently using safe Bayesian optimization. This allows us to trade off local and global convergence and to adapt to additional structure in the objective function. Further, we provide slice-plots of the function as user feedback during the optimization. We showcase how we use our algorithm to tune up the FEL output of SwissFEL with up to 40 parameters simultaneously, and reach convergence within reasonable tuning times in the order of 30 minutes (< 2000 steps).

Bayesian optimization is known to be difficult to scale to high dimensions, because the acquisition step requires solving a non-convex optimization problem in the same search space. In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems that can be solved efficiently. We show that our algorithm converges globally and obtains a fast local rate when the function is strongly convex. Further, if the objective has an invariant subspace, our method automatically adapts to the effective dimension without changing the algorithm. When combined with the SafeOpt algorithm to solve the sub-problems, we obtain the first safe Bayesian optimization algorithm with theoretical guarantees applicable in high-dimensional settings. We evaluate our method on multiple synthetic benchmarks, where we obtain competitive performance. Further, we deploy our algorithm to optimize the beam intensity of the Swiss Free Electron Laser with up to 40 parameters while satisfying safe operation constraints.

We develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimization of black-box functions in high dimensions. We assume a generalized additive model with possibly overlapping variable groups. When the groups do not overlap, we are able to provide the first provably no-regretemph polynomial time(in the number of evaluations of the acquisition function) algorithm for solving high dimensional BO. To make the optimization efficient and feasible, we introduce a novel deterministic Fourier Features approximation based on numerical integration with detailed analysis for the squared exponential kernel. The error of this approximation decreasesemph exponentially with the number of features, and allows for a precise approximation of both posterior mean and variance. In addition, the kernel matrix inversion improves in its complexity from cubic to essentially linear in the number of data points measured in basic arithmetic operations.

We propose a parallel stochastic Newton method (PSN) for minimizing unconstrained smooth convex functions. We analyze the method in the strongly convex case, and give conditions under which acceleration can be expected when compared to its serial counter-part. We show how PSN can be applied to the large quadratic function minimization in general, and empirical risk minimization problems. We demonstrate the practical efficiency of the method through numerical experiments and models of simple matrix classes.

A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions. Our analysis is much simpler in contrast to a similar algorithm and its analysis, LISSA. The algorithm is primarily suitable for training large scale linear models, where the number of data points is very large. Two novel analyses, one showing space independent linear convergence, and one showing conditional quadratic convergence are discussed. In numerical experiments, the behavior of the error is similar to the second-order algorithm L-BFGS, and improves the performance of LISSA for quadratic objective function.