# Studies series – Byzantine Multi-Agent Optimization: Part II

Continuing the Byzantine Fault-tolerant series started last Friday, here it is the second paper :

## Byzantine Multi-Agent Optimization: Part II

In Part I of this report, we introduced a Byzantine fault-tolerant distributed optimization problem whose goal is to optimize a sum of convex (cost) functions with real-valued scalar input/ouput. In this second part, we introduce a condition-based variant of the original problem over arbitrary directed graphs. Specifically, for a given collection of k input functions h1(x),,hk(x), we consider the scenario when the local cost function stored at agent j, denoted by gj(x), is formed as a convex combination of the k input functions h1(x),,hk(x). The goal of this condition-based problem is to generate an output that is an optimum of 1/∑(k)i=hi(x). Depending on the availability of side information at each agent, two slightly different variants are considered. We show that for a given graph, the problem can indeed be solved despite the presence of faulty agents. In particular, even in the absence of side information at each agent, when adequate redundancy is available in the optima of input functions, a distributed algorithm is proposed in which each agent carries minimal state across iterations.

In this paper the authors consider the case where the consensus is condition-based, where the input to the servers or clients (agents) is restricted to be within some acceptable set:
Our problem formulation is motivated by the work on condition-based consensus, where the inputs of the agents are restricted to be within some acceptable set. Each agent i maintains state xi , with xi(t) denoting the local estimate of the optimal x, computed by node i at the end of the t-th iteration of the algorithm, with xi(0) denoting its initial local estimate. At the start of the t-th iteration (t > 0), the local estimate of agent i is xi(t − 1). The algorithms of interest will require each agent i to perform the following three steps in iteration t, where t > 0. Note that the faulty agents may deviate from this specification. Since each hj (·) is convex and L-Lipschitz continuous, and Pk j=1 Aji = 1, it follows that each gi(·) is also convex and L-Lipschitz continuous. Note that the formulation allows n < k as well as n ≥ k. The matrix A is termed as a job assignment matrix. The goal here is to develop algorithms that output xi = xe at each non-faulty agent i such that:

xe ∈ argmin h(x) = 1/k ∑ (k) j=1 hj (x).                         (1)

That is, we are interested in developing algorithms in which the local estimate of each non-faulty agent will eventually reach consensus, and the consensus value is an optimum of function h(·).

## Related Work

This research concerns the implementation of Byzantine  synchronous consensus, where the focus is on condition-based consensus and iterative approximate Byzantine consensus. Interestingly it is established a connection between asynchronous consensus and error-correcting codes (ECC):

Iterative approximate consensus requires that the agents agree with each other only approximately, using local communication and maintaining minimal state across iterations. Condition-based consensus  restricts the inputs of the agents to be within some acceptable set. Reference research showed that if a condition (the set of allowable system inputs) is f–acceptable, then consensus can be achieved in the presence of up to f crash failures over complete graphs. A connection between asynchronous consensus and error-correcting codes (ECC) was established, observing that crash failures and Byzantine failures correspond to erasures and substitution errors, respectively, in ECCs. Condition-based approach can also be used in synchronous system to speed up the agreement.

In this paper the developments focused around distributed optimization of a network of communicating links that might include faulty agents, who could perform random attacks to thwart the network:

(…)  In contrast, we consider the system in which up to f agents may be Byzantine, i.e., up to f agents may be adversarial and try to mislead the system to function improperly. We are not aware of the existence of results obtained in this report.

In other related work, significant attempts have been made to solve the problem of distributed hypothesis testing in the presence of Byzantine attacks, where Byzantine sensors may transmit fictitious observations aimed at confusing the decision maker to arrive at a judgment that is in contrast with the true underlying distribution. Consensus based variant of distributed event detection, where a centralized data fusion center does not exist, is considered. In contrast, in this paper, we focus on the Byzantine attacks on the multi-agent optimization problem.

## Discussion and Conclusion

After performing proof of relevant theorems and implementing presentations of pseudo-codes such as this one:

#### Algorithm 1:

Steps to be performed by agent i ∈ V in iteration t ≥ 0.

Initialization: xi(0) ← x0

1. Transmit step: Compute g ′ i (xi(t − 1)), the gradient of gi(·) at xi(t − 1), and perform Byzantine broadcast of g ′ i (xi(t − 1)) to all agents.

2. Receive step: Receive gradients from all other agents. Let y i (t) be a n–dimensional vector of received gradients, with y i j (t) being the value received from agent j. If j ∈ V − F, then y i j (t) = g ′ j (xj (t − 1)).

3. Gradient Decoding step: Perform the decoding procedure in  to recover d(t) = [h ′ 1 (xi(t − 1)), · · · , h′ k (xi(t − 1))]^T

4. Update step: Update its local estimate as follows.

xi(t) = xi(t − 1) − α(t − 1)∑ (k) j=1  h ′ j (xi(t − 1)).                         (2)

Due to the use of Byzantine broadcast, the communication load in Algorithm 1 is high. The communication cost can be reduced by using a matrix A that has stronger error-correction ability. In general, there is some tradeoff among the communication cost, the graph structure and the error-correcting capability of A. Our main focus of this paper is the case when no side-information is available at each agent, thus we do not pursue this tradeoff further.

What follows from here is an Algorithm 2 (not presented), a more efficient version of Algorithm 1, where no side-information is available to each agent and where the job assignment matrices used are characterized by a sparsity parameter – a new property introduced in this report over matrices. This algorithm, in its current form, is restricted in the inputs allowed, though.

The main conclusions following a long string of definitions, theorem proofs and corollaries as well as an appendix section, were as follows:

In this report, we introduce the condition-based approach to Byzantine multi-agent optimization. We have shown that when there is enough redundancy in the local cost functions, or in the local optima, Problem 1 can be solved iteratively.

Two slightly different variants are considered: condition-based Byzantine multi-agent optimization with side information and condition-based Byzantine multi-agent optimization without side information. For the former, when side information is available at each agent, a decoding-based algorithm is proposed, assuming each input function is differentiable. This algorithm combines the gradient method with the decoding procedure introduced in a referenced paper by choosing proper “generator matrices” as job assignment matrices. With such a decoding subroutine, our algorithm essentially performs the gradient method, where gradient computation is processed distributedly over the multi-agent system. When side information is not available at each agent, we propose a simple consensus-based algorithm in which each agent carries minimal state across iterations. This consensus-based algorithm solves Problem 1 under the additional assumption over input functions that all input functions share at least one common optimum. Although the consensus-based algorithm can only solve Problem 1 for a restricted class of input functions, nevertheless, as each non-faulty agent does not need to store the job matrix A throughout execution and does not need to perform the decoding procedure at each iteration, the requirements on memory and computation are less stringent comparing to the decoding-based algorithm. In addition, in contrast to the decoding-based algorithm, the consensus-based algorithm also works for non-smooth input functions. Thus, the consensus-based algorithm may be more practical in some applications.

Featured Image:  Blockchain Technology, Smart Contracts and Smart Property