Computing Nash equilibria for strategic multi-agent systems is challenging
for expensive black-box systems. Motivated by the ubiquity of games involving
exploitation of common resources, this paper considers this problem for
potential games. We use the Bayesian optimization framework to obtain novel
algorithms that exploit the structure of potential games to solve finite
(discrete action spaces) and infinite (real interval action spaces) potential
games. Numerical results illustrate the efficiency of the approach in computing
the Nash equilibria of static potential games and linear Nash equilibria of
dynamic potential games.
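The core idea can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that each query returns a noise-free value of the potential at a joint action profile. It runs plain GP-UCB with an RBF kernel over the finite grid of profiles, then applies better-response dynamics from the best profile found. In a finite potential game, better-response dynamics always terminate at a pure Nash equilibrium (the finite improvement property), so the output is an equilibrium even if the search budget is too small. The kernel, `beta`, `budget`, all function names, and the toy game are hypothetical choices.

```python
from itertools import product

import numpy as np


def rbf(X, Y, length_scale=1.0):
    # Squared-exponential kernel between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X_obs, y_obs, X_all, noise=1e-6):
    # Zero-mean GP on standardized targets; returns posterior mean and std.
    mu, sd = y_obs.mean(), (y_obs.std() or 1.0)
    L = np.linalg.cholesky(rbf(X_obs, X_obs) + noise * np.eye(len(X_obs)))
    Ks = rbf(X_all, X_obs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, (y_obs - mu) / sd))
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mu + sd * (Ks @ alpha), sd * np.sqrt(var)


def to_profile(row):
    return tuple(int(x) for x in row)


def gp_ucb_search(potential, action_sets, budget=8, beta=2.0, seed=0):
    # GP-UCB over the finite grid of joint profiles; assumes budget <= grid size.
    grid = np.array(list(product(*action_sets)), dtype=float)
    picked = [int(np.random.default_rng(seed).integers(len(grid)))]
    values = [potential(to_profile(grid[picked[0]]))]
    for _ in range(budget - 1):
        mean, std = gp_posterior(grid[picked], np.array(values), grid)
        ucb = mean + beta * std
        ucb[picked] = -np.inf  # noise-free queries: never repeat a profile
        picked.append(int(np.argmax(ucb)))
        values.append(potential(to_profile(grid[picked[-1]])))
    return to_profile(grid[picked[int(np.argmax(values))]])


def deviate(profile, i, x):
    # Joint profile with player i's action replaced by x.
    return tuple(profile[:i]) + (x,) + tuple(profile[i + 1:])


def better_response_polish(profile, utilities, action_sets):
    # Each player switches to a best response while that strictly helps;
    # in a finite potential game every switch raises the potential, so this stops.
    a = list(profile)
    improved = True
    while improved:
        improved = False
        for i, actions in enumerate(action_sets):
            best = max(actions, key=lambda x: utilities[i](deviate(a, i, x)))
            if utilities[i](deviate(a, i, best)) > utilities[i](tuple(a)):
                a[i] = best
                improved = True
    return tuple(a)


def is_pure_nash(profile, utilities, action_sets):
    return all(
        utilities[i](deviate(profile, i, x)) <= utilities[i](tuple(profile))
        for i, actions in enumerate(action_sets)
        for x in actions
    )


# Hypothetical exact potential game: u_i = phi + (a term independent of a_i).
def phi(a):
    return -(a[0] - 3) ** 2 - (a[1] - 1) ** 2 - 0.5 * a[0] * a[1]


utilities = [lambda a: phi(a) + a[1], lambda a: phi(a) - 2 * a[0]]
action_sets = [list(range(5)), list(range(5))]
start = gp_ucb_search(phi, action_sets, budget=8)
eq = better_response_polish(start, utilities, action_sets)
print(start, eq, is_pure_nash(eq, utilities, action_sets))
```

The polishing step queries individual utilities rather than the potential. This is one way bandit feedback from players could be combined with a surrogate model of the potential.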

DO:
A Bayesian optimization approach to compute the Nash equilibria of potential games using bandit feedback. https://t.co/sy9bF63a8J

Sample Sizes: None.

Authors: 2

Total Words: 7802

Unique Words: 1962

This paper tackles the distributed leader-follower (L-F) control problem for
heterogeneous mobile robots in unknown environments that require obstacle
avoidance, inter-robot collision avoidance, and reliable robot communication.
To prevent inter-robot collisions, we employ a virtual repulsive force
between robots. For obstacle avoidance, we present a novel distributed
Negative-Imaginary (NI) variant formation tracking control approach and a
dynamic network topology methodology that allows the formation to change its
shape and the robots to switch their roles. In the case of communication or
sensor loss, a UAV controlled by a Strictly-Negative-Imaginary (SNI)
controller with good wind-resistance characteristics uses its camera to track
the position of the UGV formation. Simulations and indoor experiments have
been conducted to validate the proposed methods.
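The abstract does not state the force law. As a minimal sketch, assuming a classic artificial-potential-field repulsion that is active only inside a hypothetical safety radius `d0` with gain `k`, the collision-avoidance term for one robot could be computed as follows. The NI/SNI controllers are not modelled here.

```python
import numpy as np


def repulsive_force(p_i, others, d0=1.0, k=1.0):
    """Virtual repulsive force on robot i from neighbours closer than d0.

    Negative gradient of the potential U(d) = 0.5 * k * (1/d - 1/d0)**2
    for 0 < d < d0 (U = 0 otherwise), where d is the inter-robot distance.
    """
    p_i = np.asarray(p_i, dtype=float)
    force = np.zeros_like(p_i)
    for p_j in np.asarray(others, dtype=float):
        diff = p_i - p_j
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # Magnitude -dU/dd = k * (1/d - 1/d0) / d**2, pointing away from robot j.
            force += k * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force


print(repulsive_force([0.0, 0.0], [[0.5, 0.0], [3.0, 0.0]]))
```

In a leader-follower scheme, a term like this would typically be added to each follower's formation-tracking command. The robot at distance 3.0 lies outside `d0`, so it contributes nothing.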

ComputerPapers:
Distributed Obstacle and Multi-Robot Collision Avoidance in Uncertain Environments. https://t.co/nl97Z9HGPk

Sample Sizes: None.

Authors: 3

Total Words: 8761

Unique Words: 2683

The decision to centralise or decentralise human organisations requires
quantified evidence, but little is available in the literature. We provide such
data in a variant of the Multiple Travelling Salesmen Problem (MTSP) in which
we study how the allocation sub-problem may be decentralised among selfish
salesmen. Our contributions are (i) this modification of the MTSP to include
selfishness, (ii) the proposal of organisations to solve this modified MTSP,
and (iii) the comparison of these organisations. Our five organisations may be
summarised as follows: (i) OptDecentr is a pure Centralised Organisation (CO)
in which a Central Authority (CA) finds the best solution that could be found
by a Decentralised Organisation (DO); (ii) Cluster and (iii) Auction are CO/DO
hybrids; and (iv) P2P and (v) CNP are pure DOs. Two further organisations serve
as benchmarks: (vi) NoRealloc is a pure DO that ignores the allocation problem,
and (vii) FullCentr is a pure CO that solves a different problem, viz., the
traditional MTSP...
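To make the allocation sub-problem concrete, here is a hypothetical sequential single-item auction. It is not the paper's Auction organisation, whose details the abstract does not give. Each selfish salesman bids the marginal cost of inserting a city into its own tour (cheapest insertion, Euclidean distances), and an auctioneer awards the lowest bid each round. All names and the distance model are assumptions.

```python
import math


def tour_cost(depot, tour):
    # Length of the closed tour depot -> tour[0] -> ... -> tour[-1] -> depot.
    pts = [depot] + tour + [depot]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def cheapest_insertion(depot, tour, city):
    # A salesman's selfish bid: marginal cost of adding `city`, and where.
    base = tour_cost(depot, tour)
    return min(
        (tour_cost(depot, tour[:k] + [city] + tour[k:]) - base, k)
        for k in range(len(tour) + 1)
    )


def sequential_auction(depots, cities):
    # Each round every salesman bids on every unallocated city;
    # the auctioneer awards the single lowest bid.
    tours = [[] for _ in depots]
    remaining = list(cities)
    while remaining:
        bids = []
        for c, city in enumerate(remaining):
            for s, depot in enumerate(depots):
                cost, pos = cheapest_insertion(depot, tours[s], city)
                bids.append((cost, s, c, pos))
        _, s, c, pos = min(bids)
        tours[s].insert(pos, remaining.pop(c))
    return tours


depots = [(0, 0), (10, 0)]
cities = [(1, 0), (2, 1), (9, 1), (11, 0)]
print(sequential_auction(depots, cities))
```

Because each salesman bids only its own marginal cost, the result can differ from the centralised MTSP optimum. That gap is the kind of "cost of selfishness" the paper quantifies.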

ComputerPapers:
Cost of selfishness in the allocation of cities in the Multiple Travelling Salesmen Problem. https://t.co/RWjNrODhGV

Sample Sizes: None.

Authors: 2

Total Words: 12706

Unique Words: 2712

The movement of cooperative robots in a densely cluttered environment may not
be possible if the formation shape is fixed. Hence, we investigate a new
method for time-varying formation control of a group of heterogeneous
autonomous vehicles, which may include Unmanned Ground Vehicles (UGVs) and
Unmanned Aerial Vehicles (UAVs). We extend a Negative-Imaginary (NI)
consensus control approach to switch the formation shape of the robots whilst
using only the relative distances between agents and between agents and
obstacles. All agents can automatically create a new safe formation to overcome
obstacles based on a novel geometric method, and then restore the original
formation once the obstacles are cleared. Furthermore, we improve position
consensus at sharp corners by achieving yaw consensus between robots.
Simulation and experimental results are analyzed to validate the
feasibility of the proposed approach.
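The formation-switching idea can be illustrated with standard first-order Laplacian consensus on offset-corrected positions. This swaps in a textbook protocol for the paper's NI controller. Each agent drives its estimate of the formation centre, `x_i - d_i`, toward its neighbours' estimates, and changing the offset vectors `d` switches the shape. The graph, step size `eps` (assumed below 1/max degree for stability), and offsets are hypothetical.

```python
import numpy as np


def formation_consensus(x0, offsets, adjacency, eps=0.1, steps=500):
    # x_i - d_i is agent i's estimate of the formation centre.
    x = np.array(x0, dtype=float)
    d = np.array(offsets, dtype=float)
    A = np.array(adjacency, dtype=float)
    deg = A.sum(axis=1)[:, None]
    for _ in range(steps):
        e = x - d
        # e_i += eps * sum_j a_ij (e_j - e_i); d is constant, so update x directly.
        x = x + eps * (A @ e - deg * e)
    return x


A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # complete graph on three agents
triangle = [[0, 1], [-1, 0], [1, 0]]
line = [[-1, 0], [0, 0], [1, 0]]  # hypothetical shape for a narrow passage

x = formation_consensus([[0, 0], [5, 1], [-2, 3]], triangle, A)
x = formation_consensus(x, line, A)  # switch shape, e.g. near an obstacle
print(np.round(x, 3))
```

On an undirected graph, this protocol preserves the average of the centre estimates. The group therefore keeps its centroid while the shape switches.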

ComputerPapers:
Time-Varying Formation Control of a Collaborative Multi-Agent System Using Negative-Imaginary Systems Theory. https://t.co/5EEt12hwc3

Sample Sizes: None.

Authors: 3

Total Words: 9409

Unique Words: 2574

This paper formalizes the path planning problem for a group of heterogeneous
Dubins vehicles performing remote tasks and develops a memetic algorithm-based
method to produce the paths efficiently. In this setting, the vehicles are
initially located at multiple depots in a two-dimensional space, and the
objective of planning is to minimize a weighted sum of the total tour cost of
the group and the largest individual tour cost among the vehicles. While the
presented formulation takes the form of a mixed-integer linear program (MILP)
for which off-the-shelf solvers are available, the MILP quickly becomes
intractable as the number of tasks and agents grows. Therefore, a memetic
algorithm tailored to the presented formulation is proposed. The algorithm
features a sophisticated encoding scheme to efficiently ... In addition, a
path-refinement technique that optimizes the detailed tours with the sequence
of visits fixed is proposed to obtain further optimized trajectories.
Comparative numerical experiments show...
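The planning objective and the kind of local improvement a memetic algorithm embeds can be sketched as follows. Euclidean distance stands in for Dubins path length, which also depends on headings and turning radius. The weights `w_sum` and `w_max` are hypothetical. The 2-opt move is a generic local search, not the paper's encoding or path-refinement technique.

```python
import math


def tour_length(depot, seq):
    # Closed tour from the depot through seq and back (Euclidean stand-in for Dubins).
    pts = [depot] + list(seq) + [depot]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def minsum_minmax_objective(depots, tours, w_sum=1.0, w_max=1.0):
    # Weighted sum of the group's total cost and the largest individual cost.
    costs = [tour_length(d, t) for d, t in zip(depots, tours)]
    return w_sum * sum(costs) + w_max * max(costs)


def two_opt(depot, seq):
    # Reverse segments while doing so shortens the tour (first-improvement 2-opt).
    best = list(seq)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(depot, cand) < tour_length(depot, best) - 1e-12:
                    best, improved = cand, True
    return best


depots = [(0, 0), (10, 0)]
tours = [[(1, 1), (0, 1), (1, 0)], [(10, 2)]]
tours[0] = two_opt(depots[0], tours[0])
print(tours[0], minsum_minmax_objective(depots, tours))
```

In a memetic algorithm, a local search of this kind is typically applied to offspring produced by crossover and mutation before they re-enter the population.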

ComputerPapers:
Memetic Algorithm-Based Path Generation for Multiple Dubins Vehicles Performing Remote Tasks. https://t.co/7nEByg9hjW

Sample Sizes: None.

Authors: 2

Total Words: 13207

Unique Words: 3052

We present a framework and algorithm for peer-to-peer teaching in cooperative
multiagent reinforcement learning. Our algorithm, Learning to Coordinate and
Teach Reinforcement (LeCTR), trains advising policies by using students'
learning progress as a teaching reward. Agents using LeCTR learn to assume the
role of a teacher or student at the appropriate moments, exchanging action
advice to accelerate the entire learning process. Our algorithm supports
teaching heterogeneous teammates and advising under communication constraints,
and it learns both what and when to advise. LeCTR outperforms prior teaching
methods in both final performance and rate of learning on multiple benchmark
domains. To our knowledge, this is the first approach for learning to teach in
a multiagent setting.
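The notion of "learning progress as a teaching reward" can be illustrated on a toy bandit. This is a hypothetical, much-simplified signal, not LeCTR's exact reward. The teacher advises an action, the student performs one tabular update on it, and the teacher is rewarded by the change in the true value of the student's greedy action. The bandit, learning rate, and function names are assumptions.

```python
import numpy as np


def greedy_value(q, true_means):
    # True expected reward of the action the student would currently pick.
    return true_means[int(np.argmax(q))]


def student_update(q, action, reward, lr=0.5):
    # Student's tabular (bandit) Q-learning step on the advised action.
    q = q.copy()
    q[action] += lr * (reward - q[action])
    return q


def teaching_reward(q_before, q_after, true_means):
    # Hypothetical learning-progress signal: gain in the student's greedy performance.
    return greedy_value(q_after, true_means) - greedy_value(q_before, true_means)


true_means = np.array([0.1, 0.5, 0.9])  # hypothetical 3-armed bandit
q = np.zeros(3)
for advice in (0, 2):
    q_new = student_update(q, advice, true_means[advice])  # deterministic reward
    print(advice, teaching_reward(q, q_new, true_means))
```

Advising the arm the student already prefers yields no progress. Advising the best arm produces a positive teaching reward, which is the signal a learned advising policy would be trained to increase.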

cbtheis:
Learning to Teach in Cooperative Multiagent Reinforcement Learning https://t.co/ucEJVuTsUP @IBMResearch

Sample Sizes: None.

Authors: 8

Total Words: 8359

Unique Words: 2565

We provide an in-depth study of the knowledge-theoretic aspects of
communication in so-called gossip protocols. Pairs of agents communicate by
means of calls in order to spread information (so-called secrets) within the
group. Depending on the nature of such calls, knowledge spreads in different
ways within the group. Systematizing the existing literature, we identify 18
different types of communication and model their epistemic effects through
corresponding indistinguishability relations. We then provide a classification
of these relations and show its usefulness for an epistemic analysis in the
presence of different communication types. Finally, we explain how to formalise
the assumption that the agents have common knowledge of a distributed epistemic
gossip protocol.
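The basic mechanics of a call can be sketched by tracking only which secrets each agent holds. This ignores the epistemic layer (what agents know about others' knowledge) that the paper studies. In a call, both parties exchange all secrets they hold. The schedule below is the well-known 2n - 4 call sequence that makes everyone an expert for n >= 4. Function names are illustrative.

```python
def make_call(secrets, a, b):
    # In a call, both parties exchange every secret they currently hold.
    merged = secrets[a] | secrets[b]
    secrets[a], secrets[b] = set(merged), set(merged)


def run_protocol(n, calls):
    secrets = [{i} for i in range(n)]  # agent i initially knows only secret i
    for a, b in calls:
        make_call(secrets, a, b)
    return secrets


def schedule_2n_minus_4(n):
    # Agents 4..n-1 call agent 0; agents 0-3 run a 4-call exchange among
    # themselves; then agents 4..n-1 call agent 0 again.
    if n < 4:
        raise ValueError("this schedule requires n >= 4")
    outer = [(i, 0) for i in range(4, n)]
    return outer + [(0, 1), (2, 3), (0, 2), (1, 3)] + outer


calls = schedule_2n_minus_4(6)
print(len(calls), run_protocol(6, calls))
```

This sketch shows only the flow of secrets. The 18 communication types in the paper differ in what callers and non-callers can observe about a call, which this model does not capture.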

Sample Sizes: None.

Authors: 3

Total Words: 14282

Unique Words: 2249

For some decision processes, significant added value is achieved when an
enterprise's internal Data Warehouse (DW) can be integrated and combined with
external data gathered from competitors' websites and other relevant Web
sources. In this paper we discuss an agent-based integration approach using
ontologies (DSS-MAS). In this approach, data from the internal DW and external
sources are scanned by a coordinated group of agents, while semantically
integrated and relevant data are reported to business users according to
business rules. Once data from the internal DW and Web sources, together with
business rules, are acquired, agents can use these data and rules to infer new
knowledge and thereby facilitate the decision-making process. Knowledge
represented in enterprise ontologies is acquired from business users without
extensive technical knowledge through a user-friendly interface based on
constraints and predefined templates. The approach presented in the paper was
verified using a case study from the domain of mobile communications with the
emphasis...
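The step in which agents "infer new knowledge" from acquired data and business rules can be illustrated with simple forward chaining. The paper's ontology-based reasoning is richer, and the facts and rules below are hypothetical examples. Any rule whose premises are all known adds its conclusion, and this repeats until nothing new is derived.

```python
def forward_chain(facts, rules):
    # rules: iterable of (premises, conclusion); apply until a fixed point.
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical facts: one from the internal DW, one scraped from a competitor site.
facts = {"our_price_higher(plan_A)", "competitor_cut_price(plan_A)"}
rules = [
    ({"our_price_higher(plan_A)", "competitor_cut_price(plan_A)"}, "churn_risk(plan_A)"),
    ({"churn_risk(plan_A)"}, "alert_marketing(plan_A)"),
]
print(sorted(forward_chain(facts, rules) - facts))
```

The second rule fires only after the first has added its conclusion. Chaining of this kind lets derived knowledge reach business users that no single data source contains.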

Sample Sizes: None.

Authors: 3

Total Words: 7975

Unique Words: 2262

Decision making in large-scale urban environments is critical for many
applications involving continuous distribution of resources and utilization of
infrastructure, such as ambient lighting control and traffic management.
Traditional decision-making methods involve extensive human participation, are
expensive, and are inefficient and unreliable in hard-to-predict situations.
Modern technology, including ubiquitous data collection through sensors,
automated analysis and prognosis, and online optimization, offers new
capabilities for developing flexible, autonomous, scalable, efficient, and
predictable control methods. This paper presents a new decision-making concept
in which a hierarchy of semantically more abstract models is used to perform
online, scalable, and predictable control. The lower semantic levels make
localized decisions based on data sampled from the environment, while the
higher semantic levels provide more global, time-invariant results based on
aggregated data from the lower levels. There is a continuous...
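A minimal two-level version of this hierarchy can be sketched under hypothetical assumptions, for example traffic counts per zone. Lower levels reduce raw samples to summaries. The higher level combines summaries without seeing raw data. Each zone then decides locally, using the global aggregate as a slowly varying reference. The zones, the `margin`, and the decision labels are illustrative.

```python
from statistics import fmean


def summarize(samples):
    # Lower semantic level: compress raw sensor samples into an aggregate.
    return {"mean": fmean(samples), "count": len(samples)}


def aggregate(summaries):
    # Higher level: combine child aggregates (count-weighted) without raw data.
    n = sum(s["count"] for s in summaries)
    return {"mean": sum(s["mean"] * s["count"] for s in summaries) / n, "count": n}


def local_decision(zone, city, margin=0.1):
    # Lower level acts locally, relative to the more global reference.
    if zone["mean"] > city["mean"] * (1 + margin):
        return "increase"
    if zone["mean"] < city["mean"] * (1 - margin):
        return "decrease"
    return "hold"


zones = {"A": summarize([10, 12, 14]), "B": summarize([2, 4])}
city = aggregate(list(zones.values()))
print(city, {name: local_decision(z, city) for name, z in zones.items()})
```

Only the compact summaries travel up the hierarchy. This is what lets the higher levels scale with the number of sensors.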

ComputerPapers:
Cities of the Future: Employing Wireless Sensor Networks for Efficient Decision Making in Complex Environments. https://t.co/Uw6gbXZ7Sx

Sample Sizes: None.

Authors: 9

Total Words: 14777

Unique Words: 3676

Existing multi-agent reinforcement learning methods are typically limited to
a small number of agents. As the number of agents grows, learning becomes
intractable due to the curse of dimensionality and the exponential growth of
agent interactions. In this paper, we present Mean Field Reinforcement
Learning, in which the interactions within the population of agents are
approximated by those between a single agent and the average effect of the
overall population or of neighboring agents. The interplay between the two
entities is mutually reinforcing: the learning of the individual agent's
optimal policy depends on the dynamics of the population, while the dynamics
of the population change according to the collective patterns of the
individual policies. We develop practical mean field Q-learning and mean field
Actor-Critic algorithms and analyze the convergence of the solution to Nash
equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games
justify the learning effectiveness of our mean field...
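The mean-field approximation replaces an agent's dependence on all other agents' actions with dependence on their average (one-hot) action. A minimal tabular sketch of a mean-field Q update follows, under simplifying assumptions that are ours, not the paper's. There are two actions, the mean action is discretized into `n_bins` bins, and the next-state value is taken under a Boltzmann (softmax) policy with inverse temperature `beta`. All parameter values are hypothetical.

```python
import numpy as np


def mean_action(neighbor_actions, n_actions):
    # Mean field: average of the neighbours' one-hot encoded actions.
    return np.eye(n_actions)[neighbor_actions].mean(axis=0)


def to_bin(abar, n_bins):
    # Hypothetical discretization (two actions): bin of the fraction choosing action 1.
    return int(round(abar[1] * (n_bins - 1)))


def boltzmann(q_values, beta):
    z = beta * (q_values - q_values.max())  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def mf_q_update(Q, s, a, k, r, s_next, k_next, alpha=0.1, gamma=0.95, beta=1.0):
    # Q has shape (n_states, n_actions, n_bins); k indexes the binned mean action.
    q_next = Q[s_next, :, k_next]
    v_next = boltzmann(q_next, beta) @ q_next  # mean-field value of the next state
    Q = Q.copy()
    Q[s, a, k] += alpha * (r + gamma * v_next - Q[s, a, k])
    return Q


Q = np.zeros((1, 2, 3))
k = to_bin(mean_action([1, 1, 1, 1], 2), n_bins=3)
Q = mf_q_update(Q, s=0, a=1, k=k, r=1.0, s_next=0, k_next=k)
print(Q[0])
```

The table grows with the number of mean-action bins rather than exponentially with the number of agents. That is the scalability argument the abstract makes.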

Sample Sizes: None.

Authors: 6

Total Words: 11926

Unique Words: 3066

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 58,338 papers.*
