Some 20 years ago, Guth et al. (1982) invented an economic game known as
the Ultimatum Game. In this theoretical game, one player offers the second a
divison of a certain amount of points. The second player may choose to accept,
but when he rejects, neither player gets anything. For example, with 10 points
available for division and the first player offering 4 points, the second
player effectively chooses between getting 4 points (and letting player 1 have
6) or getting nothing at all (and denying player 1 any points as well). 
Since the first description of the Ultimatum Game, extensive research
has gone into studying the game and strategies human players employ while
playing the game. Research shows that, in many cultures, people often offer
‘fair’ deals of 50-50 splits and offers of less than 20% will most likely be
rejected (Heinrich et al., 2004; Oosterbeek et al., 2004). In contrast, a
game-theoretical analysis of the Ultimatum Game shows that the rational
strategy for the second player is to always accept an
offer, regardless of its fairness. If the second player rejects an offer, he
gets nothing while accepting always yields some reward (unless the offer is
100% for player 1, 0% for player 2). If the first player knows this and also maximises his gains, he will offer as little as
possible (Andreoni & Blanchard, 2006). 
In this study, we are interested in extending the Ultimatum Game by
using a Multi-Agent Systems approach and modelling knowledge acquisition and
reasoning within agents. This requires an adaptation of the game, since the original game is played by two persons who (anonymously) meet
once and then forget about each other.
First, we will study a situation in which multiple bidding and receiving
agents co-exist and interact with each other. Each interaction stands alone and
agents forget about the transaction after it is completed. In essence, this
closely models the original Ultimatum Game. Each single agent interaction is an
Ultimatum Game in the original definition.
Second, the agent simulation is adapted to introduce knowledge. In this
case, the agents remember past interactions and are able to model each other’s
behavior. We will study the impact of knowledge on agent strategies and
results.
Lastly, the value of knowledge is studied. In this final simulation, we
will introduce the possibility of buying knowledge in an auction setting.
Agents will be able to bid on pieces of knowledge acquired by others. We will
study the impact of auctioning knowledge and its effects on strategy.
In all models of the Ultimatum Game, agents repeatedly engage in
transactions in which one agents offers an amount
between 0 and 10, which the second agents either accepts or rejects. If the
second agent accepts the offer, it receives the amount it was offered and the
first agent gets the rest (that is, 10 minus the amount the second agent
received). If the second agent rejects the offer, neither agent gets anything.
Agents can be regarded as representations of negotiating parties, for example as companies who are negotiating a takeover price.
Therefore, the agents are not only interested in maximising their own gains, they are also bound by certain constraints. These can be regarded as the strategic framework in which they
need to operate, based on their company’s strategy. The constraints of
receiving agents are of the form: accept all offers larger than or equal to X,
with X being a number between 1 and 10. The constraints of bidding agents are
of the form: always make offers of at least X, with X being a number between 1
and 10. 
Each simulation consists of ten bidding agents and ten receiving agents.
During the simulation, the points that were earned by agents through successful
offers are stored, so as to compare performance
between different agents and strategies.
This game is played to determine the optimal
strategy in a regular Ultimatum Game and will serve as a baseline condition to
compare the other simulations with.
There are ten bidding agents, their bidding maximum ranging from 1 to
10. Also, there are ten receiving agents, their
minimal acceptance ranging from 1 to 10. Each bidding agent is
teamed up with each receiving agent exactly once, so that all agents are
engaged in an interaction exactly ten times. Since this experiment serves as a
baseline condition, the bidding agents are modelled in
a simple manner, bidding their maximum bid each time.
After 100 interactions, point scores can be compared.
The scores that were achieved are displayed in figures
1 and 2. 
 
 
Fig. 1: Bidding agent scores in the Repeated
Ultimatum Game
 
 
Fig. 2: Receiving agent scores in the Repeated
Ultimatum Game 
Clearly, offering a 50-50 split yields the most points for bidding
agents. This can be explained in terms of expected value: there is a 50% chance
that offering 5 will result in an accepted offer, giving an expected value of 2,5. On the other hand, a bid of 1
will be always accepted but only has an expected value of 1. Therefore, bidding
agents would do best to offer 5 (or their maximum bid
if their constraint forces them to bid lower than 5).
Inspection of the receiving agent scores shows that accepting as many
bids as possible yields the most points. This corresponds to the analysis by
Andreoni & Blanchard (2006), who stated that accepting any bid higher than 0 is the optimal strategy for receiving agents.
To improve on this simulation, a new simulation will be defined in which
agents always make offers which maximize their
expected value. 
The previous simulation is repeated, but this time the bidding agents
will maximise their expected value. This means that agents will always make a
bid as close to 5 as possible.
The results from this simulation can be viewed
in figures 3 and 4.

Fig. 3: Bidding agent scores in the Repeated
Ultimatum Game with Rational Bidding

Fig. 4: Receiving agent scores in the Repeated
Ultimatum Game with Rational Bidding 
The new bidding strategy clearly improves the scores of all bidding
agents, except for the agent with the maximal bid of 5, which was already
operating under an optimal strategy, and for agents with the maximum bids of 1
through 4, who are still forced to always bid 1, 2, 3 or 4 respectively.
Through the optimal strategy for bidding agents, the gains of receiving agents
have somwhat diminished and have totally vanished for agents who are unable to
accept a bid of 5. Clearly, any acceptance constraint
reduces expected value for receiving agents and any bidding constrant below 5 reduces expected value for bidding agents.
To make the Ultimatum Game more interesting from a Multi-Agent Systems
perspective, factors of knowledge can be added to the
system. In the original implementation, actors only meet each other once or at
least are unable to remember each other’s history. In this new extension of the
game, agents will meet each other multiple times and will remember previous offers and whether they were successful.
The first part of the simulation will be the same as in the previous
simulation, with ten bidding and ten receiving agents who all meet each other
exactly once. However, when an encounter takes place, the outcome of the
encounter instills knowledge in the offering agent about the constraint of the receiving
agent. For example:
When both agents meet each other at a later time,
a new situation arises:
|  | 6 | 7 | 8 | 
| Accept ≥6 | 4 | 3 | 2 | 
| Accept ≥7 | 0 | 3 | 2 | 
| Accept ≥8 | 0 | 0 | 2 | 
| Accept ≥9 | 0 | 0 | 0 | 
| Accept ≥10 | 0 | 0 | 0 | 
| Expected value | 0,8 | 1,2 | 1,2 | 
When both agents next meet, a new offer may yet again
be made:
|  | 6 | 7 | 8 | 
| Accept ≥6 | 4 | 3 | 2 | 
| Accept ≥7 | 0 | 3 | 2 | 
| Expected value | 2 | 3 | 2 | 
In the new simulation, there will be multiple rounds
in each of which all agents will meet each other once. In this manner,
the first round will play out just like in the second simulation with rational
bidding, but the development of knowledge will ensure greater gains in the
ensuing rounds.
The results from this simulation can be viewed
in figures 5 and 6. The figures display the scores achieved in each round
separately.

Fig. 5: Bidding agent scores in multiple rounds
with increasing knowledge

Fig. 6: Receiving agent
scores in multiple rounds with increasing knowledge
As expected, the first round in the knowledge-based simulation is exactly the same as in the rational simulation without
knowledge. After this first round, however, bidding agents have developed
knowledge about receiving agents and can adapt their strategy based on this
newfound knowledge. 
For the bidding agents with maximum offer constraints of 1 through 5, the new knowledge does not alter their gains in
any way. This can be explained as follows: all agents attempt to maximise their
expected value and will want to cast an initial bid of 5,
unless their maximum forbids this, in which case they will bid their maximum.
For receiving agents who are unable to accept any offer less than the bidding
agent’s constraining maximum, a deal can never be made despite of any knowledge
about each other’s strategy; the constraints simply don’t
match. For receiving agents whose minimum acceptable bid are equal to or lower
than the bidding agent’s maximum bid, the offer will always be acceptable.
Since the initial offer was accepted, the knowledge gained through this is that
the receiving agent’s mimimum accepted bid is at most the offered amount. This
changes nothing in the bidding agent’s strategy and its bids remain the same
during ensuing turns. His score per turn therefore also stays constant.
Bidding agents whose constraints are looser are able to profit from the
knowledge they develop, as can be seen in figure 5. These agents also initially
offer the optimal bid of 5, and find that it is
accepted by five agents and rejected by five others. As with
the tighter constrained agents, the knowledge that the
bid of 5 is accepted does not add to the expected value in ensuing
rounds. However, the knowledge that five agents reject a bid of 5 is very useful, since these bidding agents can cast bids
of more than 5. For example, the agent who is able to make bids of 6 now has the option of bidding 6 to the five agents who
previously rejected his offer. Though there is only one agent among these who
will accept a bid of 6, this does increase this
bidding agent’s points scored per round by 4. As more rounds are simulated,
more knowledge is gained and better bids can be cast by
agents with looser constraints. This causes an increase in points scored
by both bidding and receiving agents.
Building on the previous simulation, a new simulation was set up in
which agents were not only able to develop knowledge about each other, but are
also able to sell this knowledge. Agents will need to determine the value of knowledge
to make informed bids and to optimize their strategy.
The simulation consists of two rounds, in each of which all bidding agents interact with all receiving agents exactly
once. In between these rounds, a simulated auction is held.
In this auction, each bidding agent auctions the knowledge about each receiving
agent that was gathered in the first round. In this
manner, 100 pieces of knowledge are auctioned. After
the auction, the second round is run with the newly
bought knowledge to determine which strategy was most successful.
In order to make an informed bid, the agent needs to determine what the
auctioned knowledge is worth to it. This is determined by comparing the
expected value derived from a bid under the agent’s current knowledge with the
expected value of a bid when the auctioned knowledge is taken
into account. Of course, this assumes the agent will ‘forget’ about the
knowledge when his bid does not win the auction. This assumption might be a bit
far-fetched, but might seem more natural when the auction represents the sale
of a patent. If the patent is bought, the buyer
receives the right to make use of the patented knowledge.
To simplify things, an auction type is used in
which agents will want to bid their actual value. An auction type
in which this is the case, is the Vickrey auction, or second-price sealed-bid
auction (Vickrey, 1961). In this auction, all bidders submit bids without
knowledge of each other’s bids. The highest bidder wins, but pays the amount
bid by the second-highest bidder. This mechanism, though somewhat awkward to
the unexperienced bidder, ensures that a rational bidder will bid exactly his
or her actual value (Vickrey, 1961; Ausubel & Milgrom, 2006). 
To determine their bid in an auction of a piece of knowledge, the
bidding agent calculates the utility of that piece of knowledge to itself. The
utility is determined by comparing the expected value of a bid prior to having
the new knowledge with the expected value after acquiring the knowledge. Since
there is only one round after the auction, the utility of the knowledge is
equal to the increase in expected value. This increase value is the bid that is cast by the agent in the auction.
In initial testing of simulation runs, agents were
programmed to bid rationally during the first round. However, as can be
seen in the results of the second simulation run, this causes widespread
uniform bidding and therefore agents will develop largely the same knowledge
base. Therefore largely the same knowledge will be on
auction and bids will also be largely the same, as agents generally have the
same prior knowledge. To counteract this, agents were
programmed to behave as they did in the first simulation, bidding their
maximum bid during the first round. In the second round (after the auction),
agents returned to a rational bidding strategy while taking their constraint
and knowledge into account.
The results in the first round, with fixed (and possibly sub-optimal)
bidding yields the same results as the first simulation, as displayed in figure
1. After this initial round, each bidder has different knowledge and all
knowledge is auctioned. Figure 7 shows both spenditure
by each bidding agent during the auction round, as well as scores in the first
fixed-bid and second optimal-bid/knowledge-driven round.

Fig. 7: Bidding agent
spenditure and scores in a simulation including a knowledge auction
Also of interest is the net increase in score that was
achieved through the auction. This is displayed
in figure 8. 

Fig. 8: Agent profit from auction
Distributed knowledge
By letting bidding agents submit their maximal offer, each receiving
agent receives ten different offers, ranging from 1 to
10. Depending on the receiving agent’s constraints, part of these offers will be rejected and part accepted. Therefore, after the
first round, knowledge about each possible bid and its rejection or acceptance
by a certain agent is present in one of the bidding agents. Schematically:
 (1)
 (1)
Where x is a bidding agent, y is a receiving agent and a is the bid
amount (between 1 and 10). 
Therefore, following from (1), after round one the following is also
true:
 (2)
 (2)
Agent y’s
constraint is exactly known by agent x
when:
 (3)
 (3)
It follows from (2) that the knowledge to achieve the demands set by (3)
are known, but not necessarily by a single agent.
Therefore, all receiving agents’ constraints are not
necessarily known, but the distributed knowledge of their constraints
does exist (Van der Hoek
& Meyer, 2004). If the agent x
from (2) buys the knowledge from agent y
in (2) or vice versa, it is no longer only distributed knowledge, but also
actual knowledge of the buying agent.
This analysis shows that, after the first round, the distributed
knowledge present in the entire system is complete. Thus, through the auction,
an agent is able to acquire any knowledge that is of use to itself, as long as
the utility of the knowledge warrants a bid that is high enough.
Auction
As can be seen in figure 7, agents 1 through 4 are unable to buy any
piece of knowledge. This can be explained as follows:
for all agents, knowledge about agents that have a higher minimum acceptable
bid than the bidding agent’s own maximum bid is useless; any legal bid to these
agents is bound to fail. Agent 1 always bids the optimum bid of 1, regardless of knowledge. Agents 2 through 4 are
interested in knowledge about respectively receiving agents 1 through 3, but
their increase in expected value from this knowledge is not enough to win the
auction with a successful bid. On the other hand, agents 5 through 10 are
successful in various auction rounds. Agent 6 is able to profit most from
bidding on knowledge in terms of expected value and therefore has the highest
spenditure. Agent 10, though it has the highest flexibility in possible bids,
buys some knowledge but far less than the other agents.
Second round
Second round scores can be compared to first
round scores in figure 7. The possible optimal score is included to be able to
compare the actual increase in value gained through the auction, since the
bidding strategies vary between the simulated rounds. Comparing
the increase in score to the spenditure in the auction, it can be noted that agents with a high spenditure also have
a high increase in score. Agent 6 spent the most points in auction but is also awarded with the highest score in the second round.
Agents that did not buy any knowledge also have the same score in the second
round.
Figure 8 shows the actual profit from the auction per agent. Since the
first four agents did not win any bids, their profit is zero. The other agents
all profited from bidding: the difference between their optimal bids in round
one and round two, less the amount of points they paid to buy the extra
knowledge, was positive. The only exception to this is agent 8, who apparently
paid more for the knowledge than its increase in score. Even though agent 6 had
the highest score in round 2, its profit from the
auction was not highest. This honor goes to agent 9, who benifited most from
the auction round.
For all agents, the increase of knowledge also increased their reasoning
power and achieved scores. Profits did not rise for all agents, but the single
negative result was small. In the Ultimatum Game, therefore, knowledge is an
important factor in optimizing behaviour, even when it comes at a cost. As was
concluded earlier, less stringent agent constraints are also more benificial to
the agent. This is again the case here, since the agent whose maximum bid is 9 profited most from the auction round. However, the result
is not as clear-cut as in previous simualtions, where the least stringent
constraints also yielded the highest score. In this case, both high scores and
high profits are somewhat divided over agents 5 through 10, with agents 1
through 4 again as the clear losers. 
The goal of this paper was to investigate rational agent behavior under
various adaptations of the Ultimatum Game. In the course of the research and
simulation runs, some ideas were developed with regard
to rational behavior and how knowledge-factors intertwine in this. 
Most importantly, behavior was found to be optimal
when it was based on as much knowledge as possible. Knowledge, in this case, is
knowledge about constraints employed by other agents. Optimally, agents acquire
as much knowledge as possible so that they can base their decisions on a more
complete knowledge-based representation of other agents. In a simulation in
which knowledge was retained, this led to
better-performing agents after they had acquired more knowledge. In a simulation which was extended with an auction system, agents
improved their strategy by buying knowledge from each other and thereby
increasing their own expected value.
Also, simulations revealed insights into optimal
strategy with regard to imposed constraints. In most simulations (the auction
simulation being the only exception), more stringent constraints also led to
poorer performance. When agents were left with more
options to determine their bid or acceptance of a bid, this generally led to
higher performance. This corresponds to the conclusions drawn by Andreoni &
Blanchard (2006), who state that it is suboptimal to decline any offer higher
than 0. When an offer is declined, both agents score
zero points but could have scores points if the offer had been accepted. The
agents in our simulations are therefore rational agents that do not share the
suboptimal strategy that is often employed by human
players. Humans often strive for 50-50 deals and generally reject offers of
less than 20%, a suboptimal strategy (Heinrich et al., 2004; Oosterbeek et al.,
2004).
Andreoni, J. & Blanchard, E. (2006), Testing subgame perfection
apart from fairness in ultimatum games, Experimental Economics. 9 (4), 307-321.
Ausbel, L.M. & Milgrom, P. (2006), The
Lovely but Lonely Vickrey Auction. In Cramton, P., Yoav S.,
& Steinberg, R. (eds.), Combinatorial Auctions. MIT Press.
Guth, W., Schmittberger, R., & Schwarze, B. (1982), An experimental analysis of ultimatum bargaining, Journal of Economic Behavior &
Organization 3 (4), 367-388. 
Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., & Gintis, H.
(2004), Foundations of Human Sociality: Economic Experiments and
Ethnographic Evidence from Fifteen Small-Scale Societies. 
Van der Hoek, W. & Meyer, J.-J. 
Oosterbeek, H., Sloof, R. & Van de Kuilen, G. (2004), Differences in
Ultimatum Game Experiments: Evidence from a Meta-Analysis. Experimental
Economics 7, 171-188.
Vickrey, W. (1961), Counterspeculation, Auctions, and
Competitive Sealed Tenders. Journal of Finance 16, 8-37.