Context
I'm trying to build a social-consensus simulation involving two intelligent agents. The simulation takes place on a graph/network of nodes. Nearly all of these nodes (> 90%) will hold green agents. The remaining nodes will consist of one red agent, one blue agent, and some number of grey agents. The red and blue agents will be the only intelligent agents in the simulation, and their relationship is adversarial.
I'm representing the node data structure with the following `Node` class:
```python
class Node():
    def __init__(self, id, team):
        self.id = id
        self.team = team
        self.agent = None
        self.edges = []
        if self.team == "green":
            self.agent = agents.GreenAgent()
        elif self.team == "blue":
            self.agent = agents.BlueAgent()
        elif self.team == "red":
            self.agent = agents.RedAgent()
```
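The `Node.create_edge` and `Node.remove_edge` methods used further down aren't shown above. As a minimal sketch of what I mean by them (assuming an undirected edge is simply recorded symmetrically in both nodes' `edges` lists; my actual code may differ in the details):

```python
# Sketch only: an undirected edge is stored by appending each node to the
# other's `edges` list; removal deletes both entries.
def create_edge(self, other_node):
    if other_node is not self and other_node not in self.edges:
        self.edges.append(other_node)
        other_node.edges.append(self)

def remove_edge(self, other_node):
    if other_node in self.edges:
        self.edges.remove(other_node)
    if self in other_node.edges:
        other_node.edges.remove(self)
```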
The graph/network is initially instantiated with only green agents. At instantiation, edges are added between pairs of nodes with some fixed probability. These edges represent the connections between green agents and allow them to interact. The edges are undirected, and there are no self-loops (nodes do not have edges to themselves).
I'm representing the graph/network data structure with the following `Network` class:
```python
import random
from itertools import combinations

class Network():
    def __init__(self, number_of_green_nodes, probability_of_an_edge):
        self.number_of_green_nodes = number_of_green_nodes
        self.number_of_nodes_overall = number_of_green_nodes
        self.probability_of_an_edge = probability_of_an_edge
        self.green_network = [Node(id=i, team="green")
                              for i in range(1, number_of_green_nodes + 1)]
        self.total_number_of_grey_nodes_inserted = 0
        # Add an undirected edge between each pair of green nodes with fixed probability.
        all_pairs_of_nodes = combinations(self.green_network, 2)
        for node1, node2 in all_pairs_of_nodes:
            if random.random() < self.probability_of_an_edge:
                node1.create_edge(node2)
```
The green agents represent ordinary people. They have two attributes: `voting`, which can be true or false, and `uncertainty`, a number representing how uncertain the individual is about whether they will vote (say, -1 to +1, or 0 to 1). The green agents are not intelligent; in each round of the simulation, when it is their turn, every pair of green agents connected by an edge interacts, so there is no selectivity. Green agents never initiate interaction with the red, blue, or grey agents. When the green agents interact with each other, their opinion on whether or not to vote, and their uncertainty, change based on some calculation.
My green agent class is represented partially as follows:
```python
import random
import scipy.stats

class GreenAgent():
    def __init__(self):
        # Initial opinion is a coin flip; initial uncertainty is a standard
        # normal truncated to [0, 1].
        self.voting = random.random() < 0.5
        self.uncertainty = scipy.stats.truncnorm.rvs(0, 1)

    def interact_with_green_agent(self, other_agent_opinion, other_agent_uncertainty):
        if other_agent_opinion != self.voting:
            # Differing opinions: a more certain neighbour flips this agent.
            if other_agent_uncertainty < self.uncertainty:
                self.voting = not self.voting
                self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (1 - self.uncertainty)
        else:
            # Same opinion: a more certain neighbour reinforces it.
            if other_agent_uncertainty < self.uncertainty:
                self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (self.uncertainty - other_agent_uncertainty)
```
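For context, the green agents' turn is just a loop over the network; roughly something like this (a sketch of my driver code, not the exact implementation, assuming the `Network`/`Node` structures above):

```python
# Sketch of the green agents' turn: every connected pair of green nodes
# interacts. Each undirected edge is visited twice (once from each endpoint),
# so both agents get updated. Greens only ever initiate with other greens.
def green_turn(network):
    for node in network.green_network:
        for neighbour in node.edges:
            if neighbour.team == "green":
                node.agent.interact_with_green_agent(
                    neighbour.agent.voting, neighbour.agent.uncertainty)
```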
The red agent and blue agent are added to the network after it has been initialised with the green agents and their edges. The red and blue agents do not have an edge to each other (they never interact with each other), but each of them has an edge to every single green node (unlike the green-green edges, these are not created probabilistically), so the red and blue agents are initially able to interact with all green agents.
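I don't think the exact insertion code matters for the question, but for completeness, a rough sketch of that step as a hypothetical method on my `Network` class (names and details are illustrative, not my exact code):

```python
# Illustrative sketch only. The red and blue nodes each get an edge to every
# green node, but no edge to each other.
def add_red_and_blue_nodes(self):
    self.red_node = Node(id=self.number_of_nodes_overall + 1, team="red")
    self.blue_node = Node(id=self.number_of_nodes_overall + 2, team="blue")
    self.number_of_nodes_overall += 2
    for green_node in self.green_network:
        self.red_node.create_edge(green_node)
        self.blue_node.create_edge(green_node)
```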
The red agent and blue agent are adversaries. The red agent's goal is to convince a majority of the green agents not to vote, and to have those green agents be more certain of not voting than uncertain (based on the green agents' `uncertainty` attribute). The blue agent's goal is the opposite: to convince a majority of the green agents to vote, and to have those green agents be more certain of voting than uncertain (again, based on the `uncertainty` attribute). The red and blue agents cannot have their "opinion" about voting changed, so the red agent's `voting` attribute is always false, and the blue agent's `voting` attribute is always true.
When it is the red agent's turn in the simulation, the red agent can, initially, interact with all of the green agents. As with green-green interaction, the red agent interacts with every green agent it still has an edge to; there is no selectivity. When it interacts with the green agents, it selects one of 5 levels of propaganda, with each level becoming increasingly radical/extreme. The trade-off is that the more radical/extreme the propaganda, the higher the red agent's `uncertainty` level becomes when disseminating it, and the more likely it is to permanently alienate green agents (which I represent as the removal of the edge between the red agent and that green agent).
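So a red turn looks roughly like this (sketch only; it assumes the `red_node` reference from the insertion sketch above, and is not my exact driver code):

```python
# Sketch of the red agent's turn: choose a potency, then broadcast to every
# green node still connected to the red node.
def red_turn(network, potency):
    red_agent = network.red_node.agent
    red_agent.messaging(potency)
    # Copy the edge list because interact_with_red_agent may remove edges.
    for green_node in list(network.red_node.edges):
        green_node.agent.interact_with_red_agent(
            red_agent.uncertainty, green_node, network.red_node)
```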
My red agent class is represented as follows:
```python
class RedAgent():
    def __init__(self):
        self.voting = False
        self.uncertainty = None

    def messaging(self, potency):
        if potency == 1:
            self.uncertainty = random.uniform(0.0, 0.2)
        elif potency == 2:
            self.uncertainty = random.uniform(0.2, 0.4)
        elif potency == 3:
            self.uncertainty = random.uniform(0.4, 0.6)
        elif potency == 4:
            self.uncertainty = random.uniform(0.6, 0.8)
        elif potency == 5:
            self.uncertainty = random.uniform(0.8, 1.0)
```
And my part of the green agent class that deals with the red agent is as follows:
```python
def interact_with_red_agent(self, other_agent_uncertainty, this_green_agent_node, red_agent_node):
    if other_agent_uncertainty < 0.2:
        if self.voting == False and other_agent_uncertainty < self.uncertainty:
            self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        elif self.voting == True and other_agent_uncertainty < self.uncertainty:
            self.voting = False
            self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        if random.random() < other_agent_uncertainty / 2:
            this_green_agent_node.remove_edge(red_agent_node)
    elif 0.2 < other_agent_uncertainty < 0.4:
        if self.voting == False and other_agent_uncertainty < self.uncertainty:
            self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        elif self.voting == True and other_agent_uncertainty < self.uncertainty:
            self.voting = False
            self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        if random.random() < other_agent_uncertainty / 2:
            this_green_agent_node.remove_edge(red_agent_node)
    elif 0.4 < other_agent_uncertainty < 0.6:
        if self.voting == False and other_agent_uncertainty < self.uncertainty:
            self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        elif self.voting == True and other_agent_uncertainty < self.uncertainty:
            self.voting = False
            self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        if random.random() < other_agent_uncertainty / 2:
            this_green_agent_node.remove_edge(red_agent_node)
    elif 0.6 < other_agent_uncertainty < 0.8:
        if self.voting == False and other_agent_uncertainty < self.uncertainty:
            self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        elif self.voting == True and other_agent_uncertainty < self.uncertainty:
            self.voting = False
            self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        if random.random() < other_agent_uncertainty / 2:
            this_green_agent_node.remove_edge(red_agent_node)
    elif 0.8 < other_agent_uncertainty < 1:
        if self.voting == False and other_agent_uncertainty < self.uncertainty:
            self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        elif self.voting == True and other_agent_uncertainty < self.uncertainty:
            self.voting = False
            self.uncertainty = self.uncertainty + (1 - other_agent_uncertainty / self.uncertainty) * (
                    self.uncertainty - other_agent_uncertainty)
        if random.random() < other_agent_uncertainty / 2:
            this_green_agent_node.remove_edge(red_agent_node)
```
When it is the blue agent's turn in the simulation, the blue agent can interact with all of the green agents. Again, it interacts with every green agent it has an edge to, so there is no selectivity. Like the red agent, the blue agent selects from 5 levels of message potency. The trade-off, however, is that the blue agent has an "energy level": the more potent the message it chooses during its turn, the more likely it is to lose energy, and if the blue agent loses all of its energy, it loses the simulation/game.
Furthermore, another option the blue agent has during its turn is to insert a grey agent into the network. Based on some probability, the grey agent can either be an ally of the blue agent, helping to convince the green agents to vote, or it can work against the blue agent, acting for the red agent by further radicalising the green agents and convincing them not to vote. Depending on which side it is on, the grey agent's interaction with the green agents mimics either the blue agent's or the red agent's interaction abilities. The difference is that the grey agent does not suffer any of the consequences that the red/blue agents do: if the grey agent is an ally of the blue agent, it can make the same move as the blue agent without the blue agent losing any energy, and if it ends up working for the red agent, it can make the same move as the red agent without the red agent risking the alienation of any green agents. Choosing to insert a grey agent, instead of disseminating messaging, uses up the blue agent's turn, regardless of whether the grey agent turns out to work for or against it.
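A rough sketch of how that option plays out (illustrative only; the function name, how the potency is chosen, and some details differ in my actual code):

```python
# Illustrative sketch of the "insert a grey agent" option.
def blue_inserts_grey_agent(network, potency):
    grey_node = Node(id=network.number_of_nodes_overall + 1, team="grey")
    grey_node.agent = agents.GreyAgent()  # Node() only auto-creates green/red/blue agents
    network.number_of_nodes_overall += 1
    network.total_number_of_grey_nodes_inserted += 1
    for green_node in network.green_network:
        grey_node.create_edge(green_node)

    grey_agent = grey_node.agent
    if grey_agent.spy:
        # Works for red: mimics red messaging; the red agent itself risks no
        # alienation (any removed edge would be the grey node's own edge).
        grey_agent.misinformation(potency)
        for green_node in network.green_network:
            green_node.agent.interact_with_red_agent(
                grey_agent.uncertainty, green_node, grey_node)
    else:
        # Ally of blue: mimics blue messaging, but blue loses no energy.
        grey_agent.lifeline(potency)
        for green_node in network.green_network:
            green_node.agent.interact_with_blue_agent(grey_agent.uncertainty)
```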
My blue agent class is as follows:
```python
class BlueAgent():
    def __init__(self):
        self.voting = True
        self.uncertainty = None
        self.energy_level = 10

    def messaging(self, potency):
        # Higher potency => lower uncertainty, but a higher chance of losing
        # one unit of energy.
        if potency == 1:
            self.uncertainty = random.uniform(0.8, 1.0)
            if random.random() > self.uncertainty:
                self.energy_level -= 1
        elif potency == 2:
            self.uncertainty = random.uniform(0.6, 0.8)
            if random.random() > self.uncertainty:
                self.energy_level -= 1
        elif potency == 3:
            self.uncertainty = random.uniform(0.4, 0.6)
            if random.random() > self.uncertainty:
                self.energy_level -= 1
        elif potency == 4:
            self.uncertainty = random.uniform(0.2, 0.4)
            if random.random() > self.uncertainty:
                self.energy_level -= 1
        elif potency == 5:
            self.uncertainty = random.uniform(0.0, 0.2)
            if random.random() > self.uncertainty:
                self.energy_level -= 1
```
And my part of the green agent class that deals with the blue agent is as follows:
```python
def interact_with_blue_agent(self, other_agent_uncertainty):
    if other_agent_uncertainty < self.uncertainty:
        if self.voting != True:
            self.voting = True
        self.uncertainty = self.uncertainty - (1 - other_agent_uncertainty / self.uncertainty) * (
                self.uncertainty - other_agent_uncertainty)
```
My grey agent class is as follows:
```python
class GreyAgent():
    def __init__(self):
        # spy == True: the grey agent secretly works for the red side.
        self.spy = random.random() < 0.5
        self.uncertainty = None

    def lifeline(self, potency):
        # Blue-style messaging: same potency -> uncertainty mapping as BlueAgent.
        if potency == 1:
            self.uncertainty = random.uniform(0.8, 1.0)
        elif potency == 2:
            self.uncertainty = random.uniform(0.6, 0.8)
        elif potency == 3:
            self.uncertainty = random.uniform(0.4, 0.6)
        elif potency == 4:
            self.uncertainty = random.uniform(0.2, 0.4)
        elif potency == 5:
            self.uncertainty = random.uniform(0.0, 0.2)

    def misinformation(self, potency):
        # Red-style messaging: same potency -> uncertainty mapping as RedAgent.
        if potency == 1:
            self.uncertainty = random.uniform(0.0, 0.2)
        elif potency == 2:
            self.uncertainty = random.uniform(0.2, 0.4)
        elif potency == 3:
            self.uncertainty = random.uniform(0.4, 0.6)
        elif potency == 4:
            self.uncertainty = random.uniform(0.6, 0.8)
        elif potency == 5:
            self.uncertainty = random.uniform(0.8, 1.0)
```
The red and blue agents have access to the number of green agents that are voting / not voting, but they do not have access to the green agents' uncertainty levels. Furthermore, the red and blue agents have access to what action the other (red/blue) agent took during their turn. And both agents also have access to the action the grey agent took.
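To make that concrete, the information visible to the red/blue agents at the start of a turn could be packaged roughly like this (just a sketch of the observable quantities listed above; the names are mine, not from my actual code):

```python
from dataclasses import dataclass

# Sketch of what a red/blue agent can observe each turn. They see vote counts
# and the most recent actions of the opponent and any grey agent, but never
# the green agents' uncertainty values.
@dataclass
class Observation:
    number_voting: int          # green agents currently intending to vote
    number_not_voting: int      # green agents currently not intending to vote
    opponent_last_action: int   # e.g. potency 1-5, or a marker for "inserted a grey agent"
    grey_last_action: int       # potency used by the most recently acting grey agent, if any
```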
Question
I am now trying to make the red agent and blue agent intelligent. My thought was to use reinforcement learning, whereby each agent learns from previous rounds of the simulation. In particular, I'm currently looking at applying Q-Learning. The problem is that I'm not familiar with reinforcement learning, and I'm trying to learn it by reading tutorials and going through the textbook Reinforcement Learning: An Introduction, second edition, by Sutton and Barto.
So my idea is to apply Q-Learning to train the red agent and the blue agent individually. My understanding is that, in order to do this, I need to define the "environment", which consists of (1) the "state space", (2) the "actions", and (3) the "rewards". The problem is that, despite studying a ton of tutorials, I still can't figure out what my "state space" is supposed to be here (for the red agent and the blue agent individually), nor can I figure out what its Python implementation would be. What is the "state space" here, and what would its Python implementation be?
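My best (possibly wrong) guess so far is that the state would have to be built only from the quantities each agent can actually observe, discretised so it can index a Q-table, something like the sketch below (using the `Observation` sketch above; names are mine), but I'm not confident this is what the state space is supposed to be:

```python
# Tentative, possibly wrong, guess at a state for Q-learning: a small
# discretised tuple built only from what the blue agent can observe, used as
# a dictionary key into a Q-table.
def blue_state(observation, blue_agent):
    # Bucket the share of voting greens into tenths to keep the state space small.
    total = observation.number_voting + observation.number_not_voting
    voting_share_bucket = int(10 * observation.number_voting / total)
    return (voting_share_bucket,
            blue_agent.energy_level,
            observation.opponent_last_action)

# Q-table keyed by (state, action); blue's actions would be potencies 1-5
# plus "insert a grey agent" (encoded here as 6).
q_table = {}   # e.g. q_table[(state, action)] = estimated value
```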