Basic Usage
Environment objects, including agents, entities and rules, that are specified in a YAML config file are loaded automatically.
Running quickstart_use creates a default config file and a second config file that lists all available options of the environment.
It also generates an initial script in which an agent is executed in the environment specified by the config file.
After initializing the environment from the specified configuration file, the script enters a reinforcement learning loop. The loop runs over episodes: in each episode the environment is reset, actions are executed, and feedback is received from the environment.
Here’s a breakdown of the key components of the generated script. Feel free to customize it to your specific requirements:
Initialization:
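>>> from pathlib import Path
>>> # Factory, EnvMonitor and EnvRecorder are imported from marl_factory_grid,
>>> # as in the generated quickstart script.
>>> path = Path('marl_factory_grid/configs/default_config.yaml')
>>> factory = Factory(path)
>>> factory = EnvMonitor(factory)
>>> factory = EnvRecorder(factory)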
path points to the location of your configuration file; make sure it matches where the file actually is.
Factory initializes the environment from the provided configuration.
EnvMonitor and EnvRecorder are optional wrappers that add monitoring and recording functionality to the environment, respectively.
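The wrapping pattern can be illustrated with a minimal, self-contained sketch. StubEnv and RewardMonitor are hypothetical stand-ins, not the actual Factory or EnvMonitor classes, and the stub's step() returns a simplified 4-tuple. The sketch only shows how a wrapper can forward reset() and step() to the wrapped environment while logging rewards along the way, and delegate every other attribute to it.

```python
class StubEnv:
    """Hypothetical stand-in for a Factory: each episode ends after max_steps steps."""

    def __init__(self, n_agents=2, max_steps=5):
        self.n_agents = n_agents
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return None

    def step(self, actions):
        self.t += 1
        reward = float(len(actions))  # one reward unit per agent (illustrative only)
        done = self.t >= self.max_steps
        return None, reward, done, {}  # simplified: obs, reward, done, info


class RewardMonitor:
    """Hypothetical wrapper: forwards calls and records per-step rewards."""

    def __init__(self, env):
        self.env = env
        self.step_rewards = []  # one list of step rewards per episode

    def reset(self):
        self.step_rewards.append([])  # start a new episode log
        return self.env.reset()

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions)
        self.step_rewards[-1].append(reward)
        return obs, reward, done, info

    def __getattr__(self, name):
        # Only called for attributes the wrapper lacks; delegate them to the wrapped env.
        return getattr(self.env, name)


env = RewardMonitor(StubEnv())
for _ in range(3):
    env.reset()
    done = False
    while not done:
        _, _, done, _ = env.step([0, 0])
print([sum(r) for r in env.step_rewards])  # [10.0, 10.0, 10.0]
```

Because each wrapper exposes the same reset()/step() interface as the object it wraps, wrappers can be stacked, which is how the script applies EnvRecorder on top of EnvMonitor.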
Reinforcement Learning Loop:
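>>> from random import randint
>>> from tqdm import trange
>>> render = True  # set to False to disable visualization
>>> for episode in trange(10):
...     _ = factory.reset()
...     done = False
...     if render:
...         factory.render()
...     action_spaces = factory.action_space
...     agents = []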
The outer loop iterates over a fixed number of episodes (here, 10); trange is tqdm's range with a progress bar.
factory.reset() resets the environment at the start of each episode.
factory.render() visualizes the environment if rendering is enabled.
action_spaces holds the action space of each agent.
agents can be used to store agent-specific information during the episode.
Taking Actions:
>>> while not done:
...     # One random action per agent, drawn from that agent's action space
...     a = [randint(0, x.n - 1) for x in action_spaces]
...     obs_type, _, reward, done, info = factory.step(a)
...     if render:
...         factory.render()
Within each episode, the inner loop runs until the environment signals completion (done).
a is a list containing one random action per agent, sampled from that agent's action space.
factory.step(a) executes the actions and returns the observation type, the reward, the completion flag (done) and additional information (info).
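Random sampling per agent can be tried without the environment. This sketch assumes, as the expression x.n - 1 suggests, that each action space is discrete with n actions and that randint is Python's random.randint, whose upper bound is inclusive. Discrete is a hypothetical stand-in for the real action-space class.

```python
from dataclasses import dataclass
from random import randint, seed


@dataclass
class Discrete:
    """Hypothetical stand-in for a discrete action space with n actions."""
    n: int


def random_actions(action_spaces):
    # random.randint(a, b) includes both ends, so n - 1 is the largest valid action.
    return [randint(0, space.n - 1) for space in action_spaces]


seed(0)
action_spaces = [Discrete(4), Discrete(6)]
for _ in range(3):
    print(random_actions(action_spaces))  # one action per agent, e.g. two integers
```

If numpy.random.randint were used instead, its upper bound is exclusive, so the equivalent call would be randint(0, x.n); keeping x.n - 1 with NumPy would never select the last action.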
Handling Episode Completion:
>>> if done:
...     print(f'Episode {episode} done...')
When an episode ends, a message indicating its completion is printed.
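Put together, the fragments above form one nested loop: the while loop runs inside the for loop, and the completion check sits at the end of each step. The sketch below runs that structure against a hypothetical StubFactory so it executes without the package installed. The real script constructs Factory from the config file instead, and the five-element tuple returned by step() mirrors the snippet above with placeholder values.

```python
from random import randint, seed
from types import SimpleNamespace


class StubFactory:
    """Hypothetical stand-in for Factory; each episode lasts `horizon` steps."""

    def __init__(self, n_agents=2, n_actions=3, horizon=4):
        self.action_space = [SimpleNamespace(n=n_actions) for _ in range(n_agents)]
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return None

    def render(self):
        pass  # the real environment would draw the current state here

    def step(self, actions):
        self.t += 1
        done = self.t >= self.horizon
        # Same tuple shape as in the script; all values are placeholders.
        return None, None, 0.0, done, {}


seed(0)
render = False
factory = StubFactory()
finished = []
for episode in range(3):  # the generated script uses tqdm's trange(10)
    _ = factory.reset()
    done = False
    if render:
        factory.render()
    action_spaces = factory.action_space
    while not done:
        a = [randint(0, x.n - 1) for x in action_spaces]
        obs_type, _, reward, done, info = factory.step(a)
        if render:
            factory.render()
        if done:
            print(f'Episode {episode} done...')
            finished.append(episode)
```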
Evaluating the run
If monitoring and recording are enabled, environment states are traced and recorded automatically. The EnvMonitor class is a wrapper for Gym environments that monitors and logs key information during interactions, while the EnvRecorder class records state summaries during interactions with the environment. At the end of each run, a plot of the step reward is generated; the step reward is the cumulative sum of the rewards obtained by all agents over the episode. Furthermore, a comparative plot of the achieved score (step reward) across several runs with different seeds or parameter settings can be generated with the methods provided in plotting/plot_compare_runs.py.

For a more comprehensive evaluation, we recommend using the Weights & Biases (W&B) framework with the dataframes generated by the monitor and recorder, which are stored in the run path specified in your script. W&B provides an API for logging and visualizing training metrics, so runs can be analyzed with predefined or custom metrics.
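The step reward and the cross-run comparison reduce to two small computations: a cumulative sum over steps of the rewards of all agents, and an aggregate of the final score across seeds. The sketch below does both on plain Python lists. The data layout (one list of per-agent rewards per step) and all numbers are hypothetical; it does not reproduce the dataframes written by EnvMonitor or the functions in plotting/plot_compare_runs.py.

```python
from itertools import accumulate
from statistics import mean, stdev


def step_reward_curve(episode_rewards):
    """Cumulative sum, over steps, of the rewards of all agents.

    episode_rewards: list of steps, each a list of per-agent rewards (hypothetical layout).
    """
    per_step_total = [sum(agent_rewards) for agent_rewards in episode_rewards]
    return list(accumulate(per_step_total))


def compare_runs(runs):
    """Mean and sample standard deviation of the final score for each setting."""
    summary = {}
    for setting, episodes in runs.items():
        finals = [step_reward_curve(ep)[-1] for ep in episodes]
        summary[setting] = (mean(finals), stdev(finals))
    return summary


# Hypothetical data: two settings, two seeds each, two agents, three steps.
runs = {
    "setting_a": [[[0.0, 1.0], [1.0, 1.0], [0.0, 0.5]],
                  [[1.0, 0.0], [0.5, 0.5], [1.0, 0.0]]],
    "setting_b": [[[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                  [[0.5, 0.0], [0.0, 0.0], [0.5, 0.0]]],
}
print(step_reward_curve(runs["setting_a"][0]))  # [1.0, 3.0, 3.5]
print(compare_runs(runs))
```

The per-setting (mean, standard deviation) pairs are the kind of summary a comparative plot over seeds would display, or that could be logged as custom metrics in W&B.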