Basic Usage

Basic Usage#

Environment objects, including agents, entities and rules, that are specified in a yaml-configfile will be loaded automatically. Using quickstart_use creates a default config-file and another one that lists all possible options of the environment. Also, it generates an initial script where an agent is executed in the environment specified by the config-file.

After initializing the environment using the specified configuration file, the script enters a reinforcement learning loop. The loop consists of episodes, where each episode involves resetting the environment, executing actions, and receiving feedback.

Here’s a breakdown of the key components in the provided script. Feel free to customize it based on your specific requirements:

  1. Initialization:

>>> path = Path('marl_factory_grid/configs/default_config.yaml')
    factory = Factory(path)
    factory = EnvMonitor(factory)
    factory = EnvRecorder(factory)
  • The path variable points to the location of your configuration file. Ensure it corresponds to the correct path.

  • Factory initializes the environment based on the provided configuration.

  • EnvMonitor and EnvRecorder are optional components. They add monitoring and recording functionalities to the environment, respectively.

  1. Reinforcement Learning Loop:

>>> for episode in trange(10):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
        action_spaces = factory.action_space
        agents = []
  • The loop iterates over a specified number of episodes (in this case, 10).

  • factory.reset() resets the environment for a new episode.

  • factory.render() is used for visualization if rendering is enabled.

  • action_spaces stores the action spaces available for the agents.

  • agents will store agent-specific information during the episode.

  1. Taking Actions:

>>> while not done:
        a = [randint(0, x.n - 1) for x in action_spaces]
        obs_type, _, reward, done, info = factory.step(a)
        if render:
            factory.render()
  • Within each episode, the loop continues until the environment signals completion (done).

  • a represents a list of random actions for each agent based on their action space.

  • factory.step(a) executes the actions, returning observation types, rewards, completion status, and additional information.

  1. Handling Episode Completion:

>>> if done:
        print(f'Episode {episode} done...')
  • After each episode, a message is printed indicating its completion.

Evaluating the run#

If monitoring and recording are enabled, the environment states will be traced and recorded automatically. The EnvMonitor class acts as a wrapper for Gym environments, monitoring and logging key information during interactions, while the EnvRecorder class records state summaries during interactions in the environment. At the end of each run a plot displaying the step reward is generated. The step reward represents the cumulative sum of rewards obtained by all agents throughout the episode. Furthermore a comparative plot that shows the achieved score (step reward) over several runs with different seeds or different parameter settings can be generated using the methods provided in plotting/plot_compare_runs.py. For a more comprehensive evaluation, we recommend using the Weights and Biases (W&B) framework, with the dataframes generated by the monitor and recorder. These can be found in the run path specified in your script. W&B provides a powerful API for logging and visualizing model training metrics, enabling analysis using predefined or also custom metrics.

Indices and tables#