==============================
ML-Agents
==============================

Summary
=========

| ``ML-Agents`` a.k.a. the `Unity Machine Learning Agents Toolkit `_ is an open-source project that enables games and simulations to serve as environments for training intelligent agents.
| Since ML-Agents requires a good understanding of the Unity environment, the classic `Roll-a-Ball `_ tutorial is a great first hands-on exercise.
| This document is intended as a guide to make any Unity scene *smarter* (read: "add ML-Agents to it") and to train your agent in a basic setup.

As a running example, we will use the scenario of a robotic arm that has to learn how to touch a target object while avoiding smashing against a table or knocking the object away. This scenario is strongly inspired by the `Articulation Robot Demo `_. The linked example is based on ML-Agents Release 1. However, to remain relevant as long as possible, the following guide is based on Release 18 (June 10, 2021).

.. figure:: img/ml_agent_scene.*
   :align: center
   :alt: Articulation Robot Training Scene
   :width: 50%

   Articulation Robot Training Scene

Set up the environment
======================

General case
------------

| Please follow the instructions `here `__ to install ML-Agents on your system.
| The best way to *get started* is to go through the `getting started `_ page. It offers an overview of ML-Agents using the 3DBall example.

To test your installation:

* In an empty project (with the right ML-Agents package installed): *drag & drop* the folder ``\ml-agents\Project\Assets\ML-Agents`` into your ``Assets`` folder (``ml-agents`` is the folder of the Unity repository you cloned; be sure to check out the branch of your release to get compatible examples)
* In the Unity editor, navigate to ``\Assets\ML-Agents\Examples\3DBall\Scenes`` and open the scene ``3DBall``
* Run the scene in ``Default`` or ``Inference Only`` mode. You can change this setting in the ``Behavior Type`` drop-down menu of the ``Behavior Parameters`` component of your agent (see figure below). If this works, you will see the behavior of the already trained agent.
* Retrain an example. Launch the training from the console (see the detailed instructions `here `__), then, in ``Default`` mode, click the play button. When the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and assign it to the ``Model`` field of the Agent's ``Behavior Parameters`` component.

.. figure:: img/ml_agent_behavior_param.*
   :align: center
   :alt: Behavior Parameters component
   :width: 100%

   Behavior Parameters component

If everything works, congratulations: half of the work is done. The next step is to actually use ML-Agents in your own project.

.. warning::
   Always check that all the packages you are using are the correct version for your `release `_

.. warning::
   We had a great deal of problems working with Visual Studio Code. An alternative that worked well for us is `JetBrains' Rider `_. If this is your choice, follow the easy steps `here `_. Whichever IDE you choose, similar steps are usually required.

Fixing potential problems
-------------------------

When setting up the environment with Ubuntu 22.04 on a Gen 8 HP ZBook, we ran into multiple dependency and compatibility problems. The next two sections explain how we fixed them.

.. warning::
   It is strongly advised to install everything in a Python virtual environment to avoid messing with your Python installation.
Working with Python >= 3.10.12
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ubuntu 22.04 comes with a more recent version of Python than the ones ML-Agents supports. The workaround is to clone the ML-Agents repository and modify its ``setup.py`` files so that the Python modules can be installed manually.

First, clone the repository:

.. prompt:: bash

   git clone https://github.com/Unity-Technologies/ml-agents.git
   cd ml-agents/

Then, in ``ml-agents/ml-agents-envs/setup.py``, modify line 62 from

.. code-block::

   python_requires=">=3.8.13,<=3.10.12",

to

.. code-block::

   python_requires=">=3.8.13,<=3.11",

and remove line 59

.. code-block::

   "numpy==1.21.2",

In ``ml-agents/ml-agents/setup.py``, modify line 83 from

.. code-block::

   python_requires=">=3.8.13,<=3.10.12",

to

.. code-block::

   python_requires=">=3.8.13,<=3.11",

When this is done, you can install the ``ml-agents`` Python modules with the following commands:

.. prompt:: bash

   pip install -e ./ml-agents-envs
   pip install -e ./ml-agents

.. note::
   Pay attention to the order of the ``pip install`` commands: ``ml-agents-envs`` must be installed before ``ml-agents``.

PyTorch with a recent Nvidia GPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sadly, ``ml-agents`` requires an old version (1.11) of PyTorch which may not work with a recent Nvidia GPU (starting from the Ampere architecture). You may encounter a message similar to the following when trying to train a model:

.. code-block::

   NVIDIA GeForce RTX A2000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
   The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
   If you want to use the NVIDIA GeForce RTX A2000 GPU with PyTorch.

The solution is to specify the CUDA architecture in your environment before recompiling PyTorch 1.11. First, clone PyTorch and switch to the right branch:

.. prompt:: bash

   git clone https://github.com/pytorch/pytorch.git
   cd pytorch/
   git checkout release/1.11
   git submodule update --init --recursive

then specify the CUDA architecture in your environment:

.. prompt:: bash

   export TORCH_CUDA_ARCH_LIST="8.6"

To recompile PyTorch, you will also need to fix an error in a submodule. In ``pytorch/third_party/breakpad/src/client/linux/handler/exception_handler.cc``, change line 144 from

.. code-block::

   static const unsigned kSigStackSize = std::max(16384, SIGSTKSZ);

to

.. code-block::

   static const unsigned kSigStackSize = std::max(16384, int(SIGSTKSZ));

Then you can compile and install PyTorch with the following command:

.. prompt:: bash

   python setup.py develop

Develop (ML-Agents)
====================

| The following steps will guide you through what is required to use ML-Agents in a scene.

#. **Set up the scene**. Typically you will have at least one agent (e.g., a robot arm), optionally a target (e.g., a cube to touch) and the environment (e.g., a table on which the robot and the target sit). All these elements are ordinary Unity game objects. It is a good idea to have an empty game object as a common parent: this will later allow you to create a prefab and duplicate your training areas.
#. **Add the ML-Agents package to the scene**. The easiest way is usually via the menu ``Window | Package Manager``. For this guide, we used Release 18 (i.e., package 2.1.0-exp.1).
#. **Create the agent**. Practically, this means creating a C# script and attaching it to the Agent game object. The content of the script strongly depends on what you want to accomplish.
   However, you will commonly find the following methods (more information is given in the steps below):

   a. ``void Start()``. The regular Unity ``Start()`` method.
   #. ``public override void OnEpisodeBegin()``. Here, you will reset the initial state of a learning episode (e.g., place the target object at a random position on the table and reset the agent's position).
   #. ``public override void CollectObservations(VectorSensor sensor)``. Here, you will manage the observations available to your agent.
   #. ``public override void OnActionReceived(ActionBuffers vectorAction)``. Here, you will manage the possible actions available to your agent. Usually, this is also where you define some rewards.
   #. ``public override void Heuristic(in ActionBuffers actionsOut)``. This is usually used for testing: it allows you to take manual control of your agent.

#. **Add the component Decision Requester to the Agent**. This component requests a *decision* at a regular interval; each decision is an opportunity for the agent to take actions.
#. **Set up the Actions in the Behavior Parameters component**. Basically, this configures the action values the agent receives: ``Discrete Actions`` are integers, ``Continuous Actions`` are floats. It is possible to use a combination of both. These values are passed to the ``OnActionReceived()`` function and are used to change the state of the agent (e.g., move it in the environment).

   * Configure the type of action (discrete vs. continuous) by defining (for each type) the number of dimensions (e.g., the number of rotating joints that can move simultaneously).
   * For Discrete Actions only, define the size parameters (i.e., the number of possible values for each dimension; e.g., 360 for a rotation in degrees, or 100 for an x or y coordinate).

#. **Set up the Actions in the OnActionReceived() function**. The actions received (according to the configuration of the previous point) should modify the state of the agent (e.g., move it).
#. **Set up the Observations in the Behavior Parameters component**. Set the ``Vector Observation Space Size`` according to the variables observed by the Agent (e.g., set the space size to 6 if your agent should observe the position in space (x, y, z coordinates) of both the agent and the target).
#. **Set up the Observations in the CollectObservations(VectorSensor sensor) function**. Add the observations of interest to the sensor (e.g., the position of the agent, the position of the target, the distance between the two objects, etc.). Use as few observations as possible and make sure they are as relevant as possible to the goals you want to achieve.
#. **Set the rewards**. They can be positive or negative. This is typically done in ``OnActionReceived()``, following the actions, or whenever some events are detected (e.g., in ``OnTriggerEnter()``).

.. note::
   For more information, examples, and good practices about **Actions**, **Observations**, and **Rewards**, check the official `Agents `_ documentation.

10. **Add (override) OnEpisodeBegin()**. The goal is to set the initial conditions/reset everything at the beginning of each episode (e.g., randomize the position of the target on the table).
#. **Override the Heuristic() function**. This allows you to manually control the Agent by mapping user inputs to the actions previously defined (in other words, this function writes the inputs into the discrete/continuous action buffers used by ``OnActionReceived()``).

| That's it! A minimal agent skeleton putting these methods together is sketched below.
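The following is only a sketch of how the pieces fit together, assuming a simplified setup: an agent with a single rotating joint, one continuous action, and a 6-float observation vector. The class name, field names, and the ``Target``/``Table`` tags are made up for illustration and do not come from the Articulation Robot Demo.

.. code-block:: csharp

   using UnityEngine;
   using Unity.MLAgents;
   using Unity.MLAgents.Actuators;
   using Unity.MLAgents.Sensors;

   // Hypothetical example: the agent rotates around its vertical axis to reach a target.
   public class ReachTargetAgent : Agent
   {
       public Transform target;          // assigned in the Inspector
       public float rotationSpeed = 90f; // degrees per second

       public override void OnEpisodeBegin()
       {
           // Reset the agent and place the target at a random position on the table.
           transform.localRotation = Quaternion.identity;
           target.localPosition = new Vector3(Random.Range(-0.5f, 0.5f), 0.05f,
                                              Random.Range(-0.5f, 0.5f));
       }

       public override void CollectObservations(VectorSensor sensor)
       {
           // Two Vector3 = 6 floats -> "Vector Observation Space Size" = 6.
           sensor.AddObservation(transform.localPosition);
           sensor.AddObservation(target.localPosition);
       }

       public override void OnActionReceived(ActionBuffers actions)
       {
           // One continuous action in [-1, 1] controlling the rotation.
           float turn = actions.ContinuousActions[0];
           transform.Rotate(Vector3.up, turn * rotationSpeed * Time.deltaTime);

           // Small time penalty to encourage reaching the target quickly.
           AddReward(-0.001f);
       }

       public override void Heuristic(in ActionBuffers actionsOut)
       {
           // Manual control for testing: map the keyboard to the continuous action.
           var continuousActions = actionsOut.ContinuousActions;
           continuousActions[0] = Input.GetAxis("Horizontal");
       }

       private void OnTriggerEnter(Collider other)
       {
           // Reward on touching the target, penalty on hitting the table; end the episode either way.
           if (other.CompareTag("Target")) { AddReward(1.0f); EndEpisode(); }
           else if (other.CompareTag("Table")) { AddReward(-1.0f); EndEpisode(); }
       }
   }

For this sketch, the ``Behavior Parameters`` component would be configured with a continuous action size of 1 and a vector observation space size of 6, and a ``Decision Requester`` component would be added so that ``OnActionReceived()`` is called at regular intervals.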
Of course, the content of these functions and configurations strongly depends on your task. But by now you should have all you need to start the training.

Training
===================

| The training phase can be summarized in the following steps:

#. Create a configuration file. As a starting point, you can take inspiration from the example file ``ml-agents\config\ppo\3DBall.yaml``.
#. Launch the training, first from the console and then by clicking the play button in the Unity editor (the same steps you followed to test your installation). In the console, the command will look something like ``mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --force``.
#. Check the progress made by your agents with the command (in another console) ``tensorboard --logdir results --port 6006``.
#. Finally, when the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and assign it to the ``Model`` field of the Agent's ``Behavior Parameters`` component.

.. note::
   If you want to resume a training that was interrupted before reaching ``max_steps``, run ``mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --resume``

.. warning::
   If you want to resume a training that has completed (i.e., after reaching ``max_steps``), you need to increase the ``max_steps`` parameter in the configuration yml file and then resume the training (see note above)