ML-Agents a.k.a. the Unity Machine Learning Agents Toolkit is an open-source project that enables games and simulations to serve as environments for training intelligent agents.
Since it requires a good understanding of the Unity environment, the classic Roll-a-Ball tutorial is a great first hands-on.
This document is intended as a guide to make smarter (read “add ML-Agents to”) any Unity scene and to train your agent in a basic setup. As a running example, we will use the scenario of a robotic arm that has to learn how to touch a target object while avoiding smashing against a table or knocking the object away. This scenario is strongly inspired by the Articulation Robot Demo. The linked example is based on ML-Agents Release 1. However, to be relevant as long as possible, the following guide is based on release 18 (Jun 10, 2021).

Articulation Robot Training Scene#

Set up the environment#

Please follow the instructions here to install ML-Agents on your system.
The best way to get started is to go through the Getting Started page. This document offers an overview of ML-Agents using the 3DBall example.

To test your installation:

  • In an empty project (with the right ML-Agents package installed): drag & drop the folder \ml-agents\Project\Assets\ML-Agents into your Assets folder (ml-agents is the project folder that you cloned from the Unity repository; be sure to check out the branch of your release to get compatible examples)

  • In the Unity editor, navigate to \Assets\ML-Agents\Examples\3DBall\Scenes and open the scene 3DBall

  • Run the scene in Default or Inference Only mode. You can change this setting in the Behavior Type drop-down menu of the Behavior Parameters component of your agent (see figure below). If this works, it will show the behavior of the already trained agent.

  • Retrain an example. Launch the training from the console (see the detailed instructions here), then, in Default mode, click the play button. When the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and assign it to the Model field of the Agent’s Behavior Parameters component.


Behavior Parameters component#

If all is working, congrats: you have done half of the work. The next step is to actually use ml-agents in your project.


Always check that all the packages that you are using are the correct version for your release


We had a great deal of problems working with Visual Studio Code. An alternative that worked well for us is JetBrains’ Rider. If this is your choice, follow the easy steps here. Independently of the IDE you choose, similar steps are usually required.

Develop (Ml-Agents)#

The following steps will guide you through the main steps required to use ml-agents in a scene.
  1. Set up the scene. Typically, you will have at least one agent (e.g., a robot arm), optionally a target (e.g., a cube to touch), and the environment (e.g., a table on which the robot and the target sit). All these elements are regular Unity game objects. It is a good idea to have an empty game object as a common parent: this will later allow you to create a prefab and duplicate your training areas.

  2. Add the ML-Agents package to the scene. The easiest way is usually via the menu Window | Package Manager. For this guide, we used release 18 (i.e., package 2.1.0-exp.1).

  3. Create the agent. Practically, this means creating a C# script and attaching it to the Agent game object. The content of the script strongly depends on what you want to accomplish. However, you will commonly find the following methods (you will find more information in the steps below):

    1. void Start(). The regular Unity Start() method.

    2. public override void OnEpisodeBegin(). Here, you will reset the initial state of a learning episode (e.g., place the target object at a random position on the table and reset the agent’s position).

    3. public override void CollectObservations(VectorSensor sensor). Here, you will manage the observations available to your agent.

    4. public override void OnActionReceived(ActionBuffers vectorAction). Here, you will manage the possible actions available to your agent. Usually, this is also where you define some rewards.

    5. public override void Heuristic(in ActionBuffers actionsOut). This is usually used for testing: it allows you to gain manual control of your agent.

  4. Add the component Decision Requester to the Agent. This component requests a decision at a fixed interval; each decision is an opportunity for the agent to take actions.

  5. Set up the Actions in the Behavior Parameters component. Discrete Actions generate integers, Continuous Actions generate floats; it is possible to use a combination of both. During training, these values are sampled by the policy (essentially at random at first). They are passed to the OnActionReceived() function and will be used to change the state of the agent (e.g., move it in the environment).

    • Configure the type of action (Discrete vs. Continuous) by defining (for each type) the number of dimensions (i.e., the number of rotating joints that can move simultaneously).

    • For Discrete Actions only, define the branch sizes (i.e., the number of possible values for each dimension; e.g., 360 values for a rotation in degrees, or 100 values for an x or y coordinate).

  6. Set up the Actions in the OnActionReceived() function. The actions received (according to the configuration of the previous point) should modify the state of the agent (e.g., move it).

  7. Set up the Observations in the Behavior Parameters component. Set the Vector Observation Space Size according to the variables observed by the Agent (e.g., set Space Size = 6 if your agent should observe/know the position in space (x, y, z coordinates) of the agent and of the target).

  8. Set up the Observations in the CollectObservations(VectorSensor sensor) function. Add the observations of interest to sensor (e.g., the position of the agent, the position of the target, the distance between the two objects, etc.). Use as few observations as possible and make sure that they are as relevant as possible to the goals that you want to achieve.

  9. Set the rewards. They can be positive or negative. This is typically done in OnActionReceived(), following the actions, or whenever some events are detected (e.g., in OnTriggerEnter()).
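The observation, action, and reward steps above can be sketched in a single Agent subclass. This is a minimal illustration, not the guide's actual robot-arm code: the `target` transform, the movement logic, and the reward values (1 for reaching the target, a small per-step penalty) are assumptions for a simple "move toward a cube" task. The overridden methods and the `VectorSensor`/`ActionBuffers` APIs are the real ones from ML-Agents Release 18 (package 2.1.0-exp.1).

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative sketch: an agent that learns to reach a target.
public class ReachTargetAgent : Agent
{
    public Transform target;     // assign in the Inspector (illustrative)
    public float moveSpeed = 2f;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Two Vector3s = 6 floats: must match Space Size = 6
        // in the Behavior Parameters component.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Two continuous actions in [-1, 1], interpreted as x/z movement.
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.localPosition +=
            new Vector3(moveX, 0f, moveZ) * moveSpeed * Time.deltaTime;

        // Rewards (assumed values): success bonus plus a small time penalty.
        float distance = Vector3.Distance(transform.localPosition,
                                          target.localPosition);
        if (distance < 0.5f)
        {
            SetReward(1f);      // reached the target
            EndEpisode();
        }
        else
        {
            AddReward(-0.001f); // encourage reaching the target quickly
        }
    }
}
```

With this sketch, the Behavior Parameters component would be configured with Continuous Actions = 2 and Vector Observation Space Size = 6.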


For more information, examples, and good practices about Actions, Observations, and Rewards, check the official Agents documentation.

  1. Add (override) OnEpisodeBegin(). The goal is to fix the initial conditions/reset everything at the beginning of each episode (e.g., randomize the position of the target on the table).

  2. Override the Heuristic() function. This allows you to manually control the Agent by mapping inputs from the user to the actions previously defined (in other terms, this function puts the input into the discrete/continuous action vector used by OnActionReceived()).
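These last two steps can be sketched as follows. Again, this is an illustration under assumptions: the `target` transform, the ±4-unit table bounds, and the use of Unity's default "Horizontal"/"Vertical" input axes are all placeholders to adapt to your scene. The override signatures are the real ML-Agents Release 18 ones.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Minimal sketch: episode reset plus manual (heuristic) control
// for an agent with two continuous actions.
public class ManualControlAgent : Agent
{
    public Transform target;    // assign in the Inspector (illustrative)

    public override void OnEpisodeBegin()
    {
        // Reset the agent and randomize the target position
        // (assumed table bounds of ±4 units).
        transform.localPosition = Vector3.zero;
        target.localPosition = new Vector3(Random.Range(-4f, 4f),
                                           0.5f,
                                           Random.Range(-4f, 4f));
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Map keyboard input into the continuous action vector that
        // OnActionReceived() will consume. To drive the agent yourself,
        // set Behavior Type to "Heuristic Only".
        var continuous = actionsOut.ContinuousActions;
        continuous[0] = Input.GetAxis("Horizontal");
        continuous[1] = Input.GetAxis("Vertical");
    }
}
```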

That’s it! Of course, the content of these functions and configurations strongly depends on your task. But by now you should have all you need to start the training.


The training phase can be summarized in the following steps:
  1. Create a configuration file. As a starting point, you can get inspired by the file in the example: ml-agents\config\ppo\3DBall.yaml.

  2. Launch the training, first in the console, then by clicking the play button in the Unity editor (the same steps you used to test your installation). In the console, you will have something like mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --force.

  3. Check the progress made by your agent using the command (in another console) tensorboard --logdir results --port 6006

  4. Finally, when the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and assign it to the Model field of the Agent’s Behavior Parameters component.
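The configuration file from step 1 can be sketched as below, loosely following the structure of ml-agents\config\ppo\3DBall.yaml. The behavior name (here RoboArm, an assumption) must match the Behavior Name field in the Behavior Parameters component; the hyperparameter values are only a starting point to tune for your task.

```yaml
behaviors:
  RoboArm:                  # must match the Behavior Name set in Unity
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000       # total training steps before the run completes
    time_horizon: 1000
    summary_freq: 12000     # how often statistics appear in TensorBoard
```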


If you want to resume the training (if interrupted before reaching max_steps): run mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --resume


If you want to resume the training after it has completed (i.e., after reaching max_steps), you need to increase the max_steps parameter in the configuration yml file and resume the training (see the note above)