ML-Agents, a.k.a. the Unity Machine Learning Agents Toolkit, is an open-source project that enables games and simulations to serve as environments for training intelligent agents.
Set up the environment
To test your installation:
In an empty project (with the right ML-Agents package installed), drag & drop the examples folder into your Assets folder (ml-agents is the project folder that you cloned from the Unity repository; be sure to check out the branch of your release to get compatible examples).
In the Unity editor, navigate to Assets\ML-Agents\Examples\3DBall\Scenes and open the scene.
Run the scene in Inference Only mode. You can change this setting in the Behavior Type drop-down menu of the Behavior Parameters component of your agent (see figure below). If this works, it will show the behavior of the already-trained agent.
Retraining an example. Launch the training from the console (see the detailed instructions here), then, in Default mode, click the play button. When the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and attach it to the Model field of the Agent's Behavior Parameters component.
If all is working, congrats: you have done half of the work. The next step is to actually use ML-Agents in your project.
Always check that all the packages you are using are the correct version for your release.
Set up the scene. Typically you will have at least one agent (e.g., a robot arm), optionally a target (e.g., a cube to touch), and the environment (e.g., a table on which the robot and the target sit). All these elements are regular Unity game objects. It is a good idea to have an empty game object as a common parent: this will later allow you to create a prefab and duplicate your training areas.
Add the ML-Agents package to the scene. The easiest way is usually via the menu Window | Package Manager. For this guide, we used release 18 (i.e., package 2.1.0-exp.1).
Create the agent. Practically, this means creating a C# script and attaching it to the Agent game object. The content of the script strongly depends on what you want to accomplish. However, you will commonly find the following methods (you will find more information in the steps below):
void Start(). The regular Unity Start() method.
public override void OnEpisodeBegin(). Here, you will reset the initial state of a learning episode (e.g., place the target object at a random position on the table and reset the agent's position).
public override void CollectObservations(VectorSensor sensor). Here, you will manage the observations available to your agent.
public override void OnActionReceived(ActionBuffers vectorAction). Here, you will manage the possible actions available to your agent. Usually, this is also where you define some rewards.
public override void Heuristic(in ActionBuffers actionsOut). This is usually used for testing: it allows you to take manual control of your agent.
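Putting the methods above together, a minimal agent script could look like the following sketch (the class name RobotArmAgent is a placeholder; the method bodies are left for the steps below):

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

// Hypothetical agent skeleton for this guide's robot-arm example.
public class RobotArmAgent : Agent
{
    void Start()
    {
        // Regular Unity initialization (cache references, etc.).
    }

    public override void OnEpisodeBegin()
    {
        // Reset the agent and the target at the start of each episode.
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Feed the observations to the policy.
    }

    public override void OnActionReceived(ActionBuffers vectorAction)
    {
        // Apply the received actions and assign rewards.
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Map user input to actions for manual testing.
    }
}
```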
Add the Decision Requester component to the Agent. This component requests a decision at a fixed interval of steps; each decision is an opportunity for the agent to take actions.
Set up the Actions in the Behavior Parameters component. During training, this basically behaves like a random number generator: Discrete Actions generate integers, Continuous Actions generate floats, and it is possible to use a combination of both. These values are passed to the OnActionReceived() function and are used to change the state of the agent (e.g., move it in the environment).
Configure the type of action (Discrete vs. Continuous) by defining (for each type) the number of dimensions (e.g., the number of rotating joints that can move simultaneously).
For Discrete Actions only, define the size parameter (i.e., the range of values for each dimension; e.g., degrees between 0-360 for rotation and 1-100 for x, y coordinates).
Set up the Actions in the OnActionReceived() function. The actions received (according to the configuration of the previous point) should modify the state of the agent (e.g., move it).
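For example, with two Continuous Actions configured, you could translate the agent on the x and z axes; a sketch (the speed value is an assumption to tune for your scene):

```csharp
public override void OnActionReceived(ActionBuffers vectorAction)
{
    // Two continuous actions, each in [-1, 1], as set in Behavior Parameters.
    float moveX = vectorAction.ContinuousActions[0];
    float moveZ = vectorAction.ContinuousActions[1];

    // Hypothetical movement speed.
    const float speed = 2f;
    transform.Translate(new Vector3(moveX, 0f, moveZ) * speed * Time.deltaTime);
}
```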
Set up the Observations in the Behavior Parameters component. Set the Vector Observation Space Size according to the variables observed by the Agent (e.g., set space size = 6 if your agent should observe/know the position in space (x, y, z coordinates) of both the agent and the target).
Set up the Observations in the CollectObservations(VectorSensor sensor) function. Add the observations of interest to sensor (e.g., the position of the agent, the position of the target, the distance between the two objects, etc.). Use as few observations as possible and make sure that they are as relevant as possible to the goals you want to achieve.
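Continuing the space size = 6 example, a sketch could add the two positions (here `target` is an assumed Transform field referencing the target object):

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // 3 floats for the agent position + 3 for the target position = 6,
    // matching the Vector Observation Space Size set in Behavior Parameters.
    sensor.AddObservation(transform.localPosition);
    sensor.AddObservation(target.localPosition);
}
```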
Set the rewards. They can be positive or negative. This is typically done in OnActionReceived(), following the actions, or whenever some events are detected (e.g., when the agent touches the target).
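As an illustration, a distance-based reward at the end of OnActionReceived(); the threshold and reward values are assumptions, and `target` is the assumed Transform reference used above:

```csharp
// Sketch of reward assignment inside OnActionReceived(), after applying actions.
float distance = Vector3.Distance(transform.localPosition, target.localPosition);
if (distance < 0.5f)
{
    SetReward(1.0f);    // the agent reached the target
    EndEpisode();
}
else
{
    AddReward(-0.001f); // small per-step penalty to encourage fast solutions
}
```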
For more information, examples, and good practices about Actions, Observations, and Rewards, check the official ML-Agents documentation.
Add (override) OnEpisodeBegin(). The goal is to set the initial conditions/reset everything at the beginning of each episode (e.g., randomize the position of the target on the table).
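A minimal sketch, assuming the table spans roughly -4 to 4 on x and z in local coordinates (adapt the ranges to your scene):

```csharp
public override void OnEpisodeBegin()
{
    // Reset the agent and move the target to a random spot on the table.
    transform.localPosition = Vector3.zero;
    target.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f,
                                       Random.Range(-4f, 4f));
}
```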
Override the Heuristic() function. This allows you to manually control the Agent by mapping inputs from the user to the actions previously defined (in other terms, this function writes the input into the discrete/continuous action vector used by OnActionReceived()).
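For instance, with two Continuous Actions you could map Unity's default input axes onto them:

```csharp
public override void Heuristic(in ActionBuffers actionsOut)
{
    // Write keyboard/gamepad input into the continuous action vector
    // that OnActionReceived() consumes.
    var continuousActions = actionsOut.ContinuousActions;
    continuousActions[0] = Input.GetAxis("Horizontal"); // action 0
    continuousActions[1] = Input.GetAxis("Vertical");   // action 1
}
```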
Create a configuration file. As a starting point, you can get inspired by the files in the examples.
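A trainer configuration sketch for the RoboArm run used below; the behavior name must match the Behavior Name in the Behavior Parameters component, and all hyperparameter values here are starting-point assumptions:

```yaml
behaviors:
  RoboArm:                 # must match the Behavior Name in Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```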
Launch the training first in the console, then click the play button in the Unity editor (the same steps you used to test your installation). In the cmd console, you will run something like
mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --force.
Check the progress made by your agents by running the following command (in another cmd console):
tensorboard --logdir results --port 6006
Finally, when the training is completed, check the command line to get the path to the saved model. Copy the saved model into Unity (drag & drop) and attach it to the Model field of the Agent's Behavior Parameters component.
If you want to resume the training (if interrupted before reaching max_steps), run
mlagents-learn articulations-robot-demo\ur3_config.yml --run-id=RoboArm --resume
If you want to resume the training after it has completed (i.e., after reaching max_steps), you need to increase the max_steps parameter in the configuration yml file and resume the training (see note above).