Six AI paradigms, one game engine, from scratch
A game-development course at IILM AHL (Greater Noida) required building and demonstrating AI agent behaviour inside a real-time engine. Rather than relying on Unity Asset Store plugins, every agent was implemented from scratch so the mechanics of each algorithm could be understood directly. The project covers five agents: a Q-Learning grid agent, an A* pathfinding agent with runtime graph construction, a waypoint/steering agent, a PPO-trained neural network agent, and a hybrid Imitation Learning + Reinforcement Learning self-driving car, the flagship agent.
The self-driving car was the most complex. A human drove the car to produce demonstration data; Behavioural Cloning gave the network a warm-start policy from that data. PPO then fine-tuned the warmed-up policy with a custom reward signal — positive for forward track progress, negative for boundary violations and collisions — generalising the agent beyond the recorded demos to handle unseen track sections. The trained .onnx policy runs purely in C# at game time with no Python dependency.
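A minimal sketch of how that reward scheme can be wired into an ML-Agents Agent subclass; the class name, the "Boundary" tag, and the TrackProgressDelta helper are illustrative stand-ins, not the project's actual identifiers:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class CarAgentSketch : Agent
{
    // Hypothetical helper: signed progress along the track since the last step.
    float TrackProgressDelta() => 0f; // placeholder

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Steering/throttle application omitted for brevity.
        AddReward(TrackProgressDelta()); // positive for forward track progress
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.collider.CompareTag("Boundary")) // illustrative tag
        {
            AddReward(-1f);  // penalty for boundary violations and collisions
            EndEpisode();    // reset rather than accumulating bad experience
        }
    }
}
```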
Training a car agent from random initialisation with pure RL converges slowly because the agent must randomly stumble onto the track before discovering that forward movement is rewarded. The project also imposed a constraint of modest hardware — the same laptop that had made Blender rendering painful in 2021 — ruling out long unconstrained training runs. Every algorithm also had to be self-researched and applied without formal ML coursework.
The project is a single Unity 3D workspace with separate C# scripts per agent and shared scene objects. The Q-Learning agent persists its Q-table across episode resets using DontDestroyOnLoad on its parent GameObject. The A* agent constructs a node graph from collider positions at runtime using a min-heap priority queue and Euclidean distance heuristic, re-routing dynamically when obstacles move. The PPO and car agents use Unity ML-Agents Toolkit as the bridge between the Unity simulation and the Python training backend — the Agent base class, sensor components, and PPO/SAC trainer. Observation spaces use fan-pattern raycasts measuring distances to boundaries, plus velocity and heading. Trained policies are serialised as .onnx files and loaded back into Unity via the Sentis/Barracuda runtime for inference.
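As a concrete sketch of the observation side, here is roughly what a fan-pattern raycast CollectObservations can look like; the ray count, arc, and normalisation are assumptions rather than the project's exact values:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class CarObservationsSketch : Agent
{
    [SerializeField] float rayLength = 20f; // illustrative sensing range
    Rigidbody body;

    void Awake() => body = GetComponent<Rigidbody>();

    public override void CollectObservations(VectorSensor sensor)
    {
        // Fan of rays across the forward arc: normalised distance to the
        // first boundary hit, or 1 when nothing is in range.
        for (int angle = -60; angle <= 60; angle += 20)
        {
            Vector3 dir = Quaternion.Euler(0f, angle, 0f) * transform.forward;
            bool blocked = Physics.Raycast(transform.position, dir, out RaycastHit hit, rayLength);
            sensor.AddObservation(blocked ? hit.distance / rayLength : 1f);
        }
        sensor.AddObservation(transform.InverseTransformDirection(body.velocity)); // local-frame velocity
        sensor.AddObservation(transform.forward);                                  // heading
    }
}
```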
Imitation Learning before RL for the car agent. Recording human driving demonstrations and pre-training with Behavioural Cloning gave the policy a prior that already roughly knew to stay on the track, dramatically accelerating PPO convergence versus cold random initialisation. That curriculum-style staging now informs how every subsequent fine-tuning pipeline is designed.
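For context, ML-Agents ships a DemonstrationRecorder component for capturing such demos; a minimal setup along these lines (the demonstration name here is hypothetical) writes the .demo file that the Behavioural Cloning stage consumes:

```csharp
using Unity.MLAgents.Demonstrations;
using UnityEngine;

public class DemoRecordingSetup : MonoBehaviour
{
    void Awake()
    {
        // Records a .demo file while a human drives the car.
        var recorder = gameObject.AddComponent<DemonstrationRecorder>();
        recorder.DemonstrationName = "CarDrive"; // hypothetical file name
        recorder.Record = true;
    }
}
```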
A* with runtime graph construction rather than Unity NavMesh. NavMesh is faster and easier but hides the algorithm. Building the graph from collider positions at startup forced a real understanding of the data structures — priority queue, open/closed sets, heuristic selection — that the built-in tool abstracts away.
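A compact sketch of that A* core, assuming an illustrative Node class built from collider positions at startup; a linear scan over the open set stands in here for the project's min-heap:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class Node
{
    public Vector3 Position;
    public List<Node> Neighbours = new List<Node>();
}

public static class AStarSketch
{
    // Euclidean distance heuristic, as in the project.
    static float Heuristic(Node a, Node b) => Vector3.Distance(a.Position, b.Position);

    public static List<Node> FindPath(Node start, Node goal)
    {
        var open = new List<Node> { start };              // stand-in for a min-heap
        var closed = new HashSet<Node>();
        var cameFrom = new Dictionary<Node, Node>();
        var g = new Dictionary<Node, float> { [start] = 0f };

        while (open.Count > 0)
        {
            // Extract the open node with the lowest f = g + h.
            Node current = open[0];
            float bestF = g[current] + Heuristic(current, goal);
            foreach (var n in open)
            {
                float f = g[n] + Heuristic(n, goal);
                if (f < bestF) { current = n; bestF = f; }
            }
            if (current == goal) return Reconstruct(cameFrom, current);

            open.Remove(current);
            closed.Add(current);

            foreach (var next in current.Neighbours)
            {
                if (closed.Contains(next)) continue;
                float tentative = g[current] + Vector3.Distance(current.Position, next.Position);
                if (!g.ContainsKey(next) || tentative < g[next])
                {
                    g[next] = tentative;
                    cameFrom[next] = current;
                    if (!open.Contains(next)) open.Add(next);
                }
            }
        }
        return null; // no path exists
    }

    static List<Node> Reconstruct(Dictionary<Node, Node> cameFrom, Node current)
    {
        var path = new List<Node> { current };
        while (cameFrom.TryGetValue(current, out var prev))
        {
            current = prev;
            path.Insert(0, current);
        }
        return path;
    }
}
```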
PPO over simpler RL algorithms. PPO's clipped surrogate objective keeps each policy update within a trust region, so training does not destabilise after a bad batch of experience. That makes it robust without requiring experience replay buffers or delicate learning-rate schedules.
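For reference, the clipped surrogate objective from Schulman et al. (2017), where $r_t(\theta)$ is the probability ratio between the new and old policies and the clip range $\epsilon$ bounds how far a single update can move it:

$$
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$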
No asset-store AI plugins. Every algorithm — Q-table update via the Bellman equation, A* graph construction, PPO hyperparameter tuning — was implemented and understood directly, consistent with the principle of learning by building rather than by plugging in.
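The Q-table update itself is only a few lines; this sketch assumes a dictionary-of-arrays layout, which is illustrative rather than the project's exact structure:

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class QLearningSketch
{
    // Q(s,a) ← Q(s,a) + α [ r + γ · max_a' Q(s',a') − Q(s,a) ]
    public static void UpdateQ(Dictionary<int, float[]> qTable,
                               int state, int action, float reward, int nextState,
                               float alpha = 0.1f, float gamma = 0.99f)
    {
        float maxNext = Mathf.Max(qTable[nextState]); // best value reachable from s'
        qTable[state][action] += alpha * (reward + gamma * maxNext - qTable[state][action]);
    }
}
```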
Five working agents across six AI/ML paradigms were delivered for the course at IILM AHL. The self-driving car agent converged to stable track navigation significantly faster than a cold-RL baseline by combining Imitation Learning warm-start with PPO fine-tuning. The project built foundational RL knowledge — observation space design, reward shaping, policy update constraints, .onnx inference integration — that now informs AI tooling and prompt/reward design across subsequent projects. The 14-year arc from a hand-coded car game in 3D RAD (2009) to a self-driving car NPC trained with reinforcement learning (2023) is one of the clearest through-lines in the portfolio narrative.