Six AI paradigms, one game engine, from scratch
A game-development course at IILM AHL (Greater Noida) required building and demonstrating AI agent behaviour inside a real-time engine. Rather than relying on Unity Asset Store plugins, every agent was implemented from scratch so the mechanics of each algorithm could be understood directly. The project covers five agents: a Q-Learning grid agent, an A* pathfinding agent with runtime graph construction, a waypoint/steering agent, a PPO-trained neural network agent, and a hybrid Imitation Learning + Reinforcement Learning self-driving car, the flagship agent.
The self-driving car was the most complex. A human drove the car to produce demonstration data; Behavioural Cloning gave the network a warm-start policy from that data. PPO then fine-tuned the warmed-up policy with a custom reward signal — positive for forward track progress, negative for boundary violations and collisions — generalising the agent beyond the recorded demos to handle unseen track sections. The trained .onnx policy runs purely in C# at game time with no Python dependency.
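A minimal sketch of how that reward scheme can be wired into an ML-Agents Agent subclass; the class name, the "Boundary" tag, and the TrackProgressDelta helper are illustrative stand-ins, not the project's actual identifiers:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class CarAgentSketch : Agent
{
    // Hypothetical helper: signed progress along the track since the last step.
    float TrackProgressDelta() => 0f; // placeholder

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Steering/throttle application omitted for brevity.
        AddReward(TrackProgressDelta()); // positive for forward track progress
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.collider.CompareTag("Boundary")) // illustrative tag
        {
            AddReward(-1f);  // penalty for boundary violations and collisions
            EndEpisode();    // reset rather than accumulating bad experience
        }
    }
}
```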
Training a car agent from random initialisation with pure RL converges slowly because the agent must randomly stumble onto the track before discovering that forward movement is rewarded. The project also imposed a constraint of modest hardware — the same laptop that had made Blender rendering painful in 2021 — ruling out long unconstrained training runs. Every algorithm also had to be self-researched and applied without formal ML coursework.
The project is a single Unity 3D workspace with separate C# scripts per agent and shared scene objects. The Q-Learning agent persists its Q-table across episode resets using DontDestroyOnLoad on its parent GameObject. The A* agent constructs a node graph from collider positions at runtime using a min-heap priority queue and Euclidean distance heuristic, re-routing dynamically when obstacles move. The PPO and car agents use Unity ML-Agents Toolkit as the bridge between the Unity simulation and the Python training backend — the Agent base class, sensor components, and PPO/SAC trainer. Observation spaces use fan-pattern raycasts measuring distances to boundaries, plus velocity and heading. Trained policies are serialised as .onnx files and loaded back into Unity via the Sentis/Barracuda runtime for inference.
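As a concrete sketch of the observation side, here is roughly what a fan-pattern raycast CollectObservations can look like; the ray count, arc, and normalisation are assumptions rather than the project's exact values:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class CarObservationsSketch : Agent
{
    [SerializeField] float rayLength = 20f; // illustrative sensing range
    Rigidbody body;

    void Awake() => body = GetComponent<Rigidbody>();

    public override void CollectObservations(VectorSensor sensor)
    {
        // Fan of rays across the forward arc: normalised distance to the
        // first boundary hit, or 1 when nothing is in range.
        for (int angle = -60; angle <= 60; angle += 20)
        {
            Vector3 dir = Quaternion.Euler(0f, angle, 0f) * transform.forward;
            bool blocked = Physics.Raycast(transform.position, dir, out RaycastHit hit, rayLength);
            sensor.AddObservation(blocked ? hit.distance / rayLength : 1f);
        }
        sensor.AddObservation(transform.InverseTransformDirection(body.velocity)); // local-frame velocity
        sensor.AddObservation(transform.forward);                                  // heading
    }
}
```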
Imitation Learning before RL for the car agent. Recording human driving demonstrations and pre-training with Behavioural Cloning gave the policy a prior that already roughly knew to stay on the track, dramatically accelerating PPO convergence versus cold random initialisation. That curriculum-style staging now informs how every subsequent fine-tuning pipeline is designed.
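For context, ML-Agents ships a DemonstrationRecorder component for capturing such demos; a minimal setup along these lines (the demonstration name here is hypothetical) writes the .demo file that the Behavioural Cloning stage consumes:

```csharp
using Unity.MLAgents.Demonstrations;
using UnityEngine;

public class DemoRecordingSetup : MonoBehaviour
{
    void Awake()
    {
        // Records a .demo file while a human drives the car.
        var recorder = gameObject.AddComponent<DemonstrationRecorder>();
        recorder.DemonstrationName = "CarDrive"; // hypothetical file name
        recorder.Record = true;
    }
}
```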
A* with runtime graph construction rather than Unity NavMesh. NavMesh is faster and easier but hides the algorithm. Building the graph from collider positions at startup forced a real understanding of the data structures — priority queue, open/closed sets, heuristic selection — that the built-in tool abstracts away.
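A compact sketch of that A* core, assuming an illustrative Node class built from collider positions at startup; a linear scan over the open set stands in here for the project's min-heap:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class Node
{
    public Vector3 Position;
    public List<Node> Neighbours = new List<Node>();
}

public static class AStarSketch
{
    // Euclidean distance heuristic, as in the project.
    static float Heuristic(Node a, Node b) => Vector3.Distance(a.Position, b.Position);

    public static List<Node> FindPath(Node start, Node goal)
    {
        var open = new List<Node> { start };              // stand-in for a min-heap
        var closed = new HashSet<Node>();
        var cameFrom = new Dictionary<Node, Node>();
        var g = new Dictionary<Node, float> { [start] = 0f };

        while (open.Count > 0)
        {
            // Extract the open node with the lowest f = g + h.
            Node current = open[0];
            float bestF = g[current] + Heuristic(current, goal);
            foreach (var n in open)
            {
                float f = g[n] + Heuristic(n, goal);
                if (f < bestF) { current = n; bestF = f; }
            }
            if (current == goal) return Reconstruct(cameFrom, current);

            open.Remove(current);
            closed.Add(current);

            foreach (var next in current.Neighbours)
            {
                if (closed.Contains(next)) continue;
                float tentative = g[current] + Vector3.Distance(current.Position, next.Position);
                if (!g.ContainsKey(next) || tentative < g[next])
                {
                    g[next] = tentative;
                    cameFrom[next] = current;
                    if (!open.Contains(next)) open.Add(next);
                }
            }
        }
        return null; // no path exists
    }

    static List<Node> Reconstruct(Dictionary<Node, Node> cameFrom, Node current)
    {
        var path = new List<Node> { current };
        while (cameFrom.TryGetValue(current, out var prev))
        {
            current = prev;
            path.Insert(0, current);
        }
        return path;
    }
}
```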
PPO over simpler RL algorithms. PPO's clipped surrogate objective keeps each policy update within a trust region, so training does not destabilise after a bad batch of experience. That makes it robust without requiring experience replay buffers or delicate learning-rate schedules.
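For reference, the clipped surrogate objective from Schulman et al. (2017), where $r_t(\theta)$ is the probability ratio between the new and old policies and the clip range $\epsilon$ bounds how far a single update can move it:

$$
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$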
No asset-store AI plugins. Every algorithm — Q-table update via the Bellman equation, A* graph construction, PPO hyperparameter tuning — was implemented and understood directly, consistent with the principle of learning by building rather than by plugging in.
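The Q-table update itself is only a few lines; this sketch assumes a dictionary-of-arrays layout, which is illustrative rather than the project's exact structure:

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class QLearningSketch
{
    // Q(s,a) ← Q(s,a) + α [ r + γ · max_a' Q(s',a') − Q(s,a) ]
    public static void UpdateQ(Dictionary<int, float[]> qTable,
                               int state, int action, float reward, int nextState,
                               float alpha = 0.1f, float gamma = 0.99f)
    {
        float maxNext = Mathf.Max(qTable[nextState]); // best value reachable from s'
        qTable[state][action] += alpha * (reward + gamma * maxNext - qTable[state][action]);
    }
}
```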
Five working agents across six AI/ML paradigms were delivered for the course at IILM AHL. The self-driving car agent converged to stable track navigation significantly faster than a cold-RL baseline by combining Imitation Learning warm-start with PPO fine-tuning. The project built foundational RL knowledge — observation space design, reward shaping, policy update constraints, .onnx inference integration — that now informs AI tooling and prompt/reward design across subsequent projects. The 14-year arc from a hand-coded car game in 3D RAD (2009) to a self-driving car NPC trained with reinforcement learning (2023) is one of the clearest through-lines in the portfolio narrative.