RL-Based Low-Level Motor Control: Beyond PID/ADRC

LLMs cannot take over 1kHz real-time motor control. This article introduces WIM's approach to replacing manual PID/ADRC tuning with reinforcement learning (RL) in that domain.

WIM Robotics Team
Tags: robotics, reinforcement-learning, motor-control, sim-to-real, real-time

"Can't you just build a robot controller with an LLM?"

With the recent advances in AI code generation tools like Claude Code and Cursor, we get this question a lot. The short answer is: the domains LLMs can and cannot handle are clearly distinct. This article explains how WIM applies reinforcement learning (RL) in the low-level control domain that LLMs can never replace.

Bottom Line Up Front

| Category | LLM / Code Generation Tools | WIM RL Motor Controller |
|---|---|---|
| Control Cycle | 500--2,000ms | 1ms (1kHz) |
| Output | Text / Code | Motor torque/velocity/position commands |
| New Robot Adaptation | Rewrite code | Re-train policy (automated) |
| Adaptation Method | Modify prompts | Sim-to-Real training |
| Safety Mechanism | None | PID/ADRC fallback + Safety Layer |

The key point: LLMs respond on a 500--2,000ms time scale, while motor control must output a torque command every 1ms. A system that is 500 to 2,000 times slower cannot stand in for the control loop.

The Hierarchical Structure of Robot Control

Robot control is not a single layer. There is a clear hierarchy based on time scale.

The top two layers (natural language interface, high-level control policy) are already being actively developed by big tech. Frameworks like Google's Gemini Robotics, NVIDIA's Isaac ROS, and MoveIt2 address this domain.

WIM focuses on what lies beneath -- the low-level layer that directly controls motors at 1kHz.

Limitations of Traditional Approaches: Manual PID/ADRC Tuning

Motor control in industrial robots has traditionally relied on PID or ADRC (Active Disturbance Rejection Control) controllers.

$$\tau_{PID} = K_p \cdot e(t) + K_i \int e(t) \, dt + K_d \cdot \dot{e}(t)$$

Here, $e(t)$ is the tracking error, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative gains, respectively. The problem is that these gain values must be manually tuned for each robot, each joint, and each load condition.
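To make the tuning burden concrete, here is a minimal discrete-time sketch of the PID law above, sampled at the 1ms control period. The class name `DiscretePID` and the gain values are illustrative assumptions, not WIM's implementation; a production controller would also need anti-windup and derivative filtering.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Textbook PID sampled at a fixed period (hypothetical helper)."""
    kp: float
    ki: float
    kd: float
    dt: float = 0.001        # 1 ms control period (1 kHz)
    integral: float = 0.0    # running approximation of the integral of e(t)
    prev_error: float = 0.0  # error from the previous cycle

    def step(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt                 # rectangle-rule integral
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Illustrative gains only; every joint and load would need its own values.
pid = DiscretePID(kp=2.0, ki=0.5, kd=0.0)
torque = pid.step(target=1.0, measured=0.0)
```

The three numbers passed to the constructor are exactly the values an engineer has to re-tune per robot, per joint, and per load condition.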

| Problem | Description |
|---|---|
| Manual Tuning Cost | Engineers spend days to weeks per robot |
| Nonlinear Dynamics | Linear controllers cannot fully compensate for friction, backlash, gravity, etc. |
| Multi-Model Support | Tuning must restart from scratch when switching robot models |
| Environmental Changes | Performance degrades with changes in load, temperature, and wear |

In our previous work on acceleration feedforward optimization, we had to compare six different methods just to improve a single controller parameter. Automating this manual process is the core motivation for adopting RL.

WIM's Approach: Direct Motor Control with RL

WIM replaces PID/ADRC with an RL-based neural network. A policy trained in simulation directly outputs motor torque on real robots.
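As a rough illustration of what "a neural network that outputs torque" means at inference time, the sketch below runs a tiny fully connected policy in plain Python. The network shape, the weights, the observation layout, and the function name `mlp_policy` are all assumptions made for this example; the article does not describe WIM's actual architecture.

```python
import math


def mlp_policy(obs, layers, torque_limit):
    """Forward pass of a small tanh MLP (hypothetical policy).

    obs          : list of floats, e.g. [position_error, velocity]
    layers       : list of (weights, biases); weights holds one row per unit
    torque_limit : final tanh output in [-1, 1] is scaled to this bound
    """
    x = list(obs)
    for weights, biases in layers:
        x = [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return [torque_limit * y for y in x]


# Made-up weights for one joint: 2 inputs -> 2 hidden units -> 1 torque.
hidden = ([[0.8, -0.2], [0.1, 0.5]], [0.0, 0.0])
output = ([[1.0, -1.0]], [0.0])
torque = mlp_policy([0.1, -0.3], [hidden, output], torque_limit=5.0)
```

Because the last layer passes through `tanh` before scaling, the commanded torque can never exceed `torque_limit`, which is one simple way to keep a learned policy inside actuator bounds.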

WIMPACK Architecture

Sim-to-Real Training Pipeline

The key is that simulation training and real robot validation proceed simultaneously. The Sim-to-Real gap is continuously measured, and the simulator is calibrated using feedback from the real robot.
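One way to picture "measuring the gap and calibrating the simulator" is the sketch below. It scores the gap as the RMSE between simulated and measured joint velocity, then picks the friction coefficient that minimizes it. The single-joint model, the parameter names, and the grid search are simplifying assumptions for illustration, not WIM's pipeline.

```python
import math


def rmse(sim, real):
    """Root-mean-square error between two equally long trajectories."""
    if len(sim) != len(real):
        raise ValueError("trajectories must have the same length")
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, real)) / len(sim))


def simulate_velocity(torques, friction, inertia=0.01, dt=0.001):
    """Toy joint model: inertia * dv/dt = torque - friction * v (Euler steps)."""
    v, trajectory = 0.0, []
    for tau in torques:
        v += dt * (tau - friction * v) / inertia
        trajectory.append(v)
    return trajectory


def calibrate_friction(torques, real_velocity, candidates):
    """Return the candidate friction value with the smallest sim-to-real gap."""
    return min(
        candidates,
        key=lambda f: rmse(simulate_velocity(torques, f), real_velocity),
    )


# Stand-in for logged hardware data: generated here with friction = 0.05.
torques = [0.1] * 200
real_velocity = simulate_velocity(torques, friction=0.05)
best = calibrate_friction(torques, real_velocity, candidates=[0.01, 0.05, 0.1])
```

In practice the "real" trajectory would come from robot logs and the calibrated parameter set would be much larger, but the loop of simulate, compare, and adjust is the same idea.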

PID/ADRC Retained as a Safety Fallback

While the RL policy serves as the primary controller, PID/ADRC is not entirely removed. A safety fallback structure is maintained that immediately switches to the classical controller upon anomaly detection.

This structure is important for compliance with industrial robot safety standards (ISO 10218). While certification may be difficult with a neural network controller alone, having a proven classical controller as a fallback makes it significantly easier to demonstrate safety.
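The fallback switch described above can be sketched as a small supervisor that sits between both controllers and the motor. The anomaly checks (non-finite output, torque over limit, excessive tracking error), the thresholds, and the latching behavior are assumptions chosen for illustration; the article does not specify WIM's detection criteria.

```python
import math
from dataclasses import dataclass


@dataclass
class SafetySupervisor:
    """Chooses between the RL torque and the classical fallback (hypothetical)."""
    torque_limit: float
    max_tracking_error: float
    latched: bool = False  # once tripped, stay on the fallback until reset

    def select(self, rl_torque, fallback_torque, tracking_error):
        anomaly = (
            not math.isfinite(rl_torque)
            or abs(rl_torque) > self.torque_limit
            or abs(tracking_error) > self.max_tracking_error
        )
        if anomaly:
            self.latched = True
        chosen = fallback_torque if self.latched else rl_torque
        # Clamp whichever command is used to the actuator limit.
        return max(-self.torque_limit, min(self.torque_limit, chosen))

    def reset(self):
        """Explicit operator action to hand control back to the RL policy."""
        self.latched = False


supervisor = SafetySupervisor(torque_limit=5.0, max_tracking_error=0.2)
normal = supervisor.select(rl_torque=1.0, fallback_torque=0.5, tracking_error=0.01)
tripped = supervisor.select(rl_torque=9.0, fallback_torque=0.5, tracking_error=0.01)
```

Latching is a deliberate choice in this sketch: after one anomaly the supervisor keeps using the classical controller even if the next RL output looks normal, so control does not flicker between the two.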

WIMPACK Software Stack

The RL controller does not operate in isolation. It runs on a full-stack software platform that includes real-time communication, hardware abstraction, and safety layers.

| Layer | Components | Role |
|---|---|---|
| Application | Sample Code, SDK | User applications |
| Intelligence | AI Code | LLM integration, natural language command processing |
| Control & Execution | Robot Control Code | RL Motor Controller + PID Fallback |
| Framework & Platform | ROS2, Robot HAL, Communication (EtherCAT/CAN), Sensor Driver | Middleware and hardware abstraction |
| Operating System | Custom RT Kernel, RTOS | Real-time guarantees |
| Physical Hardware | SoC (Jetson Orin AGX) | Compute platform |

On top of the SoC (Jetson Orin AGX), the RTOS guarantees real-time performance, ROS2/HAL abstracts the hardware, and the RL controller performs inference at 1kHz. Everything runs within a single embedded device.
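The logic of a fixed-rate 1kHz loop with deadline monitoring can be sketched as follows. This is Python for readability only: a real 1kHz loop would run as compiled code on the real-time kernel, and the function name `run_control_loop` and its injectable `clock`/`sleep` parameters are assumptions made so the sketch can be tested.

```python
import time

PERIOD_S = 0.001  # 1 ms control period (1 kHz)


def run_control_loop(step, n_cycles, period_s=PERIOD_S,
                     clock=time.perf_counter, sleep=time.sleep):
    """Call step() once per period and count cycles that miss their deadline."""
    overruns = 0
    next_deadline = clock() + period_s
    for _ in range(n_cycles):
        step()                      # read sensors, run inference, write torque
        now = clock()
        if now > next_deadline:
            overruns += 1           # step() took longer than its time budget
        else:
            sleep(next_deadline - now)
        next_deadline += period_s   # fixed schedule, so timing does not drift
    return overruns


if __name__ == "__main__":
    # On a general-purpose OS this may well report overruns; that is the point
    # of running the real loop under an RTOS.
    print(run_control_loop(lambda: None, n_cycles=5))
```

Advancing `next_deadline` by a fixed period, rather than sleeping a fixed amount after each step, keeps the schedule anchored to the clock even when individual cycles vary in length.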

Why LLMs and Code Generation Tools Cannot Replace This

Let us address this question head-on.

1. The Time Domain Gap

LLMs generate text token by token. Even the fastest LLMs take hundreds of milliseconds to produce their first response. Motor control must compute a new torque command every 1ms. This gap is a fundamental architectural limitation that cannot be solved by reducing model size.
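A quick back-of-the-envelope check of that gap, using the 500--2,000ms response range quoted earlier in this article; the helper name `missed_cycles` is made up for the example.

```python
CONTROL_PERIOD_MS = 1.0  # 1 kHz control loop


def missed_cycles(latency_ms, period_ms=CONTROL_PERIOD_MS):
    """Number of control periods that elapse while waiting latency_ms."""
    return int(latency_ms // period_ms)


for latency in (500.0, 2000.0):
    print(f"{latency:.0f} ms latency spans {missed_cycles(latency)} control cycles")
# prints:
# 500 ms latency spans 500 control cycles
# 2000 ms latency spans 2000 control cycles
```

In other words, a controller waiting on a single LLM response would skip hundreds to thousands of torque updates.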

2. The Nature of the Output Is Fundamentally Different

The output of an LLM is code (text). The output of RL training is a set of neural network weights, learned over hundreds of thousands of episodes in simulation; at run time those weights, not generated code, produce the motor commands. It is not a matter of "writing better code" -- it is knowledge acquired through interaction with the robot's dynamics, first in simulation and then validated on real hardware.

3. Scalability Across Robot Models

When building controllers with code generation tools, the code must be rewritten every time the robot changes. With an RL-based approach, you load the new robot model in the simulator and re-train. Every additional robot model repeats the manual rewrite in the first case and only an automated training run in the second, so the effort saved grows with the number of models supported.

Key Takeaways

  • High-level control (natural language to commands, task planning) is big tech's domain, with LLMs and VLAs advancing rapidly
  • Low-level motor control (1kHz real-time torque output) is a domain that LLMs cannot replace
  • WIM replaces PID/ADRC with RL at the low-level control layer, while retaining classical controllers as a fallback for safety
  • A Sim-to-Real pipeline runs simulation training and real robot validation in parallel
  • When the robot model changes, automated policy re-training handles adaptation -- a platform that eliminates manual tuning
