REINFORCEMENT LEARNING–ASSISTED PPO–PD TRAJECTORY TRACKING CONTROL FOR A 6-DOF ROBOTIC MANIPULATOR: A COMPARATIVE SIMULATION WITH FEEDFORWARD PD

Le Thi Minh Tam; Nguyen Viet Ngu; Pham Duc Hung

Le Thi Minh Tam, Hung Yen University of Technology and Education
Nguyen Viet Ngu, Hung Yen University of Technology and Education
Pham Duc Hung, Hung Yen University of Technology and Education

Abstract

This paper presents a reinforcement learning–assisted trajectory tracking controller for a 6-DOF robotic manipulator. A hybrid Proximal Policy Optimization–Proportional–Derivative (PPO–PD) scheme is designed, in which a PPO agent learns a residual torque that augments a nominal feedforward PD controller. Both controllers are evaluated in simulation on joint-space sinusoidal trajectories and their corresponding planar end-effector motions. Compared with the pure feedforward PD controller (PD_forward), the PPO–PD controller reduces the root-mean-square (RMS) end-effector position error from 3.95 mm to 2.38 mm (a 39.7% reduction) and the peak error from 13.44 mm to 8.46 mm (a 37.1% reduction). The average RMS joint-position error across the six joints decreases from 0.00493 rad to 0.00298 rad (39.5% lower), while the RMS control torque decreases by 6.0% without increasing the maximum torque. These results indicate that, in simulation, the proposed PPO–PD controller improves tracking accuracy and torque efficiency over the classical PD framework while retaining the stability and interpretability that make the PD structure suitable for industrial applications.
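As a rough illustration of the residual structure described above, the following Python sketch adds a learned residual torque to a nominal feedforward PD torque and recomputes the reported end-effector error reductions. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the per-joint gain lists `kp` and `kd`, and the symmetric clipping bound `residual_limit` are all hypothetical, and the residual is passed in as a plain list rather than produced by a trained PPO policy.

```python
import math


def pd_feedforward_torque(q, dq, q_ref, dq_ref, tau_ff, kp, kd):
    """Nominal per-joint torque: feedforward term plus PD feedback on the tracking error."""
    return [
        ff + kp_i * (qr - qi) + kd_i * (dqr - dqi)
        for qi, dqi, qr, dqr, ff, kp_i, kd_i in zip(q, dq, q_ref, dq_ref, tau_ff, kp, kd)
    ]


def ppo_pd_torque(tau_nominal, tau_residual, residual_limit):
    """Add a learned residual to the nominal torque.

    Clipping the residual to +/- residual_limit is an assumption made here
    so that the PD term stays dominant; the source does not specify it.
    """
    return [
        tn + max(-residual_limit, min(residual_limit, tr))
        for tn, tr in zip(tau_nominal, tau_residual)
    ]


def rms(values):
    """Root-mean-square of a sequence of error samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def percent_reduction(baseline, improved):
    """Relative reduction of a metric with respect to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # End-effector errors reported in the abstract (mm).
    print(round(percent_reduction(3.95, 2.38), 1))   # RMS error reduction: 39.7
    print(round(percent_reduction(13.44, 8.46), 1))  # peak error reduction: 37.1
```

In this form the PD feedback and feedforward terms remain explicit, and the learned policy only contributes a bounded correction, which is one way to read the interpretability claim made in the abstract.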
