
Embodied Robot Task Planning Training Platform RAI-P4
An all-in-one training platform for embodied interactive agents, integrating foundation models, voice, vision, robotic arms, and gimbals. Compatible with mainstream models such as DeepSeek, Qwen, and Doubao, and supporting OpenCV, YOLO, and multimodal VLM workflows. Practical course modules take agent design from concept to deployment, covering robot control, agent development, and vision integration for university-level training.
Applicable Audience/Scenarios
Ideal for university programs in artificial intelligence, robotics, automation, and computer science that span LLM applications, computer vision, machine learning, deep learning, embedded development, sensing and control, ROS, robotics, simulation, and intelligent system integration.
Highlights
- Unified platform for AI speech, AI vision, and manipulator control
- Desktop-friendly footprint (60 cm × 60 cm) for rapid deployment
- Progressive path supporting 4-DOF through 6-DOF manipulators
Product Features
Integrated AI and robotics stack
Built around the requirements of an intelligent manipulator, the platform combines AI speech interaction, AI vision recognition, an AI edge board, and sensors such as color-recognition and IMU modules, supporting the full chain from perception to decision to execution.
Ready-to-run teaching deployment
Hardware and software are pre-aligned at the factory. No extra PCs or tooling are needed; a 60 cm × 60 cm desktop is enough to launch experiments in labs, innovation studios, or mobile workshops.
Progressive manipulator training
Starts with a 4-DOF serial manipulator and scales to typical 6-DOF configurations. Complementary exercises cover kinematics, motion control, simulation, and ROS so students can advance step by step.
Lab Scenarios
Configuration
Sensor Configuration
Provides multimodal inputs required for embodied task planning, covering speech, vision, and motion feedback.
- AI speech interaction microphone array
- Vision pan-tilt camera module
- Posture sensing IMU
- Manipulator camera module
Controller Configuration
An AI edge board with open I/O delivers both LLM/vision inference and manipulator/peripheral control, ensuring tight hardware–software integration.

Software Configuration
Ships with Ubuntu and ROS 2 (including RViz and MoveIt), plus Jupyter, VS Code, and Python 3.9 so classes can start deploying algorithms immediately.
Compatible with mainstream AI/robotics ecosystems such as OpenCV, YOLO, LLM SDKs, and MoveIt, supporting both teaching and research.

Experiments
More than 40 sub-projects span manipulator control, sensing, computer vision, LLM voice dialogue, system integration, ROS, and embedded development, enabling cross-disciplinary skill building.
Manipulator control fundamentals
- Manipulator kinematics control: 4 class hours | Build forward/inverse kinematics and joint trajectory planning for the 4-DOF arm (see the kinematics sketch after this list).
- Linear interpolation control: 2 class hours | Execute end-effector linear trajectories and manage velocity/acceleration profiles.
- Circular interpolation control: 2 class hours | Generate spatial arc trajectories while maintaining attitude control.
- Stacking and handling tasks: 4 class hours | Combine coordinate calibration and grasp strategies to plan multi-point handling.
- Drawing geometric patterns: 4 class hours | Produce planar geometric figures through custom trajectory generation.
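As a taste of what the kinematics module covers, here is a minimal planar two-link forward/inverse kinematics sketch in Python. It is illustrative only: the link lengths are hypothetical placeholders, and the actual labs would use the platform's own 4-DOF arm model and parameters.

```python
# Minimal planar 2-link kinematics sketch (illustrative only; the RAI-P4
# labs use the platform's own 4-DOF arm model and link parameters).
import math

L1, L2 = 0.12, 0.10  # hypothetical link lengths in meters

def forward_kinematics(theta1, theta2):
    """End-effector (x, y) for joint angles in radians."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, elbow_up=True):
    """One analytic IK solution for a reachable (x, y) target."""
    c2 = (x**2 + y**2 - L1**2 - L2**2) / (2 * L1 * L2)
    c2 = max(-1.0, min(1.0, c2))            # clamp numerical noise
    theta2 = math.acos(c2) * (1 if elbow_up else -1)
    k1, k2 = L1 + L2 * math.cos(theta2), L2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2

x, y = forward_kinematics(0.5, -0.3)
print(inverse_kinematics(x, y, elbow_up=False))  # recovers (0.5, -0.3)
```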
Sensor acquisition & control
- IMU data acquisition: 2 class hours | Read posture sensor data, complete orientation estimation, and apply filtering (see the filter sketch after this list).
- Gesture-controlled manipulator: 2 class hours | Drive the manipulator via posture sensor input for embodied interaction.
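For the orientation-estimation step, a complementary filter is a common starting point. The sketch below is a minimal Python version; read_imu() is a hypothetical stand-in for the platform's IMU driver, whose axis conventions and units may differ.

```python
# Complementary-filter sketch for pitch estimation from raw IMU samples.
import math, time

ALPHA = 0.98  # trust gyro integration short-term, accelerometer long-term

def read_imu():
    """Stand-in for the platform IMU driver; returns (ax, ay, az, gx, gy, gz)."""
    return 0.0, 0.0, 9.81, 0.0, 0.0, 0.0  # level, stationary dummy sample

pitch, last = 0.0, time.time()
for _ in range(100):
    ax, ay, az, gx, gy, gz = read_imu()
    now = time.time(); dt = now - last; last = now
    accel_pitch = math.atan2(-ax, math.hypot(ay, az))  # gravity-based estimate
    pitch = ALPHA * (pitch + gy * dt) + (1 - ALPHA) * accel_pitch
print(f"pitch estimate: {math.degrees(pitch):.1f} deg")
```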
Computer vision basics (OpenCV)
- Color recognition: 2 class hours | Convert color spaces and segment targets with OpenCV (see the sketch after this list).
- Shape recognition: 2 class hours | Extract contours and match geometric features to classify shapes.
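A minimal OpenCV color-segmentation sketch of the kind this lab builds on. The HSV range shown targets red-ish objects and is only a starting point; it would need tuning for the platform's camera and lighting.

```python
# HSV color segmentation with OpenCV: threshold a color range, then find
# and box the resulting blobs.
import cv2
import numpy as np

frame = cv2.imread("scene.jpg")                      # or a live camera frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 500:                     # ignore small noise blobs
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```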
AI vision (YOLO)
- YOLO deployment: 2 class hours | Deploy YOLO on the embedded board for real-time inference (see the sketch after this list).
- Face detection: 2 class hours | Load pretrained weights to detect faces and output bounding boxes.
- Face tracking: 2 class hours | Combine the pan-tilt unit and vision feedback for dynamic face tracking.
- Dataset annotation: 2 class hours | Annotate detection datasets and handle format conversions.
- Model training & deployment: 2 class hours | Fine-tune, quantize, and deploy YOLO models.
- Workpiece inspection: 2 class hours | Build application-specific detection to classify and localize workpieces.
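One common way to run real-time YOLO inference in Python is the Ultralytics package, sketched below. The course image may ship a different YOLO toolchain, so treat the weights file, camera index, and package choice as assumptions.

```python
# Real-time YOLO inference sketch using the Ultralytics package.
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")            # pretrained weights; swap in your own
cap = cv2.VideoCapture(0)             # camera index is platform-dependent
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]   # one Results object per frame
    annotated = results.plot()                 # draw boxes and class labels
    cv2.imshow("yolo", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```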
AI vision (Tongyi Qianwen multimodal)
- Qwen multimodal API deployment: 2 class hours | Call the Tongyi Qianwen API for image understanding and text generation (see the sketch after this list).
- Fruit detection & annotation: 2 class hours | Use Tongyi Qianwen to recognize fruit targets and generate semantic labels.
LLM applications (AI voice dialogue)
- ASR deployment: 2 class hours | Configure the Tongyi Qianwen ASR service to parse voice input.
- LLM semantic planner: 2 class hours | Deploy DeepSeek to handle intent understanding and task planning.
- TTS deployment: 2 class hours | Integrate Volcano Engine TTS for natural voice responses.
- End-to-end voice dialogue: 2 class hours | Chain ASR, LLM, and TTS to build a full conversational loop.
- Function-call voice calculator: 2 class hours | Implement voice-driven calculations via LLM function calls (see the sketch after this list).
- Function-call music playback: 2 class hours | Control music retrieval and playback through voice commands.
- Function-call pan-tilt task planner: 4 class hours | Use voice instructions to drive pan-tilt tracking and target search.
- Function-call manipulator task planner: 4 class hours | Trigger vision-based positioning and grasping through voice commands.
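The function-call labs hinge on the LLM emitting structured tool calls instead of free text. Below is a minimal calculator sketch against DeepSeek's OpenAI-compatible chat API; the endpoint and model name are assumptions, and the ASR/TTS stages of the full voice loop are omitted.

```python
# LLM function-calling sketch: the model decides to call a calculator tool.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

def calculate(expression: str) -> str:
    # Restricted eval for arithmetic only; a real lab should use a parser.
    return str(eval(expression, {"__builtins__": {}}, {}))

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What is 12 times 34 plus 5?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]  # assumes the model called the tool
args = json.loads(call.function.arguments)
print(calculate(args["expression"]))          # model supplies e.g. "12*34+5"
```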
Robotic system integration
- Socket communication: 2 class hours | Build a socket channel and exchange commands between subsystems (see the sketch after this list).
- Vision-driven manipulator tracking: 4 class hours | Map vision data to the manipulator coordinate frame for dynamic tracking.
- Vision–manipulator hand–eye calibration: 2 class hours | Complete hand–eye calibration to map pixels to poses.
- Vision-based sorting: 4 class hours | Combine perception, planning, and execution to complete sorting tasks.
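A minimal JSON-over-TCP command channel of the kind the socket lab builds, using only the Python standard library. The {"cmd": ..., "args": ...} schema is illustrative, not the platform's actual protocol.

```python
# One-shot JSON command exchange between two subsystems over TCP.
import json, socket, threading, time

def serve(host="127.0.0.1", port=9090):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port)); srv.listen(1)
    conn, _ = srv.accept()
    with conn:
        msg = json.loads(conn.recv(4096).decode())   # e.g. a grasp command
        conn.sendall(json.dumps({"ok": True, "echo": msg}).encode())
    srv.close()

threading.Thread(target=serve, daemon=True).start()
time.sleep(0.2)                                      # let the server listen

cli = socket.create_connection(("127.0.0.1", 9090))
cli.sendall(json.dumps({"cmd": "grasp", "args": {"x": 0.1, "y": 0.2}}).encode())
print(json.loads(cli.recv(4096).decode()))
cli.close()
```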
ROS (Robot Operating System)
- Run a ROS2 project quickly: 2 class hours | Create, build, and run ROS2 workspaces (see the node sketch after this list).
- Build and port ROS2 packages: 2 class hours | Create packages, manage dependencies, and port functionality.
- MoveIt configuration: 2 class hours | Configure MoveIt scenes, import collision models, and validate planning.
- 4-DOF MoveIt/RViz simulation: 2 class hours | Control the 4-DOF arm in RViz and verify trajectories.
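A minimal rclpy publisher node, the kind of starting point the first ROS2 lab uses. The /arm_cmd topic name and the command string are illustrative, not the platform's real interface.

```python
# Minimal ROS 2 (rclpy) node that publishes a command string once per second.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class ArmCmdPublisher(Node):
    def __init__(self):
        super().__init__("arm_cmd_publisher")
        self.pub = self.create_publisher(String, "/arm_cmd", 10)
        self.timer = self.create_timer(1.0, self.tick)

    def tick(self):
        msg = String()
        msg.data = "home"                 # placeholder command
        self.pub.publish(msg)
        self.get_logger().info(f"published: {msg.data}")

def main():
    rclpy.init()
    node = ArmCmdPublisher()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()

if __name__ == "__main__":
    main()
```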
Embedded system development
- Ubuntu filesystem essentials: 1 class hour | Learn common directory structures and file commands.
- Editor familiarization (vi / nano): 1 class hour | Practice terminal editor basics and configuration.
- Remote access setup (SSH / PuTTY): 2 class hours | Configure remote connections for collaborative development.
- Linux file I/O programming: 2 class hours | Implement file read/write with proper exception handling.
- Serial communication: 2 class hours | Exchange serial data and design simple protocols (see the sketch after this list).
- Process / thread management: 2 class hours | Understand Linux processes and threads and write sample programs.
- Interface design: 2 class hours | Quickly build human–machine interfaces with Python/Qt.
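For the serial-communication lab, a pyserial sketch like the one below covers the basic request/reply pattern. The device path, baud rate, and PING message are hypothetical; check the board's actual serial device and protocol.

```python
# Line-based serial request/reply with pyserial.
import serial

with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as port:
    port.write(b"PING\n")                 # toy request in a line-based protocol
    reply = port.readline().decode(errors="replace").strip()
    print(f"device replied: {reply!r}")
```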
Knowledge Base
Get more technical documentation, tutorials, and FAQs about this product.
Key Questions
Q3: Which products support LLM integration, and what can they do?
Related products: Embodied Composite Robot System Design Training Platform RAI-M4, Embodied Robot Task Planning Training Platform RAI-P4, Embodied Vision Perception & Decision Training Platform RAI-Q2
Answer: There are three products that support large-model integration:
RAI-P4: integrates Qwen, DeepSeek, and Volcano Engine; supports ASR (Qwen), LLM (DeepSeek), TTS (Volcano Engine), and function calling (such as voice-dialog calculators, music playback, and gimbal / robotic arm task planning), and also supports integrated applications with YOLO, face tracking, and robotic arm control.
RAI-M4: connects to DeepSeek (LLM) and Qwen (ASR + multimodal); supports converting natural language into robot task workflows (voice commands for chassis / robotic arm control) and multimodal object detection (Qwen), combining a mecanum chassis and a 4-axis robotic arm to achieve generalized manipulation.

