Embodied Robot Task Planning Training Platform RAI-P4

An all-in-one training platform for embodied interactive agents, integrating foundation models, speech, vision, a robotic manipulator, and a pan-tilt gimbal. It is compatible with mainstream models such as DeepSeek, Qwen, and Doubao, and supports OpenCV, YOLO, and multimodal VLM workflows. Practical course modules cover robot control, vision integration, and agent development from concept to deployment, supporting university-level professional training.

Applicable Audience/Scenarios

Ideal for university programs in artificial intelligence, robotics, automation, and computer science that span LLM applications, computer vision, machine learning, deep learning, embedded development, sensing and control, ROS, robotics, simulation, and intelligent system integration.

Highlights

  • Unified platform for AI speech, AI vision, and manipulator control
  • Desktop-friendly footprint (60 cm × 60 cm) for rapid deployment
  • Progressive path supporting 4-DOF through 6-DOF manipulators

Product Features

Integrated AI and robotics stack

Built around the requirements of an intelligent manipulator, the platform combines AI speech interaction, AI vision recognition, an AI edge board, and sensors such as color recognition and IMU modules—supporting the full chain from perception to decision and execution.

Ready-to-run teaching deployment

Hardware and software are pre-aligned at the factory. No extra PCs or tooling are needed; a 60 cm × 60 cm desktop is enough to launch experiments in labs, innovation studios, or mobile workshops.

Progressive manipulator training

Starts with a 4-DOF serial manipulator and scales to typical 6-DOF configurations. Complementary exercises cover kinematics, motion control, simulation, and ROS so students can advance step by step.

Lab Scenarios

Configuration

Sensor Configuration

Provides multimodal inputs required for embodied task planning, covering speech, vision, and motion feedback.

  • AI speech interaction microphone array
  • Vision pan-tilt camera module
  • Posture sensing IMU
  • Manipulator camera module

Controller Configuration

An AI edge board with open I/O delivers both LLM/vision inference and manipulator/peripheral control, ensuring tight hardware–software integration.

Controller Overview

Software Configuration

Ships with Ubuntu and ROS 2 (including RViz and MoveIt), plus Jupyter, VS Code, and Python 3.9, so classes can start deploying algorithms immediately.

Compatible with mainstream AI/robotics ecosystems such as OpenCV, YOLO, LLM SDKs, and MoveIt, supporting both teaching and research.

Ubuntu / ROS / RViz / VS Code / Python software suite

Experiments

More than 40 sub-projects span manipulator control, sensing, computer vision, LLM voice dialogue, system integration, ROS, and embedded development, enabling cross-disciplinary skill building.

Manipulator control fundamentals

  • Manipulator kinematics control | 4 class hours | Build forward/inverse kinematics and joint trajectory planning for the 4-DOF arm.
  • Linear interpolation control | 2 class hours | Execute end-effector linear trajectories and manage velocity/acceleration profiles.
  • Circular interpolation control | 2 class hours | Generate spatial arc trajectories while maintaining attitude control.
  • Stacking and handling tasks | 4 class hours | Combine coordinate calibration and grasp strategies to plan multi-point handling.
  • Drawing geometric patterns | 4 class hours | Produce planar geometric figures through custom trajectory generation.
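
The kinematics exercises above can be previewed with a simplified planar sketch. The snippet below assumes a hypothetical 2-link planar arm (link lengths are illustrative, not the RAI-P4's actual dimensions) and shows the forward/inverse kinematics relationship students extend to 4 DOF in the lab:

```python
import math

# Illustrative link lengths in meters (not the platform's real geometry).
L1, L2 = 0.10, 0.10

def forward(theta1, theta2):
    """End-effector (x, y) from joint angles in radians."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def inverse(x, y):
    """One elbow-up solution via the law of cosines."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))  # clamp for numeric safety
    k1 = L1 + L2 * math.cos(theta2)
    k2 = L2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2
```

Round-tripping a pose through `forward` and `inverse` is a quick sanity check before moving to joint trajectory planning.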

Sensor acquisition & control

  • IMU data acquisition | 2 class hours | Read posture sensor data, complete orientation estimation, and apply filtering.
  • Gesture-controlled manipulator | 2 class hours | Drive the manipulator via posture sensor input for embodied interaction.
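
The orientation-estimation step in the IMU lab is commonly done with a complementary filter. This is a minimal sketch under assumed inputs (a list of gyro-rate and accelerometer-pitch pairs), not the platform's actual driver code:

```python
def complementary_filter(samples, dt=0.01, alpha=0.98):
    """Fuse gyro rate (rad/s) with accelerometer pitch (rad).

    alpha weights the smooth-but-drifting integrated gyro against the
    noisy-but-drift-free accelerometer estimate.
    """
    pitch = 0.0
    for gyro_rate, accel_pitch in samples:
        pitch = alpha * (pitch + gyro_rate * dt) + (1 - alpha) * accel_pitch
    return pitch
```

With the arm held still (zero gyro rate), the estimate converges to the accelerometer's pitch, which is the behavior students verify before driving the manipulator by gesture.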

Computer vision basics (OpenCV)

  • Color recognition | 2 class hours | Convert color spaces and segment targets with OpenCV.
  • Shape recognition | 2 class hours | Extract contours and match geometric features to classify shapes.
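
The color-recognition lab hinges on converting RGB to HSV and thresholding hue. The sketch below shows that logic on a single pixel with the standard library; in the lab the same thresholds would be applied image-wide with `cv2.cvtColor` and `cv2.inRange`, and the hue cutoffs here are illustrative, to be tuned against the actual camera:

```python
import colorsys

def classify_pixel(r, g, b):
    """Coarse color label for one RGB pixel (0-255 channels)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:
        return "black"          # too dark to judge hue
    if s < 0.2:
        return "white/gray"     # too desaturated to judge hue
    deg = h * 360
    if deg < 30 or deg >= 330:
        return "red"
    if deg < 90:
        return "yellow"
    if deg < 180:
        return "green"
    if deg < 270:
        return "blue"
    return "magenta"
```

Working in HSV rather than RGB makes the thresholds far less sensitive to lighting changes, which is why the exercise converts color spaces first.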

AI vision (YOLO)

  • YOLO deployment | 2 class hours | Deploy YOLO on the embedded board for real-time inference.
  • Face detection | 2 class hours | Load pretrained weights to detect faces and output bounding boxes.
  • Face tracking | 2 class hours | Combine the pan-tilt unit and vision feedback for dynamic face tracking.
  • Dataset annotation | 2 class hours | Annotate detection datasets and handle format conversions.
  • Model training & deployment | 2 class hours | Fine-tune, quantize, and deploy YOLO models.
  • Workpiece inspection | 2 class hours | Build application-specific detection to classify and localize workpieces.
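
Dataset annotation and detector evaluation both lean on intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch, assuming `(x1, y1, x2, y2)` corner format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

The same function underpins matching annotations to predictions when scoring a fine-tuned workpiece detector.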

AI vision (Tongyi Qianwen multimodal)

  • Qwen multimodal API deployment | 2 class hours | Call the Tongyi Qianwen API for image understanding and text generation.
  • Fruit detection & annotation | 2 class hours | Use Tongyi Qianwen to recognize fruit targets and generate semantic labels.

LLM applications (AI voice dialogue)

  • ASR deployment | 2 class hours | Configure the Tongyi Qianwen ASR service to parse voice input.
  • LLM semantic planner | 2 class hours | Deploy DeepSeek to handle intent understanding and task planning.
  • TTS deployment | 2 class hours | Integrate Volcano Engine TTS for natural voice responses.
  • End-to-end voice dialogue | 2 class hours | Chain ASR, LLM, and TTS to build a full conversational loop.
  • Function-call voice calculator | 2 class hours | Implement voice-driven calculations via LLM function calls.
  • Function-call music playback | 2 class hours | Control music retrieval and playback through voice commands.
  • Function-call pan-tilt task planner | 4 class hours | Use voice instructions to drive pan-tilt tracking and target search.
  • Function-call manipulator task planner | 4 class hours | Trigger vision-based positioning and grasping through voice commands.
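
The function-call exercises share one pattern: the LLM is prompted to reply with a structured tool call, and local code dispatches it. The sketch below is a hypothetical skeleton (the tool table, JSON shape, and `calculate` helper are assumptions for the voice-calculator exercise, not the platform's actual API):

```python
import json

def calculate(expression: str) -> float:
    """Restricted arithmetic evaluator for the voice-calculator demo."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return eval(expression, {"__builtins__": {}}, {})

# Registry of callable tools the LLM may invoke by name.
TOOLS = {"calculate": calculate}

def dispatch(llm_reply: str):
    """Parse a reply like {"tool": "...", "args": {...}} and run the tool."""
    call = json.loads(llm_reply)
    return TOOLS[call["tool"]](**call["args"])
```

Music playback or manipulator task planning would slot in as additional entries in `TOOLS`, each wrapping the corresponding subsystem command.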

Robotic system integration

  • Socket communication | 2 class hours | Build a socket channel and exchange commands between subsystems.
  • Vision-driven manipulator tracking | 4 class hours | Map vision data to the manipulator coordinate frame for dynamic tracking.
  • Vision–manipulator hand–eye calibration | 2 class hours | Complete hand–eye calibration to map pixels to poses.
  • Vision-based sorting | 4 class hours | Combine perception, planning, and execution to complete sorting tasks.
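
The socket-communication lab boils down to one node sending a command string and another acknowledging it. A minimal loopback sketch (the `MOVE` command format is an assumed example, not the platform's real protocol):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 0  # port 0: let the OS pick a free port

def serve(server_sock):
    """Accept one connection and ACK the received command."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024).decode()
        conn.sendall(f"ACK {data}".encode())

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

# Client side: e.g. a vision node sending a target pose to the arm node.
with socket.create_connection((HOST, port)) as client:
    client.sendall(b"MOVE 120 80 30")
    reply = client.recv(1024).decode()
```

In the integration labs the same channel carries calibrated poses from the vision subsystem to the manipulator controller.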

ROS (Robot Operating System)

  • Run a ROS 2 project quickly | 2 class hours | Create, build, and run ROS 2 workspaces.
  • Build and port ROS 2 packages | 2 class hours | Create packages, manage dependencies, and port functionality.
  • MoveIt configuration | 2 class hours | Configure MoveIt scenes, import collision models, and validate planning.
  • 4-DOF MoveIt/RViz simulation | 2 class hours | Control the 4-DOF arm in RViz and verify trajectories.

Embedded system development

  • Ubuntu filesystem essentials | 1 class hour | Learn common directory structures and file commands.
  • Editor familiarization (vi / nano) | 1 class hour | Practice terminal editor basics and configuration.
  • Remote access setup (SSH / PuTTY) | 2 class hours | Configure remote connections for collaborative development.
  • Linux file I/O programming | 2 class hours | Implement file read/write with proper exception handling.
  • Serial communication | 2 class hours | Exchange serial data and design simple protocols.
  • Process / thread management | 2 class hours | Understand Linux processes and threads and write sample programs.
  • Interface design | 2 class hours | Quickly build human–machine interfaces with Python/Qt.
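
The file I/O exercise pairs reads and writes with explicit error handling. A small sketch of that pattern (file name and log contents are illustrative):

```python
import os
import tempfile

def save_log(path, lines):
    """Write log lines, converting OS errors into a clear failure."""
    try:
        with open(path, "w", encoding="utf-8") as f:
            for line in lines:
                f.write(line + "\n")
    except OSError as exc:
        raise RuntimeError(f"could not write {path}: {exc}") from exc

def read_log(path):
    """Read log lines back; a missing file is treated as an empty log."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    except FileNotFoundError:
        return []

log_path = os.path.join(tempfile.gettempdir(), "rai_p4_demo.log")
save_log(log_path, ["boot", "arm ready"])
entries = read_log(log_path)
```

Catching `OSError`/`FileNotFoundError` narrowly, rather than a bare `except`, is the habit the exercise reinforces before moving to serial and process programming.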

Knowledge Base

Get more technical documentation, tutorials, and FAQs about this product.


Key Questions

Q3: Which products support LLM integration and what can they do?
  • Embodied Composite Robot System Design Training Platform RAI-M4
  • Embodied Robot Task Planning Training Platform RAI-P4
  • Embodied Vision Perception & Decision Training Platform RAI-Q2

Answer: There are three products that support large-model integration:

RAI-P4: integrates Qwen, DeepSeek, and Volcano Engine. It supports ASR (Qwen), LLM reasoning (DeepSeek), TTS (Volcano Engine), and function calling (such as a voice calculator, music playback, and gimbal / robotic-arm task planning), and also supports integrated applications with YOLO, face tracking, and robotic-arm control.

RAI-M4: connects to DeepSeek (LLM) and Qwen (ASR + multimodal). It converts natural language into robot task workflows (voice commands for chassis / robotic-arm control) and performs multimodal object detection (Qwen), combining a mecanum chassis with a 4-axis robotic arm to achieve generalized manipulation.