v3.0.1 Stable Release

We Give Robots Vision

The open-source vision framework for edge devices. Runs DeepStream pipelines, YOLO detection, and world models at up to 60 FPS on NVIDIA Jetson, Intel NPU, and Hailo hardware.

openeyes-engine — bash
$ python -m src.main --camera 0 --enable-face --enable-gesture --enable-pose
[INFO] Initializing OpenEyes Vision Engine v3.0.1
[SYSTEM] Hardware detected: Jetson Orin Nano (8GB)
[SYSTEM] CUDA Available: True | TensorRT: True
[INFO] Loading YOLOv10n INT8 engine... DONE (1.2s)
[INFO] Initializing MediaPipe FaceMesh (max_faces=3)... DONE
[INFO] Initializing MediaPipe Hands... DONE
[INFO] Starting DeepStream pipeline via appsink...

FPS: 60 | Obj: 3 | Face: 1 | Hand: thumbs_up | Pose: 1
FPS: 60 | Obj: 3 | Face: 1 | Hand: thumbs_up | Pose: 1
FPS: 60 | Obj: 4 | Face: 1 | Hand: open_palm | Pose: 1
FPS: 59 | Obj: 4 | Face: 1 | Hand: open_palm | Pose: 1

Core Architecture

DeepStream Pipeline

Hardware-accelerated processing via GStreamer/DeepStream. Runs TensorRT YOLO engines directly on GPU, passing frames via appsink for zero-copy Python manipulation.
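The pipeline described above can be sketched as a gst-launch-style string builder. The element names (`v4l2src`, `nvvideoconvert`, `nvstreammux`, `nvinfer`, `appsink`) are standard GStreamer/DeepStream plugins, but the helper itself, its defaults, and the config-file path are illustrative assumptions, not the engine's actual builder:

```python
def build_pipeline(camera_id: int = 0, engine_config: str = "yolo_config.txt") -> str:
    """Assemble a DeepStream gst-launch string terminating in appsink.

    Illustrative sketch: element names are real DeepStream plugins,
    but the stage list and parameters are assumptions.
    """
    stages = [
        f"v4l2src device=/dev/video{camera_id}",      # camera capture
        "nvvideoconvert",                             # move frames into NVMM memory
        "mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720",
        f"nvinfer config-file-path={engine_config}",  # TensorRT YOLO engine on GPU
        "nvvideoconvert",                             # convert for CPU-side access
        "video/x-raw,format=RGBA",
        "appsink name=sink emit-signals=true max-buffers=1 drop=true",
    ]
    return " ! ".join(stages)
```

The `appsink` element at the tail is what hands finished frames to Python: its `new-sample` signal fires per buffer, so detection results can be consumed without an extra encode/decode round trip.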

👤

Multi-Modal Inference

Simultaneous FaceMesh (up to 3 faces), Hand tracking (8 defined gestures), and full-body Pose estimation running alongside primary object detection.
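MediaPipe Hands returns 21 landmarks per hand, indexed 0 (wrist) through 20 (pinky tip); that layout is real, but the classification rule below is an illustrative sketch of how two of the gestures seen in the terminal demo (`open_palm`, `thumbs_up`) could be derived from landmark geometry, not the engine's actual classifier:

```python
# Real MediaPipe 21-point hand landmark indices; the rule below is a sketch.
WRIST = 0
THUMB_TIP = 4
FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky tips
FINGER_PIPS = (6, 10, 14, 18)   # corresponding PIP joints

def classify_gesture(lm):
    """lm: list of 21 (x, y) tuples in image coordinates (y grows downward)."""
    # A finger counts as extended when its tip sits above its PIP joint.
    extended = sum(lm[tip][1] < lm[pip][1] for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
    # Thumb clearly above the wrist (margin is an arbitrary illustrative threshold).
    thumb_up = lm[THUMB_TIP][1] < lm[WRIST][1] - 0.2
    if extended == 4:
        return "open_palm"
    if extended == 0 and thumb_up:
        return "thumbs_up"
    return "unknown"
```

In practice such rules are evaluated per frame on the landmarks MediaPipe emits, alongside FaceMesh and Pose running in the same loop.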

📡

ROS2 Native

Publishes telemetry across 10 specialized topics (/vision/detections, /vision/depth, /vision/pose) using MultiThreadedExecutor.
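The fan-out pattern can be sketched without ROS2 installed. The three topic names come from the text above; the `publish_all` helper and payload keys are hypothetical, and a plain `ThreadPoolExecutor` stands in for rclpy's `MultiThreadedExecutor` to show the one-callback-per-topic concurrency:

```python
from concurrent.futures import ThreadPoolExecutor

# Topic names from the feature description; everything else is a sketch.
TOPICS = ("/vision/detections", "/vision/depth", "/vision/pose")

def publish_all(frame_result: dict, publish) -> list:
    """Fan each topic's slice of a frame result out to publish(topic, msg)
    on a thread pool, mirroring concurrent per-topic publishing."""
    with ThreadPoolExecutor(max_workers=len(TOPICS)) as pool:
        futures = [
            # Payload key is the last path segment, e.g. "detections".
            pool.submit(publish, topic, frame_result.get(topic.rsplit("/", 1)[-1]))
            for topic in TOPICS
        ]
        return [f.result() for f in futures]
```

In the real node each topic would have its own typed publisher and callback group; the thread pool here only illustrates why a multi-threaded executor matters when ten topics publish per frame.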

🧠

World Models

Predictive intelligence at 200 Hz with LeWM (15M params) and V-JEPA 2 for spatiotemporal awareness.
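LeWM's architecture is not detailed here; the sketch below only shows the general shape of a learned latent dynamics model — a small residual network predicting the next latent state from (state, action), rolled forward over candidate actions. Dimensions and weights are made up, not the actual 15M-parameter model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; the real model's sizes are not stated in the source.
STATE, ACTION, HIDDEN = 32, 4, 64

# Random weights stand in for trained parameters.
W1 = rng.normal(size=(STATE + ACTION, HIDDEN)) * 0.1
W2 = rng.normal(size=(HIDDEN, STATE)) * 0.1

def step(z, a):
    """One latent dynamics step: z_next = z + MLP([z, a]) (residual update)."""
    h = np.tanh(np.concatenate([z, a]) @ W1)
    return z + h @ W2

def rollout(z0, actions):
    """Roll the model forward over a sequence of candidate actions."""
    states = [z0]
    for a in actions:
        states.append(step(states[-1], a))
    return np.stack(states)

traj = rollout(np.zeros(STATE), [np.ones(ACTION)] * 10)
```

Because each step is a single small forward pass in latent space (no rendering, no perception), hundreds of such rollouts per second — the 200 Hz figure — is plausible on edge hardware.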

Performance Benchmarks

Tested on NVIDIA Jetson Orin Nano (8GB) in MAXN mode.

Configuration          | Models Active                     | Frame Rate | Latency
-----------------------|-----------------------------------|------------|--------
Detection Only (INT8)  | YOLOv10n TensorRT                 | 60 FPS     | 16 ms
Minimal Pipeline       | Detection + Depth + Tracking      | 35-40 FPS  | 28 ms
Full Pipeline (v3.0.1) | Detection + Face + Gesture + Pose | 25-30 FPS  | 38 ms
World Model Planning   | LeWM 15M (Inference Only)         | 200 Hz     | 5 ms
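A quick sanity check on the arithmetic, assuming the latency column is end-to-end time per frame: at 60 FPS the frame interval is 1000/60 ≈ 16.7 ms, which the 16 ms detection-only latency just fits inside, while the full pipeline's 38 ms latency exceeds its ~33-40 ms interval at 25-30 FPS, implying stages overlap across frames (the pipeline is pipelined rather than strictly serial):

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive frames at a given rate, in milliseconds."""
    return 1000.0 / fps

# Detection-only: 16 ms latency fits within one ~16.7 ms frame interval.
assert frame_interval_ms(60) > 16
# Full pipeline: 38 ms latency exceeds the ~33 ms interval at 30 FPS,
# so successive frames must be processed with overlapping stages.
assert frame_interval_ms(30) < 38
# World model: 5 ms per inference exactly matches the 200 Hz interval.
assert frame_interval_ms(200) == 5.0
```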