# Lab 04-02: Camera Image Processing for Robotics

## Objectives
By the end of this lab, you will be able to:
- Render camera images from MuJoCo simulation
- Process RGB and depth images for robotic perception
- Implement basic object detection using color segmentation
- Extract 3D point clouds from depth images
## Prerequisites
- Completed Lab 04-01 (IMU data reading)
- Understanding of camera projection models
- Familiarity with NumPy array operations
- Basic OpenCV knowledge (helpful but not required)
## Materials
| Type | Name | Tier | Notes |
|---|---|---|---|
| software | MuJoCo 3.0+ | required | Physics simulation with rendering |
| software | Python 3.10+ | required | Programming environment |
| software | NumPy, OpenCV | required | Image processing |
| software | Open3D | recommended | Point cloud visualization |
| simulation | humanoid-camera.xml | required | Model with camera sensors |
## Background
Cameras are essential sensors for robot perception, providing rich visual information about the environment. In robotics, we commonly use:
- RGB Cameras: Color images for object recognition, tracking
- Depth Cameras: Distance measurements for 3D reconstruction
- Stereo Cameras: Two cameras for depth estimation
MuJoCo supports rendering from arbitrary camera viewpoints, providing both color and depth buffers.
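In MJCF, a camera is declared with a `<camera>` element attached to a body, so it moves with that body. Below is a minimal, self-contained sketch; the body, camera placement, and FOV are illustrative, and the lab's `humanoid-camera.xml` defines its own `head_camera`:

```python
import mujoco

# Minimal MJCF with a body-mounted camera (illustrative names and values;
# not the actual contents of humanoid-camera.xml)
MINIMAL_XML = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
    <body name="head" pos="0 0 1">
      <geom type="sphere" size="0.1" rgba="1 0 0 1"/>
      <camera name="head_camera" pos="0 -0.2 0" xyaxes="1 0 0 0 0 1" fovy="45"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(MINIMAL_XML)
print(model.cam_fovy)  # vertical FOV in degrees, one entry per camera
```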
## Instructions

### Step 1: Camera Setup and Rendering
Configure and render from a robot-mounted camera:
```python
import mujoco
import numpy as np
import cv2

# Load model with camera
model = mujoco.MjModel.from_xml_path("textbook/assets/robot-models/mujoco/humanoid-camera.xml")
data = mujoco.MjData(model)

# Camera parameters
camera_name = "head_camera"
width, height = 640, 480

# Create renderer (argument order is model, height, width)
renderer = mujoco.Renderer(model, height, width)

def render_camera(model, data, renderer, camera_name):
    """
    Render RGB and depth images from a named camera.

    Returns:
        Tuple of (rgb_image, depth_image)
    """
    # Update scene from the current simulation state
    renderer.update_scene(data, camera=camera_name)

    # Render RGB
    rgb = renderer.render()

    # Render depth, then restore RGB mode
    renderer.enable_depth_rendering()
    depth = renderer.render()
    renderer.disable_depth_rendering()

    return rgb, depth

# Step simulation and render
mujoco.mj_step(model, data)
rgb_image, depth_image = render_camera(model, data, renderer, camera_name)

print(f"RGB shape: {rgb_image.shape}")
print(f"Depth shape: {depth_image.shape}")
print(f"Depth range: [{depth_image.min():.3f}, {depth_image.max():.3f}]")
```
Expected: RGB image shape (480, 640, 3), depth image shape (480, 640). Depth values represent distance in meters.
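If you want to render the very first frame without advancing the physics, the scene state must still be propagated; as the Common Errors section notes, rendering before any `mj_step()` requires an explicit forward pass. A minimal sketch:

```python
# mj_forward propagates positions/orientations into the scene without
# integrating time, so the renderer sees a valid initial state
mujoco.mj_forward(model, data)
rgb0, depth0 = render_camera(model, data, renderer, camera_name)
```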
### Step 2: Camera Intrinsics
Extract camera parameters for projection/unprojection:
```python
def get_camera_intrinsics(model, camera_name, width, height):
    """
    Compute the camera intrinsic matrix from MuJoCo camera parameters.

    Returns:
        K: 3x3 intrinsic matrix
        fov_y: Vertical field of view in radians
    """
    camera_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)

    # Get vertical field of view (stored in degrees)
    fov_y = model.cam_fovy[camera_id] * np.pi / 180.0

    # Compute focal lengths
    fy = height / (2.0 * np.tan(fov_y / 2.0))
    fx = fy  # assuming square pixels

    # Principal point (image center)
    cx = width / 2.0
    cy = height / 2.0

    # Intrinsic matrix
    K = np.array([
        [fx, 0, cx],
        [0, fy, cy],
        [0, 0, 1],
    ])
    return K, fov_y

K, fov_y = get_camera_intrinsics(model, camera_name, width, height)
print(f"Camera intrinsic matrix K:\n{K}")
print(f"Vertical FOV: {np.degrees(fov_y):.1f} degrees")
```
Expected: Valid intrinsic matrix with focal lengths around 500-800 pixels for typical FOVs (e.g., a 45° vertical FOV at 480 px image height gives fy ≈ 580).
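As a quick sanity check on K, you can project a known camera-frame point through the pinhole model and confirm it lands at the expected pixel. A minimal sketch (the test point is arbitrary):

```python
def project_point(point_cam, K):
    """Project a 3D point in camera coordinates (x right, y down, z forward)
    to pixel coordinates using the pinhole model."""
    x, y, z = point_cam
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return u, v

# A point 2 m straight ahead should project to the principal point
u, v = project_point((0.0, 0.0, 2.0), K)
print(f"Projected pixel: ({u:.1f}, {v:.1f})")  # expect (cx, cy) = (320, 240)
```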
### Step 3: Depth Image Processing

Sanitize the depth image and visualize it. Note that `mujoco.Renderer` returns metric depth directly (distance from the camera in meters), so no z-buffer linearization is needed here:

```python
def process_depth(depth_raw, near=0.01, far=10.0):
    """
    Sanitize a MuJoCo depth image.

    mujoco.Renderer linearizes the OpenGL z-buffer internally and returns
    depth in meters; pixels with no geometry land at (or beyond) the far
    plane. Clipping to [near, far] removes out-of-range values.

    Args:
        depth_raw: Depth image from the renderer, in meters
        near: Near clipping plane distance
        far: Far clipping plane distance

    Returns:
        Metric depth image in meters, clipped to [near, far]
    """
    return np.clip(depth_raw, near, far)

def visualize_depth(depth_metric, max_depth=5.0):
    """Create a colorized depth visualization."""
    # Normalize to [0, 255] for display
    depth_viz = np.clip(depth_metric / max_depth, 0, 1)
    depth_viz = (depth_viz * 255).astype(np.uint8)

    # Apply colormap (JET: blue = close, red = far)
    depth_colored = cv2.applyColorMap(depth_viz, cv2.COLORMAP_JET)
    return depth_colored

# Process and visualize depth
depth_metric = process_depth(depth_image)
depth_viz = visualize_depth(depth_metric)

# Save images (OpenCV expects BGR channel order)
cv2.imwrite('rgb_view.png', cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR))
cv2.imwrite('depth_view.png', depth_viz)

print(f"Metric depth range: [{depth_metric.min():.3f}, {depth_metric.max():.3f}] meters")
```
Expected: Depth visualization showing objects colored by distance (blue=close, red=far).
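For reference, if you ever read the raw OpenGL z-buffer (e.g., via the low-level `mjr_readPixels` API) instead of using `mujoco.Renderer`, the normalized values in [0, 1] must be linearized before use. A sketch of the standard conversion:

```python
def linearize_zbuffer(z_buf, near, far):
    """Convert normalized OpenGL z-buffer values in [0, 1] to metric depth."""
    z_ndc = 2.0 * z_buf - 1.0  # to normalized device coordinates in [-1, 1]
    return 2.0 * near * far / (far + near - z_ndc * (far - near))
```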
### Step 4: Color-Based Object Segmentation
Implement simple object detection using color:
```python
def segment_by_color(rgb_image, target_hsv_range):
    """
    Segment objects by HSV color range.

    Args:
        rgb_image: Input RGB image
        target_hsv_range: Tuple of ((h_min, s_min, v_min), (h_max, s_max, v_max))

    Returns:
        Binary mask where the target color is detected
    """
    # Convert to HSV
    hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)

    # Create mask
    lower = np.array(target_hsv_range[0])
    upper = np.array(target_hsv_range[1])
    mask = cv2.inRange(hsv, lower, upper)

    # Clean up with morphological operations
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

def find_object_centroid(mask):
    """Find the centroid of the largest segmented object."""
    # Find contours
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None

    # Find largest contour
    largest = max(contours, key=cv2.contourArea)

    # Compute centroid from image moments
    M = cv2.moments(largest)
    if M["m00"] == 0:
        return None
    cx = int(M["m10"] / M["m00"])
    cy = int(M["m01"] / M["m00"])
    return (cx, cy)

# Example: Detect red objects
# HSV range for red (note: red wraps around in HSV; only the lower band is
# used here -- see the sketch after this step for handling both bands)
red_range = ((0, 100, 100), (10, 255, 255))  # lower red
red_mask = segment_by_color(rgb_image, red_range)

# Find centroid
centroid = find_object_centroid(red_mask)
if centroid:
    print(f"Object centroid in image: {centroid}")
    # Draw on image
    result = rgb_image.copy()
    cv2.circle(result, centroid, 10, (0, 255, 0), 2)
    cv2.imwrite('detection_result.png', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))
```
Expected: If red objects are in the scene, the centroid should be detected and visualized.
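Because red hue wraps around 0 in OpenCV's HSV space (hue ranges over 0-179), a robust red detector combines masks from both ends of the hue axis, as hinted above. A minimal sketch reusing `segment_by_color`:

```python
# Combine the lower and upper red hue bands into a single mask
lower_red = segment_by_color(rgb_image, ((0, 100, 100), (10, 255, 255)))
upper_red = segment_by_color(rgb_image, ((170, 100, 100), (179, 255, 255)))
full_red_mask = cv2.bitwise_or(lower_red, upper_red)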
### Step 5: Depth-Based Point Cloud Generation
Convert depth image to 3D point cloud:
```python
def depth_to_pointcloud(depth_metric, K, rgb_image=None):
    """
    Convert a depth image to a 3D point cloud in the camera frame
    (x right, y down, z forward).

    Args:
        depth_metric: Depth image in meters
        K: Camera intrinsic matrix
        rgb_image: Optional RGB image for coloring points

    Returns:
        points: Nx3 array of 3D points
        colors: Nx3 array of RGB colors in [0, 1] (if rgb_image provided)
    """
    height, width = depth_metric.shape

    # Create pixel coordinate grids
    u = np.arange(width)
    v = np.arange(height)
    u, v = np.meshgrid(u, v)

    # Unproject to 3D via the pinhole model
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth_metric
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Stack into point cloud
    points = np.stack([x, y, z], axis=-1)
    points = points.reshape(-1, 3)

    # Filter invalid points (too far or too close)
    valid = (z.flatten() > 0.01) & (z.flatten() < 10.0)
    points = points[valid]

    colors = None
    if rgb_image is not None:
        colors = rgb_image.reshape(-1, 3)[valid] / 255.0
    return points, colors

# Generate point cloud
points, colors = depth_to_pointcloud(depth_metric, K, rgb_image)
print(f"Point cloud: {points.shape[0]} points")
print(f"Bounding box: X[{points[:, 0].min():.2f}, {points[:, 0].max():.2f}]")
print(f"              Y[{points[:, 1].min():.2f}, {points[:, 1].max():.2f}]")
print(f"              Z[{points[:, 2].min():.2f}, {points[:, 2].max():.2f}]")
```
Expected: Point cloud with thousands of valid 3D points forming the visible scene geometry.
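Steps 4 and 5 combine naturally: unprojecting the segmented centroid pixel through K at its measured depth gives the object's 3D position in the camera frame. A sketch assuming `centroid`, `depth_metric`, and `K` from the earlier steps:

```python
if centroid is not None:
    u_c, v_c = centroid          # pixel coordinates (column, row)
    z = depth_metric[v_c, u_c]   # index with row first
    if 0.01 < z < 10.0:          # same validity window as the point cloud
        x = (u_c - K[0, 2]) * z / K[0, 0]
        y = (v_c - K[1, 2]) * z / K[1, 1]
        print(f"Object position in camera frame: ({x:.3f}, {y:.3f}, {z:.3f}) m")
```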
### Step 6: Point Cloud Visualization (Optional)
If Open3D is available, visualize the point cloud:
```python
try:
    import open3d as o3d

    def visualize_pointcloud(points, colors=None):
        """Visualize a point cloud using Open3D."""
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(points)
        if colors is not None:
            pcd.colors = o3d.utility.Vector3dVector(colors)

        # Add coordinate frame
        coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5)

        # Visualize
        o3d.visualization.draw_geometries([pcd, coord_frame])

        # Save
        o3d.io.write_point_cloud("scene_pointcloud.ply", pcd)
        print("Saved point cloud to scene_pointcloud.ply")

    visualize_pointcloud(points, colors)

except ImportError:
    print("Open3D not installed. Skipping 3D visualization.")
    # Alternative: save as a simple text format
    np.savetxt('pointcloud.xyz', points, fmt='%.4f')
    print("Saved point cloud to pointcloud.xyz")
```
Expected: 3D visualization showing the scene reconstructed from depth data.
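If Open3D is not available, a matplotlib 3D scatter of a subsample gives a rough (if slower) preview. A minimal sketch, assuming matplotlib is installed:

```python
import matplotlib.pyplot as plt

# Subsample for speed; scatter-plotting a full cloud is slow
idx = np.random.choice(len(points), size=min(5000, len(points)), replace=False)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(points[idx, 0], points[idx, 1], points[idx, 2],
           s=1, c=colors[idx] if colors is not None else None)
ax.set_xlabel('X (m)')
ax.set_ylabel('Y (m)')
ax.set_zlabel('Z (m)')
plt.show()
```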
## Expected Outcomes
After completing this lab, you should have:
- Code artifacts:
  - `camera_utils.py`: Functions for camera rendering and intrinsics
  - `depth_processing.py`: Depth image processing utilities
  - `color_segmentation.py`: Object detection using color
  - `pointcloud.py`: Depth-to-3D conversion
- Visual outputs:
  - `rgb_view.png`: RGB camera image
  - `depth_view.png`: Colorized depth visualization
  - `detection_result.png`: Object detection overlay
  - `scene_pointcloud.ply`: 3D point cloud file
- Understanding:
  - Camera intrinsic parameters and projection
  - Depth image interpretation
  - Basic computer vision for robotics
  - 3D reconstruction from depth
## Rubric
| Criterion | Points | Description |
|---|---|---|
| Camera rendering | 15 | Correctly renders RGB and depth from simulation |
| Intrinsics computation | 15 | Accurate camera matrix from model parameters |
| Depth processing | 20 | Proper conversion to metric depth |
| Color segmentation | 20 | Object detection with morphological cleanup |
| Point cloud generation | 20 | Valid 3D reconstruction from depth |
| Code quality | 10 | Well-documented, modular functions |
| Total | 100 | |
## Common Errors

Error: Renderer returns a black image
Solution: Ensure mujoco.mj_forward() (or mj_step()) is called before rendering so the scene state is up to date.

Error: Depth values are all at the far-plane distance
Solution: Check the camera position and orientation - it may be inside an object or facing away from the scene.

Error: Point cloud appears stretched or distorted
Solution: Verify that the intrinsic matrix matches the actual rendering resolution and FOV.
## Extensions
For advanced students:
- Stereo Matching: Implement depth from stereo using two cameras (see the sketch after this list)
- Visual Odometry: Track camera motion using feature matching
- Object Recognition: Use deep learning for object classification
- SLAM Integration: Combine with IMU for simultaneous localization and mapping
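For the stereo matching extension, the key relationship is that depth is inversely proportional to disparity: z = fx * B / d, with fx the focal length in pixels, B the camera baseline, and d the disparity. A minimal sketch using OpenCV's block matcher (the camera names and baseline below are illustrative; humanoid-camera.xml does not necessarily define a stereo pair):

```python
# Render from two horizontally offset cameras (names are illustrative)
left_rgb, _ = render_camera(model, data, renderer, "left_camera")
right_rgb, _ = render_camera(model, data, renderer, "right_camera")

left_gray = cv2.cvtColor(left_rgb, cv2.COLOR_RGB2GRAY)
right_gray = cv2.cvtColor(right_rgb, cv2.COLOR_RGB2GRAY)

# Block matching; compute() returns fixed-point disparity scaled by 16
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0

baseline = 0.1  # meters between the two cameras (illustrative)
fx = K[0, 0]
valid = disparity > 0
depth_stereo = np.zeros_like(disparity)
depth_stereo[valid] = fx * baseline / disparity[valid]
```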
## Related Content
- Theory: Module 04 theory.md, Section 4.2 (Vision Systems)
- Previous Lab: Lab 04-01 (IMU Data Reading)
- Next Lab: Lab 04-03 (Sensor Fusion with Kalman Filter)
- Case Study: See Berkeley BAIR case study for vision-based manipulation