
Lab 04-02: Camera Image Processing for Robotics

Objectives

By the end of this lab, you will be able to:

  • Render camera images from MuJoCo simulation
  • Process RGB and depth images for robotic perception
  • Implement basic object detection using color segmentation
  • Extract 3D point clouds from depth images

Prerequisites

  • Completed Lab 04-01 (IMU data reading)
  • Understanding of camera projection models
  • Familiarity with NumPy array operations
  • Basic OpenCV knowledge (helpful but not required)

Materials

Type | Name | Tier | Notes
software | MuJoCo 3.0+ | required | Physics simulation with rendering
software | Python 3.10+ | required | Programming environment
software | NumPy, OpenCV | required | Image processing
software | Open3D | recommended | Point cloud visualization
simulation | humanoid-camera.xml | required | Model with camera sensors

Background

Cameras are essential sensors for robot perception, providing rich visual information about the environment. In robotics, we commonly use:

  1. RGB Cameras: Color images for object recognition, tracking
  2. Depth Cameras: Distance measurements for 3D reconstruction
  3. Stereo Cameras: Two cameras for depth estimation

MuJoCo supports rendering from arbitrary camera viewpoints, providing both color and depth buffers.
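
If your model does not yet define a camera, the sketch below shows the pattern: a named <camera> element attached to a body. The scene is a hypothetical stand-in (not the lab's humanoid-camera.xml), just enough to exercise the rendering pipeline that follows.

import mujoco

# Hypothetical minimal scene: a named camera 1.5 m above a red box.
# MuJoCo cameras look along their local -z axis, so this one points down.
_TEST_XML = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom name="floor" type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
    <geom name="target" type="box" size="0.1 0.1 0.1" pos="0 0 0.1" rgba="1 0 0 1"/>
    <body name="head" pos="0 0 1.5">
      <camera name="head_camera" fovy="58"/>
    </body>
  </worldbody>
</mujoco>
"""
test_model = mujoco.MjModel.from_xml_string(_TEST_XML)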

Instructions

Step 1: Camera Setup and Rendering

Configure and render from a robot-mounted camera:

import mujoco
import numpy as np
import cv2

# Load model with camera
model = mujoco.MjModel.from_xml_path("textbook/assets/robot-models/mujoco/humanoid-camera.xml")
data = mujoco.MjData(model)

# Camera parameters
camera_name = "head_camera"
width, height = 640, 480

# Create renderer
renderer = mujoco.Renderer(model, height, width)

def render_camera(model, data, renderer, camera_name):
    """
    Render RGB and depth images from a named camera.

    Returns:
        Tuple of (rgb_image, depth_image)
    """
    # Update the scene from the current simulation state
    renderer.update_scene(data, camera=camera_name)

    # Render RGB (uint8 array, height x width x 3)
    rgb = renderer.render()

    # Render depth (float32 array, height x width, in meters)
    renderer.enable_depth_rendering()
    depth = renderer.render()
    renderer.disable_depth_rendering()

    return rgb, depth

# Step the simulation, then refresh derived quantities before rendering
mujoco.mj_step(model, data)
mujoco.mj_forward(model, data)
rgb_image, depth_image = render_camera(model, data, renderer, camera_name)

print(f"RGB shape: {rgb_image.shape}")
print(f"Depth shape: {depth_image.shape}")
print(f"Depth range: [{depth_image.min():.3f}, {depth_image.max():.3f}]")

Expected: RGB image shape (480, 640, 3), depth image shape (480, 640). Depth values represent distance in meters.

Step 2: Camera Intrinsics

Extract camera parameters for projection/unprojection:

def get_camera_intrinsics(model, camera_name, width, height):
    """
    Compute camera intrinsic matrix from MuJoCo camera parameters.

    Returns:
        K: 3x3 intrinsic matrix
        fov_y: Vertical field of view in radians
    """
    camera_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)

    # Vertical field of view, converted from degrees to radians
    fov_y = model.cam_fovy[camera_id] * np.pi / 180.0

    # Focal lengths from the pinhole model: fy = (height / 2) / tan(fov_y / 2)
    fy = height / (2.0 * np.tan(fov_y / 2.0))
    fx = fy  # assuming square pixels

    # Principal point at the image center
    cx = width / 2.0
    cy = height / 2.0

    # Intrinsic matrix
    K = np.array([
        [fx, 0, cx],
        [0, fy, cy],
        [0, 0, 1],
    ])

    return K, fov_y

K, fov_y = get_camera_intrinsics(model, camera_name, width, height)
print(f"Camera intrinsic matrix K:\n{K}")
print(f"Vertical FOV: {np.degrees(fov_y):.1f} degrees")

Expected: A valid intrinsic matrix. With height = 480, focal lengths land roughly in the 400-800 pixel range for typical vertical FOVs of about 35-60 degrees.
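
As a quick sanity check (a sketch assuming K, width, and height from above are in scope), a point on the optical axis must project exactly to the principal point, and an off-axis point shifts by fx * (x / z) pixels:

# A point straight ahead of the camera lands on the principal point
p_cam = np.array([0.0, 0.0, 2.0])
uvw = K @ p_cam
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
assert abs(u - width / 2.0) < 1e-9 and abs(v - height / 2.0) < 1e-9

# An off-axis point shifts horizontally by fx * (x / z) pixels
p_cam = np.array([0.5, 0.0, 2.0])
uvw = K @ p_cam
print(f"Projected u: {uvw[0] / uvw[2]:.1f} (center + {K[0, 0] * 0.25:.1f})")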

Step 3: Depth Image Processing

Validate the metric depth image and prepare it for visualization:

def process_depth(depth_raw, near=0.01, far=10.0):
    """
    Validate and clip a MuJoCo depth image.

    Note: mujoco.Renderer returns depth already linearized to metric
    distance in meters, not a raw [0, 1] z-buffer. For reference, a raw
    OpenGL buffer value d would be linearized as:
        z_ndc = 2 * d - 1
        z_linear = 2 * near * far / (far + near - z_ndc * (far - near))

    Args:
        depth_raw: Depth image from the renderer, in meters
        near: Near clipping plane distance
        far: Far clipping plane distance

    Returns:
        Metric depth image in meters, clipped to [near, far]
    """
    # Pixels with no geometry render at (or beyond) the far plane;
    # clipping keeps downstream statistics well-behaved.
    return np.clip(depth_raw, near, far)

def visualize_depth(depth_metric, max_depth=5.0):
    """Create a colorized depth visualization."""
    # Normalize to [0, 255] for display
    depth_viz = np.clip(depth_metric / max_depth, 0, 1)
    depth_viz = (depth_viz * 255).astype(np.uint8)

    # Apply colormap (JET: blue = near, red = far)
    depth_colored = cv2.applyColorMap(depth_viz, cv2.COLORMAP_JET)

    return depth_colored

# Process and visualize depth
depth_metric = process_depth(depth_image)
depth_viz = visualize_depth(depth_metric)

# Save images
cv2.imwrite('rgb_view.png', cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR))
cv2.imwrite('depth_view.png', depth_viz)

print(f"Metric depth range: [{depth_metric.min():.3f}, {depth_metric.max():.3f}] meters")

Expected: Depth visualization showing objects colored by distance (blue=close, red=far).
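
Depth images are usually consumed numerically, not just visually. Here is a minimal sketch (assuming depth_metric from above) that reports the nearest surface inside a central region of interest, the kind of check a collision monitor might run:

# Nearest surface inside a central region of interest (ROI)
h, w = depth_metric.shape
roi = depth_metric[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

# Drop pixels clipped to the far plane before searching for obstacles
hits = roi[roi < 9.9]
if hits.size > 0:
    print(f"Nearest surface in ROI: {hits.min():.3f} m")
else:
    print("No geometry within the far plane in the ROI")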

Step 4: Color-Based Object Segmentation

Implement simple object detection using color:

def segment_by_color(rgb_image, target_hsv_range):
    """
    Segment objects by HSV color range.

    Args:
        rgb_image: Input RGB image
        target_hsv_range: Tuple of ((h_min, s_min, v_min), (h_max, s_max, v_max))

    Returns:
        Binary mask where the target color is detected
    """
    # Convert to HSV (hue is more robust to lighting than raw RGB)
    hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)

    # Threshold to a binary mask
    lower = np.array(target_hsv_range[0])
    upper = np.array(target_hsv_range[1])
    mask = cv2.inRange(hsv, lower, upper)

    # Remove speckle with morphological opening, then fill holes with closing
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    return mask

def find_object_centroid(mask):
    """Find the centroid of the largest segmented object."""
    # Find external contours of the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if not contours:
        return None

    # Keep the largest contour by area
    largest = max(contours, key=cv2.contourArea)

    # Centroid from image moments
    M = cv2.moments(largest)
    if M["m00"] == 0:
        return None

    cx = int(M["m10"] / M["m00"])
    cy = int(M["m01"] / M["m00"])

    return (cx, cy)

# Example: Detect red objects
# HSV range for red (note: red wraps around 0 on OpenCV's 0-179 hue scale,
# so this lower band alone misses hues near 180)
red_range = ((0, 100, 100), (10, 255, 255))  # lower red band
red_mask = segment_by_color(rgb_image, red_range)

# Find centroid
centroid = find_object_centroid(red_mask)
if centroid:
    print(f"Object centroid in image: {centroid}")

    # Draw the detection on a copy of the image
    result = rgb_image.copy()
    cv2.circle(result, centroid, 10, (0, 255, 0), 2)
    cv2.imwrite('detection_result.png', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))

Expected: If red objects are in the scene, the centroid should be detected and visualized.
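
Because hue wraps around at red (OpenCV stores hue in [0, 179]), the lower band alone misses hues near 180. A more robust detector combines both red bands; a sketch reusing segment_by_color:

# Red spans both ends of OpenCV's hue axis, so OR the two bands together
lower_red = segment_by_color(rgb_image, ((0, 100, 100), (10, 255, 255)))
upper_red = segment_by_color(rgb_image, ((170, 100, 100), (179, 255, 255)))
full_red_mask = cv2.bitwise_or(lower_red, upper_red)

centroid = find_object_centroid(full_red_mask)
print(f"Centroid with full red range: {centroid}")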

Step 5: Depth-Based Point Cloud Generation

Convert depth image to 3D point cloud:

def depth_to_pointcloud(depth_metric, K, rgb_image=None):
    """
    Convert a depth image to a 3D point cloud.

    Args:
        depth_metric: Depth image in meters
        K: Camera intrinsic matrix
        rgb_image: Optional RGB image for coloring points

    Returns:
        points: Nx3 array of 3D points
        colors: Nx3 array of RGB colors in [0, 1] (if rgb_image provided)
    """
    height, width = depth_metric.shape

    # Pixel coordinate grids
    u = np.arange(width)
    v = np.arange(height)
    u, v = np.meshgrid(u, v)

    # Unproject through the pinhole model: x = (u - cx) * z / fx, etc.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    z = depth_metric
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Stack into an (N, 3) point cloud
    points = np.stack([x, y, z], axis=-1)
    points = points.reshape(-1, 3)

    # Filter invalid points (too close or too far)
    valid = (z.flatten() > 0.01) & (z.flatten() < 10.0)
    points = points[valid]

    colors = None
    if rgb_image is not None:
        colors = rgb_image.reshape(-1, 3)[valid] / 255.0

    return points, colors

# Generate point cloud
points, colors = depth_to_pointcloud(depth_metric, K, rgb_image)
print(f"Point cloud: {points.shape[0]} points")
print(f"Bounding box: X[{points[:, 0].min():.2f}, {points[:, 0].max():.2f}]")
print(f" Y[{points[:, 1].min():.2f}, {points[:, 1].max():.2f}]")
print(f" Z[{points[:, 2].min():.2f}, {points[:, 2].max():.2f}]")

Expected: Point cloud with thousands of valid 3D points forming the visible scene geometry.
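
The 2D detection from Step 4 combines naturally with depth to give a 3D object position. A sketch (assuming centroid, depth_metric, and K from the steps above) that unprojects the detected centroid into the camera frame:

# Unproject the detected 2D centroid using the depth at that pixel
if centroid is not None:
    u, v = centroid
    z = depth_metric[v, u]              # rows index y, columns index x
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    print(f"Object position in camera frame: ({x:.3f}, {y:.3f}, {z:.3f}) m")

Taking the median depth over the object mask, rather than the single centroid pixel, is more robust to depth noise at object edges.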

Step 6: Point Cloud Visualization (Optional)

If Open3D is available, visualize the point cloud:

try:
    import open3d as o3d

    def visualize_pointcloud(points, colors=None):
        """Visualize a point cloud using Open3D."""
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(points)

        if colors is not None:
            pcd.colors = o3d.utility.Vector3dVector(colors)

        # Add a coordinate frame at the camera origin
        coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5)

        # Open an interactive viewer
        o3d.visualization.draw_geometries([pcd, coord_frame])

        # Save to disk
        o3d.io.write_point_cloud("scene_pointcloud.ply", pcd)
        print("Saved point cloud to scene_pointcloud.ply")

    visualize_pointcloud(points, colors)

except ImportError:
    print("Open3D not installed. Skipping 3D visualization.")

    # Alternative: save as a simple text format
    np.savetxt('pointcloud.xyz', points, fmt='%.4f')
    print("Saved point cloud to pointcloud.xyz")

Expected: 3D visualization showing the scene reconstructed from depth data.
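
If Open3D is unavailable, Matplotlib gives a rough fallback view. A sketch (assuming points from Step 5 is non-empty; subsampling keeps the 3D scatter responsive):

import matplotlib.pyplot as plt

# Subsample so the 3D scatter plot stays responsive
idx = np.random.choice(points.shape[0], size=min(5000, points.shape[0]), replace=False)
sub = points[idx]

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(sub[:, 0], sub[:, 1], sub[:, 2], s=1, c=sub[:, 2], cmap='viridis')
ax.set_xlabel('X (m)')
ax.set_ylabel('Y (m)')
ax.set_zlabel('Z (m)')
plt.savefig('pointcloud_scatter.png')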

Expected Outcomes

After completing this lab, you should have:

  1. Code artifacts:

    • camera_utils.py: Functions for camera rendering and intrinsics
    • depth_processing.py: Depth image processing utilities
    • color_segmentation.py: Object detection using color
    • pointcloud.py: Depth to 3D conversion
  2. Visual outputs:

    • rgb_view.png: RGB camera image
    • depth_view.png: Colorized depth visualization
    • detection_result.png: Object detection overlay
    • scene_pointcloud.ply: 3D point cloud file
  3. Understanding:

    • Camera intrinsic parameters and projection
    • Depth image interpretation
    • Basic computer vision for robotics
    • 3D reconstruction from depth

Rubric

Criterion | Points | Description
Camera rendering | 15 | Correctly renders RGB and depth from simulation
Intrinsics computation | 15 | Accurate camera matrix from model parameters
Depth processing | 20 | Proper conversion to metric depth
Color segmentation | 20 | Object detection with morphological cleanup
Point cloud generation | 20 | Valid 3D reconstruction from depth
Code quality | 10 | Well-documented, modular functions
Total | 100 |

Common Errors

Error: Renderer returns a black image
Solution: Ensure mujoco.mj_forward() is called before rendering so the scene state is up to date.

Error: Depth values all near the far plane
Solution: Check the camera position; it may be inside an object or facing away from the scene.

Error: Point cloud appears stretched or distorted
Solution: Verify that the intrinsic matrix matches the actual rendering parameters (width, height, and fovy).
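
For the first error, this is the minimal pattern (using the objects from Step 1) that guarantees the renderer sees up-to-date state:

# Propagate qpos/qvel into derived quantities (xpos, xmat, ...) first
mujoco.mj_forward(model, data)
renderer.update_scene(data, camera=camera_name)
rgb = renderer.render()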

Extensions

For advanced students:

  1. Stereo Matching: Implement depth from stereo using two cameras (see the disparity sketch after this list)
  2. Visual Odometry: Track camera motion using feature matching
  3. Object Recognition: Use deep learning for object classification
  4. SLAM Integration: Combine with IMU for simultaneous localization and mapping
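
For extension 1, the core relation is that depth is inversely proportional to disparity: z = fx * baseline / disparity for a rectified pair. A sketch, using K from Step 2, a hypothetical 6.5 cm baseline, and example disparities:

# Depth from disparity for a rectified stereo pair (baseline is hypothetical)
baseline = 0.065                                  # meters between cameras
disparity = np.array([5.0, 10.0, 20.0])           # example disparities, pixels
z = K[0, 0] * baseline / disparity
print(f"Depths for disparities {disparity}: {z} m")
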
Related Resources

  • Theory: Module 04 theory.md, Section 4.2 (Vision Systems)
  • Previous Lab: Lab 04-01 (IMU Data Reading)
  • Next Lab: Lab 04-03 (Sensor Fusion with Kalman Filter)
  • Case Study: See the Berkeley BAIR case study for vision-based manipulation