
Lab 04-02: Camera Image Processing for Robotics

Objectives

By the end of this lab, you will be able to:

  • Render camera images from MuJoCo simulation
  • Process RGB and depth images for robotic perception
  • Implement basic object detection using color segmentation
  • Extract 3D point clouds from depth images

Prerequisites

  • Completed Lab 04-01 (IMU data reading)
  • Understanding of camera projection models
  • Familiarity with NumPy array operations
  • Basic OpenCV knowledge (helpful but not required)

Materials

Type | Name | Tier | Notes
software | MuJoCo 3.0+ | required | Physics simulation with rendering
software | Python 3.10+ | required | Programming environment
software | NumPy, OpenCV | required | Image processing
software | Open3D | recommended | Point cloud visualization
simulation | humanoid-camera.xml | required | Model with camera sensors

Background

Cameras are essential sensors for robot perception, providing rich visual information about the environment. In robotics, we commonly use:

  1. RGB Cameras: Color images for object recognition, tracking
  2. Depth Cameras: Distance measurements for 3D reconstruction
  3. Stereo Cameras: Two cameras for depth estimation

MuJoCo supports rendering from arbitrary camera viewpoints, providing both color and depth buffers.
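
If your model does not yet define a camera, the sketch below shows the pattern: a named <camera> element attached to a body. The scene is a hypothetical stand-in (not the lab's humanoid-camera.xml), just enough to exercise the rendering pipeline that follows.

import mujoco

# Hypothetical minimal scene: a named camera 1.5 m above a red box.
# MuJoCo cameras look along their local -z axis, so this one points down.
_TEST_XML = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom name="floor" type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
    <geom name="target" type="box" size="0.1 0.1 0.1" pos="0 0 0.1" rgba="1 0 0 1"/>
    <body name="head" pos="0 0 1.5">
      <camera name="head_camera" fovy="58"/>
    </body>
  </worldbody>
</mujoco>
"""
test_model = mujoco.MjModel.from_xml_string(_TEST_XML)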

Instructions

Step 1: Camera Setup and Rendering

Configure and render from a robot-mounted camera:

import mujoco
import numpy as np
import cv2

# Load model with camera
model = mujoco.MjModel.from_xml_path("textbook/assets/robot-models/mujoco/humanoid-camera.xml")
data = mujoco.MjData(model)

# Camera parameters
camera_name = "head_camera"
width, height = 640, 480

# Create renderer
renderer = mujoco.Renderer(model, height, width)

def render_camera(model, data, renderer, camera_name):
    """
    Render RGB and depth images from a named camera.

    Returns:
        Tuple of (rgb_image, depth_image)
    """
    # Update the scene from the current simulation state
    renderer.update_scene(data, camera=camera_name)

    # Render RGB (uint8 array, height x width x 3)
    rgb = renderer.render()

    # Render depth (float32 array, height x width, in meters)
    renderer.enable_depth_rendering()
    depth = renderer.render()
    renderer.disable_depth_rendering()

    return rgb, depth

# Step the simulation, then refresh derived quantities before rendering
mujoco.mj_step(model, data)
mujoco.mj_forward(model, data)
rgb_image, depth_image = render_camera(model, data, renderer, camera_name)

print(f"RGB shape: {rgb_image.shape}")
print(f"Depth shape: {depth_image.shape}")
print(f"Depth range: [{depth_image.min():.3f}, {depth_image.max():.3f}]")

Expected: RGB image shape (480, 640, 3), depth image shape (480, 640). Depth values represent distance in meters.

Step 2: Camera Intrinsics

Extract camera parameters for projection/unprojection:

def get_camera_intrinsics(model, camera_name, width, height):
    """
    Compute camera intrinsic matrix from MuJoCo camera parameters.

    Returns:
        K: 3x3 intrinsic matrix
        fov_y: Vertical field of view in radians
    """
    camera_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)

    # Vertical field of view, converted from degrees to radians
    fov_y = model.cam_fovy[camera_id] * np.pi / 180.0

    # Focal lengths from the pinhole model: fy = (height / 2) / tan(fov_y / 2)
    fy = height / (2.0 * np.tan(fov_y / 2.0))
    fx = fy  # assuming square pixels

    # Principal point at the image center
    cx = width / 2.0
    cy = height / 2.0

    # Intrinsic matrix
    K = np.array([
        [fx, 0, cx],
        [0, fy, cy],
        [0, 0, 1],
    ])

    return K, fov_y

K, fov_y = get_camera_intrinsics(model, camera_name, width, height)
print(f"Camera intrinsic matrix K:\n{K}")
print(f"Vertical FOV: {np.degrees(fov_y):.1f} degrees")

Expected: A valid intrinsic matrix. With height = 480, focal lengths land roughly in the 400-800 pixel range for typical vertical FOVs of about 35-60 degrees.
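
As a quick sanity check (a sketch assuming K, width, and height from above are in scope), a point on the optical axis must project exactly to the principal point, and an off-axis point shifts by fx * (x / z) pixels:

# A point straight ahead of the camera lands on the principal point
p_cam = np.array([0.0, 0.0, 2.0])
uvw = K @ p_cam
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
assert abs(u - width / 2.0) < 1e-9 and abs(v - height / 2.0) < 1e-9

# An off-axis point shifts horizontally by fx * (x / z) pixels
p_cam = np.array([0.5, 0.0, 2.0])
uvw = K @ p_cam
print(f"Projected u: {uvw[0] / uvw[2]:.1f} (center + {K[0, 0] * 0.25:.1f})")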

Step 3: Depth Image Processing

Validate the metric depth image and prepare it for visualization:

def process_depth(depth_raw, near=0.01, far=10.0):
    """
    Validate and clip a MuJoCo depth image.

    Note: mujoco.Renderer returns depth already linearized to metric
    distance in meters, not a raw [0, 1] z-buffer. For reference, a raw
    OpenGL buffer value d would be linearized as:
        z_ndc = 2 * d - 1
        z_linear = 2 * near * far / (far + near - z_ndc * (far - near))

    Args:
        depth_raw: Depth image from the renderer, in meters
        near: Near clipping plane distance
        far: Far clipping plane distance

    Returns:
        Metric depth image in meters, clipped to [near, far]
    """
    # Pixels with no geometry render at (or beyond) the far plane;
    # clipping keeps downstream statistics well-behaved.
    return np.clip(depth_raw, near, far)

def visualize_depth(depth_metric, max_depth=5.0):
    """Create a colorized depth visualization."""
    # Normalize to [0, 255] for display
    depth_viz = np.clip(depth_metric / max_depth, 0, 1)
    depth_viz = (depth_viz * 255).astype(np.uint8)

    # Apply colormap (JET: blue = near, red = far)
    depth_colored = cv2.applyColorMap(depth_viz, cv2.COLORMAP_JET)

    return depth_colored

# Process and visualize depth
depth_metric = process_depth(depth_image)
depth_viz = visualize_depth(depth_metric)

# Save images
cv2.imwrite('rgb_view.png', cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR))
cv2.imwrite('depth_view.png', depth_viz)

print(f"Metric depth range: [{depth_metric.min():.3f}, {depth_metric.max():.3f}] meters")

Expected: Depth visualization showing objects colored by distance (blue=close, red=far).
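
Depth images are usually consumed numerically, not just visually. Here is a minimal sketch (assuming depth_metric from above) that reports the nearest surface inside a central region of interest, the kind of check a collision monitor might run:

# Nearest surface inside a central region of interest (ROI)
h, w = depth_metric.shape
roi = depth_metric[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

# Drop pixels clipped to the far plane before searching for obstacles
hits = roi[roi < 9.9]
if hits.size > 0:
    print(f"Nearest surface in ROI: {hits.min():.3f} m")
else:
    print("No geometry within the far plane in the ROI")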

Step 4: Color-Based Object Segmentation

Implement simple object detection using color:

def segment_by_color(rgb_image, target_hsv_range):
    """
    Segment objects by HSV color range.

    Args:
        rgb_image: Input RGB image
        target_hsv_range: Tuple of ((h_min, s_min, v_min), (h_max, s_max, v_max))

    Returns:
        Binary mask where the target color is detected
    """
    # Convert to HSV (hue is more robust to lighting than raw RGB)
    hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)

    # Threshold to a binary mask
    lower = np.array(target_hsv_range[0])
    upper = np.array(target_hsv_range[1])
    mask = cv2.inRange(hsv, lower, upper)

    # Remove speckle with morphological opening, then fill holes with closing
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    return mask

def find_object_centroid(mask):
    """Find the centroid of the largest segmented object."""
    # Find external contours of the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if not contours:
        return None

    # Keep the largest contour by area
    largest = max(contours, key=cv2.contourArea)

    # Centroid from image moments
    M = cv2.moments(largest)
    if M["m00"] == 0:
        return None

    cx = int(M["m10"] / M["m00"])
    cy = int(M["m01"] / M["m00"])

    return (cx, cy)

# Example: Detect red objects
# HSV range for red (note: red wraps around 0 on OpenCV's 0-179 hue scale,
# so this lower band alone misses hues near 180)
red_range = ((0, 100, 100), (10, 255, 255))  # lower red band
red_mask = segment_by_color(rgb_image, red_range)

# Find centroid
centroid = find_object_centroid(red_mask)
if centroid:
    print(f"Object centroid in image: {centroid}")

    # Draw the detection on a copy of the image
    result = rgb_image.copy()
    cv2.circle(result, centroid, 10, (0, 255, 0), 2)
    cv2.imwrite('detection_result.png', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))

Expected: If red objects are in the scene, the centroid should be detected and visualized.
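
Because hue wraps around at red (OpenCV stores hue in [0, 179]), the lower band alone misses hues near 180. A more robust detector combines both red bands; a sketch reusing segment_by_color:

# Red spans both ends of OpenCV's hue axis, so OR the two bands together
lower_red = segment_by_color(rgb_image, ((0, 100, 100), (10, 255, 255)))
upper_red = segment_by_color(rgb_image, ((170, 100, 100), (179, 255, 255)))
full_red_mask = cv2.bitwise_or(lower_red, upper_red)

centroid = find_object_centroid(full_red_mask)
print(f"Centroid with full red range: {centroid}")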

Step 5: Depth-Based Point Cloud Generation

Convert depth image to 3D point cloud:

def depth_to_pointcloud(depth_metric, K, rgb_image=None):
    """
    Convert a depth image to a 3D point cloud.

    Args:
        depth_metric: Depth image in meters
        K: Camera intrinsic matrix
        rgb_image: Optional RGB image for coloring points

    Returns:
        points: Nx3 array of 3D points
        colors: Nx3 array of RGB colors in [0, 1] (if rgb_image provided)
    """
    height, width = depth_metric.shape

    # Pixel coordinate grids
    u = np.arange(width)
    v = np.arange(height)
    u, v = np.meshgrid(u, v)

    # Unproject through the pinhole model: x = (u - cx) * z / fx, etc.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    z = depth_metric
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Stack into an (N, 3) point cloud
    points = np.stack([x, y, z], axis=-1)
    points = points.reshape(-1, 3)

    # Filter invalid points (too close or too far)
    valid = (z.flatten() > 0.01) & (z.flatten() < 10.0)
    points = points[valid]

    colors = None
    if rgb_image is not None:
        colors = rgb_image.reshape(-1, 3)[valid] / 255.0

    return points, colors

# Generate point cloud
points, colors = depth_to_pointcloud(depth_metric, K, rgb_image)
print(f"Point cloud: {points.shape[0]} points")
print(f"Bounding box: X[{points[:, 0].min():.2f}, {points[:, 0].max():.2f}]")
print(f" Y[{points[:, 1].min():.2f}, {points[:, 1].max():.2f}]")
print(f" Z[{points[:, 2].min():.2f}, {points[:, 2].max():.2f}]")

Expected: Point cloud with thousands of valid 3D points forming the visible scene geometry.
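
The 2D detection from Step 4 combines naturally with depth to give a 3D object position. A sketch (assuming centroid, depth_metric, and K from the steps above) that unprojects the detected centroid into the camera frame:

# Unproject the detected 2D centroid using the depth at that pixel
if centroid is not None:
    u, v = centroid
    z = depth_metric[v, u]              # rows index y, columns index x
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    print(f"Object position in camera frame: ({x:.3f}, {y:.3f}, {z:.3f}) m")

Taking the median depth over the object mask, rather than the single centroid pixel, is more robust to depth noise at object edges.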

Step 6: Point Cloud Visualization (Optional)

If Open3D is available, visualize the point cloud:

try:
    import open3d as o3d

    def visualize_pointcloud(points, colors=None):
        """Visualize a point cloud using Open3D."""
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(points)

        if colors is not None:
            pcd.colors = o3d.utility.Vector3dVector(colors)

        # Add a coordinate frame at the camera origin
        coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5)

        # Open an interactive viewer
        o3d.visualization.draw_geometries([pcd, coord_frame])

        # Save to disk
        o3d.io.write_point_cloud("scene_pointcloud.ply", pcd)
        print("Saved point cloud to scene_pointcloud.ply")

    visualize_pointcloud(points, colors)

except ImportError:
    print("Open3D not installed. Skipping 3D visualization.")

    # Alternative: save as a simple text format
    np.savetxt('pointcloud.xyz', points, fmt='%.4f')
    print("Saved point cloud to pointcloud.xyz")

Expected: 3D visualization showing the scene reconstructed from depth data.
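
If Open3D is unavailable, Matplotlib gives a rough fallback view. A sketch (assuming points from Step 5 is non-empty; subsampling keeps the 3D scatter responsive):

import matplotlib.pyplot as plt

# Subsample so the 3D scatter plot stays responsive
idx = np.random.choice(points.shape[0], size=min(5000, points.shape[0]), replace=False)
sub = points[idx]

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(sub[:, 0], sub[:, 1], sub[:, 2], s=1, c=sub[:, 2], cmap='viridis')
ax.set_xlabel('X (m)')
ax.set_ylabel('Y (m)')
ax.set_zlabel('Z (m)')
plt.savefig('pointcloud_scatter.png')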

Expected Outcomes

After completing this lab, you should have:

  1. Code artifacts:

    • camera_utils.py: Functions for camera rendering and intrinsics
    • depth_processing.py: Depth image processing utilities
    • color_segmentation.py: Object detection using color
    • pointcloud.py: Depth to 3D conversion
  2. Visual outputs:

    • rgb_view.png: RGB camera image
    • depth_view.png: Colorized depth visualization
    • detection_result.png: Object detection overlay
    • scene_pointcloud.ply: 3D point cloud file
  3. Understanding:

    • Camera intrinsic parameters and projection
    • Depth image interpretation
    • Basic computer vision for robotics
    • 3D reconstruction from depth

Rubric

Criterion | Points | Description
Camera rendering | 15 | Correctly renders RGB and depth from simulation
Intrinsics computation | 15 | Accurate camera matrix from model parameters
Depth processing | 20 | Proper conversion to metric depth
Color segmentation | 20 | Object detection with morphological cleanup
Point cloud generation | 20 | Valid 3D reconstruction from depth
Code quality | 10 | Well-documented, modular functions
Total | 100 |

Common Errors

Error: Renderer returns a black image
Solution: Ensure mujoco.mj_forward() is called before rendering so the scene state is up to date.

Error: Depth values all near the far plane
Solution: Check the camera position; it may be inside an object or facing away from the scene.

Error: Point cloud appears stretched or distorted
Solution: Verify that the intrinsic matrix matches the actual rendering parameters (width, height, and fovy).
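
For the first error, this is the minimal pattern (using the objects from Step 1) that guarantees the renderer sees up-to-date state:

# Propagate qpos/qvel into derived quantities (xpos, xmat, ...) first
mujoco.mj_forward(model, data)
renderer.update_scene(data, camera=camera_name)
rgb = renderer.render()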

Extensions

For advanced students:

  1. Stereo Matching: Implement depth from stereo using two cameras (see the disparity sketch after this list)
  2. Visual Odometry: Track camera motion using feature matching
  3. Object Recognition: Use deep learning for object classification
  4. SLAM Integration: Combine with IMU for simultaneous localization and mapping
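
For extension 1, the core relation is that depth is inversely proportional to disparity: z = fx * baseline / disparity for a rectified pair. A sketch, using K from Step 2, a hypothetical 6.5 cm baseline, and example disparities:

# Depth from disparity for a rectified stereo pair (baseline is hypothetical)
baseline = 0.065                                  # meters between cameras
disparity = np.array([5.0, 10.0, 20.0])           # example disparities, pixels
z = K[0, 0] * baseline / disparity
print(f"Depths for disparities {disparity}: {z} m")
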
Related Resources

  • Theory: Module 04 theory.md, Section 4.2 (Vision Systems)
  • Previous Lab: Lab 04-01 (IMU Data Reading)
  • Next Lab: Lab 04-03 (Sensor Fusion with Kalman Filter)
  • Case Study: See the Berkeley BAIR case study for vision-based manipulation