Build Your First Modality Emulator: Step-by-Step Guide for Developers

A modality emulator is a software component that mimics the behavior of a particular sensory or data modality (e.g., vision, audio, touch, sensor streams) so that multimodal systems can be developed, tested, and integrated without requiring the actual hardware or live data sources. This guide walks you through designing, building, and testing a basic modality emulator suitable for developers building multimodal AI systems, robotics controllers, or sensor-fusion pipelines.
Why build a modality emulator?
- Speeds development by allowing front-end, model, and system integration work to proceed before hardware is available.
- Enables reproducible testing with deterministic or configurable inputs.
- Reduces cost and risk by avoiding wear on physical sensors and enabling safe testing of edge cases.
- Supports continuous integration by allowing automated tests that include simulated modalities.
Overview and architecture
At a high level, a modality emulator has these components:
- Input generator — creates synthetic data frames/events representing the modality. This can be deterministic (scripts, prerecorded files) or stochastic (procedural generation, noise models).
- Emulator core — formats data into the same API/protocol as the real modality (e.g., timestamps, message formats, headers).
- Transport layer — delivers emulated data to the target system (e.g., REST/gRPC/WebSocket, ROS topics, MQTT, files).
- Controller/config — runtime controls (start/stop, playback speed, parameterization like noise level).
- Monitor/recorder — logs and visualizes emulator output for debugging and for replay.
Design considerations
- Fidelity vs. complexity: Higher fidelity (physics-based simulation, realistic noise) increases development time. Match fidelity to your use case.
- Determinism: For CI and debugging, provide options for deterministic seeds.
- Time synchronization: Ensure timestamps align with the target system’s clock or provide a simulated clock.
- Scalability: Allow multiple concurrent emulators (e.g., stereo cameras, multi-microphone arrays).
- Extensibility: Design modularly so new modalities or transports are plug-and-play.
- Safety: For robotics, include “emergency stop” and safe-mode scenarios to avoid commanding dangerous actions from emulated sensors.
Example project: Emulating a depth camera for a robotics pipeline
This example shows how to build a modest emulator that produces depth frames and camera intrinsics, exposes them over a ROS2 topic and a WebSocket for debugging UIs, and supports playback of prerecorded scenes or procedurally generated content.
Prerequisites:
- Python 3.11+
- ROS2 (Humble or later) for topic integration — optional if using only WebSocket.
- Open3D or NumPy for data handling.
- WebSocket library (websockets or FastAPI + WebSocket).
- Optional: prerecorded depth frames (PNG or NumPy .npy).
Step 1 — Project layout
Suggested structure:
modality_emulator/
├─ emulator/
│  ├─ __init__.py
│  ├─ core.py
│  ├─ generators.py
│  ├─ transports.py
│  ├─ config.py
│  └─ monitor.py
├─ scripts/
│  └─ run_emulator.py
├─ tests/
│  └─ test_core.py
├─ requirements.txt
└─ README.md
Step 2 — Define the emulator API
Design a small API so the rest of your system can consume data without caring whether it’s real or emulated.
Key parts:
- Frame object: timestamp, width, height, data (uint16 for depth), intrinsics.
- Emulator service: start(), stop(), set_rate(fps), set_mode(mode), inject_noise(level), register_subscriber(callback).
Example Frame dataclass:
from dataclasses import dataclass

import numpy as np

@dataclass
class DepthFrame:
    timestamp: float    # epoch seconds or simulated clock
    width: int
    height: int
    data: np.ndarray    # shape (H, W), dtype=np.uint16 or float32
    intrinsics: dict    # fx, fy, cx, cy, distortion
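The service side can stay just as small. Below is a minimal sketch of that interface as an abstract base class; the method names mirror the list above, and the callback-based register_subscriber is one possible subscription mechanism, not the only one.

from abc import ABC, abstractmethod
from typing import Callable

# Callback invoked for every emitted frame.
FrameCallback = Callable[[DepthFrame], None]

class ModalityEmulator(ABC):
    """Minimal emulator service interface; real and emulated sources can share it."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def set_rate(self, fps: float) -> None: ...

    @abstractmethod
    def set_mode(self, mode: str) -> None:
        """mode is 'playback' or 'procedural'."""

    @abstractmethod
    def inject_noise(self, level: float) -> None: ...

    @abstractmethod
    def register_subscriber(self, callback: FrameCallback) -> None: ...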
Step 3 — Implement generators
Provide at least two generator modes:
- Playback generator: reads frames from disk (PNG/.npy) and emits them at the configured FPS (see the playback sketch after this list).
- Procedural generator: creates synthetic scenes (planes, boxes, noise, moving objects).
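A playback generator can be as simple as iterating over saved frames. The sketch below assumes depth frames stored as individual .npy files, named so that lexicographic order matches playback order:

import glob
import time

import numpy as np

def playback_frames(directory: str, fps: float = 30.0):
    """Yield depth arrays loaded from .npy files at roughly the requested rate."""
    paths = sorted(glob.glob(f"{directory}/*.npy"))
    period = 1.0 / fps
    for path in paths:
        start = time.monotonic()
        yield np.load(path)  # shape (H, W), dtype as recorded
        # Sleep off the remainder of the frame period to approximate the FPS.
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, period - elapsed))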
Procedural example using NumPy:
import numpy as np

def generate_plane(depth_m=2.0, width=640, height=480):
    z = np.full((height, width), depth_m, dtype=np.float32)
    # add small sensor noise
    noise = np.random.normal(scale=0.01, size=z.shape).astype(np.float32)
    return z + noise
For moving objects, render simple shapes by altering per-frame depth in regions.
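For instance, a closer "obstacle" box sliding across the plane can be produced by overwriting a rectangular region each frame. This builds on generate_plane above; the parameters are illustrative.

def generate_moving_box(frame_idx, width=640, height=480,
                        box_size=80, box_depth_m=1.0, speed_px=4):
    """Plane background with a closer square obstacle sliding left to right."""
    z = generate_plane(2.0, width, height)
    x0 = (frame_idx * speed_px) % (width - box_size)  # wrap around at the right edge
    y0 = (height - box_size) // 2                     # vertically centered
    z[y0:y0 + box_size, x0:x0 + box_size] = box_depth_m
    return z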
Step 4 — Implement transport layers
Expose frames through multiple transports:
- ROS2 publisher:
- Convert DepthFrame to sensor_msgs.msg.Image with appropriate encoding.
- Publish on /camera/depth/image_raw and a matching CameraInfo topic (conventionally /camera/depth/camera_info) for intrinsics.
- WebSocket:
- Send JSON metadata and binary depth payload (e.g., via base64 or binary frames).
- REST endpoint:
- Provide single-frame fetch and status endpoints.
ROS2 publisher sketch:
from rclpy.clock import Clock
from sensor_msgs.msg import Image

def depth_to_image_msg(depth_frame: DepthFrame) -> Image:
    img = Image()
    img.header.stamp = Clock().now().to_msg()
    img.height = depth_frame.height
    img.width = depth_frame.width
    img.encoding = '16UC1'            # or '32FC1' for float32 depth
    img.step = depth_frame.width * 2  # bytes per row for uint16 (width * 4 for 32FC1)
    img.data = depth_frame.data.tobytes()
    return img
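Intrinsics can be published alongside the image as a CameraInfo message. A minimal sketch, assuming the intrinsics dict uses the fx/fy/cx/cy (and optional distortion) keys from the DepthFrame definition and the ROS2 field names (lowercase k and d):

from sensor_msgs.msg import CameraInfo

def intrinsics_to_camera_info(depth_frame: DepthFrame) -> CameraInfo:
    intr = depth_frame.intrinsics
    info = CameraInfo()
    info.width = depth_frame.width
    info.height = depth_frame.height
    # Row-major 3x3 intrinsic matrix K.
    info.k = [intr["fx"], 0.0, intr["cx"],
              0.0, intr["fy"], intr["cy"],
              0.0, 0.0, 1.0]
    info.distortion_model = "plumb_bob"
    info.d = list(intr.get("distortion", []))
    return info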
WebSocket sketch using FastAPI:
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/depth")
async def ws_depth(ws: WebSocket):
    await ws.accept()
    while True:
        # get_next_frame_async() is the emulator core's frame source (not shown here).
        frame = await get_next_frame_async()
        meta = {
            "timestamp": frame.timestamp,
            "width": frame.width,
            "height": frame.height,
            "intrinsics": frame.intrinsics,
        }
        await ws.send_json(meta)
        await ws.send_bytes(frame.data.tobytes())
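For debugging, a small client can consume that stream by reading the JSON metadata frame and then the binary payload. The sketch below uses the websockets package; the URL and dtype are assumptions that should match your server:

import asyncio
import json

import numpy as np
import websockets

async def read_depth_stream(url="ws://localhost:8000/ws/depth"):
    async with websockets.connect(url) as ws:
        while True:
            meta = json.loads(await ws.recv())   # JSON metadata frame
            payload = await ws.recv()            # raw depth bytes
            # Match the dtype your emulator emits (float32 here, uint16 if quantized).
            depth = np.frombuffer(payload, dtype=np.float32)
            depth = depth.reshape(meta["height"], meta["width"])
            print(meta["timestamp"], float(depth.mean()))

asyncio.run(read_depth_stream())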
Step 5 — Control and configuration
Provide a runtime config (JSON or YAML) with fields:
- mode: playback | procedural
- fps: 30
- noise_std: 0.01
- transport: ros2 | websocket | rest | all
- seed: 42 (for deterministic procedural generation)
Implement an HTTP control API or CLI flags to change parameters at runtime, or use ROS2 service calls for live control.
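One possible wiring: load the JSON config into a small dataclass and let CLI flags override individual fields. The field names mirror the list above; the flag set shown is illustrative.

import argparse
import json
from dataclasses import dataclass

@dataclass
class EmulatorConfig:
    mode: str = "procedural"   # "playback" or "procedural"
    fps: float = 30.0
    noise_std: float = 0.01
    transport: str = "all"     # "ros2", "websocket", "rest", or "all"
    seed: int = 42

def load_config(path: str, argv=None) -> EmulatorConfig:
    with open(path) as f:
        cfg = EmulatorConfig(**json.load(f))
    # CLI overrides for the most commonly tweaked fields.
    parser = argparse.ArgumentParser()
    parser.add_argument("--fps", type=float)
    parser.add_argument("--mode")
    args = parser.parse_args(argv)
    if args.fps is not None:
        cfg.fps = args.fps
    if args.mode is not None:
        cfg.mode = args.mode
    return cfg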
Step 6 — Monitoring and logging
- Add a simple UI (HTML + JS) that connects via the WebSocket transport, displays depth frames as heatmaps, and logs timestamps to check for jitter.
- Log frame send time, generation time, and transport time for latency measurement.
- Add toggles for showing raw vs. filtered depth and for injecting faults (drop frames, duplicate frames, delay).
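Fault injection is easiest to keep separate from the generators by wrapping the frame stream. Below is a sketch with illustrative parameter names that drops, duplicates, or delays frames, and is seedable for reproducibility:

import random
import time

def inject_faults(frames, drop_p=0.0, dup_p=0.0, max_delay_s=0.0, seed=None):
    """Wrap a frame iterator and randomly drop, duplicate, or delay frames."""
    rng = random.Random(seed)
    for frame in frames:
        if rng.random() < drop_p:
            continue                                   # dropped frame
        if max_delay_s > 0:
            time.sleep(rng.uniform(0.0, max_delay_s))  # added latency / jitter
        yield frame
        if rng.random() < dup_p:
            yield frame                                # duplicated frame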
Step 7 — Testing and CI
- Unit tests for generators (statistics, shape, value ranges).
- Integration tests that start the emulator (in a test mode) and subscribe via transport to verify frame rate, content integrity, and timestamp sequencing.
- Use deterministic seeds for reproducibility and snapshots of small frames for regression tests.
Example pytest check:
import numpy as np

from emulator.generators import generate_plane

def test_generate_plane_shape():
    z = generate_plane(2.0, 64, 48)
    assert z.shape == (48, 64)   # (height, width)
    assert np.all(z > 0)
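A determinism check can reuse the same generator with a fixed NumPy seed. This relies on generate_plane drawing its noise from NumPy's global random state, as in the Step 3 sketch:

def test_generate_plane_deterministic_with_seed():
    np.random.seed(42)
    a = generate_plane(2.0, 64, 48)
    np.random.seed(42)
    b = generate_plane(2.0, 64, 48)
    # Same seed must reproduce the exact same noisy frame.
    assert np.array_equal(a, b)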
Step 8 — Extending to multiple modalities
The same pattern applies to other modalities:
- RGB camera: generate color images, add exposure/noise models, provide MJPEG/ROS image transport.
- Audio: synthesize sine sweeps, white noise, or playback WAV files; expose via WebSocket or RTP.
- IMU: simulate accelerometer/gyro with proper units and bias, support configurable drift.
- LIDAR: generate point clouds with angular patterns and range noise; publish as PointCloud2 or binary frames.
Design generators and transports to be modular so you can mix modalities for sensor fusion testing.
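One lightweight pattern for that modularity is a registry keyed by modality name, so new generators plug in via a decorator. The sketch below uses illustrative names and reuses generate_plane from Step 3:

from typing import Callable, Dict, Iterator

import numpy as np

# Each generator factory returns an iterator of frames for its modality.
GENERATORS: Dict[str, Callable[..., Iterator[np.ndarray]]] = {}

def register_generator(name: str):
    """Decorator that makes a generator discoverable by modality name."""
    def wrap(factory):
        GENERATORS[name] = factory
        return factory
    return wrap

@register_generator("depth")
def depth_generator(fps: float = 30.0, **kwargs):
    while True:
        yield generate_plane(2.0)

# Usage: frames = GENERATORS["depth"](fps=30.0)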
Example: End-to-end usage
- Start the emulator in procedural mode with fps=30 and transport=all.
- Start the robot stack, but point its camera topic at the emulator's ROS2 topic.
- Use the UI to inject a moving obstacle and verify that the perception pipeline detects it.
- Run a CI integration test that starts the emulator in playback mode and validates detection outputs against ground truth.
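For reference, a stripped-down scripts/run_emulator.py could look like the sketch below. It exercises only the procedural generator and prints frame statistics in place of real transports; the CLI flags are illustrative.

import argparse
import time

def main():
    parser = argparse.ArgumentParser(description="Run the depth emulator")
    parser.add_argument("--fps", type=float, default=30.0)
    parser.add_argument("--frames", type=int, default=90)  # how many frames to emit
    args = parser.parse_args()

    for idx in range(args.frames):
        z = generate_plane(2.0)  # procedural generator from Step 3
        # Stand-in for the real transports: report basic frame statistics.
        print(f"frame {idx}: shape={z.shape} min={z.min():.3f} max={z.max():.3f}")
        time.sleep(1.0 / args.fps)

if __name__ == "__main__":
    main()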
Best practices and tips
- Start simple: validate the pipeline with basic synthetic data before increasing fidelity.
- Keep the interface identical to the real sensor's API so swapping between the emulator and the real sensor is trivial.
- Provide seedable randomness and deterministic modes for tests.
- Offer fault-injection controls for robustness testing (stale frames, jitter, outliers).
- Document intrinsics and units clearly to avoid unit mismatches.
- Version your emulator and provide a compatibility matrix against real sensor firmware and ROS message versions.
Conclusion
A modality emulator accelerates development, testing, and integration of multimodal systems by decoupling software from hardware availability. Build it modularly: generative engines produce data, transports expose it through the same interfaces as the real sensors, and control/monitoring tools let you operate and validate behavior. Start with a simple depth camera emulator as shown here, then extend to other modalities and richer physics models as needed.