In our previous article, Introduction to OpenTelemetry, we explored signal generation using OpenTelemetry, with a focus on signal types and how to instrument applications. Now, we aim to take it a step further by diving into the management of signals from different sources and offering practical insights into how to analyze them efficiently.
With several distributed components running in an application, manually reading logs, traces, and metrics from different file locations would be highly inefficient. Centralizing the data would provide a much better solution.
Additionally, to analyze and correlate this data efficiently, we need the ability to query it and create visualizations easily. This significantly reduces the time needed to understand the system and lets us focus on what matters. Observability tools like Jaeger, Prometheus, and OpenSearch play a crucial role in facilitating this process.
However, to use these platforms, we need to ensure that the data reaches them. Let's walk through the steps involved in designing and implementing the telemetry pipeline.
The OpenTelemetry Collector: The Telemetry Pipeline Building Block
Suppose there are OpenTelemetry agents running side by side with our applications (containers, functions, virtual servers, etc.), generating signals in the form of traces, logs, or metrics. Due to the ephemeral nature of some components, we cannot store the signal data within them; in the event of an issue, the data required for the analysis would be lost along with the component. Instead, we need to send this data elsewhere, and the OpenTelemetry agents' exporters handle exactly that. These agents implement the OpenTelemetry Protocol (OTLP) to communicate with other components.
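Most OpenTelemetry SDKs let you point their OTLP exporter at a destination through standard environment variables, so the application code does not need to change. A minimal sketch, where the collector hostname and the service name are illustrative:

```bash
# Spec-defined OTel SDK environment variables; values here are illustrative
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_SERVICE_NAME="checkout-service"
```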
Now, imagine an environment made up of hundreds of applications that together produce a large volume of telemetry data. In this situation, even though centralizing the data is the goal, we may not want it flowing directly to our observability backend. Different components of the system may have different requirements for storing and processing logs, traces, and metrics (e.g., filtering out all the debug-level logs from some high-throughput applications).
Since this scenario occurs so frequently, OpenTelemetry offers a tool to handle it: the OpenTelemetry Collector. It is a standalone application that can receive OTLP messages, process the data, and then export it to another component (which may be another Collector).
The OpenTelemetry Collector is a flexible application, and we can configure how each type of signal should flow within it. This flow configuration is defined as part of a pipeline, and each pipeline can specify three types of elements: receivers, processors, and exporters.
The receivers define where the pipeline's input data comes from. One interesting aspect is that a receiver does not have to be passive, listening for data pushed by other components; it can also pull data from them (the Prometheus receiver, for example, scrapes metrics endpoints). After the data is received, the pipeline's processors can transform it: adding a field to identify where the data comes from, filtering records based on specific tags, and much more. The transformed data can then be sent to multiple destinations through the exporters. Each supported telemetry backend has an exporter, and the list grows with each new release.
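To make this concrete, here is a minimal sketch of a collector configuration covering the debug-log filtering scenario above, assuming the collector-contrib distribution (which ships the attributes and filter processors); the endpoints and attribute values are illustrative:

```yaml
receivers:
  otlp:                        # listen for data pushed by agents over OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  attributes/tag-source:       # add a field identifying where the data comes from
    actions:
      - key: telemetry.source
        value: edge-cluster
        action: insert
  filter/drop-debug:           # drop debug-level records from chatty applications
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

exporters:
  otlp/backend:                # forward to a backend (or to another collector)
    endpoint: backend.example.com:4317
    tls:
      insecure: true           # plaintext only for the sake of the example

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [attributes/tag-source, filter/drop-debug]
      exporters: [otlp/backend]
```

Adding a second destination, or swapping the backend entirely, is then just a matter of editing the exporters list of the pipeline.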
The last element type is the connector, which transfers data from one pipeline to another within the same collector, acting as an exporter at the end of one pipeline and as a receiver at the start of another. This enables some interesting use cases, such as generating metrics from your traces or counting spans that match specific conditions.
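For example, the spanmetrics connector from the contrib distribution implements the traces-to-metrics use case. A sketch reusing the receiver and exporter defined above:

```yaml
connectors:
  spanmetrics:                  # derives call counts and durations from spans

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]  # the connector acts as an exporter here...
    metrics:
      receivers: [spanmetrics]  # ...and as a receiver here
      exporters: [otlp/backend]
```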
Figure 1 shows the internal structure of the collector, as described in the OpenTelemetry documentation.
Figure 1 – OpenTelemetry Collector Internal structure (source: https://opentelemetry.io/docs/collector/)
Extending the OpenTelemetry Collector
If you need functionality beyond what the OpenTelemetry Collector provides out of the box, it can be extended. For instance, suppose your project relies on custom code that captures and measures metrics within your application. Thanks to the collector's broad provider integration and support, all you need is a small piece of code that sends those metrics to an OpenTelemetry receiver. Once this implementation is in place, you can easily switch to a new metrics display provider by reconfiguring the collector to route your metrics to their new destination.
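As a sketch of that last step, assuming your custom code already pushes its metrics to the collector's OTLP receiver, switching display providers comes down to editing the metrics pipeline; here the data is rerouted to the collector's Prometheus exporter:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:                  # expose the metrics for a Prometheus server to scrape
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]  # swap this entry to change destinations
```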
Leveraging the Collector: The Orb Observability Platform
Orb is an observability platform designed to enhance the experience of both operators and developers by providing comprehensive network observability through metrics. This means it helps teams monitor and understand the performance and health of their systems more effectively.
The platform utilizes MQTT (Message Queuing Telemetry Transport), a lightweight messaging protocol, to facilitate communication between devices at the edge (like sensors or IoT devices) and the cloud. Its small footprint and efficient use of bandwidth make it ideal for resource-constrained environments, and features like mutual authentication and end-to-end encryption ensure secure data transmission between the devices and the cloud. This efficiency is crucial for the real-time monitoring and management of distributed systems.
Moreover, the platform's control plane employs policies and groups based on agent tags. This means that different agents (software components that collect and send data) can be assigned specific policies that dictate how they operate and what data they collect. By organizing agents into groups based on tags, Orb can manage a vast number of agents effectively, ensuring that each one adheres to the appropriate policies for data collection and reporting.
A key feature of Orb is its integration with OpenTelemetry, achieved by running an OpenTelemetry exporter and receiver within the orb-agent. That way, the agent can use the receiver to fetch data, apply policies using its custom logic, and then forward the data using OTLP over MQTT with its custom MQTT exporter. The reason for this integration is that once the agent's data is in the OTLP format, it can be sent through the various exporters available in the OpenTelemetry ecosystem. This flexibility allows Orb to leverage one of the most popular standards in telemetry and observability, ensuring that it remains relevant and effective in managing complex systems.
Key Takeaways
- Efficient analysis and correlation of signals from each service can be achieved by centrally managing logs, traces, and metrics with OpenTelemetry. Observability tools like Jaeger, Prometheus, and OpenSearch build on this and allow easy querying and visualization.
- The OpenTelemetry Collector is a powerful option for handling custom metrics and diverse telemetry needs. It allows companies to integrate their pipelines with many popular backend systems.
- The OpenTelemetry Collector's extensibility is also a great feature and can be explored to implement advanced use cases, like supporting specific protocols. The Orb Observability Platform builds on this feature to enhance network observability, using MQTT for efficient real-time data transmission between edge devices and the cloud. Because the Collector already offers a vast number of integrations, like sending data to Jaeger and Prometheus, the development effort required for an end-to-end solution is smaller.
Acknowledgement
This piece was written by Marcelo Mergulhão - Innovation Expert, Luiz Pegoraro - Systems Architect, and João Longo - Innovation Leader. Thanks to João Caleffi and André Scandaroli for reviews and insights.