OpenCost Compatibility Patch for Red Hat OpenShift

February 12, 2025
Tags:
Kubernetes
OpenCost
Red Hat OpenShift

OpenCost is a CNCF-hosted open-source project designed to measure cost allocation and efficiency in Kubernetes clusters. By leveraging Prometheus metrics, it provides cost breakdowns at multiple levels—namespaces, containers, pods, nodes, and labels—helping teams track expenses and optimize resource allocation. Randoli App Insights integrates OpenCost to provide a multi-cluster view of cost efficiency, chargebacks, and workload cost optimizations.

However, OpenCost recently faced a major compatibility issue on OpenShift due to changes in how the platform's managed Prometheus instance handles metric labels. This article explores why OpenCost stopped working on OpenShift, the technical challenges involved, and how we developed a patch to resolve them.

The Challenge – OpenCost not working on OpenShift

OpenCost worked fine on earlier versions of OpenShift, but recent updates to OpenShift’s managed Prometheus instance introduced a breaking change. The issue arose from the fixed, hardcoded label names used in OpenCost’s Prometheus queries.

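In OpenShift’s new Prometheus setup, certain metric labels were automatically renamed. When a scraped metric carries a label that clashes with one Prometheus attaches to the scrape target, Prometheus keeps the target’s label and moves the scraped one under an exported_ prefix: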

  • "namespace" → "exported_namespace"
  • "pod" → "exported_pod"

Since OpenCost relies on hardcoded label names to fetch cost data, this unexpected renaming caused all cost calculations to fail, preventing OpenCost from functioning correctly on OpenShift.

This isn’t the first time this issue has been encountered: Google Cloud’s managed Prometheus instances apply similar label modifications. The typical workaround is setting honorLabels: true on the ServiceMonitor or PodMonitor endpoint (honor_labels: true in a raw Prometheus scrape configuration), which tells Prometheus to keep the scraped labels instead of renaming them. However, OpenShift explicitly disables this option for user workload metrics scraped via PodMonitors and ServiceMonitors, making that fix impossible in this case.
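To make the renaming concrete, here is a minimal Go sketch that mimics how Prometheus resolves a clash between scraped labels and target labels. It is an illustration only: resolveLabels and the sample label values are hypothetical, and the real logic lives inside Prometheus, not in OpenCost.

```go
package main

import "fmt"

// resolveLabels is a hypothetical helper that mimics Prometheus's conflict
// handling. With honorLabels=false, a clashing scraped label is kept under an
// "exported_" prefix and the target label wins. With honorLabels=true, the
// scraped value wins and the target label is ignored.
func resolveLabels(scraped, target map[string]string, honorLabels bool) map[string]string {
	out := make(map[string]string, len(scraped)+len(target))
	for k, v := range scraped {
		out[k] = v
	}
	for k, v := range target {
		if _, conflict := scraped[k]; !conflict {
			out[k] = v
			continue
		}
		if honorLabels {
			continue // keep the scraped value untouched
		}
		out["exported_"+k] = scraped[k]
		out[k] = v
	}
	return out
}

func main() {
	// Labels exposed by the metric itself (the workload being costed).
	scraped := map[string]string{"namespace": "shop", "pod": "cart-7d9f"}
	// Labels attached by Prometheus for the scrape target (the exporter pod).
	target := map[string]string{"namespace": "opencost", "pod": "opencost-0"}

	fmt.Println("honorLabels=false:", resolveLabels(scraped, target, false))
	fmt.Println("honorLabels=true: ", resolveLabels(scraped, target, true))
}
```

With honorLabels disabled, the workload's namespace and pod end up under exported_namespace and exported_pod, which is exactly the shape a query for "namespace" or "pod" would miss.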

The Solution – Making OpenCost Compatible with OpenShift

Since OpenCost didn’t originally support custom label configurations, we had to modify its codebase to introduce configurable label mappings. This ensured that OpenCost could adapt to different Prometheus setups, making it flexible beyond just OpenShift.

Step 1: Adding Configurable Label Support

To solve the issue, we introduced configurable environment variables in OpenCost, allowing users to define the label names used for cost queries. If no custom values are provided, OpenCost falls back to its default label names.

The changes were implemented as follows:

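// GetPromNamespaceLabel returns the label name that holds the namespace
// in Prometheus metrics, defaulting to "namespace".
func GetPromNamespaceLabel() string {
    return env.Get(PromNamespaceLabelEnvVar, "namespace")
}

// GetPromPodLabel returns the label name that holds the pod name,
// defaulting to "pod".
func GetPromPodLabel() string {
    return env.Get(PromPodLabelEnvVar, "pod")
}

// GetPromContainerLabel returns the label name that holds the container
// name, defaulting to "container".
func GetPromContainerLabel() string {
    return env.Get(PromContainerLabelEnvVar, "container")
}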

This approach ensures that OpenCost can now handle modified labels in any managed Prometheus environment, not just OpenShift.
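The getters above depend on OpenCost's internal env package. As a rough, standalone illustration of the same fallback pattern, the sketch below uses only the Go standard library. The variable name EXAMPLE_PROM_POD_LABEL and the helper getOrDefault are placeholders invented for this example; the actual variable names are defined in the patch.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical variable name used only in this sketch.
const promPodLabelEnvVar = "EXAMPLE_PROM_POD_LABEL"

// getOrDefault returns the value found by lookup, or def when the variable
// is unset or empty. The lookup function is injected so the helper can be
// exercised without touching the real process environment.
func getOrDefault(lookup func(string) (string, bool), key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// promPodLabel reads the pod label name from the environment,
// falling back to "pod".
func promPodLabel() string {
	return getOrDefault(os.LookupEnv, promPodLabelEnvVar, "pod")
}

func main() {
	fmt.Println("default:   ", promPodLabel())

	// Simulate an OpenShift-style deployment that overrides the label.
	if err := os.Setenv(promPodLabelEnvVar, "exported_pod"); err != nil {
		fmt.Println("could not set variable:", err)
		return
	}
	fmt.Println("overridden:", promPodLabel())
}
```

Treating an empty value the same as an unset one is a design choice of this sketch; it avoids building queries with an empty label name.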

Step 2: Modifying OpenCost’s Query Logic

Once we introduced custom label support, we modified all Prometheus queries in OpenCost to dynamically use the configured labels.

For example, instead of reading only the hardcoded "pod" field from a query result, the logic now tries the configured label first and falls back to the default if necessary:

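// Try the configured pod label first (for example "exported_pod").
podName, err := res.GetString(env.GetPromPodLabel())
if err != nil {
    // Fall back to the default label for results that were not rewritten.
    podName, err = res.GetString("pod")
    if err != nil {
        log.Warnf("CostModel.ComputeAllocation: missing field: %s", err)
        continue
    }
}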

This ensured that OpenCost could handle different Prometheus setups without breaking existing functionality.
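The same configured-then-default lookup can be tried outside OpenCost with a small standalone sketch. Here queryResult, getString, and labelValue are hypothetical stand-ins for OpenCost's result type and helpers, and the sample label values are made up.

```go
package main

import "fmt"

// queryResult is a hypothetical stand-in for the label set of one
// Prometheus query result.
type queryResult map[string]string

// getString returns the value of a label or an error when it is absent.
func (r queryResult) getString(field string) (string, error) {
	v, ok := r[field]
	if !ok {
		return "", fmt.Errorf("field %q not found", field)
	}
	return v, nil
}

// labelValue tries the configured label first and falls back to the
// default label name when the configured one is missing.
func labelValue(r queryResult, configured, fallback string) (string, error) {
	if v, err := r.getString(configured); err == nil {
		return v, nil
	}
	return r.getString(fallback)
}

func main() {
	results := []queryResult{
		{"exported_pod": "cart-7d9f"},  // rewritten labels
		{"pod": "router-default-1"},    // default labels
		{"node": "worker-0"},           // neither label present
	}
	for _, r := range results {
		name, err := labelValue(r, "exported_pod", "pod")
		if err != nil {
			fmt.Println("skipping result:", err)
			continue
		}
		fmt.Println("pod:", name)
	}
}
```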

Step 3: Handling OpenShift’s Multi-Prometheus Architecture

Another challenge was OpenShift’s dual Prometheus instance setup:

  1. Core platform metrics (for OpenShift system services)
  2. User workload metrics (for customer applications)

OpenShift aggregates data from both instances into a Thanos Querier endpoint. However, only the user workload Prometheus instance applied modified labels, while the core Prometheus instance retained default labels.

This meant that simply replacing label names wasn’t enough—we had to modify our queries to support both the default labels and the configurable ones simultaneously, ensuring OpenCost could correctly aggregate costs from both Prometheus instances.

queryFmtRAMBytesAllocated = 
`avg(avg_over_time(container_memory_allocation_bytes{container!="", container!="POD", 
node!="", %s}[%s])) by (container, %s, pod, %s, namespace, %s, node, %s, provider_id)`

By doing this, cost queries could now retrieve metrics from both core and user workload Prometheus instances, solving the final compatibility issue.
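One way to picture the grouping change is a helper that lists each default label next to its configured counterpart. The sketch below is a simplified variant of the query above, not the patch itself: it drops the extra filter and cluster placeholders, the names labelPair, groupByClause, and ramAllocationQuery are hypothetical, and skipping duplicate label names is a choice made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// labelPair couples a default label name with its configured counterpart.
type labelPair struct {
	Default    string
	Configured string
}

// groupByClause lists every default label followed by its configured
// counterpart, then any extra labels, skipping names already emitted.
func groupByClause(pairs []labelPair, extra ...string) string {
	seen := map[string]bool{}
	var names []string
	add := func(name string) {
		if name != "" && !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	for _, p := range pairs {
		add(p.Default)
		add(p.Configured)
	}
	for _, e := range extra {
		add(e)
	}
	return strings.Join(names, ", ")
}

// ramAllocationQuery builds a simplified RAM allocation query that groups
// by both the default and the configured label names.
func ramAllocationQuery(pairs []labelPair, window string) string {
	const format = `avg(avg_over_time(container_memory_allocation_bytes{container!="", container!="POD", node!=""}[%s])) by (%s)`
	return fmt.Sprintf(format, window, groupByClause(pairs, "node", "provider_id"))
}

func main() {
	pairs := []labelPair{
		{Default: "container", Configured: "container"}, // not overridden
		{Default: "pod", Configured: "exported_pod"},
		{Default: "namespace", Configured: "exported_namespace"},
	}
	fmt.Println(ramAllocationQuery(pairs, "1h"))
}
```

Because both names appear in the grouping clause, each result row carries either the default label or the rewritten one, and the lookup fallback from Step 2 picks whichever is present.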

Future Improvements & Next Steps

This patch successfully restores OpenCost functionality on OpenShift without requiring any changes to OpenShift’s managed Prometheus setup. However, there are a few ways we can further improve OpenCost’s label-handling logic:

1. Expanding Custom Label Support Beyond the Allocation API

One key enhancement is expanding custom label support beyond OpenCost’s Allocation API. Currently, the patch ensures that cost calculations remain accurate within this API, but other OpenCost APIs still rely on hardcoded label names.

Applying a similar approach across all APIs would allow OpenCost to function seamlessly in various managed Prometheus environments without requiring further modifications.

2. Making Additional Labels Configurable

Another area for improvement is increasing configurability for additional labels beyond just namespace, pod, and container. These three were the primary focus because they were essential for resolving the OpenShift-specific issue, but OpenCost still uses other fixed labels that might also be affected by managed Prometheus modifications. Expanding label customization would provide greater flexibility, ensuring OpenCost remains resilient across different Kubernetes setups. 

This idea has already been discussed in OpenCost community meetings, indicating that a broader label-handling overhaul could be a valuable next step.

3. Testing in Other Managed Kubernetes Setups

Further testing in other managed Kubernetes environments like Google Kubernetes Engine (GKE), Amazon EKS, and Azure AKS would also help validate the flexibility of this patch. Since platforms like GKE modify metric labels in a manner similar to OpenShift, extending testing beyond OpenShift would confirm whether this solution can be applied more broadly. 

If successful, it would reinforce OpenCost’s ability to work across various managed Kubernetes monitoring configurations, making it an even more robust cost monitoring tool.

Final Thoughts

By adding custom label support and modifying OpenCost’s Prometheus queries, we successfully restored OpenCost functionality on OpenShift’s managed Prometheus setup. This patch not only fixes the issue but also makes OpenCost more adaptable to different Kubernetes environments.

Looking ahead, there’s potential to expand this approach across OpenCost’s entire codebase, making it even more configurable and resilient to platform-specific variations.

Check out the full patch and contribute to the discussion!

The changes discussed in this blog have been implemented in OpenCost’s codebase. You can view the pull request. If you're working with OpenShift or other managed Kubernetes environments, we’d love your feedback! 

Feel free to join the discussion, test the changes, or contribute improvements.

Rosa Lopes