Workiva takes observability to a new level
Workiva developer Steven Osbourne shares his views on taking observability to a new level in the New Relic blog shared on MarketScreener, June 14, 2021.
At Workiva, data is the core of everything we do. Our platforms and applications must be up and performing for Workiva customers to get the benefits and value they expect from our business data management and reporting solutions. Ensuring that we meet that objective starts deep within our platform operations organization where we are responsible for delivering the applications our customers depend on every day.
Nearly every service and application in our portfolio now runs on AWS, and we've built a lot of tooling to help our engineers interface with AWS more easily. We had a homegrown tool for synthetic monitoring, tracing logic lived in Google Stackdriver, and we used a third-party tool for metrics monitoring. However, we wanted to further simplify our toolset so our engineers could get a more complete picture across all our systems.
So, about 18 months ago I formed an observability team with the goal of adopting a unified platform for observability. We knew we couldn't just give people a new tool and, like magic, they would become good at observability. That's why having a dedicated observability team was important to guide the adoption process, help people work through changes in the way we monitor and report, and make sure we realize the full value of our investment. This approach worked out well for us.
We selected New Relic One as our observability platform. We like that it provides everything in one place, including traces, application performance metrics, and more. Plus, it allows us to build custom apps and extend its capabilities to fit our unique needs. This was important for successful cross-team adoption.
Going beyond dashboards
We still rely on dashboards, but by building apps with New Relic One, we can get more granular views with a lot less time and effort than manually building dashboards. For example, we built an app that provides a very granular view of service availability, which we use every morning at standup to ensure we are meeting our SLO thresholds.
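To make the standup check concrete, here is a minimal sketch of comparing service availability against an SLO threshold. It is written in plain Python and is not the New Relic One app itself. The `ServiceWindow` type, its field names, and the sample numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceWindow:
    """Hypothetical record: availability checks for one service over one window."""
    name: str
    total_checks: int
    failed_checks: int
    slo_target: float  # for example, 0.999 for "three nines"


def availability(window: ServiceWindow) -> float:
    """Fraction of checks that succeeded in the window."""
    if window.total_checks <= 0:
        raise ValueError(f"no checks recorded for {window.name}")
    return (window.total_checks - window.failed_checks) / window.total_checks


def slo_report(windows: List[ServiceWindow]) -> List[Tuple[str, float, bool]]:
    """Return (service name, availability, meets SLO) for each service."""
    return [(w.name, availability(w), availability(w) >= w.slo_target) for w in windows]


if __name__ == "__main__":
    # Sample data is invented; a real report would read from monitoring data.
    sample = [
        ServiceWindow("documents", total_checks=10_000, failed_checks=5, slo_target=0.999),
        ServiceWindow("reporting", total_checks=10_000, failed_checks=20, slo_target=0.999),
    ]
    for name, value, ok in slo_report(sample):
        print(f"{name}: {value:.4%} {'OK' if ok else 'BELOW SLO'}")
```

A report like this only flags which services fall below their target. The granular per-service detail described above would still come from the underlying monitoring data.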
We have another app to view details on AWS resources and one for Kubernetes resources that allows engineers to see any service events an application has gone through. We also roll this data into a separate view in New Relic that feeds into the KPI reports we provide our executive leadership, so they can see the overall health of the platform.
These apps are helpful to our internal teams and align with our overall goal of increasing uptime for our customers. The SLO app lets us quickly identify any issue a host may have, so we can investigate and resolve it. Additionally, we used New Relic to build several smoke tests, which monitor critical features in our production environments.
We're now working on feeding this information into a system that will automatically alert our on-call engineers if something goes wrong. This will help us get ahead of issues and maintain uptime for our customers.
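As a rough illustration of that idea, the sketch below turns smoke test results into on-call notifications. It is an assumption-laden outline, not the system we are building: `SmokeResult`, `alert_on_failures`, and the `notify` callback are hypothetical names, and a real setup would hand the message to a paging or alerting service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class SmokeResult:
    """Hypothetical outcome of one smoke test against a critical feature."""
    feature: str
    passed: bool
    detail: str = ""


def alert_on_failures(
    results: Iterable[SmokeResult],
    notify: Callable[[str], None],
) -> List[str]:
    """Call notify once per failed smoke test and return the failed feature names."""
    failed = [r for r in results if not r.passed]
    for result in failed:
        notify(f"Smoke test failed: {result.feature} ({result.detail or 'no detail'})")
    return [r.feature for r in failed]


if __name__ == "__main__":
    # print stands in for a real paging integration.
    alert_on_failures(
        [
            SmokeResult("login", passed=True),
            SmokeResult("export", passed=False, detail="timeout"),
        ],
        notify=print,
    )
```

Passing the notifier in as a callback keeps the decision logic testable without contacting any alerting service.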
Increasing operational efficiency and reducing toil
As we continue to deliver new tools to our teams, we will rely heavily on New Relic One's programmable platform. We're exploring a number of additional apps, including New Relic One Cloud Optimize, which tracks infrastructure costs.
I'm also planning to leverage New Relic One to build more maturity into our CI/CD pipeline. We've integrated the New Relic Terraform Provider into pretty much everything we're building. For example, if we run seven different environments, a developer will want the exact same dashboard in all seven. With the New Relic Terraform Provider, we can produce each version from the same definition by changing just a single variable.
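The pattern behind this, one definition parameterized by a single variable, can be sketched in a few lines. The example below uses plain Python rather than Terraform configuration, so it shows the idea and not the provider's actual syntax. The environment names, the dashboard structure, and the `environment` attribute in the queries are all assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical environment names; the article only says there are seven.
ENVIRONMENTS: List[str] = ["dev", "test", "qa", "staging", "sandbox", "perf", "prod"]


def build_dashboard(environment: str) -> dict:
    """Build one dashboard definition; only the environment value varies."""
    return {
        "name": f"Service health ({environment})",
        "widgets": [
            {
                "title": "Throughput",
                # Illustrative query; the environment attribute is an assumption.
                "query": f"SELECT count(*) FROM Transaction WHERE environment = '{environment}'",
            },
            {
                "title": "Error count",
                "query": (
                    "SELECT count(*) FROM Transaction "
                    f"WHERE error IS true AND environment = '{environment}'"
                ),
            },
        ],
    }


def build_all_dashboards() -> Dict[str, dict]:
    """Produce the same dashboard for every environment from one definition."""
    return {env: build_dashboard(env) for env in ENVIRONMENTS}


if __name__ == "__main__":
    for env, dashboard in build_all_dashboards().items():
        print(env, "->", dashboard["name"], f"({len(dashboard['widgets'])} widgets)")
```

Because every dashboard comes from the same function, a change to a widget is made once and appears in all environments on the next run.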
Additionally, we've started working on a testing suite built in New Relic One that exercises core functionality of the Workiva Platform. These are business-critical tests and require proper checks and balances, so everyone from R&D to our customer support team and senior leadership has assurance of the integrity of the process.
We continue to ramp up adoption of New Relic One across our organization. Looking ahead, I see opportunities to increase efficiency for our customer experience by extending New Relic One and going beyond its core capabilities.