The transformative power of taxonomies
Beyond the spreadsheet
This post is the second in a two-part series on the benefits of adopting standardized, structured data for financial reporting by public sector agencies. In the previous post, I wrote about how tagging your data makes it something that you can use and trust, making it more valuable than just something you can see. In this post, I will cover how individual tags combine to create a "taxonomy" (i.e., a type of structured data dictionary) that shows the relationships between the individual values in your report.
A taxonomy is similar to an index in a book. Consider each entry in the index a tag to information stored elsewhere in the book.
A taxonomy, in its simplest form, is a hierarchical arrangement of terms, in which each subentry offers a more specialized perspective on its parent entry. The invention of the book (as opposed to the scroll) made it possible to reference content by page number, and indexing is a method of organizing information that builds on that invention. Think of a taxonomy as an index, and of tags as entries that point into your data (rather than your book), organized hierarchically.
Tagging data against a taxonomy transforms the way you interact with your financial reporting data. The point is to represent the relationships you know exist between values, and to document those relationships so that computer programs can make use of that knowledge. A taxonomy describes how your numbers work without the gridlock of a spreadsheet.
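To make the index analogy concrete, here is a minimal sketch of a taxonomy as a tree of terms. The `Concept` class and the tag names are hypothetical and heavily simplified; they are not drawn from any published taxonomy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Concept:
    """One taxonomy entry: a tag name plus its more specialized subentries."""
    name: str
    children: List["Concept"] = field(default_factory=list)

    def find(self, name: str, path: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
        """Return the tag names from the root down to `name`, or None if absent."""
        here = path + (self.name,)
        if self.name == name:
            return here
        for child in self.children:
            found = child.find(name, here)
            if found is not None:
                return found
        return None


# Hypothetical balance-sheet taxonomy, for illustration only.
balance_sheet = Concept("BalanceSheet", [
    Concept("Assets", [Concept("Cash"), Concept("Receivables")]),
    Concept("Liabilities", [Concept("AccountsPayable"), Concept("BondsPayable")]),
    Concept("Equity"),
])

# Looking up a tag returns its place in the hierarchy, much like an index entry.
print(" > ".join(balance_sheet.find("BondsPayable")))
# BalanceSheet > Liabilities > BondsPayable
```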
Let's take something as simple as the fundamental assertion that Assets = Liabilities + Equity. With this asserted relationship, data that is tagged as Assets, in US dollars, as of Dec. 31, 2016 can be validated as "consistent" with the sum of the values reported for Liabilities and Equity, also in US dollars, and also measured as of that date. This kind of data quality verification—based on rules rather than cell addresses in a spreadsheet—should provide a glimpse of what is possible with data beyond the spreadsheet.
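As a rough sketch of what such a rule could look like, the snippet below stores each value with its tag, unit, and date, and then checks the assertion against the tags rather than against cell addresses. The `Fact` structure and the dollar amounts are made up for illustration.

```python
from decimal import Decimal
from typing import List, NamedTuple


class Fact(NamedTuple):
    """A tagged value: what it is, in which unit, and as of which date."""
    concept: str
    unit: str
    as_of: str
    value: Decimal


def check_accounting_equation(facts: List[Fact], unit: str, as_of: str) -> bool:
    """Return True when Assets == Liabilities + Equity for one unit and date."""
    by_concept = {
        f.concept: f.value
        for f in facts
        if f.unit == unit and f.as_of == as_of
    }
    return by_concept["Assets"] == by_concept["Liabilities"] + by_concept["Equity"]


# Illustrative amounts only.
facts = [
    Fact("Assets", "USD", "2016-12-31", Decimal("1500000")),
    Fact("Liabilities", "USD", "2016-12-31", Decimal("900000")),
    Fact("Equity", "USD", "2016-12-31", Decimal("600000")),
]

print(check_accounting_equation(facts, "USD", "2016-12-31"))  # True
```

Because the rule refers to tags, it keeps working no matter where the values sit in a report or how the report is laid out.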
With taxonomies you:
- Reduce data errors.
- Extract greater meaning from data.
- Give people access to data values with context.
- Compare apples to apples, i.e., avoid mistakenly comparing apples to oranges.
Data models and error reduction
Data models substantially reduce errors because you can create rules that apply to the model as a whole, rather than just to individual data points. It is the difference between copying a formula across 10,000 rows of a spreadsheet and entering it once and having it automatically applied correctly in every case.
As another example, you may create a rule that all the sources of revenue from individual programs in a municipality must add up to the full amount of program revenue in that municipality. If the numbers don't add up, you can examine the numbers for keying or decimal errors, or you can check to make sure that all sources of program revenue have been collected. Imagine defining how numbers should foot and having your computer automate this check on data quality, rather than doing this by hand in a PDF document or defining lots of formulas, one at a time, in a spreadsheet.
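A footing rule of this kind might be sketched as follows. The function name, the revenue categories, and the amounts are all hypothetical; the reported total deliberately contains a transposed-digit keying error so that the check has something to catch.

```python
from decimal import Decimal
from typing import Dict, Tuple


def check_footing(reported_total: Decimal,
                  components: Dict[str, Decimal]) -> Tuple[bool, Decimal]:
    """Compare a reported total with the sum of its components.

    Returns (ok, difference), where difference is the reported total
    minus the sum of the components.
    """
    difference = reported_total - sum(components.values(), Decimal("0"))
    return difference == 0, difference


# Hypothetical program revenue sources for one municipality.
program_revenue = {
    "ChargesForServices": Decimal("420000"),
    "OperatingGrants": Decimal("310000"),
    "CapitalGrants": Decimal("95000"),
}

# The components sum to 825000, but the total was keyed in as 852000.
ok, difference = check_footing(Decimal("852000"), program_revenue)
print(ok, difference)  # False 27000
```

The rule is defined once against the tags and can then run on every report that uses them, which is the contrast with maintaining one formula per cell.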
Small accounting errors can cost state and local governments billions of dollars. Recently, California came under fire for an "almost $1.5-billion accounting error in California's healthcare program for the poor". This error was the result of two fairly simple mistakes: the revenue from one cost-saving program was counted twice, and the related financial report failed to count costs from two major counties. Errors like these would typically be caught by an interactive data system that validates internal data consistency. For example, a software-as-a-service (SaaS) system that uses XBRL (the syntax chosen by the SEC to represent the US GAAP accounting standard in machine-readable form) could run a check to detect duplicate data or implement business rules for completeness of reported data (e.g., require accountants to enter information for all programs).
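The two kinds of checks mentioned here, duplicate detection and completeness, can be sketched in a few lines. This is plain Python rather than XBRL or any particular SaaS product, and the program names, county names, and amounts are placeholders, not the actual California figures.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_duplicates(entries: List[Dict]) -> List[Tuple[str, str]]:
    """Return (program, county) pairs that were reported more than once."""
    counts = Counter((e["program"], e["county"]) for e in entries)
    return sorted(key for key, n in counts.items() if n > 1)


def find_missing_counties(entries: List[Dict], required: List[str]) -> List[str]:
    """Return required counties for which nothing was reported."""
    reported = {e["county"] for e in entries}
    return sorted(set(required) - reported)


# Placeholder data: one program entered twice, two counties left out.
entries = [
    {"program": "CostSavingsProgram", "county": "County A", "amount": 100},
    {"program": "CostSavingsProgram", "county": "County A", "amount": 100},
    {"program": "CareCosts", "county": "County B", "amount": 250},
]
required_counties = ["County A", "County B", "County C", "County D"]

print(find_duplicates(entries))
# [('CostSavingsProgram', 'County A')]
print(find_missing_counties(entries, required_counties))
# ['County C', 'County D']
```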
Such accounting errors not only hurt these governments financially, they also damage the relationship between a government and its constituents. This can lead to public mistrust of the government's use of tax dollars. Public mistrust is then compounded by the lack of timely, publicly available, machine-readable data on government spending.
Modeling information and making meaning
A taxonomy shows the relationships between individual pieces of data and the system as a whole, and it allows us to move from individual pieces of data to data sets. These data sets can then be used to create statistical models and interactive visualizations of the data. Information models allow humans to interact with data in a more meaningful way, to identify trends and outliers in the data and to understand its relationship to the everyday functioning of our institutions.
We talk about the importance of transparency in data, but what is important here is what cannot be seen—the model that works behind the scenes, creating meaning and structure in your data.
The relationship between an information model and the data set behind it is similar to the relationship between a website and the code that makes it function. A website is built to highlight its content—it is structured for communication and clarity, and often for enjoyment. Behind the scenes, the code works quietly away, structuring everything on the screen. We don't want to see the code; rather, we just want the code to disappear so that we can view the content. Similarly, a well-built information model, such as a taxonomy for a balance sheet, can be hidden from sight, structuring your data so that it can interact with both humans and machines.
Anyone who has sat through a meeting where someone displays spreadsheet pivot tables for an hour understands the mind-numbing effect of scanning lists of numbers and trying to make meaning out of them. As humans, we understand our world through stories, not lists, and data needs to be presented in ways that tell those stories. Moreover, anyone should have easy, quick access to that storytelling power; it shouldn't just be the purview of statisticians and data scientists.
Interactive data closes the gap between humans and computers, allowing people to see how data relates to their everyday lives. In recent interviews about what he learned from creating USAFacts.org, Steve Ballmer talked about how data allowed him to understand the effects of government in his everyday life. He noted that many people think of government employees as "Those damn bureaucrats," but looking at the data allowed him to see that most of the 24 million government employees are teachers and professors, active-duty military personnel, and people who work in public hospitals. Data analysis allowed him to understand that these faceless bureaucrats were actually indispensable members of his neighborhood and, as he says, "most of these people you like".
We think of data as something abstract and cold, but interactive data models let us tell stories that help us better understand our communities and trust the government institutions that work to support them. If PDFs and spreadsheets are a series of cubicles, stretching out under a flickering fluorescent light, then interactive data models are more like a bustling farmer's market. You might haggle with one vendor by comparing the prices paid at another booth. You might talk to another vendor about his soil and another about her labor practices. New connections are formed. Data emerges from its silo and begins to speak.
Reusing data models
Though building a taxonomy does require some expertise and investment on the front end, once that model is created, you can reuse it for each subsequent disclosure and report, thereby streamlining the process. Since the SEC has already kick-started the move towards interactive data reporting, you don't need to create (or recreate) this process yourself; rather, you can partner with other people who are already creating the necessary taxonomic systems.
Once the first taxonomy is created, institutions can begin building on it, creating a system that is both more trustworthy and more complete. Here is one example: Some of the annual reports produced by utility companies for the Federal Energy Regulatory Commission (FERC) include a lot of financial data that already exists in a utility's annual report. In cases where the utility is a public company, it already captures this information as interactive data based on a US GAAP financial reporting taxonomy. As FERC moves to interactive data (a project already under way), the Commission could build upon the US GAAP taxonomy that already captures that data. Reuse creates efficiency and reinforces the value of the data because it becomes obvious that the data is of interest to more than just the original report recipient.
Just as in any other engineering problem, precedent and experience create a more robust system. The first time that you build a bridge, it might just be a log across a stream. You might then improve upon that system by using a flat log to make it easier to walk on or by anchoring each end. If you record those improvements, each generation can improve upon them, using and building on your bridge design to build stronger bridges that span larger rivers. That original design engenders the designs that follow. The taxonomies that we build now will allow us to create stronger and more fully descriptive interactive data systems.
Data comparisons and market leverage
Though institutions can create idiosyncratic tags within a taxonomic system, the goal is to fit all of the data into a meaningful organizational system (a taxonomy) that corresponds to the system used by other people in the same field. Using the same taxonomy allows you to make meaningful comparisons between your data and the data of your competitors and peers.
Publicly accessible interactive data models allow institutions to make informed decisions and to have leverage in their markets. For example, when a municipality goes to an underwriter to negotiate fees to underwrite a municipal bond, it would be useful to know what other municipalities of the same size and circumstances have paid. This information is even more useful if the municipality can access it quickly and in a format that is easy to understand. If municipalities were consumers of financial data from other municipalities, they could discover their respective bond issuance costs, arming them with data to use as they negotiate fees with underwriters. This is similar to the process of shopping for a used car. If you know what kinds of deals other people received for the same car in your region, you can negotiate with the salesperson. Access to information gives you market advantage and leverage. Governments can use data to increase resource efficiency. A job well done is a job to be proud of.
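When every issuer reports under the same tag, a peer benchmark becomes a short query. The sketch below assumes a hypothetical shared tag, `UnderwritingFeePer1000`, and invented issuers, populations, and fees; none of it is real market data.

```python
from statistics import median
from typing import Dict, List, Optional


def peer_benchmark(facts: List[Dict], concept: str,
                   min_pop: int, max_pop: int) -> Optional[float]:
    """Median value of `concept` among issuers within a population range."""
    values = [
        f["value"]
        for f in facts
        if f["concept"] == concept and min_pop <= f["population"] <= max_pop
    ]
    return median(values) if values else None


# Invented facts, all tagged with the same hypothetical concept.
facts = [
    {"issuer": "Town A", "population": 48000, "concept": "UnderwritingFeePer1000", "value": 5.2},
    {"issuer": "Town B", "population": 52000, "concept": "UnderwritingFeePer1000", "value": 4.8},
    {"issuer": "Town C", "population": 61000, "concept": "UnderwritingFeePer1000", "value": 6.1},
    {"issuer": "City D", "population": 900000, "concept": "UnderwritingFeePer1000", "value": 3.0},
]

# A town of roughly 50,000 people compares itself with similarly sized peers.
print(peer_benchmark(facts, "UnderwritingFeePer1000", 40000, 70000))  # 5.2
```

The comparison is only meaningful because every issuer uses the same tag for the same thing; with idiosyncratic tags, the filter on `concept` would silently miss peers.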
When there is data asymmetry, those who have access to more data and the resources to analyze that data, whether they be car salespeople or large underwriting firms, have the market advantage. But if you create a common taxonomy and publicly available interactive data, the whole data ecosystem benefits—including you.
Data models create the foundation on which we build our analysis systems. They help us to describe the world and decide what should happen next. They are how we move from measurement to performance. What's not to love?