Internal Data Management: The Issue of Generating Too Much Data
When I’m speaking with companies about their data management challenges, I usually find they fall into two major categories: internal and external. The external challenge has to do with the collection, validation, and normalization of many different kinds of market data. For many companies, this includes commodity prices, weather, forward curves, and other market information. All of this data needs to be collected at different intervals, arrives in different formats, and must be normalized for analysis and integration.
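To make the normalization step concrete, here is a minimal sketch of pulling two feeds with different date formats and units into one common shape. The feed names, formats, and scaling factors are hypothetical, invented for illustration; real feeds would of course be far messier.

```python
from datetime import datetime

# Hypothetical raw records from two feeds: feed A uses ISO dates and dollars,
# feed B uses US-style dates and quotes prices in cents.
feed_a = [("2014-01-02", "98.75"), ("2014-01-03", "99.10")]
feed_b = [("01/02/2014", 9912.0), ("01/03/2014", 9950.0)]

def normalize(feed, date_fmt, to_dollars=1.0):
    """Parse dates and scale prices into a common (date, price) shape."""
    return [(datetime.strptime(d, date_fmt).date(), float(p) * to_dollars)
            for d, p in feed]

# Merge both feeds into one chronologically sorted series of (date, price).
merged = sorted(normalize(feed_a, "%Y-%m-%d") +
                normalize(feed_b, "%m/%d/%Y", to_dollars=0.01))
```

Once every feed is reduced to the same shape, downstream analysis no longer needs to care which source a price came from.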
The other data challenge is internal data. For many companies this comes after tackling the market data challenge, and it is the result of capturing and analyzing hundreds of market data feeds. Each model might create millions of data points each time it’s run – all of which need to be stored and managed for future use and collaboration among business units.
The Creation Problem
The first major challenge of internal data is the issue of creating it. Typically this is done in a variety of tools: models built in MATLAB, Microsoft Excel, VBA, or other business intelligence tools. These models range in robustness, but many companies find that much of their critical intellectual property is being created in fragile spreadsheets (often the original creators are no longer with the company, or there is the added risk that only one person knows how the file was built).
Data creation has a very high cost, yet it’s surprising how many businesses aren’t in complete control of the process. Analysts in every department produce both scheduled and ad-hoc analyses whose results are most valuable when stored for future use. This problem of controlling data creation is replicated across teams, departments, divisions, and regions of a company. Further, these data points might be used to make multi-million-dollar trades and decisions.
Taking Control of Data Creation
So how can companies take control of this process? It all starts with how the analysis is performed. It needs to be performed on a “golden copy” of the data, and the analysis needs to handle changing date ranges and understand business logic such as business days, exchange holidays, and peak and off-peak hours. Further, data series must be easy to align across different granularities, support interpolation, and accommodate multiple methods of analyzing forward contracts. All of this can be accomplished through our data management software solution, the ZEMA Suite, and its Market Analyzer product.
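As a rough illustration of what “business-day awareness plus interpolation” means in practice, the sketch below fills in a sparse price curve across weekdays only. The dates and prices are made up, the calendar skips only weekends (a real implementation would also consult an exchange holiday calendar), and the interpolation assumes exactly two known points; this is a toy, not how any particular product implements it.

```python
from datetime import date, timedelta

# Hypothetical sparse curve: prices known only at two settlement dates.
known = {date(2014, 1, 6): 100.0, date(2014, 1, 10): 108.0}

def business_days(start, end):
    """Yield weekdays in [start, end]; a real calendar would also skip exchange holidays."""
    d = start
    while d <= end:
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            yield d
        d += timedelta(days=1)

def interpolate(target, points):
    """Linear interpolation between the two known dates bracketing `target`."""
    (d0, p0), (d1, p1) = sorted(points.items())
    frac = (target - d0).days / (d1 - d0).days
    return p0 + frac * (p1 - p0)

# Align the sparse curve onto a business-day grid.
aligned = {d: interpolate(d, known)
           for d in business_days(date(2014, 1, 6), date(2014, 1, 10))}
```

The same pattern generalizes to aligning hourly, daily, and monthly series onto a shared grid before any cross-series analysis is run.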
Once you have control of your analysis, the challenge is to create – and recreate – the results in an automated, controlled fashion. That control needs to come in the form of validation of new data, the ability to substitute data sources, and oversight of the data generation process. Curve Manager is a workflow automation tool that handles this complex data creation process. All of the data created through Curve Manager has a complete audit trail, can be scheduled on a time- or event-driven basis, and can be written to a database or packaged for a downstream system.
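In outline, a validate-then-publish step with an audit trail can be as simple as the sketch below. Every function and field name here is hypothetical and chosen for illustration; it is not Curve Manager’s API, just the general shape of the idea.

```python
from datetime import datetime, timezone

def validate(points):
    """Reject obviously bad inputs (missing or non-positive prices) before they propagate."""
    return all(p is not None and p > 0 for p in points)

def generate_curve(name, points, audit_log):
    """Build a curve from validated inputs and record the outcome in an audit trail."""
    stamp = datetime.now(timezone.utc).isoformat()
    if not validate(points):
        audit_log.append({"curve": name, "status": "rejected", "at": stamp})
        return None
    audit_log.append({"curve": name, "status": "published", "at": stamp})
    return {"name": name, "points": points}

log = []
ok = generate_curve("power_peak", [52.1, 53.4, 51.8], log)   # passes validation
bad = generate_curve("power_offpeak", [49.0, -1.0], log)     # rejected, but still logged
```

The key property is that both outcomes land in the audit trail, so a rejected run is just as traceable as a published curve.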
For a full walkthrough of this process, reach out to us for a demonstration.