Clean data plays a vital role in businesses’ decision-making processes, as it helps organizations save time and improve efficiency.
Data cleansing or data scrubbing is the process of detecting and correcting or removing corrupt or inaccurate records from a record set, table, or database. The inconsistencies detected or removed may have been originally caused by user entry errors, corruption in transmission or storage, or different data definitions from similar entities in different stores. After cleansing, a data set should be consistent with other similar data sets in a system.
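To make the definition above concrete, here is a minimal Python sketch of a cleansing pass. It is an illustration only, not ZEMA code: the `PriceRecord` layout, the `cleanse` function, and the three rules it applies (normalize the symbol, drop records with a missing price, drop duplicates) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceRecord:
    """Hypothetical record layout, used only for this illustration."""
    symbol: str
    date: str               # ISO date string
    price: Optional[float]  # None marks a value lost in entry or transmission


def cleanse(records: list[PriceRecord]) -> list[PriceRecord]:
    """Correct what can be corrected and remove what cannot."""
    seen: set[tuple[str, str]] = set()
    cleaned: list[PriceRecord] = []
    for rec in records:
        # Correct: normalize inconsistent symbol formatting (an entry error).
        symbol = rec.symbol.strip().upper()
        # Remove: a record without a price is treated as corrupt.
        if rec.price is None:
            continue
        # Remove: keep only the first record for each (symbol, date) pair.
        key = (symbol, rec.date)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(PriceRecord(symbol, rec.date, rec.price))
    return cleaned
```

In practice the rules depend on the data set. For instance, some sources may legitimately repeat a value, so a duplicate check like the one above would need to be adapted.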
“Garbage in, garbage out” is a famous phrase in the world of IT that describes how it is impossible to expect a good output (such as a sound trading decision) from bad input (inaccurate data). One thing is certain: bad data is costly.
One significant example of the high cost of bad data is the trading loss incurred in 2012 by JP Morgan’s Chief Investment Office in London, an event known as the “London Whale.” In April and May of that year, a series of derivatives transactions involving credit default swaps (CDS) was entered into JP Morgan’s risk management systems but reported as part of the bank’s “hedging strategy.” Due to a lack of internal control and monitoring systems, this data entry error allowed outsized CDS positions to accumulate in the market, causing an estimated trading loss of $2 billion or more. On the company’s emergency conference call, JP Morgan CEO Jamie Dimon said the data management strategy was “flawed, complex, poorly reviewed, poorly executed, and poorly monitored.” As this case shows, bad data can have severe repercussions.
ZEMA does more than collect data: it handles every step involved in enterprise data management, from data mining and validation to data integration with users’ preferred downstream systems. ZEMA’s metadata and validation functionalities help ensure that anomalous data is caught and that users receive correct information with ease. Key ZEMA functionalities that facilitate this process are discussed below (Figure 1).
Figure 1: ZEMA Solution–Data
ZEMA possesses a centralization tool used by a wide range of industry experts to ensure that raw data flows directly from its source to a designated database, where it can be analyzed further. ZEMA also has scheduling and collection capabilities. Users can schedule parsers to collect data at any time, in any granularity. This process is fully automated, and users are notified by JMS messages or email alerts if any errors arise during data collection. ZEMA can also collect information in any electronic data format, including XML, HTML, and PDF, whether it is stored on an FTP server or on a cloud server. ZE continuously monitors over 400 data vendors with more than 4,000 data feeds, all of which can be supplied to ZEMA clients. All data ZEMA collects is then stored in one centralized database.
Next, ZEMA validates data, providing visual cues that make potential errors easy to locate and correct. ZEMA checks all incoming data for:
- Completeness – Is all of the data being collected?
- Correctness – Are the data values within acceptable boundaries?
- Timeliness – Is the data arriving on time?
- Newness – Is the data different from the previous day’s?
ZEMA also shows which data sets have passed or failed evaluation, as well as data sets that are awaiting evaluation.
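The four checks listed above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not ZEMA’s implementation: the `Delivery` class, the `validate` function, and all of its parameters (the expected series names, the acceptable value band, and the arrival deadline) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Delivery:
    """Hypothetical description of one day's delivery of a data set."""
    values: dict[str, float]  # observed values, keyed by series name
    arrived_at: datetime      # when the data was received


def validate(today: Delivery, previous: Delivery, expected: set[str],
             lower: float, upper: float, deadline: datetime) -> dict[str, bool]:
    """Return a pass (True) or fail (False) result for each check."""
    return {
        # Completeness: every expected series is present.
        "completeness": expected.issubset(today.values),
        # Correctness: every value lies within the acceptable boundaries.
        "correctness": all(lower <= v <= upper for v in today.values.values()),
        # Timeliness: the data arrived no later than the deadline.
        "timeliness": today.arrived_at <= deadline,
        # Newness: the values differ from the previous day's.
        "newness": today.values != previous.values,
    }
```

Returning one result per check, rather than a single overall flag, mirrors the idea of showing which evaluations passed and which failed, so a failed check can be located and corrected directly.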
Users may also configure a set of rules that can then be applied to preferred data reports, either in real time or on a set schedule. If a report doesn’t arrive within the scheduled time range, ZEMA will send a notification to the assigned users. This way, end-of-day processes and other business inquiries can be fulfilled in a timely, efficient manner.
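A schedule rule of this kind could look like the following Python sketch. The function names, the schedule layout (report name mapped to the end of its delivery window), and the message text are assumptions made for illustration. Actual delivery of the message, by email or JMS, is deliberately left out.

```python
from datetime import datetime, time


def overdue_reports(schedule: dict[str, time], received: set[str],
                    now: datetime) -> list[str]:
    """Names of reports whose delivery window has closed with nothing received."""
    return sorted(
        name
        for name, window_end in schedule.items()
        if name not in received and now.time() > window_end
    )


def build_notifications(report: str, users: list[str]) -> list[str]:
    """Build one message per assigned user; sending is out of scope here."""
    return [
        f"To {user}: report '{report}' did not arrive in its scheduled window"
        for user in users
    ]
```

A scheduler would call `overdue_reports` periodically and pass each result to `build_notifications` along with the users assigned to that report.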
Finally, ZEMA possesses an entitlement management tool. Administrators can grant each individual user the ability to view a specific report or use certain ZEMA tools. This prevents unauthorized users from gaining access to the ZEMA database and changing data points or curves. This ZEMA functionality behaves like a steward, keeping the ZEMA database clean and organized.
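The idea behind entitlement management can be illustrated with a small deny-by-default permission table. The Python sketch below is hypothetical: the `Entitlements` class, its method names, and the action and resource strings are assumptions for the example and do not describe ZEMA’s actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Entitlements:
    """Hypothetical per-user grants, stored as (action, resource) pairs."""
    grants: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def grant(self, user: str, action: str, resource: str) -> None:
        """Called by an administrator to allow one action on one resource."""
        self.grants.setdefault(user, set()).add((action, resource))

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        """Deny by default: anything not explicitly granted is refused."""
        return (action, resource) in self.grants.get(user, set())
```

Because viewing and editing are granted separately, a user who may view a report still cannot change its data points unless an administrator has granted that action as well.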
In a nutshell, ZEMA’s data management tools are designed to provide clean data to traders and market participants. ZEMA’s philosophy revolves around keeping data timely, accurate, and complete, so users can make better decisions and stay informed of market changes.
Dan Fitzpatrick, Gregory Zuckerman, and Liz Rappaport, “J.P. Morgan’s $2 Billion Blunder,” The Wall Street Journal, May 11, 2012, accessed August 1, 2014, http://online.wsj.com/news/articles/SB10001424052702304070304577396511420792008.