Tuesday, November 11, 2008

The Information Explosion

Who would have imagined it? Back in the early 1990s, before the first usable version of Microsoft Word (Windows 3.0 first shipped in 1990, over 18 years ago), most of us relied on pen and paper for our daily communication and record keeping.

Oh, we might have used an email system back then. At Hewlett Packard, I remember sending emails using its proprietary HP Desk mail system back in the mid 1980s. However, at the time, the majority of office workers relied on the tried and true.

Then, the world changed. The commercial success of the Internet, along with the availability of easy-to-use authoring tools, led to a sea change and a corresponding information explosion. This explosion is so dramatic that corporations are drowning in information.

Back in 2006, IDC conducted an exhaustive study (Source: The Expanding Digital Universe, IDC, March 2007) and forecast, from 2006 through 2010, a 57% year-over-year growth rate in the amount of information created, captured and replicated.
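To put the 57% figure above in perspective, here is a minimal sketch of the compounding arithmetic. It assumes the rate applies to each of the four year-over-year steps from 2006 to 2010; the function name `growth_multiplier` is purely illustrative and does not come from the IDC study.

```python
def growth_multiplier(annual_rate: float, years: int) -> float:
    """Return the total growth factor after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


# Assumption: four compounding steps (2006->2007, ..., 2009->2010) at 57% each.
multiplier = growth_multiplier(0.57, 4)
print(f"Information volume in 2010 relative to 2006: {multiplier:.2f}x")  # prints 6.08x
```

Under that assumption, the volume of information in 2010 would be roughly six times what it was in 2006.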

So, where is all this information coming from? The Enterprise Strategy Group estimates that between 80% and 85% of all business data is unstructured (Source: Extending Discovery to All Corporate Information, Enterprise Strategy Group, December 2007).

But what is unstructured data? Well, this includes Emails, Reports, User Files (Documents, Spreadsheets, PPTs, PDFs), Images, Video, HTML/XML, MP3, etc.

In the book “Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence” (Bill Inmon and Anthony Nesavich, Prentice Hall, 2008), the authors describe the types of unstructured data created by various departments in a corporation, including: Accounting, Call Center, Engineering, Finance, Human Resources, Legal, Marketing, Sales, Shipping and Operations. Everyone is contributing to the challenge.

The Challenges of Unbridled Information Growth
Obviously, there are numerous challenges with this unbridled growth of information. These include:

Factor #1: Information must be stored
The more we have, the more storage is required. This need for more storage opened up tremendous opportunities for storage vendors as customers sought to purchase more and more equipment. The storage industry introduced the moniker Information Lifecycle Management to provide more cost-effective ways to deal with this growth. The storage industry also introduced the concept of tiered storage to allow companies to better manage this growth along various dimensions: price, performance, capacity and function. Initially, storage was the most prominent impact of this growth. As the cost of storage declined, however, it was quickly dwarfed by other factors.

Factor #2: Information can be sensitive and needs to be protected
As companies created more and more information, the importance of protecting that information and ensuring the proper level of access became more apparent. While it sounds easy (i.e. making sure the right people have access to the right information), it can be easier said than done. But the costs of not securing data can be astounding. Because their electronic information was not properly and consistently secured, companies suffered:

  • Hefty fines under PCI, SOX and HIPAA for breaches and noncompliance

  • Bad PR and damage to the corporate brand due to the need to publicly disclose privacy breaches

  • Outright IP theft where trade secrets and proprietary information could fall into the hands of a competitor and materially damage the company’s business prospects


Factor #3: Information must be preserved for regulatory reasons
Every company is subject to regulations that govern the length of time that information must be stored. There are a slew of these information retention regulations. The more familiar of these include:


  • Health Insurance Portability and Accountability Act (HIPAA) of 1996

  • Sarbanes-Oxley Act of 2002

  • SEC Rules 17a-3 and 17a-4

There are countless more. Some industries (e.g. Pharmaceutical, Finance, etc.) are more regulated than others. And, of course, with the recent Credit Crisis, we expect the number of regulations to skyrocket in the coming years. In the good old days, retaining this information was simple. We simply put everything in a box and placed that box in a warehouse for however long. It’s a bit more challenging with electronic information, especially given the growth.


Factor #4: Information is subject to electronic discovery


A critical event occurred in December 2006. The Federal Rules of Civil Procedure (FRCP), the rules governing procedure for civil suits in United States district (federal) courts, were amended to outline how electronic documents can be used to support litigation proceedings, as well as how electronic documents should be handled to support litigation search and discovery.


Essentially, this means that all information is discoverable. Well, that’s a problem. Who wants to pay an attorney $400/hour to perform discovery across all of their information? It simply isn’t practical. As a result, companies are not only required to keep information for a particular period of time (for regulatory purposes), but are also incentivized to get rid of it as soon as that period ends.


The Challenges of Managing Unbridled Information Growth


So, where does this leave us? We have too much information today. We are creating new information at an alarming pace. Some of this information needs to be protected because it is sensitive. Some of this information needs to be retained for a certain period of time due to regulatory constraints. All of this information is discoverable.

I attended last month’s ARMA (Association of Records Managers and Administrators) conference in Las Vegas to get more perspective on the information conundrum. This seemed the logical place. After all, records managers have had to deal with the management of information for many years, initially in physical form and more recently in electronic form.

The mantra for Records Managers is simple. We need to know what we have and where we have it. We need to make certain only the right people have access to the information. We need to know what to keep and keep it as long as we have to. We need to get rid of everything else. We simply need to set up policies across the enterprise. And, enforce them.

It sounds so simple. But is it? Do we know what we have? Do we know where we have it?

Unfortunately, it is easier said than done. The information we create is vast. It is stored in heterogeneous formats, throughout the world.

I spoke with one Records Manager of a mid-sized company who told me – yes – I know what we have. I have two file shares in Des Moines with my finance, marketing and sales files. I have a user share in our corporate office with personal files. At corporate, I also have my web farm. I have an Exchange Server, 1 Personnel Database, 1 Accounting Database, 1 Documentum System. Oh – and 12 SharePoint sites.

You get the picture. This is a problem. Her information is everywhere. How does she manage her distributed information across heterogeneous systems? How does she set up consistent policies for ensuring the right access, for ensuring that things are retained, and for ensuring that she gets rid of what she does not need? She does not. Her company could be out of compliance and is at risk of being heavily fined. The problem, of course, is even worse for larger companies that have petabytes of information stored everywhere.

One Certified Records Manager I spoke with likes to categorize information as follows:


He explained, “The problem is that a single, universal system for managing information does not exist.” I visited vendors, both big and small, and confirmed what every Records Manager has known for quite some time: being able to effectively manage information according to the Records Management mantra is a truly Herculean task.

Managing Information in a Cloud

Classification will be a key enabler to solving the problem of unbridled information growth. Once information is Classified, it can be effectively Managed to address the concerns associated with sensitivity, retention and destruction. This Classification and Management requires a sophisticated Policy Engine. That can support different Policies for different information sources (databases, email systems, etc.). That can support different Policies for different regional regulations. That is flexible enough to deal not only with today’s regulations, but also with future ones.
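As a rough illustration of the idea above, here is a minimal sketch of a policy engine that resolves a policy from an item's classification, source system and region. Everything in it is an assumption made for illustration: the names `Policy`, `Item` and `PolicyEngine`, the categories and roles, and especially the retention periods, which are made-up numbers and not legal requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: who may read a class of information and how long to keep it."""
    allowed_roles: FrozenSet[str]
    retention_days: int


@dataclass(frozen=True)
class Item:
    """A piece of information that has already been classified."""
    source: str    # e.g. "email", "database", "file_share"
    region: str    # e.g. "US", "EU"
    category: str  # classification result, e.g. "financial_record"
    created: date


class PolicyEngine:
    """Looks up the most specific policy registered for an item."""

    def __init__(self, default: Policy) -> None:
        self._default = default
        self._rules: Dict[Tuple[str, Optional[str], Optional[str]], Policy] = {}

    def register(self, category: str, policy: Policy,
                 source: Optional[str] = None, region: Optional[str] = None) -> None:
        # None acts as a wildcard for source or region.
        self._rules[(category, source, region)] = policy

    def policy_for(self, item: Item) -> Policy:
        # Try the most specific key first, then fall back to broader ones.
        candidates = [
            (item.category, item.source, item.region),
            (item.category, item.source, None),
            (item.category, None, item.region),
            (item.category, None, None),
        ]
        for key in candidates:
            if key in self._rules:
                return self._rules[key]
        return self._default

    def can_read(self, item: Item, role: str) -> bool:
        return role in self.policy_for(item).allowed_roles

    def is_due_for_disposal(self, item: Item, today: date) -> bool:
        keep_until = item.created + timedelta(days=self.policy_for(item).retention_days)
        return today > keep_until


# Illustrative setup: retention periods are placeholders, not legal advice.
engine = PolicyEngine(default=Policy(frozenset({"records_manager"}), 365))
engine.register("financial_record",
                Policy(frozenset({"finance", "records_manager"}), 7 * 365))
engine.register("financial_record",
                Policy(frozenset({"finance"}), 10 * 365), region="EU")
```

Because new regulations become new `register` calls rather than code changes, a lookup of this shape is one way to keep room for future policies. It does not address the distribution problem: each site or system would still need to run such a lookup against its own information.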

Classification, in and of itself, is not sufficient to deal with the scale of the information in the average enterprise – which is geographically distributed across heterogeneous systems. A centralized information management scheme will not and cannot scale.

Because of this, we see Enterprise Information Management as being the first enterprise application that requires a form of cloud computing. So that all information can be managed. Everywhere.

What’s Next


As we proceed into 2009, only a couple of things are certain: companies will create more information; government will create more regulation. How will you manage your information?



