I was recently reading IDC’s latest update to its groundbreaking research which measures the amount of digital information created and replicated (see: "The Diverse and Exploding Digital Universe, An Updated Forecast of Worldwide Information Growth Through 2011").

The updated study reports that – from 2007 through 2011 – a larger percentage of corporate information will “be subject to significant requirements” for information protection (growing from approximately 30% in 2007 to over 40% in 2011).

Protecting information is becoming a top management concern because the penalties associated with the mismanagement of sensitive data can be significant. They can include damage to shareholder value, brand damage, damage to customer loyalty and damage to employee relations, as well as potential legal liability and possible fines.

As I wrote in another blog posting, information must be classified in order to be managed and, therefore, protected. The industry thought-leaders have identified classification as being the “secret sauce” of Information Management. In fact, Gartner wrote an article back in June 2007 on this very subject (see: "Data Classification Is a Vital First Step in Information Life Cycle Management").

It sounds so easy. But is it? Well, it is if you have a handful of emails, documents and databases containing sensitive information. But it’s quite different when you start thinking about tens of millions or even billions of documents that are stored in heterogeneous systems throughout the world.

Unfortunately, our anecdotal evidence (based upon informal customer interviews) suggests that most companies do not know what they have or where they have it (and knowing both is a necessary ingredient for information classification).

If you think about it, this is rather amazing. It would be one thing if the information was worthless. But this information could represent a company’s most valuable asset. A 2005 study suggests that the “value of the information contained within documents created each year represents about a third of total gross domestic product, or an amount of about $3.3 trillion” (see: Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents). We could argue the amount. But clearly, this information is valuable. Think about it. Would a well-managed company ever consider not deploying an asset management system to track its tangible assets (including real estate, buildings, automobiles, inventory, information technology, etc.)? Of course not.

But what does it mean to classify information? How do you do it? Typically, we want to classify information based upon its most effective and efficient use, including:

  • Its business value – is this a proprietary engineering document or an invitation to next week’s holiday party?

  • Its confidentiality – how sensitive is the data? Is this a top secret document accessible only to the Executive Team, or can the public view this document via our web site?

  • Its criticality – what is the relative importance of maintaining the integrity and availability of this data?


Beyond that, we could also classify information for retention and disposition. We also could classify information based upon its regulatory impact. The list goes on and on.

So, what is the process to classify information? Ideally, each document owner would classify their own information. It certainly would make the world an easier place if individuals could manually “mark” something as containing sensitive information or as being relevant to HIPAA or SOX (or whatever regulation is pertinent to the organization). A lot of companies have tried this approach and failed. Why? First, it is difficult to maintain (that is, what is not sensitive today may become sensitive tomorrow). Second, people don’t want to take the time to do it.

So, we need to automate classification as best we can. How? Well, we need some sort of inference engine: one that can look at both a document’s attributes (what it is named, where it is located, who it is owned by, etc.) and its content, and that can use customized rules (based upon corporate policy) to set its classification.
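To make the idea a little more concrete, here is a minimal sketch of what such a rule-driven engine might look like in Python. Everything in it is an assumption made for illustration: the `Document` fields, the `Rule` type, the `/shares/hr/` path and the Social Security number pattern are hypothetical and are not taken from any particular product.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    # Attributes of the document (hypothetical field names).
    name: str
    location: str
    owner: str
    # The document's extracted text content.
    content: str
    labels: set[str] = field(default_factory=set)


@dataclass
class Rule:
    # The classification label to apply when the rule matches.
    label: str
    # A predicate that inspects a document's attributes and/or content.
    matches: Callable[[Document], bool]


# Illustrative pattern for a U.S. Social Security number (NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical corporate policy expressed as rules.
RULES = [
    # Content-based rule: anything that looks like it contains an SSN.
    Rule("sensitive", lambda d: bool(SSN_PATTERN.search(d.content))),
    # Attribute-based rule: anything stored under an assumed HR share.
    Rule("hr-record", lambda d: d.location.startswith("/shares/hr/")),
]


def classify(doc: Document, rules: list[Rule] = RULES) -> set[str]:
    """Apply every rule and record the labels of those that match."""
    doc.labels = {rule.label for rule in rules if rule.matches(doc)}
    return doc.labels
```

Because the labels are recomputed every time `classify` runs, re-running it after a policy change picks up documents that were not sensitive yesterday but are today, which is the maintenance problem that manual marking struggles with.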

Let’s look at a sample classification that could be facilitated using such an inference engine. Say a financial institution has some documents, authored by its brokers, that reside on its New Jersey server. Because these documents are under United States jurisdiction (that is, they physically reside in the United States) and because these documents contain the names, social security numbers and addresses of the institution’s customers (that is, they contain content that matches a customer record stored in the customer database) and because these were created using a standardized template (that is, a request to sell shares of a particular stock), we can classify these as being “sensitive” and “relevant to SEC Rule 17a-3, a-4.”
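The broker scenario above could be written as a single rule. The sketch below is only an illustration under stated assumptions: the `BrokerDocument` fields, the `"sell-order"` template name and the in-memory set standing in for the customer database are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerDocument:
    server_country: str   # where the document physically resides, e.g. "US"
    author_role: str      # role of the author, e.g. "broker"
    template: str         # standardized template used to create it
    content: str          # extracted text content


# Hypothetical stand-in for a lookup against the customer database.
KNOWN_CUSTOMER_SSNS = {"123-45-6789"}


def contains_customer_record(content: str) -> bool:
    """Return True if the content matches a known customer record."""
    return any(ssn in content for ssn in KNOWN_CUSTOMER_SSNS)


def classify_broker_document(doc: BrokerDocument) -> list[str]:
    """Apply the sample rule: U.S. jurisdiction, broker-authored,
    sell-order template, and content matching a customer record."""
    if (
        doc.server_country == "US"
        and doc.author_role == "broker"
        and doc.template == "sell-order"
        and contains_customer_record(doc.content)
    ):
        return ["sensitive", "relevant to SEC Rule 17a-3, a-4"]
    return []
```

A real deployment would replace the hard-coded set with a query against the customer database and would likely match on more than one field, but the shape of the rule (attributes plus content, combined by policy) stays the same.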

Once we have classified all of our information, we can now confidently claim that we know what we have and where we have it. We are then able to apply policies to manage that information. We have become a classified enterprise.


