Central data repository model: How to proactively prevent data loss

With the proliferation of Internet databases and the frequency of data loss, it seems inevitable that everyone's private data will be compromised at one point or another. But is there a way to prevent this from happening? In this inventive tip, Gary Brown describes a possible central repository database designed to keep private data in the right hands.

In recent years, there have been a number of public security breaches in the U.K. involving personal information.

More on data breaches

January 2009: Recruitment giant Monster said hackers stole confidential database information, including usernames, passwords, telephone numbers, email addresses and demographic data.

August 2008: A memory stick containing sensitive information regarding more than 100,000 criminals was lost.

April 2008: Information Commissioner Richard Thomas reported that since November 2007, he had been informed of dozens of data breaches, including 62 in the public sector, 28 in the private sector and four associated with charities.

March 2008: Payroll details for 180 NHS staff were lost, only to be found by a member of the public and turned over to police.

January 2008: A Royal Navy officer's laptop containing the details of 600,000 military personnel and recruits was stolen from a car.

December 2007: Nine NHS trusts in England admitted losing records affecting at least 168,000 patients.

November 2007: Two CDs containing confidential information on around 25 million people, covering virtually every family in the country with children under the age of 16, were lost.

The losses are staggering, and they represent only the incidents currently known to the U.K. government. Although several factors contribute to the number of data breaches, I believe the primary cause may be that enterprises store personal information in many separate databases, which forces them to transfer information continually between departments and companies. Conceptually, if a single database held all personal and company information, far less personal data would need to be stored on computers or passed between departments and organisations; only public references to the individuals or companies involved would need to be shared.

However, in such a scenario it would be crucial to ensure that access to this central database is strictly controlled to prevent data loss.

Building a central repository using digital signature technology
The first step in enabling such a central data repository is to create a network of distributed databases, each associated with a particular country or region and each containing the virtual representations of entities (i.e. individuals, companies, departments, social groups, etc.). Each virtual representation will have a unique identifier (similar to a URL for a website) and the following structure:


  • Each virtual representation will be associated with information, ranging from simple data such as name, address, date of birth, etc., to more complex structured documents, such as medical or employment records. Each piece of information is protected by a set of access control rules. The rules will determine which privileges, such as the ability to read, update or delete information, are granted to an authenticated third party attempting to gain access.


  • Similarly, relationships between one virtual representation and another (e.g. individual X "works for" company Y) are protected by rules that govern who can read, update and delete them.


  • Finally, 'create' rules are used to determine automatically whether an authenticated user can associate new information or relationships, along with their accompanying access control rules, with the virtual representation. If no 'create' rule is found that can automatically approve the association, the request could be submitted to a manual approval process. The entity that owns the virtual representation would be informed that another user wishes to associate new information or a new relationship with it, and could then approve or deny the request.
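The structure described in the three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's design: the class names, the `vr://...` identifier format, and the rule matching on requester identifier and item name are all hypothetical. It shows one information item guarded by access rules, and a 'create' request that is either approved automatically or queued for the owner.

```python
from dataclasses import dataclass, field
from enum import Enum


class Privilege(Enum):
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


@dataclass
class AccessRule:
    requester_id: str      # authenticated entity the rule applies to
    privileges: frozenset  # set of Privilege values granted to it


@dataclass
class InfoItem:
    name: str
    value: object
    rules: list = field(default_factory=list)

    def allows(self, requester_id, privilege):
        """True if any rule grants this privilege to the requester."""
        return any(
            r.requester_id == requester_id and privilege in r.privileges
            for r in self.rules
        )


@dataclass
class CreateRule:
    requester_id: str  # who may attach information automatically
    item_name: str     # which item they may attach


@dataclass
class VirtualRepresentation:
    identifier: str    # hypothetical URL-like identifier
    owner_id: str
    info: dict = field(default_factory=dict)
    create_rules: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # requests awaiting the owner

    def request_create(self, requester_id, item):
        """Attach the item if a 'create' rule approves it, else queue it."""
        approved = any(
            r.requester_id == requester_id and r.item_name == item.name
            for r in self.create_rules
        )
        if approved:
            self.info[item.name] = item
            return "approved"
        self.pending.append((requester_id, item))
        return "pending owner approval"

    def resolve_pending(self, index, approve):
        """Manual approval step: the owner accepts or rejects a request."""
        _requester_id, item = self.pending.pop(index)
        if approve:
            self.info[item.name] = item
        return approve


person = VirtualRepresentation("vr://uk/person/1234", owner_id="person-1234")
person.create_rules.append(CreateRule("acme", "acme_profile"))
profile = InfoItem(
    "acme_profile",
    {"newsletter": True},
    [AccessRule("acme", frozenset({Privilege.READ}))],
)
print(person.request_create("acme", profile))
print(person.request_create("other-corp", InfoItem("marketing", {})))
```

Relationships between virtual representations could be modelled the same way as `InfoItem`, with their own list of rules; they are left out here to keep the sketch short.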

    Each entity in the database would have a digital signature as a means of authenticating itself when accessing the central repository or any other system. Digital signatures would ensure that customers or end users could create, access and update their information securely, and govern subsequent access by others to that information or those relationships.

    Having the ability for one entity to associate information with another entity, and specify rules that govern subsequent access to that information, provides a replacement mechanism for the multitude of existing proprietary databases.

    Let's consider two more specific, illustrative examples:

    1) An organisation with a website, rather than keeping a local database of users' profile data, would associate any additional 'website-specific' information with the user's virtual representation in the central data repository. For example, when the user accesses the Acme Corp. website, the user's digital signature will identify him or her without a separate login to the site. Acme's system would then access the central repository to retrieve the information associated with the user's virtual representation. The information will be protected by access control rules that ensure it can only be read by Acme and updated by the user.

    2) Medical information can be associated with an individual's virtual representation. Unlike the website example, the access control rules associated with medical information would prevent the individual from reading, updating or deleting his or her own records. (Although information is associated with an individual's virtual representation, it does not necessarily imply that he or she has the right to view or change the information.)
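The two examples can be written down as a small rule table. The sketch below is a hypothetical illustration: the item names, the `OWNER` marker and the `is_allowed` helper are assumptions, not part of the proposal. It encodes only what the examples state: Acme may read the website profile, the user may update it, and the individual has no privileges on his or her own medical record. Who may access the medical record is deliberately left empty, since the examples do not say.

```python
# Marker for "the entity that owns the virtual representation".
OWNER = "<owner>"

# item name -> requester -> privileges granted (hypothetical rule table)
RULES = {
    "acme_profile": {
        "acme": {"read"},   # readable only by Acme
        OWNER: {"update"},  # updatable only by the user
    },
    "medical_record": {
        OWNER: set(),       # the individual cannot read, update or delete
    },
}


def is_allowed(item, requester, action, owner):
    """Return True if the rule table grants `action` on `item` to `requester`."""
    who = OWNER if requester == owner else requester
    return action in RULES.get(item, {}).get(who, set())


print(is_allowed("acme_profile", "acme", "read", owner="user-42"))
print(is_allowed("medical_record", "user-42", "read", owner="user-42"))
```

Anything not granted explicitly is denied, which matches the point that associating information with a virtual representation does not by itself give its owner any rights over that information.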


    Considering access rules based on digital signatures
    The problem with using the 'pure' digital signature approach is that the access control mechanism, which governs access to individuals' protected information in the central data repository, can only be based on information contained in the digital signature of the requester, such as his or her identity.

    This approach is sufficient if the potential group of entities accessing information is small, as in the website example above. In that example, only the user and Acme have the right to access the information, so the access control rules need only recognise those two identities.

    However, for the medical information example, the rules governing who may access and update information are more complicated. It is not simply a matter of establishing who the requester is. Instead, it may be necessary to distinguish whether the requester is a doctor or a medical assistant. This type of access control rule could not be implemented using the identity in a digital signature alone.

    Another problem with using digital signatures as the source of information in access control rules is that the information is static; in other words, it only gets updated when the certificate is renewed. Access control rules may need to be based upon the most up-to-date information possible, e.g. whether a doctor currently has a license to practice.
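The difference between static and up-to-date information can be shown with a short sketch. It is an assumption-laden illustration, not the rule mechanism the author goes on to describe: the `Certificate` fields, the registry dictionary and both rule functions are hypothetical. One rule trusts what was recorded when the certificate was issued; the other looks up the requester's current status at request time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject_id: str
    role_at_issue: str  # fixed when issued; changes only on renewal


def static_rule(cert):
    """Decide from certificate contents only (may be out of date)."""
    return cert.role_at_issue == "doctor"


def dynamic_rule(cert, registry):
    """Decide from the requester's current entry in a live registry."""
    current = registry.get(cert.subject_id, {})
    return bool(current.get("role") == "doctor" and current.get("licensed", False))


# Hypothetical licensing registry, consulted on every request.
registry = {"dr-jones": {"role": "doctor", "licensed": True}}
cert = Certificate("dr-jones", "doctor")

registry["dr-jones"]["licensed"] = False  # licence withdrawn after issue
print(static_rule(cert))             # still grants access
print(dynamic_rule(cert, registry))  # reflects the withdrawal
```

The certificate still identifies the requester correctly in both cases; only the second rule notices that the licence has changed since it was issued.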

    So it's clear that digital signatures alone would not be sufficient to facilitate a secure central repository for sensitive data. The approach is not scalable to enable access by wide-ranging groups or entities and certainly not in situations where access needs to be based on dynamic information about an entity, as opposed to someone's actual identity. However, such a secure central repository is possible with in-depth access control rules. In my next tip, I will discuss the creation and implementation of these rules for virtual representations.

    About the author:
    Gary Brown has a PhD in Computer Science and has worked in the IT industry for over 18 years in the telecoms and financial services sectors.

    [1] Nov 07 Sky News "Fraud Risk To Millions After 'Catastrophic' Records Blunder"
    [2] Dec 07 Times Online UK "More personal data lost as nine NHS trusts admit security breaches"
    [3] Jan 08 Sky News "Des Browne: Two Further Laptops With Similar Data Lost"
    [4] March 08 Sky News "Lost Details Of 180 NHS Staff Found"
    [5] Apr 08 Sky News "Fresh Warning Over Lost Data"
    [6] Aug 08 Times Online UK "Thousands of criminal files lost in data fiasco"
