I often use a search engine to explore and review my clients' websites and check whether anything is amiss. The other week I came across a report on one client's site that was obviously intended for internal consumption only. I immediately rang the client to warn them. They explained that the report had been posted to the public website instead of the internal intranet by mistake, and that they'd removed it as soon as the error was discovered. Understandably, they were alarmed that I could still access it more than a week later.
This is a great example of the all-consuming nature of Web searches, Google searches in particular. Google takes a snapshot of each page its search crawlers examine and caches it as a backup; this cached copy is also the version used to judge whether a page is a good match for a query. My client's report was on the Web for only about three hours, yet a copy ended up in Google's cache and remained available for anyone to read. Because sensitive information that gets crawled can linger in the public domain, data classification and content change-control processes are vital to prevent this type of data leakage.
Unfortunately, private or sensitive business information makes its way onto the public Internet all too often. In this tip, we'll discuss reasons why this happens, and some strategies to help enterprises keep private or sensitive data off the Web.
Problems that can cause website information leaks
The incident noted above gave me the opportunity to address with my client some specific information security problems that led to the report being posted on its website. The first problem was that the organization didn't properly classify its data and documents. Implementing a system of data classification and clearly labelling documents with that classification would make such an incident far less likely.
The second point I discussed with the client was the poor implementation of its change-control policy. It does have a change-control process for updating its websites, but there are no checks or balances in place to counter the problem of people cutting corners when tasks are regularly repeated.
It's for this reason that airline pilots run through a printed checklist each time they prepare for takeoff, with a full challenge and response for each check. I recommend a similar checklist for updating a website or any other mundane, oft-repeated task that has potential security implications if not completed correctly. For example, tasks for a website update should include a check of the content's classification, release date, and the information owner's approval. This approach will go a long way toward preventing a simple yet potentially disastrous error, such as publishing an internal report to the public Web.
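The checks described above can even be automated as a gate in the publishing process. The sketch below is a minimal, hypothetical example: the metadata field names (classification, release_date, approved_by) are assumptions, and you would adapt them to however your organization actually labels its documents.

```python
# Sketch of an automated pre-publication checklist mirroring the manual
# checks above: classification, release date, and owner sign-off.
# Field names here are hypothetical placeholders.
from datetime import date

def publish_checks(meta, today=None):
    """Return a list of failed checks; an empty list means OK to publish."""
    today = today or date.today()
    failures = []
    # Only content explicitly classified as public should go to the Web.
    if meta.get("classification") != "public":
        failures.append("content is not classified for public release")
    # The release date must exist and must not be in the future.
    if meta.get("release_date") is None or meta["release_date"] > today:
        failures.append("release date missing or in the future")
    # The information owner must have signed off.
    if not meta.get("approved_by"):
        failures.append("no sign-off from the information owner")
    return failures
```

A document labelled "internal" would fail the first check and never reach the public server, which is exactly the error that caught out my client.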
An alternate Web data protection solution: How to use Google Webmaster Tools
Another method for mitigating this potential security problem is fairly straightforward if you have a Google account. Google Webmaster Tools, a set of free tools to help organizations manage, publicize and secure their websites, can provide a lot of useful security information about a site, such as checks for malware and defacement.
It's simple to get started: Once you've verified ownership of your site, you can use any of the Google monitoring tools, including the URL removal request tool, which sends a request that a URL be removed from Google Web search and image search results. Content removed with this tool will be excluded from the Google index for a minimum of 90 days, allowing the time needed to remove the content from your Web server. For the request to work, obviously the page or image in question must already have been removed from your site. (Yahoo has a similar request tool, but the changes take effect during its next refresh cycle.) As an aside, if you have a page that you want to keep on your site but not have it cached and accessible through the cache link, you can add a meta noarchive tag to the page:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Another useful Webmaster Tool is Fetch as Googlebot. With Fetch as Googlebot, you can see exactly how a page appears to Google. This is a great way to ensure that your Web server is returning pages exactly as expected and that there's no unexpected data within the page, in particular, no hidden links.
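You can roughly approximate this check locally by requesting a page with Googlebot's published User-Agent string and comparing the response to a normal browser fetch. This is only a sketch, not a substitute for the official tool, since Google's crawler may process pages differently from a raw HTTP client.

```python
# Rough local approximation of Fetch as Googlebot: fetch the page while
# identifying as Googlebot, so you can diff it against a normal fetch.
# The User-Agent value is Googlebot's published string.
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build an HTTP request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_as_googlebot(url):
    """Return the raw page bytes as served to the Googlebot User-Agent."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read()
```

If the bytes served to Googlebot differ from those served to a browser, the site may be cloaking content, which is a common symptom of a compromise.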
Hidden links are often used by spammers who have hacked your site. Google's crawler may detect these links, in which case you will be notified in the Webmaster Tools Message Centre with detailed information about each link. If pages are being used to install malware on visitors' machines, a malware warning will appear on the "overview" page of your account. Once you have removed the malicious code, you can submit a malware review request to check that the site is safe and have the page included in search results again.
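You don't have to wait for Google to flag hidden links; a saved copy of a page can be scanned for them directly. The sketch below checks anchors for a few common invisibility tricks. Real injections vary widely (external stylesheets, off-screen positioning, JavaScript), so treat this as a first pass, not a complete check.

```python
# Minimal hidden-link scan for a saved HTML page. It flags anchor tags
# styled to be invisible -- a common trick used by spammers who have
# compromised a site.
from html.parser import HTMLParser

class HiddenLinkFinder(HTMLParser):
    # Inline-style fragments that commonly hide a link from visitors.
    SUSPECT_STYLES = ("display:none", "visibility:hidden", "font-size:0")

    def __init__(self):
        super().__init__()
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Normalize the style attribute so "display: none" also matches.
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if any(s in style for s in self.SUSPECT_STYLES):
            self.hidden_links.append(attrs.get("href"))

def find_hidden_links(html):
    """Return the href values of anchors hidden via inline styles."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links
```

Any URLs it reports pointing at unfamiliar domains warrant the same investigation as a Message Centre notification.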
Google Webmaster Tools also provides statistics about the top search queries for your site. This data can help you monitor whether your site is drawing in traffic from searches for suspicious, unrelated spam keywords. Another option is to run a site: search query against your own domain, looking for keywords that hackers commonly use, such as Viagra, ringtones, mp3 or gambling. For example, in the Google search box you would type:

site:yourdomain.com ringtones
This search will return only indexed pages from your domain that contain the word ringtones. If the search does return URLs, those pages may have been hijacked and need to be investigated. Similarly, Google Alerts can be used to monitor such queries. Once an alert is set up and verified, an email alert will be sent to you whenever such keywords are found in the content of your site. This is a free service that gives you an extra pair of eyes, so make the most of it.
Google's Webmaster Tools help you to see your site from an attacker's perspective. Using them regularly will ensure you stay on top of what information the site is making available that could be of use to an attacker. Finally, check the Google Webmaster Forums, which provide information about ways to properly protect your site and expose it to Google.
About the author:
Michael Cobb, CISSP-ISSAP, is the founder and managing director of Cobweb Applications Ltd., a consultancy that offers IT training and support in data security and analysis. He co-authored the book IIS Security and has written numerous technical articles for leading IT publications.