Recently in Database Notes and Queries Category

A vision for open data to revolutionise urban life

Greg Hadfield, a former Fleet Street journalist and internet entrepreneur, is organising the United Kingdom's first Open-data Cities Conference. In this guest blog post, Hadfield discusses the opportunities of open data.

Imagine a city where your car tells you the location of the nearest vacant parking space. Or a city where you are notified as soon as a neighbour submits a planning application. Where up-to-the-minute listings of every cultural event and venue are available - all the time, wherever you happen to be. Imagine if you could discover the asking price of the cheapest two-bedroom home that has just gone on sale, in the catchment area that will guarantee your child a place at the best-performing school.
This is the thinking that led to the United Kingdom's first Open-data Cities Conference, which will be held at Brighton Dome Corn Exchange on Friday, April 20.
It's not technology that is holding us up, although the rate of change will accelerate as we progress towards ubiquitous, free, high-speed internet access, available to everybody via a myriad of devices.
For open-data cities to become reality, we don't have to wait until connectivity - and the "connectedness" it engenders - is the air we breathe.
Nor do we have to wait for the "internet of things", in which all kinds of objects - not just computers, tablets and phones - will play a part.
Emerging technologies associated with a semantic web of data are already sufficient to power innovative applications, services, and enterprises that will compete and combine to meet the needs of communities in the 21st century.
It is lack of data that will limit our ambitions. It is a dearth of data that risks keeping our cities in the slow lane to the future.
In a post-digital era - when the differentiation between analogue and digital, between "real" and "virtual", will finally be blurred beyond relevance - we will live in the age of data.
Even now, data is everywhere, all the time. It defines, describes and determines the world we live in.
The more data that is released - without strings attached, in machine-readable and non-proprietary "open" formats - the more likely it is that businesses and developers will use it to build the applications and services that world-class cities need.
Of course, I'm not urging the release of personal data relating to identifiable individuals.
The civic data I'm talking about is data about schools, catchment areas, and property prices; about bus times and bus-stops, taxi ranks, car parks, and traffic congestion; about energy use, CO2 emissions, and carbon footprints.
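The promise of "open" formats is that anyone can build on them. As a purely illustrative sketch (the feed, car-park names and figures are all invented), here is how a developer might consume a machine-readable car-park feed of the kind described above:

```python
import csv
import io

# A hypothetical machine-readable feed of car-park occupancy, as a city
# might publish it in an open CSV format. Names and figures are invented
# for illustration only.
FEED = """car_park,spaces_total,spaces_free
Church Street,800,12
London Road,450,0
Regency Square,520,85
"""

def car_parks_with_space(feed_text):
    """Return car parks that currently have free spaces, most free first."""
    rows = csv.DictReader(io.StringIO(feed_text))
    free = [(r["car_park"], int(r["spaces_free"])) for r in rows]
    return sorted([f for f in free if f[1] > 0], key=lambda f: -f[1])

print(car_parks_with_space(FEED))
```

A navigation app could poll a feed like this and direct drivers to the nearest vacant space, which is exactly the kind of service open civic data makes possible.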
The crucibles for global change will be "open-data" cities - cities which self-consciously and collectively decide to make available unimaginable quantities of data, openly and freely.


Podcast interview: Tim Leonard, CTO, US Xpress on big data


US Xpress has implemented a single data analytics user interface that pools information from multiple sources. The logistics firm collects 900 data elements from tens of thousands of trucking systems - sensor data for tyre and petrol usage, engine operation, geospatial data for fleet tracking, as well as driver feedback from social media sites.

All of this data is streamed in real time and collected for historical analysis, with information fed to the appropriate online transaction processing systems, Hadoop and data warehouses.

In this podcast, Tim Leonard, CTO and vice president at US Xpress, explains how the company processes and analyses Big Data to optimise fleet usage, reduce idle time and fuel consumption and save millions a year as a result.
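The dual-path pattern described above - act on each reading in real time while also landing it for historical analysis - can be sketched in a few lines. Everything here (field names, the pressure threshold, the stores) is invented for illustration:

```python
from collections import deque

# Toy stand-ins for the two destinations described in the post.
historical_store = []   # stands in for Hadoop / the data warehouse
alerts = deque()        # stands in for the real-time OLTP path

def ingest(reading):
    """Route one sensor reading down both paths."""
    historical_store.append(reading)        # historical/batch path
    if reading["tyre_pressure_psi"] < 90:   # real-time path: invented threshold
        alerts.append((reading["truck_id"], "low tyre pressure"))

ingest({"truck_id": "TX-101", "tyre_pressure_psi": 85})
ingest({"truck_id": "TX-102", "tyre_pressure_psi": 102})

print(len(historical_store), list(alerts))
```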


Amazon pushes DynamoDB into Europe


Amazon is offering its DynamoDB NoSQL database service in Europe to provide businesses with a scalable database system in the cloud.

Amazon says DynamoDB in the EU-West region complies with European data regulations, since data remains within the European Union. The database stores data on Solid State Drives (SSDs) and replicates it synchronously across multiple AWS Availability Zones within the EU-West region to provide built-in high availability and data durability.
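Synchronous replication of this kind means a write only succeeds once every replica has acknowledged it, so losing a single zone loses no data. This is a toy model of the idea, not the actual DynamoDB implementation (the zone names echo AWS naming; everything else is invented):

```python
# Toy model of synchronous replication across availability zones.
class Zone:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value):
        self.store[key] = value
        return True  # acknowledge the write

zones = [Zone("eu-west-1a"), Zone("eu-west-1b"), Zone("eu-west-1c")]

def put_item(key, value):
    # Synchronous: wait for EVERY zone to acknowledge before returning.
    acks = [z.write(key, value) for z in zones]
    return all(acks)

ok = put_item("user:42", {"name": "Alice"})
print(ok, all("user:42" in z.store for z in zones))
```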

BI is NOT a Cloud application


Reading through the vast amount of analyst comment from Creative Intellect Consulting, Gartner, IDC, et al, it's hard to see how CIOs get any sleep. Over the last year, however, two key themes have risen to the top of the CIO agenda: Cloud and Business Intelligence. Now Teradata has thrown a spanner into the works by saying that BI is NOT a Cloud application.

Teradata's view, based on its customers - who are almost exclusively in the top 3,000 companies - is that BI and data warehousing are hugely IO-intensive applications, and the problem with Cloud-based solutions is that they are more focused on the provision of compute resources, such as virtual machines, than on IO. In addition, because Cloud operators grab any spare hardware when you need to scale, and high-performance BI requires parallel systems, you would be seriously limited by the slowest machine in their environment.
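The point about the slowest machine is easy to make concrete. In an evenly partitioned parallel scan, the elapsed time is set by the slowest node, not the average; the throughput figures below are invented for illustration:

```python
# In a parallel scan, each node handles an equal slice of the data, so
# the query finishes only when the LAST node does.
def parallel_scan_time(node_throughputs_mb_s, data_mb):
    slice_mb = data_mb / len(node_throughputs_mb_s)
    return max(slice_mb / t for t in node_throughputs_mb_s)

uniform = parallel_scan_time([200, 200, 200, 200], 8000)  # dedicated kit
mixed = parallel_scan_time([200, 200, 200, 50], 8000)     # one slow node

print(uniform, mixed)  # 10.0 vs 40.0 seconds: one slow node dominates
```

Three fast nodes finish their slices in 10 seconds and then sit idle while the straggler takes 40, which is Teradata's argument against running high-performance BI on whatever spare hardware a Cloud operator happens to have.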

Another problem for the current generation of Cloud service providers is that their money comes from increased utilisation of hardware. According to Scott Gnau, Chief Development Officer, Teradata: "Our systems already run at 90-95% utilisation, which effectively makes them a private Cloud deployment today, so there is nowhere for the Cloud provider to make any money."

However, Teradata did concede that for smaller companies, particularly SMEs that lack the technical expertise and resources, the use of Cloud-based BI services may be a solution. They were even prepared to concede, under questioning, that some form of Cloud infrastructure capable of doing high-performance BI might eventually exist, but not in the short term.

What Teradata does believe might happen is a stratification of those Cloud providers who want to offer BI services. This would require a significant investment in hardware and a bespoke solution that is also capable of being multi-tenant. In this scenario, the provider would be able to give a Service Level Agreement (SLA) for guaranteed IO.

Martin Willcox, Director of Product and Solutions Marketing, EMEA, says that Teradata has been having conversations with some Cloud providers about the idea of a BI Cloud, but at the moment there isn't the demand from Teradata customers, or really from the Cloud providers, for such a service. Teradata's view is clearly at odds with a number of other vendors, such as IBM, Microsoft, HP, Oracle and SAP Sybase, who are already building out Cloud infrastructures capable of supporting their customers' BI requirements.

Willcox also points out that Teradata has focused on the philosophy of making the DBMS as self-managed as possible, freeing up the DBA to deal with other, more pressing, issues. Among those are the complexity of removing the data warehouse silos that exist in a lot of companies and building a single, coherent data warehouse against which they can ask questions. But at a fundamental level, a lot of organisations struggle with consistent data models, data transformation and naming schemes. Willcox believes that the only way to resolve this is internally; outsourcing your data warehouse and BI and then expecting the outsourcer to deal with it is beyond the outsourcer's scope.

There are other issues that have to be overcome by the whole Cloud industry. These include the security of data, both at rest and when moved to end-user devices; where the working sets are held; and the bandwidth needed to move all that data around. On top of this, the whole issue of application latency has to be dealt with, and this is something that is already causing serious problems for application developers.

With much of Teradata's customer base in the financial industry, Willcox says that the idea of moving all their data to a Cloud provider is something that causes major internal and compliance issues. This is not just about live systems but also covers test and development, and it is exactly this latter scenario that has driven many organisations to take a long, hard look at the Cloud.

To be able to properly test an enterprise application, you need real live data and the ability to test against many more users than you actually have. Internal resources are rarely enough, so the idea of being able to deploy to the Cloud and acquire the necessary resources for the short term just makes good sense. In addition, test tools are extremely expensive, so with vendors such as HP and IBM making their test tools available through Software as a Service, it is now possible for test and development teams to really stress an application.

However, Willcox still believes that BI is a long way from taking advantage of this. He points to the security problem of putting data onto a Cloud provider's site, even if it is sent on tape and held in a secure area. As for uploading the data for short-term tests, Willcox says that there are few companies, if any, that have enough bandwidth to actually do this.

One of the big drives from Teradata has been to get customers and developers to run their applications on the Teradata platform, so that they are as close to the data as possible. With Cloud, where there are security concerns over data, we have seen people move the applications into the Cloud and keep the data locally. Willcox dismisses this as being a viable option for high end BI because the latency inherent in the network is too slow to allow the application to work effectively.

However, this may be about to change, at least for Teradata customers. Willcox says that Teradata has been doing a lot of benchmarking and testing at its labs in the US and on some customer sites to address the issues of data movement, compliance and data security. While there is a lot of maturing needed across all the various parts of the ecosystem, improvements are happening and Teradata is preparing new software.

Despite all of this, Willcox maintains that there is still a significant lack of customer demand for Cloud-based BI solutions.

A quick look inside the Teradata Active EDW 6680


With the launch of the Teradata Active EDW 6680 data warehouse, we sat down with Jim Dietz, Enterprise Product Marketing Manager, Teradata, to talk about how the Active EDW 6680 works.

CW: How does the Active EDW 6680 decide on what belongs in the different zones?
Dietz: The Teradata virtual storage software keeps its own statistics on how often each storage block is read or written and then uses time averaging to rank the blocks by usage. It takes a couple of days to raise the temperature of a block, and only as that temperature rises is the block moved between the hot, warm and cold zones. This is important as it prevents a user who is running a set of complex queries from distorting what is in the hot zone.

CW: Why not move cold data off to the storage area network?
Dietz: At the moment, there is no mechanism to move data to the SAN as our only focus is on the data warehouse. We can use high capacity disk drives in the system which helps to keep the cost down and that helps us keep more cold data in the system.

CW: What's the value of holding that cold data?
Dietz: It means that when users ask queries, we have a long history of data. This is important for complex queries as it increases the data set and ensures that the data covers the entire application lifecycle. All of the data in the cold area has been partitioned by date, so when the user writes a query, it is not accessed unless they want to look back that far.

CW: Which customers do you see using this?
Dietz: One example is the telcos. They have to hold a lot of historical data for compliance and other government demands. This means that not only can they take a long-term look at their customers, but they don't have to keep restoring data when they have to provide it to governments. In some cases, we are seeing them holding data for as long as seven years.

CW: Do you use technologies such as MAID (Massive Array of Idle Disks) to keep power and cooling costs down?
Dietz: No. Apart from the SSD drives, we are using 15,000 RPM SAS drives and can support drives as large as 600GB.

CW: Why not support large capacity SATA drives if the data is rarely accessed?
Dietz: We do have plans for adding another tier of drives which would be high capacity SATA drives but I can't say when that will happen.

CW: You are not using the SSD to extend the memory of the 6680 so how much memory does it support?
Dietz: We use standard DDR3 memory with 96GB per node. A typical system would consist of anywhere between 8-10 nodes, which gives us around 1TB of memory. With the next generation of servers we will be able to support more memory. With that amount of memory we don't need to use the SSD as extended memory because we can use the entire 1TB as a hot cache. The optimiser looks forward to see what types of data it will need for queries that are coming up, and we pull that data from the hot disks.

CW: How do you prevent the caches synchronising with each other and not loading new data?
Dietz: We have two separate processes. The memory cache and the hot disks are different systems and this prevents the two caches getting into a synchronised lock.

CW: Scott Gnau has positioned this as if it is about removing the DBA. Is that really the case?
Dietz: No. The announcement was not about removing the DBA, just that you do not need to put any more work on them. We save them from having to worry about the hot and cold problem and from spending all of their time optimising the data warehouse. We are the only company doing this; our competitors still need the DBA to constantly monitor and build rules and processes.

CW: If everything is being automated, what sort of analytics are you providing to the operations team so that they can track performance?
Dietz: We are not introducing any new management information at this point in time. The mixed storage is more about performance. We are not re-architecting the way we do things, just improving the overall operation. Standard interfaces to system management software will be maintained, so what they will see is that the queries they were performing before they added the 6680 hybrid storage will be running faster.

CW: With the acquisition of Aster Data you are no longer in a SQL only world. How will you bring together the way that Aster Data queries data and the way that the existing Teradata tools work?
Dietz: Aster Data uses MapReduce instead of SQL and yes, this is a very different way of querying the data. We will have to get to the point of hybridisation of queries, so that the user can ask the question using either set of products and the query is capable of accessing the data both inside and outside the data warehouse.

CW: Thank you
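The block-temperature mechanism Dietz describes in his first answer can be sketched as a time-averaged access score: temperature rises slowly, so a short burst of complex queries cannot instantly promote a block to the hot zone. The decay factor and tier thresholds below are invented for illustration, not Teradata's actual values:

```python
# Sketch of time-averaged block temperature. A burst of access only
# nudges the score; sustained access over many periods is what promotes
# a block to the hot zone.
DECAY = 0.9  # how much of the previous temperature carries over (invented)

class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.temperature = 0.0

    def end_of_period(self, accesses_this_period):
        # Time averaging: blend new activity into the running score.
        self.temperature = DECAY * self.temperature + (1 - DECAY) * accesses_this_period

def tier(block, hot=50.0, warm=5.0):  # cut-offs invented for illustration
    if block.temperature >= hot:
        return "hot"
    return "warm" if block.temperature >= warm else "cold"

b = Block("blk-7")
b.end_of_period(100)        # one burst of heavy access...
burst_tier = tier(b)        # ...only reaches the warm zone
for _ in range(20):         # sustained access over many periods...
    b.end_of_period(100)
print(burst_tier, tier(b))  # ...is what finally makes it hot
```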

Teradata launches Active EDW 6680 hybrid storage solution


At the Teradata Universe conference in Barcelona, Scott Gnau, Chief Development Officer, unveiled Teradata's latest flagship product, the Active EDW 6680. With support for SSD and SAS drives, along with just under 1TB of RAM, the Active EDW 6680 is described by Gnau as Teradata's answer to the need for speed in high-end data warehouses.

The 6680 comes with its own intelligent software and is fully automated, with no direct DBA intervention allowed. The goal is to free the DBA from the need to constantly optimise indexes and tables for performance, and this is achieved by the built-in self-managing and self-optimising management software. According to Gnau, the 6680 will provide customers with a 4x performance improvement per floor tile inside their datacentre.

What makes the 6680 different from other solutions is the way that it organises data into three different zones: hot, warm and cold. According to Gnau, when you analyse the way data is used in data warehouses, you discover that 43% of all IO is focused on just 1.5% of the data stored. Moving up the scale, 85% of IO addresses just 15% of data, and 94% of IO touches no more than 30% of the data in the data warehouse.

The result is that significant amounts of data are rarely, if ever, used. In order to optimise for the data that is used, the 6680 stores the most-used 20% of data on its SSD drives. This approach is very different to that of Teradata's competitors, who currently use SSD as an extension of RAM to create large in-memory datasets. The problem with this, from Gnau's perspective, is that it doesn't really optimise the use of data at all and just keeps a lot of unused data on high-value SSD.

The 6680 splits the remaining 80% of data into two groups of roughly 40% each, called warm and cold. Warm data is data that has recently been in the hot zone, and cold data is data that has not been accessed for some period of time. At present, Gnau says that there are no plans to start doing predictive analysis of the data to see what might become hot and move it to the warm zone. While not ruling it out completely, he said that much would depend upon customer demand.
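The 20/40/40 split described above amounts to ranking blocks by recent usage and cutting the ranking at the 20% and 60% marks. A minimal sketch, with invented block names and access counts:

```python
# Rank blocks by usage; top 20% go to SSD ("hot"), the next 40% to the
# warm tier, the rest to cold spinning disk.
def assign_tiers(usage_by_block):
    """usage_by_block: {block_id: access_count}. Returns {block_id: tier}."""
    ranked = sorted(usage_by_block, key=usage_by_block.get, reverse=True)
    n = len(ranked)
    hot_end = max(1, n * 20 // 100)
    warm_end = hot_end + n * 40 // 100
    tiers = {}
    for i, block in enumerate(ranked):
        tiers[block] = "hot" if i < hot_end else ("warm" if i < warm_end else "cold")
    return tiers

# Ten blocks, two of them heavily used - invented figures.
usage = {f"blk-{i}": (1000 if i < 2 else 10) for i in range(10)}
tiers = assign_tiers(usage)
print(tiers["blk-0"], tiers["blk-5"], tiers["blk-9"])  # hot warm cold
```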

Gnau also claims that the movement of data in and out of the hot zone will not impact the performance of the 6680. This is because the decision making and movement will be done using spare IO and processing cycles so the users will not see any impact on performance at all. 

Use it or go bust


At the Teradata Universe conference in Barcelona, Hermann Wimmer, President, Teradata EMEA, gave a stark message to delegates: "Those companies that do not make use of their data to generate new business will disappear from the market."

As a company whose entire business has been built on Business Intelligence (BI) and the use of analytics, Wimmer's message might seem a little self-serving, but there is a serious truth in what he said. Companies have invested large sums of money to store the massive amounts of data that they generate. Within that data there has to be information that can improve the business, but if you cannot analyse the data, you can't improve the business.

Wimmer believes that this is about the ability of companies to use their raw materials properly. He made the point to the attendees that unlike other raw materials that businesses consume, data can be reused as many times as you want. This makes it the most efficient of raw materials and something that companies need to understand because once you have it, there is little cost to reusing it for new business models.

He then cited a number of examples of companies who are making significant use of their analytics. The first of these is US healthcare company Wellpoint. They have been using data warehousing and analytics for a number of years to compare the efficacy of patient treatments. As a result, they are already able to feed back to patients and doctors which treatments are more effective than others, which is changing the way patients receive care.

Wellpoint are also able to deliver patient records within 5 seconds to any emergency room in the US. This means that a patient can walk into any hospital and doctors are immediately able to access their medical history. The implications for national healthcare providers like the UK's NHS are immediate. While governments have spent vast sums trying to create national data solutions, here is a private US company showing that it can be done and at a significantly lower cost than the money that has been poured into the NHS over the last decade to create a similar system.

A UK example is Thames Water, who have brought their sensor data and their maintenance data warehouses together to reduce leakage by 25% over the last year. Thames Water now claim to be able to detect a leaking main earlier than ever before, but only because they are able to identify sudden excess usage of water, compare that to maintenance records and then despatch crews to minimise water loss.

To show that analytics is not just about businesses using data to improve what they do, Wimmer talked about UK bank Lloyds TSB, which recently launched its Money Manager. At the back end of this system is a set of analytics that goes through the customer's bank account and provides a detailed analysis of their spending. For both consumers and businesses, especially those where cash flow is always an issue, this use of back-end analytics to deliver a detailed analysis of spending is a significant benefit.
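At their core, back-end analytics of this kind group transactions by category and total them. A toy sketch (the categories and amounts are invented; amounts are in pence to avoid floating-point rounding):

```python
from collections import defaultdict

# Group a customer's transactions into spending categories.
# Transactions are (category, amount_in_pence) pairs - invented data.
def spending_summary(transactions):
    totals = defaultdict(int)
    for category, amount in transactions:
        totals[category] += amount
    return dict(totals)

txns = [("groceries", 5420), ("travel", 1200), ("groceries", 3180)]
print(spending_summary(txns))  # {'groceries': 8600, 'travel': 1200}
```

Run against the bank's core transaction store, the same aggregation yields the per-customer spending breakdown the Money Manager front end displays.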

For Lloyds TSB, the costs of developing the system were minimal, as all it has done is extend a front-end application to its core databases. From Wimmer's perspective, this shows a democratisation of data: it is not just the business that can benefit; you can also provide customers, partners, prospects and your entire supply chain with valuable information that is currently locked up inside data warehouses.

Teradata keeps Aster Data at arm's length


At the Teradata Universe conference in Barcelona, Scott Gnau, Chief Development Officer, told journalists that, for the time being, Aster Data would be run as a wholly separate business unit. This news will be welcomed by Aster Data customers, who have been concerned that any takeover would have an impact on forthcoming roadmaps and product releases.

According to Gnau, the speed of the acquisition and the way the takeover process operates have so far meant that Teradata and Aster Data engineers have not yet had a chance to sit down and compare roadmaps. As a result, the existing Aster Data roadmaps will be honoured, with nCluster 4.6 just released to the market and nCluster 4.7 still on target for the end of 2011.

When asked when we would see a plug-in to allow Aster Data to take advantage of Teradata's new hot/warm/cold storage architecture, Gnau said that there are no announcements at present; this, too, will have to wait until the engineers from both companies have looked at each other's products.

Despite this, Gnau was very upbeat about the acquisition and reiterated comments earlier in the day from Hermann Wimmer, President, EMEA region, that Aster Data and another Teradata acquisition, Aprimo, would both add significant value to Teradata in 2011.

Gnau told journalists that one of the big gains from the Aster Data acquisition is the ability to do Big Data, which he also referred to as "analytics at scale". This is because the Aster Data products address the more than 80% of data that lives outside the traditional data infrastructure used by analytics vendors. This is also an area, according to Gnau, that is growing faster than traditional data.

In a separate set of discussions, Teradata sources would not comment on what plans, if any, there were to have an independent audit of both Aster Data and Teradata analytic function sets. Both companies have very complex sets of functions available to their customers and going forward there is a need for Teradata to prove that there are no inconsistencies between the two function sets.

IBM clarifies Cognos 10 Cloud strategy


With all the attention around what applications are moving to the Cloud and how those applications will work, IBM has clarified its own plans around Cognos. Last month, I said that they were planning to create a Business Intelligence as a Service application. This was based on comments made by people from IBM. It now appears that IBM is not planning that at all. So what is it doing?

IBM Cognos 10 Image
To make it easy for customers to move to Cognos 10 on the IBM Cloud, IBM created a Cognos 10 image that customers can use for rapid deployment. According to Harriet Fryman, IBM Product Marketing and Strategy, Business Analytics, the image not only includes Cognos 10 but also includes all the tools customers would normally buy to install and configure Cognos.

Who is it for?
Fryman says that IBM is targeting two groups of customers. The first is organisations moving to Business Intelligence for the first time. The route IBM is offering them is to buy into Platform as a Service on the IBM Cloud and then implement the Cognos 10 Cloud image.

The second group are existing customers who are either looking to shift new projects to the Cloud or who want to migrate their existing BI infrastructure to the Cloud.

One of the surprises in talking with Fryman was understanding how IBM was going to licence Cognos 10 from a Cloud perspective. IBM has been talking about using licence tokens that you can buy in bulk and then apply to any product. Some products may need a single token and others may require multiple tokens to use.
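As a sketch of how such a token model might work in practice - the product names and token costs below are invented for illustration, not IBM's actual pricing:

```python
# Toy token-licensing model: a bulk pool of tokens that different
# products draw down at different rates. Costs are invented.
token_cost = {"cognos_10_image": 4, "reporting_addon": 1}

def deploy(pool, product):
    """Deduct a product's token cost from the pool; None if underfunded."""
    cost = token_cost[product]
    return pool - cost if pool >= cost else None

pool = 10
pool = deploy(pool, "cognos_10_image")   # 6 tokens left
pool = deploy(pool, "reporting_addon")   # 5 tokens left
print(pool)
```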

At the same time, IBM has stressed that you will be able to take your existing licences to any IBM software running on the IBM Cloud. Fryman put that in context for Cognos 10 by saying that customers would be able to take their per user licences to the Cloud, but the Cognos 10 Cloud image would need a separate licence agreement.

I'm not sure that I agree with this. If, as an existing IBM customer, I want to move my existing Cognos implementation to Cognos on the IBM Cloud, I can't see why IBM would not allow me to migrate the licence rather than force me to buy a whole new one. This seems a little mean-spirited by IBM and perhaps needs a little more thought.

One of the concerns that a lot of companies have with moving to the Cloud is security. BI has come a long way over the last decade in allowing workable security to be applied to data without making it too hard for users to extract data and work with it.

IBM has been talking up its security portfolio, especially around the Cloud, and Fryman was asked how IBM intended to apply federated security from organisations' in-house solutions to Cognos 10 on the IBM Cloud platform.

Fryman said "Security is near and dear to everyone in IT and we already have guidance for customers. It's not just one straightforward answer but a mix of things. Security is with the dataset itself and then around the rights for an individual user to gain access." Fryman went on to talk about the book that IBM released at the Cognos 10 launch that has a lot of detail on security.

The problem for IBM is that it hasn't yet integrated the federated security products into Cognos 10. This means that customers will have to buy security from other parts of IBM. Once that is done, they will have to work out how to integrate what they buy for the IBM Cloud and how it will link with their existing security systems. For very large enterprises, this should be something that their own teams can handle but for a lot of the new customers IBM is targeting, there is a risk of large consultancy bills either from IBM or a System Integrator.

There is another option: leaving the data on your own site while the Cognos tools sit in the Cloud. This would remove the need for federated security from a database perspective. However, it would introduce another challenge: latency.

One of the problems with applications is that they have built-in time-outs to ensure that they do not sit forever doing nothing. If the link between the application and the data is not good enough, this can cause applications to regularly time out, causing a lot of problems for users.
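The interaction between latency and time-outs is worth spelling out: retrying does not help when the latency itself exceeds the time-out. An illustrative sketch with invented numbers:

```python
# Toy model: a query "fails" whenever its latency exceeds the
# application's built-in time-out. All figures are invented.
class QueryTimeout(Exception):
    pass

def run_query(simulated_latency_s, timeout_s):
    if simulated_latency_s > timeout_s:
        raise QueryTimeout("query exceeded application time-out")
    return "rows"

def with_retries(simulated_latency_s, timeout_s, retries=3):
    for _ in range(retries):
        try:
            return run_query(simulated_latency_s, timeout_s)
        except QueryTimeout:
            continue
    return None  # retrying cannot help when latency itself is the cause

print(with_retries(0.05, 0.5))  # fast link: 'rows'
print(with_retries(2.0, 0.5))   # slow link: None, every attempt times out
```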

While admitting to not being a latency expert, Fryman did say that customers should use the Dynamic Query Analyser in Cognos 10 to see how queries were performing. The DQA can be used in both test and production mode and provides support teams, DBAs and developers with a graphical view of where the query is hitting performance problems.

Rights management and mobile devices
Another part of the security problem is securing the data once it has been sent to the user. There has been a massive take-up of mobile devices among management who want data and dashboards pushed to their latest device. The problem is not just supporting a wide range of devices but also protecting the data once it is off the main servers.

Here again, Fryman admitted that the Cognos team didn't have the tools inside their product but pointed out that this wasn't their remit. Fryman also pointed to other parts of IBM that do have the rights tools to protect data, and said that the Business Analytics unit was looking at what tools were available and how they could be leveraged.

While this stops short of promising some form of rights management, it is, at least, a step in the right direction.

On the subject of mobile devices, especially with the surge in tablets, Fryman said that the iPad is currently only able to consume data if it is connected to the network. While this may change, there was nothing that Fryman was able to announce or provide a timescale for.

With the BlackBerry, however, IBM does have a client-side application that provides security around the data. Fryman says that it is unique to the user and is required whenever they want to see a report that has been downloaded.

Fryman said that IBM did recognise the need to support mobile but that the challenge with mobile was both the diverse nature of the interaction with the device and the adoption curve of devices. IBM, she said, will build tools and clients based on what is getting deployed.

While Fryman was able to clarify a number of things about the delivery of Cognos 10 on the IBM Cloud, it is clear that there is still a lot for IBM to resolve internally. The most important is ensuring that the federated security tools are part of the Cognos 10 image. After that, resolving the issue of licensing is equally important if customers are not to be left feeling just a little cheated about being unable to exchange their existing licences for a Cognos 10 Cloud image.

Ingres and JasperSoft take on the big BI and database players


Ingres and JasperSoft, two of the heavyweights of the Open Source community, have announced today that they have certified the latest release of each other's software on their platforms. As part of this announcement, they are claiming that JasperSoft's BI Suite on the Ingres VectorWise analytic database delivers faster query response times than any other solution currently on the market.

In making the faster-query-response-times claim, Ingres is relying on its VectorWise analytic database engine being able to deliver on claims from June 2010. At that time, Ingres announced that VectorWise was the first purpose-built analytic database engine capable of delivering very high-speed performance on commodity hardware.

The key to the Ingres claim was that other vendors' database products were not capable of taking advantage of features in modern server processors, such as multi-threading, SSE and the increase in size of L2/L3 caches. For VectorWise to take advantage of them meant architecting a new database engine from the ground up, something that the mainstream database vendors have not done.

Using a dual-socket Intel Xeon X5560 2.8GHz server with 48GB of RAM, Ingres claimed that VectorWise could improve a typical query from 16.5 seconds (based on Ingres 9.0) to 0.206 seconds. That time compares with 0.04 seconds for a hand-written, optimised C++ program performing the same query. The query itself used a dataset generated in accordance with the TPC-H benchmark, although this was not a formal TPC-H standards test.
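Working those published figures through gives a sense of scale: VectorWise's claimed improvement over Ingres 9.0, and the remaining gap to the hand-written C++ program:

```python
# Arithmetic on the figures Ingres published for the test query.
ingres_90 = 16.5      # seconds, Ingres 9.0
vectorwise = 0.206    # seconds, VectorWise
hand_written = 0.04   # seconds, hand-optimised C++ program

speedup = ingres_90 / vectorwise   # improvement over Ingres 9.0
gap = vectorwise / hand_written    # still about five times slower than C++

print(round(speedup))  # roughly an 80x improvement
```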

What gives VectorWise its biggest performance increase is that it treats the L2/L3 cache on the processor as its main memory and then uses RAM and disk as secondary sources of memory. This in-memory cache approach is something that Oracle, Sybase, IBM, Microsoft and others have been using, but with less dramatic effect.
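The cache-conscious idea can be sketched as processing a column in small, cache-sized vectors rather than one row at a time, so each tight loop operates on data that stays resident in L2/L3. The chunk size below is invented for illustration, and Python is used only to show the shape of the technique:

```python
# Block-at-a-time (vectorised) columnar processing sketch.
CHUNK = 1024  # values per vector, chosen to fit comfortably in cache

def chunked_sum(column):
    """Sum a column one cache-sized vector at a time."""
    total = 0
    for start in range(0, len(column), CHUNK):
        chunk = column[start:start + CHUNK]  # this slice stays hot in cache
        total += sum(chunk)                  # tight loop over the vector
    return total

column = list(range(10_000))
print(chunked_sum(column) == sum(column))  # True
```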

By combining that level of performance with JasperSoft's BI Suite, the two companies are now signalling that Open Source is not just about niche solutions or companies on small budgets. This level of performance is something that large enterprises with complex BI solutions are struggling to achieve with their existing solutions, which are expensive to run and tune. At the time of writing, there were no prices available for an Ingres/JasperSoft solution.

In another challenge to the established database vendors, both companies are stressing that their solutions are standards driven and not just for on-premise applications but can be delivered through multi-tenant SaaS and Cloud platforms.

There are a lot of smaller Cloud solution providers looking for products that would enable them not just to support SMEs but also to go after the very lucrative enterprise market. An Open Source pairing could be quickly dropped onto their servers and then delivered to customers without the complexity of licensing that accompanies the solutions from the big database vendors.
