With the launch of the Teradata Active EDW 6680 data warehouse, we sat down with Jim Dietz, Enterprise Product Marketing Manager, Teradata, to talk about how the Active EDW 6680 works.
CW: How does the Active EDW 6680 decide on what belongs in the different zones?
Dietz: The Teradata virtual storage software keeps its own statistics on how often each storage block is read or written, then uses time averaging to rank the blocks by usage. It takes a couple of days to raise the temperature of a block, and only as that temperature rises is the block moved between the hot, warm and cold zones. This is important because it prevents a user running a set of complex queries from distorting what is in the hot zone.
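Editor's illustration: the time-averaging idea Dietz describes can be sketched roughly as below. This is not Teradata's algorithm, which the interview does not detail. It assumes an exponentially weighted moving average, and the names `Block`, `update_temperature`, `assign_zones`, the smoothing factor `alpha` and the zone fractions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    temperature: float = 0.0  # time-averaged access count (hypothetical metric)


def update_temperature(block, accesses_today, alpha=0.3):
    """Blend today's access count into the running average (EWMA).

    A small alpha means one busy day only nudges the temperature, so it
    takes several days of sustained use to heat a block up.
    """
    block.temperature = alpha * accesses_today + (1 - alpha) * block.temperature
    return block.temperature


def assign_zones(blocks, hot_fraction=0.2, cold_fraction=0.5):
    """Rank blocks by temperature: top slice is hot, bottom slice is cold."""
    ranked = sorted(blocks, key=lambda b: b.temperature, reverse=True)
    n = len(ranked)
    hot_n = int(n * hot_fraction)
    cold_n = int(n * cold_fraction)
    zones = {}
    for i, b in enumerate(ranked):
        if i < hot_n:
            zones[b.block_id] = "hot"
        elif i >= n - cold_n:
            zones[b.block_id] = "cold"
        else:
            zones[b.block_id] = "warm"
    return zones
```

Under these assumptions, a block hit by a one-day burst of queries cools off again within days, while a block read steadily every day climbs the ranking and stays there, which is the distortion-resistance described in the answer above.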
CW: Why not move cold data off to the storage area network?
Dietz: At the moment there is no mechanism to move data to the SAN, as our only focus is the data warehouse. We can use high-capacity disk drives in the system, which keeps the cost down and lets us keep more cold data in the system.
CW: What’s the value of holding that cold data?
Dietz: It means that when users run queries we have a long history of data. This is important for complex queries as it increases the data set and ensures that the data covers the entire application lifecycle. All of the data in the cold area has been partitioned by date, so when the user writes a query, that data is not accessed unless they want to look back that far.
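Editor's illustration: assuming the cold data is partitioned by date, as the answer above suggests, the effect can be sketched as follows. The partition layout, the row values and the `scan` function are hypothetical and are not taken from Teradata's software.

```python
from datetime import date

# Hypothetical partitions keyed by (year, month); each holds that month's rows.
partitions = {
    (2005, 1): ["row-a", "row-b"],
    (2009, 6): ["row-c"],
    (2011, 3): ["row-d", "row-e"],
}


def scan(partitions, start, end):
    """Return the rows and the partitions touched for a [start, end] date range."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    # Only partitions whose month falls inside the range are read at all.
    touched = [key for key in sorted(partitions) if lo <= key <= hi]
    rows = [row for key in touched for row in partitions[key]]
    return rows, touched


# A query over 2011 never reads the older 2005 and 2009 partitions.
rows, touched = scan(partitions, date(2011, 1, 1), date(2011, 12, 31))
```

In this sketch the old partitions stay on disk and remain queryable, but a query that only covers recent dates never reads them.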
CW: Which customers do you see using this?
Dietz: One example is the telcos. They have to hold a lot of historical data for compliance and other government demands. This means that not only can they take a long-term look at their customers, but they don’t have to keep restoring data when they have to provide it to governments. In some cases, we are seeing them hold data for as long as seven years.
CW: Do you use technologies such as MAID (Massive Array of Idle Disks) to keep power and cooling costs down?
Dietz: No. Apart from the SSD drives, we are using 15,000 RPM SAS drives and can support drives as large as 600GB.
CW: Why not support large capacity SATA drives if the data is rarely accessed?
Dietz: We do have plans to add another tier of drives, which would be high-capacity SATA drives, but I can’t say when that will happen.
CW: You are not using the SSD to extend the memory of the 6680 so how much memory does it support?
Dietz: We use standard DDR3 memory with 96GB per node. A typical system would consist of between 8 and 10 nodes, which gives us around 1TB of memory. With the next generation of servers we will be able to support more memory. With that amount of memory we don’t need to use the SSD as extended memory because we can use the entire 1TB as a hit cache. The optimiser looks forward to see what types of data it will need for queries that are coming up and we pull that data from the hot disks.
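Editor's illustration: the "around 1TB" figure follows from the per-node memory and node counts quoted above. The snippet only multiplies those two numbers; the variable names are ours.

```python
GB_PER_NODE = 96  # DDR3 memory per node, as quoted in the interview

# Total memory for the typical node counts mentioned (8 to 10 nodes).
totals = {nodes: nodes * GB_PER_NODE for nodes in (8, 9, 10)}

for nodes, total_gb in totals.items():
    print(f"{nodes} nodes: {total_gb} GB")
```

This gives 768GB for 8 nodes and 960GB for 10, so the 1TB figure corresponds to the top of the quoted range.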
CW: How do you prevent the caches synchronising with each other and not loading new data?
Dietz: We have two separate processes. The memory cache and the hot disks are different systems and this prevents the two caches getting into a synchronised lock.
CW: Scott Gnau has positioned this as if this is about removing the DBA. Is that really the case?
Dietz: No. The announcement was not about removing the DBA, just that you do not need to put any more work on them. We save them from having to worry about the hot and cold problem and from spending all of their time optimising the data warehouse. We are the only company doing this, as our competitors still need the DBA to constantly monitor and build rules and processes.
CW: If everything is being automated, what sort of analytics are you providing to the operations team so that they can track performance?
Dietz: We are not introducing any new management information at this point in time. The mixed storage is more about performance. We are not re-architecting the way we do things, just improving the overall operation. Standard interfaces to system management software will be maintained so that what they will see is that the queries we were performing before they added the 6680 hybrid storage will be running faster.
CW: With the acquisition of Aster Data you are no longer in a SQL only world. How will you bring together the way that Aster Data queries data and the way that the existing Teradata tools work?
Dietz: Aster Data uses MapReduce instead of SQL and yes, this is a very different way of querying the data. We will have to get to the point of hybridisation of queries so that the user can ask the question using either set of products and the query is capable of accessing the data both inside and outside the data warehouse.
CW: Thank you