Unfortunately, the only law that applies to Evapordation (The Law of Denuded Returns) is that the data lost will be significant, i.e. it will contain contact, evidential or reference material that you were certain you had in your grasp. What's worse, it won't be on your backup either.
BI has been around as a term since 1958, an output of an IBM researcher, and was further brought into the public domain by Howard Dresner (the aforementioned facts come from my favourite online reference source). But the question is: have businesses become more intelligent from the legions of technology thrown at Business Analytics, Data Warehousing and related data-churning approaches?
You may ask why I am concerning myself with this. Well, I find myself intimately concerned with how large amounts of data are consumed by non-IT people and the tools we (at MS) offer them.
Within both our EPM and business data graphic offerings (alright, MS Project Server and Visio) the amount of BI that we are using or facilitating has moved to a new, more sophisticated level. I demo this capability on a regular basis and I love it, but I do get asked in various situations, can we do 'this' or can we get at 'that'? Most of these questions arise from a fundamental lack of understanding of the importance of (a) having, (b) understanding and (c) using your data model consistently, and thus ensuring the ongoing quality of that data.
I dined with a friend last week who works for a large UK bank currently seeking to integrate 'bought in' operations; they are faced with the fact that they really don't understand their own data (which goes back decades) and certainly have little knowledge of the data they have recently acquired. Yet more data silos being created.
I showed him the full capabilities of my beloved Visio and he was stunned at the lack of internal understanding of this type of tool.
If you have quality data, the industry has the solutions. To paraphrase colleagues of mine, 'Advanced Business Intelligence made simple' is deliverable every day to every user, so don't be Oxymoronic, just be Capable.
From the smallest home office business to the largest enterprise, the amount of data that businesses accumulate continues to grow. Using that information effectively is often challenging because users do not possess the tools or the knowledge on how to make the most of their data.
Small and even mid-sized enterprises often lack the resources to acquire BI tools, skills and training when compared to larger enterprises. This puts them at a disadvantage when it comes to competing and, more importantly, gaining a better insight to their business and market.
According to a recent press release, TIBCO Silver Spotfire is targeted at the SME as "a fully functional on-demand offering designed to enable anyone to create, publish and share custom dashboards or reports for business analytics and business intelligence (BI) in the Cloud."
For the first year, those companies who want to try this can get free access to TIBCO Silver Spotfire. It comes with an authoring client and expansive web-based sharing and hosting for the user's favourite personal or business Spotfire application. After the year is up, TIBCO says that there will be a range of monthly hosting options for those who want to continue using the product.
TIBCO also describes this as not just BI as Software as a Service (SaaS) but as "social BI". The idea is that individuals can quickly create and share information across the business as part of an ad-hoc corporate analytics knowledge base. Any data created through TIBCO Silver Spotfire can also be integrated into a range of social media, blogs and online articles.
At the heart of all of this is the TIBCO Silver Cloud Computing platform, updated in May 2010 as a hosted platform for TIBCO customers, on which the company has now made provision for Spotfire users alongside the Silver Spotfire beta.
A free one year subscription will be attractive to many customers. However, it is important to note that this is not the full Spotfire Enterprise product that TIBCO sells but a reduced functionality product. Customers who want to move to the full Enterprise version later will be able to do so and, at the same time, pull in the work that they have published on Silver Spotfire.
TIBCO is very clear about the target audience here. This is about extending the reach of BI into small companies, small branch offices and departments that need a simple BI tool, where TIBCO currently has no presence. By using a free hosted Cloud-based platform with no initial costs for customers, TIBCO believes that many companies will be tempted to try BI for the first time.
As a software developer tools vendor, TIBCO is also hoping to build a community of developers who want to build dashboards and other applications on top of the TIBCO Silver Cloud Computing platform, and in particular on Silver Spotfire. This would allow TIBCO to attract an ever-increasing set of customers and make the Silver platform more attractive to third-party Cloud hosting vendors looking for a value-add solution to attract more business.
The users work with a local copy of the Spotfire client to create their data visualisations and then upload the data to the server. The maximum single file size is 10MB of compressed data. That might not sound like a lot, but TIBCO believes it is more than enough for 300,000-500,000 rows of data, depending on the level of data redundancy and the type of visualisation used.
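Whether 10MB of compressed data really covers 300,000-500,000 rows depends entirely on how redundant the data is, but the claim is easy to sanity-check. The sketch below uses synthetic data and the standard-library zlib (rather than whatever compression Spotfire actually uses) to estimate how many typical sales rows fit in a 10MB budget:

```python
import csv
import io
import random
import zlib

def compressed_size(rows):
    """Serialise rows as CSV and return the zlib-compressed byte count."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return len(zlib.compress(buf.getvalue().encode("utf-8")))

# Synthetic sales rows with plenty of redundancy (repeated region and
# product names), which is typical of BI extracts and compresses well.
random.seed(42)
regions = ["North", "South", "East", "West"]
products = ["Widget", "Gadget", "Sprocket"]
rows = [(i, random.choice(regions), random.choice(products), random.randint(1, 999))
        for i in range(100_000)]

size = compressed_size(rows)
bytes_per_row = size / len(rows)
# 10MB budget divided by bytes-per-row gives a rough row-capacity estimate.
capacity = int(10 * 1024 * 1024 / bytes_per_row)
print(f"{bytes_per_row:.1f} compressed bytes/row, roughly {capacity:,} rows in 10MB")
```

With more repetitive columns the ratio improves further, which is presumably why TIBCO quotes a range rather than a single number.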
This is a little disappointing but signals where TIBCO currently is with this product. While a fully fledged Cloud BI platform would have been nice, this is about hosting your results not hosting your BI. What will be interesting over the next year is if TIBCO can not only build the complete BI Cloud platform but then sell that as part of the Silver Cloud Computing platform to third party vendors. Success in this area would be a significant market changer but would also need to be linked to a host of other components such as virtual machines, a fully fledged SaaS platform and an active developer community.
As the work is all done locally, when the data is published to the Silver Spotfire platform, the files are not linked to the underlying data sources. This is important as it means that the files are not going to auto update as the core data changes and users will need to build their own local processes to recreate and republish.
In this first release the data will be hosted in the US and there is no Geo Locking. This means that you need to carefully control any data published through the platform to ensure that you do not inadvertently breach any data protection rules. With one of the goals of Silver Spotfire being to make it easier to use social media for publishing data, there is also a real risk of data leakage.
Stopping this is more challenging than many companies realise, so it is important that companies step up their data management training for users. This does not preclude using Silver Spotfire, but it is something that must be taken into account, especially as there is no guidance on data protection policies on the TIBCO website.
Another missing element here is federated security. This is something that TIBCO has said it will be working on over the next year as it builds momentum with Silver Spotfire. At present, it is talking to the early adopters and will talk to any new customers about what they want in terms of security.
Despite the security and data protection concerns this looks like a very interesting opportunity and one that is well worth spending some time investigating.
Porting a database from one vendor's offering to another has always been difficult. To try and ease the pain, vendors produce porting guides, third-party tools companies have products that will take your schemas and stored procedures and recreate them for the new target database, and software testing companies have products that will let you create a series of acceptance tests against the newly ported data.
Despite all of this, the way we embed code inside applications today means that we often don't find the problems until it is too late and the help desk starts taking calls. One of the main reasons that embedded code causes us difficulty has been the divergence of SQL from a single standard into three main variants with a fourth - parallel SQL - starting to make its mark as databases become ever larger and queries more complex.
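To see how little it takes for embedded SQL to become non-portable, consider something as mundane as paging through results. The helper below is invented for illustration, but the dialect differences it papers over are real:

```python
def paginate(query, limit, offset, dialect):
    """Append dialect-specific row-limiting syntax to a SELECT statement.

    A tiny illustration of why embedded SQL rarely ports cleanly: even
    "give me rows 101-150" is written differently in each major dialect.
    """
    if dialect == "postgresql":          # also MySQL and SQLite
        return f"{query} LIMIT {limit} OFFSET {offset}"
    if dialect == "sqlserver":           # SQL Server 2012+ / SQL standard form
        return (f"{query} ORDER BY 1 "
                f"OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY")
    if dialect == "oracle":              # classic pre-12c ROWNUM wrapping
        return (f"SELECT * FROM (SELECT q.*, ROWNUM rn FROM ({query}) q "
                f"WHERE ROWNUM <= {offset + limit}) WHERE rn > {offset}")
    raise ValueError(f"unknown dialect: {dialect}")

print(paginate("SELECT name FROM customers", 50, 100, "oracle"))
```

An application with this logic scattered inline through thousands of statements, rather than isolated behind one function, is exactly the porting problem described above.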
User Defined Fields are another challenge. They are often used to create a complex field type that the developer didn't want to break down into multiple fields, for some particular reason associated with their code. They may also have been used to hold an unsupported data type from another database during a previous port.
Challenges go far beyond field and SQL constructs. Big database vendors are designing in features to support their own developer tools and key data-driven applications. These end up as elements inside the database which often have no correlation in another vendor's products.
But no matter how many challenges you identify, people still want to port their databases. It may be financial, it may be that the new DBA or IT manager has a preference for a different vendor or it may be that someone outside of IT has decided that we should now be moving all our tools over to a new supplier.
IBM has decided that it is time to change the landscape. Alongside its existing migration documents and professional services engagements, IBM is now allowing developers to run native code from both Oracle and Sybase against DB2 9.7.
All of this comes at a time when Oracle is still digesting Sun and Sybase is being bought by SAP. By allowing native code to be run IBM believes that those customers who are unsure about what the future holds for Oracle and Sybase can quickly move to DB2 without having to cost in the rewrite of thousands of lines of application code.
And come those customers have. IBM claims that it has been able to take banking customers from Sybase who were worried about systems optimisation. At the same time, it has picked up over 100 SAP implementations that were either looking at or already deployed on Oracle.
There is another reason for IBM to make itself the universal database target. While its competitors spend large sums of money buying application solutions and then rewriting them to run on their databases, IBM is able to simply focus on the underlying database technology. This allows it to focus its R&D on database performance and optimisation and then consume any application tier.
These are not the only two databases that IBM is targeting. It has both MySQL and Microsoft SQL Server in its sights although, at present, those are still part of a migration rather than a native code solution and IBM is unable to say when there will be native code solutions. For Microsoft SQL Server, this should be relatively simple as the T-SQL it uses has not diverged much from the Sybase T-SQL from which it was derived.
To help developers understand more about this there are a number of very interesting articles up on the IBM DeveloperWorks web site.
At the ESRI User Conference in San Diego, CA this week, Teradata and ESRI announced a new collaboration aimed at storing business data and Geographical Information System (GIS) data in the same database. The goal is to enable businesses to better understand where their customers are in order to do more focused marketing.
This use of GIS and BI data is nothing new with an increasing number of marketing departments building their own solutions over the last 20 years. What is new is that all the data from both systems is being stored in the same database. The advantage for users is that rather than have to build complex queries against multiple data sources, they can write simpler, faster queries against a single database.
There are other advantages here for users. With both sets of data inside the same database, new data can be automatically matched with GIS information as it is entered. For any retailer doing overnight updates of their store data into a central BI system this provides them with the ability to create "next day" marketing campaigns aimed at individual stores. For large retailers, such as supermarkets, this is going to be highly attractive.
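The single-database argument is easy to illustrate. The sketch below uses SQLite in place of Teradata, and plain latitude/longitude columns in place of real GIS geometry, but the shape is the same: one query answers both the BI question (sales per store) and the spatial one (which stores fall inside an area), with no client-side stitching of two systems' results.

```python
import sqlite3

# One database holding both the BI fact table (sales) and the GIS
# attribute table (store locations). All data here is invented.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stores (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
    CREATE TABLE sales  (store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1, 'Leeds',   53.80, -1.55),
                              (2, 'London',  51.51, -0.13),
                              (3, 'Glasgow', 55.86, -4.25);
    INSERT INTO sales  VALUES (1, 120.0), (1, 80.0), (2, 200.0), (3, 50.0);
""")

# Single query: total sales per store, restricted to a bounding box
# (roughly "the north of Britain"). No second data source required.
rows = db.execute("""
    SELECT s.name, SUM(t.amount) AS total
    FROM stores s JOIN sales t ON t.store_id = s.id
    WHERE s.lat BETWEEN 53.0 AND 60.0
    GROUP BY s.name ORDER BY total DESC
""").fetchall()
print(rows)   # London falls outside the bounding box and is excluded
```

A real GIS schema would use proper geometry types and spatial indexes rather than a naive bounding box, but the one-query-over-one-database benefit is the same.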
As well as retailers, Teradata and ESRI are targeting a number of other vertical markets, such as telecommunications, utilities, transportation and government departments. In all these cases, being able to map usage and consumption to GIS data will mean the ability to deliver better services as need arises.
One area that will benefit greatly is emergency response, where the mapping element of the GIS data will enable response teams in the field to immediately match population location with access routes. It will also provide them with the ability to create safe zones based on local geography, without the problem of trying to match conditions on the ground with remote operations staff.
All of this marks a switch away from the integration plans of IBM, Oracle, Microsoft and others, who believe that the future is in being able to access multiple data sources at query time, and it will be interesting to see how long it takes for the others to follow Teradata's example. One company that could move quickly down this route is Microsoft, using its Bing Maps data, but it currently has no plans to do so.
Matt | July 16, 2010 5:32 PM
"What is new is that all the data from both systems is being stored in the same database. The advantage for users is that rather than have to build complex queries against multiple data sources, they can write simpler, faster queries against a single database."
How is this new? It has been possible to do this for years with Oracle/Spatial or PostgreSQL/PostGIS. The main stumbling block for ESRI users has been ESRI's proprietary data structures and expensive middle tier server technology and its historically poor support for truly enterprise class RDBMS.
Last week, Oracle announced the availability of Oracle Business Intelligence 11g. This is a major release for Oracle and comes at a time when Microsoft and SAP have both made major announcements of their own.
The emphasis at the launch was on integration, with Charles Phillips, President of Oracle, stressing the depth of the Oracle stack from storage to applications and all points in between. Much of the stack has come from the Sun acquisition and it is too soon to be sure that the integration Phillips kept stressing is really there.
Phillips was keen to point out that it was not just the ability to present an integrated stack that set Oracle apart from the competition but its focus on standards. Phillips told attendees at the launch that Oracle "supports standards, helps define standards and is standards driven". One of the challenges for Phillips here is that Oracle has been particularly coy on what is happening with the whole Java standards process.
The key message for Oracle BI 11g, alongside integration, was that as the industry moves forward we will begin to see BI embedded in all our processes. This makes a lot of sense. To many people, BI is still all about sales and competitive edge. Yet some companies are already looking at BI tools to see what else they can provide, such as a better understanding of IT management in complex environments or the performance of software development. At present, however, Oracle has no stated intention of addressing either of these markets.
There is little doubt that Oracle is keen to counter the messaging from Microsoft about BI everywhere. In the launch, there was a keen focus on the end user experience and the ability not only to access BI from any device but to ensure consistency of what you were working with.
This is important. One of the criticisms of the Microsoft TechEd announcements was the loss of synchronisation and control of data as users saved into SharePoint and created a lot of disconnected data. Oracle is keen to ensure that it can keep control of the data and there was a significant focus on security and data management.
Yet despite all this talk of access from everywhere, Oracle has decided against embedding BI tooling inside Oracle Open Office and has instead opted for seamless integration with Microsoft Office. This has to be a mistake, and those who believed that Oracle has no interest in Open Office will see this as the smoking gun they have been waiting for.
If Oracle really wants to see Open Office and the enterprise version become a significant competitor to Microsoft's Office then embedding the BI tools is a requirement. This will also make it easier for developers to create applications that are part of the daily toolset used by end users and make BI just another desktop function.
Phillips' presentation was just the warm-up. It was left to Thomas Kurian, Executive Vice President, Oracle, to deliver the full technical presentation of Oracle Business Intelligence 11g.
Kurian wasted no time in presenting the Common Business Intelligence Foundation upon which the Common Enterprise Information Model is built. This is designed to manage all of the integration between applications/devices and the data sources.
There were several features that stood out for me. The first is reporting and Oracle BI Publisher: lightweight, able to access multiple data source formats, able to output to a wide range of file formats, and scalable. What was missing was any announcement that Oracle was going to release it as a BI appliance. This would be a game changer, especially if those appliances could be deployed to remote offices.
The second key feature is the integration with WebCenter Workspaces allowing users to collaborate on BI reports. What isn't fully clear yet is whether this has the same potential for data explosion as Microsoft allowing users to save to SharePoint.
The third feature is the Oracle BI Action Framework which can be manual or automated and will appeal to developers looking to build complex applications. It will tie into the existing Oracle Middleware and uses alerts to detect changes to key data. Users can not only ensure that they are working with the latest data but developers can use those alerts to call web services and trigger workflows.
Security is a major problem with BI. As data is extracted from multiple sources and stored locally, it is possible to lose track of data and end up with unauthorised access to it. The BI 11g security model uses watermarking of reports as well as encryption, along with role-based security. One potential issue here will be how that encryption is deployed throughout the business chain when you are distributing data to suppliers and customers.
Finally, not only is Oracle looking to provide a range of packaged BI applications within the BI Enterprise Edition but it is ensuring that the BI capability is embedded inside its existing applications. This two pronged approach should mean a tighter link between the BI tools and the applications.
This was a big announcement from Oracle and one that requires some digestion. There are missing elements, such as the Oracle Open Office integration and the failure to announce a BI appliance. However, it is clear that Oracle is determined to stamp its control on the BI market and, as it continues to integrate Sun into the product strategy, we can expect to see complete end to end solutions in the coming months.
You can watch the keynotes of both Phillips and Kurian as well as download their presentations at: http://www.oracle.com/oms/businessintelligence11g/webcast-075573.html
When SAP announced its intention to acquire Sybase in May 2010, it immediately raised a number of questions. Seven weeks on and neither side seems particularly interested in publicly talking about the rationale behind this acquisition.
At first glance, this appears to be a smart move by SAP and a business saver for Sybase.
A decade ago, ERP and CRM applications were seen as only relevant for large enterprises. Today, with the explosion of hosted services, even the smallest of companies can buy access to such software. This means that vendors need to be quick to respond and be able to support a much wider spread of customers.
SAP has led this market for a number of years, but the acquisition of Siebel by Oracle, the consolidation by Microsoft of its Dynamics division and the success of Salesforce.com have started to make inroads into the business. SAP has not been idle. It has built a strong developer community and has established hosting deals with a number of companies such as T-Systems, which hosts over 1.5m SAP seats.
Despite all of this, SAP has one part of the cycle that it does not own and its competitors do - the underlying database. The ability for customers and developers to tune their applications for maximum performance is critical and the best way to do this is to own all the components.
This presents SAP with a real challenge. It has done very well out of IBM, Microsoft and Oracle, all of whom have invested significant sums of money in building consultancies capable of tuning SAP on their database products. Oracle recently set a new benchmark for SAP performance running on top of its own database products so it might seem that there is little need for SAP to buy its own database product.
Sybase has been the fourth largest database vendor for some time now, but the last two decades have not always been kind to it. In the 1990s, not only did it rival Microsoft with its Rapid Application Development tools, but at various points it was seen as the market leader. When the RAD tools market took a dive, Sybase was hit very hard and has struggled ever since to reinvent and reposition PowerBuilder as a mainstream development tool.
The well publicised split between it and Microsoft that left Microsoft with SQL Server did allow Sybase to concentrate on the Enterprise market while Microsoft built a product that could compete with it. While no longer being a significant player in the general database space, Sybase does have a serious position in the high-end database market, mobile services and low-end portable databases. Sybase also has its own Business Intelligence and Analytics tools.
All of these appeal to SAP. It can use the high-end database product, which includes in-memory and cloud versions, to extend its hosted platform offerings. The mobile services platform means that it can position itself in the operator and payments arena. Finally, the low-end portable database market means that its development community can build applications for mobile workforces, where data can be collected on devices such as smartphones, PDAs and laptops and synchronised easily into the enterprise solutions.
Taken together, this would appear to give SAP a complete set of offerings and enable it to compete with those competitors who have a complete stack from developer tools, through ERP/CRM and database.
However, there are issues that need to be resolved. The first is that a large percentage of SAP sales come through the professional services teams at IBM, Oracle and even HP. By having its own database and tools, SAP will need to prove that it is not intending to abandon customers using other databases.
While the Sybase tooling looks good, it is still far from perfect. Building tools for a wide range of mobile devices is not easy and the current tools are very Microsoft focused. Despite talking about Rich Internet Applications for several years, Sybase has failed to deliver any serious RIA tooling.
The BI tools are not widely used and SAP is going to have to make decisions as to how to integrate them with Crystal Reports to create a single powerful end to end reporting and analytics engine.
So, is this a wise move? Provided SAP is prepared to drive Sybase and not allow it to just operate as a fully autonomous business unit, this makes sense. But if it treats Sybase in the same way as EMC did VMware for several years, any benefits will be slow in maturing.
BI needs power. Lots of power. Before multi-core processors and desktop BI arrived, companies doing serious BI used mainframes, mini-computers or small High Performance Computing (HPC) setups. The cost was not just the hardware: software vendors priced for the hardware configuration, which made this an expensive solution.
Moving BI to the desktop changed this. By taking advantage of unused local processing power and drastically reducing the cost of the tools, BI quickly found itself being used more widely.
In the last decade, however, we have seen a massive explosion of multi-core, multi-socket commodity servers, blade systems and motherboards capable of supporting 512GB of RAM. Alongside this have come virtualisation, fast Storage Area Networks (SANs) and huge storage arrays. So is now the time to consider moving BI back to the datacentre?
IBM thinks so and its reasons make a lot of sense.
While BI at the desktop has been a success, IBM points to the fact that it has some serious shortcomings. One of these is "versions of the truth". The issue here is that users can often be working off the same dataset at different points in time. If that data is not linked to the same core data, or has been derived from another user's dataset, then there are inconsistencies within the data. Anyone making business decisions is not doing so with the best data available.
Another issue is data control. As data is pulled down by users, it is "lost" by the datacentre. While the original data remains, the copies can often be stored locally and as few organisations have a truly universal backup approach to all their compute devices, this data disappears from view. In many cases, it is still owned by the organisation but there exists the potential for data to be removed from the organisation and that is a much more serious matter.
Bandwidth can also become a significant challenge. If the users and the data are co-located then this is just an issue for the LAN. But where data is remotely located, perhaps in a different country or even in the Cloud, there will be charges for moving such large amounts of data.
As BI evolves, we are seeing users wanting to go further than just working with a subset of corporate data. They want, and are being encouraged, to access data from other sources. A typical example here is geodata and census information from government and other sources. This allows sales teams to get a very detailed picture of who is buying their products and where, and from this they can build much more effective sales plans.
The challenge here is that the data will often be in different formats and may not even have common fields or data types. This means that the BI user needs very advanced tools, a lot of knowledge and the processing power to make sense of all of this data.
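As a rough illustration of what that harmonisation work looks like before any tooling gets involved, here is a minimal field-mapping sketch. The schemas, mappings and values are all invented for illustration:

```python
# Two sources describe the same thing with different column names and
# types; someone has to reconcile them before any joined analysis is
# possible. Each mapping renames source fields to a common schema.
CENSUS_MAP = {"post_code": "postcode", "pop_total": "population"}
SALES_MAP  = {"PostCode": "postcode", "Units": "units_sold"}

def normalise(record, mapping):
    """Rename fields to the common schema, coercing digit strings to ints
    and upper-casing text so keys from different sources can match."""
    out = {}
    for src, dst in mapping.items():
        if src in record:
            value = record[src]
            out[dst] = int(value) if str(value).isdigit() else str(value).strip().upper()
    return out

census = normalise({"post_code": "ls1 4ap", "pop_total": "20431"}, CENSUS_MAP)
sales  = normalise({"PostCode": "LS1 4AP", "Units": 312}, SALES_MAP)

# Once both sides share a cleanly-typed key, they can be matched.
assert census["postcode"] == sales["postcode"]
print(census, sales)
```

Multiply this by dozens of sources, inconsistent units and genuinely ambiguous fields, and the "advanced tools and a lot of knowledge" point becomes clear.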
Proponents of localised BI accept that bandwidth and data security are an issue and rightly point to the fact that these are wider issues than just BI and on this they are right. They also point to the increasing power of desktop tools and the power of local machines. So does IBM actually have a case?
The answer is yes.
The commodity server explosion has ensured that the datacentre has more resources than ever before. With virtualisation, those resources are flexible and can be applied as required. Commodity servers have also made HPC easier and cheaper to implement and this is one of the reasons why IBM believes that it is time to bring HPC and BI back together again.
One of the key advantages of modern HPC is that it is capable of taking advantage of parallel programming and for complex BI applications, parallelism offers a significant advance in performance. While desktops can run individual analysis of datasets, an HPC array using parallel processing can outperform a large number of desktops.
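The map-and-combine shape of that parallelism can be sketched in a few lines. This uses Python's multiprocessing as a stand-in for a real HPC scheduler, and a toy sum-of-squares as the "analysis":

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Aggregate one partition of the dataset -- the parallel 'map' step."""
    return sum(x * x for x in chunk)

def parallel_aggregate(data, workers=4):
    """Split the dataset into partitions, fan them out to worker
    processes, then combine the partial results. This is the shape of
    most parallel BI aggregation, whether on an HPC array or simply a
    multi-core server."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    assert parallel_aggregate(data) == sum(x * x for x in data)
    print("parallel and serial results agree")
```

The decomposition only pays off when each partition is substantial; for small datasets the cost of distributing the work outweighs the gain, which is one reason desktop BI remained viable for individual analyses.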
More importantly, that HPC array is working off of a single master dataset. This means that there is a single enterprise version of the truth as far as the data goes. The data is kept synchronised and not lost to the datacentre. Data security and accuracy get enhanced and it is much easier to meet your legal and compliance obligations.
As the HPC array will be located in the same place as the SAN, the network can be tuned to ensure that the data is delivered optimally. This means no delays due to external systems and, financially, there are no huge penalties for moving large volumes of data in and out of the Cloud.
This does not mean that there is no place for fine tuning of data locally, but it does reduce the amount of data moved and ensures that the vast bulk of the processing is done much more efficiently.
Integrating multiple different types of data sources is also something that can be best done at the IT department level making it easier for users to then take advantage of data from different places and allowing them to spend their time focusing on getting answers rather than manipulating data.
On principle, I am not a great fan of pulling everything back into the datacentre but IBM does make a compelling case here. I would even go as far as saying that if you move the end-user BI tools onto virtual desktops or terminal services that are run in the same location as the HPC and BI software, you would gain even more advantages in terms of performance and security.
Next week Quest Software will announce the release of TOAD for Cloud, extending the reach of its database tools into the Cloud environment. In the first version there will be support for four datasources - Amazon SimpleDB, Microsoft SQL Azure, Apache HBase and any database with an ODBC driver.
TOAD for Cloud will be freely available from the Quest website and, according to Brent Ozar at Quest Software, "the target market is DBAs and data analysts who need to be able to do cross joins between Cloud platforms and their existing databases inside the organisation."
Ozar has already said that Quest will be adding support for more databases over time, including the Apache Cassandra project, but stopped short of identifying Oracle, Sybase and DB2 as early targets, despite the fact that all three have either shipped or announced Cloud versions of their products.
As this is built on the same underlying toolset as TOAD for Data Analysts, it is likely that the full reporting capabilities of that product will be available to the TOAD for Cloud product soon.
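The cross-source join Ozar describes can be sketched by hand, which also shows why a dedicated tool is welcome. Here two in-memory SQLite databases stand in for a Cloud datasource and the on-premise database; all tables and data are invented. Real federated tools push work down to each source rather than stitching results client-side as this does:

```python
import sqlite3

# "Cloud" side: web order facts.
cloud = sqlite3.connect(":memory:")
cloud.execute("CREATE TABLE web_orders (customer_id INTEGER, total REAL)")
cloud.executemany("INSERT INTO web_orders VALUES (?, ?)",
                  [(1, 40.0), (2, 15.0), (1, 25.0)])

# "Local" side: the customer master inside the organisation.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
local.executemany("INSERT INTO customers VALUES (?, ?)",
                  [(1, "Acme Ltd"), (2, "Bolt plc")])

# Aggregate on one source, look up names on the other, join on the
# shared key in the client -- a hand-rolled cross-source join.
orders = cloud.execute(
    "SELECT customer_id, SUM(total) FROM web_orders GROUP BY customer_id"
).fetchall()
names = dict(local.execute("SELECT id, name FROM customers").fetchall())
report = {names[cid]: total for cid, total in orders}
print(report)
```

Doing this by hand for every ad-hoc question, across datasources with different query languages, is exactly the tedium a product like TOAD for Cloud aims to remove.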
As increasing numbers of applications move to the Cloud, architects need to think about the implications for system performance. A key benefit of the Cloud is access to resources on demand: as applications need more processor, memory or disk space, these can be provisioned as required.
But all of this is irrelevant if the applications and the databases on which they rely are too far apart. This creates a significant challenge for a lot of companies and can be compounded by the Cloud model that they choose.
As the application and the database get further apart the latency increases. Typically, those organisations whose databases and applications are in the same Metropolitan Area Network (MAN) should be able to cope with the latency but beyond this the delay causes problems. With some types of data, this is not an issue but with databases where data has to be properly committed it can cause transactions to fail and has the potential to cause data corruption. This problem is particularly acute in high transactional environments.
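The arithmetic behind that latency problem is unforgiving. If every commit must be acknowledged across the link before the next transaction can start, the round-trip time caps single-connection throughput regardless of how fast either end is. The figures below are illustrative assumptions, not measurements:

```python
def max_sync_tps(round_trip_ms, commits_per_txn=1):
    """Upper bound on transactions/sec for one connection doing
    synchronous commits across a link with the given round-trip time.
    Real throughput is lower still once query execution time is added."""
    return 1000.0 / (round_trip_ms * commits_per_txn)

# Illustrative round-trip times for increasing application/database separation.
for label, rtt in [("same rack", 0.2), ("same MAN", 2.0),
                   ("cross-country", 30.0), ("intercontinental", 150.0)]:
    print(f"{label:>16}: {max_sync_tps(rtt):8.1f} txn/s ceiling per connection")
```

The cliff between MAN-scale and long-haul distances in this table is why co-location of application and database matters so much for high-transaction workloads.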
It might seem simple, therefore, to just co-locate the databases and the application, but the two key reasons that prevent companies moving their databases to the Cloud are compliance and data protection. Neither of these should be underestimated, especially if you are in the financial services industry.
Even if you can resolve the legal challenges around these two issues, it is not plain sailing. You also need to think about business continuity and backups. Some of this can be solved using snapshots and then copying the data back to the corporate datacentre but this still leaves open the risk of incomplete and lost transactions if you have to rely on the snapshots for recovery.
Another solution is to use Continuous Data Protection (CDP). This ensures that the data is copied, at a block level, as soon as it is changed. However, the ability of CDP to work synchronously is still limited by distance, and anything outside the range of a MAN is likely to have to work asynchronously.
One area where this problem can become exacerbated is when your choice of Cloud model revolves around the idea of Cloudbursting. While the different Cloud vendors differ in their implementations, the principle here is that you use the Cloud as an extensible set of resources. When you need additional capacity you move an application into the Cloud, and when the demand drops you bring it back to the datacentre.
The most common way of implementing Cloudbursting is to have one copy of the application held locally and another in the Cloud. When you need to switch you simply turn on the Cloud version and redirect users to the location. This same principle can be used for the database that is associated with the application. Two copies, one local and one in the Cloud.
The two copies are kept synchronised in a master/slave configuration. There will be issues as to latency based on distance but in this example it can be managed. When you switch the application, you pause the master database, force any updates across to the slave and then reverse the relationship. Any operations that occur during this process are simply written to a log and then applied after the master/slave relationship is re-established.
As the need for the remote resources ends, you switch the application back to your local datacentre and repeat the switch with the databases.
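As a sketch of the switchover sequence just described - pause the master, force the remaining updates across, then reverse the relationship - the toy classes below mimic the steps. The names and structure are hypothetical, not any vendor's replication API:

```python
# Toy model of the master/slave switchover described above. Purely
# illustrative: real replication involves transactional log shipping,
# not Python lists.

class ReplicatedDatabase:
    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role          # "master" or "slave"
        self.pending_log = []     # updates not yet shipped to the peer

    def apply(self, update: str):
        self.pending_log.append(update)

def switch_over(master: ReplicatedDatabase, slave: ReplicatedDatabase) -> ReplicatedDatabase:
    """Promote `slave` to master and demote `master` to slave."""
    # 1. Pause the master and force any remaining updates across.
    slave.pending_log.extend(master.pending_log)
    master.pending_log.clear()
    # 2. Reverse the relationship.
    master.role, slave.role = "slave", "master"
    # 3. Operations that arrived during the switch would be written to
    #    a log and replayed once the relationship is re-established
    #    (elided here).
    return slave  # the new master

local = ReplicatedDatabase("datacentre", "master")
cloud = ReplicatedDatabase("cloud", "slave")
local.apply("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
new_master = switch_over(local, cloud)
print(new_master.name, new_master.role)  # the Cloud copy is now master
```

Switching back when demand drops is the same procedure run in reverse, which is exactly why the synchronisation delay compounds under heavy use.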
All of this, however, is little more than a temporary fix. Over time and with heavily used applications the delays in synchronising the two databases can begin to impact performance. What is really needed is a better way to architect the applications so that they can manage remote databases. Unfortunately, none of the key database vendors have developer guidance around the best way to manage this.
So before you rush into the idea of just pushing your applications into the Cloud but retaining your databases, think about latency and whether you need to redesign your application for better performance using remote databases.
Cliff Saran | June 11, 2010 11:54 AM
Ian, perhaps there's a case for caching query results nearer to the apps?
Ian Murphy replied to comment from Cliff Saran | June 14, 2010 2:11 PM
Cliff, just caching the data near to the apps isn't enough. Cached data needs to be refreshed to keep it current and then written to disk in order to be saved properly. Bandwidth is still the issue in order to move that amount of data from local to the Cloud.
There is also the question of which apps you are referring to. The core corporate systems such as CRM, ERP, large databases and mail can sit alongside the data in the Cloud provided you solve the problems of security, backup and disaster recovery.
When we move to applications such as BI, however, you have to monitor the impact of greater use of desktop tools pulling large amounts of data down to the local device. If you are holding core data in the Cloud there is a good case for a replicated local copy which is part of your solution to backup and DR as well as providing a read-only copy for BI users.
At the moment, however, when you ask the question of how to architecturally design for this scenario, most vendors effectively shrug their shoulders and say that they are looking at the problem.
The days when BI was seen as hugely expensive tools that were only accessible to a small number of people are gone. The use of common tools such as spreadsheets and access to query languages have made it easier for end users to gain access to large datasets and do their own BI analysis.
One of the challenges, however, is that data inside the organisation often needs to be supplemented by external data in order to get the greatest value from it. An example of this is taking the sales data from a company database and comparing that with geographical and census data to not only see where certain items are sold but to see if there is a correlation between the population and the type of goods that can be more widely applied across the company.
Now there are a number of companies that work in this area and who charge a lot of money for doing this sort of analysis. None of these tools are easy out of the box and all require a fair bit of massaging to make them work.
At Microsoft TechEd in New Orleans, Amir Netz demonstrated new functionality inside Excel 2010 that has the potential to significantly change how many companies do BI.
Using a feature called PowerPivot, Netz connected to 11 different data sources, something that might not seem that difficult until you realise that these are not just local data sources but include SQL Azure (part of Microsoft's Cloud platform) and data sources that use the Open Data Protocol (OData).
OData is a Microsoft initiative, introduced last year as part of the Project Dallas announcement.
In the demonstration, Netz announced that Netflix is making its entire DVD database available via OData. Using the OData interface, Netz not only pulled the entire Netflix database onto the local laptop but then demonstrated that once you have brought together all your data sources inside PowerPivot, you can save them into SharePoint.
Once you save into SharePoint, the data is uploaded into SQL Analysis Services and becomes its own database. Users are now able to query against that database and use any reporting or analysis tools that they want and all of this has been done relatively seamlessly.
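For readers curious what consuming an OData feed looks like outside Excel, here is a minimal sketch in Python. The service URL is hypothetical and the response is a hand-made sample in the OData JSON shape, not live Netflix data:

```python
# Minimal sketch of querying an OData feed. The service root below is
# hypothetical, and the "response" is a hand-made sample in the shape
# OData v2 services return ({"d": {"results": [...]}}).
import json
from urllib.parse import urlencode

def odata_query_url(service_root: str, entity_set: str, **options) -> str:
    """Build an OData query URL from $filter / $top / $format options."""
    params = {f"${k}": v for k, v in options.items()}
    return f"{service_root}/{entity_set}?{urlencode(params, safe='$')}"

url = odata_query_url(
    "http://odata.example.com/catalog",   # hypothetical service root
    "Titles",
    filter="ReleaseYear ge 2000",
    top=10,
    format="json",
)

# Sample payload standing in for what the service would return.
sample_response = json.dumps({"d": {"results": [{"Name": "Example Title"}]}})
titles = json.loads(sample_response)["d"]["results"]
print(url)
print([t["Name"] for t in titles])
```

The appeal of the PowerPivot approach is that Excel hides exactly this plumbing from the end user.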
To get the geographical data, Netz used the Bing SDK that has just been released.
In order to view the data, Microsoft will be releasing a new Silverlight control - Pivot Viewer - which adds a lot of extra rich graphical ways to display and present data.
There is no doubt that this is a significant step forward for power users who create BI databases and who need to work with complex datasets from multiple locations. Using the OData interface to bring in external data is also very powerful, but there are some significant concerns here.
Once the user has created the BI set and then saved it out through SharePoint into its own database, who owns the data and how is data integrity preserved? This is something that has not been addressed by Microsoft.
There are also some significant issues from a commercial perspective over data access and data usage. For example, if I make my data available via OData and you then use that for analysis, how do I stop you storing it locally and reusing it?
This is a significant problem for any commercial organisation looking to provide limited use licenses of data and without the ability to track and limit use, the commercialisation of large datasets could end up being heavily restricted.
Microsoft might be making it easier to pull together data sources but it does need to provide the tools to allow commercial data owners to better limit the way their data is used.
It is now over 25 years since I wrote my first piece about databases. I started with mainframe databases but found myself being asked to write about Q&A, a small flat-file database running on a PC. Like many DBAs I was a little sceptical about where this market was going and what it could be used for. Shortly after the Q&A review appeared I was sent a copy of dBase to look at, and my attitude changed markedly. Here was something in which I could not only design a database but also do all sorts of complex programming.
I showed it to one of the IT people at the publishing house where my day job was, and they dismissed it as simply being a gimmick. A year later I showed them the PICK System Creator and the ease with which I could design an application for estimating book reprints. Not only were they equally dismissive, but they refused to accept that this was built over a weekend when the production department had been asking for such an application for over a year.
The most positive thing that they showed me was how to use Querymaster on the ICL 2966 that the company owned. Unfortunately, it didn't take long for me to malform a query in a way that caused absolute chaos and required a reboot - oops. For me, the world had changed, but for the IT department the fact that I was able to cause such chaos just reinforced their resistance to the idea of non-DBAs building databases.
In the intervening years, I have seen the role of the DBA seriously diminished. This was partly because of the explosion of the database on the PC, partly as a result of every development vendor telling us that developers could be a DBA and, sadly, partly due to the refusal of many DBAs to embrace change.
Now, as I look back on all of that and start this blog, I find that I have more than a little sympathy for those DBAs and wonder where the next generation will come from. If you are asking yourself why, the answer is simple.
We live in a world where everything is data. Increasingly, that data is being moved into databases for it to be managed. Both server and client operating systems now routinely use databases to hold critical data such as user logins, passwords and other security data. Collaboration products such as SharePoint and Domino, Content Management Systems and Document Management Systems now take large swathes of unstructured data and store them in databases. This provides advanced search utilities as well as a common location to manage versioning of content.
Enterprises are working hard to squeeze as much value as possible out of the data that they own and much of that is being done through the use of Business Intelligence. The target market here is the "Information Worker" who uses desktop tools to create value from increasingly large and complex datasets.
For all of this to work we need to take multiple sources of data and integrate it. The complexity of the integration tools and the need to build proper data integration suites has created a significant market for software and tools vendors.
But at the heart of all of this lies the database. Unless it is designed properly, it won't work well. It needs the logical design to be matched to the physical processes and the hardware on which the database will run. Alongside this, effective indexing is absolutely critical if the database is to meet any reasonable performance criteria.
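To illustrate how much indexing matters, here is a small, self-contained sketch using SQLite (the table and its data are invented): the same query goes from a full table scan to an index search once the index exists.

```python
# Demonstration that an index changes how the database answers a query.
# Uses an in-memory SQLite database; the orders table is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [("cust%d" % (i % 100), i * 1.5) for i in range(1000)])

def plan(sql: str) -> str:
    """Return SQLite's query plan for a statement as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer = 'cust42'"
before_plan = plan(query)          # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after_plan = plan(query)           # now a search using the index

print("before:", before_plan)
print("after: ", after_plan)
```

On a thousand rows the difference is invisible; on the multi-million-row tables DBAs actually manage, it is the difference between milliseconds and minutes.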
Management of the database environment also means understanding issues such as backup, snapshots and, as Cloud becomes ever more pervasive, latency - a key issue when the application and data are not co-located. Always present is the issue of compliance and how to ensure that data is secure and not leaking to the outside world.
Who is going to ensure that all of this works well? The harsh truth is that it will not be the desktop support team or the advanced user. Nor will it be the developers in the IT department; they are too busy trying to keep everything else working. The only people with the skills and the experience to ensure that our data-driven world continues to function are the DBAs.
So when you find yourself sitting around wondering why you cannot get the data out of your systems quickly, or why your BI application runs slower than the old paper system, take a good look at your IT department and ask yourself "where are the DBAs?" And if you should discover that they exist, tell them to stand up and be counted.
I recently noticed that a small company called BumpTop has been bought by Google - frankly, their existence had passed me by.
However, this video of the innovative desktop they had developed really impressed me. I think the G-people have made a great acquisition.
The BBC is trailing an article on the take-up of cloud computing with a more than passing reference to our Office 2010 launch next week.
I think the closing quote from Gartner is the most interesting:
"All business computing will be more web-enabled," predicts Mr Dreyfuss at Gartner. "For some [companies] it will reach the point where it will be totally web centric."
We are now entering the age of the 'hybrid business computing' model, but then again maybe we are already leaving it.
With (as I write this) a little over 24 hours until the UK General Election results start coming through, I have found myself wondering about a number of things:
a) Will the pundits and pollsters be accurate? - I just can't make up my mind - I am sort of excited to see how close or how far they are from reality
b) If they are close then the political wrangling will be of a level not seen before in the UK; I suspect the 'net will go into meltdown
c) If they are wide of the mark we will have endless analysis of either why they are so inaccurate or why the British public is so fickle
d) If the polls are close and we do end up with a 'hung' / 'balanced' / 'dysfunctional' (delete as applicable) parliament, then the effect will be that the drive towards some form of PR becomes unstoppable.
And if (d) comes true it will be like living in Belgium - without the beer and chips
Is this an Internet election?
I am not sure ... yes, there is a huge amount of blogging, social networking and tweeting from all corners of the political spectrum, but it was a good old-fashioned television event that has truly galvanised the politically aware public.
I must admit reading and posting tweets is most enjoyable - but in the sort of way that feels a bit like eavesdropping on other people's conversations or gossiping in the office kitchen.
The more advanced websites are making good use of Web 2.0 technology (see http://news.bbc.co.uk/1/hi/uk_politics/election_2010/default.stm for a great example), but will this affect the 'X's on the ballot? Well, let's see what happens on May 7th.
Of course the real thing may be something like this:
I just spotted this on YouTube and very touching it is too.