Data warehousing has become the newest hype as an essentiality for analysis. Analysis always existed. But since the concept of data warehouse (DW) got popularized the formality of the practice set in. This is the second part of a two part series on data warehousing best practices. (Read the first five best practices here.)
Incorporate these finer aspects in your data warehouse development process and set your DW projects on fast track.
Best practice 6: Plan for flexibility and extendibility of data warehouse solution
A key data warehousing best practice is to ensure that the data model is flexible. The model should be able to extract data from additional source systems. The data model should not just address the current acute needs, but also suffice when these demands grow, without redesigning the entire DW. Flexibility and extendibility are important features that you can equip your DW with. Many a times, scalability is considered only from a performance point of view. Extendibility would entail how easily can more subject areas be added and can columns be added to store excess data.
Best practice 7: Make metadata and impact analysis intuitive, integrated
Metadata, which is a manifestation of the three levels―data model, extract, transform load (ETL), and BI reporting―of a data warehouse, needs to be intuitive. The names of the columns and tables should not be abstract, which nobody would automatically relate to. The informal way of data warehousing is writing scripts and not worrying about metadata. However, the formal way would entail having the technical infrastructure in line with the business strategies and managing the data as well as the changes in the DW.
A common observation is that adding of a new source system impacts current ETL modes. A formal data warehouse with well-built metadata layer will show the jobs that are impacted. However, when working in the script mode, the areas of impact will have to be checked manually and this could be time consuming, cumbersome, and costly.
Best practice 8: Maintain balance and control through data governance and auditing
Audit the data regularly as a part of the data warehousing quality and control process. This will ensure that the data from the source systems and reports match. A data warehousing best practice is to be proactive with scripts written to monitor the usage of data models and reports. Automated notifications will appear in case of mismatches between data from the source and DW. Reports about these gaps should be generated on a daily basis. To maintain data quality, a data champion could be employed.
Best practice 9: Plan and provision for ETL for future growth
In a DW, the ETL infrastructure would contain an ETL tool, servers, and database. If you choose the ETL tool that comes as part of a BI package, you may face flexibility issues. Such tools may not accommodate the source systems and additional data needs. As a data warehousing best practice, take the effort to evaluate and buy an appropriate ETL tool.
Best practice 10: Ensure that the project team has sufficient knowledge of BI
Generally, data warehousing projects are undertaken when organizations plan to deploy business intelligence. From this context, it becomes necessary to evaluate how business is conducted. Such evaluation will form as the foundation of any conversation on developing a data warehouse. The team that works on the DW development, should, therefore, have a fair understanding of business intelligence.
About the author: Amit Agarwal is a seasoned BI professional, heading the APAC and MENA businesses at iCreate, a specialist provider of packaged data warehousing and analytical solutions to the global banking industry.
(As told to Sharon D’Souza)