Wednesday, February 3, 2010

Data Warehousing

Yesterday, I was discussing this with my colleague in office who was trying to break her head in a apt design of a warehouse and why is it not very scalable. Here is what I came up with:

Warehousing is nothing but just one component of business intelligence. The overall goal of BI is to provide information that is required to MANAGE a BUSINESS. This simply means that BI provides data that helps in decision making process and hence many a times also called as decision support system. As we all know, the 3 generic strategies to provide a competitive advantage is:
1) Cost leadership
2) Focus
3) Differentiation
BI should provide data which will help u to reduce cost, improve customer satisfaction and in turn increase profitability.

When you are designing the warehouse architecture, you need to remember that:
1) Change is constant - Business requirements will change
2) Take an incremental approach, do not over design!!!
3) Make sure your existing applications (OLTP as well as warehouse) continue to work
4) Need to allow more data and new type of data to be added
5) The BI view should be intuitive, obvious, visible and memorable
6) When people are thinking, they do not want their thinking process to be interrupted. Make sure you provide necessary drill downs at just one click.

Most of the OLTP systems are modeled as entity- relationship modeling. A warehouse follows dimensional modeling. Every business process is separated in form of a data mart. Make them denormalized and follow the hybrid approach. Need to explicitly well design the data acquisition process (ETL - we could use tools or write our own scripts for the same). Very important to remember in the ETL scripts is sequencing, transaction management (commits and rollbacks), performance and last but not the least Data cleansing. Data Acquisition and cleansing form 50-80% of the effort in a typical warehousing system. Always document the data marts! it is very important to see a consolidated view of a data mart. Enforcing some of a standard across the WH always helps in better undjavascript:void(0)erstandability. Make you document the standards too!. Design the metadata architecture in such a way that it allow sharing across various data models (OMG standards). Last but not the least, the WH must have "a single version of truth". You can use Datawarehousing Bus (conformed dimensions and facts). Most of the companies ignore this as Capacity Planning is a very critical part of the warehouse. Always be ahead in capacity planning to ensure smooth functioning and delivery.

No comments: