The Drill-Down On Data Warehousing

If you’re not analyzing all the data collected by your ERP (enterprise resource planning) and CRM (customer relationship management) solutions, it might as well be collecting dust on a shelf.

The English language is full of peculiar words that sound like they should mean one thing and actually mean something else. For instance, ever wonder why you park in a driveway and drive on a parkway? When you apply this concept to technology, these terms become more confusing than they are amusing. One technology that seems to take the cake on word usage is data warehousing. When you think of a warehouse, for instance, what are some visuals that come to mind? “When most people think of data warehousing, they think of huge backup storage databases,” says Benjamin Taub, president of Dataspace Inc. (Ann Abor, MI). “Data warehousing is the technology necessary to gather data from disparate systems, integrate it into one place, and make it available for end users.”

Let’s break down Taub’s working definition of data warehousing a few steps further. The first part is gathering data from disparate systems. Even small- to medium-sized enterprises often use applications that either run on disparate platforms (Windows NT and UNIX) or run disparate databases (DB2 and Oracle). If a salesperson, for instance, wants to link customer information from the sales databases (NT-based) with accounting’s database (UNIX-based) to get a better picture of his customers, he has to call the IT person. The IT person has to export the data from the accounting database into SQL-friendly application and then link the data to the SQL database.

The problem with this is threefold: First, the accounting data is no longer live, which means that it loses relevance each day and has to be manually refreshed every time the salesperson wants to do an analysis. The second problem is that if several salespeople are making requests, the IT person can find himself making a full-time job out of manually exporting data and linking tables within a database. The third problem is that the salesperson will typically discover that accounting’s data doesn’t always match up nicely with his data, which makes for “dirty data,” and inaccurate query and reporting results.

Thus, the debate begins: Should accounting to expected to change “W. Smith” in its database to “William Smith” or should sales change “Bill Smith” in its database to “W.” or “William?” Multiply this by thousands of records and multiple requesters, and you begin to see the enormity of the task at hand.

The second part of the data warehouse definition is integrating data into one place. This part of the data warehousing process involves specialized tools know as ETL (extract, transform, load). These tools take all the disparate data, which resides in databases using disparate programming languages, transforms it into a common language, and copies it into a single, integrated

By: Jay McCall

Source: Integrated Solutions

PDF

Leave a Reply