Combing Through the Cosmos – Usama M. Fayyad, Ph.D.

Data mining is hard work, takes rocket science, and costs a fortune. But it’s well worth it when the stakes are high in a high-churn carrier universe. New approaches such as outsourcing bring the whole process down to earth.

Consider the heavens. The night sky may have a low churn rate among its membership of stars, but there are nevertheless any number of large-scale problems up there in the darkness that lend themselves to data mining. That’s why it’s astronomy we have to thank for data mining, invented first when Johannes Kepler spent decades crunching (in longhand) a set of increasingly narrowing segments of orbital data that had in turn taken Tycho Brahe a lifetime to gather. It was the stars again, much later, when NASA’s Jet Propulsion Lab gave Usama Fayyad the task of figuring out which distant pinpoints of light were stars and which were galaxies.

Kepler wound up with his laws of planetary motion, but his method for obtaining them was something that Newton later dismissed as guesswork. Fayyad, on the other hand, had begun an ongoing process of refining algorithms that could be used to examine large stores of data. Look at that data the right way, with the right attention to winnowing out the most significant details, and, it turns out, you gain insights that you wouldn’t have spotted merely by running your eyes down the columns (which you couldn’t do anyway – we’re generally talking about genuinely huge quantities of data). What kinds of insights? Oh, just stuff like which of your subscribers are most likely to churn.

After a stint at Microsoft, where he led the Data Mining and Exploration Group from 1995 to 2000, Fayyad left, along with Nick Besbeas and Bassel Ojjeh. Both these colleagues had gotten their hands very dirty in the world of data warehousing by running the warehouse server farm for Microsoft’s MSN service – at that time probably one of the world’s largest data warehouse operations. Together, the three founded DigiMine (Kirkland, WA – 425-896-1700), a company that takes the complexities of data mining out of its clients’ hands. As an outsourced service, DigiMine takes a company’s data (uploading it periodically over a VPN), matches it up with the best mining algorithms, and turns findings into the meaningful reports from which business insights are derived.

DigiMine also handles the grunt work of running the enormous databases that make up data warehouses. Besbeas, the executive vice president of sales and marketing, says building that data warehouse “is probably the hardest part of successfully using these types of technologies for business.” This is where his background at MSN comes into play: “We ran a data-warehouse facility that was probably as big as anything else on the Internet. We were getting data from some 400 NT servers every day, synthesizing it, and providing reports back to the business units.

“In many ways, MSN is very similar to a web telephony or wireless business, where you have a large subscription base that you’re trying to manage, and where churn is a major consideration. Being able to move the churn rate by a couple of points has very serious implications for the bottom line.”

Fighting Churn

Note that you don’t know what you’re looking for when you put on your dungarees and your boots to go data mining. You don’t know how many pinpoints will be galaxies, don’t know that the shape the orbit describes turns out to be an ellipse, don’t know that your youngest customers make purchasing decisions on the same criteria as your oldest, and so on. If these things were obvious, you wouldn’t need to mine for them, now, would you?

So, digging into the unknown and letting your algorithms find the patterns in the data is what mining is about – in contrast to the intimately related but somewhat different pursuit of data analytics, where you’re more likely to be looking at data that’s already been aggregated and presented in a chart of some kind. In other words, the kind of insight you’re able to have has been predetermined by the chart and the way the data has been aggregated. Looking at a P&L report is not data mining, though it’s certainly possible to perform highly relevant analytical functions with it.

When you use data mining, though, you do have a target in mind, and the best one you may choose is a good segmentation of your customer base as it relates to likelihood of churn. Because if you can figure out who’s about to churn, plus the chances of dissuading them, then you can make appropriately targeted offers to keep the customers with real revenue potential in the fold.

Finding Mr. Goodsegment

While the trick to knowing who to call and who gets which special offer may frequently boil down to knowing which segment a given customer belongs to, there’s an added difficulty in discovering those segments in the first place. Often, companies trying to analyze their data arbitrarily decide that, if segmenting customers by age, they can simply divide everyone into a few round-number age groups – 20 and under, 21 through 30, and so on. This categorizes everyone in the customer database, but it may not produce any meaningful insights simply because these segments are arbitrary. They may actually mask the behavior of more meaningful age groupings. If all the customers aged 25 to 35 make similar buying decisions, for instance, you’ll cut the group in half if you segment on 21-to-30 and 31-to-40 boundaries.
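
To make the masking effect concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the customer records are invented, and the “start a new segment when spending jumps” heuristic is a stand-in for the far more capable segmentation algorithms a real mining tool would apply.

```python
# Hypothetical (age, monthly_spend) records; the values are invented.
customers = [
    (19, 20), (22, 22), (24, 21),
    (25, 60), (28, 62), (30, 58), (33, 61), (35, 59),
    (38, 30), (44, 31), (52, 29),
]

def decade_label(age):
    """Arbitrary round-number bins: 20 and under, 21-30, 31-40, ..."""
    if age <= 20:
        return "20 and under"
    low = (age - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"

def decade_segments(records):
    """Group spending figures by the arbitrary age bins."""
    bins = {}
    for age, spend in records:
        bins.setdefault(decade_label(age), []).append(spend)
    return bins

def data_driven_segments(records, jump=15):
    """Start a new segment wherever spending changes by more than
    `jump` between customers who are adjacent in age (a toy heuristic)."""
    ordered = sorted(records)
    segments = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if abs(cur[1] - prev[1]) > jump:
            segments.append([])
        segments[-1].append(cur)
    # Report each segment as its (youngest, oldest) age range.
    return [(seg[0][0], seg[-1][0]) for seg in segments]

print(decade_segments(customers))
print(data_driven_segments(customers))
```

In this invented data set the round-number bins scatter the high-spending 25-to-35 group across the 21-30 and 31-40 buckets, mixing them with low spenders, while the boundaries derived from the data keep that group in one piece.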

In a data-driven world, we want segmentation that optimizes both efficiency and effectiveness. To be efficient, our breakdown should produce the lowest reasonable number of segments. To be effective, we want to make sure that any factors that would produce significant differences in commercial outcomes are taken into account in our segmentation. For instance, we might want to consider gender in conjunction with age.

At the risk of triggering flashbacks of high school math phobia, note that this little segmentation exercise requires solving multiple equations simultaneously. Given that we might have hundreds of possible data dimensions to sort through at once, we need a purpose-built tool. One such instrument comes from that venerable provider of statistical software, SPSS (Chicago, IL – 312-651-3000). If you took any sort of serious statistics class in college in the modern computer era, there’s a strong statistical probability that you used an SPSS statistics package.

The package is called Clementine, and version 6.0 of the product shipped late last year. Because the package features a visual interface, users build a map of their data-mining project – called a “stream” – by selecting icons that represent steps in the process. It’s possible to interject domain-specific business rules into the stream, and SPSS simplifies this stream-building process by including a number of application templates – collections of streams, sample data, and documentation commonly used in specific applications. One of the earliest data-mining “workbenches” on the market, Clementine has grown to hold one of the largest shares in an admittedly small market. Some 300 organizations worldwide use Clementine, including British Telecom, Reuters, and Unilever. The software workbench, which starts at about $50,000, has a client/server architecture, with both pieces running on relatively souped-up NT systems. The server will happily analyze data from pretty much any mainstream back-end database.

Serious Crunching

Though figuring out the appropriate algorithms and goals for analyzing data is more than half the data mining battle, there are other knotty problems to deal with before we know which customers are about to jump ship, which ones we’ll struggle to keep on board, and which ones we’ll make a point of ushering over the edge. First and foremost is scale: The daily operations of large-scale communications carriers crank out huge volumes of data. Merely chewing through the daily call records is enough to break down plenty of supposedly “scalable” systems.

“We work with customers who have hundreds of millions of records instead of hundreds of thousands of records,” says Matthew Doering, senior vice president and CTO of QueryObject Systems Corporation (Roslyn Heights, NY – 800-522-6302). “We work with customers who have high dimensionality – eight, nine, ten dimensions instead of three or four.” QueryObject Systems provides customers with the software it takes to build a multi-dimensional “fractal cube.” The software then runs on customer-owned servers.

To understand what’s meant by “dimension,” imagine a two-dimensional grid – a big graph. One axis of the grid might be the “Originating U.S. State” dimension – there’d be fifty categories arranged along this axis. The other axis might be “call duration in minutes,” with a category for every duration that actually happened to have been recorded. This would be a two-dimensional configuration and each intersection of a category in one dimension with a category from the other dimension would contain a metric – how many calls there were of that duration originating in that state.

Of course, those are only two dimensions. There could (and probably will) be lots of other dimensions. To continue our hypothetical calling record example, there would be a dimension for the state the call was made to, a dimension for the kind of calling plan under which the call was made, and so on. Each of these dimensions will intersect with all the other dimensions (which means we have a cube well beyond the three dimensions that most of us are used to, but mathematics stopped worrying about the niggling details of real-world cubes some years ago).
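
For readers who think better in code, the idea of dimensions, cells, and metrics can be sketched in a few lines of Python. This is only an illustration of the concept, not of any vendor’s storage format; the call records, the choice of four dimensions, and the `rollup` helper are all invented for the example.

```python
from collections import Counter

# Hypothetical call records:
# (origin_state, destination_state, calling_plan, duration_minutes)
calls = [
    ("NY", "CA", "basic", 4),
    ("NY", "CA", "basic", 4),
    ("NY", "TX", "premium", 12),
    ("CA", "NY", "basic", 4),
    ("CA", "NY", "premium", 7),
]

# Each distinct combination of dimension values is one cell of the cube;
# the metric stored in the cell is the number of calls that landed there.
cube = Counter(calls)

def rollup(cube, keep):
    """Aggregate away every dimension whose position is not in `keep`."""
    out = Counter()
    for cell, count in cube.items():
        out[tuple(cell[i] for i in keep)] += count
    return out

# The two-dimensional grid described above: originating state x duration.
state_by_duration = rollup(cube, keep=(0, 3))
print(state_by_duration[("NY", 4)])  # calls from NY that lasted 4 minutes
```

Adding a dimension simply means adding a field to each record; the number of possible cells multiplies with every one, which is why high-dimensional cubes get unwieldy so quickly.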

The trick to getting any kind of real-world use out of this multi-dimensional cube, it turns out, is to come up with a polynomial equation that describes the location of all the metrics within the cube’s interior space. The equation, as it happens, is a fractal equation. “It’s self-described, it’s iterative,” says Doering. Putting too much weight on the fractal part makes Doering uneasy, because people associate fractals with the lossy compression schemes used for packing big graphics files into smaller packages. While it’s true that the data points stored in a QueryObject fractal cube are aggregates of individual data records, the cube itself is compacted with no loss of any of those aggregated points.

So, Doering says, the process of building a cube is one of “reading in the data and breaking it down into metrics and dimensions. From the dimensions it creates an algorithm – you can look at it as just a complex index – that allows us to directly access the answers we’re looking for.”

When it comes time to query the data stored in the cube, he says, “I ask a question, and the question is decomposed into a series of coefficients of an equation. By solving the equation, I know the exact area in the cube where my answer lies. It’s not like a relational database, where I have to scan through a table, or run through indexes, or do joins. Literally by solving the equation, I can know the exact space in the cube to get my answer. So it’s very, very fast in terms of its querying, because I’m not handling a lot of information to get you the answer you need.”
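
QueryObject’s fractal equation is proprietary and is not described here, so the following Python sketch is a loose analogy only. It shows the general principle of computing where an answer lives rather than searching for it, using ordinary row-major addressing of a small dense cube. The dimension sizes and the `offset` function are assumptions made up for the illustration.

```python
# Dimension sizes for a tiny hypothetical cube:
# 3 states x 2 calling plans x 4 duration buckets.
sizes = (3, 2, 4)

def offset(coords, sizes):
    """Row-major position of a cell, computed as a polynomial in the
    coordinates (evaluated with Horner's rule). No scan, no join."""
    pos = 0
    for c, n in zip(coords, sizes):
        if not 0 <= c < n:
            raise IndexError("coordinate out of range")
        pos = pos * n + c
    return pos

# One flat list holds every cell of the cube.
cells = [0] * (3 * 2 * 4)

cells[offset((2, 1, 3), sizes)] = 42     # store a metric
print(cells[offset((2, 1, 3), sizes)])   # read it back by direct address
```

The lookup cost here does not depend on how many records were aggregated into the cube, which is the property Doering is describing, whatever the actual mathematics behind his product.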

The company’s impetus to make an almost magically fast back end for a query engine came out of experience in previous jobs, where both Doering and Robert Thompson, the CEO, developed query creation interfaces. “No matter how good we made the front-end tool,” Doering notes, “we were always hampered by the back end. Creating a tool that could create a complicated query that required three days to run didn’t really solve the business problem.” So Doering and Thompson decided to tackle the problem of making the back end exponentially more responsive. “So when I ask the killer question,” Doering says, “I get the answer in a couple seconds.”

WorldCom, for example, needed to get a handle on patterns in its international call detail records (CDRs). The company had a six-hour window – 3 to 6 a.m. – in which information had to be collected, appended to previous data, and aggregated in various ways so that reports could be generated the following day. Conventional approaches to aggregating the data from all the switches onto relational databases far exceeded the time window and resulted in query response times measured in hours.

With QueryObject’s fractal cube technology, the application now gathers raw CDRs daily and integrates them into the previous 179 days of call data, which is then used to update two key MCI data marts, one for international traffic data, and one for a complete overview of aggregate CDRs. QueryObject performed this quickly enough, in fact, to upgrade the process to handle the entire past year’s worth of calls.

Similarly, Telecom Italia found it too could store 18 months of call data, to the tune of 500 million CDRs per month.

The CRM Side

Insight doesn’t matter if you can’t act on it. Armed with a better understanding of customer segments, a marketing manager might launch finely targeted initiatives. By and large, this is a well-understood response to mined data. But it’s a much trickier art to incorporate this sort of insight into daily, one-on-one customer interactions. A leader in CRM analytics is E.piphany (San Mateo, CA – 650-356-3800), whose E.5 software release we discussed back in October of last year (“CRM Intelligence at Your Service”).

E.piphany’s Brad Wilson, director of product marketing, says “The number one trend in 2001 for contact centers is trying to move things to a more analytic foundation, just because the effort to do things manually requires tons of training and because manually entered, rules-based technology quickly becomes unmanageable.”

Wilson notes that “classic call center applications show you screens and data, then let the human being try to figure out the connections. The new way to do things is to analyze that data, either using offline analytics or, increasingly, using real-time analytics, and predict what this pattern of data implies about behavior or preferences.”

Although E.piphany offers both offline and real-time analysis, Wilson says the focus is on reacting in real time to customer activity. “We find that IT managers increasingly want to link websites and the call center. You want to know when someone who’s calling your call center was on your website this morning, and you want to know what web pages they were looking at when they call this afternoon. If you’re using offline analytics, you typically won’t capture that.” And that’s a shame, Wilson says, because “we find that the best predictors of behavior are what the customer did most recently.”

Does that mean E.piphany thinks offline data mining is passé? Not exactly. Wilson concedes that some things can only be ferreted out by sifting through data warehouses in batch mode. But the results of these mining sessions should be incorporated into the real-time analysis of the current click stream. Offline data “is already jacked up on steroids. It’s data that should be predictive of something, because otherwise you’ve wasted a lot of time and effort in creating it. So if you have an offline customer-churn prediction score, we have a profile of the customer that includes anywhere from ten to a couple hundred pieces of information. That churn likelihood becomes one more piece of data in our profile.”

It’s not as though E.piphany is the only company that can move mining insights into real-time reactivity, we should note. DigiMine, for instance, can boil down their mining results either into predictive values stored in relational tables, or into models incorporated into executables that run on your web server. You can get the less-than-dead-obvious insights lurking in your data and you can turn them into action at the moment of sale by importing the data tables into your existing CRM database or by programming your web server to consult the back-end executable that DigiMine provides.

The bottom line is perhaps best expressed by DigiMine’s Besbeas: “Now, without being data mining experts, companies can apply the most advanced technology in the world to their business” (see figure). It’s true that most enterprise operations these days don’t quite have full-fledged data warehouses, don’t quite have data mining operations up and running, and are only partly able to act on customer histories and preferences in real time. But the landscape is changing.

CRM vendors are getting smarter about employing predictive models within real-time customer-support scenarios. Statistical analysis packages are being tailored to provide services that are approachable to managers who aren’t full-time mathematicians. And data mining expertise is increasingly a matter of outsourcing, rather than hiring your own think tank full of Ph.D.s. This stuff could start to add up.

Filling the IP-Driven Warehouse

The traditional telecom use for analyzing data such as CDRs is nothing fancier than detecting toll fraud. Even that relatively mundane task has seen some refining as telephony meets IP. Starting about three years ago, says Dana Kreitter, a Hewlett-Packard (Palo Alto, CA – 650-857-1501) marketing manager, “Our premise was that as more and more services moved to IP, service providers were going to need the same sort of usage information about their subscribers as they were used to in the circuit-switched world.” Enter HP’s Internet Usage Manager.

Kreitter notes that the days in which service providers had all they could handle just to add servers fast enough are over. “Now they’re starting to figure out how to do this stuff profitably.” They’re realizing that “they can’t afford to spend $300 to capture a new subscriber if that subscriber is just going to jump ship in twelve months. So service differentiation and customer retention are essential for success in the evolving market.

“This metering product we’ve developed collects the usage details from wherever they may be, and we’re able to very flexibly process that data.”

The brains of the product lie in its ability not only to pull together different source records from different kinds of IP servers, but to correlate all the activity back to specific users, then report the information to various parts of an organization in the best format for each. Billing, for example, will want highly specific tallies of minutes used in each session. Upper-level management will be more interested in trends in customer preference.

The IUM works by placing servers called collectors down at the network layer. “To get a high degree of detail about specific users,” Kreitter notes, “you need to look in more than one place. In the telephony world, you’d just look to your switch, pull out your file of CDRs, and it would tell you this subscriber called from point A to point B, the call was successful, it lasted four minutes. When you move to an IP network, first of all, there is no one place to go for the data. As a consequence, you have to have a distributed collection system. I can get part of it here, part of it there, and then I’m going to have to put them together.”

One obvious point of collection, of course, is at the IP router. “A router will give you an association between an IP address and the amount of traffic that flowed through. That’s interesting, but it’s not billable yet, because you don’t know which subscriber was assigned that IP address. For this, we could go to the RADIUS server that handled the subscriber’s authentication when he dialed in and was in fact assigned that IP address. The session information would give us an association between the IP address and the billable customer. Then by just correlating those two data sets, you can end up with a record that gives you data volume by subscriber ID.”
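
The correlation Kreitter describes can be sketched in Python as a join on IP address and time. This is an illustrative sketch under stated assumptions, not a description of how HP’s product works: the record layouts, field names, and sample values below are invented, and real router and RADIUS accounting logs carry far more detail.

```python
from collections import defaultdict

# Hypothetical RADIUS session records: (subscriber_id, ip, start, stop),
# with times in seconds since some reference point.
sessions = [
    ("alice", "10.0.0.5", 0, 100),
    ("bob",   "10.0.0.5", 150, 300),   # same address, reassigned later
    ("carol", "10.0.0.9", 0, 300),
]

# Hypothetical router traffic records: (ip, timestamp, byte_count).
flows = [
    ("10.0.0.5", 10, 500),
    ("10.0.0.5", 90, 250),
    ("10.0.0.5", 200, 1000),
    ("10.0.0.9", 50, 70),
    ("10.0.0.7", 60, 999),             # no session on record for this one
]

def volume_by_subscriber(sessions, flows):
    """Attribute each traffic record to whoever held the IP at that time."""
    by_ip = defaultdict(list)
    for subscriber, ip, start, stop in sessions:
        by_ip[ip].append((start, stop, subscriber))

    totals = defaultdict(int)
    unmatched = []
    for ip, ts, nbytes in flows:
        owner = next(
            (sub for start, stop, sub in by_ip.get(ip, []) if start <= ts <= stop),
            None,
        )
        if owner is None:
            unmatched.append((ip, ts, nbytes))   # not billable yet
        else:
            totals[owner] += nbytes
    return dict(totals), unmatched

totals, unmatched = volume_by_subscriber(sessions, flows)
print(totals)
print(unmatched)
```

Matching on the time window as well as the address matters because dial-up addresses are reassigned; in the sample data the same address belongs to two different subscribers at different times.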

While this may look pretty obvious, it hasn’t been put into practice yet by IP providers, who have so far modeled their businesses on flat monthly rates. This means they haven’t needed to bother with metered billing, but it also means that they’ve put themselves on something of a starvation diet as consumers demand more services at constantly falling price levels. It’s no secret in telecom that enhanced services offered in metered doses offer a chance to plump up the cash flow, but ISP-style businesses have no machinery and no experience here. It’s precisely this expertise that HP has packaged in the Internet Usage Manager.

Decision Trees and Neural Nets

Assuming you’ve got your data all collected into a coherent (and probably huge) data warehouse, you’ve got to figure out what sort of thing you’re trying to find and then use one of a number of rather sophisticated recipes for sifting through the data.

Returning to churn prevention, we said that we’d need to figure out which segments of our subscriber base were most likely to churn. We can do this, to take just two general examples, by decision trees or neural networks.

In either approach, we’re trying to look at all of each customer’s transactions, plus all the information we have on him or her. (We may have asked for basic information such as home address; we may have asked their opinions on surveys.) With all that info, we want to come up with the likelihood, expressed as a percentage, that they’ll bolt in the near future.

With a neural network approach, we consider all the possible relationships of the various kinds (or dimensions, as they are called) of data and provide a “weight” for each of the possible interconnections. The network of connections and weights is “taught” to be more accurate over time, by comparing the predictions it would have made in past customer histories with the actual churn outcomes, so that the weights for the various connections are adjusted and gradually fine-tuned.
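
The “teaching” loop can be shown in miniature with a single artificial neuron, which is the smallest possible case and not a full network. The Python sketch below is an assumption-laden illustration: the two input dimensions, the six customer histories, and the learning rate are all invented, and production tools use many interconnected units and more refined training methods.

```python
import math

# Hypothetical past customers: (share_of_long_calls, complaints_filed),
# paired with the known outcome: 1 = churned, 0 = stayed.
history = [
    ((0.9, 3), 1), ((0.8, 2), 1), ((0.7, 4), 1),
    ((0.2, 0), 0), ((0.1, 1), 0), ((0.3, 0), 0),
]

def predict(weights, bias, features):
    """Churn likelihood between 0 and 1 for one customer."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(history, epochs=2000, rate=0.1):
    """Nudge the weights after every example, in proportion to how far
    the prediction was from the actual churn outcome."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, churned in history:
            error = predict(weights, bias, features) - churned
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias

weights, bias = train(history)
print(f"{predict(weights, bias, (0.85, 3)):.0%} likely to churn")
print(f"{predict(weights, bias, (0.15, 0)):.0%} likely to churn")
```

The output is exactly the kind of percentage score mentioned above, and the learned weights are just numbers with no explanation attached.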

Although neural networks can, in some instances, make predictions with remarkable accuracy, the internal networks they create are extremely complex when used to consider, say, a thousand different dimensions of data. This makes them difficult to interpret. They keep the “secrets” of how a given customer was scored tightly bound up in the intricate interplay of network connections.

Decision trees, in contrast, are more straightforward. An algorithm designed to construct decision trees will try to find rules that repeatedly divide a customer base until finally only the customers with a high likelihood of churn are left. The algorithm will uncover rules that bring to mind Aristotelian syllogisms: “If the customer makes calls longer than some number of minutes, then they are more likely to churn.”
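
Finding one such rule can be sketched in Python as a search over candidate cutoffs. This is a deliberately simplified illustration: the customer data is invented, only one dimension is examined, and the scoring is a plain count of mistakes, whereas real tree-building algorithms score splits with purity measures, look at many dimensions, and repeat the process on each resulting branch.

```python
# Hypothetical customers: (average_call_minutes, churned)
customers = [
    (2, False), (3, False), (5, False), (8, False), (9, False),
    (12, True), (14, False), (15, True), (20, True), (25, True),
]

def best_threshold(records):
    """Try every observed call length as a cutoff for the rule
    'calls longer than the cutoff -> likely to churn' and return
    (cutoff, mistakes) for the cutoff that misclassifies the fewest."""
    best = None
    for threshold in sorted({minutes for minutes, _ in records}):
        mistakes = sum((minutes > threshold) != churned
                       for minutes, churned in records)
        if best is None or mistakes < best[1]:
            best = (threshold, mistakes)
    return best

threshold, mistakes = best_threshold(customers)
print(f"If calls run longer than {threshold} minutes, expect churn "
      f"({mistakes} of {len(customers)} customers misclassified)")
```

Unlike a trained network’s weights, the result reads as a plain statement that a marketing manager can inspect and question.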

Why these rules are “true” may require further interpretation, of course. Maybe our pricing for long-distance calls isn’t competitive. Maybe one of our competitors has some capability that makes them more attractive (good pricing on tri-mode wireless handsets, for instance, which might be more attractive to frequent travelers who tend to make lengthy toll calls). Whatever the reason, the advantage of the decision tree approach is that it more clearly points to remedial action. Maybe we can call customers who are ferreted out by this rule and tie them down with a one-year contract with a lower per-minute rate.

By: Robert Richardson