Virtual Prospecting
From oil exploration to neurosurgery, new tools are revealing the secrets hidden in mountains of data
Just over a century ago, on Jan. 10, 1901, a gusher of oil erupted from a well on Spindletop Hill near Beaumont, Tex.–by far the biggest strike the world had ever seen. Almost overnight, black gold became the capital that funded the growth of powerful corporate empires and paved the way for the automobile. But these days, petroleum engineers are more likely to hit pay dirt in Texas warehouses stuffed to the rafters with magnetic tapes. “Those tapes contain a lot of information that was poorly looked at,” says Michael J. Zeitlin, chief executive of Magic Earth Inc. “You could find a lot of oil in there with today’s technology.”
He’s talking about data mining, his Houston company’s specialty. Data mining harnesses artificial intelligence and slick statistical tricks to unearth insights hiding inside mountains of data. The software is so thorough, and so clever at spotting subtle relationships and associations, that it regularly makes fresh discoveries. The results can point to new business opportunities, novel products, and better manufacturing processes–especially when the results are presented graphically, by means of sophisticated visualization systems. Since the mid-1990s, data mining has taken root in industry after industry (table). “This is basically becoming a business imperative,” declares Harry R. Kolar, head of strategy at IBM’s business intelligence unit. “It’s coming up everywhere, because information sources have just gone crazy.”
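To show what "spotting subtle relationships and associations" can mean in practice, here is a minimal sketch of one classic mining technique: scoring association rules by support and confidence. It is an illustration under stated assumptions, not how any vendor named in this story actually works. The customer "baskets," product names, and thresholds are all invented, and the function names `rule_stats` and `strong_pairs` are hypothetical.

```python
from itertools import combinations


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both item sets
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / len(transactions)
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


def strong_pairs(transactions, min_support, min_confidence):
    """Check every one-item -> one-item rule and keep the strong ones."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for ante, cons in ((x, y), (y, x)):
            s, c = rule_stats(transactions, [ante], [cons])
            if s >= min_support and c >= min_confidence:
                rules.append((ante, cons, s, c))
    return rules


# Hypothetical bank-customer product holdings (invented data).
customers = [
    {"savings", "checking", "auto_loan"},
    {"savings", "checking"},
    {"savings"},
    {"checking", "credit_card"},
    {"savings", "checking", "credit_card"},
]

for ante, cons, s, c in strong_pairs(customers, 0.4, 0.7):
    print(f"{ante} -> {cons}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, the sketch reports that every credit-card holder also has a checking account. Real mining tools search far larger item sets with pruning algorithms rather than this brute-force pairwise loop.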
Human minds simply can’t cope with the torrents of data unleashed by computers and the Internet. Scientists at the University of California at Berkeley recently plumbed this flood, and the numbers they calculated are dumbfounding. All the information ever produced since mankind began painting pictures in caves and writing on papyrus comes to roughly 18 exabytes. That’s 18 billion billion bytes, or 18 followed by 18 zeros. But what’s really crazy is that 12% of it was generated in 1999 alone. And two-thirds of that, or 1.5 exabytes, was digital.
SECRECY. Among this year’s Top 50 elite, more than one-third are energy and finance companies. Those two sectors also happen to be pioneering users of data-mining technology. Now, it would be a stretch to assume that Amerada Hess (AHC) and Phillips Petroleum (P), or Capital One (COF) and Morgan Stanley Dean Witter (MWD), made the list because of data mining. The technology is still fairly new, after all. But there’s little doubt it is becoming a lucrative asset in the battle for market share. Many companies refuse to discuss details, presumably for fear of tipping off rivals.
At Texaco Inc., data mining has clearly paid off in a big way. It played a key role in discovering the huge Agbami oil field off the coast of Nigeria, containing an estimated 1.45 billion barrels of oil. The telltale signs of oil were spotted by reassessing seismic data with Texaco’s GeoProbe data-mining system, which uses animated images to help geologists sift out salient features. Traditional methods with static images had missed the oil. Test wells in 1999 and 2000 confirmed it was a major deposit, and Agbami’s potential value probably was a factor in Chevron Corp.’s (CHV) $35.2 billion takeover of Texaco (TX) last October.
In finance, even small banks are joining the action. Two years ago, Dallas Teachers Credit Union decided to become a full-fledged community bank. But where to build the branch that would launch the initiative? “We had to get it right the first time,” says Jerry Thompson, DTCU’s chief information officer. “Competitors won’t cut you any slack.”
So Thompson called in IBM’s data miners to help comb through demographic data. One target: pockets of people who might open checking accounts. To bankers, that’s “cheap money,” he notes. When DTCU analyzed its customer data, out popped a big surprise: “If a branch was within a 10-minute drive, we had a checking account. But if the drive was 10 1/2 minutes, we didn’t. It was that stark,” says Thompson. Plotting 10-minute drive times around prospective sites produced very irregularly shaped markets, he adds. “That’s something you wouldn’t have gotten from a spreadsheet.”
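The pattern Thompson describes amounts to a sharp threshold on drive time. Here is a minimal sketch of how such a cutoff might be found, assuming a hypothetical list of customers, each with a drive time to the nearest branch and a flag for whether they hold a checking account. The drive times would have to come from a road-network routing tool, which is why the resulting markets are irregular rather than circular. The records and the exhaustive cutoff search are illustrative assumptions, not DTCU’s or IBM’s actual method.

```python
from typing import List, Tuple


def best_cutoff(customers: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Find the drive-time cutoff (in minutes) that best separates customers
    with a checking account from those without.

    The rule tested is "has checking iff drive_minutes <= cutoff".
    Returns (cutoff, accuracy); on ties, the smallest cutoff wins.
    """
    if not customers:
        raise ValueError("need at least one customer record")
    candidates = sorted({minutes for minutes, _ in customers})
    best = (candidates[0], -1.0)
    for cutoff in candidates:
        correct = sum(
            1
            for minutes, has_checking in customers
            if (minutes <= cutoff) == has_checking
        )
        accuracy = correct / len(customers)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical records: (drive time in minutes, holds a checking account?)
sample = [
    (3.0, True), (6.5, True), (8.0, True), (9.5, True), (10.0, True),
    (10.5, False), (12.0, False), (15.0, False), (9.0, True), (11.0, False),
]

cutoff, accuracy = best_cutoff(sample)
print(f"cutoff = {cutoff} min, accuracy = {accuracy:.0%}")
```

In this invented sample, the boundary falls cleanly at 10 minutes. Real customer data would be noisier, and a miner would also weigh demographics, not drive time alone.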
SEISMIC SCREENS. Last November, DTCU opened the branch in north Dallas. It turned profitable in only 90 days. Normally, a new branch takes a year to climb into the black, says Thompson. “Needless to say, we have already entrusted the computer with selecting our next branch location.”
To help Texaco spot pockets of oil, Magic Earth’s Zeitlin headed the team that built the first theater for data mining in 1996 (Texaco spun off the technology last year to Zeitlin and nine members of his crew). It features a 25-foot-wide, 9-foot-tall screen on which seismic data are projected by a supercomputer from Silicon Graphics Inc. (SGI). SGI installed the next big-screen center in 1997, at Atlantic Richfield Co., now part of BP Amoco (BP). Occidental Oil & Gas Corp. was among the 25 companies to get one in 1998–and it has slashed the time for making drill/no-drill decisions industry. Chevron has four.
SGI dominates the visualization scene because its computers make the data come alive as an interactive movie. Geologists can swoop through subterranean rock formations that are color-coded to highlight underground channels and changes in rock density. When the data are transformed into moving images, “something magical happens,” says Zeitlin. “Details you didn’t notice before suddenly pop out at you.” He believes moving images tap a primordial part of the brain. Prehistoric humans became conditioned to pay special attention to things in motion, he says. “It’s what saved our early ancestors from being eaten by Africa’s lions.”
Other times, just the sheer size of the image makes the difference. That happened at Lawrence Livermore National Laboratory in 1999, says Terri M. Quinn, assistant head of scientific computing. For days, a physicist had searched his desktop display in vain for the software glitch that was causing a bomb-related simulation to, well, bomb. Then he ran the top-secret simulation on Livermore’s SGI system–and spotted the error in minutes.
To bring data mining to mainstream markets, three Microsoft Corp. (MSFT) managers left cushy jobs and founded digiMine Inc. in March, 2000. The startup’s CEO, Usama M. Fayyad, got fed up with a string of data-mining failures. Microsoft was working on pilot projects with several large companies, “and they all said the technology was like magic,” recalls Fayyad. But within a few months, the operations would begin decaying. The databases weren’t kept up to date, so they “effectively turned into large, glorious, expensive data tombs,” he says.
Lack of talent was the chief problem. Programmers and engineers with experience in artificial intelligence and advanced statistics are not easy to find. So Fayyad figured many companies would willingly shell out $10,000 a month to get the whole show taken off their hands–maintaining a database, mining it for key measures, and providing online reports. He hoped his Bellevue (Wash.) venture could attract 10 customers in year one. Actually, digiMine ended up with 30, including Nordstrom, etrieve, and Dialpad.com. Next, Fayyad wants to branch out into manufacturing–an area where he enjoyed some success at Microsoft. Harris Semiconductor and Hughes Electronics (GMH) use data mining to unmask the causes of problems in factories.
Perhaps no group is being info-inundated more than scientists. The volume of research getting published online is skyrocketing. Now, help is on the way thanks to data mining. For example, a team led by Kurt D. Bollacker at NEC Research Institute in Princeton, N.J., has developed CiteSeer, a program smart enough to rank the importance of new scientific papers on the Web, in part by evaluating the number and significance of the citations and links to other papers.
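This story doesn’t spell out CiteSeer’s exact scoring. As a hedged illustration of ranking papers by "the number and significance" of their citations, here is a minimal PageRank-style sketch in which a citation from a highly ranked paper counts for more than one from an obscure paper. The paper IDs, the damping factor of 0.85, and the 50-iteration count are assumptions chosen for illustration. The function name `rank_papers` is hypothetical.

```python
def rank_papers(citations, damping=0.85, iterations=50):
    """Score papers by incoming citations, weighting each citation by the
    current score of the citing paper (PageRank-style power iteration).

    citations: dict mapping each paper to the list of papers it cites.
    Returns (papers sorted best-first, dict of scores summing to 1).
    """
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper gets a small baseline share.
        new = {p: (1.0 - damping) / n for p in papers}
        for citing in papers:
            cited = citations.get(citing, [])
            if cited:
                # A paper passes its weight, split evenly, to what it cites.
                share = damping * score[citing] / len(cited)
                for p in cited:
                    new[p] += share
            else:
                # A paper that cites nothing spreads its weight evenly.
                for p in papers:
                    new[p] += damping * score[citing] / n
        score = new
    ranking = sorted(papers, key=lambda p: (-score[p], p))
    return ranking, score


# Hypothetical citation graph: paper -> papers it cites.
citations = {
    "A": ["C"],
    "B": ["C"],
    "D": ["C", "B"],
    "E": ["D"],
}

ranking, score = rank_papers(citations)
for paper in ranking:
    print(paper, round(score[paper], 3))
```

A plain citation count would treat every reference equally. The weighting by the citing paper’s own score is what lets "significance" enter the ranking.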
201 BEAMS. Data mining also promises to help scientists do their jobs, judging from the results of a contest at the University of Maryland School of Medicine. It pits a neurosurgeon against a data-mining program. The challenge is to devise a plan for treating a brain tumor with radiation. While eyeballing X-ray slices of the patient’s brain, the surgeon mentally packs the tumor with as many buckshot-size doses of radiation as possible. Then he decides how to deliver the radiation, using 201 beams on the inside of a giant helmet–part of the Gamma Knife system from Elekta Instrument.
After the computer runs through the same process, the neurosurgeon picks the better plan. Since the trial started in January, “the doctor has tossed out his own plan every time and used the computer’s,” boasts Michael C. Ferris, a computer scientist with the University of Wisconsin’s Data Mining Institute. Ferris developed the software with David M. Shepard, a former student who now works as a medical physicist at the University of Maryland.
So far, the bulk of these nascent data-mining applications has been painstakingly developed in-house at companies like Texaco. On the Web, though, the technology is spreading rapidly. Web sites use it to analyze their visitors’ behavior and tailor customer service or suggest related links. Such software is available from scores of vendors: giants such as SAS Institute, Oracle (ORCL), and NCR (NCR), along with newcomers like ClearForest, Megaputer, Quadstone, and Surfnotes. If this new way of finding gold in vast amounts of data maintains its current pace, its long-term impact could rival that of the big Texas gusher of 1901.
By: Otis Port