Data Mining


Data mining and information extraction companies have been handed a golden opportunity after Sept. 11, as the U.S. Department of Defense is actively soliciting technologies to stem terrorism. Here’s a peek at what could be the next jewel in the embattled tech sector.


Put it down to serendipity, even if it emanates from the tragic events of Sept. 11. Due to the apparent failure of the intelligence community to detect terrorist activities, new breeds of technologies – fairly nascent ones such as data mining and information extraction – stand to gain from the government’s sudden and insatiable appetite for information gathering and processing tools.

Companies are secretive and everything is hush-hush. After all, the nation’s security is at stake. But it is clear that the market opportunity for data mining and information extraction companies is huge, if not exactly quantifiable, because the U.S. government’s spending is difficult to pin down in numbers. The Department of Defense (DoD) has released what are known as Broad Agency Announcements to solicit technologies to combat terrorism. And the response has been immense.

“The government is basically swamped with inquiries,” says Barney Pell, vice president of strategy at WhizBang Labs, a Provo, Utah-based information extraction company now working with the DoD.

There is a real need for such technologies, considering the large volumes of unstructured data that are fragmented and scattered across many sources. The agencies are also not adept at handling foreign languages such as Arabic, Pell says, adding that WhizBang has already shown that it can pull information off Arabic Web sites. There is also a colossal need to integrate the data, as federal agencies are not used to sharing information even among departments of the same agency.

“One desk might cover bioterrorism in Ireland, while another desk may be covering bioterrorism in Saudi Arabia and those guys don’t even talk,” Pell declares.

According to him, In-Q-Tel, the venture arm of the C.I.A., is funding several companies in the information extraction space, and the companies in turn, whether they are on the search and retrieval side or the data mining side, are looking at the government as a customer. Companies that were going out of business are suddenly expecting a new lease of life as they approach the government to obtain funding, now that venture capitalists have become circumspect. Mahendra Vora, chairman and CEO of Intelliseek (funded by In-Q-Tel), declares that 50 percent of companies “knocking on the government’s door” are doing so to survive. But he stresses that the government will not entertain all applicants.

“Every joker is going to try to jump on this and leverage the opportunity, but I have dealt with the people in the government and they have a bunch of highly technical Ph.D.s working on this and they are not just going to fall for any kind of software,” Vora maintains.

The Potential Players

Intelliseek, based in Cincinnati, Ohio, provides software that allows businesses to become more competitive by extracting unstructured but relevant data from the Internet, intranets and extranets. This enables a company to mine information from e-mails, message boards, and chat and discussion rooms on the Internet, as well as faxes, customer responses and documents. The company’s relationship with In-Q-Tel predates Sept. 11, says Vora, but the fact that the nexus now has an added dimension becomes clear when he declines to comment specifically about the company’s dealings with the DoD, revealing only that he is working under “strict instructions.”

Another company that is working with federal agencies, but declines to reveal details, is digiMine, based in Bellevue, Wash. Usama Fayyad, digiMine’s co-founder, president and CEO, believes that data mining is the next generation of data processing technology and is desperately needed in today’s world. Fayyad, who previously worked with the National Security Agency and the C.I.A. on an advisory level, says that data mining solutions can significantly reduce the number of documents that agencies have to view because they can flag “certain interesting events” based on context, rather than indiscriminately flagging words that appear violent – say, “bomb.”

“That capability to apply that extra intelligence to understand data mining can add value,” Fayyad explains. “Because of the volume, you have to be extremely intelligent about which events you flag so that your false positive rate [the number of times you flag an event when there is nothing there] is substantially brought down, without missing any of the true positives.”

This, however, requires configuring the algorithms that run through the data specifically to the domain in which they are being used, Fayyad stresses.
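The difference between flagging a bare keyword and flagging on context can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample messages, the `CONTEXT_WORDS` list and the function names are hypothetical and do not describe digiMine’s product or any agency’s system.

```python
# Hypothetical toy messages: (text, is_truly_suspicious). Invented for illustration only.
MESSAGES = [
    ("the new movie was a bomb at the box office", False),
    ("that party was the bomb", False),
    ("deliver the bomb to the embassy tonight", True),
    ("bath bomb sale this weekend", False),
    ("weather looks fine for the picnic", False),
]

KEYWORD = "bomb"
# Assumed context words; a real system would tune these to its own domain.
CONTEXT_WORDS = {"embassy", "deliver", "detonate", "target"}


def keyword_flag(text):
    """Flag any message that contains the keyword as a separate word."""
    return KEYWORD in text.split()


def context_flag(text):
    """Flag only when the keyword co-occurs with at least one context word."""
    words = set(text.split())
    return KEYWORD in words and bool(words & CONTEXT_WORDS)


def evaluate(flag_fn):
    """Return (false_positives, missed_true_positives) for a flagging rule."""
    false_pos = sum(1 for text, truth in MESSAGES if flag_fn(text) and not truth)
    missed = sum(1 for text, truth in MESSAGES if truth and not flag_fn(text))
    return false_pos, missed


print("keyword only:     ", evaluate(keyword_flag))
print("keyword + context:", evaluate(context_flag))
```

On this toy data the keyword rule flags three harmless messages, while the context rule flags none and still catches the one suspicious message. The context list only works because it was chosen for this particular data, which is the domain-specific configuration Fayyad refers to.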

Academics in the data mining and information extraction field have also been receiving renewed interest from federal agencies. Rajeev Motwani, a professor in the department of computer science at Stanford University, is one such individual. Osama bin Laden’s name first emerged at the time of the embassy bombings in Kenya and Tanzania in 1998. Soon after, Motwani approached the agencies saying that he could do a project to try to identify who is meeting whom and flag these connections. That could potentially have identified organizations interacting with one another. The project did not work out. However, since Sept. 11, there has been a dramatic shift, as more resources are now being spent on such tools. Pell of WhizBang believes that most of the money spent on intelligence will now be channeled to intelligence technologies.

“Part of this is because the intelligence community has become aware that these guys [terrorists] are using the Internet for their own communications,” Motwani says.

Apart from information extraction and data mining companies, search engines, by their very nature, seem to be an obvious choice for the U.S. intelligence community because they deal with large amounts of data and mine it to respond to specific queries. Pell confirms that the government is indeed talking with search engines because they enable it to monitor sites anonymously.

“If you are sitting on a government machine browsing sites, then those sites will know that they are being monitored, but a search engine goes and looks at those sites every day,” Pell explains. “So, if the government can piggyback on top of the search engines, it can do passive monitoring.”

Google, In-Q-Tel and the Defense Advanced Research Projects Agency (DARPA) declined requests for an interview.

The Next Six Months

Motwani believes that the next six months will provide a window of opportunity for companies to take advantage of the immediacy of Sept. 11. However, he cautions against thinking that because federal agencies are funding these efforts, there will be a quick and easy solution.

“I think it would be naïve to think that there’s any technology that is there today that can be packaged and by January be up and running inside the C.I.A. – we are very, very far from it,” Motwani says. “The question is, ‘Can you facilitate the task of the human analyst and provide the tools that will leverage the time better?’”

The challenge lies not so much in getting the data, integrating it, or processing large volumes and putting them in some canonical form, he argues. The task lies in running efficient algorithms that will be able to make statistical, linguistic and semantic correlations across various data.

Although the next six months will be crucial to see which company can take charge and customize its products for the intelligence community, no one doubts that data mining is here to stay. Referring to Moore’s law, which states that every 18 months the processing capacity in the world doubles, and an empirical law that postulates that the storage capacity in the world doubles every nine months, Fayyad of digiMine concludes that the world has more data than it can process. This makes the adoption of data mining technologies, for both government intelligence and business intelligence, inevitable.
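The arithmetic behind that conclusion can be worked through in a few lines. This is only a sketch that takes the two doubling periods quoted above (18 months for processing, nine months for storage) at face value; the function names are hypothetical and the figures are not measurements.

```python
PROCESSING_DOUBLING_MONTHS = 18  # doubling period for processing, as quoted in the article
STORAGE_DOUBLING_MONTHS = 9      # doubling period for storage, as quoted in the article


def growth_factor(months, doubling_period_months):
    """How many times larger a quantity is after `months` of steady doubling."""
    return 2 ** (months / doubling_period_months)


def storage_to_processing_ratio(months):
    """How much faster storage has grown than processing after `months`."""
    storage = growth_factor(months, STORAGE_DOUBLING_MONTHS)
    processing = growth_factor(months, PROCESSING_DOUBLING_MONTHS)
    return storage / processing


for months in (18, 36, 72):
    print(
        f"after {months} months: "
        f"storage x{growth_factor(months, STORAGE_DOUBLING_MONTHS):.0f}, "
        f"processing x{growth_factor(months, PROCESSING_DOUBLING_MONTHS):.0f}, "
        f"gap x{storage_to_processing_ratio(months):.0f}"
    )
```

Under these assumptions the gap between stored data and the capacity to process it doubles every 18 months: storage grows fourfold while processing merely doubles.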

“To me it’s not a question ‘if’,” Fayyad says. “To me it’s a question of ‘when’.”

By: Arundhati Parmar

Source: silicon india

