Big Data Mining 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications


Keynote Speakers

Usama Fayyad

Title: Big Data Analytics: Applications and Opportunities in On-line Predictive Modeling

Dr. Usama Fayyad is co-Founder & CTO of ChoozOn Corporation — a search engine service that helps consumers make sense of the chaos of overwhelming offers and deals through personalization and intelligent matching. In 2010 he was appointed by King Abdullah II of Jordan to lead the OASIS-500 as its Executive Chairman — a tech startup investment fund that runs an accelerator/incubator, entrepreneurship training program, and angel network that aims to fund 500 Internet and Technology startups in the next 5 years. In 2008, Fayyad founded Open Insights, a data strategy, technology and consulting firm to help enterprises understand data strategy and deploy data-driven solutions that effectively and dramatically grow revenue and competitive advantages.

Up until September 2008, he was in Sunnyvale, CA as Yahoo!’s Chief Data Officer & Executive VP, responsible for Yahoo!’s global data strategy, architecting Yahoo!’s data policies and systems, prioritizing data investments, and managing the Company’s data analytics and data processing infrastructure, which processed over 25 Terabytes of data per day. Fayyad also founded and managed the Yahoo! Research organization, with offices around the world, which became the premier scientific research organization to develop the new sciences of the Internet, on-line marketing, and algorithmic advertising. At Yahoo! he applied Big Data techniques to content and advertising targeting and built the world’s largest group of data scientists, helping Yahoo! grow its revenues for targeting by 20x in 4 years. In 2003 Fayyad co-founded and led the DMX Group, a data mining and data strategy consulting and technology company that was acquired by Yahoo! in 2004.

He is an active angel investor in the U.S. and in the Middle East and specializes in early-stage tech companies. He is part of the U.S. Dept of State Delegation on Entrepreneurship in the Middle East and North Africa. For more details on Usama’s background, see his personal web site at:

Abstract: Virtually all organizations are having to deal with Big Data in many contexts: marketing, operations, monitoring, performance, and even financial management. Big Data is characterized not just by its size, but by its Velocity and its Variety, for which keeping up with the data flux, let alone its analysis, is challenging at best and impossible in many cases. In this talk I will cover some of the basics in terms of infrastructure and design considerations for effective and efficient Big Data. In many organizations, the lack of consideration of effective infrastructure and data management leads to unnecessarily expensive systems for which the benefits are insufficient to justify the costs. We will refer to example frameworks and clarify the kinds of operations where Map-Reduce (Hadoop and its derivatives) is appropriate and the situations where other infrastructure is needed to perform segmentation, prediction, analysis, and reporting appropriately, these being the fundamental operations in predictive analytics. We will then pay specific attention to on-line data and the unique challenges and opportunities represented there. We cover examples of Predictive Analytics over Big Data with case studies in eCommerce Marketing, on-line publishing and recommendation systems, and advertising targeting. Special focus will be placed on the analysis of on-line data with applications in Search, Search Marketing, and targeting of advertising. We conclude with some technical challenges in social network data, as well as the solutions that can be used to address these challenges.

Bharat Rao

Title: Rapid Learning Systems to improve patient outcomes and reduce healthcare costs

Bharat Rao, PhD is Senior Director and Head of the newly-formed Center for Innovations in the Health Services (HS) business unit in Siemens Healthcare, headquartered in Malvern, PA. The Siemens HS unit develops and markets enterprise information technology and business intelligence solutions for hospitals and other healthcare providers. The Center for Innovations was established in May 2012 with the vision to foster thought-leadership for Siemens in the dynamic field of healthcare IT. The Center’s goals are to create a continuous-innovation pipeline of new products, services and capabilities; to develop and roll out processes to translate innovation into commercial success; to establish collaborations with luminary customers and academic & industry partners; and to drive an innovation agenda that impacts the entire HS portfolio and workforce.

Previously, Dr. Rao led the Knowledge Solutions group, Healthcare Analytics and Business Intelligence, which develops and deploys data analytics solutions that analyze millions of patient records, impacting three major areas in healthcare. These include automated quality measurement and decision-support from hospital EMRs, computer-aided diagnosis systems to identify suspicious lesions on medical images, and predictive models for personalized medicine. The group launched the first-to-market startup offering in healthcare quality, Soarian Quality Measures (and its cloud counterpart, the Quality Reporting Service), which is now an essential part of Siemens’ solution to satisfy the meaningful use requirements for US health reform.

Abstract: Trends over the past two decades indicate that the quantity and precision of diagnostic data available for a single patient have increased dramatically, the amount of published medical knowledge is doubling every few years, and a number of promising therapies have been developed. Despite all these advances, medicine remains largely mired in a ‘one size fits all’ paradigm that has led to an explosive increase in patient costs without a concomitant improvement in patient care.

We are on the verge of a paradigm shift in healthcare. Traditionally, medical knowledge has been derived from carefully conducted clinical studies, namely evidence-based medicine; now, a new form of evidence is emerging: that created by rapid learning systems that will mine vast amounts of electronic patient data collected in routine care, to create “evidence generated medicine.” Thus, mining the millions of patient records collected routinely in the daily care of patients has tremendous potential to individualize care to the specific patient.

In this presentation, I will describe a first-of-its-kind US/Euro health IT network consisting of 10 cancer centers in 5 nations. In this network, cancer centers are able to securely learn personalized models from patient data collected across all centers. Learned models for predicting patient survival and side effects for 3 different cancers (lung, rectal, larynx) have been made available to the public and physicians at .

Guirong Xue

Title: Big Data Practice at

Dr. Guirong Xue, Senior Director at He leads the department of the Web Search and Web Data Analytics. From 2006 until 2010, he worked at the Department of Computer Science and Engineering, Shanghai Jiaotong University. His research interests are Cloud Computing, Web Searching, Recommendation Systems and Computational Advertising. He has published more than 70 papers in ACM TOIS, ACM TIST, DMKD, IEEE TKDE, Information Retrieval, ACM SIGIR, ACM SIGKDD, ACM WWW, NIPS, ICML, AAAI, ACL, etc. He co-organized CCIR 2010 and served as the Co-chair of the WWW2012 Internet Monetization Session.

Abstract: The vision of Aliyun is to become an Internet Data Sharing Platform. We realize that the next frontier of Big Data depends on effectively managing, using, and exploiting these heterogeneous data. It is now possible to extract knowledge and useful information in ways that were previously impossible, and to gain new insights in a timely manner. In this presentation, I will first introduce the big data collections at Aliyun, which include the whole Web content and user access logs. Then, I will give an introduction to the infrastructure for processing the Big Data and show some typical applications on that platform. Finally, I will propose some challenging issues in processing and mining the big data.

Invited Speakers

Charles Parker, BigML

Title: Unexpected Challenges in Large Scale Machine Learning

Dr. Charles Parker received his Ph.D. in Computer Science in 2007 under Professor Prasad Tadepalli at Oregon State University. His thesis, “Structured Gradient Boosting”, presented a gradient-based approach to structured prediction useful in information retrieval and planning domains. From 2007 to 2011, he worked for the Eastman Kodak Company on various problems in data mining for machine reliability, scanned document analysis, and consumer video indexing, and was promoted to the rank of Research Associate. He currently works for BigML, Inc., helping to develop a web-scale infrastructure and interface for machine learning. His work has appeared in ICML, AAAI, ICDM, and other notable venues.

Abstract: In machine learning, scale adds complexity. The most obvious consequence of scale is that data takes longer to process. At certain points, however, scale makes trivial operations costly, thus forcing us to re-evaluate algorithms in light of the complexity of those operations. Here, we will discuss one important way a general large scale machine learning setting may differ from the standard supervised classification setting and show the results of some preliminary experiments highlighting this difference. The results suggest that there is potential for significant improvement beyond obvious solutions.

Vivekanand Gopalkrishnan, Deloitte

Title: Big Data, Big Business: Bridging the Gap

Vivekanand Gopalkrishnan is the Director of Research in Deloitte Analytics Institute | Asia. As Deloitte’s formal thought leadership organization, the Institute is responsible for increasing Deloitte’s market eminence across industry domains through Analytics. The Institute achieves differentiation by bringing research to life and into the heart of business. Vivek’s multi-disciplinary team of data scientists creates innovative R&D based solutions and publicizes insights arising from scaling out research efforts.

Vivek has nearly 20 years of experience in research, teaching, consulting and practical application development to solve real-world business problems using analytics. He advises clients in their data strategy for driving insights to business action, and specializes in architecting innovative solutions that mine insights even when the data is not well-behaved. His research expertise covers data mining, machine learning and data warehousing, and he has published over 50 papers in these fields. Vivek continues to actively serve the academic and research communities. He is on the editorial board and reviewing committee of leading research journals, and on the program committee of top international data mining and information management conferences. As a passionate educator, he continues to guide university academic programmes and research councils in analytics.

Abstract: Business analytics, occupying the intersection of the worlds of management science, computer science and statistical science, is a potent force for innovation in both the private and public sectors. The successes of business analytics in strategy, process optimization and competitive advantage have led to data being increasingly recognized as a valuable asset in many organizations. In recent years, thanks to a dramatic increase in the volume, variety and velocity of data, the loosely defined concept of “Big Data” has emerged as a topic of discussion in its own right, with different viewpoints in both the business and technical worlds. From our perspective, it is important for discussions of “Big Data” to start from a well-defined business goal, and remain moored to fundamental principles of both cost/benefit analysis and core statistical science. This note discusses some business case considerations for analytics projects involving “Big Data”, and proposes key questions that businesses should ask. With practical lessons from Big Data deployments in business, we also pose a number of research challenges that may be addressed to enable the business analytics community to bring best data analytic practices to bear when confronted with massive data sets.
