The world’s leading Knowledge Discovery and Data Mining conference (KDD 2013), organized by the Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD), announced today that Microsoft Research will sponsor and provide the datasets for the annual KDD Cup challenge. This annual data science competition is one of the most prestigious in the world and attracts hundreds of teams, whose goal is to design solutions to complex, real-world scientific problems involving large volumes and a wide variety of data. KDD Cup 2013 is designed by a team from Microsoft Research, the Center for Web and Data Science at the University of Washington, Tacoma, and the Computational Web Intelligence team at Ghent University. To download the data set and for participation details and deadlines, please visit https://www.kaggle.com.
KDD Cup 2013 will feature datasets from Microsoft Academic Search, Microsoft’s free academic search engine, which covers 49 million publications and over 20 million authors across a variety of domains. Automated systems that ingest authorship details produce very noisy data, leading to frustration when users try to identify experts and relevant publications. The main challenge of the contest is to design an algorithm that accurately confirms or denies which papers were written by a particular author. The algorithm is expected to use various metadata about the paper and the author, including co-authorship and affiliations. The task was selected after careful deliberation by the 2013 KDD Cup Chairs, Claudia Perlich and Brian Dalessandro of Media6°. The KDD Cup challenge is hosted by Kaggle, the world’s leading platform for predictive modeling competitions.
The goal of Microsoft Academic Search is to make it easier for scientists to explore published research and connect with each other. Given the large number of data sources and the amount of noise in publication data, this year’s KDD Cup challenge is to develop solutions that accurately determine paper authorship. Solutions to the challenge are expected to foster the growth of algorithms and approaches for dealing with complex, multivariate big data beyond the current state of the art.
“The mission of Microsoft Research and the Microsoft Academic Search portal is to fuel scientific collaboration that accelerates scientific research,” said Vani Mandava, Senior Program Manager, Microsoft Research and KDD Cup sponsor. “By sponsoring the KDD Cup and providing the data science community with a complex data set, we are not only encouraging best practices and innovation in data science, but hope to bring new opportunities to the global scientific community of students and researchers.”
The rules of the contest were designed by a team consisting of Prof. Martine De Cock (Ghent University, Belgium), Prof. Senjuti Basu Roy (University of Washington, Tacoma), Vani Mandava (Microsoft Research), and Ben Hamner and Will Cukierski (Kaggle). “Students today depend on online tools such as MAS and the research efforts of hundreds of others who have studied the field before or alongside us. The ability to accurately identify and access the relevant research we need, and to connect and collaborate with our colleagues, will be of great value to scientists worldwide,” said Swapna Savvana, a graduate student at the University of Washington, Tacoma, who is helping the team construct the dataset for the challenge.
One of the biggest challenges for the research community is the lack of large, real-world data sets that would allow researchers to develop and test new algorithms and solutions. Since its inception 17 years ago, the KDD Cup has provided data scientists with challenges and data sets from different industries, facilitating advances on hot issues such as early detection of breast cancer, prediction of student performance, accurate retail, marketing, and recommendation models, identification of pulmonary embolisms from image data, and more. In addition, because all participants have access to the same data sets, results and algorithms can easily be tested and evaluated, creating a model for collaboration between industry and research institutions and building a scientific foundation for future research in a responsible and privacy-sensitive manner.
“The KDD Cup gives data scientists the unprecedented opportunity to explore new applications of algorithms for Big Data,” said Robert L. Grossman, KDD 2013 Conference General Chair. “By focusing this year’s challenge on one of the world’s largest research databases, Microsoft Academic Search, we are amplifying the impact of the Cup beyond the data science community to the entire scientific community, regardless of discipline, industry or application.”
Winners of the KDD Cup will receive prizes totaling US$15,000 and will be announced at KDD 2013, the premier international conference on data science, Big Data and data mining, taking place in Chicago on August 11-14, 2013. A workshop focusing on the winning solutions will also be held in conjunction with the conference.
With over 2,000 members from leading research institutions, universities and business organizations in more than 80 countries, the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) is the premier forum for the advancement and adoption of KDD, data science, and data mining over Big Data. SIGKDD’s mission is to provide the growing community of big data and analytics experts with tools and resources to promote the value of knowledge discovery and data mining in today’s data-centric economy. The current SIGKDD Executive Council Chairman is leading data mining expert Usama Fayyad, Ph.D. For additional information, please visit http://www.kdd.org or follow us on Twitter (@kdd_news).
ACM, the Association for Computing Machinery (http://www.acm.org), is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for lifelong learning, career development, and professional networking.
Author: EMILIA PALAVEEVA