The aim of the conference BAFI 2015 is to bring together researchers and developers from data science and related areas with practitioners and consultants applying the respective techniques in different business-related domains. This event provides a platform to stimulate an academic exchange of recent developments as well as to foster the mutual influence between academia and practitioners.
At BAFI 2015 we focus on methodological developments aimed at uncovering information contained in large databases, as well as on business applications in various sectors, such as finance, retail, and telecommunication, among others.
Abstract: Complex data call for the use of specialized techniques for data handling, data analysis and data visualization. Whereas traditional representations via 2-D plots of data have been routinely produced for decades, it remains a challenge to properly and meaningfully represent more complex data (e.g. dynamic, or with mixed types of attributes) and the relations between their features.
One key tool for visualization techniques is Mathematical Optimization. A wealth of algorithms, exact or heuristic, gradient-based or combinatorial in nature, can be used to properly address visualization problems. In this talk we will revisit some of these algorithmic ideas as applied to visualization problems, including the representation of dynamic data and feature visualization.
Abstract: Television impacts our daily lives. It provides news, entertainment and education. And, with minute-by-minute information on TV ratings and multi-billion-euro income streams from advertising, it is a real Big Data decision problem that requires sophisticated Analytics. TV viewers must be predicted for each 30-second time slot, normally days in advance, in order to schedule the right advertisement. However, viewership patterns vary by demographic of target audiences, from young adults to elderly couples and the affluent to the unemployed, across geo-regions from rainy North to sunny South, with TV programme schedules and weather driving viewership patterns over time. Without analytical methods, complex decisions quickly become inefficient. We present a case study from a leading private UK TV channel where we employed analytics to support decision making.
Attendees will learn:
– How a time series approach helped to make sense of terabytes of data
– How to explore viewer behaviour using time series clustering in Descriptive Analytics
– How to forecast future viewers using k-nearest neighbours in Predictive Analytics
– How to optimise advertising scheduling across changing variance in Prescriptive Analytics
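As a rough illustration of the k-nearest-neighbours forecasting idea mentioned above, the following minimal Python sketch predicts the next value of a series from the successors of its most similar past windows. It is not the method used in the case study: the function name `knn_forecast`, the Euclidean distance, the window length and the viewer counts are all assumptions made for illustration.

```python
from math import sqrt


def knn_forecast(series, window, k):
    """Forecast the next value as the mean successor of the k past
    windows that are closest (Euclidean distance) to the latest window."""
    if len(series) <= window:
        raise ValueError("series must be longer than window")
    query = series[-window:]
    candidates = []
    # Consider every past window that has a known successor value.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(past, query)))
        candidates.append((dist, series[start + window]))
    # Keep the k closest windows and average what followed them.
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Hypothetical viewer counts (in thousands) with a repeating pattern.
viewers = [10, 20, 30, 20] * 5
forecast = knn_forecast(viewers, window=4, k=3)
print(forecast)
```

In practice the window length and k would be tuned on held-out data, and the distance could account for calendar or weather effects.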
Abstract: Currently banks use credit risk models to predict the probability that a credit customer will default in a given window of time. This presentation will discuss a new and more comprehensive approach to modelling credit risk that gives predictions of the probability, for each customer, that he/she will transit from one state of delinquency to another between any two months in the life of the loan. The transitions include not only transitions into further delinquency but also transitions to lesser states of delinquency, that is, cures. These types of models give much more information than the standard credit scoring model and should enable banks to compute provisions and the amount of capital for credit risk more accurately than at present. The model includes macroeconomic variables and so enables the analyst to obtain predictions for different macroeconomic scenarios. The advantages of using this type of model compared to other types of credit scoring models will be described, as will the methodology. Results of applying the method to a large dataset relating to credit card holders will be illustrated.
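To make the notion of transitions between delinquency states concrete, the sketch below estimates an empirical month-to-month transition matrix from observed state sequences. This is a deliberate simplification and not the model of the talk, which gives customer-level probabilities and includes macroeconomic variables; the state labels, the toy histories and the function name `transition_matrix` are hypothetical.

```python
from collections import Counter


def transition_matrix(histories, states):
    """Estimate P[s][t], the share of observed months in state s
    that were followed by state t in the next month."""
    counts = Counter()
    for history in histories:
        # Pair each month with the following month.
        for current, nxt in zip(history, history[1:]):
            counts[(current, nxt)] += 1
    matrix = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        matrix[s] = {
            t: (counts[(s, t)] / total if total else 0.0) for t in states
        }
    return matrix


# Hypothetical states: up to date, 30 and 60 days past due.
states = ["current", "30dpd", "60dpd"]
# Hypothetical monthly state histories for two accounts.
histories = [
    ["current", "current", "30dpd", "current"],
    ["current", "30dpd", "60dpd", "30dpd"],
]
P = transition_matrix(histories, states)
print(P["30dpd"]["current"])  # share of 30dpd months followed by a cure
```

Entries below the diagonal of such a matrix (for example `P["30dpd"]["current"]`) correspond to cures, while entries above it correspond to moves into deeper delinquency.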
Abstract: Business Analytics is a field with many threads, some rooted in machine learning, some in statistics, and yet others in Operational Research. In this talk, I present a successful application of several analytics threads to a problem in national security. How to clear a harbour that has been seeded with naval mines by an adversary? An underwater autonomous vehicle equipped with a sidescan sonar is able to identify possible mine targets. Once identified, targets may be visited by divers for further investigation and, if necessary, disarming. However, the sidescan imaging process returns many false positive contacts that in fact may simply be rocks. As diver resources are scarce and expensive, and as time to clear the harbour is a factor, it makes sense to revisit targets from a different angle to better classify targets at the UAV imaging stage.
Pablo Estévez Computational Intelligence Challenges in the Big Data Era: Application to Time Domain Astronomy
Abstract: As we enter the Big Data Era, there are several challenges from the point of view of computational intelligence/machine learning. Among the promising techniques are generalized correlation (correntropy), semi-supervised learning, active learning, deep learning, and new visualization algorithms. In this talk I will review some of these challenges using as an example time domain astronomy, which is facing a paradigm shift caused by the exponential growth of the sample size, data complexity and data generation rates of new astronomical sky surveys. In the next decade, Chile will have 70% of the global astronomical observation capacity, due to new facilities currently under construction such as the Large Synoptic Survey Telescope (LSST). The LSST will begin operations in northern Chile in 2022, and will generate a nearly 150 PetaByte imaging dataset of the southern hemisphere sky. The LSST will stream data at rates of 2 TeraBytes per hour, effectively capturing an unprecedented movie of the sky. The new data-oriented paradigms for astronomy combine statistics, machine learning and computational intelligence, in order to provide the automated and robust methods needed for the rapid detection and classification of known astrophysical objects as well as the unsupervised characterization of novel phenomena. Two examples will be given: a new method for finding periodicities in light curves and a pipeline for detecting supernovae in real time.
Usama Fayyad “From BigData to Data Science: Challenges & Opportunities in the Rapidly Changing Data Landscape”
Abstract: With a fundamental change in the assumptions underpinning a structured data world dominated by relational databases, we are entering the age of BigData. With the combination of economic drivers in enterprise computing, the need to leverage semi-structured and unstructured Data, and the emergence of the Internet of Things (IoT), a dramatic shift in the Data landscape is taking place. The advent of Hadoop and the Open Source stack in this space has accelerated the changes to a point of confusion. Today’s data analyst faces a bewildering environment of technologies and challenges involving semi-structured and unstructured data with access methodologies that have almost no relation to the past. This talk will cover issues and challenges in how to make the benefits of advanced analytics fit within the application environment. The requirement for real-time data streaming and in situ data mining is stronger than ever. We demonstrate how many of the critical problems remain open with much opportunity for innovative solutions to play a huge enabling role. This opportunity makes Data Science and several related fields critical to almost all future analytical tasks. The talk will use 3 real case studies to demonstrate and discuss the challenges and the great opportunities for BigData and Data Science.
Abstract: A dataset with M items has 2^M subsets, any one of which may be the one satisfying our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion, but also extract insights from the visual patterns. These are the core reasons for data visualization.
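The 2^M count can be checked directly for a small dataset. The short Python sketch below enumerates every subset of a hypothetical four-item dataset; the item labels and the helper name `all_subsets` are assumptions for illustration only.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of items, from the empty set to the full set."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


items = ["a", "b", "c", "d"]  # hypothetical dataset with M = 4 items
subsets = list(all_subsets(items))
print(len(subsets), 2 ** len(items))
```

Each additional item doubles the count, which is why exhaustive search becomes impractical even for modest M.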
Witold Pedrycz “Data Analytics with Information Granules”
Abstract: The apparent challenges in data analytics are inherently associated with large volumes of data, data variability, and a quest for transparency and interpretability of obtained results. We advocate that information granules play a pivotal role in addressing these key challenges. We demonstrate that a framework of Granular Computing, along with a diversity of its formal settings, offers a badly needed conceptual and algorithmic setting instrumental for data analytics.
We elaborate on selected ways in which information granules and their processing help in coping with an abundance of data. A suitable perspective built with the aid of information granules is advantageous in realizing a suitable level of abstraction and forming sound, problem-oriented tradeoffs among precision of results, ease of their interpretation, value of the results and their stability. All those aspects emphasize the importance of actionability and interestingness of the produced findings.
We discuss ways of forming information granules on the basis of abundant data. We show how efficient granular mechanisms facilitate the inclusion of domain knowledge and make the results of the ensuing data analytics user-centric. The development of information granules of higher type and higher order is advocated, and their unique role in realizing a hierarchy of processing and coping with the distributed nature of available data is presented.
The facet of variability of data is addressed effectively by invoking the mechanisms of transfer learning applied to the adjustment of information granules.
Bernhard Schölkopf “Toward Causal Machine Learning”
Abstract: In machine learning, we use data to automatically find dependences in the world, with the goal of predicting future observations. Most machine learning methods build on statistics, but one can also try to go beyond this, assaying causal structures underlying statistical dependences. Can such causal knowledge help prediction in machine learning tasks? We argue that this is indeed the case, due to the fact that causal models are more robust to changes that occur in real world datasets. We touch upon the implications of causal models for machine learning tasks such as domain adaptation, transfer learning, and semi-supervised learning.
We also present an application to the removal of systematic errors for the purpose of exoplanet detection. Machine learning currently mainly focuses on relatively well-studied statistical methods. Some of the causal problems are conceptually harder; however, the causal point of view can provide additional insights that have substantial potential for data analysis.
– Sebastián Maldonado, Universidad de Los Andes, Chile (Chair)
– Cristian Bravo, Universidad de Talca, Chile
– Richard Weber, Universidad de Chile, Chile
– Karla Jaramillo, ISCI, Chile
– Julio Casanova, Grupo ISCI Data Science
– Richard Weber, Universidad de Chile, Chile (Chair)
– Galina Andreeva, University of Edinburgh, UK
– Roberto Battiti, University of Trento, Italy
– Bart Baesens, KU Leuven, Belgium
– Cristián Bravo, Universidad de Talca, Chile
– Emilio Carrizosa, Universidad de Sevilla, Spain
– Sven Crone, Lancaster University, UK
– Matt Davison, Western University, Canada
– David Díaz, Universidad de Chile, Chile
– Fernando Gomide, Universidade Estadual de Campinas, Brazil
– Rudolf Kruse, Otto-von-Guericke Universität Magdeburg, Germany
– Fazel Famili, University of Ottawa, Canada
– Cristian Figueroa, Sales Manager SAS and External Professor Universidad de Chile, Chile
– Jose Guajardo, University of California Berkeley, USA
– Hisao Ishibuchi, Osaka University, Japan
– Sebastián Maldonado, Universidad de Los Andes, Chile
– Christophe Mues, University of Southampton, UK
– Witold Pedrycz, University of Alberta, Canada
– Karim Pichara, Pontificia Universidad Católica de Chile, Chile
– Bárbara Poblete, Universidad de Chile, Chile
– Alejandro Rodríguez, Universidad de Talca, Chile
– Vania Sena, University of Essex, UK
– Alex Seret, Universidad de Los Andes, Chile
– Wouter Verbeke, Vrije Universiteit Brussel, Belgium
– Graham Williams, ATO, Australia
BRIEF BIOGRAPHICAL SUMMARY
USAMA M. FAYYAD
Usama M. Fayyad, Ph.D. is Group Chief Data Officer at Barclays in London where his responsibilities include building and delivering the data infrastructure for BI, data warehousing, BigData and analytics/insights technologies across the Barclays Group globally as well as data governance, and enterprise Data Architecture. He also took on an additional role at Barclays as CIO of Risk, Finance, and Treasury Technology.
He is Chairman of Oasis500 in Jordan following his appointment in 2010 by King Abdullah II of Jordan to be the founding Executive Chairman. Oasis500 is a tech startup investment fund that runs an accelerator, entrepreneurship training program, and angel investment network aiming to fund 500 Internet and Technology startups in the MENA Region. From 2011 to 2013 he served as Chairman & CTO of BlueKangaroo, a mobile search engine that helps consumers navigate a vast environment of offers that is otherwise difficult to search and benefit from.
Up until September 2008, Fayyad was based in Sunnyvale, CA as Yahoo!’s chief data officer & Executive VP responsible for Yahoo!’s global data strategy, architecting Yahoo!’s data policies and systems, prioritizing data investments, and managing the Company’s data analytics and data processing infrastructure, which processed over 25 Terabytes of data per day. He was the industry’s first Chief Data Officer. Under his EVP role, Fayyad also founded and managed the Yahoo! Research Labs organization with offices around the world to develop the new sciences of the Internet, on-line marketing, Microeconomics, and algorithmic Advertising. At Yahoo! he applied Big Data techniques to content and advertising targeting and built the world’s largest group of data scientists, helping Yahoo! grow its revenues from user targeting by 20 times in 4 years. After Yahoo! and prior to Barclays he founded Open Insights, LLC, a data strategy, technology and consulting firm based in Bellevue, WA to help the largest Telcos and Tech companies understand data strategy and deploy data-driven solutions that effectively and dramatically grow revenue and competitive advantages. Open Insights also worked with major private equity and investment firms to help prioritize and analyze investment opportunities.
In 2003 Fayyad co-founded and led the DMX Group, a data mining and data strategy consulting and technology company specializing in major BigData analytics projects with Fortune 500 clients. DMX Group was acquired by Yahoo! in 2004. In early 2000, he co-founded and served as CEO of Audience Science (digiMine, Inc.), a venture-backed company addressing hosted business analytics and leading the market in targeted advertising.
From 1995 to 2000, Fayyad was at Microsoft in Redmond, WA where he led the data mining and exploration group at Microsoft Research and headed the data mining products group for Microsoft’s server division. From 1989 to 1996 Fayyad held a leadership role at NASA’s Jet Propulsion Laboratory (JPL), where his work in the analysis and exploration of Big Data in scientific applications gathered from observatories, remote-sensing platforms and spacecraft garnered him the top research excellence award that Caltech awards to JPL scientists, as well as a U.S. Government medal from NASA.
Fayyad earned his Ph.D. in engineering from the University of Michigan, Ann Arbor (1991), and also holds BSEs in both electrical and computer engineering (1984), an MSE in computer science and engineering (1986), and an M.Sc. in mathematics (1989). He has published over 100 technical articles in the fields of data mining, Artificial Intelligence, machine learning, and databases. He holds over 30 patents, is a Fellow of the AAAI (Association for the Advancement of Artificial Intelligence) and a Fellow of the ACM (Association for Computing Machinery), has edited two influential books on data mining, and launched and served as editor-in-chief of both the primary scientific journal in the field of data mining (Data Mining and Knowledge Discovery) and the primary newsletter in the technical community published by the ACM: SIGKDD Explorations.
He continues to be active in the academic community, serving as Chairman Emeritus and Director of ACM’s SIGKDD Executive Committee, which runs the world’s premier data science, big data, and data mining conferences: the KDD international annual conferences. He is a recipient of the ACM SIGKDD Innovation Award (2007) and Service Award (2003) – the only person to receive both awards.
Fayyad serves on the advisory boards and boards of directors of several private, public, as well as not-for-profit and academic organizations. He regularly delivers keynotes on BigData, Data Mining, Predictive Analytics, Data Strategy, and Entrepreneurship at many international conferences and forums.