Keynote Speaker – (BAFI 2015) Second Conference on Business Analytics in Finance and Industry, Santiago – Chile, December 14-16, 2015

Presentation

The aim of the conference BAFI 2015 is to bring together researchers and developers from data science and related areas with practitioners and consultants applying the respective techniques in different business-related domains. This event provides a platform to stimulate an academic exchange of recent developments as well as to foster the mutual influence between academia and practitioners.

At BAFI 2015 we focus on methodological developments aimed at uncovering information contained in large databases, as well as on business applications in various sectors, such as finance, retail, and telecommunication, among others.

Program

Download the complete Conference Program

Download the Conference Proceedings

Keynote Talks

Ricardo Baeza-Yates “Big Data or Right Data”

Abstract: Big data is nowadays a fashionable topic, regardless of what people mean when they use the term. But being big is just a matter of volume, and there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data; rather, we should ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this will imply big data, but for the majority of problems much less data is needed. In this keynote we explore the trade-offs involved and the main problems that come with big data on the Web: scalability, redundancy, bias (and bias!), noise, spam, and privacy.

Emilio Carrizosa “Mathematical Optimization Tools for Data Visualization”

Abstract: Complex data call for the use of specialized techniques for data handling, data analysis and data visualization. Whereas data have routinely been represented via 2-D plots for decades, it remains a challenge to properly and meaningfully represent more complex data (e.g. dynamic data, or data with mixed types of attributes) and the relations between their features.

One key tool for visualization techniques is Mathematical Optimization. A wide range of algorithms, exact or heuristic, gradient-based or combinatorial in nature, are or can be used to properly address visualization problems. In this talk we will revisit some of these algorithmic ideas as applied to visualization problems, including the representation of dynamic data and feature visualization.
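
As one illustration of a gradient-based optimization model for visualization, the sketch below places points in 2-D so that their pairwise distances match a given distance matrix. It does this by minimizing the raw stress of metric multidimensional scaling. This classic technique was chosen here for illustration and is not claimed to be one of the methods covered in the talk. The function names, learning rate and step count are assumptions.

```python
import math
import random

# Illustrative sketch: metric MDS by plain gradient descent on raw stress.
# Function names, learning rate and step count are hypothetical choices.

def stress(X, D):
    """Raw stress: sum over pairs i < j of (embedded distance - target distance)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def mds_gradient_descent(D, dim=2, steps=500, lr=0.01, seed=0):
    """Return n points in `dim` dimensions whose pairwise distances approximate D."""
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(X[i], X[j]) or 1e-12  # guard against coincident points
                coef = 2 * (d - D[i][j]) / d        # derivative of (d - D_ij)^2 w.r.t. d, over d
                for k in range(dim):
                    grad[i][k] += coef * (X[i][k] - X[j][k])
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * grad[i][k]
    return X

# Usage: recover the layout of a unit square from its distance matrix alone.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(p, q) for q in square] for p in square]
X = mds_gradient_descent(D)
print(round(stress(X, D), 4))
```

The recovered layout is only defined up to rotation, reflection and translation, because stress depends on distances alone.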

Sven Crone “Business Analytics for TV Audiences – A Case Study on Combining Descriptive, Predictive and Prescriptive Analytics”

Abstract: Television impacts our daily lives. It provides news, entertainment and education. And, with minute-by-minute information on TV ratings and multi-billion-euro income streams from advertising, it is a real Big Data decision problem that requires sophisticated Analytics. TV viewer numbers must be predicted for each 30-second timeslot, normally days in advance, in order to schedule the right advertisement. However, viewership patterns vary by the demographic of the target audience, from young adults to elderly couples and from the affluent to the unemployed, and across geo-regions from the rainy North to the sunny South, with TV programme schedules and weather driving viewership patterns over time. Without analytical methods, complex decisions quickly become inefficient. We present a case study from a leading private UK TV channel where we employed analytics to support decision making.

Attendees will learn:
– How a time series approach helped to make sense of terabytes of data
– How to explore viewer behaviour using time series clustering in Descriptive Analytics
– How to forecast future viewers using k-nearest neighbours in Predictive Analytics
– How to optimise advertising scheduling across changing variance in Prescriptive Analytics
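
The k-nearest-neighbours idea in the predictive step can be sketched as "analog" forecasting on a single series. The method finds the past windows that look most like the most recent window and averages what came after them. This is a minimal sketch under assumed defaults, not the case-study implementation. The window length, k, horizon and the audience series are hypothetical.

```python
import math

# Illustrative sketch of k-nearest-neighbour ("analog") forecasting for one series.
# Window length, k and horizon are hypothetical defaults, not values from the case study.

def knn_forecast(series, window=3, k=2, horizon=1):
    """Predict the value `horizon` steps after the end of `series` by averaging what
    followed the k past windows most similar (Euclidean distance) to the latest window."""
    if len(series) < window + horizon + 1:
        raise ValueError("series too short for this window and horizon")
    query = series[-window:]
    candidates = []
    # Every candidate window must be followed by an observed value `horizon` steps later.
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        target = series[start + window + horizon - 1]
        candidates.append((math.dist(past, query), target))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Usage: hypothetical audience counts with a repeating 4-slot pattern.
audience = [0, 1, 2, 3] * 5
print(knn_forecast(audience), knn_forecast(audience, horizon=2))
```

In practice each audience segment and region would get its own series, and the window would encode features such as time of day. Those choices are beyond what the abstract states.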

Jonathan Crook “Intensity Models to Predict Credit Card Transitions”

Abstract: Currently, banks use credit risk models to predict the probability that a credit customer will default in a given window of time. This presentation will discuss a new and more comprehensive approach to modelling credit risk that gives predictions of the probability, for each customer, that he or she will transit from one state of delinquency to another between any two months in the life of the loan. The transitions include not only transitions into further delinquency but also transitions to lesser states of delinquency, that is, cure. These types of models give much more information than the standard credit scoring model and should enable banks to compute provisions and the amount of capital for credit risk more accurately than at present. The model includes macroeconomic variables and so enables the analyst to obtain predictions for different macroeconomic scenarios. The advantages of this type of model compared to other types of credit scoring models will be described, as will the methodology. Results of applying the method to a large dataset of credit card holders will be illustrated.
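
For contrast with the intensity models described above, the simplest way to estimate month-to-month delinquency transitions, including cures, is to count observed moves. The sketch below is a covariate-free Markov-chain baseline. It is not the speaker's model, which also conditions on customer and macroeconomic variables. The state labels and account histories are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative baseline only: a covariate-free Markov chain estimated by counting
# month-to-month moves. The intensity models in the talk also use customer and
# macroeconomic covariates. The state labels below are hypothetical.
STATES = ["current", "1_missed", "2_missed", "default"]

def transition_matrix(histories, states=STATES):
    """Estimate P(next month's state | this month's state) from monthly state paths."""
    counts = defaultdict(Counter)
    for path in histories:
        for now, nxt in zip(path, path[1:]):
            counts[now][nxt] += 1
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        # Rows with no observed exits (e.g. an absorbing default state) are left as zeros.
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in states}
    return matrix

# Usage: two hypothetical accounts; the first one cures after a missed payment.
histories = [
    ["current", "current", "1_missed", "current"],
    ["current", "1_missed", "2_missed", "default"],
]
P = transition_matrix(histories)
print(P["1_missed"])
```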

Matt Davison “Analytics when the Stakes are High: Uniting Several Business Analytics Techniques to Solve a Problem in Harbour Security”

Abstract: Business Analytics is a field with many threads, some rooted in machine learning, some in statistics, and yet others in Operational Research. In this talk, I present a successful application of several analytics threads to a problem in national security. How to clear a harbour that has been seeded with naval mines by an adversary? An underwater autonomous vehicle equipped with a sidescan sonar is able to identify possible mine targets. Once identified, targets may be visited by divers for further investigation and, if necessary, disarming. However, the sidescan imaging process returns many false positive contacts that in fact may simply be rocks. As diver resources are scarce and expensive, and as time to clear the harbour is a factor, it makes sense to revisit targets from a different angle to better classify targets at the UAV imaging stage.

How best to do this represents both an interesting problem in data-driven classifier theory (what is the optimal angle relative to the first one for a second or third look at a target) and an interesting travelling salesman problem in a non-traditional space obtained by adjoining 2D spatial coordinates with a view angle coordinate.

In this talk I present collaborative work between my team and the Royal Canadian Navy in which we use a unique dataset of sonar images of various real and simulated mine targets and rocks, together with operational characteristics of real UAV vehicles, to solve both problems.

I end the talk with a discussion of the need for increased emphasis on game theory in the business analytics toolbox, both for security problems such as this one and for more traditional business analytics problems such as credit scoring.
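
The routing half of the problem can be sketched as a tour over "looks" in the augmented space (x, y, view angle). The sketch below makes two assumptions. First, it measures travel cost as planar distance plus a hypothetical weight times the wrapped heading change. Second, it builds the tour with the greedy nearest-neighbour heuristic, a simple stand-in rather than the solution method used in the work described.

```python
import math

# Illustrative sketch: distance in the space (x, y, view angle) and a greedy tour.
# `angle_weight` (metres per degree of heading change) is a hypothetical parameter.

def view_distance(a, b, angle_weight=0.5):
    """Cost of moving from look a = (x, y, theta_deg) to look b."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    dtheta = abs(a[2] - b[2]) % 360
    dtheta = min(dtheta, 360 - dtheta)  # headings wrap around at 360 degrees
    return math.hypot(dx, dy) + angle_weight * dtheta

def nearest_neighbour_tour(points, start=0, angle_weight=0.5):
    """Greedy tour: always visit the closest unvisited look next (a heuristic, not optimal)."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: view_distance(here, points[j], angle_weight))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Usage: hypothetical looks; index 3 revisits the target at index 2 from 90 degrees.
looks = [(0, 0, 0), (10, 0, 0), (1, 0, 0), (1, 0, 90)]
print(nearest_neighbour_tour(looks))
```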

Pablo Estévez “Computational Intelligence Challenges in the Big Data Era: Application to Time Domain Astronomy”

Abstract: As we enter the Big Data Era, there are several challenges from the point of view of computational intelligence and machine learning. Among the promising techniques are generalized correlation (correntropy), semi-supervised learning, active learning, deep learning, and new visualization algorithms. In this talk I will review some of these challenges using time domain astronomy as an example, a field facing a paradigm shift caused by the exponential growth in sample size, data complexity and data generation rates of new astronomical sky surveys. In the next decade, Chile will host 70% of the global astronomical observation capacity, due to new facilities currently under construction such as the Large Synoptic Survey Telescope (LSST). The LSST will begin operations in northern Chile in 2022 and will generate an imaging dataset of nearly 150 PetaBytes of the southern hemisphere sky. The LSST will stream data at rates of 2 TeraBytes per hour, effectively capturing an unprecedented movie of the sky. The new data-oriented paradigms for astronomy combine statistics, machine learning and computational intelligence in order to provide the automated and robust methods needed for the rapid detection and classification of known astrophysical objects, as well as the unsupervised characterization of novel phenomena. Two examples will be given: a new method for finding periodicities in light curves and a pipeline for detecting supernovae in real time.
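
To make "finding periodicities in light curves" concrete, the sketch below applies a classical technique: a simplified phase dispersion minimization. The curve is folded at each trial period, and the period with the smallest within-bin variance relative to the total variance is kept. This is a swapped-in illustration and not the new (correntropy-based) method presented in the talk. The bin count, trial grid and synthetic light curve are assumptions.

```python
import math
import random

# Illustrative sketch: simplified phase dispersion minimization (PDM).
# Bin count, trial periods and the synthetic light curve are hypothetical.

def pdm_theta(times, mags, period, n_bins=10):
    """Pooled within-bin variance of the folded curve divided by the total variance.
    Values near 0 mean the curve folds tightly at this trial period."""
    mean = sum(mags) / len(mags)
    total_var = sum((m - mean) ** 2 for m in mags) / (len(mags) - 1)
    bins = [[] for _ in range(n_bins)]
    for t, m in zip(times, mags):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(m)
    num, dof = 0.0, 0
    for b in bins:
        if len(b) > 1:
            bm = sum(b) / len(b)
            num += sum((m - bm) ** 2 for m in b)
            dof += len(b) - 1
    return (num / dof) / total_var

def best_period(times, mags, trial_periods, n_bins=10):
    """Return the trial period with the smallest dispersion statistic."""
    return min(trial_periods, key=lambda p: pdm_theta(times, mags, p, n_bins))

# Usage: an irregularly sampled sinusoid with a true period of 2.5 time units.
rng = random.Random(1)
times = sorted(rng.uniform(0, 100) for _ in range(300))
mags = [math.sin(2 * math.pi * t / 2.5) for t in times]
trials = [1.0 + 0.01 * i for i in range(300)]
print(best_period(times, mags, trials))
```

Real survey light curves add noise, gaps and aliases, which is part of the motivation for the more robust methods discussed in the talk.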

Usama Fayyad “From BigData to Data Science: Challenges & Opportunities in the Rapidly Changing Data Landscape”

Abstract: With a fundamental change in the assumptions underpinning a structured data world dominated by relational databases, we are entering the age of BigData. With the combination of economic drivers in enterprise computing, the need to leverage semi-structured and unstructured Data, and the emergence of the Internet of Things (IoT), a dramatic shift in the Data landscape is taking place. The advent of Hadoop and the Open Source stack in this space has accelerated the changes to the point of confusion. Today’s data analyst faces a bewildering environment of technologies and challenges involving semi-structured and unstructured data, with access methodologies that bear almost no relation to the past. This talk will cover issues and challenges in making the benefits of advanced analytics fit within the application environment. The requirement for real-time data streaming and in situ data mining is stronger than ever. We demonstrate that many of the critical problems remain open, with much opportunity for innovative solutions to play a huge enabling role. This opportunity makes Data Science and several related fields critical to almost all future analytical tasks. The talk will use 3 real case studies to demonstrate and discuss the challenges and the great opportunities for BigData and Data Science.

Alfred Inselberg “Visualization and Data Mining for High Dimensional Datasets”  

Abstract: A dataset with M items has 2^M subsets, any one of which may be the one satisfying our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion, but also extract insights from the visual patterns. These are the core reasons for data visualization.

With parallel coordinates, the search for relations in multivariate datasets is transformed into a 2-D pattern recognition problem. Guidelines and strategies for knowledge discovery are illustrated on several real datasets (financial, process control, credit-score, and one with hundreds of variables) with stunning results. A geometric classification algorithm with low computational complexity provides the classification rule explicitly and visually. The minimal set of variables (features) required to state the rules is found and ordered by predictive value. Multivariate relations can be modeled as hypersurfaces and used for decision support. A model of a (real) country’s economy reveals sensitivities, the impact of constraints, trade-offs, and economic sectors unknowingly competing for the same resources. A smart display for Intensive Care Units determines the patient’s state from the interaction of many variables. An overview of the methodology provides a foundational understanding: learning the patterns that correspond to various multivariate relations. These patterns are robust in the presence of errors, which is good news for applications. A topology of proximity emerges, opening the way for visualization in Big Data.
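
A small example of how a relation becomes a 2-D pattern in parallel coordinates is the well-known point-line duality. Place axis X1 at horizontal position 0 and axis X2 at position d. Every point on the line x2 = m·x1 + b (with m ≠ 1) then maps to a segment between the two axes, and all these segments cross at the single point (d/(1−m), b/(1−m)). The sketch below encodes that mapping. The function names are hypothetical, and only the two-axis case is shown.

```python
# Illustrative sketch of the parallel-coordinates point-line duality (two adjacent axes).
# Function names are hypothetical.

def to_polyline(point, spacing=1.0):
    """Map an N-dimensional point to its parallel-coordinates polyline:
    vertex i lies on vertical axis i, at horizontal position i * spacing."""
    return [(i * spacing, value) for i, value in enumerate(point)]

def line_to_dual_point(m, b, d=1.0):
    """All points on x2 = m * x1 + b (m != 1) give segments between the axes at 0 and d
    that pass through this single point."""
    if m == 1:
        raise ValueError("m == 1: the segments are parallel (dual point at infinity)")
    return (d / (1 - m), b / (1 - m))

def segment_height(p, q, x):
    """Vertical coordinate of the segment p-q at horizontal position x."""
    (x0, y0), (x1, y1) = p, q
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Usage: points on x2 = -x1 + 2 all cross at (0.5, 1.0).
print(line_to_dual_point(-1, 2))
print([segment_height(*to_polyline((a, -a + 2)), 0.5) for a in (0, 1, 3, -2)])
```

With 0 < m < 1, the dual point lies outside the strip between the two axes, which is how the slope of the relation can be read from the pattern.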

Witold Pedrycz “Data Analytics with Information Granules”

Abstract: The apparent challenges in data analytics are inherently associated with large volumes of data, data variability, and a quest for transparency and interpretability of the obtained results. We advocate that information granules play a pivotal role in addressing these key challenges. We demonstrate that the framework of Granular Computing, along with the diversity of its formal settings, offers a badly needed conceptual and algorithmic setting instrumental for data analytics.

We elaborate on selected ways in which information granules and their processing help in coping with an abundance of data. A suitable perspective built with the aid of information granules is advantageous in realizing a suitable level of abstraction and in forming sound, problem-oriented trade-offs among the precision of results, the ease of their interpretation, their value and their stability. All these aspects emphasize the importance of the actionability and interestingness of the produced findings.

We discuss ways of forming information granules on the basis of abundant data. We show how efficient granular mechanisms facilitate the inclusion of domain knowledge and make the results of the ensuing data analytics user-centric. We advocate the development of information granules of higher type and higher order, and present their unique role in realizing a hierarchy of processing and in coping with the distributed nature of available data.

The facet of variability of data is addressed effectively by invoking the mechanisms of transfer learning applied to the adjustment of information granules.
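
One well-known way of forming an information granule from numeric data is the principle of justifiable granularity. An interval is grown from the median so as to balance coverage (how many data it contains) against specificity (how narrow it is). The sketch below uses one common formulation: maximize coverage × specificity separately on each side, with a linear specificity. The exact coverage and specificity functions vary across the literature, so this is an assumption, and the talk may use different constructions.

```python
import statistics

# Illustrative sketch of the principle of justifiable granularity for 1-D data.
# The linear specificity 1 - |c - median| / side_range is one assumed choice.

def justifiable_interval(data):
    """Build an interval granule [lower, upper] around the median by maximising
    coverage * specificity separately on each side."""
    xs = sorted(data)
    med = statistics.median(xs)
    lo, hi = xs[0], xs[-1]

    def best_bound(candidates, side_range, coverage):
        best, best_v = med, -1.0
        for c in candidates:
            spec = 1 - abs(c - med) / side_range if side_range > 0 else 1.0
            v = coverage(c) * spec
            if v > best_v:
                best, best_v = c, v
        return best

    upper = best_bound([x for x in xs if x >= med], hi - med,
                       lambda c: sum(1 for x in xs if med <= x <= c))
    lower = best_bound([x for x in xs if x <= med], med - lo,
                       lambda c: sum(1 for x in xs if c <= x <= med))
    return lower, upper

# Usage: the outlier 100 is left outside the granule.
print(justifiable_interval([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```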

Bernhard Schölkopf “Toward Causal Machine Learning”

Abstract: In machine learning, we use data to automatically find dependences in the world, with the goal of predicting future observations. Most machine learning methods build on statistics, but one can also try to go beyond this, assaying causal structures underlying statistical dependences. Can such causal knowledge help prediction in machine learning tasks? We argue that this is indeed the case, due to the fact that causal models are more robust to changes that occur in real world datasets. We touch upon the implications of causal models for machine learning tasks such as domain adaptation, transfer learning, and semi-supervised learning.

We also present an application to the removal of systematic errors for the purpose of exoplanet detection. Machine learning currently focuses mainly on relatively well-studied statistical methods. Some causal problems are conceptually harder; however, the causal point of view can provide additional insights that have substantial potential for data analysis.
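
The abstract does not describe the method used for removing systematic errors. One causal idea that applies to this setting is often called half-sibling regression. Other stars observed by the same instrument share its systematic errors but cannot contain the target star's astrophysical signal. Whatever part of the target's light curve can be predicted from them is therefore attributed to systematics and subtracted. The sketch below is a minimal version under that assumption. It uses a single predictor and ordinary least squares, and its data are hypothetical; a practical version would regress on many stars with a more flexible model.

```python
import math

# Illustrative sketch of half-sibling regression with a single predictor (assumption:
# `siblings` is a light curve of other stars sharing the instrument systematics).

def half_sibling_correct(target, siblings):
    """Subtract from `target` its least-squares prediction from `siblings`."""
    n = len(target)
    mx = sum(siblings) / n
    my = sum(target) / n
    sxx = sum((x - mx) ** 2 for x in siblings)
    sxy = sum((x - mx) * (y - my) for x, y in zip(siblings, target))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(siblings, target)]

# Usage: hypothetical shared drift plus a transit-like dip in the target only.
systematics = [math.sin(0.3 * t) for t in range(200)]
transit = [-1.0 if 100 <= t < 110 else 0.0 for t in range(200)]
target = [2.0 * s + d for s, d in zip(systematics, transit)]
cleaned = half_sibling_correct(target, systematics)
print(round(min(cleaned), 2))
```

The dip survives the correction because it is not predictable from the other stars, while the shared drift is removed.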

Committee

Organizing Committee

– Sebastián Maldonado, Universidad de Los Andes, Chile (Chair)

– Cristian Bravo, Universidad de Talca, Chile

– Richard Weber, Universidad de Chile, Chile

– Karla Jaramillo, ISCI, Chile

– Julio Casanova, Grupo ISCI Data Science

Program Committee

– Richard Weber, Universidad de Chile, Chile (Chair)

– Galina Andreeva, University of Edinburgh, UK

– Roberto Battiti, University of Trento, Italy

– Bart Baesens, KU Leuven, Belgium

– Cristián Bravo, Universidad de Talca, Chile

– Emilio Carrizosa, Universidad de Sevilla, Spain

– Sven Crone, Lancaster University, UK

– Matt Davison, Western University, Canada

– David Díaz, Universidad de Chile, Chile

– Fernando Gomide, Universidade Estadual de Campinas, Brazil

– Rudolf Kruse, Otto-von-Guericke Universität Magdeburg, Germany

– Fazel Famili, University of Ottawa, Canada

– Cristian Figueroa, Sales Manager SAS and External Professor Universidad de Chile, Chile

– Jose Guajardo, University of California Berkeley, USA

– Hisao Ishibuchi, Osaka University, Japan

– Sebastián Maldonado, Universidad de Los Andes, Chile

– Christophe Mues, University of Southampton, UK

– Witold Pedrycz, University of Alberta, Canada

– Karim Pichara, Pontificia Universidad Católica de Chile, Chile

– Bárbara Poblete, Universidad de Chile, Chile

– Alejandro Rodríguez, Universidad de Talca, Chile

– Vania Sena, University of Essex, UK

– Alex Seret, Universidad de Los Andes, Chile

– Wouter Verbeke, Vrije Universiteit Brussel, Belgium

– Graham Williams, ATO, Australia

BRIEF BIOGRAPHICAL SUMMARY
USAMA M. FAYYAD

Usama M. Fayyad, Ph.D. is Group Chief Data Officer at Barclays in London where his responsibilities include building and delivering the data infrastructure for BI, data warehousing, BigData and analytics/insights technologies across the Barclays Group globally as well as data governance, and enterprise Data Architecture. He also took on an additional role at Barclays as CIO of Risk, Finance, and Treasury Technology.
He is Chairman of Oasis500 in Jordan, following his appointment in 2010 by King Abdullah II of Jordan as its founding Executive Chairman. Oasis500 is a tech startup investment fund that runs an accelerator, an entrepreneurship training program, and an angel investment network, aiming to fund 500 Internet and technology startups in the MENA Region. From 2011 to 2013 he served as Chairman & CTO of BlueKangaroo, a mobile search engine that helps consumers benefit from a vast environment of offers that is otherwise difficult to search.
Up until September 2008, Fayyad was based in Sunnyvale, CA as Yahoo!’s Chief Data Officer & Executive VP responsible for Yahoo!’s global data strategy, architecting Yahoo!’s data policies and systems, prioritizing data investments, and managing the Company’s data analytics and data processing infrastructure, which processed over 25 Terabytes of data per day. He was the industry’s first Chief Data Officer. In his EVP role, Fayyad also founded and managed the Yahoo! Research Labs organization, with offices around the world, to develop the new sciences of the Internet, on-line marketing, Microeconomics, and algorithmic Advertising. At Yahoo! he applied Big Data techniques to content and advertising targeting and built the world’s largest group of data scientists, helping Yahoo! grow its revenues from user targeting 20-fold in 4 years. After Yahoo! and prior to Barclays he founded Open Insights, LLC, a data strategy, technology and consulting firm based in Bellevue, WA, to help the largest Telcos and Tech companies understand data strategy and deploy data-driven solutions that effectively and dramatically grow revenue and competitive advantage. Open Insights also worked with major private equity and investment firms to help prioritize and analyze investment opportunities.
In 2003 Fayyad co-founded and led the DMX Group, a data mining and data strategy consulting and technology company specializing in major BigData Analytics projects with Fortune 500 clients. DMX Group was acquired by Yahoo! in 2004. In early 2000, he co-founded and served as CEO of Audience Science (digiMine, Inc.), a venture-backed company addressing hosted business analytics and leading the market in targeted advertising.
From 1995 to 2000, Fayyad was at Microsoft in Redmond, WA where he led the data mining and exploration group at Microsoft Research and headed the data mining products group for Microsoft’s server division. From 1989 to 1996 Fayyad held a leadership role at NASA’s Jet Propulsion Laboratory (JPL), where his work in the analysis and exploration of Big Data in scientific applications gathered from observatories, remote-sensing platforms and spacecraft garnered him the top research excellence award that Caltech awards to JPL scientists, as well as a U.S. Government medal from NASA.
Fayyad earned his Ph.D. in engineering from the University of Michigan, Ann Arbor (1991), and also holds BSEs in both electrical and computer engineering (1984), an MSE in computer science and engineering (1986), and an M.Sc. in mathematics (1989). He has published over 100 technical articles in the fields of data mining, Artificial Intelligence, machine learning, and databases. He holds over 30 patents, is a Fellow of the AAAI (Association for the Advancement of Artificial Intelligence) and a Fellow of the ACM (Association for Computing Machinery), has edited two influential books on data mining, and launched and served as editor-in-chief of both the primary scientific journal in the field of data mining (Data Mining and Knowledge Discovery) and the primary newsletter in the technical community published by the ACM: SIGKDD Explorations.
He continues to be active in the academic community, serving as Chairman Emeritus and Director of ACM’s SIGKDD Executive Committee, which runs the world’s premier data science, big data, and data mining conferences: the annual KDD international conferences. He is a recipient of the ACM SIGKDD Innovation Award (2007) and Service Award (2003), the only person to receive both awards.
Fayyad serves on the advisory boards and boards of directors of several private, public, not-for-profit, and academic organizations. He regularly delivers keynotes on BigData, Data Mining, Predictive Analytics, Data Strategy, and Entrepreneurship at many international conferences and forums.
