WE ARE THRILLED TO WELCOME USAMA FAYYAD AS OUR FIRST KEYNOTE FOR RAPIDMINER WORLD 2014.
The opening keynote will feature a presentation by Usama Fayyad, a pioneer in predictive analytics. Fayyad’s address will provide insight into the evolving market of advanced analytics and how companies will need to prepare for the expansion and complexity of big data in the years ahead. Fayyad is currently chief data officer and group managing director at Barclays Bank and chairman at Oasis500. Previously, Fayyad served as chairman and CTO at ChoozOn Corporation (Blue Kangaroo), founded Open Insights LLC and was Yahoo’s chief data officer. He co-founded the DMX Group and Audience Science (digiMine, Inc.) and led data mining and exploration at Microsoft Research. He also worked at NASA.
The conference will also offer an industry analyst session led by Shawn Rogers of Enterprise Management Associates, who will provide the latest research and opinions about the analytics market. Following the session, panelists will continue to discuss the trends and technologies that are shaping the landscape of predictive analytics.
Over the course of the four-day conference, participants will be able to attend presentations designed to showcase how current researchers and practitioners are using or extending RapidMiner for scientific and commercial use. Use cases will include the application of RapidMiner for cross-selling and direct marketing, processing of big data for analysis, predictive maintenance, quality prediction and more.
Technical paper presentations will include sessions on text classification, data mining and visualization techniques, similarity assessment, sentiment analysis in social media and process mining workflows. Other activities taking place during RapidMiner World include a pre-conference workshop, partner showcase, a session outlining the RapidMiner product roadmap, a Radoop-specific workshop and product certification sessions. Together, these events provide attendees with the opportunity to “test drive” solutions, meet face-to-face with technical experts and exchange ideas with other organizations, including RapidMiner partners.
The evenings will feature social events, including a visit to Harpoon Brewery and a tour of the city via Boston Duck Tours.
RapidMiner World will take place August 18-21, 2014 at District Hall, 75 Northern Avenue in Boston, Mass. Until July 6, 2014, RapidMiner is offering a special registration rate of $775. To register and for more information, visit: https://rapidminer.com/rapidminer-world/.
RM World 2014: Big data vs. classic data
Transcript
- RapidMiner World Keynote talk – Copyright Usama Fayyad © 2014 BigData Vs. “Classic” Data: Predictive Analytics in a Changing Data Landscape Usama Fayyad, Ph.D. Chief Data Officer – Barclays Twitter: @usamaf August 19, 2014 RapidMiner World Boston, MA
- Outline • Big Data all around us • The CDO role • Some of the issues in BigData • Introduction to Data Mining and Predictive Analytics Over BigData • Case studies • Summary and conclusions
- What Matters in the Age of Analytics? 1. Being able to exploit all the data that is available • not just what you’ve got available • what you can acquire and use to enhance your actions 2. Proliferating analytics throughout the organization • make every part of your business smarter • Actions and not just insights 3. Driving significant business value • embedding analytics into every area of your business can significantly drive top line revenues and/or bottom line cost efficiencies
- Why Big Data? A new term, with associated “Data Scientist” positions: • Big Data is a mix of structured, semi-structured, and unstructured data: – Typically breaks barriers for traditional RDB storage – Typically breaks limits of indexing by “rows” – Typically requires intensive preprocessing before each query to extract “some structure” – usually using MapReduce-type operations • The above leads to “messy” situations with no standard recipes or architecture: hence the need for “data scientists” – conduct “Data Expeditions” – Discovery and learning on the spot
- The 4V’s of “Big Data” • Big Data is Characterized by the 3V’s: – Volume: larger than “normal” – challenging to load/process • Expensive to do ETL • Expensive to figure out how to index and retrieve • Multiple dimensions that are “key” – Velocity: Rate of arrival poses real-time constraints on what are typically “batch ETL” operations • If you fall behind, catching up is extremely expensive (replicate very expensive systems) • Must keep up with rate and service queries on-the-fly – Variety: Mix of data types and varying degrees of structure • Non-standard schema • Lots of BLOBs and CLOBs • DB queries don’t know what to do with semi-structured and unstructured data.
- “Classic” Data: e.g. Yahoo! User DNA • Male, age 32 • Lives in SF • Lawyer • Searched on from London last week • Searched on: “Italian restaurant Palo Alto” • Checks Yahoo! Mail daily via PC & Phone • Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people • Searched on: “Hillary Clinton” • Clicked on Sony Plasma TV SS ad • Spends 10 hours/week on the internet • Purchased Da Vinci Code from Amazon • Diagram labels: Registration, Campaign, Behavior, Unknown
- How Data Explodes: really big • Male, age 32 • Lives in SF • Lawyer • Searched on from London last week • Searched on: “Italian restaurant Palo Alto” • Checks Yahoo! Mail daily via PC & Phone • Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people • Searched on: “Hillary Clinton” • Clicked on Sony Plasma TV SS ad • Spends 10 hours/week on the internet • Purchased Da Vinci Code from Amazon • Social Graph (FB) • Likes & friends’ likes • Professional network reputation • Web searches on this person, hobbies, work, location • MetaData on everything • Blogs, publications, news, local papers, job info, accidents
- The Distinction between “Classic Data” and “Big Data” is fast disappearing • Most real data sets nowadays come with a serious mix of semi-structured and unstructured components: – Images – Video – Text descriptions and news, blogs, etc… – User and customer commentary – Reactions on social media: e.g. Twitter is a mix of data anyway • Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data
- Text Data: The Big Driver • We speak of “big data” and the “Variety” in 3V’s • Reality: biggest driver of growth of Big Data has been text data – Most work on analysis of “images” and “video” data has really been reduced to analysis of surrounding text – Nowhere more so than on the internet • MapReduce popularized by Google to address the problem of processing large amounts of text data: – Many operations with each being a simple operation but done at large scale – Indexing a full copy of the web – Frequent re-indexing
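As a rough illustration of the "many simple operations done at large scale" point, the following single-process Python sketch mimics the map, shuffle, and reduce phases to build a tiny inverted index. The function names and sample documents are hypothetical, and this is not how Google or Hadoop implement it; a real deployment would distribute each phase across a cluster.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per word in a document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, doc_ids):
    """Collapse the postings for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))

def build_index(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} mapping."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

# Made-up sample documents for illustration only.
docs = {1: "big data text", 2: "text data on the web"}
index = build_index(docs)
```

Each phase is trivial on its own; the cost comes from running it over a full copy of the web and repeating it at every re-index.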
- Big Data Applications and Uses: IT Log & Security Forensics & Analytics Automated Device Data Analytics Failure Analysis Proactive Fixes Product Planning Advertising Analytics Segmentation Recommendation Social Media Big Data Warehouse Analytics Cost Reduction Ad Hoc Insight Predictive Analytics Hadoop + MPP + EDW Find New Signal Predict Events 100% Capture
- A few words on: The Chief Data Officer Why are companies creating this position?
- Why a Chief Data Officer? • There is a fundamental realisation that Data needs to become a primary value driver at organizations • We have lots of Data • We spend much on it: in technology and people • We are not realising the value we expect from it • A strong business need to create the CDO role: • Traditional companies are not following, but adopting the model that actually works in other data-intensive industries • CDO has a seat at executive table: the voice of Data • Data done right is an essential element to unify large enterprises to unlock value from business synergies
- What about traditional IT? • Does your IT department provide you with the data you need? • Or is it just another stumbling block to get at Data? • Does your IT department understand what you need for analytics? • Or is it just about ETL, SQL databases, and restricted access? • Does your IT department have people who understand data modeling? Data Warehousing? BI? • Or is it just about capturing transactions into standard normalized databases? • Is BI more than just a tool for MI reporting?
- Fundamental Data Principles to Support Analytics Usama’s Obvious Data Axioms
- Data Axioms 1. Data gains value exponentially when integrated and coalesced. – When fragmented: dramatic value loss takes place; – increased costs; – reduced utility/integrity; – and increased security risks
- Data Axioms 2. Fusing Data together from disparate/independent sources is difficult to achieve and impossible to maintain Hence only viable approach is: • Intercepting and documenting at the source • fusing at the source • controlling lifecycle and flow
- Data Axioms 3. Standardisation is essential • for sustained ability to integrate data sources and hence growing value; • for simplifying downstream systems and apps • For enforcing discipline as a firm increases its data sources
- Data Axioms 4. Data governance and policy must be centralised • needs to be enforced strongly else we slip into chaos and a Babylon of terms/languages • An Enterprise Data Architecture spanning structured and unstructured data
- Data Axioms 5. Recency Matters: data streaming in modelling and scoring • Often, accuracy of prediction drops quickly with time (e.g. consumer shopping) • Value of alerts drops exponentially with time… • Ability to trigger responses based on real-time scoring is critical • Streaming, real-time model updates, real-time scoring
- Data Axioms 6. Data Infrastructure Needs: • rapid renewal & modernization: the pace of change and development of technology are very rapid – Design for migration and infrastructure replacement via abstraction layers that remove tech dependencies • Encryption and Masking: Persisting unencrypted confidential and secret data (even within secure firewalls) is an invitation for problems and risks
- Data Axioms 7. Data is a primary competency and not a side activity supporting other processes • Hence specialized skills and know-how are a must • Generalists will create a hopeless mess • Data is difficult: modelling, architecture, and design to support analytics
- Reality Check: Brand/Reputation online What are people saying about my brand on Social Media?
- Reality Check Surely there are companies I can work with that can help me make this practical?
- Reality Check So what do technology people worry about these days?
- To Hadoop or not to Hadoop? When to use techniques requiring MapReduce and grid computing? • Typically organizations try to use MapReduce for everything to do with Big Data – This is actually very inefficient and often irrational – Certain operations require specialized storage • Updating segment memberships over large numbers of users • Defining new segments on user or usage data
- Drivers of Hadoop in Large Enterprises: Cost of Storage • Fastest growing demand is more storage • Data in Data Warehouses have traditionally required expensive storage technology: – $100K per terabyte per year: cost of Teradata storage – $2.5K per terabyte per year: cost of Hadoop on commodity storage (much lower)
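To make the storage-cost gap concrete, here is a small arithmetic sketch using the two per-terabyte figures quoted on the slide, reading both as annual costs. The 500 TB warehouse size is a made-up example, not a number from the talk.

```python
# Per-terabyte annual costs as quoted on the slide.
EDW_COST_PER_TB_YEAR = 100_000     # traditional warehouse storage
HADOOP_COST_PER_TB_YEAR = 2_500    # Hadoop on commodity storage

def annual_storage_cost(terabytes, cost_per_tb_year):
    """Total yearly storage cost in dollars for a given data volume."""
    return terabytes * cost_per_tb_year

# How many times cheaper the commodity option is per terabyte.
cost_ratio = EDW_COST_PER_TB_YEAR / HADOOP_COST_PER_TB_YEAR

# Hypothetical 500 TB warehouse, for illustration only.
edw_cost = annual_storage_cost(500, EDW_COST_PER_TB_YEAR)
hadoop_cost = annual_storage_cost(500, HADOOP_COST_PER_TB_YEAR)
```

On these figures the per-terabyte cost differs by a factor of 40, which is why storage alone can justify moving data onto Hadoop.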
- Hadoop Use Cases by Data Type • ERP Financial Data 1% • Supply Chain Data 2% • Sensor Data 2% • Financial Trading Data 4% • CRM Data 4% • Science Data 7% • Advertising Data 10% • Social Data 11% • Text and Language Data 16% • IT Log Data 19% • Content and Preference Data 24%
- Analysis & Programming Software PIG HIPI
- Reality: If Storage is Biggest Driver of Hadoop Adoption; What is the next biggest? ETL • Replaces expensive licenses • Much higher performance with lower infrastructure costs (processors, memory) • Flexibility in changing schema and representation • Flexibility on taking on unstructured and semi-structured data • Plus suite of really cool tools…
- Turning the three Vs of Big Data into Value • Understand context and content – What are appropriate actions? – Is it OK to associate my brand with this content? – Is content sad, happy, serious, informative? • Understand community sentiment – What is the emotion? – Is it negative or positive? – What is the health of my brand online? • Understand customer intent – What is each individual trying to achieve? – Can we predict what to do next? – Critical in cross-sell, personalization, monetization, advertising, etc…
- Many Business Uses of Predictive Analytics (analytic technique: uses in business) • Marketing and sales: identify potential customers; establish the effectiveness of a campaign • Understanding customer behavior: model churn, affinities, propensities, … • Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, etc. • Fraud detection: identify fraudulent transactions • Credit scoring: establish credit worthiness of a customer requesting a loan • Manufacturing process analysis: identify the causes of manufacturing problems • Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks • Healthcare application: fraud detection, cost optimization, detection of events like epidemics, etc… • Insurance: fraudulent claim detection, risk assessment • Security and surveillance: intrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc…
- Case Studies: 1. Context Analysis (unstructured data) 2. Yahoo! predictive modeling
- Understanding Context
- Reality Check So who is the company we think is best at handling BigData?
- Biggest BigData in Advertising? Understanding Context for Ads
- The Display Ads Challenge Today What Ad would you place here?
- The Display Ads Challenge Today Damaging to Brand?
- The Display Ads Challenge Today What Ad would you place here?
- The Display Ads Challenge Today Irrelevant and Damaging to Brand Completely Irrelevant
- NetSeer: Intent for Display • Currently Processing 4 Billion Impressions per Day
- Problem: Hard to Understand User Intent Contextual Ad served by Google What NetSeer Sees:
- Case Studies: 1. Context Analysis (unstructured data) 2. Yahoo! Predictive Modeling
- Yahoo! – One of Largest Destinations on the Web • 80% of the U.S. Internet population uses Yahoo! – Over 600 million users per month globally! • Global network of content, commerce, media, search and access products • 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc. • 25+ terabytes of data collected each day – Representing 1000’s of cataloged consumer behaviors • More people visited Yahoo! in the past month than: – Use coupons – Vote – Recycle – Exercise regularly – Have children living at home – Wear sunscreen regularly • Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers • Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
- Yahoo! Big Data – A league of its own… • Grand challenge problems of data processing: travel, credit card processing, stock exchange, retail, internet • Y! Data Challenge exceeds others by 2 orders of magnitude • Terabytes of Warehoused Data: Amazon 25; Korea Telecom 49; AT&T 94; Y!LiveStor 100; Y!Panama Warehouse 500; Walmart 1,000; Y!Main warehouse 5,000 • Millions of Events Processed Per Day: SABRE 50; VISA 120; NYSE 225; YSM 2,000; Y! Global 14,000
- Behavioral Targeting (BT) Search Ad Clicks Content Search Clicks BT Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
- Yahoo! User DNA • Male, age 32 • Lives in SF • Lawyer • Searched on from London last week • Searched on: “Italian restaurant Palo Alto” • Checks Yahoo! Mail daily via PC & Phone • Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people • Searched on: “Hillary Clinton” • Clicked on Sony Plasma TV SS ad • Spends 10 hours/week on the internet • Purchased Da Vinci Code from Amazon • Diagram labels: Registration, Campaign, Behavior, Unknown • On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics
- How it works | Network + Interests + Modelling • Analyze predictive patterns for purchase cycles in over 100 product categories • In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click) • Score each user for fit with every category…daily • Target ads to users who get highest ‘relevance’ scores in the targeting categories • Panel titles: Varying Product Purchase Cycles; Match Users to the Models; Rewarding Good Behaviour; Identify Most Relevant Users
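The last two steps above (score every user against every category, then target the highest scorers) can be sketched in a few lines of Python. This is only a sketch under assumed inputs: the `scores` structure, the user IDs, and the function name are hypothetical, and the relevance scores are taken as already computed by some per-category model.

```python
def top_users_per_category(scores, k):
    """Return the k highest-scoring users for each category.

    scores: {user_id: {category: relevance_score}}
    result: {category: [user_id, ...]} ordered from most to least relevant.
    """
    by_category = {}
    for user, category_scores in scores.items():
        for category, score in category_scores.items():
            by_category.setdefault(category, []).append((score, user))
    return {
        category: [
            user
            # Sort by descending score, breaking ties by user id.
            for _, user in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]
        ]
        for category, pairs in by_category.items()
    }

# Made-up daily scores for three users.
scores = {
    "u1": {"autos": 0.9, "travel": 0.1},
    "u2": {"autos": 0.4, "travel": 0.8},
    "u3": {"autos": 0.7},
}
targets = top_users_per_category(scores, k=2)
```

Because the slide describes daily scoring, a job like this would be re-run each day over refreshed scores rather than computed once.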
- Recency Matters, So Does Intensity Active now… …and with feeling
- Differentiation | Category specific modelling • Example 1: Category Automotive • Example 2: Category Travel/Last Minute • Different models allow us to weight and determine intensity and recency • Alt Behaviour 1 (shown for both examples): 5 pages, 2 search keywords, 1 search click, 1 ad click • Plots: intensity score over time, with an Intense Click Zone marked
- Differentiation | Category specific modelling • Example 1: Category Automotive • Different models allow us to weight and determine intensity and recency • Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click • user is in the Intense Click Zone • with no further activity, decay takes effect • Plot: intensity score over time, with an Intense Click Zone marked
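One way to picture category-specific intensity and recency is an event-weighted score with exponential decay whose half-life differs by category. The sketch below is an assumption for illustration, not Yahoo!'s actual model: the event weights, half-lives, and function name are all hypothetical.

```python
import math

# Hypothetical half-lives in days: long purchase cycles decay slowly,
# last-minute travel intent decays quickly.
HALF_LIFE_DAYS = {"automotive": 30.0, "travel_last_minute": 2.0}

# Hypothetical weight per event type (stronger signals weigh more).
EVENT_WEIGHTS = {
    "page_view": 1.0,
    "search_keyword": 2.0,
    "search_click": 3.0,
    "ad_click": 5.0,
}

def intent_score(events, category, now_day):
    """Sum event weights, each decayed by the time since the event.

    events: list of (event_type, day) tuples; days are numeric timestamps.
    """
    rate = math.log(2) / HALF_LIFE_DAYS[category]
    return sum(
        EVENT_WEIGHTS[event_type] * math.exp(-rate * (now_day - day))
        for event_type, day in events
    )

# The slide's example behaviour: 5 pages, 2 search keywords,
# 1 search click, 1 ad click, all assumed to occur on day 0.
events = (
    [("page_view", 0)] * 5
    + [("search_keyword", 0)] * 2
    + [("search_click", 0), ("ad_click", 0)]
)
auto_score = intent_score(events, "automotive", now_day=2)
travel_score = intent_score(events, "travel_last_minute", now_day=2)
```

With no further activity, the same behaviour keeps most of its score in the slow-decaying category and loses half of it after two days in the fast-decaying one, which is the effect the two plots contrast.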
- Automobile Purchase Intender Example • A test ad campaign with a major Euro automobile manufacturer • Designed a test that served the same ad creative to test and control groups on Yahoo • Success metric: performing specific actions on Jaguar website • Test results: 900% conversion lift vs. control group – Purchase Intenders were 9 times more likely to configure a vehicle, request a price quote or locate a dealer than consumers in the control group • ~3x higher click through rates vs. control group
- Mortgage Intender Example (Example: Mortgages) • We found: 1,900,000 people looking for mortgage loans • +122% CTR Lift • +626% Conv Lift • Example search terms qualified for this target: Mortgages; Home Loans; Refinancing; Ditech • Example Yahoo! Pages visited: Financing section in Real Estate; Mortgage Loans area in Finance; Real Estate section in Yellow Pages • Results from a client campaign on Yahoo! Network. Date: March 2006 • Source: Campaign Click thru Rate lift is determined by Yahoo! Internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer.
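The lift figures in the two intender examples compare a targeted group against a control group. A minimal sketch of that arithmetic follows, assuming lift is the relative increase of the test rate over the control rate; the function names and the sample counts are invented for illustration and are not campaign data.

```python
def rate(events, impressions):
    """Events per impression, e.g. clicks (CTR) or qualified leads (conversion)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return events / impressions

def relative_lift(test_rate, control_rate):
    """Relative lift as a fraction: 1.22 means +122% over control."""
    if control_rate <= 0:
        raise ValueError("control rate must be positive")
    return (test_rate - control_rate) / control_rate

# Made-up counts chosen so the lift works out to +122%.
control_ctr = rate(events=50, impressions=100_000)
test_ctr = rate(events=111, impressions=100_000)
ctr_lift = relative_lift(test_ctr, control_ctr)
```

Note that under this definition a test rate that is 9 times the control rate is a lift of +800%, so it is worth checking whether a reported "lift" is the ratio itself or the increase over control.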
- Experience summary at Yahoo! • Dealing with one of the largest data sources (25 terabytes per day) • Behavioral Targeting business was grown from $20M to > $400M in 3 years of investment! • Yahoo! Specific? BigData critical to operations – Ad targeting creates huge value – Right teams to build technology (3 years of recruiting) – Search is a BigData problem (but this has moved to mainstream)
- Lessons Learned • A lot more data than qualified talent – Finding talent in BigData is very difficult – Retaining talent in BigData is even harder • At Yahoo! we created a central group that drove huge value to the company • Data people need to feel like they have critical mass – Makes it easier to attract the right people – Makes it easier to retain • Drive data efforts by business need, not by technology priorities • Chief Data Officer role at Yahoo! – now popular
- What About RapidMiner And Big Data? What does this all mean to RapidMiner and its use in enterprise analytics?
- RapidMiner’s Strengths • Open Source Community & Marketplace – Crowd sourced innovation, quality assurance, market awareness. • Fully integrated Platform – Integrated, process-based business analytics platform with focus on predictive analytics. • No Programming Required – Easy to use, low maintenance costs, standard platform for business analysts. • Advanced Analytics at Every Scale – In memory, in database and in Hadoop analytics offer best option for every size of database. • Connectivity – More than 60 connectors (incl. SAP & Hadoop), allowing easy access to structured and unstructured data.
- 30,000+ Downloads per Month CONFIDENTIAL SELECT LIST OF RECIPIENT ORGANIZATIONS Government & Defense Pharma & Healthcare Consulting Oil & Gas, Chemicals Financial Services Software & Analytics Retail Manufacturing Business Services Consumer Products Aerospace Technology Entertainment Academia
- Leader in Advanced Analytics Source: Gartner Magic Quadrant for Advanced Analytics Platforms (February 2014) Full report at: http://www.rapidminer.com/gartner 2014
- Select Customer Stories • PayPal – Who: world leading online payment services provider – Solution: Customer feedback and voice of the customer analysis, churn prediction and prevention, text mining and sentiment analysis • SmartSoft – Who: provider of solutions for preventing fraud, money laundering, and risks in financial institutions – Solution: Integration of Rapid-I’s predictive analytics engine into their solutions for fraud detection and fraud prevention for the financial and telecom sectors
- SO WHAT IS THE BIG DATA STORY?
- So the data is naturally moving to Hadoop… Situation: – The data is moving to Hadoop, driven by cost (storage) and convenience (ETL) – How do we get the value of predictive analytics to the data? Rather than move the data out, move the analytics to the data! – Can we minimize the need for data movement? – Data copies can become a management nightmare – Analytics in a “Business As Usual” manner requires convenience
- Radoop – RapidMiner on Hadoop Opportunity: – Avoid expensive data movement – Leverage convenient data transformation – Thousands of data connectors, many over semi-structured and unstructured data Why is this big news? – Leverages a naturally occurring wave – Analytics over a richer variety requires much more processing – The energy placed on data extraction and loading moves to energy applied on actual analysis and modelling
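The "move the analytics to the data" idea can be illustrated at small scale with an in-database scoring query: the model's arithmetic runs where the rows live, and only the scores come back. The sketch below uses Python's built-in `sqlite3` as a stand-in for the main data store. It does not show how Radoop works; the table, columns, and linear-model coefficients are all hypothetical.

```python
import sqlite3

# Hypothetical linear-model coefficients; a real model would be trained elsewhere.
W_VISITS, W_SPEND, BIAS = 0.05, 0.002, -0.5

def score_in_database(conn):
    """Compute the model score inside the database; return only (id, score) rows."""
    query = (
        "SELECT id, ? * visits + ? * spend + ? AS score "
        "FROM customers ORDER BY id"
    )
    return conn.execute(query, (W_VISITS, W_SPEND, BIAS)).fetchall()

# In-memory stand-in for the main data store, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, visits REAL, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 10, 200.0), (2, 1, 20.0)],
)
scored = score_in_database(conn)
```

The raw feature columns never leave the store, which avoids the extract-and-copy step the slides warn about; the same principle applies when the store is a Hadoop cluster rather than a single database file.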
- BigData Analytics & Threats Old data analysis regimes are breaking down – They cannot accommodate all the new data sources from online and mobile – Any data set can be enriched with semi-structured and unstructured data: • News articles • User commentary and feedback • Contextual awareness: semantics of content, semantics of location – Data driven marketing is key to innovation/platform success
- Big Picture on Big Data Analytics Observations and Concluding Remarks
- Retaining New Yahoo! Mail Registrants Often, Simple is Very Powerful!
- Integrating Mail and News • Data showed that users often check their mail and news in the same session – But no easy way to navigate to Y! News from Y! Mail • Mail users who also visit Y! News are 3X more active on Yahoo – Higher retention, repeat visits and time spent on Yahoo
- “In the news” Module on Mail Welcome Page Increased retention on Mail for light users by 40%! – Est. Incremental revenue of $16m a year on Y! Mail alone
- Nordstrom: Queries with No Matches • Julie Bornstein, Web Marketing Director – What are my customers looking for and not finding? – June 2002: queries for “belly button rings” returned no matches in store – Why the sudden interest?
- Nordstrom: Queries with No Matches • Print Ad Campaign • Models happen to be sporting a navel ring • Nordstrom does not sell navel rings • What to do???
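The Nordstrom question ("what are my customers looking for and not finding?") comes down to counting search queries that returned zero results. A minimal sketch follows, assuming a search log of (query, result count) pairs; the log format, function name, and sample entries are hypothetical.

```python
from collections import Counter

def top_unmatched_queries(search_log, n):
    """Return the n most frequent queries that returned no results.

    search_log: iterable of (query_text, result_count) pairs.
    Queries are normalized by trimming whitespace and lowercasing.
    """
    counts = Counter(
        query.strip().lower()
        for query, result_count in search_log
        if result_count == 0
    )
    return counts.most_common(n)

# Made-up log entries for illustration.
search_log = [
    ("belly button rings", 0),
    ("Belly Button Rings ", 0),
    ("boots", 12),
    ("navel ring", 0),
]
unmatched = top_unmatched_queries(search_log, n=1)
```

A simple report like this is in keeping with the talk's point that simple analyses are often very powerful: a sudden spike in one unmatched query is a signal worth investigating.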
- Concluding Remarks
- The early days of mass auto production
- Today’s Auto: It just works! No need to understand what happens when you turn on ignition Very complex inside, but all simplicity on the outside
- “Gotcha”s in Big Data • Discounting the need for near real time data processing and analytics – Evils of “batch” thinking • No grid story – go build separate infrastructure to learn – Expensive – Bad price performance as utilization insufficient • Data Mining & Predictive analytics teams expect data in their own stores – Good luck using the analytic tools on your own DW store – Build a “replica” or “extract” DW • Scoring and integration of results is a whole separate story
- Data Platform Considerations • Eliminate the evils of data extracts/movement out of the main data store • Do you need MapReduce over a grid? – Can you set up an instant-on grid (Hadoop) as needed for the MapReduce tasks/demand without moving data? • How do you integrate with analytic functions? – scoring, data transformations – plug in 3rd party analytics scoring apps – Advanced dashboard and scorecard support • Built-in integration with statistical packages? – SAS, R, data mining algorithms, advanced analytics algorithm libraries • Can you avoid data movement and additional copies?
- BigData Analytics for Organizations • Key to competitive Intelligence: – Understand context – Understand intent • Key to understanding consumer trends through social media analysis – Brand issues – Trend issues – Anticipating the next shift
- Threats & Opportunities • Data world is changing, especially in online businesses • Major shifts from relational DB to NoSQL, document oriented stores • Connecting new world to “old” world? – Convenience of execution – integration with data platforms – Appropriateness of algorithms to BigData – Unstructured data algorithms: • Text, Semi-structured and Unstructured data • Entity extraction a must • Appropriate theory and probability distributions (power laws, fat tails) • Sparse Data – Model management and proper aging of models – Getting to basics so we can decide what models to use: • Understanding noise and distributions • Data tours
- Usama Fayyad usama@openinsights.com usama_fayyad@yahoo.com Twitter – @Usamaf +12065295123 www.Oasis500.com www.openinsights.com Thank You! & Questions
Introduction PDF
Big Data vs Classical Data PDF