Mod 3
• Social Media: People add more than 500 TB of new data to social media
applications such as Facebook and Instagram every single day in the
form of videos, photos, messages, and comments.
• Sometimes you need to go into the field and design a data collection
process yourself, but most of the time you won’t be involved in this step.
Many companies will have already collected and stored the data for you,
and what they don’t have can often be bought from third parties.
• Data can be stored in many forms, ranging from simple text files to tables
in a database. The objective now is to acquire all the data you need.
• Finding and getting access to data needed in your project.
• This data is either found within the company or retrieved from a
third party.
• Project Charter states which data you need and where you can
find it.
• Data takes many forms ranging from text files, Excel
spreadsheets to different types of databases.
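As a minimal sketch of retrieving data from two of the forms mentioned above, the Python below reads records from a text (CSV) source and from a database table into the same list-of-dicts shape. The sample data, the `customers` table, and the function names are hypothetical and chosen only for illustration; an in-memory SQLite database stands in for a company database.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a company text file.
CSV_TEXT = "customer_id,city\n1,Pune\n2,Mumbai\n"


def rows_from_text(text):
    """Parse CSV text into a list of dicts, one per record.

    Values read from text are always strings.
    """
    return list(csv.DictReader(io.StringIO(text)))


def build_demo_database():
    """Create an in-memory SQLite database with a small customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(3, "Delhi"), (4, "Chennai")],
    )
    return conn


def rows_from_database(conn):
    """Read the customers table into a list of dicts, keeping column types."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(
        "SELECT customer_id, city FROM customers ORDER BY customer_id"
    )
    return [dict(row) for row in cursor]


text_rows = rows_from_text(CSV_TEXT)
db_rows = rows_from_database(build_demo_database())
```

Note that the text source yields every value as a string, while the database keeps the integer type of `customer_id`, so the two sources usually need to be harmonized before they are combined.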
1. Start with data stored within the company –
• This data can be stored in official data repositories such as databases,
data marts, data warehouses, and data lakes.
• A data mart is a subset of the data warehouse and geared toward
serving a specific business unit.
• While data warehouses and data marts are home to preprocessed
data, data lakes contain data in its natural or raw format.
• Finding data even within your own company can sometimes be a
challenge.
• Getting access to data is another difficult task. Organizations
understand the value and sensitivity of data and often have policies in
place so everyone has access to what they need and nothing more.
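The idea that a data mart is a subset of the data warehouse serving one business unit can be sketched with a database view. This is a simplification under stated assumptions: real data marts are often separate physical stores, and the table name `warehouse_sales`, the view name `marketing_mart`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every business unit.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, business_unit TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "marketing", 120.0),
        ("South", "finance", 300.0),
        ("East", "marketing", 80.0),
    ],
)

# The "data mart" here is a view exposing only the marketing unit's rows.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales "
    "WHERE business_unit = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
```

Querying the view returns only the marketing rows, which also illustrates the access principle above: a business unit sees what it needs and nothing more.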
2. Don’t be afraid to shop around
• If data isn’t available inside your organization, look outside your organization’s walls. Many
companies specialize in collecting valuable information. Other companies provide data so that you, in
turn, can enrich their services and ecosystem. Such is the case with Twitter, LinkedIn, and Facebook.
Open data site                  Description
Data.gov.in                     The home of the Indian Government’s open data
https://open-data.europa.eu/    The home of the European Commission’s open data
Freebase.org                    An open database that retrieves its information
                                from sites like Wikipedia, MusicBrainz, and the
                                SEC archive
Data.worldbank.org              Open data initiative from the World Bank
Aiddata.org                     Open data for international development
Open.fda.gov                    Open data from the US Food and Drug
                                Administration
1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring
significant cost advantages when it comes to storing large amounts of data – plus they
can identify more efficient ways of doing business.
2. Faster, better decision making. With the speed of Hadoop and in-memory analytics,
combined with the ability to analyze new sources of data, businesses are able to
analyze information immediately – and make decisions based on what they’ve learned.
3. New products and services. With the ability to gauge customer needs and satisfaction
through analytics comes the power to give customers what they want.
• Walmart handles more than one million customer transactions every
hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years; it can now
be done in one week.
Why Big Data
Companies use big data in their systems to improve operations,
provide better customer service, create personalized marketing
campaigns and take other actions that, ultimately, can increase
revenue and profits.
Several factors have contributed to the current interest in Big Data.
• New technologies such as Hadoop, cloud computing, machine learning, IoT, and artificial
intelligence make it possible to access a tremendous amount of data and extract value from it.
• There is now more data, and hardware is faster and less expensive.
• Companies are using big data analytics to improve sales revenue, increase profits
and give a better service to customers.
3 V’s of Big Data
• Exponential increase in collected/generated data
• Ads featuring products and services we might actually want and use to better our lives.
The predictive analytics marketing used in the advertising industry has bridged the
communication gap between advertisers and consumers.
• Personalized and targeted ads - These ads are based on the massive amounts of
personal data we constantly provide about what we’re doing, saying, liking, and sharing.
• Hyper-localized advertising - Ads reach the right people at the right time through the
right channels.
• Using Big Data to Optimize Advertising
The big promise of big data to advertising is improved accuracy of communication.
• Big Data and Branding - Branding campaigns frequently aim to improve brand image or
recognition. Sociodemographic attributes such as age and gender determine the relevant
segments. Advertising is delivered only to its target group, which significantly reduces wastage.
• Predict customer interest.
• Expanded Customer Acquisition & Retention
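The segment-based delivery described above (age and gender decide who sees an ad) can be sketched as a simple filter. The `User` record, the age bounds, and the sample users below are hypothetical; a real advertising platform would work from far richer behavioral data.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    age: int
    gender: str


def in_target_segment(user, min_age, max_age, gender):
    """Return True when the user falls inside the campaign's segment."""
    return min_age <= user.age <= max_age and user.gender == gender


def select_audience(users, min_age, max_age, gender):
    """Return the ids of users who should be shown the ad."""
    return [
        user.user_id
        for user in users
        if in_target_segment(user, min_age, max_age, gender)
    ]


# Hypothetical user base.
users = [
    User(1, 22, "F"),
    User(2, 41, "M"),
    User(3, 29, "F"),
    User(4, 35, "F"),
]

# Campaign aimed at women aged 18 to 30.
audience = select_audience(users, 18, 30, "F")
```

Only users 1 and 3 match the segment, so the ad is not spent on the other two; this is the reduction in wastage the bullet refers to.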
BIG DATA TECHNOLOGY
• Hadoop Parallel World
• Hadoop Distributed File System (HDFS)
• MapReduce
• Old vs. New Approaches
• Data Discovery
• Open Source Technology for Big Data Analytics
• The cloud and Big Data
• Predictive Analytics
• Software as a Service BI
• Mobile Business Intelligence
• Crowdsourcing Analytics
• Inter and Trans Firewall Analytics
Hadoop
• Hadoop is an Apache project and a distributed computing platform.
• Hadoop is an open-source platform for storage and processing of diverse
data types that enables data-driven enterprises to rapidly derive value
from all their data.
[Figure labels: Cloud Applications, MapReduce, A Cluster of Machines]
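To give a feel for the MapReduce processing model that Hadoop provides, here is a minimal single-machine sketch of a word count in plain Python. It is not the Hadoop API: the function names are illustrative assumptions, and on a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the final count per word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data big value", "data lakes store raw data"])
```

Because each document is mapped independently and each word is reduced independently, both phases can be spread across a cluster of machines, which is the property that lets the model scale to large data sets.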
History (2002-2004)