Big Data Visualizer Course Notes

This document presents an introduction to Big Data analysis. It explains that a Big Data visualizer can develop applications to obtain, clean and process data from multiple sources in order to generate graphs that provide valuable information to improve processes, reduce costs and find opportunities. In addition, it describes the basic principles of Big Data such as volume, variety and velocity of data, as well as the purposes of Big Data analysis.

Big Data Visualizer

Level 01

Introduction

On completing this course, you will be able to identify the principles needed to
leverage data characterized by large volume, variety and growth.

As a Big Data visualizer, you can develop applications that obtain, clean and process
all types of data from various sources. The purpose is to generate and present graphs
that give a view of how an organization behaves. This view is highly valued, since it
can guide the direction of an entire company with the objective of improving
processes, minimizing costs and finding growth opportunities.

 Recording and collection of data from various sources
 Filtering, enrichment and classification of data
 Data analysis, modeling and prediction
 Data delivery and visualization

Lesson 01

What is Big Data and why is it important?

Big Data refers to the use of an immense amount of data. It requires the ability to
obtain, store, manipulate and analyze millions of records, which would be impossible
with conventional analysis tools (data capture, relational databases, pivot tables).

Big Data Characteristics and Application Fields

Main features

 Volume
 Variety (structured and unstructured data)
 Velocity

Other features

 Veracity
 Value

Purposes of Big Data

 Improve operations
 Complex decision making
 Cost reduction
 Time reduction
 Deployment of personalized offers
 Business intelligence

Lesson 02

The Big Data model

Ecosystem

Information Technology Services

They are divided into two parts:

 Data management and storage. It is made up of three areas: data sources
(structured and unstructured), the core of Big Data, and operational data
management.

 Analysis and applications. It is made up of Big Data analytics and users.

Workflow

 Data collection
 Big Data
 Big Analytics
 Users

Big Data Components

Lifecycle

 Recording and collection of data from various sources
 Filtering, enrichment and classification of data
 Data analysis, modeling and prediction
 Data delivery and visualization

Big Data Infrastructure

Clusters. A cluster is a group of computers connected to each other; each computer is
known as a node.
Advantages:

 Parallel work
 High performance
 High workload support
 Scalability

The software used in clusters is based on open-source platforms and includes:

 A distributed file system
 A core that maps and reduces data
 A set of libraries

Level 02

Lesson 01

What is a JSON and how is it used?

JSON (JavaScript Object Notation) is a simple data exchange format. It began as a way
for technologies such as JavaScript to communicate with Python, PHP, .NET and others,
and it has become popular because it is lightweight and easy to use.
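
As a minimal sketch of how JSON is used, the following Python snippet serializes a record to JSON text and parses it back with the standard json module. The record and its field names are made up for illustration.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"user": "ana", "likes": 12, "tags": ["data", "viz"]}

# Serialize the Python dictionary to a JSON string.
text = json.dumps(record)

# Parse the JSON string back into a Python dictionary.
parsed = json.loads(text)

print(text)
print(parsed["tags"][0])
```

The same text could be produced or read by JavaScript, PHP or any other language with a JSON library, which is what makes the format useful for exchange.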

What is an API and what is it for?

An API (Application Programming Interface) is a set of instructions that tells the
system how to satisfy a user's request and return a response. Thanks to this tool,
information can travel from one place to another, and different devices or
applications can connect to each other, allowing purchases, reservations or
publications.

API Types

Select. Returns the record of an object.
List. Returns a list of records related to a specific object.
Update. Makes modifications to the database, such as creating or deleting records.
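
The three kinds of operation can be sketched in Python against an in-memory dictionary. This is only an illustration: the function names, the records and the convention of passing None to delete are assumptions, not part of any real social network API.

```python
# In-memory stand-in for the data behind an API (hypothetical records).
records = {1: {"id": 1, "name": "Ana"}, 2: {"id": 2, "name": "Luis"}}

def select(record_id):
    """Select: return the record of one object, or None if it is missing."""
    return records.get(record_id)

def list_records():
    """List: return every record related to the object type."""
    return list(records.values())

def update(record_id, fields=None):
    """Update: create or modify a record, or delete it when fields is None."""
    if fields is None:
        records.pop(record_id, None)
    else:
        records[record_id] = {"id": record_id, **fields}

update(3, {"name": "Eva"})   # create a record
update(1)                    # delete a record
print(select(3))
print(list_records())
```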

Preparation for data extraction on social networks

The API of a social network is the main method for serving and sending data from the
platform.

Social media data acquisition

Before starting data acquisition, install and import the required libraries in Python
(the social network library and the Requests library), then declare the following
variables:

import facebook
import requests

token = "EAAC."                    # access token (truncated in the source)
graph = facebook.GraphAPI(token)   # client for the Graph API
quantityComments = 100             # number of comments to collect
pageId = 13254565767               # ID of the page to read
likesCount = 0
commentList = []
flag = False
# Retrieve the posts in the page feed
comments = graph.get_connections(pageId, 'feed')

Extracting audio and video files

Lesson 02

Types of databases

NoSQL (Not only SQL) or non-relational databases:

 Key-Value Oriented
 Document-oriented
 Column oriented
 Graph-oriented
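
As a rough illustration of how the four models organize data, the sketch below mimics each one with plain Python structures. It is a simplification with made-up data and does not use any real database engine.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "logged_in"}

# Document-oriented: self-describing, nested records (JSON-like).
document = {"_id": 1, "name": "Ana", "orders": [{"item": "book", "qty": 2}]}

# Column-oriented: values grouped by column rather than by row (simplified).
columns = {"name": ["Ana", "Luis"], "age": [31, 27]}

# Graph-oriented: nodes plus the edges that relate them.
graph = {"nodes": ["Ana", "Luis"], "edges": [("Ana", "follows", "Luis")]}

print(key_value["session:42"])
print(document["orders"][0]["item"])
print(columns["age"][1])
print(graph["edges"][0])
```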

NoSQL database management

Connecting to a NoSQL database with programming


Level 03

Lesson 01

Big Data Analysis

Identify:

 Patterns
 Correlations
 Trends
 Customer preferences

Business intelligence generation

Goals

 Reduction of operating costs
 Improve decision making
 Offer new products and services

Benefits

 Improvement of services
 Generation of efficiency in operations that give an advantage over the
competition.

Big Data analysis techniques

Data Scientist Specialization

The disciplines to analyze Big Data are:

 Predictive analysis
 Data mining
 Text analysis
 Statistical analysis
 Machine Learning
 Data visualization
 Other tools based on NoSQL data analysis

Sectors that use Big Data

 Travel agencies and hotel chains
 Medicine and health care
 Government
 Department stores and online retailers

Machine Learning or Automatic Learning

Machine learning is the study of data that gives a machine the ability to learn without
being explicitly programmed. It is achieved by building algorithms that create a model
from a sample of data; based on that model, machines make predictions or decisions on
their own. These algorithms suit complex models that are difficult to build
conventionally, since the models end up being built automatically.

Classification

Machine Learning – Big Analytics or Big Data – Unsupervised

 Statistical classification
 Clustering
 Regression
 Anomaly detection
 Association rules

Sentiment analysis

Classify the polarity of a comment

Programming logic

import requests

apikey = 'ab881ef9-5941-45d7-95a7-595fc89d129d'
language = 'eng'
# {0}, {1} and {2} are placeholders filled in by format() below
requestLink = 'https://api.havenondemand.com/1/api/sync/analyzesentiment/v1?text={0}&language={1}&apikey={2}'

message = "I really liked the support but everything is very bad"

requestLink = requestLink.format(message, language, apikey)
jsonResponse = requests.get(requestLink).json()

Analysis result structure

List of positive elements
List of negative elements
Total
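
A result with this structure could be read as in the sketch below. The key names ("positive", "negative", "aggregate") and the fields inside each element are assumptions made for illustration, and the sample values are invented; check the actual response of the service before relying on them.

```python
# Sample response with an assumed shape: two lists and an overall total.
sample_response = {
    "positive": [{"sentiment": "really liked", "topic": "the support", "score": 0.8}],
    "negative": [{"sentiment": "very bad", "topic": "everything", "score": -0.9}],
    "aggregate": {"sentiment": "negative", "score": -0.05},
}

def summarize(response):
    """Return (positive count, negative count, overall label)."""
    positives = response.get("positive", [])
    negatives = response.get("negative", [])
    overall = response.get("aggregate", {}).get("sentiment", "unknown")
    return len(positives), len(negatives), overall

print(summarize(sample_response))
```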

Lesson 02

Web Scraping

Web Scraping Techniques

Web scraping refers to techniques for obtaining data from web pages:

 Copy and paste
 Regex
 Data mining algorithms
 HTML parsing
 Applications or programs

Acquisition of data from a web page

For example, to obtain the dollar exchange rate:

 Import the necessary libraries
 Read the web page and convert it to a string
 Find the data that interests you
 Extract only the data you require
 Process the data according to your needs
 Test the program

Take the following into account when developing this type of program:

Use the find method to locate a substring within the text. This method returns the
position of the substring within the entire text, so you must count positions to delimit
the data you want from beginning to end. Write functions to obtain the data of interest.
The part that processes the acquired data is where you program the logic to follow.
Consider how often the data is updated, and keep the program in a loop so that it runs
continuously until you want to stop it.
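
The find-based extraction described above can be sketched as follows. The HTML fragment, the markers and the helper name extract_between are hypothetical; a real program would first download the page and would need markers that match its actual markup.

```python
# Hypothetical page content; a real program would download it first.
html = '<div id="rate"><span class="usd">Dollar: 18.75</span></div>'

def extract_between(text, start_marker, end_marker):
    """Return the substring between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening marker
    end = text.find(end_marker, start)  # search only after the start position
    if end == -1:
        return None
    return text[start:end]

raw = extract_between(html, "Dollar: ", "</span>")
rate = float(raw) if raw is not None else None
print(rate)
```

Checking for -1 matters because find returns -1 when the substring is absent, which would otherwise produce a wrong slice without raising an error.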

Lesson 03

Creation of web graphics

To program and display a scatter chart, follow these steps:

 In a file, add the basic HTML structure
 Import the jQuery and graphics libraries
 Add a script tag to start coding in JavaScript
 Add a chart object to indicate which type of graph you are going to use; in this
case, it will be scatter.
 Add the following objects for the graph titles: title, subtitle, xAxis, yAxis, legend,
and finally add the series list.
 Save the HTML file and open it in the browser to display the complete scatter
plot.
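
The options object built in these steps can be sketched in Python and serialized to JSON. The property names come from the steps above and suggest a Highcharts-style library, although the source does not name one; every value shown is a placeholder.

```python
import json

# Options object mirroring the steps above; all values are placeholders.
options = {
    "chart": {"type": "scatter"},
    "title": {"text": "Height versus weight"},
    "subtitle": {"text": "Sample data"},
    "xAxis": {"title": {"enabled": True, "text": "Height (cm)"}},
    "yAxis": {"title": {"text": "Weight (kg)"}},
    "legend": {"enabled": True},
    # Each scatter point is an [x, y] pair.
    "series": [{"name": "Group A", "data": [[161.2, 51.6], [167.5, 59.0]]}],
}

# JSON text that a page script could pass to the charting library.
print(json.dumps(options, indent=2))
```

Note that every point in the series is a coordinate pair; a scatter series given single values per point has nothing to place on the second axis.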

Bar graph with detail

 In an HTML file, add the HTML structure, libraries and graph titles
 Add the plotOptions object to show the percentage of each bar
 Add the series object, including the total data of each bar.
 Then add the drilldown property with its respective name to each total, to indicate
that the detail of that bar can be displayed.
 Add another object called drilldown, followed by a series list, to specify the detail
of each bar.
 In that list, add an object with the detailed information for each bar, and set its ID
to the name that you indicated as drilldown in the totals objects.
 Reload the page and check the detail functionality in each bar.
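
The link between totals and details can be sketched in Python as a Highcharts-style options object, assuming such a library is in use. The data, the label format string and the chart type value are illustrative assumptions; the essential point is that each detail series carries an id equal to the drilldown name of its bar.

```python
import json

# Totals: one point per bar; "drilldown" names the detail series for that bar.
totals = [
    {"name": "Chrome", "y": 60, "drilldown": "chrome"},
    {"name": "Firefox", "y": 40, "drilldown": "firefox"},
]

# Detail: one series per bar, whose "id" matches the "drilldown" name above.
details = [
    {"id": "chrome", "name": "Chrome", "data": [["v1", 35], ["v2", 25]]},
    {"id": "firefox", "name": "Firefox", "data": [["v1", 30], ["v2", 10]]},
]

options = {
    "chart": {"type": "bar"},
    "title": {"text": "Browser share"},
    # Show the percentage next to each bar (assumed label format).
    "plotOptions": {"series": {"dataLabels": {"enabled": True, "format": "{point.y}%"}}},
    "series": [{"name": "Browsers", "data": totals}],
    "drilldown": {"series": details},
}

# Every total must point at an existing detail series.
detail_ids = {d["id"] for d in details}
missing = [t["drilldown"] for t in totals if t["drilldown"] not in detail_ids]
print(missing)
print(json.dumps(options, indent=2))
```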

Creating a web service

A web service is a program that runs on the server side to exchange information
between applications. It can provide information in two formats: XML and JSON. A
web service requires the following:

 PHP server version 5.6
 NoSQL database
 NoSQL database library for PHP

Library installation on PHP Server

 Go to the folder where your server files are hosted, paste the “.dll” library into
the extensions folder and rename the library.
 Open the configuration file, register the library and restart the server.
 Check the installed extensions.

Creating a web service

To program a service that returns a collection from a non-relational database, follow
these steps:

 Open the connection to the NoSQL database
 Select the database you are going to query
 Select the collection you require from the database
 Fetch all elements of the collection with the find statement
 List all elements and convert them to JSON
 Print each of the elements in the collection
 Close the connection to the database
 Finally, wrap the lines you just wrote in an exception handler to handle the error in
case the connection cannot be made.
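
The flow of these steps can be sketched in Python with a stand-in for the database. FakeCollection and open_collection are hypothetical helpers written for this example, not part of any driver; closing the connection is omitted because the stand-in holds none.

```python
import json

class FakeCollection:
    """Stand-in for a NoSQL collection; a real driver would replace it."""
    def __init__(self, documents):
        self._documents = documents

    def find(self):
        # Return every document, like an unfiltered find.
        return iter(self._documents)

def open_collection(available=True):
    """Hypothetical connect-and-select step; raises when the database is down."""
    if not available:
        raise ConnectionError("could not connect to the database")
    return FakeCollection([{"x": "1", "y": "2"}, {"x": "2", "y": "5"}])

def service(available=True):
    """Return the collection as JSON text, or a JSON error message."""
    try:
        collection = open_collection(available)
        documents = [doc for doc in collection.find()]  # list all elements
        return json.dumps(documents)                    # convert them to JSON
    except ConnectionError as error:
        return json.dumps({"error": str(error)})

print(service())
print(service(available=False))
```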

Creating real-time charts

To develop an application that works in real time, you need to query a database using a
PHP web service that returns JSON. To program it, follow these steps:

 Create a PHP file called data
 Connect to the NoSQL database and select the collection you need to graph
 Add a switch that captures the query parameter of the GET request, and add two cases
 When the parameter is 1, add a query that returns only the last record added to the
collection.
 When the parameter is not 1, add a query that returns all the elements of the
collection; this is the default case.
 In both cases, add the results to a list and return them as JSON.
 Test the web service in both cases and verify that the JSON is correctly constructed.
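
The two cases of the switch can be sketched in Python as follows. The list stands in for the collection, and handle_request is a hypothetical name; the parameter arrives as text, as query-string values do.

```python
import json

# Hypothetical collection contents, oldest first.
collection = [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 3, "y": 8}]

def handle_request(consult):
    """Return the last record when consult is "1", otherwise all records."""
    if consult == "1":
        results = collection[-1:]      # only the most recent record
    else:
        results = list(collection)     # default case: every record
    return json.dumps(results)

print(handle_request("1"))
print(handle_request("0"))
```

Returning a list in both cases keeps the JSON shape identical, so the page script can process either response with the same loop.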

To draw a graph with HTML and JavaScript in real time, starting from an online graph,
follow these steps:

 Add the series variable in the load part, so that the graph is updated every time a new
record is entered into the database.
 Add two temporary variables in the document ready part, one for x and one for y, to
store the last record added to the graph.
 Create the get and set functions that let you read and modify those two variables.
 Add an AJAX request, which lets you make queries and update the graph without
refreshing the page. This request requires two pieces of information: the URL of the
web service with the request to bring all the data, and the type of request, which in
this case is GET.
 Iterate over the JSON returned by the request and save each of its elements in an
array.
 Store the x and y values of the last record in the temporary variables.
 Finally, assign the array to the data parameter so that the data is drawn on the graph.

Now you can see how the data is plotted on the graph, but it is not yet in real time.

To make the graph update in real time, follow these steps:

 Modify the interval function that the graph already includes, adding a request to the
web service, this time with the parameter 1 to bring only the last record.
 Compare the returned record with the last record added to the graph.
 Save the new records in the temporary variables.
 Finally, assign an interval of 1000 milliseconds to run this request again.
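
The comparison step can be sketched in Python. Here fetch_last_record is a hypothetical stand-in for the request that returns only the last record, and the timer is left out; in the page, the equivalent of poll_once would run every 1000 milliseconds.

```python
# Points already drawn on the chart, as [x, y] pairs.
series = [[1, 2], [2, 5]]
last_point = series[-1]   # temporary variables holding the last drawn record

def fetch_last_record():
    """Hypothetical stand-in for the request with the parameter set to 1."""
    return {"x": 3, "y": 8}

def poll_once():
    """Append the newest record only if it differs from the last drawn one."""
    global last_point
    record = fetch_last_record()
    point = [record["x"], record["y"]]
    if point != last_point:
        series.append(point)
        last_point = point
        return True
    return False

print(poll_once())  # first call adds the new point
print(poll_once())  # second call finds nothing new
print(series)
```

Without the comparison, every poll would draw the same record again whenever the database had not changed.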

In this way you can create applications that let you observe and analyze behavior in
real time, from the activity of users on social networks to the latest movements in the
stock markets.

Structure of a dashboard

Control board or dashboard. It is a real-time visual summary of business information.
It can raise alerts about critical situations and can even be consulted from any
electronic device.

Process to design a dashboard

 Define the KPIs of the organization and list them in order of importance.
 Select the three or four most important. If there are more than four, divide them into
different groups so as not to saturate the dashboard.
 Select the graphs that show the behavior of the chosen KPIs.
 Arrange the graphs in a single panel and add filters, sorting functions and
descriptions.
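
The grouping rule can be sketched in Python, taking four as the maximum per group as stated above. The helper name group_kpis and the KPI names are made up for illustration.

```python
def group_kpis(kpis, max_per_dashboard=4):
    """Split an ordered KPI list into groups of at most max_per_dashboard."""
    return [kpis[i:i + max_per_dashboard]
            for i in range(0, len(kpis), max_per_dashboard)]

# Hypothetical KPI names, already sorted by importance.
kpis = ["KPI %d" % n for n in range(1, 11)]
groups = group_kpis(kpis)
print(len(groups))
print(groups[-1])
```

With ten KPIs this yields three groups of four, four and two.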

Assessment

Assuming that the request returns JSON, what would the following code segment print
in the console?

$.ajax({
    url: "data.php?Consult=0",
    type: 'get',
    success: function(RecoveredData) {
        RecoveredData = JSON.parse(RecoveredData);
        $.each(RecoveredData, function(i, o) {
            console.log(parseInt(o.x));
            console.log(parseInt(o.y));
        });
    }
});
User response:
The JSON coordinates that were sent as a response
Result:

Correct!
Question results

If with the following line of code a query is made to the NoSQL database for the last
record entered, how should you modify the statement to bring all the elements in
ascending order?
User response:
$cursor = $collection->find()->sort(array('$natural' => 1))->asd(1);
Result:

You need to reinforce the topic: Web Services and Creation of real-time graphics
Question results

Ivan is programming an application to display a line graph, but the graph does NOT
display the data. If he is using the following object for the data, what should Ivan do with
the data to solve the problem?

data: [[161.2], [167.5], [159.5], [157.0], [155.8]]


User response:
Place the data in coordinate form so that it can be read
Result:

Correct!
Question results
What output does a web service have in PHP with the following lines of code, assuming
that the variable "$cursor" contains this list of objects:
[{“x”: “1”, “y”: “2”},{“x”: “2”, “y”: “5”},{“x”: “3”, “y”: “8”}]

foreach ($cursor as $doc) {
    array_push($lst, $doc);
}
echo json_encode($lst);
User response:
[{1,2},{2,5},{3,8}]
Result:

You need to reinforce the topic: Web Services and Creation of real-time graphics
Question results

It is a visual summary of an analysis, which contains information about a business:


User response:
Dashboard
Result:

Correct!
Question results

Assuming that the "ajax" request in the following code segment returns JSON, what
would it print in the console?

$.ajax({
    url: "data.php?Consult=0",
    type: 'get',
    success: function(RecoveredData) {
        $.each(RecoveredData, function(i, o) {
            console.log(parseInt(o.x));
            console.log(parseInt(o.y));
        });
    }
});
User response:
The JSON coordinates that were sent as a response
Result:

You need to reinforce the topic: Creating real-time graphics


Question results
Martha is programming a graph, but it does NOT display the title due to an error in this
segment of the program. How can you fix the error?

title: {
title: 'User account'
},
User response:
Changing title name to text
Result:

Correct!
Question results

Maria is installing a NoSQL database library in PHP and has already placed the file in
the extensions folder. What does she have to do to be able to use the library?
User response:
Check the installed extensions
Result:

You need to reinforce the topic: Web Service


Question results

If with the following line of code a query is made to the NoSQL database for the last
record entered, how should you modify the statement to bring the first record?

$cursor = $collection->find()->sort(array('$natural' => -1))->limit(1);
User response:
$cursor = $collection->find()->sort(array('$natural' => 1))->limit(1);
Result:

Correct!
Question results

José has the following output from a web service and is using this instruction to read
the data. What will José obtain as a result?

<Sender>
<Name>Sender name</Name>
<Mail> Sender's email </Mail>
</Sender>
<Recipient>
<Name>Name of recipient</Name>
<Mail>Recipient's email</Mail>
</Recipient>

</Sender>
User response:
A graph without data because the information you are trying to graph does not
correspond to the type of graph
Result:

You need to reinforce the topic: Creating real-time graphics

José has the following output from a web service and is using this instruction to read
the data. What will José obtain as a result?

<Sender>
<Name>Sender name</Name>
<Mail> Sender's email </Mail>
</Sender>
<Recipient>
<Name>Name of recipient</Name>
<Mail>Recipient's email</Mail>
</Recipient>

</Sender>
User response:
An error in the read because it is not a valid input for the instruction it uses
Result:

Correct!
Question results
How many dashboards do you need if you selected 10 KPIs?
User response:
3
Result:

Correct!
Question results
Yvette is programming a graph, but it does NOT display the x-axis title of the graph and
she found the error in this part of the program. What is this error due to?

X axis: {
title: {
enabled: true,
text: 'Height (cm)'
}
},
User response:
The object name is incorrect
Result:

Correct!
Question results
If with the following line of code a query is made to the NoSQL database for the last
record entered, how should you modify the statement to bring the first record?

$cursor = $collection->find()->sort(array('$natural' => -1))->limit(1);
User response:
$cursor = $collection->find()->sort(array('$natural' => 1))->limit(1);
Result:

Correct!
Question results
What output does a web service have in PHP with the following lines of code, assuming
that the variable "$cursor" contains this list of objects:
[{“x”: “1”, “y”: “2”},{“x”: “2”, “y”: “5”},{“x”: “3”, “y”: “8”}]

foreach ($cursor as $doc) {
    array_push($lst, $doc);
}
echo json_encode($lst);
User response:
[{“x”: “1”, “y”: “2”},{“x”: “2”, “y”: “5”},{“x”: “3”, “y”: “8”}]
Result:

Correct!
Question results
What should you do if the KPIs you selected to create a dashboard exceed 4?
User response:
Distribute them in different dashboards
Result:

Correct!
Question results
Assuming that the "ajax" request in the following code segment returns JSON, what
would it print in the console?

$.ajax({
    url: "data.php?Consult=0",
    type: 'get',
    success: function(RecoveredData) {
        $.each(RecoveredData, function(i, o) {
            console.log(parseInt(o.x));
            console.log(parseInt(o.y));
        });
    }
});
User response:
A graph with the data that included the JSON of the request
Result:

You need to reinforce the topic: Creating real-time graphics


Question results
Carlos is developing a bar graph but has a problem with this piece of code and the
graph is NOT displayed. What is the error due to?:

chart: {
type: 'graph-bar'
},
User response:
The object you are programming does not exist
Result:

You need to reinforce the topic: Creation of web graphics


Question results
To install a library from a No-SQL database, Juan renamed the library and placed it in
the extensions folder and then restarted the server, but the library was NOT installed.
What is the failure due to?
User response:
You did not register the library in the php configuration
Result:

Correct!
Question results
Karen is having trouble programming a PHP web service and needs to fetch the kitchen
collection from a NoSQL database called "Departments". What is the error in the code?

$mongo = new MongoClient();
$db = $mongo->selectDB('kitchen');
$collection = new MongoCollection($db, 'Departments');
User response:
The collection name and the database name are swapped
Result:

Correct!

Level 04

Lesson 01

Data science

Its objective is to gain a better understanding of Big Data using study techniques other
than conventional ones. Data science is a mix of statistics and mathematics, computer
science, and business administration.

Features

For a data scientist to obtain knowledge of Big Data, they must perform the following
tasks with the data:

 Acquire them
 Analyze them
 Filter them
 Extract them
 Represent them
 Refine them
 Interact with them

Framework for Big Data

A framework is a structure that serves as the basis for developing and organizing
software with various integrated tools.

The most widely used framework in Big Data is Apache Hadoop, which allows the
processing of large data sets distributed across clusters with the help of simple
programming models. It is designed to scale from a single server to thousands of
computers, each of which can offer local storage and processing.

Apache Hadoop includes four modules:

 Common
 Distributed file system
 YARN
 MapReduce
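
The idea behind the MapReduce module can be sketched as a word count in plain Python. This is a single-process illustration of the map, shuffle and reduce phases, not Hadoop's API; the function names and the input lines are invented for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

lines = ["big data big value", "data velocity"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a cluster, the map calls would run in parallel on the nodes that hold each piece of the data, and the reduce calls would run per key after the grouped values are collected.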

Other tools: databases, data warehouse infrastructure, engines.

The differences between Apache Hadoop and other frameworks are:

 It handles data fluidly
 It has a simplified programming model
 It is easy to manage

Formation of a work team

The requirements to start taking advantage of Big Data are the following:

 The entire company must know the impact of the appropriate use of data on the
business and its daily work
 It is not necessary to know technical aspects
 The company director must be the first to understand the benefit that this
technology brings and communicate it to others
 Hire a Big Data specialist

The person in charge of Big Data:

This person must define the needs and opportunity areas of the company and the
appropriate technology to satisfy those needs. These may be applications with Machine
Learning, other types of real-time analytics or just more robust business intelligence.
This person must also define whether the processing will be in the cloud or internal.

Hiring suitable personnel

 Project Manager
 Cloud Computing

If you want to work internally, then you must hire more staff:

 Personnel dedicated to servers, networks and software development.

Another necessary role is the data scientist, who is an expert in Big Data analytics, or
at least a business analyst who relies on cloud tools.

Creative freedom in the work team.


Lesson 02

Assessment

Which of the protocols is recommended to connect your Web service with the relational
database?

User response:
HTTP
Result:

Correct!
Question results

What type of database schema is most recommended for your integrator application?

User response:
Star
Result:

Correct!
Question results

In the relational database you created, the following are required fields for your model,
except:

User response:
Comment
Result:

Correct!
Question results

What is the ID of the following social network profile "capacitateparaelempleo"?

User response:
388594514598480
Result:

Correct!
Question results

In your application, which of the following methods is correct to display the frequency of
likes based on time?
User response:
Histogram
Result:

Correct!
Question results

In the application you created, what is the minimum number of nested loops you need
to get comments for each post on the social network?

User response:
2
Result:

Correct!
Question results

To build a two-element dashboard, in addition to time-based sentiment behavior, what
other KPI can you include?

User response:
Likes per week
Result:

Correct!
Question results

To enable your application server web page, you must perform the following tasks
except:

User response:
Upload files via FTP
Result:

Correct!
