4 BNI Python Training

PROGRAMMING FOR EVERYBODY

Adam Aulia Rahmadi


TRAINER PROFILE

ADAM AULIA RAHMADI

• Data Engineer / Data Scientist – PT XL AXIATA
• Manages data pipelines from Cloudera
• Supports the business with big data technology
• Helps data scientists build models at large scale
• Experience: 4 years in Python, 3 years in big data
PYTHON INTRODUCTION

Adam Aulia Rahmadi


FOUNDER

Guido van Rossum

Version :
• 2.*
• 3.*
WHY PYTHON?

• Simple
• Extensive Support Libraries
• Integration Feature
WHY PYTHON?
TOP COMPANIES THAT USE PYTHON
PYTHON TOP LIBRARIES
WHAT PYTHON CAN DO
INTERMEZZO
DATA ANALYTICS CYCLE

[Diagram: the data analytics cycle – a Data Engineer moves data from sources (CSV / Hadoop / DB) through a traditional DW or Hadoop stack, and a Data Scientist uses Python and other tools (Tableau, Knime, etc.) for presentation / reporting]
OVERVIEW

• Variable
• Operator
• String and functions
• Conditionals
• Iterations
• List
• Tuple
• Dictionary
• File
• Error handling
BASIC PYTHON

Adam Aulia Rahmadi


GOOGLE COLAB

• https://colab.research.google.com/
JUPYTER NOTEBOOK

Open cmd, then type: jupyter notebook


PYTHON INSTALL PACKAGE

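Since the install screenshot is not reproduced here, a minimal sketch of installing a package from a notebook cell (the package name pandas is just an example):

!pip install pandas      # shell command run from a Jupyter/Colab cell
import pandas as pd      # verify the installation by importing the package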
PRIMITIVE VARIABLES

Type            Value
String          message = 'And now for something completely different'
Integer         n = 17
Double / Float  pi = 3.1415926535897931
Boolean         True / False


COMPLEX VARIABLES

Type         Value
List         []   e.g. [1, 2, 'abc']
Tuple        ()   e.g. (1, 2, 'abc')
Dictionary   {}   e.g. a = {}; a['day'] = 'monday'; a['date'] = 20191011
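Putting the two variable tables together, a short illustrative sketch (not taken from the slides):

message = 'And now for something completely different'   # String
n = 17                                                    # Integer
pi = 3.1415926535897931                                   # Double / Float
flag = True                                               # Boolean
items = [1, 2, 'abc']                                     # List (mutable)
point = (1, 2, 'abc')                                     # Tuple (immutable)
a = {}                                                    # Dictionary
a['day'] = 'monday'
a['date'] = 20191011
print(type(message), type(n), type(pi), type(flag), type(items), type(point), type(a))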
MATH OPERATORS

Operator   Meaning
+          Addition
-          Subtraction
*          Multiplication
/          Division
**         Power
%          Modulo
BOOLEAN EXPRESSIONS

Operator   Meaning
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
!=         Not equal
==         Equal
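A quick sketch exercising both operator tables (values chosen only for illustration):

print(7 + 2)    # 9    addition
print(7 - 2)    # 5    subtraction
print(7 * 2)    # 14   multiplication
print(7 / 2)    # 3.5  division (Python 3 returns a float)
print(7 ** 2)   # 49   power
print(7 % 2)    # 1    modulo
print(7 > 2, 7 <= 2, 7 != 2, 7 == 7)   # True False True True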
STRING OPERATORS

• first = 10; second = 15; print(first + second)
• first = '100'; second = '150'; print(first + second)
• first = '100'; second = 3; print(first * second)
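The three cases above behave differently; a sketch with the expected results as comments:

first = 10
second = 15
print(first + second)    # 25, integer addition

first = '100'
second = '150'
print(first + second)    # '100150', string concatenation

first = '100'
second = 3
print(first * second)    # '100100100', string repetition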
CONDITIONAL

• One conditional
CONDITIONAL

• Two conditionals
CHAINED CONDITIONAL

• More than two conditionals


NESTED CONDITIONAL
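The conditional code screenshots are not reproduced here, so this is a minimal sketch of the four patterns named above:

x = 5
if x > 0:                      # one conditional
    print('positive')

if x % 2 == 0:                 # two branches
    print('even')
else:
    print('odd')

if x < 0:                      # chained conditional
    print('negative')
elif x == 0:
    print('zero')
else:
    print('positive')
    if x > 100:                # nested conditional
        print('and large')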
FUNCTIONS

Type                     Examples
Built-in functions       max(), min(), type(), len(), int(), str()  (note: there is no built-in avg(); use sum()/len() or statistics.mean())
Import functions         import random, import math
User-defined functions   def function_name(argument_variable):
                             return something
FUNCTIONS
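A short sketch of the three kinds of functions listed in the table (the add() function is a made-up example):

import random, math                       # import functions

print(max(3, 7), min(3, 7), len('abc'))   # built-in functions
print(math.sqrt(16), random.randint(1, 10))

def add(a, b):                            # user-defined function
    return a + b

print(add(2, 3))                          # 5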
LOOP

• For loop • While loop


STRINGS

• A string is a sequence of characters. You can access the characters one at a time with the bracket operator.
STRINGS

• Traversal through a string with a loop


STRINGS (THE IN OPERATOR)
STRINGS (STRING COMPARISON)
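A small sketch covering the bracket operator, traversal, the in operator and string comparison (the word 'banana' is just an example):

fruit = 'banana'
print(fruit[0])              # 'b'  -- bracket operator
for ch in fruit:             # traversal through a string with a loop
    print(ch)
print('nan' in fruit)        # True -- the in operator
print('apple' < 'banana')    # True -- comparison is alphabetical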
LIST

• Sequence
• Mutable
• Initialize
LIST

• Traversing a list
LIST OPERATIONS

• The + operator concatenates lists:

• Similarly, the * operator repeats a list a given number of times:


LIST SLICES
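A sketch of initializing, traversing, concatenating, repeating and slicing a list (values are arbitrary):

numbers = [1, 2, 'abc']          # initialize
numbers[0] = 99                  # mutable
for item in numbers:             # traversing a list
    print(item)
print([1, 2] + [3, 4])           # [1, 2, 3, 4]   concatenation
print([0] * 4)                   # [0, 0, 0, 0]   repetition
t = ['a', 'b', 'c', 'd', 'e']
print(t[1:3])                    # ['b', 'c']     slice
print(t[:2], t[3:])              # ['a', 'b'] ['d', 'e']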
LIST METHODS

• Append

• Extend
LIST AND FUNCTIONS
DELETING ELEMENT

• Remove
• Pop

• del
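A combined sketch of the list methods, list functions and deletion options above:

t = [1, 2, 3]
t.append(4)                              # [1, 2, 3, 4]
t.extend([5, 6])                         # [1, 2, 3, 4, 5, 6]
print(len(t), max(t), min(t), sum(t))    # list and functions
t.remove(2)                              # remove by value
last = t.pop()                           # remove and return the last element
del t[0]                                 # delete by index
print(t)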
TUPLES

• Sequence
• immutable (no append)

• Initialize
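A tiny sketch showing tuple immutability:

point = (1, 2, 'abc')        # initialize
print(point[0])              # indexing works as for a list
# point.append(3)            # would raise AttributeError: tuples are immutable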
LIST AND STRINGS

• String to list • List to string
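A sketch of split() and join() (the sentence is arbitrary):

words = 'the quick brown fox'.split()   # string to list
print(words)                            # ['the', 'quick', 'brown', 'fox']
print('-'.join(words))                  # list to string: 'the-quick-brown-fox'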


DICTIONARY
DICTIONARY

• Create a dictionary

• Show dictionary keys: a.keys()
• Show dictionary values: a.values()
DICTIONARY
LOOPING AND DICTIONARY
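A sketch reusing the day/date dictionary from the variables table:

a = {}
a['day'] = 'monday'
a['date'] = 20191011
print(a.keys())          # dict_keys(['day', 'date'])
print(a.values())        # dict_values(['monday', 20191011])
for key in a:            # looping over a dictionary
    print(key, a[key])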
FILES

• Reading files
FILES

• Searching through a file


FILES

• Write to file
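A sketch of writing, reading and searching a file; the file name notes.txt and the 'From:' prefix are just examples:

with open('notes.txt', 'w') as fout:        # write to file
    fout.write('From: alice\n')
    fout.write('hello world\n')

with open('notes.txt') as fin:              # reading and searching through a file
    for line in fin:
        if line.startswith('From:'):
            print(line.rstrip())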
ERROR HANDLING
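A minimal try/except sketch:

try:
    value = int('abc')          # raises ValueError
except ValueError:
    print('not a number')
finally:
    print('done')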
SOURCE

• http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf
PANDAS INTRODUCTION

Adam Aulia Rahmadi


WHY PANDAS?
PANDAS CORE
CREATING DATAFRAMES FROM SCRATCH
CREATING DATAFRAMES FROM SCRATCH

• Add index
CREATING DATAFRAMES FROM SCRATCH

• Search by index
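A sketch in the spirit of the learndatasci tutorial cited at the end of this chapter (the apples/oranges data is illustrative):

import pandas as pd

data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}
purchases = pd.DataFrame(data)                                             # from scratch
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])  # add index
print(purchases.loc['June'])                                               # search by index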
COMMON FILES

Pandas
READ DATA FROM CSV
READ DATA FROM CSV

• Set index while read data
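A read_csv sketch; the IMDB file name used here is an assumption:

movies_df = pd.read_csv('IMDB-Movie-Data.csv', index_col='Title')   # set index while reading
print(movies_df.head())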


READING DATA FROM JSON

• Read from json


READING DATA FROM A SQL DATABASE

• Read from database


CONVERTING BACK TO A CSV, JSON, OR SQL
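A sketch of the JSON and SQL readers plus the converters back; the file, database and table names are assumptions:

import sqlite3

df_json = pd.read_json('purchases.json')                   # read from JSON

con = sqlite3.connect('database.db')                       # read from a SQL database
df_sql = pd.read_sql_query('SELECT * FROM purchases', con)

df_sql.to_csv('new_purchases.csv')                         # converting back
df_sql.to_json('new_purchases.json')
df_sql.to_sql('new_purchases', con)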
VIEWING YOUR DATA
HANDLING DUPLICATES
COLUMNS
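A sketch of viewing, de-duplicating and renaming columns on the movies DataFrame; the original column name 'Revenue (Millions)' is assumed from the IMDB dataset:

print(movies_df.head(3))                  # viewing your data
print(movies_df.tail(2))
movies_df = movies_df.drop_duplicates()   # handling duplicates
print(movies_df.columns)                  # columns
movies_df = movies_df.rename(columns={'Revenue (Millions)': 'revenue_millions'})
movies_df.columns = [c.lower() for c in movies_df.columns]   # lower-case the remaining names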
MISSING VALUES

• Check nulls
• Summary of nulls for each column
• Drop nulls (rows)
• Drop nulls (columns)

IMPUTATION
UNDERSTANDING YOUR VARIABLES

• Per-column statistics
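A sketch of the null checks, imputation and per-column statistics; revenue_millions is the column renamed in the previous sketch:

print(movies_df.isnull().sum())                  # summary of nulls per column
no_null_rows = movies_df.dropna()                # drop null rows
no_null_cols = movies_df.dropna(axis=1)          # drop null columns
mean_rev = movies_df['revenue_millions'].mean()  # imputation: fill nulls with the mean
movies_df['revenue_millions'] = movies_df['revenue_millions'].fillna(mean_rev)
print(movies_df.describe())                      # per-column statistics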
DATAFRAME SLICING, SELECTING, EXTRACTING

Column-wise

Select genre, rating from movies_df

Row-wise
DATAFRAME SLICING, SELECTING, EXTRACTING
DATAFRAME SLICING, SELECTING, EXTRACTING

Conditional selections

Select * from movies_df where director = 'Ridley Scott'

Select * from movies_df where rating >= 8.6 limit 3

Select * from movies_df where director = 'Ridley Scott' or director = 'Christopher Nolan'
DATAFRAME SLICING, SELECTING, EXTRACTING

Select * from movies_df where director in ('Ridley Scott', 'Christopher Nolan')
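The pandas equivalents of the SQL above, as a sketch; the column names genre, rating, director and the title 'Prometheus' are assumed from the dataset:

subset = movies_df[['genre', 'rating']]                       # column-wise
prom = movies_df.loc['Prometheus']                            # row-wise by label
rows = movies_df.iloc[0:3]                                    # row-wise by position
ridley = movies_df[movies_df['director'] == 'Ridley Scott']   # conditional selection
top = movies_df[movies_df['rating'] >= 8.6].head(3)
both = movies_df[movies_df['director'].isin(['Ridley Scott', 'Christopher Nolan'])]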


APPLY FUNCTIONS
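A sketch of apply(); the rating_bucket helper and the 8.0 threshold are made up for illustration:

def rating_bucket(x):
    return 'good' if x >= 8.0 else 'bad'

movies_df['rating_category'] = movies_df['rating'].apply(rating_bucket)
# or, equivalently, with a lambda:
movies_df['rating_category'] = movies_df['rating'].apply(lambda x: 'good' if x >= 8.0 else 'bad')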
PANDAS AGGREGATION

Take one genre

Apply aggregation

Select year, new_genre, sum(revenue_millions) as sum_revenue_mio, count(new_genre) as count_genre from movies_df


Groupby year, new_genre
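A groupby sketch matching the SQL above; new_genre is assumed to be a genre column derived earlier with apply():

movies_df_gb = movies_df.groupby(['year', 'new_genre']).agg(
    sum_revenue_mio=('revenue_millions', 'sum'),
    count_genre=('revenue_millions', 'size'),    # row count per (year, genre)
).reset_index()
print(movies_df_gb.head())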
PANDAS PIVOT

Dataframe = movies_df_gb
index = 'year'
columns = 'new_genre' (its values become the new columns)
values = 'count_genre' (fills the cells)
aggfunc = aggregation function (max, min, sum, etc.)
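A pivot sketch using the parameters listed above:

import numpy as np

movies_df_gb_pvt = pd.pivot_table(movies_df_gb,
                                  values='count_genre',     # fills the cells
                                  index='year',
                                  columns='new_genre',      # genre values become columns
                                  aggfunc=np.sum).fillna(0)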
PANDAS JOIN

• Create dataframe
PANDAS JOIN
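A small join sketch with two made-up frames:

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'value_left': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'value_right': [10, 20, 40]})
joined = left.merge(right, on='key', how='inner')   # pandas join via merge
print(joined)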
DATE AND TIME

• Create a date range
• Work with timestamp data
• Convert strings to timestamps
• Index and slice time series data in a DataFrame
• Resample time series for different time periods (aggregation / summary)
• Understand Unix / epoch time

• Create timestamp

• Timestamp function
• Year, month, day, hour, minutes, seconds, ms, day/month name, day of week/month/year
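A sketch of creating a timestamp and reading its parts (the date string is arbitrary):

import pandas as pd

ts = pd.to_datetime('2019-10-11 08:30:00')    # convert a string to a timestamp
print(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
print(ts.day_name(), ts.month_name(), ts.dayofweek, ts.dayofyear)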
DATE AND TIME

• Exploration
DATE AND TIME

Date range

Format : mm/dd/yyyy
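A date_range sketch in the mm/dd/yyyy format mentioned above (a different format string works the same way):

dates = pd.date_range(start='10/01/2019', end='10/07/2019', freq='D')
print(dates)
print(pd.date_range(start='2019-10-01', periods=7, freq='D'))   # a different format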
DATE AND TIME

• Different format
DATE AND TIME

• Slicing data

• Daily aggregation

• Monthly aggregation
DATE AND TIME

• Daily aggregation alternative ways

get year
get year, month
get year, month, day
get hour
get dayname
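A sketch of slicing, resampling and the groupby alternatives on a toy time-indexed frame:

idx = pd.date_range('2019-10-01', periods=60, freq='D')
df = pd.DataFrame({'value': range(60)}, index=idx)
print(df['2019-10-02':'2019-10-04'])          # slicing data by date
print(df.resample('D').sum())                 # daily aggregation
print(df.resample('M').sum())                 # monthly aggregation
print(df.groupby(df.index.year).sum())        # get year
print(df.groupby([df.index.year, df.index.month]).sum())   # get year, month
print(df.groupby(df.index.day_name()).sum())  # get dayname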
DATE AND TIME

• Time delta
UNIX TIME

• https://www.unixtimestamp.com/index.php

From unixtime to datetime

From datetime to unixtime
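A sketch of a time delta and of the Unix time conversions (the epoch value is arbitrary):

delta = pd.Timestamp('2019-10-11') - pd.Timestamp('2019-10-01')   # time delta
print(delta.days)                                                  # 10

ts = pd.to_datetime(1570752000, unit='s')      # from unixtime to datetime
print(ts)
print(int(ts.timestamp()))                     # from datetime back to unixtime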


SOURCE

• https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
• https://towardsdatascience.com/basic-time-series-manipulation-with-pandas-4432afee64ea
PYTHON DATA VISUALIZATION

Adam Aulia Rahmadi


BASIC LIBRARY

• Matplotlib
• Seaborn
TYPES OF CHART

• Bar chart
• Line chart
• Scatter plot
• Heatmap
SETUP JUPYTER
DATA PREPARATION
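A setup and data-preparation sketch; the IMDB CSV file name and column renaming mirror the assumptions made in the pandas chapter:

# run inside a Jupyter cell
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

movies_df = pd.read_csv('IMDB-Movie-Data.csv', index_col='Title')
movies_df = movies_df.rename(columns={'Revenue (Millions)': 'revenue_millions'})
movies_df.columns = [c.lower() for c in movies_df.columns]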
BAR CHART

Purpose: to compare a few items
BAR CHART
DATA CHART
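A bar chart sketch comparing a few items (the top five movies by rating; column names assumed from the setup above):

top5 = movies_df.sort_values('rating', ascending=False).head(5)
plt.bar(top5.index, top5['rating'])
plt.xticks(rotation=45)
plt.ylabel('rating')
plt.show()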
LINE CHART

• Purpose: to show a trend over time


LINE CHART
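A line chart sketch showing the trend of total revenue per year:

yearly = movies_df.groupby('year')['revenue_millions'].sum()
plt.plot(yearly.index, yearly.values)
plt.xlabel('year')
plt.ylabel('revenue (millions)')
plt.show()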
SCATTER PLOT

• To show the relationship between two variables


SCATTER PLOT
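A scatter plot sketch comparing two variables:

plt.scatter(movies_df['rating'], movies_df['revenue_millions'])
plt.xlabel('rating')
plt.ylabel('revenue (millions)')
plt.show()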
HEATMAP

• Purpose: to visualize correlation or volume


HEATMAP

import numpy as np
import pandas as pd

# group by year and genre, then rename the aggregated columns
# (movies_df with year / new_genre / revenue_millions comes from the pandas chapter)
movies_df_gb = (movies_df[['year', 'new_genre', 'revenue_millions']]
                .groupby(['year', 'new_genre'])
                .agg({'new_genre': 'count', 'revenue_millions': 'sum'}))
movies_df_gb.columns = ['count_genre', 'sum_revenue_mio']
movies_df_gb = movies_df_gb.reset_index()

# pivot genres into columns, with the counts as cell values
movies_df_gb_pvt = pd.pivot_table(movies_df_gb,
                                  values='count_genre',
                                  index=['year'], columns=['new_genre'],
                                  aggfunc=np.sum).fillna(0)
HEATMAP
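A heatmap sketch drawn from the pivot built above:

sns.heatmap(movies_df_gb_pvt, annot=True, fmt='.0f')   # genre counts per year
plt.show()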
SPARK INTRODUCTION

Adam Aulia Rahmadi


• Apache Spark is an open-source distributed general-purpose
cluster-computing framework. Spark provides an interface for programming
entire clusters with implicit data parallelism and fault tolerance. Originally
developed at the University of California, Berkeley's AMPLab, the Spark
codebase was later donated to the Apache Software Foundation, which has
maintained it since.
• Wikipedia
DATA TYPES
DATAFRAME VS RDD

• RDD
a = sc.parallelize([1, 2, 3, 4])

• DataFrame
df = a.map(lambda x: (x,)).toDF(['a'])
START SPARK ENGINE

• Basic configuration
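A basic-configuration sketch; the app name and local master are assumptions:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('training')
         .master('local[*]')
         .getOrCreate())
sc = spark.sparkContext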
LOAD DATA

• From csv
LOAD DATA

• Convert from pandas
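A sketch of both load paths plus show(); the CSV file name is assumed:

sdf = spark.read.csv('IMDB-Movie-Data.csv', header=True, inferSchema=True)   # from csv

import pandas as pd
pdf = pd.read_csv('IMDB-Movie-Data.csv')
sdf2 = spark.createDataFrame(pdf)                                            # convert from pandas

sdf.show(5)                                                                  # show data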


SHOW DATA
SPARK UI

http://localhost:4040
SCHEMA AND COLUMNS

• Rename columns
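A sketch of inspecting the schema and renaming a column; the original column name is assumed from the IMDB dataset:

sdf.printSchema()                                                        # schema
sdf = sdf.withColumnRenamed('Revenue (Millions)', 'revenue_millions')    # rename a column
print(sdf.columns)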
DATAFRAME EXPLORATION

• select
• filter
• count
• Fill null column
• Null values
(see the combined sketch after the next slide)


DATAFRAME EXPLORATION

• Filter

• aggregation

• join

• pivot
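A combined exploration sketch; the column names Title, Rating, Year and Genre are assumed from the IMDB dataset, and the lookup frame used for the join is made up:

from pyspark.sql import functions as F

sdf.select('Title', 'Rating').show(5)                      # select
sdf.filter(F.col('Rating') >= 8.6).show(3)                 # filter
print(sdf.count())                                         # count
sdf = sdf.fillna(0, subset=['revenue_millions'])           # fill null column
sdf.select([F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in sdf.columns]).show()   # null values
sdf.groupBy('Year').agg(F.sum('revenue_millions').alias('sum_revenue')).show()             # aggregation
eras = spark.createDataFrame([(2016, 'recent'), (2006, 'old')], ['Year', 'era'])
sdf.join(eras, on='Year', how='left').select('Title', 'Year', 'era').show(5)               # join
sdf.groupBy('Year').pivot('Genre').count().show(5)                                         # pivot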
USER DEFINED FUNCTION (UDF)
TEMPORARY TABLE
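A sketch of a UDF and a temporary table; the rating bucket logic is made up:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

bucket = F.udf(lambda r: 'good' if (r is not None and r >= 8.0) else 'bad', StringType())   # UDF
sdf = sdf.withColumn('rating_category', bucket(F.col('Rating')))

sdf.createOrReplaceTempView('movies')                                    # temporary table
spark.sql('SELECT Title, Rating FROM movies WHERE Rating >= 8.6').show(3)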
EXPORT RESULT

Save to csv

Save to parquet

Result folder: a csv folder and a parquet folder
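A write sketch; the output paths are just examples (Spark writes each as a folder of part files):

sdf.write.mode('overwrite').csv('result/csv', header=True)      # save to csv
sdf.write.mode('overwrite').parquet('result/parquet')           # save to parquet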


TURN OFF SPARK ENGINE

• spark.stop()
PYTHON HBASE

Adam Aulia Rahmadi


SET UP

Environment: Jupyter, HBase
Python libraries: happybase, pandas
SYSTEM ARCHITECTURE

Python (Jupyter notebook) + API (flask_app.py) → HBase (Ubuntu VM)


SETUP HBASE ON UBUNTU VM

1. Step by step: VirtualBox Ubuntu
• 1. Install Ubuntu
• 2. Install the additional .iso
• 3. Set up bidirectional
• 4. Set up the host network adapter
• 5. Test ping

2. Step by step: install Java on Ubuntu
• sudo apt update
• sudo apt install default-jre
• sudo apt install default-jdk
• Add to ~/.bashrc (after downloading HBase; see below)
• Type java -version

3. Step by step: install standalone HBase
• Download HBase and extract it: https://downloads.apache.org/hbase/1.4.13/hbase-1.4.13-bin.tar.gz/
• Configure conf/hbase-env.sh (add JAVA_HOME)
• Export HBASE_HOME to ~/.bashrc
• Configure conf/hbase-site.xml
• Start HBase, the HBase shell and Thrift

~/.bashrc
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/home/adam/hduser/hbase-1.4.12
export PATH=$PATH:$HBASE_HOME/bin

hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>file:///home/hduser/HBASE/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hduser/HBASE/zookeeper</value>
</property>
RETRIEVE DATA
READ CSV

1. Import the libraries

2. Read the CSV
CREATE CONNECTION AND INSERT
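A sketch of loading a CSV and inserting it with happybase; the file name, host, port, table name, column family and column name are all assumptions about the VM setup:

import happybase
import pandas as pd

df = pd.read_csv('data.csv')                                     # step 2: read the CSV

connection = happybase.Connection('192.168.56.101', port=9090)   # Thrift server on the HBase VM
table = connection.table('movies')                               # table created beforehand in the hbase shell
for idx, row in df.iterrows():                                   # insert row by row
    table.put(str(idx).encode(),
              {b'cf:title': str(row['title']).encode()})
connection.close()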
SOURCE

• https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04
• https://www.guru99.com/hbase-installation-guide.html
