Julia Part II: Julia For Data Science

This document discusses Julia for data science. It introduces the Julia programming language and some of its key features for data analysis and visualization. The document is divided into sections on getting started with Julia, plotting data using the PyPlot package, an overview of useful Julia packages for data science tasks, and parallel processing in Julia. It provides examples of working with data frames, JSON data, and probability distributions.

Julia Part II

Julia for Data Science

Prof. Matthew Roughan


http://www.maths.adelaide.edu.au/matthew.roughan/

UoA

Oct 31, 2017

M.Roughan (UoA) Julia Part II Oct 31, 2017 1 / 41


A basic problem about any body of data is to make it more
easily and effectively handleable by minds – our minds,
her mind, his mind.
John W. Tukey, Exploratory Data Analysis,
Addison-Wesley, 1977



Section 1

Get Started



Interface Cuteness
Matlab uses help; Julia switches into help mode when you type ?
  - lookfor in Matlab becomes apropos, e.g.,
apropos("determinant")
In Julia you can access OS commands by typing ;, e.g.,
;pwd
Useful things to know
  - history with up and down keys
  - matches partial strings
  - auto-complete with TAB
Standard shell commands
  - Ctrl-c interrupt process
  - Ctrl-a start of the line
  - Ctrl-e end of the line
  - Ctrl-d exit
Startup file ~/.juliarc.jl



Other useful bits and pieces
Comments in shell-style #
Functions that modify their arguments have a name like sort!
Useful commands
whos()
@which sin(2)
versioninfo()
Numerical constants
pi
golden
e
im
eulergamma
Long numbers: 1_000_000
Other useful constants
JULIA_HOME # directory containing the julia executable
nothing # the value returned by functions that have nothing to return (type Void)
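
A minimal sketch of the points on this slide, assuming Julia 1.x rather than the 0.6 used in these slides. In 1.x the constants golden and eulergamma live in Base.MathConstants, whereas in 0.6 they are available directly. The variable names are made up for illustration.

```julia
# Assumes Julia 1.x; in 0.6 golden and eulergamma need no import.
using Base.MathConstants: golden, eulergamma

v = [3, 1, 2]
w = sort(v)      # returns a sorted copy; v is left unchanged
sort!(v)         # the trailing ! signals that v itself is modified

million = 1_000_000        # underscores are only visual separators
z = (1 + 2im) * im         # im is the imaginary unit
phi = Float64(golden)      # convert the exact constant to a Float64
```

By convention the `!` suffix is only a naming hint; nothing in the language enforces it.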
Section 2

Plotting



Plot packages

There are several plotting packages


PyPlot: emulates Matlab, through Python’s matplotlib
Gadfly: emulates R’s ggplot
Plots: aims to become front end for all backends
GR, UnicodePlots, Plotly, PlotlyJS, Vega, Winston, StatsPlots,
PlotRecipes, GLVisualize, PGFPlots, Qwt, ...



PyPlot
https://github.com/JuliaPy/PyPlot.jl
You should have it installed (see startup sheet)
  - it uses PyCall to call Python
  - uses Julia’s multimedia backend to display using various Julia graphical backends (Qt, GTK, ...)
  - it should be fairly portable
Syntax is intended to be similar to Matlab
  - as implemented in matplotlib
    http://matplotlib.org/api/pyplot_api.html
using PyPlot
x = linspace(0,2*pi,1000);
y = sin.(3 * x + 4 * cos.(2 * x));
plot(x, y, color="red", linewidth=2.0,
linestyle="--")
title("A sinusoidally modulated sinusoid")



Main commands
You can get a listing of commands by typing PyPlot.TAB TAB
Some examples

plot
gcf()
xlim
xlabel
xkcd
surf
bar
figure
fill
pie
text
scatter

When running in a script, you need to call show() to get the figure to display.
Example 1

using PyPlot
x = 0:0.1:2*pi;
y = 0:0.1:pi;
X = repmat(x, 1, length(y));
Y = repmat(y', length(x), 1);
S = [cos(x[i]) + sin(y[j]) for i=1:length(x), j=1:length(y)]
surf(X, Y, S, cmap=ColorMap("jet"), alpha=0.7)
xlabel("x")
ylabel("y")



Example 2

using PyPlot
xkcd()
plot( [0,1], [0,1])
title(L"Plot of $\Gamma_3(x)$")
savefig("plot.svg")
# or PNG or EPS or PDF

LaTeXString defined by L"...."



More Examples

https://gist.github.com/gizmaa/7214002
https://lectures.quantecon.org/jl/julia_plots.html



Section 3

A Stupidly Short Tour of Packages



Installing Packages

A package is a collection of code encapsulated into a set of Modules, and (usually) put on GitHub in a standard format
Adding a package can be done in a few ways, but the most standard is
Pkg.add("PyPlot")
Pkg.update()
  - takes care of dependencies
  - installs code
Get status, and see where code is
Pkg.status()
Pkg.Dir.path()
LOAD_PATH



Using Packages

A package is a collection of code encapsulated into a set of Modules, and (usually) put on GitHub in a standard format
Commands to use or import
using PyPlot
import PyPlot
  - using gives simple access to all exported functions
  - import keeps names in the module's namespace, e.g., PyPlot.plot
Other ways to import code
include( "Code/my_code.jl" )
reload( "PyPlot" )
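
To make the difference between using and import concrete, here is a small sketch. It assumes Julia 1.x, where Statistics ships as a standard library, and uses it in place of PyPlot only so that the example has no external dependencies.

```julia
# Assumes Julia 1.x, where Statistics is a standard library module.
import Statistics                  # brings only the module name into scope
m1 = Statistics.mean([1, 2, 3])    # so functions must be qualified

using Statistics                   # additionally makes exported names visible
m2 = mean([1, 2, 3])               # mean is exported, so no qualifier is needed
```

Qualified names such as `Statistics.mean` keep working after `using`, which helps when two packages export the same name.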



Lots of Packages
https://pkg.julialang.org/
1518 registered packages!
Some trending packages
https://github.com/trending/julia
  - Deep Learning
    https://github.com/denizyuret/Knet.jl
  - IJulia is a Jupyter interactive environment
    https://github.com/JuliaLang/IJulia.jl
  - Gadfly is ggplot-like plotting
    https://github.com/GiovineItalia/Gadfly.jl
  - PyCall lets you call Python
    https://github.com/JuliaPy/PyCall.jl
  - Convex programming
    https://github.com/JuliaOpt/Convex.jl

I will talk about a couple that are of direct use in Data Science



DataFrames

Concept comes from R (as far as I know)
Like a 2D array except
  - can have missing values
  - multiple data types
    - quantitative
    - categorical (strings)
  - labelled columns
Nice mapping from Frame to CSV (or similar)
https://en.wikibooks.org/wiki/Introducing_Julia/DataFrames



DataFrames
Download the following dataset, and put it in a local folder called Data
https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv

using DataFrames
data = readtable("Data/Titanic.csv",
nastrings=["NA", "na", "n/a", "missing"])
head(data)
size(data)
showcols(data)
data[:Name]
temp = deepcopy(data)
push!( temp, @data([1314, "my bit", "nth", NA, "male
tail(temp)
deleterows!(temp, 3:5)
data[ data[:,:Sex] .=="female", : ]
data[ :height ] = @data( rand(size(data,1)) )
sort!(data, cols = [order(:Sex), order(:Age)])



JSON

JSON = JavaScript Object Notation


Data exchange format
  - increasingly popular
  - lightweight
  - portable
Stores name/value pairs
  - so it maps to a Dictionary well
  - but lots of other data can be stored as JSON
http://www.json.org/
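
As a rough illustration of why name/value pairs map well to a Dictionary, the sketch below writes a JSON object by hand as a Julia Dict. The record is hypothetical (it is not taken from any particular data file) and no JSON package is needed to run it.

```julia
# Hypothetical record, written as the Dict that the JSON object
#   {"name": "purple", "rgb": [128, 0, 128], "web_safe": false}
# would naturally map to.
colour = Dict{String,Any}(
    "name" => "purple",
    "rgb" => [128, 0, 128],
    "web_safe" => false,
)

name = colour["name"]               # look a value up by its name
red, green, blue = colour["rgb"]    # JSON arrays become Julia arrays

for (key, value) in colour          # iterate over the name/value pairs
    println(key, " => ", value)
end
```

The value type is `Any` because a JSON object can mix strings, numbers, booleans, arrays and nested objects.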



JSON

Download the following dataset, and put it in a local folder called Data
https://raw.githubusercontent.com/corysimmons/colors.json/master/colors.json

import JSON
c = JSON.parsefile("Data/colors.json")
c["purple"]
JSON.print(c)



Distributions

Package for probability distributions and associated facilities
  - moments
  - pdf, cdf, logpdf, mgf
  - samples
  - Estimation: MLE, MAP
Included here because
  - it's useful
  - it's a nice example of a Julia package
    - type hierarchy used to provide structure to RVs
      e.g., Distributions → Univariate → Continuous → Normal
    - multiple dispatch used to call correct version of generically named functions
    - easy to add a new one

https://juliastats.github.io/Distributions.jl/latest/
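
The design idea described above (a type hierarchy plus multiple dispatch on generically named functions) can be sketched without the package itself. Everything below is a toy: the Toy* types and toymean, toycdf and describe are hypothetical names, not the definitions used by Distributions.jl. Julia 1.x syntax is assumed.

```julia
# Toy illustration only: these types are hypothetical and are not the
# ones defined by Distributions.jl.
abstract type ToyDistribution end
abstract type ToyContinuous <: ToyDistribution end

struct ToyUniform <: ToyContinuous
    a::Float64
    b::Float64
end

struct ToyExponential <: ToyContinuous
    rate::Float64
end

# One generic name, one method per concrete type; dispatch picks the right one.
toymean(d::ToyUniform) = (d.a + d.b) / 2
toymean(d::ToyExponential) = 1 / d.rate

toycdf(d::ToyUniform, x) = clamp((x - d.a) / (d.b - d.a), 0.0, 1.0)
toycdf(d::ToyExponential, x) = x < 0 ? 0.0 : 1 - exp(-d.rate * x)

# Code written against the abstract type works for every subtype,
# including ones added later.
describe(d::ToyContinuous) = (toymean(d), toycdf(d, toymean(d)))

u = describe(ToyUniform(0.0, 2.0))
e = describe(ToyExponential(2.0))
```

Adding a new distribution in this style means defining one struct and its methods; `describe` needs no changes.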



Distributions

using Distributions
srand(123)

d = Normal(0.0, 1.0)
x = rand(d, 10)
quantile.( d, [ 0.5, 0.975] )
params(d)
minimum(d)
location(d)
scale(d)

x = rand(d, 100)
fit_mle(Normal, x)



Section 4

Parallel Processing



Julia Macros

Macros look a bit like functions, but begin with @, e.g.,


@printf("Hello %s\n", "World!")
@printf "Hello %s\n" "World!"
Why?
  - Macros are expanded when the code is parsed, before it runs, to construct custom code for run time
    - e.g., for @printf, we want to interpret the format string at compile time
    - in C, the printf function re-parses the format string each time it is called, which is inefficient
    - it also means that C compilers need to be very smart to catch the many hard-to-debug mistakes caused by passing the wrong types of arguments to printf
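
A short sketch of the formatting macros in use, assuming Julia 1.x, where they live in the Printf standard library (in 0.6 no import is needed). @sprintf is the string-returning sibling of @printf.

```julia
using Printf   # needed in Julia 1.x; in 0.6 the macros are available without it

# The format string is a literal, so the macro can process it once,
# when the code is expanded, rather than on every call.
@printf("Hello %s\n", "World!")

# @sprintf returns the formatted text instead of printing it.
s = @sprintf("%d heads out of %d (%.1f%%)", 3, 4, 75.0)
```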



Julia Macros

Julia uses quite a few macros, and you can define your own
@time [sin(i) for i in 1:100000];
@which sin(1)
@show 2 + 2
macroexpand(quote @time sin(i) end)
Macros can be MUCH faster ways of implementing code
https://statcompute.wordpress.com/2014/10/10/julia-function-vs-macro/
Macros can be used to automate annoying bits of replicated code,
e.g., @time
It’s part of the meta-programming paradigm of Julia
  - ideas from Lisp
  - Julia code is represented (internally) as Julia data
  - so you can change the “data”
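
The idea that code is data can be sketched in a few lines. This assumes Julia 1.x and is meant to be run at top level (the REPL or a script). The macro @twice is a hypothetical example written for this illustration, not something provided by Julia.

```julia
ex = :(2 + 3 * 4)        # quoting gives an unevaluated expression (an Expr)
before = eval(ex)        # evaluates 2 + 3 * 4

ex.args[1] = :-          # the function being called is the first argument
after = eval(ex)         # now evaluates 2 - 3 * 4

# @twice is a hypothetical macro: it returns code that runs its argument twice.
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1    # expands into two copies of the increment
```

The `esc` call keeps the expression's variables referring to the caller's scope rather than the macro's own module.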



What Julia Does

1. Raw Julia code is parsed
  - converted into an Abstract Syntax Tree (AST), held in Julia
  - syntax errors are found
2. Create a deeper AST
  - Macros play here - they can create and modify unevaluated code
3. Parsed code is run
  - hopefully really fast



So what does that have to do with Parallel Programming?

Julia has several functions and macros to aid in parallel processing
I think the coolest is the “Map/Reduce” functionality introduced by the @parallel macro
  - maybe you can see why it is a macro?



Setting up for Multi-Processor Ops

There are two approaches for a single, multicore machine

> julia -p 4

julia > addprocs(3)


julia > procs()
julia > nprocs()

I’m not going to get into how to build a cluster



Map Reduce

Many simple processes can be massively parallelised easily by decomposing them into Map-Reduce operations
Map: apply an (independent) function or mapping to a small piece of data
Reduce: combine the results of all the mappings into a summary
It’s a particularly good framework for multiple simulations run in parallel
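
Before adding any parallelism, the Map and Reduce steps described above can be sketched serially with functions from Julia's Base library. The sum of squares used here is just an illustrative task.

```julia
# Map: apply an independent function to each piece of data.
squares = map(x -> x^2, 1:10)

# Reduce: combine the mapped results into a summary.
total = reduce(+, squares)

# mapreduce does both in one pass, without storing the intermediate array.
total_fused = mapreduce(x -> x^2, +, 1:10)
```

Because each mapped value is independent and + is associative, the work can be split across processes and the partial sums combined in any grouping.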



@parallel

First make sure that all processes have the required environment

@everywhere cd("/home/mroughan/Presentation/Julia/C
@everywhere include("my_code.jl")

Now run parallelised loop, aggregating results with operator +

nheads = @parallel (+) for i = 1:200_000_000
    Int(rand(Bool))
end

But take care – data is not automatically shared!!!!!!!!
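
For readers on Julia 1.x, a sketch of the same loop follows. It assumes the Distributed standard library, where @distributed plays the role that @parallel has in 0.6, and two local worker processes (an arbitrary choice). The loop is shortened to keep the run quick, and the array a is a made-up example of the data-sharing caveat.

```julia
using Distributed            # standard library in Julia 1.x
addprocs(2)                  # assumption: two local worker processes

# Same pattern as the slide, aggregating with +.
nheads = @distributed (+) for i = 1:1_000_000
    Int(rand(Bool))
end

# Data is not shared: each worker writes to its own copy of a,
# so the array on the master process is left untouched.
a = zeros(Int, 4)
@sync @distributed for i = 1:4
    a[i] = i
end

rmprocs(workers())           # shut the workers down again
```

If the workers need to write into common storage, something like the SharedArrays standard library is one option; otherwise return the values through the reduction.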



Section 5

Tips and tricks



Type stability

Use @time to compare the speed of these two functions for large n

function t1(n)
    s = 0
    for i in 1:n
        s += s/i
    end
end

function t2(n)
    s = 0.0
    for i in 1:n
        s += s/i
    end
end
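
What differs between the two functions is the type of s. The sketch below uses hypothetical names, unstable and stable, and adds a return value so the effect can be inspected. In the first function s starts as an Int but becomes a Float64 after one iteration, so the compiler cannot assume a single type for it.

```julia
function unstable(n)
    s = 0                # s starts as an Int ...
    for i in 1:n
        s += s / i       # ... but s / i is a Float64, so s changes type
    end
    return s
end

function stable(n)
    s = 0.0              # s is a Float64 from the start and stays one
    for i in 1:n
        s += s / i
    end
    return s
end

# The return type of unstable depends on whether the loop body ever ran.
type_empty = typeof(unstable(0))
type_looped = typeof(unstable(1))

@time unstable(10^6)
@time stable(10^6)
```

How large the timing gap is depends on the Julia version, so it is worth measuring rather than assuming.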



Don’t avoid loops

Use @time to compare the speed of these two functions for large n

function t1(n)
    x = zeros(n)
    for i in 1:n
        x[i] = i^2
    end
    return x
end

function t2(n)
    x = collect(1:n).^2
end



Avoid global variables

Apart from the usual arguments
Hard for compiler to optimise around, because type may change
  - if you need them, and they don’t change, define them as constants
    const DEFAULT_VAL = 0
Note variables defined in the REPL are global
Execute code in functions, not global scope
  - write functions, not scripts
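
A small sketch of the three options: a non-constant global, a const global, and passing the value as an argument. The names scale, DEFAULT_SCALE and the sum_* functions are hypothetical, chosen only for this comparison.

```julia
scale = 2.0                    # non-constant global: its type may change later
const DEFAULT_SCALE = 2.0      # constant global: its type is fixed

function sum_global(n)
    s = 0.0
    for i in 1:n
        s += scale * i         # reads the non-constant global every iteration
    end
    return s
end

function sum_const(n)
    s = 0.0
    for i in 1:n
        s += DEFAULT_SCALE * i # the compiler knows this is a Float64
    end
    return s
end

function sum_arg(n, k)
    s = 0.0
    for i in 1:n
        s += k * i             # k is a local with a known type
    end
    return s
end

@time sum_global(10^6)
@time sum_const(10^6)
@time sum_arg(10^6, 2.0)
```

All three return the same value; only the information available to the compiler differs.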



Pre-allocate outputs

Use @time to compare the speed of these two functions for large n

function t1(n)
    x = zeros(Int64, n)
    for i in 1:n
        x[i] = i^2
    end
    return x
end

function t2(n)
    x = [1]
    for i in 2:n
        push!(x, i^2)
    end
    return x
end



Access arrays in memory order, along columns

2D arrays stored in column order (as in Fortran)
  - C and Python numpy are in row order
Accessing in this order avoids jumping around in memory
  - get the best value out of pipeline and cache
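
As a sketch of what memory order means for loop nesting, the two hypothetical functions below compute the same sum but traverse the array differently. The array size is an arbitrary choice for timing.

```julia
# Column order: the elements of one column sit next to each other in memory.
layout = vec([1 2; 3 4])           # flattening shows the storage order

function sum_by_columns(A)
    s = zero(eltype(A))
    for j in 1:size(A, 2)          # outer loop over columns
        for i in 1:size(A, 1)      # inner loop walks down one column
            s += A[i, j]
        end
    end
    return s
end

function sum_by_rows(A)
    s = zero(eltype(A))
    for i in 1:size(A, 1)          # outer loop over rows
        for j in 1:size(A, 2)      # inner loop jumps from column to column
            s += A[i, j]
        end
    end
    return s
end

A = rand(2000, 2000)
@time sum_by_columns(A)
@time sum_by_rows(A)
```

The rule of thumb is that the first index should vary fastest, that is, in the innermost loop.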



Lots more tips

https://docs.julialang.org/en/latest/manual/performance-tips/
https://github.com/Gnimuc/JuliaSO
http://blog.translusion.com/posts/julia-tricks/
https://julialang.org/blog/2017/01/moredots



Standard Tools

Debugging https://github.com/Keno/Gallium.jl
BenchmarkTools package https://github.com/JuliaCI/BenchmarkTools.jl
Profiler https://docs.julialang.org/en/latest/manual/profile/
Lint package https://github.com/tonyhffong/Lint.jl
Unit testing https://docs.julialang.org/en/stable/stdlib/test/
Literate programming (aka Knitr, ...) https://github.com/mpastell/Weave.jl, and IJulia



Standard Tools

There is a lot more to learn
  - function definition
  - creating modules
  - types
  - interfaces to other languages
  - ...
I tried to concentrate on things where I think it is hard to get started learning yourself



Final Comment

Julia is v.shiny, but it’s not all roses
Current version is 0.6
  - each 0.1 increment has introduced “breaking” changes
  - the core is still evolving
  - it’s getting better, but change is painful
Some libraries aren’t all there
  - stagnation, ...
Plotting
  - argggh!



Conclusion

I don’t like endings, so here are some quotes to go on with.

We – or the Black Chamber – have a little agreement with [Knuth]; he doesn’t publish the real Volume 4 of the Art of Computer Programming, and they don’t render him metabolically challenged.
Charles Stross, The Atrocity Archive, 2001



Some more useful references
https://github.com/trending/julia
https://docs.julialang.org/en/latest/manual/performance-tips/



Bonus frames
