Julia basic concepts

Author
Alejandro Morales Sierra

Centre for Crop Systems Analysis - Wageningen University

Published

March 20, 2023

Introduction

This is not a tutorial or introduction to Julia, but a collection of basic concepts about Julia that are particularly useful to understand VPL. It is assumed that the reader has some experience with programming in other languages, such as Matlab, R or Python. These concepts should be complemented with general introductory material about Julia, which can be found at the official Julia website.

Julia is a dynamic, interactive programming language, like Matlab, R or Python. Thus, it is very easy to use and learn incrementally. The language is young and well designed, with an emphasis on numerical/scientific computation although it is starting to occupy some space in areas such as data science and machine learning. It has a clear syntax and better consistency than older programming languages.

Unlike Matlab, R or Python, Julia was designed from the beginning to be fast (as fast as statically compiled languages like C, C++ or Fortran). However, achieving this goal does require paying attention to certain aspects of the language, such as type stability and dynamic memory allocation, which are not always obvious to the user coming from other scientific dynamic languages. In the different sections below, a few basic Julia concepts are presented, first by ignoring performance considerations and focusing on syntax, and then by showing how to improve the performance of the code. Some concepts are ignored as they are not deemed relevant for the use of VPL.

Running Julia

There are different ways of executing Julia code (the most popular ones are VS Code and Jupyter notebooks):

  • Interactive Julia console from terminal/console (REPL)
  • Plugins for code editors
    • Visual Studio Code (most popular)
    • Atom/Juno (less popular now)
    • vim, Emacs and others (less popular)
  • Code cells inside a Jupyter notebook
  • Code cells inside Pluto notebook (a Julia implementation of a reactive notebook)

The first time in a Julia session that a method is called it will take extra time because the method has to be compiled (i.e. Julia uses a just-in-time compiler rather than an interpreter). Similarly, the first time you load a package after installation or an update it will take extra time due to precompilation (which reduces JIT compilation times somewhat). Finally, code editors and notebooks may need to run additional code to achieve their full functionality, which can add further delays when executing code.
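
A minimal illustration of this effect (square is just a throwaway example function; @time is the built-in timing macro):

square(x) = x^2
@time square(2.0)   # first call: the reported time includes JIT compilation
@time square(2.0)   # second call: only runs the already compiled code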

Basic concepts

Functions

A function is defined with the following syntax.

function foo(x)
    x^2
end
foo(2.0)
4.0

Very short functions can also be defined in one line:

foo2(x) = x^2
foo2 (generic function with 1 method)
foo2(2.0)
4.0

Functions can also be defined with the -> syntax (anonymous functions). The result can be assigned to any variable.

foo3 = x -> x^2
#11 (generic function with 1 method)
foo3(2.0)
4.0

A begin - end block can be used to store a sequence of statements in multiple lines and assign them to a short function or a -> function.

foo4 = begin
    x -> x^2
end
#13 (generic function with 1 method)
foo4(2.0)
4.0

Once created, there is no difference among foo, foo2, foo3 and foo4.

Anonymous functions are useful when passing a function to another function as an argument. For example, the function bar below applies any function f to an argument x. In this case we could pass any of the functions defined above, or just create an anonymous function in place.

function bar(x, f)
    f(x)
end
bar(2.0, x -> x^2)
4.0
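
For example, passing one of the named functions defined above gives the same result:

bar(2.0, foo)   # also returns 4.0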

Types

A Type in Julia is a data structure that can contain one or more fields. Types are used to keep related data together, and to select the right method implementation of a function (see below). It shares some of the properties of Classes in Object-Oriented Programming but there are also important differences.

Julia types can be immutable or mutable.

Immutable means that, once created, the fields of an object cannot be changed. They are defined with the following syntax:

struct Point
  x
  y
  z
end
p = Point(0.0, 0.0, 0.0)
Point(0.0, 0.0, 0.0)
p.x = 1.0
LoadError: setfield!: immutable struct of type Point cannot be changed

Mutable means that the fields of an object can be modified after creation. The definition is similar; you just need to add the keyword mutable:

mutable struct mPoint
  x
  y
  z
end
mp = mPoint(0.0, 0.0, 0.0)
mPoint(0.0, 0.0, 0.0)
mp.x = 1.0
mp
mPoint(1.0, 0.0, 0.0)

We can always check the type of an object with the function typeof

typeof(p)
Point

If you forget the fields of a type, try using fieldnames on the type (not the object). It will return the names of all the fields it contains (the “:” in front of each name can be ignored):

fieldnames(Point)
(:x, :y, :z)

Note that, for performance reasons, the type of each field should be annotated in the type definition as in:

struct pPoint
  x::Float64
  y::Float64
  z::Float64
end
pPoint(1.0, 2.0, 3.0)
pPoint(1.0, 2.0, 3.0)

Also, note that there are no private fields in a Julia type (like Python, unlike C++ or Java).

Methods

Methods are functions with the same name but specialized for different types.

Methods are created automatically by specifying the type of (some of) the arguments of a function, as in the following example:

function dist(p1::pPoint, p2::pPoint)
    dx = p1.x - p2.x
    dy = p1.y - p2.y
    dz = p1.z - p2.z
    sqrt(dx^2 + dy^2 + dz^2)
end
dist (generic function with 1 method)
p1 = pPoint(1.0, 0.0, 0.0)
p2 = pPoint(0.0, 1.0, 0.0)
dist(p1, p2)
1.4142135623730951

Note that this function will not work for mPoint objects:

mp1 = mPoint(1.0, 0.0, 0.0)
mp2 = mPoint(0.0, 1.0, 0.0)
dist(mp1, mp2)
LoadError: MethodError: no method matching dist(::mPoint, ::mPoint)

So we need to define a dist method with mPoint arguments, or use inheritance (see below).
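
For example, a minimal sketch of such a method (it simply duplicates the body for mPoint arguments; the abstract-type approach below avoids this duplication):

function dist(p1::mPoint, p2::mPoint)
    dx = p1.x - p2.x
    dy = p1.y - p2.y
    dz = p1.z - p2.z
    sqrt(dx^2 + dy^2 + dz^2)
end
dist(mp1, mp2)   # sqrt(2) ≈ 1.4142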

Abstract types

Concrete types (such as the structs defined above) cannot inherit from other types.

However, when multiple types share analogous functionality, it is possible to group them under “abstract types” from which they can inherit. Note that abstract types do not contain any fields, so inheritance only works for methods. As C. Rackauckas puts it, “abstract types are defined by how they act”.

For example, we may define an abstract type Vec3 representing any type for which a distance (dist) can be calculated. The default implementation assumes that the type contains the fields x, y and z, though inherited methods can always be overridden.

Inheritance is indicated by the “<:” syntax after the name of the type in its declaration.

# Vec3 contains no data, but dist actually assumes that x, y and z are fields of any type inheriting from Vec3
abstract type Vec3 end
function dist(p1::Vec3, p2::Vec3)
    dx = p1.x - p2.x
    dy = p1.y - p2.y
    dz = p1.z - p2.z
    sqrt(dx^2 + dy^2 + dz^2)
end
# Like before, but inheriting from Vec3
struct Point2 <: Vec3
  x::Float64
  y::Float64
  z::Float64
end
mutable struct mPoint2 <: Vec3
  x::Float64
  y::Float64
  z::Float64
end
struct Point3 <: Vec3
  x::Float64
  y::Float64
end

The method now works with Point2 and mPoint2:

p1 = Point2(1.0, 0.0, 0.0)
p2 = Point2(0.0, 1.0, 0.0)
dist(p1, p2)
mp1 = mPoint2(1.0, 0.0, 0.0)
mp2 = mPoint2(0.0, 1.0, 0.0)
dist(mp1, mp2)
1.4142135623730951

The method will try to run with Point3 but it will raise an error because Point3 does not have the field z.

p3 = Point3(1.0, 0.0)
dist(p1, p3)
LoadError: type Point3 has no field z
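
Since inherited methods can always be overridden, a sketch of a possible fix is a specialized method for two Point3 arguments that ignores the missing z (mixed calls such as dist(p1, p3) would still need their own methods, as shown for manhattan below):

# A sketch: override dist when both arguments are Point3, ignoring z
function dist(p1::Point3, p2::Point3)
    sqrt((p1.x - p2.x)^2 + (p1.y - p2.y)^2)
end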

Optional and keyword arguments

Functions can have optional arguments (i.e. arguments with default values) as well as keyword arguments, which are like optional arguments except that you must use their name (rather than their position) to assign a value.

An example of a function with optional arguments:

opfoo(a, b::Int = 0) = a + b
opfoo(1)
1
opfoo(1,1)
2

An example of a function with keyword arguments:

kwfoo(a; b::Int = 0) = a + b
kwfoo(1)
1
kwfoo(1, b = 1)
2
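
Both kinds can be combined in the same function (a minimal sketch; the name combfoo is hypothetical, positional arguments go before the semicolon and keyword arguments after it):

combfoo(a, b = 1; c = 2) = a + b + c
combfoo(1)             # 1 + 1 + 2 = 4
combfoo(1, 2, c = 3)   # 1 + 2 + 3 = 6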

Modules

Within a Julia session you cannot redefine types. Also, if you assign different data to the same name, it will simply overwrite the previous data (note: these statements are simplifications of what actually happens, but they suffice for now).

To avoid name clashes, Julia allows collecting functions, methods, types and abstract types into Modules. Every Julia package includes at least one module.

A module allows exporting a subset of the names defined inside of it:

module Mod

export fooz

fooz(x) = abs(x)

struct bar
   data
end

end
Main.Mod

In order to use a module, the using command must be used (the . is required and indicates that the module was defined in the current scope, as modules can be nested).

using .Mod

Exported names can be used directly:

fooz(-1)
1

Names can also be qualified by the module name (this is required for unexported names):

b = Mod.fooz(-1.0)
1.0
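
For instance, the unexported struct bar can only be reached through the module (a minimal illustration):

Mod.bar([1, 2, 3])   # works and returns a Mod.bar object
bar([1, 2, 3])       # error: bar is not defined in the current scope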

Adding methods to existing functions

If a function is defined inside a module (e.g., a Julia package) we can add methods to that function by accessing it through the module name. Let’s define a function manhattan that calculates the Manhattan (as opposed to Euclidean) distance between two points. We will put it inside a module called Funs to emulate a Julia package.

module Funs
  export manhattan
  function manhattan(p1, p2)
      dx = p1.x - p2.x
      dy = p1.y - p2.y
      dz = p1.z - p2.z
      abs(dx) + abs(dy) + abs(dz)
  end
end
using .Funs
manhattan(p1, p2)
manhattan(p1, p3)
LoadError: type Point3 has no field z

We see that we get the same error as before when using p3. Let’s add methods for when either the first or the second argument is a Point3, ignoring the z field:

Funs.manhattan(p1::Point3, p2) = abs(p1.x - p2.x) + abs(p1.y - p2.y)
Funs.manhattan(p1, p2::Point3) = abs(p1.x - p2.x) + abs(p1.y - p2.y)
manhattan(p1, p3)
manhattan(p3, p1)
0.0

You can find all the methods of a function by using methods() on the function name:

methods(manhattan)
# 3 methods for generic function manhattan:
  • manhattan(p1::Point3, p2) in Main at In[34]:1
  • manhattan(p1, p2::Point3) in Main at In[34]:2
  • manhattan(p1, p2) in Main.Funs at In[33]:3

Macros

A macro is a function or statement that starts with @. The details of macros are not explained here, but it is important to know that they work by transforming the code that you write inside the macro, usually to provide specific features or to achieve higher performance. That is, a macro takes the code that you write and substitutes it with new code that then gets executed.
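
If you are curious, the @macroexpand macro shows the code that another macro generates (a minimal illustration using the built-in @assert macro):

@macroexpand @assert 1 > 0   # displays the if/throw expression that @assert generates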

A useful macro is @kwdef, provided by the Base module, which allows assigning default values to the fields of a type and using the fields as keyword arguments in the constructor. This macro needs to be written before the type definition. For example, a point constructed in this manner would be:

Base.@kwdef struct kwPoint
    x::Float64 = 0.0
    y::Float64 = 0.0
    z::Float64 = 0.0
end
kwPoint()
kwPoint(1,1,1)
kwPoint(y = 1)
kwPoint(0.0, 1.0, 0.0)

Dot notation

Dot notation is a very useful feature of Julia that allows you to apply a function to each element of a vector. For example, if you want to calculate the square of each element of a vector you can do:

x = [1,2,3]
y = x.^2
3-element Vector{Int64}:
 1
 4
 9

The dot notation can be used with any function, not just mathematical functions. For example, if you want to calculate the absolute value of each element of a vector you can do:

abs.(y)
3-element Vector{Int64}:
 1
 4
 9

If the operation involves several steps, the ‘.’ should be used in all the steps or, alternatively, you can use the @. macro, which does the same:

abs.(y) .+ x.^3
@. abs(y) + x^3
3-element Vector{Int64}:
  2
 12
 36

The dot notation can also be used with functions that take more than one argument; just make sure that all the array arguments have the same length (scalar arguments are reused for every element):

min.(x, y)
max.(x, y)
3-element Vector{Int64}:
 1
 4
 9
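
Mixing a vector and a scalar also works, since the scalar is reused for every element:

min.(x, 2)   # returns [1, 2, 2]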

Improving performance

Type instability

As indicated above, annotating the fields of a data type (struct or mutable struct) is required to achieve good performance. However, neither the arguments of functions nor variables created through assignment require type annotation. This is because Julia uses type inference (i.e. it tries to infer the type of data to be stored in each newly created variable) and compiles the code to machine level based on this inference. This leads to the concept of type instability: if the type of data stored in a variable changes over time, the compiler needs to accommodate for this, which results (for technical reasons beyond the scope of this document) in a loss of performance.

Here is a classic example of type instability. The following function will add up the squares of all the values in a vector:

function add_squares(x)
  out = 0
  for i in eachindex(x)
    out += x[i]^2
  end
  return out
end
add_squares (generic function with 1 method)

It looks innocent enough. The issue is that out is initialized as an integer (0), but it is then assigned the result of out + x[i]^2, which will be a floating point number (e.g. 1.0) whenever x contains floating point values. Because out can hold different types at different points in the code, the resulting machine code is slower than it could be, although still correct (this is why Julia remains useful for rapid code development compared to static languages like C++ or Java).
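
The type change can be seen directly at the REPL (a minimal illustration of a variable starting as an integer and becoming a floating point number):

out = 0          # typeof(out) returns Int64
out += 2.5^2     # typeof(out) is now Float64: the variable changed type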

add_squares(collect(1:1000)) # type stable
add_squares(collect(1:1000.0)) # not type stable
3.338335e8

How do we measure performance then? The @time macro is useful when dealing with a slow function. Otherwise, it is better to use @btime from the BenchmarkTools package (see the documentation of that package to understand why we use $).

using BenchmarkTools
v1 = collect(1:1000)
v2 = collect(1:1000.0)
@btime add_squares($v1)
@btime add_squares($v2)
  136.448 ns (0 allocations: 0 bytes)
  1.650 μs (0 allocations: 0 bytes)
3.338335e8

The second call is about 12 times slower than the first one. We can detect type instability with the @code_warntype macro, which prints an internal representation of the code before it is compiled. The details are complex, but you just need to look for the parts highlighted in red (which indicate type instability).

@code_warntype add_squares(v1)
@code_warntype add_squares(v2)
MethodInstance for add_squares(::Vector{Int64})
  from add_squares(x) in Main at In[41]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Int64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  out::Int64
  i::Int64
Body::Int64
1 ─       (out = 0)
│   %2  = Main.eachindex(x)::Base.OneTo{Int64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = out::Int64
│   %11 = Base.getindex(x, i)::Int64
│   %12 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %13 = (%12)()::Core.Const(Val{2}())
│   %14 = Base.literal_pow(Main.:^, %11, %13)::Int64
│         (out = %10 + %14)
│         (@_3 = Base.iterate(%2, %9))
│   %17 = (@_3 === nothing)::Bool
│   %18 = Base.not_int(%17)::Bool
└──       goto #4 if not %18
3 ─       goto #2
4 ┄       return out

MethodInstance for add_squares(::Vector{Float64})
  from add_squares(x) in Main at In[41]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  out::Union{Float64, Int64}
  i::Int64
Body::Union{Float64, Int64}
1 ─       (out = 0)
│   %2  = Main.eachindex(x)::Base.OneTo{Int64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = out::Union{Float64, Int64}
│   %11 = Base.getindex(x, i)::Float64
│   %12 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %13 = (%12)()::Core.Const(Val{2}())
│   %14 = Base.literal_pow(Main.:^, %11, %13)::Float64
│         (out = %10 + %14)
│         (@_3 = Base.iterate(%2, %9))
│   %17 = (@_3 === nothing)::Bool
│   %18 = Base.not_int(%17)::Bool
└──       goto #4 if not %18
3 ─       goto #2
4 ┄       return out

How do we fix this? We could write different methods for different types of x, but this is not very practical. Instead, we can use the zero() function combined with eltype() to initialize out with the correct type.

function add_squares(x)
  out = zero(eltype(x)) # Initialize out with the correct type with value of zero
  for i in eachindex(x)
    out += x[i]^2
  end
  return out
end
add_squares (generic function with 1 method)

You could also initialize out to the first element of x and iterate over the rest of the elements, but this may not always be possible (e.g. if x is empty), so the logic gets more complex.
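
For completeness, a sketch of that alternative (the name add_squares_first is hypothetical and the empty case is not handled):

function add_squares_first(x)
  out = first(x)^2                         # fails if x is empty
  for i in Iterators.drop(eachindex(x), 1) # skip the first index
    out += x[i]^2
  end
  return out
end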

With the zero(eltype(x)) initialization, the code is now type stable:

@code_warntype add_squares(v1)
@code_warntype add_squares(v2)
MethodInstance for add_squares(::Vector{Int64})
  from add_squares(x) in Main at In[45]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Int64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  out::Int64
  i::Int64
Body::Int64
1 ─ %1  = Main.eltype(x)::Core.Const(Int64)
│         (out = Main.zero(%1))
│   %3  = Main.eachindex(x)::Base.OneTo{Int64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = out::Int64
│   %12 = Base.getindex(x, i)::Int64
│   %13 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %14 = (%13)()::Core.Const(Val{2}())
│   %15 = Base.literal_pow(Main.:^, %12, %14)::Int64
│         (out = %11 + %15)
│         (@_3 = Base.iterate(%3, %10))
│   %18 = (@_3 === nothing)::Bool
│   %19 = Base.not_int(%18)::Bool
└──       goto #4 if not %19
3 ─       goto #2
4 ┄       return out

MethodInstance for add_squares(::Vector{Float64})
  from add_squares(x) in Main at In[45]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  out::Float64
  i::Int64
Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (out = Main.zero(%1))
│   %3  = Main.eachindex(x)::Base.OneTo{Int64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = out::Float64
│   %12 = Base.getindex(x, i)::Float64
│   %13 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %14 = (%13)()::Core.Const(Val{2}())
│   %15 = Base.literal_pow(Main.:^, %12, %14)::Float64
│         (out = %11 + %15)
│         (@_3 = Base.iterate(%3, %10))
│   %18 = (@_3 === nothing)::Bool
│   %19 = Base.not_int(%18)::Bool
└──       goto #4 if not %19
3 ─       goto #2
4 ┄       return out

And the performance of the two calls is now more similar:

@btime add_squares($v1)
@btime add_squares($v2)
  136.541 ns (0 allocations: 0 bytes)
  812.644 ns (0 allocations: 0 bytes)
3.338335e8

Performance annotations

Sometimes code can be annotated to improve performance. For example, the @simd macro can be used in simple loops to indicate that the loop can be vectorized inside the CPU (i.e. the same instruction is applied to several values at once). The @inbounds macro can be used to indicate that the code will not access elements outside the bounds of an array, so bounds checks can be skipped.

function add_squares(x)
  out = zero(eltype(x)) # Initialize out with the correct type with value of zero
  @simd for i in eachindex(x)
    @inbounds out += x[i]^2
  end
  return out
end
@btime add_squares($v1)
@btime add_squares($v2)
  135.771 ns (0 allocations: 0 bytes)
  54.619 ns (0 allocations: 0 bytes)
3.338335e8

Now we actually get faster performance for floating point numbers, which is related to the fact that the CPU can vectorize floating point operations more efficiently than integer operations (at least in this example). You can see the actual assembly code being generated (or an approximation of it) by using the @code_native macro. Any instruction that starts with v is a vectorized instruction. Note that sometimes Julia will automatically vectorize code without the need for @simd, but this is not always the case.

@code_native add_squares(v2)
    .text
    .file   "add_squares"
    .globl  julia_add_squares_3992          # -- Begin function julia_add_squares_3992
    .p2align    4, 0x90
    .type   julia_add_squares_3992,@function
julia_add_squares_3992:                 # @julia_add_squares_3992
; ┌ @ In[48]:1 within `add_squares`
    .cfi_startproc
# %bb.0:                                # %top
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $32, %rsp
    vmovapd %xmm7, -16(%rbp)                # 16-byte Spill
    vmovapd %xmm6, -32(%rbp)                # 16-byte Spill
; │ @ In[48]:3 within `add_squares`
; │┌ @ simdloop.jl:69 within `macro expansion`
; ││┌ @ abstractarray.jl:285 within `eachindex`
; │││┌ @ abstractarray.jl:116 within `axes1`
; ││││┌ @ abstractarray.jl:95 within `axes`
; │││││┌ @ array.jl:151 within `size`
    movq    8(%rcx), %r8
; ││└└└└
; ││ @ simdloop.jl:72 within `macro expansion`
; ││┌ @ int.jl:83 within `<`
    testq   %r8, %r8
; ││└
    je  .LBB0_1
# %bb.2:                                # %L14.lr.ph
    movq    (%rcx), %rcx
; ││ @ simdloop.jl:75 within `macro expansion`
    cmpq    $16, %r8
    jae .LBB0_4
# %bb.3:
    vxorpd  %xmm0, %xmm0, %xmm0
    xorl    %edx, %edx
    jmp .LBB0_7
.LBB0_1:
    vxorpd  %xmm0, %xmm0, %xmm0
    jmp .LBB0_8
.LBB0_4:                                # %vector.ph
    movq    %r8, %rdx
    andq    $-16, %rdx
    vxorpd  %xmm0, %xmm0, %xmm0
    xorl    %eax, %eax
    vxorpd  %xmm1, %xmm1, %xmm1
    vxorpd  %xmm2, %xmm2, %xmm2
    vxorpd  %xmm3, %xmm3, %xmm3
    .p2align    4, 0x90
.LBB0_5:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
; ││ @ simdloop.jl:77 within `macro expansion` @ In[48]:4
; ││┌ @ array.jl:924 within `getindex`
    vmovupd (%rcx,%rax,8), %ymm4
    vmovupd 32(%rcx,%rax,8), %ymm5
    vmovupd 64(%rcx,%rax,8), %ymm6
    vmovupd 96(%rcx,%rax,8), %ymm7
; ││└
; ││┌ @ float.jl:383 within `+`
    vfmadd231pd %ymm4, %ymm4, %ymm0     # ymm0 = (ymm4 * ymm4) + ymm0
    vfmadd231pd %ymm5, %ymm5, %ymm1     # ymm1 = (ymm5 * ymm5) + ymm1
    vfmadd231pd %ymm6, %ymm6, %ymm2     # ymm2 = (ymm6 * ymm6) + ymm2
    vfmadd231pd %ymm7, %ymm7, %ymm3     # ymm3 = (ymm7 * ymm7) + ymm3
; ││└
; ││ @ simdloop.jl:78 within `macro expansion`
; ││┌ @ int.jl:87 within `+`
    addq    $16, %rax
    cmpq    %rax, %rdx
    jne .LBB0_5
# %bb.6:                                # %middle.block
; ││└
; ││ @ simdloop.jl:75 within `macro expansion`
    vaddpd  %ymm0, %ymm1, %ymm0
    vaddpd  %ymm0, %ymm2, %ymm0
    vaddpd  %ymm0, %ymm3, %ymm0
    vextractf128    $1, %ymm0, %xmm1
    vaddpd  %xmm1, %xmm0, %xmm0
    vpermilpd   $1, %xmm0, %xmm1        # xmm1 = xmm0[1,0]
    vaddsd  %xmm1, %xmm0, %xmm0
    cmpq    %rdx, %r8
    je  .LBB0_8
    .p2align    4, 0x90
.LBB0_7:                                # %L14
                                        # =>This Inner Loop Header: Depth=1
; ││ @ simdloop.jl:77 within `macro expansion` @ In[48]:4
; ││┌ @ array.jl:924 within `getindex`
    vmovsd  (%rcx,%rdx,8), %xmm1            # xmm1 = mem[0],zero
; ││└
; ││┌ @ float.jl:383 within `+`
    vfmadd231sd %xmm1, %xmm1, %xmm0     # xmm0 = (xmm1 * xmm1) + xmm0
; ││└
; ││ @ simdloop.jl:78 within `macro expansion`
; ││┌ @ int.jl:87 within `+`
    incq    %rdx
; ││└
; ││ @ simdloop.jl:75 within `macro expansion`
; ││┌ @ int.jl:83 within `<`
    cmpq    %rdx, %r8
; ││└
    jne .LBB0_7
.LBB0_8:                                # %L32
    vmovaps -32(%rbp), %xmm6                # 16-byte Reload
    vmovaps -16(%rbp), %xmm7                # 16-byte Reload
; │└
; │ @ In[48]:6 within `add_squares`
    addq    $32, %rsp
    popq    %rbp
    vzeroupper
    retq
.Lfunc_end0:
    .size   julia_add_squares_3992, .Lfunc_end0-julia_add_squares_3992
    .cfi_endproc
; └
                                        # -- End function
    .section    ".note.GNU-stack","",@progbits

Notice how, with some simple annotations and by reorganizing the code to deal with type instability, we were able to get a 30x speedup. Obviously this was a simple function with minimal runtime, so the speedup is not particularly useful in itself, but small functions like this are often the ones that are called many times in complex computations (e.g., ray tracing), so the speedup can be significant in actual applications. Whether you need to worry about performance depends on where the bottleneck is in your code.

For more details, see the sections of the manual on profiling and performance tips.

Global variables and type instability

A global variable is any variable defined in a module, outside of a function, that is accessed from within a function. Global variables are not recommended in general as they can easily introduce bugs in your code by making the logic of the program much harder to reason about. They can also introduce performance issues, since any global variable that is not annotated with its type will lead to type instability.

For example, let’s say we modify the add_squares function to use a global variable (a bit odd, but it is just to illustrate the point) to enable different options in the code.

function add_squares(x)
  out = zero(eltype(x))
  if criterion > 0
    @simd for i in eachindex(x)
      @inbounds out += x[i]^2
    end
  else
    @simd for i in eachindex(x)
      @inbounds out -= x[i]^2
    end
  end
  return out
end
criterion = 1
@code_warntype add_squares(v2)
@btime add_squares(v2)
MethodInstance for add_squares(::Vector{Float64})
  from add_squares(x) in Main at In[50]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Float64}
Locals
  out::Float64
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  r#370::Base.OneTo{Int64}
  i#371::Int64
  n#372::Int64
  i#373::Int64
  val@_9::Float64
  i@_10::Int64
  @_11::Union{Nothing, Tuple{Int64, Int64}}
  r#374::Base.OneTo{Int64}
  i#375::Int64
  n#376::Int64
  i#377::Int64
  val@_16::Float64
  i@_17::Int64
Body::Float64
1 ── %1  = Main.eltype(x)::Core.Const(Float64)
│          (out = Main.zero(%1))
│    %3  = (Main.criterion > 0)::Any
└───       goto #10 if not %3
2 ── %5  = Main.eachindex(x)::Base.OneTo{Int64}
│          (r#370 = %5)
│    %7  = Base.simd_outer_range::Core.Const(Base.SimdLoop.simd_outer_range)
│    %8  = (%7)(r#370)::Core.Const(0:0)
│          (@_4 = Base.iterate(%8))
│    %10 = (@_4::Core.Const((0, 0)) === nothing)::Core.Const(false)
│    %11 = Base.not_int(%10)::Core.Const(true)
└───       goto #9 if not %11
3 ── %13 = @_4::Core.Const((0, 0))
│          (i#371 = Core.getfield(%13, 1))
│    %15 = Core.getfield(%13, 2)::Core.Const(0)
│    %16 = Base.simd_inner_length::Core.Const(Base.SimdLoop.simd_inner_length)
│    %17 = r#370::Base.OneTo{Int64}
│    %18 = (%16)(%17, i#371::Core.Const(0))::Int64
│          (n#372 = %18)
│    %20 = Main.zero(n#372)::Core.Const(0)
│    %21 = (%20 < n#372)::Bool
└───       goto #7 if not %21
4 ── %23 = Main.zero(n#372)::Core.Const(0)
└───       (i#373 = %23)
5 ┄─ %25 = (i#373 < n#372)::Bool
└───       goto #7 if not %25
6 ── %27 = Base.simd_index::Core.Const(Base.SimdLoop.simd_index)
│    %28 = r#370::Base.OneTo{Int64}
│    %29 = i#371::Core.Const(0)
│          (i@_10 = (%27)(%28, %29, i#373))
│          nothing
│    %32 = out::Float64
│    %33 = Base.getindex(x, i@_10)::Float64
│    %34 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│    %35 = (%34)()::Core.Const(Val{2}())
│    %36 = Base.literal_pow(Main.:^, %33, %35)::Float64
│    %37 = (%32 + %36)::Float64
│          (out = %37)
│          (val@_9 = %37)
│          nothing
│          val@_9
│          (i#373 = i#373 + 1)
│          $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
└───       goto #5
7 ┄─       (@_4 = Base.iterate(%8, %15))
│    %46 = (@_4::Core.Const(nothing) === nothing)::Core.Const(true)
│    %47 = Base.not_int(%46)::Core.Const(false)
└───       goto #9 if not %47
8 ──       Core.Const(:(goto %13))
9 ┄─       Main.nothing
└───       goto #18
10 ─ %52 = Main.eachindex(x)::Base.OneTo{Int64}
│          (r#374 = %52)
│    %54 = Base.simd_outer_range::Core.Const(Base.SimdLoop.simd_outer_range)
│    %55 = (%54)(r#374)::Core.Const(0:0)
│          (@_11 = Base.iterate(%55))
│    %57 = (@_11::Core.Const((0, 0)) === nothing)::Core.Const(false)
│    %58 = Base.not_int(%57)::Core.Const(true)
└───       goto #17 if not %58
11 ─ %60 = @_11::Core.Const((0, 0))
│          (i#375 = Core.getfield(%60, 1))
│    %62 = Core.getfield(%60, 2)::Core.Const(0)
│    %63 = Base.simd_inner_length::Core.Const(Base.SimdLoop.simd_inner_length)
│    %64 = r#374::Base.OneTo{Int64}
│    %65 = (%63)(%64, i#375::Core.Const(0))::Int64
│          (n#376 = %65)
│    %67 = Main.zero(n#376)::Core.Const(0)
│    %68 = (%67 < n#376)::Bool
└───       goto #15 if not %68
12 ─ %70 = Main.zero(n#376)::Core.Const(0)
└───       (i#377 = %70)
13 ┄ %72 = (i#377 < n#376)::Bool
└───       goto #15 if not %72
14 ─ %74 = Base.simd_index::Core.Const(Base.SimdLoop.simd_index)
│    %75 = r#374::Base.OneTo{Int64}
│    %76 = i#375::Core.Const(0)
│          (i@_17 = (%74)(%75, %76, i#377))
│          nothing
│    %79 = out::Float64
│    %80 = Base.getindex(x, i@_17)::Float64
│    %81 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│    %82 = (%81)()::Core.Const(Val{2}())
│    %83 = Base.literal_pow(Main.:^, %80, %82)::Float64
│    %84 = (%79 - %83)::Float64
│          (out = %84)
│          (val@_16 = %84)
│          nothing
│          val@_16
│          (i#377 = i#377 + 1)
│          $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
└───       goto #13
15 ┄       (@_11 = Base.iterate(%55, %62))
│    %93 = (@_11::Core.Const(nothing) === nothing)::Core.Const(true)
│    %94 = Base.not_int(%93)::Core.Const(false)
└───       goto #17 if not %94
16 ─       Core.Const(:(goto %60))
17 ┄       Main.nothing
18 ┄       return out

  94.864 ns (1 allocation: 16 bytes)
3.338335e8

It is not a major hit in performance here, since criterion is only accessed once, but this can be a problem if the global variable is accessed many times in the code. The short-term solution is to annotate the type of the global variable (but really we should be writing code without global variables). This can be frustrating because, once a global variable has been created, you cannot annotate its type (even if it does not change!) without restarting the Julia session (unless it is inside a module).

function add_squares(x)
  out = zero(eltype(x))
  if criterion2 > 0
    @simd for i in eachindex(x)
      @inbounds out += x[i]^2
    end
  else
    @simd for i in eachindex(x)
      @inbounds out -= x[i]^2
    end
  end
  return out
end
criterion2::Int64 = 1
@code_warntype add_squares(v2)
@btime add_squares(v2)
MethodInstance for add_squares(::Vector{Float64})
  from add_squares(x) in Main at In[51]:1
Arguments
  #self#::Core.Const(add_squares)
  x::Vector{Float64}
Locals
  out::Float64
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  r#385::Base.OneTo{Int64}
  i#386::Int64
  n#387::Int64
  i#388::Int64
  val@_9::Float64
  i@_10::Int64
  @_11::Union{Nothing, Tuple{Int64, Int64}}
  r#389::Base.OneTo{Int64}
  i#390::Int64
  n#391::Int64
  i#392::Int64
  val@_16::Float64
  i@_17::Int64
Body::Float64
1 ── %1  = Main.eltype(x)::Core.Const(Float64)
│          (out = Main.zero(%1))
│    %3  = (Main.criterion2 > 0)::Bool
└───       goto #10 if not %3
2 ── %5  = Main.eachindex(x)::Base.OneTo{Int64}
│          (r#385 = %5)
│    %7  = Base.simd_outer_range::Core.Const(Base.SimdLoop.simd_outer_range)
│    %8  = (%7)(r#385)::Core.Const(0:0)
│          (@_4 = Base.iterate(%8))
│    %10 = (@_4::Core.Const((0, 0)) === nothing)::Core.Const(false)
│    %11 = Base.not_int(%10)::Core.Const(true)
└───       goto #9 if not %11
3 ── %13 = @_4::Core.Const((0, 0))
│          (i#386 = Core.getfield(%13, 1))
│    %15 = Core.getfield(%13, 2)::Core.Const(0)
│    %16 = Base.simd_inner_length::Core.Const(Base.SimdLoop.simd_inner_length)
│    %17 = r#385::Base.OneTo{Int64}
│    %18 = (%16)(%17, i#386::Core.Const(0))::Int64
│          (n#387 = %18)
│    %20 = Main.zero(n#387)::Core.Const(0)
│    %21 = (%20 < n#387)::Bool
└───       goto #7 if not %21
4 ── %23 = Main.zero(n#387)::Core.Const(0)
└───       (i#388 = %23)
5 ┄─ %25 = (i#388 < n#387)::Bool
└───       goto #7 if not %25
6 ── %27 = Base.simd_index::Core.Const(Base.SimdLoop.simd_index)
│    %28 = r#385::Base.OneTo{Int64}
│    %29 = i#386::Core.Const(0)
│          (i@_10 = (%27)(%28, %29, i#388))
│          nothing
│    %32 = out::Float64
│    %33 = Base.getindex(x, i@_10)::Float64
│    %34 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│    %35 = (%34)()::Core.Const(Val{2}())
│    %36 = Base.literal_pow(Main.:^, %33, %35)::Float64
│    %37 = (%32 + %36)::Float64
│          (out = %37)
│          (val@_9 = %37)
│          nothing
│          val@_9
│          (i#388 = i#388 + 1)
│          $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
└───       goto #5
7 ┄─       (@_4 = Base.iterate(%8, %15))
│    %46 = (@_4::Core.Const(nothing) === nothing)::Core.Const(true)
│    %47 = Base.not_int(%46)::Core.Const(false)
└───       goto #9 if not %47
8 ──       Core.Const(:(goto %13))
9 ┄─       Main.nothing
└───       goto #18
10 ─ %52 = Main.eachindex(x)::Base.OneTo{Int64}
│          (r#389 = %52)
│    %54 = Base.simd_outer_range::Core.Const(Base.SimdLoop.simd_outer_range)
│    %55 = (%54)(r#389)::Core.Const(0:0)
│          (@_11 = Base.iterate(%55))
│    %57 = (@_11::Core.Const((0, 0)) === nothing)::Core.Const(false)
│    %58 = Base.not_int(%57)::Core.Const(true)
└───       goto #17 if not %58
11 ─ %60 = @_11::Core.Const((0, 0))
│          (i#390 = Core.getfield(%60, 1))
│    %62 = Core.getfield(%60, 2)::Core.Const(0)
│    %63 = Base.simd_inner_length::Core.Const(Base.SimdLoop.simd_inner_length)
│    %64 = r#389::Base.OneTo{Int64}
│    %65 = (%63)(%64, i#390::Core.Const(0))::Int64
│          (n#391 = %65)
│    %67 = Main.zero(n#391)::Core.Const(0)
│    %68 = (%67 < n#391)::Bool
└───       goto #15 if not %68
12 ─ %70 = Main.zero(n#391)::Core.Const(0)
└───       (i#392 = %70)
13 ┄ %72 = (i#392 < n#391)::Bool
└───       goto #15 if not %72
14 ─ %74 = Base.simd_index::Core.Const(Base.SimdLoop.simd_index)
│    %75 = r#389::Base.OneTo{Int64}
│    %76 = i#390::Core.Const(0)
│          (i@_17 = (%74)(%75, %76, i#392))
│          nothing
│    %79 = out::Float64
│    %80 = Base.getindex(x, i@_17)::Float64
│    %81 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│    %82 = (%81)()::Core.Const(Val{2}())
│    %83 = Base.literal_pow(Main.:^, %80, %82)::Float64
│    %84 = (%79 - %83)::Float64
│          (out = %84)
│          (val@_16 = %84)
│          nothing
│          val@_16
│          (i#392 = i#392 + 1)
│          $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
└───       goto #13
15 ┄       (@_11 = Base.iterate(%55, %62))
│    %93 = (@_11::Core.Const(nothing) === nothing)::Core.Const(true)
│    %94 = Base.not_int(%93)::Core.Const(false)
└───       goto #17 if not %94
16 ─       Core.Const(:(goto %60))
17 ┄       Main.nothing
18 ┄       return out

  79.072 ns (1 allocation: 16 bytes)
3.338335e8
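
A cleaner long-term fix, sketched below under the assumption that the code can be restructured, is to avoid the global entirely and pass the option as a function argument (or declare the global as const if it never changes):

# A sketch: pass the option as an argument instead of reading a global variable
function add_squares(x, criterion)
  out = zero(eltype(x))
  if criterion > 0
    @simd for i in eachindex(x)
      @inbounds out += x[i]^2
    end
  else
    @simd for i in eachindex(x)
      @inbounds out -= x[i]^2
    end
  end
  return out
end
add_squares(v2, 1)   # criterion is now a local argument, so the code is type stable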