Performance

We give a few examples that illustrate the performance of the collection.

Linear Algebra Properties of Typed Matrices

LinearAlgebra.jl provides several linear algebra operations. By utilizing the Julia type system, we can improve the performance of some of these operations for special matrices. The default method for the issymmetric function, for example, checks that a matrix satisfies the definition of symmetry by accessing each matrix element. The matrix Minij is known to be symmetric, and TypedMatrices.jl defines a new method for issymmetric that simply returns true.
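
The idea can be sketched with a hypothetical Minij-like type; the actual definitions in TypedMatrices.jl may differ in detail, but the principle is the same: the answer is encoded in the type, so no element needs to be inspected.

using LinearAlgebra

# A hypothetical lazy Minij-like matrix with A[i, j] == min(i, j).
struct MyMinij{T} <: AbstractMatrix{T}
    n::Int
end
Base.size(A::MyMinij) = (A.n, A.n)
Base.getindex(A::MyMinij{T}, i::Integer, j::Integer) where {T} = T(min(i, j))

# Symmetry follows from the definition, so the check is a constant-time no-op.
LinearAlgebra.issymmetric(::MyMinij) = true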

For a Minij matrix of order 1000, this specialised method is over 80,000 times faster than the default implementation in the median case.

julia> a = Minij(1000)
1000×1000 Minij{Int64}:
...

julia> b = Matrix(Minij(1000))
1000×1000 Matrix{Int64}:
...

julia> @benchmark issymmetric(a)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):   9.810 ns … 89.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.310 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.798 ns ±  2.083 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▇▅▆▆▃▄▄▂ ▃▂▁▃ ▁▂▁▂▂▂▁▁▂ ▁                                 ▂
  ███████████████████████████▇▆▇▇▇▆▆▄▆▃▄▆▄▄▅▃▅▅▅▄▆▄▄▅▄▄▅▅▄▅▂▅ █
  9.81 ns      Histogram: log(frequency) by time      17.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark issymmetric(b)
BenchmarkTools.Trial: 4883 samples with 1 evaluation.
 Range (min … max):  593.700 μs …  13.507 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     873.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.009 ms ± 515.315 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █▂   ▁▁
  ▂▆██▇▆████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  594 μs           Histogram: frequency by time         2.69 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Known Algorithm Working on Hilbert Matrices

The following example relies on a known algorithm for Hilbert matrices. The variable a has type Hilbert, whereas b has type Matrix and represents the same matrix. Computing the determinant of b is about 280 times slower and requires almost 1,000 times more memory than computing that of a.
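
One example of such a known result is the closed-form determinant of the n-by-n Hilbert matrix, det(H_n) = c_n^4 / c_{2n} with c_m = 1! 2! ⋯ (m-1)!. The following sketch evaluates this formula exactly; it is shown only for illustration, and the specialised method actually used by TypedMatrices.jl may differ.

# Exact determinant of the n-by-n Hilbert matrix via the closed formula
# det(H_n) = c_n^4 / c_{2n}, where c_m = 1! * 2! * ... * (m-1)!.
# Hypothetical helper, not part of the package's API.
function hilbert_det(n::Integer)
    c(m) = prod((factorial(big(k)) for k in 1:m-1); init = big(1))
    return c(n)^4 // c(2n)        # exact Rational{BigInt}
end

hilbert_det(3) == 1//2160         # true: the well-known value of det(H_3)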

julia> a = Hilbert{BigFloat}(100)
100×100 Hilbert{BigFloat}:
...

julia> b = Matrix(Hilbert{BigFloat}(100))
100×100 Matrix{BigFloat}:
...

julia> t3 = @benchmark det(a)
BenchmarkTools.Trial: 6985 samples with 1 evaluation.
 Range (min … max):  334.500 μs … 740.291 ms  ┊ GC (min … max):  0.00% … 68.80%
 Time  (median):     564.100 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   706.671 μs ±   8.853 ms  ┊ GC (mean ± σ):  10.32% ±  0.82%

   ▅█▅▂▂▅▄▁
  ▃█████████▇▆▆▄▄▄▄▅▅▅▅▆▆▇███▇▆▇▅▅▄▅▅▄▃▃▃▃▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  334 μs           Histogram: frequency by time         1.23 ms <

 Memory estimate: 69.02 KiB, allocs estimate: 3233.

julia> t4 = @benchmark det(b)
BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  127.925 ms … 229.261 ms  ┊ GC (min … max): 8.86% … 4.76%
 Time  (median):     158.327 ms               ┊ GC (median):    9.49%
 Time  (mean ± σ):   160.932 ms ±  23.576 ms  ┊ GC (mean ± σ):  9.48% ± 3.52%

      ▃▃ █ ▃         ▃█
  ▇▁▁▁██▁█▁█▁▇▇▇▁▇▇▇▇██▁▁▁▁▇▇▇▁▁▇▇▁▇▁▁▁▁▇▁▁▁▁▁▁▇▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  128 ms           Histogram: frequency by time          229 ms <

 Memory estimate: 66.22 MiB, allocs estimate: 1333851.

Trade-off between Performance and Memory

For algorithms not implemented in TypedMatrices.jl, the package trades performance for memory savings, which can be substantial. For example, generating the variable a, which has type Cauchy, takes only 63.229 μs on average and allocates 114.16 KiB of memory, whereas generating b, the same matrix stored as a Matrix, takes 3.862 ms and allocates 7.74 MiB. Once generated, storing b requires 500,000 times more memory than storing a.
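
The saving comes from not materializing the matrix at all: a typed matrix only needs to store enough data to compute each entry on demand. A minimal sketch of the idea, with a hypothetical Cauchy-like type whose stored fields and entry formula are assumptions and may differ from those of the package, is the following.

# A hypothetical lazy Cauchy-like matrix that stores only its order;
# the entry formula 1 / (i + j) is assumed here for illustration.
struct MyCauchy{T} <: AbstractMatrix{T}
    n::Int
end
Base.size(A::MyCauchy) = (A.n, A.n)
Base.getindex(A::MyCauchy{T}, i::Integer, j::Integer) where {T} = one(T) / (i + j)

A = MyCauchy{Float64}(1000)       # a handful of bytes
B = Matrix(A)                     # 1000^2 Float64 entries, about 7.6 MiB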

julia> @benchmark a = Cauchy{Float64}(1000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  27.100 μs … 191.819 ms  ┊ GC (min … max):  0.00% … 99.94%
 Time  (median):     32.100 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   63.229 μs ±   1.919 ms  ┊ GC (mean ± σ):  35.05% ±  4.29%

  ▅█▇▅▄▃▃▂▂▃▂▃▃▅▅▃▁▁                 ▁▁                        ▂
  ███████████████████▇▇███▇▇▆▇▇▇▇▆▆▇████▇▇▆▆▄▄▃▂▂▄▅▅▅▆▆▇▇▆▆▇▅▅ █
  27.1 μs       Histogram: log(frequency) by time       125 μs <

 Memory estimate: 114.16 KiB, allocs estimate: 36.

julia> @benchmark b = Matrix(Cauchy{Float64}(1000))
BenchmarkTools.Trial: 1288 samples with 1 evaluation.
 Range (min … max):  2.413 ms … 18.386 ms  ┊ GC (min … max):  0.00% … 48.63%
 Time  (median):     3.271 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.862 ms ±  1.674 ms  ┊ GC (mean ± σ):  15.96% ± 19.84%

  ▂█▄   ▁▂
  ████▄▆██▆▅▅▅▃▃▃▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▃▂▁▂▂▁▂▂▂▂▂▂▂ ▃
  2.41 ms        Histogram: frequency by time        9.53 ms <

 Memory estimate: 7.74 MiB, allocs estimate: 38.

julia> Base.summarysize(a)
16

julia> Base.summarysize(b)
8000040

On the other hand, accessing an element of a requires computation, whereas the elements of b are pre-computed and already available in memory. This implies a performance penalty for a, which is not unexpected. In view of this trade-off, however, extremely large matrices can be used on machines with a moderate amount of memory, which allows users to tackle otherwise intractably large problems. This is especially true for algorithms that only need to access a subset of the matrix elements.
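
For instance, reusing the hypothetical MyCauchy type sketched above, an operation that touches only the diagonal never needs the full matrix in memory, so orders far beyond what fits in RAM remain usable.

using LinearAlgebra

# Dense storage of an order-10^6 Float64 matrix would need roughly 8 TB;
# the lazy version stores only its order and computes entries on access.
A = MyCauchy{Float64}(10^6)
t = tr(A)                   # sums the 10^6 diagonal entries on the fly
x = A[123, 456_789]         # any single entry is computed in constant time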

julia> @benchmark det(a)
BenchmarkTools.Trial: 111 samples with 1 evaluation.
 Range (min … max):  20.537 ms … 353.410 ms  ┊ GC (min … max): 0.00% … 90.93%
 Time  (median):     34.151 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.104 ms ±  42.894 ms  ┊ GC (mean ± σ):  7.81% ±  9.53%

   ▄█▅▃ ▁
  ▅████▇█▁▆▅▁▁▁▅▅▁▁▅▁▁▅▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
  20.5 ms       Histogram: log(frequency) by time       301 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 4.

julia> @benchmark det(b)
BenchmarkTools.Trial: 175 samples with 1 evaluation.
 Range (min … max):  18.639 ms … 314.529 ms  ┊ GC (min … max): 0.00% … 91.89%
 Time  (median):     26.317 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.670 ms ±  22.610 ms  ┊ GC (mean ± σ):  7.81% ±  8.48%

      ▂ ▂▂ ██  ▂▂ ▃   ▅ ▅▂ ▃▂
  ▅▁▃▇█▇██████▆██▅█▅█▅███████▇▆▁▁▆▆▃▁▆▅▅▅▁▁▁▁▁▁▁▃▅▃▁▁▅▁▁▁▁▃▃▁▃ ▃
  18.6 ms         Histogram: frequency by time         44.1 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 4.

julia> @benchmark sum(a)
BenchmarkTools.Trial: 3104 samples with 1 evaluation.
 Range (min … max):  1.124 ms …   7.772 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.400 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.604 ms ± 579.750 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃▁█▄ ▂
  █████▇██▇█▆▃▄▄▃▄▃▄▃▄▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▃▃▂▃▂▂▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂ ▃
  1.12 ms         Histogram: frequency by time        3.56 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

julia> @benchmark sum(b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  243.900 μs …  2.106 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     329.800 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   355.504 μs ± 91.684 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▅█▇▅▃▃▂▁▁▁▁
  ▁▄██████████████▇▆▆▆▆▆▆▆▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁ ▃
  244 μs          Histogram: frequency by time          647 μs <

 Memory estimate: 16 bytes, allocs estimate: 1.