Commit 0eeb1db

KristofferC authored and jrevels committed
rip -O3 (#372)
1 parent 0fe35e3 commit 0eeb1db

1 file changed (+1 -85 lines)

docs/src/user/advanced.md

Lines changed: 1 addition & 85 deletions
@@ -200,90 +200,6 @@ julia> vector_hessian(cumprod, [1, 2, 3])
 Likewise, you could write a version of `vector_hessian` which supports functions of the
 form `f!(y, x)`, or perhaps an in-place Jacobian with `ForwardDiff.jacobian!`.
 
-## SIMD Vectorization
-
-Many operations on ForwardDiff's dual numbers are amenable to [SIMD
-vectorization](https://en.wikipedia.org/wiki/SIMD#Hardware). For some ForwardDiff
-benchmarks, we've seen SIMD vectorization yield [speedups of almost
-3x](https://github.com/JuliaDiff/ForwardDiff.jl/issues/98#issuecomment-253149761).
-
-To enable SIMD optimizations, start your Julia process with the `-O3` flag. This flag
-enables [LLVM's SLPVectorizerPass](http://llvm.org/docs/Vectorizers.html#the-slp-vectorizer)
-during compilation, which attempts to automatically insert SIMD instructions where possible
-for certain arithmetic operations.
-
-Here's an example of LLVM bitcode generated for an addition of two `Dual` numbers without
-SIMD instructions (i.e. not starting Julia with `-O3`):
-
-```julia
-julia> using ForwardDiff: Dual
-
-julia> a = Dual(1., 2., 3., 4.)
-Dual{Nothing}(1.0,2.0,3.0,4.0)
-
-julia> b = Dual(5., 6., 7., 8.)
-Dual{Nothing}(5.0,6.0,7.0,8.0)
-
-julia> @code_llvm a + b
-
-define void @"julia_+_70852"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
-top:
-%3 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 0
-%4 = load double, double* %3, align 8
-%5 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 0
-%6 = load double, double* %5, align 8
-%7 = fadd double %4, %6
-%8 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 1
-%9 = load double, double* %8, align 8
-%10 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 1
-%11 = load double, double* %10, align 8
-%12 = fadd double %9, %11
-%13 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 2
-%14 = load double, double* %13, align 8
-%15 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 2
-%16 = load double, double* %15, align 8
-%17 = fadd double %14, %16
-%18 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 0
-%19 = load double, double* %18, align 8
-%20 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 0
-%21 = load double, double* %20, align 8
-%22 = fadd double %19, %21
-%23 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 0
-store double %22, double* %23, align 8
-%24 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 0
-store double %7, double* %24, align 8
-%25 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 1
-store double %12, double* %25, align 8
-%26 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 2
-store double %17, double* %26, align 8
-ret void
-}
-```
-
-If we start up Julia with `-O3` instead, the call to `@code_llvm` will show that LLVM
-can SIMD-vectorize the addition:
-
-```julia
-julia> @code_llvm a + b
-
-define void @"julia_+_70842"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
-top:
-%3 = bitcast %Dual* %1 to <4 x double>* # cast the Dual to a SIMD-able LLVM vector
-%4 = load <4 x double>, <4 x double>* %3, align 8
-%5 = bitcast %Dual* %2 to <4 x double>*
-%6 = load <4 x double>, <4 x double>* %5, align 8
-%7 = fadd <4 x double> %4, %6 # SIMD add
-%8 = bitcast %Dual* %0 to <4 x double>*
-store <4 x double> %7, <4 x double>* %8, align 8
-ret void
-}
-```
-
-Note that whether or not SIMD instructions can actually be used will depend on your machine
-and Julia build. For example, pre-built Julia binaries might not emit vectorized LLVM
-bitcode. To overcome this specific issue, you can [locally rebuild Julia's system
-image](http://docs.julialang.org/en/latest/devdocs/sysimg).
-
 ## Custom tags and tag checking
 
 The `Dual` type includes a "tag" parameter indicating the particular function call to
@@ -315,4 +231,4 @@ want to disable this checking.
 ```julia
 cfg = GradientConfig(nothing, x)
 gradient(f, x, cfg)
-```
+```