@@ -200,90 +200,6 @@ julia> vector_hessian(cumprod, [1, 2, 3])
 Likewise, you could write a version of `vector_hessian` which supports functions of the
 form `f!(y, x)`, or perhaps an in-place Jacobian with `ForwardDiff.jacobian!`.
 
-## SIMD Vectorization
-
-Many operations on ForwardDiff's dual numbers are amenable to [SIMD
-vectorization](https://en.wikipedia.org/wiki/SIMD#Hardware). For some ForwardDiff
-benchmarks, we've seen SIMD vectorization yield [speedups of almost
-3x](https://github.com/JuliaDiff/ForwardDiff.jl/issues/98#issuecomment-253149761).
-
-To enable SIMD optimizations, start your Julia process with the `-O3` flag. This flag
-enables [LLVM's SLPVectorizerPass](http://llvm.org/docs/Vectorizers.html#the-slp-vectorizer)
-during compilation, which attempts to automatically insert SIMD instructions where possible
-for certain arithmetic operations.
-
-Here's an example of LLVM bitcode generated for an addition of two `Dual` numbers without
-SIMD instructions (i.e. not starting Julia with `-O3`):
-
-```julia
-julia> using ForwardDiff: Dual
-
-julia> a = Dual(1., 2., 3., 4.)
-Dual{Nothing}(1.0,2.0,3.0,4.0)
-
-julia> b = Dual(5., 6., 7., 8.)
-Dual{Nothing}(5.0,6.0,7.0,8.0)
-
-julia> @code_llvm a + b
-
-define void @"julia_+_70852"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
-top:
-  %3 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 0
-  %4 = load double, double* %3, align 8
-  %5 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 0
-  %6 = load double, double* %5, align 8
-  %7 = fadd double %4, %6
-  %8 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 1
-  %9 = load double, double* %8, align 8
-  %10 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 1
-  %11 = load double, double* %10, align 8
-  %12 = fadd double %9, %11
-  %13 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 2
-  %14 = load double, double* %13, align 8
-  %15 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 2
-  %16 = load double, double* %15, align 8
-  %17 = fadd double %14, %16
-  %18 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 0
-  %19 = load double, double* %18, align 8
-  %20 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 0
-  %21 = load double, double* %20, align 8
-  %22 = fadd double %19, %21
-  %23 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 0
-  store double %22, double* %23, align 8
-  %24 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 0
-  store double %7, double* %24, align 8
-  %25 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 1
-  store double %12, double* %25, align 8
-  %26 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 2
-  store double %17, double* %26, align 8
-  ret void
-}
-```
-
-If we start up Julia with `-O3` instead, the call to `@code_llvm` will show that LLVM
-can SIMD-vectorize the addition:
-
-```julia
-julia> @code_llvm a + b
-
-define void @"julia_+_70842"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
-top:
-  %3 = bitcast %Dual* %1 to <4 x double>* # cast the Dual to a SIMD-able LLVM vector
-  %4 = load <4 x double>, <4 x double>* %3, align 8
-  %5 = bitcast %Dual* %2 to <4 x double>*
-  %6 = load <4 x double>, <4 x double>* %5, align 8
-  %7 = fadd <4 x double> %4, %6 # SIMD add
-  %8 = bitcast %Dual* %0 to <4 x double>*
-  store <4 x double> %7, <4 x double>* %8, align 8
-  ret void
-}
-```
-
-Note that whether or not SIMD instructions can actually be used will depend on your machine
-and Julia build. For example, pre-built Julia binaries might not emit vectorized LLVM
-bitcode. To overcome this specific issue, you can [locally rebuild Julia's system
-image](http://docs.julialang.org/en/latest/devdocs/sysimg).
-
 ## Custom tags and tag checking
 
 The `Dual` type includes a "tag" parameter indicating the particular function call to
@@ -315,4 +231,4 @@ want to disable this checking.
 ```julia
 cfg = GradientConfig(nothing, x)
 gradient(f, x, cfg)
-```
+```