Caffe: Proper Caffe OpenCL branch installation instructions for Intel GPUs

Created on 16 Dec 2016  ·  87 Comments  ·  Source: BVLC/caffe

I am sorry that I have to open this, but neither the OpenCL GitHub branch nor the Google forums have any kind of (updated) step-by-step instructions for installing Caffe OpenCL on an Intel GPU with Intel OpenCL drivers, especially for someone new.

(a) Do these instructions still work?
cmake -DUSE_GREENTEA=ON -DUSE_INTEL_SPATIAL=ON -DUSE_ISAAC=ON path_to_caffe_source
make -jn
make -jn runtest

on this branch https://github.com/BVLC/caffe/tree/opencl? or

What about?
cmake -DUSE_GREENTEA=ON -DUSE_INTEL_SPATIAL=ON -DUSE_ISAAC=ON -DBUILD_SHARED_LIBS=OFF -DUSE_CUDNN=OFF -DUSE -DBUILD_docs=OFF -DBUILD_python=OFF -DBUILD_matlab=OFF /root/caffe-opencl

(b) Is ATLAS still needed for compiling OpenCL Caffe when clBLAS is there? It keeps asking for ATLAS.

(c) What about ViennaCL? Does that branch still depend on it? Is it needed?

(d) What is LibDNN for? What is it used in place of?

(e) What about ISAAC?

(f) The Windows branch, for example, says "If CUDA is not installed Caffe will default to a CPU_ONLY build." Does this mean it will not work in OpenCL mode in non-CUDA builds?

Kindly update and provide step-by-step instructions
Thank you

Labels: OpenCL, question, windows


All 87 comments

@atlury
There is a Windows section in the Readme that explains how to compile and install on Windows.
The only step missing in that description is downloading ViennaCL-DEV:
https://github.com/viennacl/viennacl-dev

It can be put in any one of the paths where CMake will find it, such as next to the folder into which you cloned Caffe.

The build instructions are different from the Linux instructions, since it is a script that automatically takes care of CMake configuration and downloading dependencies.

Usually there's no huge need to worry about configuration on Windows, since it's designed to just work. However I will give you a quick explanation:
(a) No and no. Use scripts/build_win.cmd as described in the Readme.
(b) Yes; no matter how you compile it, a CPU BLAS is always needed. But build_win.cmd will take care of that for you, and its default configuration is to use OpenBLAS.
(c) Yes, ViennaCL is needed, clone from here: https://github.com/viennacl/viennacl-dev
(d) LibDNN is the default convolution engine for OpenCL GPUs, a replacement for cuDNN.
There are also additional Intel kernels for Intel GPUs, available and enabled by default.
(e) ISAAC, clBLAS and CLBlast are strictly optional. You need to compile these separately on Windows and add them to the dependencies if you want to use them. I do not guarantee or support the compilation of any of these libraries, they are supported by the respective project maintainers.
(f) No, on the OpenCL branch, this is not true. Default here is USE_GREENTEA=ON, USE_CUDA=OFF, CPU_ONLY=OFF.
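
To make the Windows steps concrete, here is a rough command sketch based on the answers above (the side-by-side folder layout is an assumption; build_win.cmd itself handles CMake configuration and dependency downloads):

```shell
# Assumed layout: ViennaCL-DEV cloned next to the Caffe checkout,
# so CMake can find it (see (c) above).
git clone https://github.com/viennacl/viennacl-dev.git
git clone -b opencl https://github.com/BVLC/caffe.git
cd caffe
# The script configures CMake and fetches prebuilt dependencies on its own.
scripts\build_win.cmd
```

This is only a sketch; the Readme's Windows section remains the authoritative reference.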

I will update the Readme after Christmas when I have holidays. I unfortunately don't have time for a detailed step-by-step right now.
CC: @willyd

@naibaf7
Thanks for the quick response. What about Linux Instructions?

Are OpenCL BLAS and ISAAC still needed?
https://github.com/01org/caffe/wiki/clCaffe

@atlury
Two ways on Linux: use CMake with 'make all -j8', or copy Makefile.config.example to Makefile.config and compile using make all -j8; make pycaffe -j8; make runtest -j8.
Note that the compiled results from the Makefile and CMake builds are slightly different on Linux. The Makefile is older but easier; CMake is more complex.
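
As a command sketch, the two routes look roughly like this (flag names are the ones mentioned in this thread; exact availability may vary by commit):

```shell
# Route 1: CMake (out-of-source build)
mkdir build && cd build
cmake -DUSE_GREENTEA=ON -DUSE_LIBDNN=ON -DUSE_INTEL_SPATIAL=ON ..
make all -j8 && make runtest -j8

# Route 2: classic Makefile build
cp Makefile.config.example Makefile.config  # then edit options in Makefile.config
make all -j8 && make pycaffe -j8 && make runtest -j8
```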

This branch is not the same as https://github.com/01org/caffe/wiki/clCaffe
therefore it has different requirements. However the Intel spatial kernels from there have been merged into this branch.

Strict requirements:

  • ViennaCL, OpenCL and normal Caffe requirements such as Gflags, HDF5, etc.
  • You can get an OpenCL SDK from CUDA, the AMD APP SDK, or the Intel OpenCL SDK. This is true for both Windows and Linux. Mac OS X provides its own OpenCL implementation.

Optional requirements:

  • clBLAS (from AMD)
  • CLBlast (from @cnugteren)
  • ISAAC
  • cuDNN
  • CUDA

Thanks @naibaf7
And also, for Linux: is LibDNN mostly for NVIDIA and AMD chips only? And should we use Intel spatial for Intel iGPUs?

@atlury
Intel spatial does not support efficient back propagation or all shapes of convolutions, but yes, it is the fastest forward propagation on Intel iGPUs.
But I suggest you try both and check what works best for your networks and devices.

@naibaf7

Fabian, will the Windows build support compiling with MinGW-64? Kindly let me know if there are any instructions specific to it. Microsoft Visual Studio is too bloated.

@atlury Currently no, not that I am aware of. @willyd is the main contributor and maintainer of Windows building, so maybe he can answer that.
While Microsoft Visual Studio might be a bit bloated, it's quite convenient, since @willyd precompiled all dependencies for VS2015 and VS2013. So I imagine using MinGW-64 is a lot more work.

I have no intention to support MinGW-64, as CUDA does not support MinGW as a host compiler on Windows. That being said, I welcome any PRs related to supporting MinGW-64 if they don't add too much complexity to the build.

@willyd
Cool, that's what I thought. In this case I am in favor of simplicity, since Windows support without MinGW-64 does not look like a major pitfall to me. It's somewhat preferable to use the standard compiler of each respective operating system.
I'm mostly worried about the support overhead when people use tricky build configurations.

@naibaf7

Does the Windows OpenCL build include support for engine: SPATIAL? When I include engine: SPATIAL or engine: INTEL_SPATIAL, I get one of the following errors:

_Layer conv1 has unknown engine._
_Error parsing text-format caffe.NetParameter: 18:3: Unknown enumeration value of "SPATIAL" for field "engine"._

The wiki/Readme is confusing to read: https://github.com/BVLC/caffe/tree/opencl

It mentions both _add entry engine: SPATIAL to all convolution layer specifications_ as well as _"engine: INTEL_SPATIAL <-------------------------- this line!"_

Which one?

And it runs fine without engine: SPATIAL in the prototxt.

_opencl-caffe-test.exe imagenet_deploy.prototxt bvlc_reference_caffenet.caffemodel imagenet_mean.binaryproto synset_words.txt truck.jpg
Use GPU with device ID 0
---------- Prediction for truck.jpg ----------
0.9872 - "n03417042 garbage truck, dustcart"
0.0110 - "n04467665 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi"
0.0013 - "n03496892 harvester, reaper"
0.0002 - "n04428191 thresher, thrasher, threshing machine"
0.0001 - "n04461696 tow truck, tow car, wrecker"_

Also here are a few "other" observations
a) It works better when compiled as a DLL instead of a static library. This especially solves the error _"Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type"_ (Visual Studio 2015).
b) It doesn't seem to pick up OpenCL.lib, so the workaround is to manually copy it from the OpenCL SDK folder into the build folder (what does it expect the path variable name to be?).
c) The libraries extracted into the build folder could be compiled against the latest versions (say, for example, OpenCV 3.2).

Further

C:\Downloads\xxx.caffe-opencl-build\bin>caffe device_query
I0108 12:35:04.885713 19872 common.cpp:382] Total devices: 3

I0108 12:35:04.888244 19872 common.cpp:383] CUDA devices: 0
I0108 12:35:04.889102 19872 common.cpp:384] OpenCL devices: 3

I0108 12:35:04.889681 19872 common.cpp:408] Device id: 0

I0108 12:35:04.890744 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.891839 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 1.2
I0108 12:35:04.893450 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.894731 19872 common.cpp:416] Name: Intel(R) HD Graphics 4400
I0108 12:35:04.895730 19872 common.cpp:418] Total global memory: 1708759450

I0108 12:35:04.897233 19872 common.cpp:408] Device id: 1
I0108 12:35:04.898505 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.899590 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 1.2
I0108 12:35:04.901091 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.902592 19872 common.cpp:416] Name: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
I0108 12:35:04.904093 19872 common.cpp:418] Total global memory: 8513761280

I0108 12:35:04.905594 19872 common.cpp:408] Device id: 2
I0108 12:35:04.907114 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.908617 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 2.1
I0108 12:35:04.910100 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.911598 19872 common.cpp:416] Name: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
I0108 12:35:04.913100 19872 common.cpp:418] Total global memory: 8513761280

Looks good to me, although it seems you have both a newer OpenCL 2.1 and an older OpenCL 1.2 installed. As it's still a Haswell CPU I am not sure if Intel already has a 2.1/2.0 driver for your chip. But you should try to update your OpenCL SDK for your GPU.

Anyways, if you want to use INTEL_SPATIAL you need to also enable it at compile time. After that it becomes the standard engine on Intel GPU devices.
You can do that here:
https://github.com/BVLC/caffe/blob/opencl/scripts/build_win.cmd#L82
(scripts/build_win.cmd, line 82)

however the Intel spatial kernel has not been thoroughly tested on Windows yet.

I will try to update the OpenCL SDK. I just saw your commits; I will try to enable them, recompile, test, and report back.
Thanks

OK, with if NOT DEFINED USE_INTEL_SPATIAL set USE_INTEL_SPATIAL=1,

build_win.cmd throws the following error.

"C:\Downloads\caffe-opencl\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj" (default target) (3) ->

(ClCompile target) -> C:\Downloads\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1453): error C2572: 'caffe::ConvolutionLayerSpatial::swizzleWeights': redefinition of default argument: parameter 1 [C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj]

C:\Downloads\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1458): error C2572: 'caffe::ConvolutionLayerSpatial::swizzleWeights': redefinition of default argument: parameter 1 [C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj]

Ok, I'll look into that.
@gongzg for reference.

Hi all,
Thank you for the great work!
I managed to compile and run caffe-opencl on Windows and Intel HD 4400 with USE_INTEL_SPATIAL=0 (caffe time is sadly around 2x slower than running caffe-cpu on the 2-core i5-4210U, unless I am doing something wrong). However, when compiling with USE_INTEL_SPATIAL=1, I also get the same error as @atlury (and I believe I have the same hardware on my Lenovo X240). I am curious to see whether using INTEL_SPATIAL will make caffe-opencl run faster on this GPU than on the CPU ...

@gfursin It should, by a large margin. LibDNN expects the GPU to have a different memory architecture than what Intel chips have, so it does not run optimally at the moment.
We're currently investigating how to fix the Intel kernels so that they work on Windows as well.

Super! Thanks a lot!

By the way, @atlury, when selecting device 1 and 2, "caffe time" crashed each time after around 10 seconds - did you have the same behavior? Thanks!

@gfursin No, I did not run caffe time (I will try it and report). I was frustrated with Windows and later shifted to Ubuntu 17.04. See my comment here on Linux; it works with spatial and I get more than 30 fps (VGG) on Linux. https://github.com/BVLC/caffe/pull/5165

There is an Intel paper published here (clcaffe)
http://www.slideshare.net/IntelSoftware/clcaffe-unleashing-the-power-of-intel-graphics-for-deep-learning-acceleration

where the following benchmarks (page 28, GT3 GPU) were achieved using INTEL_SPATIAL in the convolution layers:
Alexnet - 290 Images/Second
GoogleNet - 77 Images/Second
VGGA - 55 Images/Second
Overfeat - 91 Images/Second

I really want to test out object detection (not just classification) using INTEL_SPATIAL as well, but there is no example anywhere. I doubt the Caffe layers are ready yet? @naibaf7?

@gongzg are there any source code for the above tests that we can try?

Further, LibDNN has been made to work with tiny-dnn, which is exciting (although there are not many pre-trained models in there). I also want to test out quantization and see how OpenCL can help there (8-bit, XNOR, etc.). Finally, object detection in OpenCL in real time would be awesome! I hope @naibaf7 can throw some light on this.

@atlury I'll get back to you next week regarding the more difficult questions.
Intel spatial automatically gets used when you compile with the option enabled.
For object segmentation and detection I suggest you read my ISBI 2016 paper and technical report. I have SK-Net and U-Net architectures described there that can do this very fast. AlexNet can be converted to such a SK-Net.
You need to use LibDNN though to keep memory usage low in SK/U-Net.

Wow, I just read your paper... the concept of strided kernels seems very impressive. Not to hijack this thread, but all of these will eventually need to be tested in OpenCL under Windows. But before that...

Is this a Python-only implementation? No C++? Are there any pre-trained models? Is this where the repo is: https://github.com/naibaf7/PyGreentea/tree/master/examples? Yes, I am going to use LibDNN...

@atlury Yes the original interface was C++ but we switched to python. However if you want to provide the data through HDF5 or your own C++ interface that will work too. Just use the network generator codes that I provide in python to help you create the correct prototxt for SK/U-type networks.
Here's a slightly older but full technical report: https://arxiv.org/abs/1509.03371, it includes performance numbers before LibDNN was programmed.
We do not provide pre-trained models at this point since the datasets (EM classification) we use these on & our results are not yet published.

@atlury Some of the benchmark data are measured by using the convnet-benchmarks and you can reproduce it at your platform. We don't have other examples to share publicly currently.

@atlury - thanks a lot for the references! I had many troubles installing and using OpenCL for Intel GPUs on Ubuntu in the past (I had to recompile the Linux kernel), but maybe the latest drivers will work OK - need to check that. By the way, in #5165 you have a snapshot of a webcam + Caffe classification with FPS measurements - may I ask which program you used for that? Thanks a lot!!!

@gfursin

Please do the following.

  1. Use http://cdimage.ubuntu.com/daily-live/current/

  2. Install opencl SDK and opencl Run time from (kernel patch is not required)
    https://software.intel.com/en-us/intel-opencl/download
    https://software.intel.com/en-us/articles/opencl-drivers

  3. Download https://github.com/BVLC/caffe/tree/opencl
    (a) Please compile with ViennaCL, LibDNN, Intel spatial, OpenCV, etc. enabled. Please build a shared library. I don't enable Python since I don't use it often.

  4. VGG caffemodel, prototxt
    Download
    http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
    https://gist.githubusercontent.com/ksimonyan/211839e770f7b538e2d8/raw/0067c9b32f60362c74f4c445a080beed06b07eb3/VGG_ILSVRC_16_layers_deploy.prototxt

Include engine: INTEL_SPATIAL for all convolutional layers in your deploy.prototxt.

Get the synset_words.txt

  5. Test using this program
    https://gist.github.com/atlury/f65fd41eb805cc5f77f666a59e71eae2

Just make sure the input_dim is 1 (in your proto) and not 10 (you are only giving it one image at a time) with 3 channels and the resizing is automatic.
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
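
For reference, a single convolution layer in the deploy prototxt with the engine set might look like this (the layer name and dimensions are illustrative, not taken from this thread):

```prototxt
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    engine: INTEL_SPATIAL  # per-layer engine selection, as suggested above
    num_output: 64
    kernel_size: 3
  }
}
```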

For any additional help, buzz me on skype:atlury or gtalk:atlury.

Please note that this will only work on Linux; OpenCL support for Windows is still being worked on by @naibaf7.

Thank you very much @atlury for all details - very much appreciated - I will test it soon! By the way, I started automating installation of Caffe on Windows (CPU and OpenCL mode) using Collective Knowledge Framework, but it still needs more testing: https://github.com/dividiti/ck-caffe
I am waiting for a feedback from my colleagues and if it works fine, we will make an official release in a couple of weeks (possibly with a support for Android devices too) ...

Hi all. Thank you a lot for the library and also the discussions above.

I am currently trying to build the latest commit, which fixed the Windows OpenCL build, with USE_GREENTEA=1, USE_LIBDNN=1 and USE_INTEL_SPATIAL=1, together with the header files for ViennaCL. My build_win.cmd is attached here: build_win.txt

However, halfway through, the build always ends up with the following error:
ninja: build stopped: subcommand failed.
ERROR: Build failed
In detail:
E:\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1514) : error C2572: 'caffe::ConvolutionLayerSpatial<float>::swizzleWeights' : redefinition of default parameter : parameter 4
..\..\include\caffe/layers/conv_spatial_layer.hpp(164) : see declaration of 'caffe::ConvolutionLayerSpatial<float>::swizzleWeights'
E:\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1519) : error C2572: 'caffe::ConvolutionLayerSpatial<double>::swizzleWeights' : redefinition of default parameter : parameter 4
Apologies if this error is same as mentioned above.

I tried with USE_INTEL_SPATIAL=0 as well, but got the following error:
greentea_math_functions.cpp.obj : error LNK2019: unresolved external symbol clEnqueueUnmapMemObject referenced in function "void __cdecl caffe::greentea_gpu_asum<float>(int,int,struct _cl_mem * const,int,float *)" (??$greentea_gpu_asum@M@caffe@@YAXHHQEAU_cl_mem@@HPEAM@Z)
syncedmem.cpp.obj : error LNK2001: unresolved external symbol clEnqueueUnmapMemObject
benchmark.cpp.obj : error LNK2019: unresolved external symbol clWaitForEvents referenced in function "public: virtual __cdecl caffe::Timer::~Timer(void)" (??1Timer@caffe@@UEAA@XZ)
benchmark.cpp.obj : error LNK2019: unresolved external symbol clReleaseEvent referenced in function "public: virtual __cdecl caffe::Timer::~Timer(void)" (??1Timer@caffe@@UEAA@XZ)
benchmark.cpp.obj : error LNK2019: unresolved external symbol clGetEventProfilingInfo referenced in function "public: virtual float __cdecl caffe::Timer::MicroSeconds(void)" (?MicroSeconds@Timer@caffe@@UEAAMXZ)
bin\caffe.dll : fatal error LNK1120: 34 unresolved externals
LINK failed. with 1120

Kindly advise on this. Very appreciated.

@yshen92 The first error is known and being worked on.
The second error you get usually means your OpenCL DLL is invalid, it lacks some symbols. Can you tell us what hardware you have and which OpenCL SDKs you have installed?
This issue usually comes with the OpenCL SDK that ships with NVIDIA CUDA. The Intel and AMD OpenCL SDKs should be fine with versions 1.2 and 2.0.

@naibaf7 Thanks a lot for the reply.

I am building the library on a Windows 8 Pro 64-bit Dell with Intel HD Graphics 4000 and NVIDIA NVS 5200M. And just installed the latest Intel OpenCL SDK v6.3. It appears that the OpenCL directory was pointing to the one that comes with CUDA.

So, in an attempt to use the Intel OpenCL SDK in the build, I removed CUDA and made some rather crude modifications to FindOpenCL.cmake at lines 46, 48, 52, and 53, as follows:

IF("${ISWIN64}" STREQUAL "Win64")
  FIND_LIBRARY(OPENCL_LIBRARIES OpenCL.lib "${OPENCL_LIB_DIR}" "$ENV{CUDA_LIB_PATH}" "$ENV{CUDA_PATH}/lib/x64" "$ENV{INTELOCLSDKROOT}/lib/x64")
ELSE("${ISWIN64}" STREQUAL "Win64")
  FIND_LIBRARY(OPENCL_LIBRARIES OpenCL.lib "${OPENCL_LIB_DIR}" "$ENV{CUDA_LIB_PATH}" "$ENV{CUDA_PATH}/lib/Win32" "$ENV{INTELOCLSDKROOT}/lib/x86")
ENDIF("${ISWIN64}" STREQUAL "Win64")

FIND_PATH(OPENCL_INCLUDE_DIRS CL/cl.h PATHS "${_OPENCL_INC_CAND}" "$ENV{CUDA_INC_PATH}" "$ENV{CUDA_PATH}/include" "$ENV{INTELOCLSDKROOT}/include")
FIND_PATH(_OPENCL_CPP_INCLUDE_DIRS CL/cl.hpp PATHS "${_OPENCL_INC_CAND}" "$ENV{CUDA_INC_PATH}" "$ENV{CUDA_PATH}/include" "$ENV{INTELOCLSDKROOT}/include")
Basically it's just to add the Intel SDK paths. I'm not sure if I am doing it right, though; without these, the script fails to locate the SDK.

However, I am still getting the same error as above (building without Intel Spatial).
And here is my configuration file taken directly from the script for your reference.
ConfigInfo.txt
Any idea what I did wrong?

Hi @atlury - I finally found a bit of time to install Ubuntu 17.04 and it was quite straightforward to install Intel GPU drivers without any kernel rebuild - thanks! I also installed Caffe-OpenCL.

There is still a problem with kernel caching, so the start-up cost is very high, but @naibaf7 and @psyhtest are trying to improve it. The temporary solution to slightly speed up kernel caching is to play with the environment variables VIENNACL_CACHE_PATH and CUDA_CACHE_DISABLE (see https://github.com/dividiti/ck-caffe/issues/44#issuecomment-277205871).

I have a question though: I didn't understand how to add engine: INTEL_SPATIAL for all convolutional layers in the deploy.prototxt. I am still a novice user (I am more on the compiler side, trying to optimize sub-libraries). Would you mind sending me a sample, please - it will be very appreciated!
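
One mechanical way to add the line (a sketch, assuming each convolution layer opens a convolution_param { block on its own line, as in stock deploy files; the file name is illustrative) is a GNU sed pass:

```shell
#!/bin/sh
# Demo: write a tiny prototxt, then insert "engine: INTEL_SPATIAL"
# at the top of every convolution_param block.
cat > deploy_demo.prototxt <<'EOF'
layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 64
    kernel_size: 3
  }
}
EOF
# GNU sed: \n in the replacement text emits a newline
sed -i 's/convolution_param {/convolution_param {\n    engine: INTEL_SPATIAL/' deploy_demo.prototxt
cat deploy_demo.prototxt
```

Run it on a copy of your deploy file first and diff the result; hand-editing the few convolution layers is just as valid.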

Another note: if it's of any interest, I added support to assemble Caffe with OpenCL, ViennaCL and USE_INTEL_SPATIAL via CK framework (i.e. one-click rebuild):

ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal --env.USE_INTEL_SPATIAL=ON

You can find details here: https://github.com/dividiti/ck-caffe/wiki/Installation

I'm profiling ViennaCL caching on a couple of platforms at the moment. I suspect it may work quite satisfactorily after a couple of changes.

@gfursin: CUDA_CACHE_DISABLE has the effect of disabling NVIDIA's own caching mechanism for kernels, so will only slow things down (on NVIDIA-powered platforms of course). However, I need it for cache profiling.

@psyhtest - thanks for your note, since I had misunderstood you - I thought it was a strange bug and a temporary workaround ;) ...

@naibaf7 what's the status of Windows support now? I wonder whether you have already started working on enabling the Intel spatial engine for Windows as well.

@gongzg I started setting up the Windows environment on the Intel laptop in order to test this, but didn't get further than that yet. So the Intel spatial engine still has compile issues on Windows, but the rest works.

@naibaf7 Thanks for the update. Then I will check whether we have some internal resources to solve the Intel spatial engine issues on Windows.

Is this branch working for Windows and AMD GPUs?

Yes, it is. AMD GPUs are fully supported with LibDNN under both Windows and Linux. Full Intel GPU support will follow on the 28th of April.

@naibaf7
David, can you please let me know when support for Windows OpenCL with Python is ready for Intel GPUs? I have a few things to test and report.

@hillarycanas - I also confirm that I managed to compile and run the OpenCL version of Caffe with LibDNN on AMD with Windows 10 a few days ago (I used a quite old AMD E1-2500 APU with Radeon HD Graphics just for a test). You can see performance results at http://tinyurl.com/k3dhsc2 (search for AMD).
I used CK package "lib-caffe-bvlc-opencl-libdnn-viennacl-universal" to compile and run Caffe on Windows (see https://github.com/dividiti/ck-caffe/wiki/Installation#AMD_GPU).

Sorry for bothering again. After user feedback, we added support to automatically package Caffe libraries and binaries for Windows in the CK, so it is now possible to install and benchmark the Caffe CPU and OpenCL versions on different Windows machines with minimal installation:

$ pip install ck
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck install package:lib-caffe-bvlc-master-cpu-bin-win
$ ck install package:lib-caffe-bvlc-opencl-libdnn-viennacl-bin-win
$ ck crowdbench caffe --env.CK_CAFFE_BATCH_SIZE=1

@gfursin
This is great :)
Caffe can easily go mainstream like this :)

Hi,

I tried running the official OpenCL branch of caffe located at:
https://github.com/BVLC/caffe/tree/opencl on my Mac with the following hardware details:
Model Name: MacBook Pro
Model Identifier: MacBookPro12,1
Processor Name: Intel Core i5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB
Graphics: Intel Iris Graphics 6100 1536 MB

I could run the Classification tutorial without any trouble. However, when I switch to GPU mode and try to run net.forward(), the kernel dies each time. Does this mean that this branch of OpenCL Caffe does not support Intel integrated graphics cards?

Thanks,

Sal

@saliltambe The OpenCL branch does support Intel iGPUs. Although most of the testing is on Linux systems, it should work on Mac as well. What's the specific error you get when you run net.forward()?

@gongzg Thanks a lot for the reply. I get the following error message when running in a Jupyter notebook: "The kernel appears to have died. It will restart automatically." I am running a very small batch size, so I don't think the kernel dies from running out of memory. Moreover, I do not get an error when I switch to GPU mode using caffe.set_mode_gpu(); I only get the error when I run net.forward().

@saliltambe
We need a bit more information: How did you compile it (compiler version, Caffe settings, which BLAS libraries did you enable)?
Can you run ./build/test/test_all.testbin or run make runtest (if you used Makefiles)?
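
A note for the Jupyter case: a native crash inside the OpenCL driver usually kills the notebook kernel before any message reaches the notebook, so running the same calls from a plain shell tends to reveal the real error. A sketch (the file names are placeholders):

```shell
# GLOG_logtostderr makes Caffe's logging visible on the console
GLOG_logtostderr=1 python -c "
import caffe
caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
net.forward()
"
```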

@naibaf7
Hi fabian, here's the details:
-- General:
-- Version : 1.0.0
-- Git : unknown
-- System : Darwin
-- C++ compiler : /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++

-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -std=c++11 -DCMAKE_BUILD -Wno-sign-compare -Wno-uninitialized

-- Debug CXX flags : -g -fPIC -Wall -std=c++11 -DCMAKE_BUILD -Wno-sign-compare -Wno-uninitialized

  • -- Build type : Release
  • -- BUILD_SHARED_LIBS : ON
  • -- BUILD_python : ON
  • -- BUILD_matlab : OFF
  • -- BUILD_docs : ON
  • -- CPU_ONLY : OFF
  • -- USE_OPENCV : ON
  • -- USE_FFT : OFF
  • -- USE_LEVELDB : ON
  • -- USE_LMDB : ON
  • -- USE_NCCL : OFF
  • -- ALLOW_LMDB_NOLOCK : OFF
  • -- USE_HDF5 : ON



    • -- Dependencies:

  • -- BLAS : Yes (vecLib)
  • -- Boost : Yes (ver. 1.64)
  • -- glog : Yes
  • -- gflags : Yes
  • -- protobuf : Yes (ver. 3.3.0)
  • -- lmdb : Yes (ver. 0.9.19)
  • -- LevelDB : Yes (ver. 1.20)
  • -- Snappy : Yes (ver. 1.1.4)
  • -- OpenCV : Yes (ver. 2.4.13.2)
  • -- CUDA : No



    • -- Python:

  • -- Interpreter : /Users/stambe/anaconda/bin/python2.7 (ver. 2.7.13)
  • -- Libraries : /Users/stambe/anaconda/lib/libpython2.7.dylib (ver 2.7.13)
  • -- NumPy : /Users/stambe/anaconda/lib/python2.7/site-packages/numpy/core/include (ver 1.12.1)

  • -- Documentaion:
  • -- Doxygen : No
  • -- config_file :

  • -- Install:
  • -- Install path : /Users/stambe/Programs/caffe-opencl/build/install

  • -- Configuring done

CMake Warning (dev) in src/caffe/CMakeLists.txt:

Policy CMP0022 is not set: INTERFACE_LINK_LIBRARIES defines the link interface. Run "cmake --help-policy CMP0022" for policy details. Use the cmake_policy command to set the policy and suppress this warning.
Target "caffe" has an INTERFACE_LINK_LIBRARIES property which differs from its LINK_INTERFACE_LIBRARIES properties.

INTERFACE_LINK_LIBRARIES:
caffeproto;/usr/local/lib/libboost_system-mt.dylib;/usr/local/lib/libboost_thread-mt.dylib;/usr/local/lib/libboost_filesystem-mt.dylib;/usr/local/lib/libglog.dylib;/usr/local/lib/libgflags.dylib;$<$>:/usr/local/lib/libprotobuf.dylib>;$<$:/usr/local/lib/libprotobuf.dylib>;/usr/local/lib/libhdf5_cpp.dylib;/usr/local/lib/libhdf5.dylib;/usr/lib/libpthread.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib;/usr/local/lib/libhdf5_hl_cpp.dylib;/usr/local/lib/libhdf5_hl.dylib;/usr/local/lib/libhdf5_cpp.dylib;/usr/local/lib/libhdf5.dylib;/usr/lib/libpthread.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib;/usr/local/lib/libhdf5_hl_cpp.dylib;/usr/local/lib/libhdf5_hl.dylib;/usr/local/lib/liblmdb.dylib;/usr/local/lib/libleveldb.dylib;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/OpenCL.framework;opencv_core;opencv_highgui;opencv_imgproc;-lcblas;-framework Accelerate;/usr/local/lib/libboost_python-mt.dylib

LINK_INTERFACE_LIBRARIES:

caffeproto;/usr/local/lib/libboost_system-mt.dylib;/usr/local/lib/libboost_thread-mt.dylib;/usr/local/lib/libboost_filesystem-mt.dylib;/usr/local/lib/libglog.dylib;/usr/local/lib/libgflags.dylib;/usr/local/lib/libprotobuf.dylib;/usr/local/lib/libhdf5_cpp.dylib;/usr/local/lib/libhdf5.dylib;/usr/lib/libpthread.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib;/usr/local/lib/libhdf5_hl_cpp.dylib;/usr/local/lib/libhdf5_hl.dylib;/usr/local/lib/libhdf5_cpp.dylib;/usr/local/lib/libhdf5.dylib;/usr/lib/libpthread.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib;/usr/local/lib/libhdf5_hl_cpp.dylib;/usr/local/lib/libhdf5_hl.dylib;/usr/local/lib/liblmdb.dylib;/usr/local/lib/libleveldb.dylib;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/OpenCL.framework;opencv_core;opencv_highgui;opencv_imgproc;-lcblas;-framework Accelerate;/usr/local/lib/libboost_python-mt.dylib

When I run make runtest -j8, I get the following error:
stambe-osx:build stambe$ make runtest -j8
[ 1%] Built target gtest
[ 2%] Built target caffeproto
[ 71%] Built target caffe
[ 71%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_deconvolution_layer.cpp.o
[ 72%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_db.cpp.o
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_eltwise_layer.cpp.o
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_dummy_data_layer.cpp.o
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_embed_layer.cpp.o
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_euclidean_loss_layer.cpp.o
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_filter_layer.cpp.o
[ 75%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_filler.cpp.o
/Users/stambe/Programs/caffe-opencl/src/caffe/test/test_db.cpp:23:27: error: use of undeclared
identifier 'EXAMPLES_SOURCE_DIR'
root_images_(string(EXAMPLES_SOURCE_DIR) + string("images/")) {}

Is there a problem with my installation? Please note that I have no trouble running Caffe on the CPU.

thanks,
Sal

Thanks to the given pointers and leads, I have managed to build and run Caffe OpenCL successfully on two separate machines with the following specs:

(A)

  • CPU: Intel i7-7567U 3.5GHz
  • iGPU: Intel Iris Plus 650

(B)

  • CPU: Intel i7-4930K 3.4GHz
  • GPU: NVIDIA GeForce GTX 650

I could run AlexNet, ResNet50 & ResNet152 on both (A) and (B) without a hitch. The only issue I have is batch processing with the ResNets on (A): the predicted class is never correct. Since the problem arises only on (A) and only in the ResNets' batch processing, I reckon the root cause lies in LibDNN, and more specifically in Intel's spatial kernel. I could disable the Intel engine, but then it wouldn't reach the speed that I'm very satisfied with now.

Any more pointers and leads on this one? Thanks in advance.

@gongzg Any pointers on why the spatial kernel could have issues with batch processing?

@naibaf7 There are some bugs in the current spatial convolution engine and I haven't had time to submit those PRs to the upstream OpenCL branch yet. Part of the reason is that all my current work depends on the FP16 PR, so I want to wait for the FP16 patch to be reviewed and merged. For now, I recommend that @jstumpin try github.com/01org/caffe's inference-optimize branch to check whether the problem is gone. The corresponding wiki page is the Intel OpenCL caffe wiki.

@jstumpin It's better to apply layer fusion for ResNet; you will see a noticeable performance gain.

Great pointers (and news too) all around. I had heard about github.com/01org/caffe before but wasn't very keen, since I'm obliged to deploy on Windows. I'm getting [given the specs in (A)]:

  1. AlexNet

    • 110 images/sec, 164 images/sec (batch = 15)

  2. ResNet50

    • 11 images/sec, 14 images/sec (batch = 5)

  3. ResNet152

    • 5 images/sec, 6 images/sec (batch = 2)

The following are without Intel spatial:

  1. AlexNet

    • 113 images/sec, 114 images/sec (batch = 15)

  2. ResNet50

    • 9 images/sec, 14 images/sec (batch = 5)

  3. ResNet152

    • 5 images/sec, 6 images/sec (batch = 2)

For completeness, I've also benchmarked against OpenCV 3.3.0 and Caffe's CPU-only mode (all with Intel MKL). Suffice to say, CPU-only is a no-go, regardless of network topology. I decrease the batch size as the network complexity grows to be fair to system (B), since its GPU memory is limited. Moreover, I didn't observe any gain beyond those batch sizes. BTW, (B) is actually equipped with an NVIDIA Quadro K4200, not a GeForce as initially advertised (this is irrelevant anyway, since we are targeting (B) for deployment).
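As a quick sanity check on the batched figures above (the single-image rates are essentially flat either way), the Intel spatial speedup can be computed directly; the numbers below are copied from the two lists in this comment:

```python
# Batched throughput (images/sec) copied from the lists above:
# first value = with Intel spatial, second = without it.
batched = {
    "AlexNet":   (164, 114),  # batch = 15
    "ResNet50":  (14, 14),    # batch = 5
    "ResNet152": (6, 6),      # batch = 2
}

for net, (spatial, plain) in batched.items():
    print(f"{net}: {spatial / plain:.2f}x")  # AlexNet: 1.44x, ResNets: 1.00x
```

So in this setup the spatial kernel only pays off for AlexNet's batched case, which is consistent with the suspicion above that the ResNet batch path is misbehaving.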

To do:
[1] Rebuild Caffe from inference-optimize branch
[2] Apply layer fusion and re-evaluate performance

Thanks!

@jstumpin, the inference-optimize branch also supports Windows. When you clone the code, please check the README for the instructions to build on Windows. If you find any issue, just open a ticket there.

@gongzg Yes review & refactor of FP16 is in progress.

Albeit the correct label, why does the prediction value exceed 1? This holds true when using OpenCL with or without Intel spatial (CPU-only is fine). Shouldn't softmax's output be probabilistic?
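For reference, a softmax layer always produces values in (0, 1) that sum to 1, so outputs above 1 usually mean the deploy prototxt ends in a raw fully-connected layer instead of a Softmax layer. A minimal sketch of what softmax computes (plain Python, independent of Caffe):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # each value lies strictly between 0 and 1
print(sum(probs))  # sums to 1 (up to float rounding)
```

If the deploy network's last layer is an InnerProduct rather than a Softmax, the unnormalized logits are returned directly, which would explain values exceeding 1.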

I am trying the latest OpenCL branch on my Intel NUC. The hardware is

  • Intel Core i7-7567U CPU @3.5GHz
  • Intel Iris Plus Graphics 650
  • Windows 10 Pro
  • Visual Studio 14 2015 Win64

I cloned the latest ViennaCL into a folder parallel to the Caffe folder. Then I ran scripts/build_win.cmd without modifying it (as I did not see any need to). But I get a strange error that is not found anywhere in the discussion above; please see the output below. One more piece of info: I did build the windows branch successfully before.

Here is the output of build_win.cmd. I'd really appreciate it if anyone could help me out on this!

================================

c:\DL\caffe\scripts>build_win.cmd
The system cannot find the drive specified.
The system cannot find the drive specified.
INFO: ============================================================
INFO: Summary:
INFO: ============================================================
INFO: MSVC_VERSION = 14
INFO: WITH_NINJA = 0
INFO: CMAKE_GENERATOR = "Visual Studio 14 2015 Win64"
INFO: CPU_ONLY = 0
INFO: USE_CUDA = 0
INFO: CUDA_ARCH_NAME = Auto
INFO: USE_CUDNN = 0
INFO: USE_GREENTEA = 1
INFO: USE_LIBDNN = 1
INFO: USE_OPENMP = 0
INFO: USE_INDEX64 =
INFO: USE_INTEL_SPATIAL = 0
INFO: DISABLE_DEVICE_HOST_UNIFIED_MEMORY = 0
INFO: CMAKE_CONFIG = Release
INFO: USE_NCCL = 0
INFO: CMAKE_BUILD_SHARED_LIBS = 0
INFO: PYTHON_VERSION = 2
INFO: BUILD_PYTHON = 1
INFO: BUILD_PYTHON_LAYER = 1
INFO: BUILD_MATLAB = 0
INFO: PYTHON_EXE = "python"
INFO: RUN_TESTS = 0
INFO: RUN_LINT = 0
INFO: RUN_INSTALL = 0
INFO: ============================================================
-- Selecting Windows SDK version to target Windows 10.0.15063.
-- The C compiler identification is MSVC 19.0.24215.1
-- The CXX compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: C:/Users/NUC-Sonavex/AppData/Local/Programs/Python/Python35/python.exe (found suitable version "3.5.3", minimum required is "2.7")
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Boost version: 1.61.0
-- Found the following Boost libraries:
-- system
-- thread
-- filesystem
-- chrono
-- date_time
-- atomic
-- Found GFlags: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include
-- Found gflags (include: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include, library: gflags_shared)
-- Found Glog: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include
-- Found glog (include: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include, library: glog)
-- Found Protobuf: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/bin/protoc.exe (found version "3.1.0")
-- Found PROTOBUF Compiler: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/bin/protoc.exe
-- Found LMDB: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include
-- Found lmdb (include: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include, library: lmdb)
-- Found LevelDB: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include
-- Found LevelDB (include: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include, library: leveldb)
-- Found ZLIB: optimized;C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/lib/caffezlib.lib;debug;C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/lib/caffezlibd.lib (found version "1.2.8")
-- Found Snappy: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include
-- Found Snappy (include: C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/include, library: snappy_static;optimized;C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/lib/caffezlib.lib;debug;C:/Users/NUC-Sonavex/.caffe/dependencies/libraries_v140_x64_py35_1.1.0/libraries/lib/caffezlibd.lib)
-- -- CUDA is disabled. Building without it...
-- Found ViennaCL include: C:/DL/viennacl
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
OPENCL_INCLUDE_DIRS
used as include directory in directory C:/DL/caffe/scripts/build/CMakeFiles/CMakeTmp
_OPENCL_64_LIBRARIES
linked by target "cmTC_7259e" in directory C:/DL/caffe/scripts/build/CMakeFiles/CMakeTmp

CMake Error at cmake/Modules/FindOpenCL.cmake:106 (TRY_COMPILE):
Failed to configure test project build system.
Call Stack (most recent call first):
cmake/Modules/FindViennaCL.cmake:37 (find_package)
cmake/Dependencies.cmake:116 (find_package)
CMakeLists.txt:127 (include)

-- Configuring incomplete, errors occurred!
See also "C:/DL/caffe/scripts/build/CMakeFiles/CMakeOutput.log".
See also "C:/DL/caffe/scripts/build/CMakeFiles/CMakeError.log".
ERROR: Configure failed

@bxk-sonavex As I mentioned above, some of the Intel-specific patches haven't been reviewed and merged yet, so please try the github.com/01org/caffe inference-optimize branch. You can refer to the following wiki for detailed instructions on Intel platforms:
https://github.com/01org/caffe/wiki/clCaffe

@gongzg That may not help given that some parts of the Intel OpenCL implementation don't work on Windows. But working on it, as you know :)

@naibaf7 Does that mean that this branch is NOT working at all with an Intel iGPU on Windows? Then where can I find a Caffe version supporting OpenCL on Windows with an Intel iGPU?

@gongzg Is the clCaffe you suggested working on Windows with an Intel iGPU?

@bxk-sonavex It will work, but not with the Intel convolutions, so performance will be non-optimal.
At the moment I don't think you can find that, but I am working on a solution.
Your problem has more to do with missing OpenCL headers. Which OpenCL have you installed? The Intel SDK?

@naibaf7 Yes, I am using Intel SDK v6.3. I found a workaround here (https://github.com/BVLC/caffe/issues/5575) and it works for me. Now I have the opencl branch compiled. I then tested my build using the mnist example provided in the examples folder. When using the CPU (by modifying lenet_solver.prototxt), train_lenet ran without any problem and the final training accuracy is 0.9902, as expected.

I1107 13:53:43.139747 3512 solver.cpp:421] Test net output #0: accuracy = 0.9902
I1107 13:53:43.139747 3512 solver.cpp:421] Test net output #1: loss = 0.0277191 (* 1 = 0.0277191 loss)

However, when using the GPU, I got a "caffe.exe has stopped working" error window and the accuracy is just 0.1009 (i.e., chance level for MNIST's 10 classes).

I1107 14:11:15.651798 7872 solver.cpp:421] Test net output #0: accuracy = 0.1009
I1107 14:11:15.651798 7872 solver.cpp:421] Test net output #1: loss = 87.31 (* 1 = 87.31 loss)

Could you give me some leads on what happened and how to solve it? Or is this what @gongzg mentioned?

That may not help given that some parts of the Intel OpenCL implementation don't work on Windows. But working on it, as you know :)

The places I modified from the default build_win.cmd are

set WITH_NINJA=1 
set CMAKE_BUILD_SHARED_LIBS=1 
set PYTHON_VERSION=3 
set RUN_INSTALL=1

Should I set USE_INTEL_SPATIAL?

With USE_INTEL_SPATIAL=1 set, the branch does not compile. The error is

ninja: build stopped: subcommand failed.

@naibaf7 The 01org version works fine on Windows now. I'm still busy with other things, so I haven't had enough time to submit all the fixes to this OpenCL branch. I will do that when I have some time in the near future. @bxk-sonavex You can try the 01org version following the wiki page, and if you meet any problems with it, please let me know.

@gongzg Thanks! Following the instructions at https://github.com/01org/caffe/wiki/clCaffe#windows-support-for-intel-gen-platform, I got the error message:

fatal error C1083: Cannot open include file: 'caffe/proto/caffe.pb.h': No such file or directory

FYI:
https://github.com/ptillet/isaac.git is only compatible with NVIDIA hardware and cannot even be compiled, so I cloned https://github.com/intel/isaac instead.

UPDATE:
I manually generated the files via

build\libraries\bin\protoc.exe src\caffe\proto\caffe.proto --cpp_out=.\

Supposedly, these files should be generated automatically.

Then I got the following error:

"C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj" (default target) (1) ->
(CustomBuild target) ->
  C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code -1073741515. [C:\DL\clCaffe\build\src\caffe\test\runtest.vc
xproj]

    2345 Warning(s)
    1 Error(s)

Time Elapsed 00:03:55.08
ERROR: Tests failed

I disabled RUN_TESTS and am building for the third time...

@bxk-sonavex It seems that it was already built successfully. You need to copy the DLL files to the executable's directory:

"Please be noted that, after the building finished successfully, before you try to run the application, you need to copy the dl.dll (dlfcn) and isaac.dll (isaac) into the same directory or put them into a system directory."

@gongzg I added the folders of the two DLLs to the system PATH instead of copying them to the test folder. Now I get another error, which looks pretty serious...

"C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj" (default target) (1) ->
(CustomBuild target) ->
  CUSTOMBUILD : Fatal error : Intel iGPU device found but doesn't support cl_intel_subgroups_short. [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]

    2333 Warning(s)
    1 Error(s)

Time Elapsed 00:05:41.97
ERROR: Tests failed

I am using an Intel Iris Plus Graphics 650 and intel_sdk_for_opencl_setup_6.3.0.1904. Any thoughts or solutions?

@bxk-sonavex You need to update your Intel Graphics driver to the latest version.

@gongzg Thanks, that solved the compile error. When running the tests, I got a whole bunch of errors like the following (I may not have caught all of them):

C:\DL\clCaffe\src\caffe\test\test_argmax_layer.cpp(132): error : Expected: (bottom_data[i * dim + j]) <= (max_val), actual: -0.402832 vs -0

C:\DL\clCaffe\src\caffe\test\test_convolution_layer_spatial.cpp(735): error : The difference between top_data[i] and ref_top_data[i] is 1.8077674604790599e+28, which exceeds delta, where [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]
  top_data[i] evaluates to -1.8077674604790599e+28,
  ref_top_data[i] evaluates to 7.1034564971923828, and
  delta evaluates to 9.9999997473787516e-05.

C:\DL\clCaffe\src\caffe\test\test_convolution_layer_spatial.cpp(735): error : The difference between top_data[i] and ref_top_data[i] is 1.803808228419822e+28, which exceeds delta, where [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]

    2418 Warning(s)
    17672 Error(s)

Time Elapsed 00:10:25.65
ERROR: Tests failed

Should I be concerned about these errors?

Anyway, I am testing the build using the mnist example. It's extremely slow, even much slower than the original Caffe using the CPU. And there are some warnings (repeated several times):

warning: Linking two modules of different data layouts: '' is 'e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas '<origin>' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32:64'

warning: Linking two modules of different target triples: ' is 'spir64' whereas '<origin>' is 'vISA_64'

Any idea?

@bxk-sonavex

Why don't you work with Caffe on Linux for the time being? The devs, I guess, are focused more on getting the FP16, INT8 code etc. running smoothly, especially naibaf7 (David).

Proper Windows support will come eventually.

Just a suggestion, though.

@atlury I'd love to!!! But our system is Windows 10 + Intel Iris ... Any idea when Windows support will come? Or does any other DL platform work (using the GPU)?

@gongzg Just wanted to update you on the performance:
CPU: 7 minutes 33 seconds, accuracy = 0.9914
GPU: 29 minutes 34 seconds, accuracy = 0.8406

I'm wondering what the performance is on Linux. Then I could get a basic idea of the speedup of the Intel GPU (OpenCL) vs. the CPU. Thanks!!

@bxk-sonavex

Ben, did you enable the OpenCL kernels? Did you try using INTEL_SPATIAL?

@atlury What do you mean by "enable the opencl kernels"? Yes, I followed the instructions here (https://github.com/01org/caffe/wiki/clCaffe#how-to-build) and set USE_INTEL_SPATIAL=1 on the command line (rather than modifying the build_win.cmd file directly).

UPDATE:
INFO: ============================================================
INFO: Summary:
INFO: ============================================================
INFO: MSVC_VERSION = 14
INFO: WITH_NINJA = 0
INFO: CMAKE_GENERATOR = "Visual Studio 14 2015 Win64"
INFO: CPU_ONLY = 0
INFO: USE_CUDA = 0
INFO: USE_CUDNN = 0
INFO: USE_GREENTEA = 1
INFO: USE_LIBDNN = 1
INFO: USE_OPENMP = 0
INFO: USE_INDEX64 =
INFO: USE_INTEL_SPATIAL = 1
INFO: USE_ISAAC = 1
INFO: CMAKE_CONFIG = Release
INFO: USE_NCCL = 0
INFO: CMAKE_BUILD_SHARED_LIBS = 0
INFO: PYTHON_VERSION = 2
INFO: BUILD_PYTHON = 0
INFO: BUILD_PYTHON_LAYER = 0
INFO: BUILD_MATLAB = 0
INFO: PYTHON_EXE = "python"
INFO: RUN_TESTS = 1
INFO: RUN_LINT = 0
INFO: RUN_INSTALL = 1
INFO: ============================================================

@bxk-sonavex

Ben, you will need to enable INTEL_SPATIAL for all convolution layers in your deploy prototxt. I have personally tested it in real time on Linux.

https://github.com/BVLC/caffe/pull/5165

"I have tested on an Intel tv stick, webcam using Intel Spatial kernels and using 19-layer vgg model. I am able to get real time classification and all under 3.5 watts"

Windows should also work.

@bxk-sonavex For the issue with the 01org version, please open an issue there. There are some test failures due to FP16 precision issues in the gradient test cases, which are not critical. The performance being extremely slow should be caused by the auto-tuning; it should be much faster when you run it again. You can first try build/tools/caffe to measure the forward performance on AlexNet.

By the way, I just noticed that @CNugteren released the new 1.2.0 version of his autotuned CLBlast library a few days ago. I checked it, and it seems to work with Caffe on my Windows 10 Lenovo laptop with an old Intel 4400 GPU (as well as on Linux) - so it can be a nice addition to Caffe, since the previous CLBlast version was seg-faulting on Windows!

If you are interested, you can check the speed of Caffe with LibDNN and CLBlast, for example on SqueezeDet, as follows (the same procedure on both Windows and Linux):

$ pip install ck
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal-20171015

It will take some time since CK will attempt to detect your environment and compilers,
and will then rebuild all necessary dependencies on your machine.

After that you can just install SqueezeDet and run internal time:

$ ck install package:caffemodel-deepscale-squeezenet-1.1
$ ck run program:caffe --cmd_key=time_gpu

The first run can be a bit slow due to kernel compilation and caching so the second run will be much faster!

You can also benchmark image classification:

$ ck pull repo:ctuning-datasets-min
$ ck run program:caffe --cmd_key=classify

Not related to Intel but just a note that there seems to be a minor bug when compiling Caffe with CLBlast 1.2.0 for Android ARM64 using Android GCC 4.9.x ("to_string" not found in std class):

$ ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal-20171015 --target_os=android21-arm64 --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON
$ ck compile program:caffe-time-opencl --target_os=android21-arm64
$ ck run program:caffe-time-opencl --target_os=android21-arm64

It would be nice to fix this, since CLBlast 1.1.0 works fine on Android... In that case, it would work with Caffe across all platforms.

Hope it's of any help and have a good weekend!

there seems to be a minor bug when compiling Caffe with CLBlast 1.2.0 for Android ARM64 using Android GCC 4.9.x ("to_string" not found in std class):

Not sure whether you mean that this is a bug in CLBlast or in Caffe? In any case, CLBlast has this implemented in a special Android header. Perhaps that could be used within Caffe as well?

@CNugteren - I just checked, and the problem is not in CLBlast. I had simply forgotten a patch in CK which fixes LibDNN for Android (so my fault). I have added it (https://github.com/dividiti/ck-caffe/blob/master/package/lib-caffe-bvlc-opencl-clblast-universal/patch.android/android.fgg.patch3) and it's now possible to compile Caffe with CLBlast and LibDNN. I checked the classification and benchmarking examples on my Samsung S7 - works fine. Sorry for the false alarm, and thanks for releasing a new CLBlast - I can now use it in Caffe on Linux, Windows and Android.

@gfursin Is this a version using the CPU or the GPU (OpenCL)? I thought it was said that OpenCL is not working on Windows yet (or at least not with Intel iGPUs yet). What are you using on Windows?

@bxk-sonavex

Ben sorry for the delay in responding back. I was away.

To quote @naibaf7
"The convolution method ("engine") can alternatively be selected/overwritten in the network prototxt file"

Thus, add the entry "engine: INTEL_SPATIAL" to every convolution layer specification.

Take AlexNet as an example: edit, say, $CAFFE_ROOT/models/bvlc_alexnet/train_val.prototxt and add the following line to make the conv1 layer use spatial convolution. Change the other layers likewise.

 layer {
   name: "conv1"
   type: "Convolution"
   bottom: "data"
   top: "conv1"
   param {
     lr_mult: 1
     decay_mult: 1
   }
   param {
     lr_mult: 2
     decay_mult: 0
   }
   convolution_param {
     num_output: 96
     kernel_size: 11
     stride: 4
     engine: INTEL_SPATIAL      <-------------------------- this line!
     weight_filler {
       type: "gaussian"
       std: 0.01
     }
     bias_filler {
       type: "constant"
       value: 0
     }
   }
 }
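Editing every convolution layer by hand gets tedious for deeper nets like the 19-layer VGG mentioned above. As a convenience, a small script can inject the line automatically; this is my own sketch (add_intel_spatial is a hypothetical helper, and it assumes each convolution_param block opens on its own line, as in the example above):

```python
import re

def add_intel_spatial(prototxt_text):
    # Insert "engine: INTEL_SPATIAL" right after each "convolution_param {"
    # line, preserving the existing indentation plus two extra spaces.
    def repl(match):
        indent = match.group(1)
        return match.group(0) + "\n" + indent + "  engine: INTEL_SPATIAL"
    return re.sub(r"^([ \t]*)convolution_param[ \t]*\{", repl,
                  prototxt_text, flags=re.MULTILINE)

# Hypothetical usage (paths are placeholders):
# with open("train_val.prototxt") as f:
#     text = f.read()
# with open("train_val_spatial.prototxt", "w") as f:
#     f.write(add_intel_spatial(text))
```

This only touches convolution layers, so other layer types are left exactly as they were.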

Edit: My bad, I see you had opened another thread and seem to have progressed a bit further.

@bxk-sonavex - I use Caffe OpenCL version (with libDNN and CLBlast) on Windows with old Intel 4400 GPU WITHOUT Intel Spatial - it seems to be working fine but it may be suboptimal. Here is the list of Caffe devices ("ck run program:caffe --cmd_key=query_gpu_opencl"):
output_caffe_opencl_devices.txt

Here is the output from image classification on Windows with above Caffe OpenCL version and GoogleNet:
output_caffe_opencl_image_classification.txt

I mostly check inference/object detection at this stage (we are trying to unify DNN installation, benchmarking and optimization across all possible platforms) so I didn't really stress other Caffe capabilities and models on Windows with OpenCL ...

I also just tried to compile Caffe OpenCL with Intel Spatial ON ("ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal --env.USE_INTEL_SPATIAL=ON") and I observe the same 2 build errors as were reported earlier by @atlury:
output_caffe_build_error_with_intel_spatial.txt

Is there a build script available for Linux (Ubuntu 16.04) too? I am getting errors when trying to compile.

@rachithayp Follow the instructions carefully; it will work even on the 18.0x series. We have tested it.

Hi @rachithayp . Just a note that you likely need to patch kernel to make Intel OpenCL work on Ubuntu 16.04: https://github.com/dividiti/ck-caffe/wiki/Installation#Intel_CPUGPU_Linux .

I managed to build the OpenCL branch of Caffe on my Ubuntu 18.04 (Lenovo T470p laptop with an Intel GPU), without patching the kernel and with the latest Intel OpenCL, via CK a few weeks ago:

$ sudo pip install ck

$ ck pull repo --url=https://github.com/ctuning/ck-caffe

$ ck install package:lib-caffe-bvlc-opencl-viennacl-universal --env.USE_INTEL_SPATIAL=ON --env.CAFFE_BUILD_PYTHON=ON

CK will attempt to detect your available compilers, OpenCL libraries and other dependencies, and will invoke cmake for Caffe. If the build is successful, you can check installation using CK virtual env:

$ ck show env
$ ck virtual env --tags=lib,caffe
> python
import caffe

You can also try a sample image classification as follows:

$ ck compile program:caffe-classification-opencl --speed
$ ck run program:caffe-classification-opencl

Good luck.

cc @ens-lg4 and @psyhtest ...

@atlury I was able to compile using the below cmake:
cmake .. -DUSE_CUDA=OFF -DBUILD_docs=0 -DOPENCL_LIBRARIES=<> -DOPENCL_INCLUDE_DIRS=<>

But trying to compile with INTEL_SPATIAL ON gives the errors below:
cmake .. -DUSE_GREENTEA=ON -DUSE_CUDA=OFF -DUSE_INTEL_SPATIAL=ON -DBUILD_docs=0 -DOPENCL_LIBRARIES=<> -DOPENCL_INCLUDE_DIRS=<>

/home/intel/Documents/caffe_src/opencl_caffe/src/caffe/libdnn/libdnn_conv_spatial.cpp:19:1: error: ‘LibDNNConvSpatial’ does not name a type
LibDNNConvSpatial::LibDNNConvSpatial(LibDNNConvConfig config) {
^
/home/intel/Documents/caffe_src/opencl_caffe/src/caffe/libdnn/libdnn_conv_spatial.cpp:117:25: error: expected initializer before ‘<’ token
string LibDNNConvSpatial::generate_fw_defs() {

Any idea what could be wrong? Also, there is no include/caffe/greentea folder on the opencl branch, so I copied it from https://github.com/01org/caffe.

@rachithayp
Can you try the instructions from the chapter below? It's a rough cut of the installation chapter from our upcoming book on OpenCL Caffe. Thank you @naibaf7.

I hope it throws some light and helps you in your OpenCL Caffe endeavors.

python-deep-learning-installation-chap.pdf

@bxk-sonavex - I use Caffe OpenCL version (with libDNN and CLBlast) on Windows with old Intel 4400 GPU WITHOUT Intel Spatial - it seems to be working fine but it may be suboptimal. [...]

Does your HD 4400 run faster with Caffe than the CPU?
I compiled clCaffe and ran it on my HD 5500, but it's 5 times slower than the CPU (i3 5005U).
I don't know why.
