Lightweight-human-pose-estimation.pytorch: libtorch c++ demo

Created on 11 Jun 2019 · 18 Comments · Source: Daniil-Osokin/lightweight-human-pose-estimation.pytorch

Hi, I have managed to run the OpenVINO C++ demo. It runs at about 15 fps on CPU, which is not fast enough.

Is there any plan to support a libtorch C++ demo? If not, I want to implement one, but I have some problems:

In terms of preprocessing, you scale the image according to its height and pad the missing pixels with zeros. I think that is rather complicated. If I resize the image directly to the target size, say (256, 456), will it work?
I managed to trace the model and load it from libtorch, but there are 4 outputs, so I took the last 2 output tensors. I saw the keypoint extraction and grouping method in Python, and it is very hard to implement in C++. Are there any snippets that could help extract all keypoints per instance directly from the output tensors?

Hope you reply soon.

jinfagang

Most helpful comment

@stereomatchingkiss thanks for the reply. I also successfully created a C++ version by following the OpenVINO human pose estimation sample.

tucachmo2202 on 2 Sep 2021

All 18 comments

Hi, I don't have such plans, and I'm not sure that a libtorch C++ demo will be faster than our C++ demo with OpenVINO. The network input size can be tweaked for better performance, e.g. by setting the shortest side to 192; this can be done here, by setting the last parameter to 192. It may be enough for a specific case. Answering the questions:

The input video aspect ratio matters. If the shapes of persons are skewed enough, the model will not work. So the resize should preserve the aspect ratio (same scale for each axis). It looks like it should work with an odd image size. The demo pads to even dimensions for processing with vectorized operations (SIMD).
Taking the last two tensors is OK: the network has 2 stages, and the second takes the results from the first (the first two output tensors) and refines them. The C++ demo is open, and you can find the C++ implementation of grouping here.
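For readers porting the preprocessing, the aspect-preserving resize described above can be sketched roughly as follows. This is only an illustration, not code from the repository or the OpenVINO demo: the names ResizePlan, plan_resize and to_source_coord are hypothetical, and it assumes a fixed network input such as the 456x256 size mentioned in the question, with zero padding added on the right and bottom only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper, not taken from the repository or the OpenVINO demo.
struct ResizePlan {
    double scale;    // one scale factor shared by both axes
    int resized_w;   // image width after the aspect-preserving resize
    int resized_h;   // image height after the aspect-preserving resize
    int pad_right;   // zero-filled columns needed to reach the network width
    int pad_bottom;  // zero-filled rows needed to reach the network height
};

// Fit a src_w x src_h frame inside a net_w x net_h input without skewing it.
ResizePlan plan_resize(int src_w, int src_h, int net_w, int net_h) {
    assert(src_w > 0 && src_h > 0 && net_w > 0 && net_h > 0);
    const double scale = std::min(static_cast<double>(net_w) / src_w,
                                  static_cast<double>(net_h) / src_h);
    // Clamp to [1, net size] so rounding can never overshoot or collapse to zero.
    const int resized_w = std::max(
        1, std::min(net_w, static_cast<int>(std::lround(src_w * scale))));
    const int resized_h = std::max(
        1, std::min(net_h, static_cast<int>(std::lround(src_h * scale))));
    return {scale, resized_w, resized_h, net_w - resized_w, net_h - resized_h};
}

// Map a coordinate found in the network input back to the original frame.
// Valid only because the padding is assumed to sit on the right and bottom.
double to_source_coord(double net_coord, const ResizePlan& plan) {
    return net_coord / plan.scale;
}
```

For a 1920x1080 frame and a 456x256 input, the plan keeps the full height and leaves one padded column on the right. Keypoints detected in the network input can then be divided by the same scale to land on the original frame.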

And please contribute a C++ demo with PyTorch if you decide to do it.

Daniil-Osokin on 11 Jun 2019

@Daniil-Osokin thanks for your reply. I have already migrated the necessary code from OpenVINO into a libtorch C++ demo. After I succeed, I will send a PR.

Using libtorch may not be faster than OpenVINO, but I cannot see any huge performance gain from OpenVINO, and libtorch can optionally turn on GPU support.

Now there is another issue bothering me. Models that use heatmaps always need a lot of memory to run, even with MobileNet as the backbone. This model runs at about 79 fps on my GTX 1080 but takes 1.7 GB of memory. It is almost a memory-eating monster. Do you have any model optimizations for shrinking the memory footprint?

jinfagang on 11 Jun 2019

OK, I ran into some trouble: the program runs, but the result is not right.

jinfagang on 12 Jun 2019

I've checked memory consumption in PyTorch (by running demo.py); it shows ~800 MB. With OpenVINO the consumption is about 100 MB.

Daniil-Osokin on 12 Jun 2019

Python takes 800 MB? Did you subtract your desktop usage? Mine takes at least 1.2 GB.

jinfagang on 12 Jun 2019

@Daniil-Osokin Hi Daniil, did you see anywhere that the OpenVINO C++ demo normalizes the input image (subtract 128 and divide by 256)? I can't find any code doing this. After debugging, I think I missed some preprocessing step in the libtorch_cpp demo...

jinfagang on 12 Jun 2019

Good question. By default, all models have preprocessing (if it is needed at all) inside their .xml files as a separate layer right after the data layer, e.g.:

<layer id="0" name="data" precision="FP32" type="Input">
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>3</dim>
            <dim>256</dim>
            <dim>456</dim>
        </port>
    </output>
</layer>
<layer id="1" name="Mul_/Fused_Mul_/FusedScaleShift_" precision="FP32" type="ScaleShift">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>3</dim>
            <dim>256</dim>
            <dim>456</dim>
        </port>
    </input>
    <output>
        <port id="3">
            <dim>1</dim>
            <dim>3</dim>
            <dim>256</dim>
            <dim>456</dim>
        </port>
    </output>
    <blobs>
        <weights offset="0" size="12"/>
        <biases offset="12" size="12"/>
    </blobs>
</layer>

So I believe the normalization mentioned above should help (subtract 128 and divide by 256).
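A minimal sketch of that normalization in plain C++, for anyone doing the preprocessing by hand outside OpenVINO. The function name normalize_pixels is hypothetical, and the constants are simply the 128 and 256 quoted in this thread, applied identically to every channel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts 8-bit pixels to floats as (pixel - mean) / scale.
// The defaults are the values quoted in the thread, not read from the model file.
std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& pixels,
                                    float mean = 128.0f,
                                    float scale = 256.0f) {
    assert(scale != 0.0f);
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back((static_cast<float>(p) - mean) / scale);
    }
    return out;
}
```

With these constants, an 8-bit value of 0 maps to -0.5 and 128 maps to 0, so the network sees inputs roughly centered on zero.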

As for the memory consumption: yes, it's just the memory taken by Python.

Daniil-Osokin on 12 Jun 2019

@Daniil-Osokin thank you, Daniil. I have added the preprocessing to the libtorch demo, but it did not help; the result is still not right.

Would you help me by reviewing the preprocessing steps in C++? I am out of ideas about which step is missing. I would appreciate your help.
I have taken the necessary code from OpenVINO and integrated it with libtorch. I moved it to a separate repo here. You only need to edit the libtorch path and then it can be built; the traced C++ pt model is already there (I concatenated the last outputs into a single tensor when tracing the model, since libtorch has some problems dealing with tuples).

It compiles and runs, but I cannot figure out where the logic goes wrong.
If you can help, please check it out when you have time.

jinfagang on 12 Jun 2019

I believe the problem is that OpenCV reads images in interleaved format (HxWxC), while the tensor expects planar input (CxHxW order). I made an issue in your repo.
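To make the layout difference concrete, here is a small sketch of the interleaved-to-planar reordering written as explicit loops. The helper name hwc_to_chw is hypothetical, and the input is assumed to be a contiguous buffer of height * width * channels elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorders an interleaved HxWxC buffer into planar CxHxW.
template <typename T>
std::vector<T> hwc_to_chw(const std::vector<T>& hwc,
                          std::size_t height,
                          std::size_t width,
                          std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    std::vector<T> chw(hwc.size());
    for (std::size_t h = 0; h < height; ++h) {
        for (std::size_t w = 0; w < width; ++w) {
            for (std::size_t c = 0; c < channels; ++c) {
                // Interleaved index: channels of one pixel are adjacent.
                // Planar index: all pixels of one channel are adjacent.
                chw[(c * height + h) * width + w] = hwc[(h * width + w) * channels + c];
            }
        }
    }
    return chw;
}
```

In libtorch the same reordering is commonly done by wrapping the image buffer with torch::from_blob and calling permute({2, 0, 1}) on the result; the loops are spelled out here only to show the index arithmetic.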

Daniil-Osokin on 13 Jun 2019

@jinfagang Did this help?

Daniil-Osokin on 17 Jun 2019

@Daniil-Osokin thanks, Daniil, it works after I edited the channel order! I will close this issue and send a PR after some code cleanup.

jinfagang on 17 Jun 2019

@jinfagang hi, your repo (https://github.com/jinfagang/light_human_pose_libtorch) cannot be visited any more.
Can you tell me how to get the traced C++ pt model?

chenxian9999 on 21 Jun 2019

Sorry, I am currently developing some other functionality and have made it private for a while.
You can trace it directly as the PyTorch tutorial says. It traces easily.

jinfagang on 21 Jun 2019

@jinfagang Thank you.

chenxian9999 on 22 Jun 2019

Hi @jinfagang,
Have you finished your libtorch demo? Could you please share it with everyone?
Best regards!

tucachmo2202 on 11 May 2021

I was able to create a C++ version with libtorch and OpenCV. It is a commercial project; although the code belongs to me, the clients would not want me to release it publicly. The good news is that most of the numpy operations are supported by libtorch. You can implement them gradually and check whether the outputs of the Python scripts and the C++ code match. It took hours to do, but it was not hard.
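One way to do that step-by-step check is to dump an intermediate buffer from both sides and compare the values within a tolerance. The sketch below is an assumption about how such a check could look, not the commenter's code: max_abs_diff and outputs_match are hypothetical names, and the default tolerance of 1e-4 is an arbitrary starting point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two equally sized buffers.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True when both buffers have the same length and agree within the tolerance.
// The default tolerance is a hypothetical value; tune it per layer and device.
bool outputs_match(const std::vector<float>& reference,
                   const std::vector<float>& candidate,
                   float tolerance = 1e-4f) {
    return reference.size() == candidate.size() &&
           max_abs_diff(reference, candidate) <= tolerance;
}
```

Checking after each ported operation, rather than only at the end, narrows a mismatch down to the step that introduced it.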