OpenCV in Swift

(Jeremy Howard (Admin)) #21

Thanks @Matthieu, fixed above now.

(Vova Manannikov) #23

@jeremy did you have a chance to look at this PR?
I wrote a simple function that does JPEG load/rotate/blur/crop/resize/etc. and timed it using OpenCV compiled with the original install script vs. the script in the PR. I’m seeing a roughly 2x improvement in average speed for my particular function (~45ms vs. ~85ms), and OpenCV compilation is also about 2x faster (~8 min vs. ~16 min).

Timing code:

func test_perf(_ path:String) -> Mat {
    var cvImg = imread(path)
    cvImg = cvtColor(cvImg, nil, ColorConversionCode.COLOR_BGR2RGB)
    let rotMat = getRotationMatrix2D(Size(cvImg.cols / 2, cvImg.rows / 2), 20, 1)
    cvImg = GaussianBlur(cvImg, nil, Size(25, 25))
    cvImg = warpAffine(cvImg, nil, rotMat, Size(cvImg.cols, cvImg.rows))
    cvImg = copyMakeBorder(cvImg, nil, 40, 40, 40, 40, BorderType.BORDER_CONSTANT, RGBA(0, 127, 0, 0))
    cvImg = flip(cvImg, nil, FlipMode.HORIZONTAL)
    let zoomMat = getRotationMatrix2D(Size(cvImg.cols, cvImg.rows / 2), 0, 1)
    cvImg = warpAffine(cvImg, nil, zoomMat, Size(600, 600))
    cvImg = resize(cvImg, nil, Size(300, 200), 0, 0, InterpolationFlag.INTER_AREA)
    return cvImg
}

let imgpath = FileManager.default.currentDirectoryPath + "/SwiftCV/Tests/SwiftCVTests/fixtures/zoom.jpg"
time(repeating:30) {_ = test_perf(imgpath)}
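
For anyone reproducing this without a `time(repeating:)` helper in scope, here is a minimal sketch of what such a helper might look like. The names `measureMilliseconds` and `average` are hypothetical (they are not part of SwiftCV, and this is not necessarily how the helper used above is implemented), and the loop in the usage lines is only a stand-in for `test_perf(imgpath)`.

```swift
import Dispatch

/// Hypothetical helper: runs `body` `runs` times and returns
/// the wall-clock duration of each run in milliseconds.
func measureMilliseconds(repeating runs: Int, _ body: () -> Void) -> [Double] {
    precondition(runs > 0, "runs must be positive")
    var timings: [Double] = []
    timings.reserveCapacity(runs)
    for _ in 0..<runs {
        // uptimeNanoseconds is a monotonic clock, so end >= start.
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        timings.append(Double(end - start) / 1_000_000)
    }
    return timings
}

/// Arithmetic mean of the measured timings.
func average(_ values: [Double]) -> Double {
    return values.reduce(0, +) / Double(values.count)
}

// Stand-in workload; replace the closure body with `_ = test_perf(imgpath)`.
var checksum = 0
let timings = measureMilliseconds(repeating: 5) {
    for i in 0..<100_000 { checksum &+= i }
}
print("average: \(average(timings)) ms over \(timings.count) runs")
```

Keeping the per-run timings rather than only the mean makes it easy to spot a slow first run (e.g. a cold file cache) that would otherwise skew the average.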

(Jeremy Howard (Admin)) #24

Thanks for the PR! There are 2 install options in the original file - you removed both and replaced them with just one. The idea of having both is to let people easily switch between a more aggressively optimized build and a less aggressive one. The more aggressive one is commented out - but that’s the one you should compare against.

I already had FAST_MATH and IPP; do you know if CPU_BASELINE is needed? When I checked my config results it seemed to be using all my CPU features AFAICT.

You can remove $(nproc --all) entirely FYI since I believe -j defaults to that.

I don’t think we want TBB (or any other threading) since it seems likely to interfere with Swift threads. You should check that your tests run with SetNumThreads(0).

(Vova Manannikov) #25

There was just one option when I made the PR 🙂 When merging the latest changes, I assumed you had just improved some options and commented out the older variant (I didn’t realize it was something people could choose between). Maybe it makes sense to support an argument in the install script (e.g. --aggressive) to switch between the two? Actually, I’d consider the uncommented one to be the “more aggressive” option because of fast_math.

Hmm, I think fast_math is only used in the uncommented line, and IPP is not used in either? I’m not sure whether it’s enabled by default.

I was following the CPU build optimization doc, which says CPU_BASELINE sets the minimum set of CPU optimizations. Setting it to DETECT or NATIVE runs auto-detection of CPU features, compiles exactly for your CPU, and adds the -march=native flag for gcc. I have a consumer-level CPU (an old Intel i7 something); perhaps OpenCV can use even more optimizations on the Intel Xeon architecture (e.g. as used in AWS instances), but I only checked on my PC and it still seems to give some boost.

Not exactly: -j with no number puts no limit on the number of jobs, regardless of CPU cores, so it may run more jobs than the machine can handle (e.g. memory-wise). But actually I don’t think anything above 4 makes a huge difference. What does make a difference in compile time is the BUILD_LIST option, which compiles only the listed modules that we need for loading and transforming images.

I disabled TBB and re-checked with SetNumThreads(0) (but note that your original commented-out line has TBB enabled). The original commented-out and uncommented options have similar performance, ~85ms (hmm, except that the commented-out one took 35 min to compile vs. 16 min for the uncommented one). With the options from the PR, it’s still slightly faster, by ~15% (~70ms), but the biggest gain is compilation time of course 🙂
