Snow Leopard's Grand Central Dispatch and OpenCL boost video encoding app by 50 percent
It'll take some time before we see the true impact of OpenCL and the newly-open-sourced Grand Central Dispatch on OS X, but we're definitely intrigued by this early report from Christophe Ducommun, developer of MovieGate, who says that shifting his app to use the new tech has increased performance by around 50 percent on the same hardware. Testing on a 2007 2.66GHz quad-core Mac Pro with a GeForce 8800GT, MovieGate MPEG-2 encode speeds went from 104fps under Leopard to 150fps under Snow Leopard, and decoding CPU usage dropped from 165 percent to 70 percent. Now, yes, that's just one app, and most users don't have four cores to play with, but it's still an eye-opening result, and we're definitely hoping it's the start of a trend.
[Via MacRumors]
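The figures quoted in the report can be checked with a little arithmetic. The sketch below is only an illustration using the numbers stated above; the helper name `percent_change` is ours, not anything from MovieGate.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Numbers as quoted in the report (2007 quad-core Mac Pro, GeForce 8800GT).
fps_gain = percent_change(104, 150)    # MPEG-2 encode speed, fps
cpu_change = percent_change(165, 70)   # decoding CPU usage, percent

print(f"encode speed: {fps_gain:+.1f}%")
print(f"decode CPU usage: {cpu_change:+.1f}%")
```

On these numbers the encode gain comes out a little under the "around 50 percent" headline figure, while decoding CPU usage falls by more than half.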

Why isn't the program faster in the first place? A 50 percent boost shows either that Apple is really good at improving programs, or that it just doesn't make them good in the first place so it can update them later.
OpenCL and Grand Central Dispatch make it easier and more efficient to code and handle multi-process applications. So it's not surprising that using them would make a difference in the speed.
Your statement makes no sense at all. Being a Vista user personally, it's the same as me commenting on a CUDA-enabled encoder on Windows being faster than the non-enabled version and saying Microsoft sucks at programming or just wants to sell a faster version later.
In both cases it's not the OS maker that made the 3rd-party application in the first place, and this article stated that the application author took advantage of both OpenCL and GCD. I presume that was using the Nvidia card in some way in addition to also improving the CPU core usage through GCD.
Because you have to use the new APIs in your code to get the benefits. It's not like Apple widened its pre-existing highways; it's a whole new set of highways!
I'll care when Final Cut Studio and the Adobe suites are benefiting from this.
Even though I'm a PC guy, I work in the industry and know how big a deal this will be for rendering HD footage, and personally it can't come soon enough.
Apple is known for software design optimizations, not hardware. With Intel, their hardware is quite generic; in fact, it's about a year or two late out of the gate, but the drivers are optimized and fine-tuned. How else could they get Aero-like visual effects in their OpenGL-accelerated window manager on sub-great hardware like the Intel GMA? It's about the software! This has been true since the debut of QuickTime vs Video for Windows. VFW was a sorry-ass replacement. Anyone doing video then did it on Macs.
Granted, in the WinTel world, it's now about throwing more hardware at the problem. Why code it optimized when you can just throw hardware at it? With Apple, you get dated hardware but with drivers that 'just work'. With Windows, you can get the cutting-edge hardware with drivers that kinda, sorta work. In the end, it all balances out... sorta.
How about reading the article, you dork. The application is NOT made by Apple; however, when he used the new framework Apple made, things went a whole lot faster! In other words, Apple created a cool new technology, shared it with the world, and developers rejoiced! You fool!
"CPU usage dropped from 165 percent" -- It was using 165% of the CPU time? How does that work?
Each core is 100%
You don't want to know. At that CPU usage rate, black holes start to appear.
On multicore computers, each core has 100% available, so on his quad core there is 400% available. At least, this is how it shows up on Macs.
@Weston Myers: Ya, caught that from the replies. Seems kinda misleading. Would make more sense to have any cycle on any one core count as 1/4 of a cycle rather than having it go up to 400%.
@ Mark,
So that if one core is using 25% I have to report it as being 1/16th? I don’t think so! Each core has a usage level of 0% to 100% so it’s easiest —and best— to report the usage of each core based on that. As long as you know how many total cores you have it’s all good.
PS: Representing anything technical with fractions is never good.
BSD unices count 100% on a single core as 100% CPU usage.
System V unices count with 100% as the maximum, so 100% of 1 of 4 cores = 25% CPU load.
This is how it is done on Unix / Linux. Apple didn't come up with this on their own.
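The two conventions described above differ only by a division by the core count. A minimal sketch, assuming the reading is a single summed per-core percentage like the one in the article (the function name `normalized_cpu_percent` is made up for illustration):

```python
def normalized_cpu_percent(per_core_percent, cores):
    """Convert a summed per-core reading (100% = one full core)
    into a whole-machine reading (100% = all cores busy)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return per_core_percent / cores

# The article's decoding figures on a quad-core machine.
before = normalized_cpu_percent(165, 4)
after = normalized_cpu_percent(70, 4)

print(f"before: {before}% of the whole machine")
print(f"after:  {after}% of the whole machine")
```

Either way of reporting carries the same information as long as the number of cores is known.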
@Eleazar: LMAO. I love space-time continuum jokes.
You are too stupid to understand the answer.
What I want to know is what percentage of the gains belonged to GCD, and what percentage to OpenCL?
I am going to guess that over 90% was OpenCL, as any decent video encoder is already multiprocessor optimized.
Personally, I don't understand the hoopla around GCD - if you have an app which can legitimately use multiple CPUs, you're at least already using fork or similar.
As a wise computer scientist once told me, 'most developers who think they want multithreading actually want multiprocessing.'
Actually, 165% usage means it didn't (or couldn't) parallelize very much. I've had things hit >500% on my dual quad-core Mac Pro (same generation as the one in the article but 2x the cores and 2.8GHz CPUs). I'd love to see more media encoding stuff parallelize more and get a boost from GCD like this.
Like d889 above, I can't wait to see them update the Final Cut Studio bits too.
To add to what UnixSystemsEngineer said: most Linux distributions (most Unix systems in general, in my experience; well, OS X, Linux and Solaris at least) tend to default to showing per-core usage like this in top and such, or at least the GNU versions of top do. However, many have a toggle or flag to switch to the other output for those who prefer it. I know the usual versions of top on most Linux distros use the I key to toggle, referring to it as Irix or Solaris (older Solaris) mode.
From 165 percent? Not as impressive as 1.21 gigowatts but still, awesome!
WHAT THE HELL IS A GIGOWATT
Something to do with "Garbage In, Garbage Out".
Multi-core systems show more than 100% when combined.
That's fucked up, since it's like saying Quad Cores are 2 times as powerful as duals, and trust me, my friend's E8400 beats my Q600 in A LOT of games.
BUT, I pwn him at video encoding, GTA, multitasking etc.
Huh.. hadn't noticed that before but you're right. Kinda ridiculous...
@Cheesus(Crust): That's because your games aren't using all four cores, and his cores are each clocked faster. A quad core literally is almost twice as powerful as a dual core of the same speed, regardless of which benchmarks faster on games.
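How much extra cores help depends on how much of the work can actually run in parallel. One common way to reason about this is Amdahl's law, which nobody in this thread names; the sketch below applies it with purely hypothetical parallel fractions (0.95 for an encoder-like workload, 0.30 for a game-like one), chosen only to illustrate the point.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core when `parallel_fraction` of the
    work can be spread evenly across `cores` (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical workloads; the fractions are assumptions, not measurements.
for name, fraction in [("encoder-like", 0.95), ("game-like", 0.30)]:
    dual = amdahl_speedup(fraction, 2)
    quad = amdahl_speedup(fraction, 4)
    print(f"{name}: dual {dual:.2f}x, quad {quad:.2f}x, quad/dual {quad / dual:.2f}")
```

Under these assumptions the quad core nearly doubles the dual core on the heavily parallel workload but barely moves on the mostly serial one, which is where a faster clock per core wins.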
8800GT?? Welcome to 2009 - get a real video card -- ooops, I forgot he's using a Mac; can't upgrade those things unless you want a card from 1990 in it.
And here we go again...
He's using the card that came with it back in 2007. If he *really* wanted to upgrade, he could always get an ATI 4850 for Mac or whatever. But he programs DVD encoding software; he doesn't really do any 3D designing, nor does he game with it.
As a Vista user I feel the need to point out that when you make these kinds of remarks, it does nothing but support the image of PC users being that PC guy from the Apple commercials. He clearly stated he was using a 2007 computer, and the video card was released in 2007, making it one of the best cards available on any platform at the time. Most tech sites actually recommended the 8800GT over the 8800GTX at the time as a much better buy for the performance.
If he had said something along the lines of using any Intel video card, then yeah, I would have to agree with you on the welcome-to-1990 thing. Since he didn't, your post was so outlandishly exaggerated that your intended negative post about Apple actually comes across as a negative post about Apple haters.
Interestingly, Apple's code must be pretty efficient as MovieGate was already multi-threaded and multi-core aware. He switched his code to Apple's GCD & OpenCL, and it improved performance even more.
It may have been poorly multi-threaded before. Plus, using the GPU to help in a process like converting video does seem like it'd result in some pretty major gains.
@Mark
Most multi-threaded code is poorly threaded. I don't know if you know this, but it's really, really hard to write well-threaded code. That's why nobody's bothering to "add multithreading" to their super-complex game engines; you'd basically be rewriting the thing.
@KarlW: I realize that. I'm a Software Engineer ;)
Apple is really into increasing speeds these days. Which I must admit is quite a good thing, because all their products seem much zippier than competitors'.
Apple's GCD implementation is a more user/developer friendly version of technology that has been around for a long time in the Windows world. It trades performance for ease of use and standardization. Using lower level parallel programming methodologies will result in a much higher performance gain but they are harder to work with.
GPGPU has been around for longer on Windows too; CUDA and Stream, from Nvidia and ATI respectively, have been around since 2007. Take a look at the performance gains for video encoding using CUDA:
http://badaboomit.com/?q=node/374
Depending on the source video and resolution, I've seen transcoding times in Badaboom that are just 20% of those of other CPU-only tools, or in other words five times the speed :)
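Converting between "fraction of the time" and "times faster" is easy to get wrong, so here is a small sketch of the arithmetic. The helper names are ours; the inputs are the 20% figure above and the 30-minute versus 8-minute figures reported elsewhere in this thread.

```python
def speedup_from_times(baseline_time, new_time):
    """How many times faster the new run is than the baseline."""
    return baseline_time / new_time

def percent_gain(speedup):
    """Throughput gain over the baseline, in percent."""
    return (speedup - 1.0) * 100.0

# A job that takes 20% of the baseline time.
badaboom = speedup_from_times(100, 20)
# 30 minutes on CPU-based tools versus 8 minutes with a CUDA encoder.
cuda = speedup_from_times(30, 8)

print(f"20% of the time: {badaboom}x, {percent_gain(badaboom):.0f}% gain")
print(f"30 min -> 8 min: {cuda}x, {percent_gain(cuda):.0f}% gain")
```

Taking 20% of the time means running at five times the speed, which is a 400% gain over the baseline.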
@Manoj252
I'm gonna be honest. I had not one clue about anything you just said.
Manoj, Apple's GCD implementation is a more user/developer friendly version of technology that has been around for a long time, period, not just in the Windows world.
CUDA is and has been available for use on Macs as well as Windows.
@Manoj
Perhaps, but GPGPU is useless without standardisation. Apple is also taking advantage of relatively new technologies such as LLVM.
GCD is a better thread scheduler. It's centralised, so it can manage multiple threaded applications at once and allocate resources between applications. You can't do that by using low-level threading techniques, since it'd still affect your application only.
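The programming model being described, where code hands units of work to a pool and lets the pool decide which worker runs them, can be sketched loosely in Python. This is not GCD and makes no claim about its API: `ThreadPoolExecutor` is a per-process pool rather than a system-wide scheduler, CPython threads will not speed up CPU-bound work, and `encode_chunk` is a hypothetical stand-in for real encoding. It shows only the shape of "submit tasks, don't manage threads".

```python
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(chunk_id):
    """Hypothetical stand-in for encoding one chunk of video."""
    return chunk_id * chunk_id

def encode_all(chunk_ids, max_workers=4):
    """Submit every chunk to a shared pool and collect results in order.

    The caller never creates or joins a thread; the pool decides
    which worker picks up which task.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(encode_chunk, chunk_ids))

results = encode_all(range(8))
print(results)
```

The design point is that the task list, not the thread count, is what the application describes; how many workers exist is left to the pool.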
"Apple's GCD implementation is a more user/developer friendly version of technology that has been around for a long time in the Windows world."
Wrong. Windows is incapable of managing threads, processes and scheduling in this manner. That's one of the many reasons it continues to scale so poorly on multiple processors and large amounts of RAM.
YOU LIE!
[Pelosi Death Stare]
Watch it Mike. We may have to vote on scolding you, but you could make lots of money with fundraising.
That sounds awesome. Finally multi-cores are getting exploited. Does anyone know how Windows 7 handles multiple cores?
By distributing threads and processes across cores... most issues with programs not getting speedups on multicore processors are due to inefficient coding practices and not due to any issues with the underlying architecture.
I hate using anything that's not a CUDA encoder now.
Even on an 8800GT with 64-bit Vista, a video took only 8 minutes where it took 30 minutes using CPU-based programs.
CUDA is much more difficult to use than OpenCL, which is why hardly anyone uses it.
I hate to burst everyone's bubble but if the developer originally took advantage of multi-threading, correctly, then switching to this should make no difference.
It's actually kind of bad, in my opinion, that it made such a huge impact on his application.
I was going to start going off on what a boob you are for essentially calling the developer too dumb to code however you think is "correct"... but, in a way, you are showing EXACTLY why Apple developed this tech in the first place: to make it easy for developers of all kinds to make their apps as efficient as possible, without much effort.
You still suck though.
How could the developer have originally offloaded CPU cycles to the GPU by "taking advantage of multithreading correctly"? That's what OpenCL provides.