
How to tweak ShowAnalyzer to 100 percent commercial detection accuracy

If you've ever used any automatic commercial skipping technology you know that it isn't 100 percent accurate. We remember our old ReplayTV 4080 had a button on the remote just to disable the feature when it incorrectly identified part of our favorite show as a commercial. You'd hit it and then rewind, so while nothing was lost, some people are so annoyed by this that they'd just as soon skip commercials the old fashioned way. Well unlike the ReplayTV, when you use ShowAnalyzer to detect commercials on your HTPC, you can tweak your settings to 100 percent accuracy. Until recently this wasn't actually possible because while ShowAnalyzer has been tweakable for a while, one set of settings would fix one show and make another worse. In the latest beta you can set rules that allow you to tweak the settings per channel or even per show. So while there is nothing automatic about gaining 100 percent accuracy, it is actually kind of fun to try. You can even share your specific settings with others, you know, to show everyone how smart you are.



The basics


In order to tweak the settings, it is important to understand the basics of the very complex process that is automatic commercial detection (if you don't have ShowAnalyzer installed already see our How to Automatically skip commercials in Media Center feature). Basically every show on commercial TV is a series of blocks of time which are divided by prolonged black frames that have no audio. So the real key is to locate the dividers that form the blocks. Then you look at different characteristics of those blocks to try to determine which are commercials and which are part of the show. The dividers aren't usually hard to detect since normally there isn't a prolonged black frame without sound during the show. But sometimes the audio is out of sync, or various other problems throw the whole thing off. Detecting which blocks are which is much easier than finding dividers because things like network logos -- see there is something good about them -- are usually only present during the show. But logos aren't the only way to determine if a block is a commercial. Other telltale signs are the number of audio channels and the length of the blocks. The longer blocks are almost always part of the show and the total length of a commercial block is almost always divisible by 30 seconds (30, 60, 90, 120, 150 etc). Of course if all these were definitives instead of probabilities, we wouldn't need to know how to tweak these settings. The good news is that these exceptions are usually either per network or per show, and now we can configure profiles and apply the right settings when appropriate.
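To make the idea of dividers and blocks concrete, here is a minimal Python sketch of how a recording could be cut into blocks once the divider times are known. This is only an illustration of the concept, not ShowAnalyzer's actual code; the function name and the example times are made up.

```python
def blocks_from_dividers(duration, divider_times):
    """Split a recording of `duration` seconds into (start, end) blocks.

    `divider_times` holds the times (in seconds) of the detected dividers.
    Hypothetical helper for illustration only.
    """
    # Keep only dividers that fall strictly inside the recording.
    inner = sorted(t for t in divider_times if 0 < t < duration)
    edges = [0.0] + inner + [duration]
    # Each pair of neighboring edges forms one block.
    return list(zip(edges, edges[1:]))


# A 30 minute recording with dividers at 100 and 280 seconds.
for start, end in blocks_from_dividers(1800.0, [100.0, 280.0]):
    print(f"block from {start:.0f}s to {end:.0f}s, length {end - start:.0f}s")
```

Each block that comes out of a step like this is what then gets judged as show or commercial.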

We're going to take two approaches here. First we'll give a practical example by creating a sample profile for a 30 minute comedy that always seems to throw the default settings for a loop. This is a good example because it includes two areas that are easy to fix. The first is the initial segment that is never a commercial and the other is the last joke at the end of the show that is usually falsely detected as part of the last commercial. Then we'll run through what each of the settings does so that you'll be able to break beyond the basics -- a special thanks to the author of ShowAnalyzer, Jere Jones, for spending the time to explain what all these settings mean.

The example


First launch the ShowAnalyzer User Interface, then go to Tools and select Settings. By default you'll be looking at the Global Settings. Next, hit New Profile on the bottom left and type a name for your profile. Then click OK and select the profile you created on the left, and you'll have two tabs on the right with settings. If you want to override a default, you click its check box and then adjust the value. There are lots of options here but for the most part you'll be adjusting just a few under the Analyzer tab. We'll start with the easiest to wrap your head around, so scroll down to the AutoMark section and then under Beginning check Mark As and Length. Since the beginning of Rules of Engagement always seems to be detected as a commercial, we'll auto mark the beginning to always be a part of the show. Start by watching the show and noting how long it is before the first commercial break. For this particular show it is 100 seconds, so we'll set Mark As to Show and Length to 100 seconds. Now that block will always be marked as part of the show.


Now this show also has one last joke at the end which usually gets detected as a commercial. So we scroll to the Smoother section and under Show we check the Override box next to the Minimum setting. This is the minimum length of any block that is part of the show. We watch the show and determine that the last joke is the shortest show segment at exactly one minute. So we set the Show Minimum to 59 seconds just to be safe.


Now we can hit Save Changes and click on Global Settings on the top left. We have to set a rule so that our new profile is used when scanning this show. So click the Add button and on the first drop down select the profile we just created. Then under Condition 1 make it say "If file name contains Rules of Engagement" and finally check the Enabled check box and hit save.
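If it helps to think of rules in code terms, the following Python sketch shows the general idea of picking a profile by file name. It is only our illustration and not how ShowAnalyzer stores or evaluates rules; the dictionary keys, the profile name "Sitcom", the file names, and the case-insensitive comparison are all assumptions.

```python
def select_profile(file_name, rules, default="Global Settings"):
    """Return the profile of the first enabled rule whose text appears in the file name.

    Hypothetical helper: the rule layout and case-insensitive match are assumptions.
    """
    for rule in rules:
        if rule["enabled"] and rule["contains"].lower() in file_name.lower():
            return rule["profile"]
    # No rule matched, so fall back to the global defaults.
    return default


rules = [
    {"profile": "Sitcom", "contains": "Rules of Engagement", "enabled": True},
]
print(select_profile("Rules of Engagement - episode.wtv", rules))
print(select_profile("Some Other Show.wtv", rules))
```

The point is simply that a disabled rule, or one whose text doesn't appear in the file name, leaves you with the global settings.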

Now close the configuration box and choose the problem file via the Analyze File option under the File menu. Let ShowAnalyzer rescan the file and confirm that your profile works. If it doesn't seem to use your new settings at all, open the log file, which has the same name as your recording and is stored in the same directory, and search it for "rules" to confirm that your rule was applied as desired. Once you get it working you can even reuse the profile for other shows by editing or adding rules that use the profile. This is obviously a simple example, but very effective for certain shows.


The nitty gritty settings


Now for those who really want to get deep into the nitty gritty settings, here is an explanation of each setting and how to adjust it. Unfortunately some of these settings require some knowledge of SQL to access the data needed to really tweak them. The ShowAnalyzer road map includes a companion app called SchoolHouse that will read this data for you and make it easier, but while we all continue to wait for that, you can mess around with them through trial and error or learn SQL. (These are listed in the order they show up in ShowAnalyzer, not in the order of usefulness.)

Process


This is how much processing power ShowAnalyzer can use. The idle option means it will only use the CPU when nothing else is using it. Above Normal means it'll get all the CPU's time, not a good idea. Mark As Background Task tells Vista and Windows 7 that this process should have the lowest priority, even with disk IO. Unless it's a dedicated analyzer box, leave these settings alone.

Audio

These are kind of useless, but can help detect the divider since it is supposed to be silent. The problem is that what should be considered silent varies greatly. Finding a blank video frame is easy, with the exception of noise, but audio is different. What ShowAnalyzer does is plot the audio level from an entire show. You'd expect a bell curve, where most parts of the show are medium, some quiet, and some loud, but what really happens is a bell curve with a hump on the low end. The idea is to find the hump. Take the bottom 1 or 2 percent and those are likely to be the silent frames, and thus the dividers. This is what Histogram and Volume are all about. Bottom line with Audio is don't mess with it unless you've already tried everything else without success, but odds are you never will.
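As a rough illustration of the "bottom 1 or 2 percent" idea, here is a short Python sketch that finds the volume level below which the quietest slice of frames falls. This is our own simplification and not ShowAnalyzer's histogram code; the function name, the integer percent parameter, and the sample levels are made up.

```python
def silence_threshold(levels, bottom_percent=2):
    """Return the level at or below which the quietest `bottom_percent` of frames fall.

    Hypothetical helper; `levels` is one audio level per frame.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    ordered = sorted(levels)
    # Number of frames in the quietest slice (always at least one).
    count = max(1, len(ordered) * bottom_percent // 100)
    return ordered[count - 1]


# 100 frames: two nearly silent ones, the rest at a normal level.
levels = [0, 1] + [50] * 98
threshold = silence_threshold(levels)
silent_frames = [i for i, level in enumerate(levels) if level <= threshold]
print(threshold, silent_frames)
```

Frames at or below that threshold are the candidates for silent dividers.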

Sync error is how far apart, in seconds, a blank frame and a quiet point can be and still be considered related. If you look in the ShowAnalyzer database and see the luma dropping real low and the audio is dropping real close, but in the divider table there is no entry for it, it could be because they are half a second apart. If so, then you can increase the sync error. In SchoolHouse you'll be able to easily see the audio and video line dipping at different points.
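The role of the sync error can be sketched in a few lines of Python. Again, this is an assumption-laden illustration rather than ShowAnalyzer's own logic; the function name, the times, and the sync error values used here are invented for the example.

```python
def pair_dividers(black_times, quiet_times, sync_error):
    """Pair each black frame with the nearest quiet point if they are close enough.

    All times are in seconds. Hypothetical helper for illustration only.
    """
    dividers = []
    for black in black_times:
        nearest = min(quiet_times, key=lambda q: abs(q - black), default=None)
        # Only treat the pair as one divider when the gap is within the sync error.
        if nearest is not None and abs(nearest - black) <= sync_error:
            dividers.append((black, nearest))
    return dividers


black = [100.0, 400.0]
quiet = [100.4, 401.0]
print(pair_dividers(black, quiet, sync_error=0.5))  # only the first pair is close enough
print(pair_dividers(black, quiet, sync_error=1.0))  # a larger sync error accepts both
```

In this toy case the second divider only shows up once the sync error is raised, which is the same symptom as a dip that is visible in the data but missing from the divider table.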

Video

The fact that long black dividers exist before and after commercial breaks is what makes commercial detection work, but how black is black? This is the main setting when it comes to detecting dividers and in fact the Average Maximum Luma Divider setting is the core of commercial detection. The default 70 (range is 0-256) is obviously a good starting point or ShowAnalyzer wouldn't work at all out of the box, but some sources have lighter and darker blacks than others. Basically ShowAnalyzer looks for these black frames in shows and from there tries to determine if a new scene is coming in Law & Order or if a commercial break has started. The bad news is that it can be really hard to know what to set this to, short of some trial and error. What would be really nice is if there was an easy way to see what the luma level of your recordings was. If you knew the average of your actual recordings and the standard deviation (variance), this setting would actually set itself -- the variance has a range of 0 to 25. A good value for Variance is 2, but if there is lots of noise, 11 is safe. Sadly the fastest way to determine the luma is to open a recording in VideoReDo, get a time code for a black delimiter before a commercial, and then use a SQL viewer to look at the history file, which will tell you the luma and deviation. So yeah, not easy at all. This is another case where SchoolHouse will come in handy and until then it is trial and error.

Classifier

So once ShowAnalyzer uses the audio and video settings to break up the show into blocks, it uses the classifier to determine which blocks are part of the show you want to watch and which are commercials. This is a weighted system so there is a Base and a number of factors including the Transition, Length and the Channel Count (audio channels that is). Every block gets a score assigned to it based on the various factors. A block with a 1 is always a commercial and a block with a 0 is always the show.

These blocks will almost never actually get a perfect score of 0 or 1, so the fact that the shows and commercials usually alternate will help determine which is which. If ShowAnalyzer is struggling to determine if a block is a commercial, it uses the value of the previous blocks to guess. These are the Transitions, for example from a show to a commercial. The difference between the score of a show block and a commercial block is compared to the Transition Threshold. Let's take the default .77 Show To Commercial Transition Threshold as an example. If the 3rd block has a score of .11 and the 4th has a score of .89, the difference is .78 and block four is marked as a commercial. If the 4th block scored .87 instead, the difference would be .76 and it wouldn't be. If you set the Show To Commercial threshold higher than the Commercial To Show, it is more likely to not skip a commercial than it is to skip through part of your show.
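The arithmetic behind that example is easy to check with a tiny Python sketch. The function below is our own naming, and whether ShowAnalyzer uses a strict "greater than" comparison is an assumption; only the .77 threshold and the block scores come from the example above.

```python
def is_show_to_commercial(previous_score, current_score, threshold=0.77):
    """Return True when the jump in score from one block to the next exceeds the threshold.

    Hypothetical helper; the strict comparison is an assumption.
    """
    return (current_score - previous_score) > threshold


# Block 3 scored .11. A block 4 score of .89 is a jump of .78, which clears .77.
print(is_show_to_commercial(0.11, 0.89))
# A block 4 score of .87 is only a jump of .76, which does not.
print(is_show_to_commercial(0.11, 0.87))
```

Raising the threshold demands a bigger jump before a block flips to commercial, which is why a higher Show To Commercial value errs on the side of showing you a commercial rather than skipping part of the show.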

The Base setting is the starting point for the score of each block and probably isn't something that should be adjusted. Like each of the factors, the Base has a Confidence and a Weight. Basically you start with a base value, then you add characteristics, and the result becomes a weighted average. The Confidence is on a scale of 0 to 1 and feeds the block's score we talked about before. The Weight is how important that one factor is compared to the others and is on a scale of 1 to 100.
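For anyone who wants to see what a weighted average of Confidence and Weight looks like, here is a minimal Python sketch. It assumes a plain weighted mean, which may differ from what ShowAnalyzer actually computes, and the Confidence and Weight numbers are invented for the example.

```python
def block_score(factors):
    """Combine (confidence, weight) pairs into a single score between 0 and 1.

    Hypothetical helper that assumes a plain weighted average.
    """
    total_weight = sum(weight for _, weight in factors)
    if total_weight == 0:
        raise ValueError("at least one factor needs a non-zero weight")
    return sum(confidence * weight for confidence, weight in factors) / total_weight


# Made-up values: a neutral base with a low weight, plus a length factor
# that is fully confident the block is a commercial.
factors = [
    (0.5, 10),  # Base: confidence 0.5, weight 10
    (1.0, 30),  # Length: confidence 1.0, weight 30
]
print(block_score(factors))  # (0.5*10 + 1.0*30) / 40 = 0.875
```

The heavier a factor's Weight, the more its Confidence pulls the final score toward its own value.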

The Length settings determine how likely a block is to be a commercial based on its length. If you are 100 percent confident that every 30 second block is a commercial then you'd set the Confidence to 1 and the Weight to 100. But it isn't this simple because not every commercial will be the same length and then there are those short blocks that aren't part of the show or a commercial, depending on your perspective -- a good example of a pseudocommercial is the numbers to text to vote on American Idol. That being said, you can easily set hard minimums and maximums for the length of blocks in the Smoother section which we'll go over next. The Maximum Individual setting is the longest a block can be and still be in contention to be a commercial -- remember shows are only about 15 percent commercials. The Matches settings help determine which short segments are commercials or pseudocommercials. A 30 second segment is divisible by 15 so it is a match, and in addition the defaults are 10, 20 and 25 seconds.

Tolerance is how much give and take should be entered into the equation when considering the length. There are 29.97 frames per second so there is no number of frames that'll give you exactly 30 seconds. So this setting is how much of a tolerance all the other length settings have. The default is 1 and the minimum you'll want to use here is .5. If you set it as high as 5 seconds, just about everything will be set as a commercial so be careful with this one. In fact it is not something you'd typically adjust, but it does come in handy for some specific issues.
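Here is a small Python sketch of how a length match with a tolerance could work. It is our own reading of the Matches and Tolerance settings and not ShowAnalyzer's implementation; the function and parameter names are made up, while the multiple of 15, the extra lengths of 10, 20 and 25 seconds, and the default tolerance of 1 come from the descriptions above.

```python
def matches_commercial_length(length, tolerance=1.0, multiple_of=15, extra=(10, 20, 25)):
    """Return True if `length` (seconds) is within `tolerance` of a typical commercial length.

    Hypothetical helper: a match is a multiple of `multiple_of` or one of `extra`.
    """
    nearest = round(length / multiple_of) * multiple_of
    if nearest > 0 and abs(length - nearest) <= tolerance:
        return True
    return any(abs(length - value) <= tolerance for value in extra)


print(matches_commercial_length(29.96))               # a hair under 30 seconds still matches
print(matches_commercial_length(35.0))                # 5 seconds off, outside the default tolerance
print(matches_commercial_length(35.0, tolerance=5.0)) # a tolerance of 5 lets it through
```

With a tolerance of 5 seconds and matches every 15 seconds or less, very few lengths fail to match, which is the danger described above.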

Channel Count is the number of audio channels. If the show is in 5.1 and there are blocks that are stereo then the confidence level is .99 that the block is a commercial, since a show is very unlikely to switch the number of audio channels partway through. The same goes if the show is in stereo and one block is 5.1.

Smoother

This is where you can easily set the maximum and minimum length of a commercial and the minimum length of a show block. The defaults are pretty good since there is usually never a commercial break shorter than 60 seconds or longer than 5 minutes. The Show setting might be adjusted depending on which shows you watch, but 29 seconds is a good starting point. This is a very useful setting thanks to profiles, which is why we used it in our example above.


Logo Detection

We all hate them but network logos are a fact of life and the good news is that they help determine which blocks are commercials and which aren't. But in addition to network logos, crawls and other garbage on the screen can help or hurt as well.

The Border settings (top, bottom, left, right) are the percentage of the screen that is ignored when looking for logos. This comes in handy if the network likes to run crawls on the screen, think ESPN. If the crawl is preventing logo detection, you can increase this number and ShowAnalyzer will ignore the crawl and find the logo. This will also help speed up the time to analyze files as ShowAnalyzer doesn't have to examine those parts. Some logos are only on the screen during the show, while others are only on the screen during commercials. Then at other times there aren't any logos at all.

The Minimum and Maximum Check Spacing determine how often ShowAnalyzer stops to decode a frame and check for a logo. Increasing the maximum and minimum settings speeds up the process, but increases your chance of missing a logo that isn't on the screen all the time. Because the logos aren't on the screen the whole time, the Missing Logo Forgiveness Time is how long a logo can disappear before the show is considered to not have a logo. You want to keep it shorter than the length of a commercial, which is 15 seconds.

The Missing Logo Search Time is the amount of time ShowAnalyzer keeps searching for a logo it has detected before it assumes the logo it previously found is useless. Typically you'd set this to no longer than a commercial break, so it doesn't start searching just because a commercial is on -- searching means more processing power and disk IO so this can really extend the time it takes to analyze a show if set wrong.

The Confidence and the Weight of the three types of logos can be assigned just like we did in the length section, which of course means it gets added to a block's total score. So if you had a network that always, and we mean always, showed a logo during the show and never during a commercial, you could crank up the confidence on the Show Logo and the No Logo and get much better results.

AutoMark


This allows you to manually set the first or last segment as a commercial or part of a show. Say if you wanted ShowAnalyzer to never mark the first 30 seconds of a show as a commercial (so it doesn't skip faster than you can disable AutoSkip) you'd set Mark As to Beginning of Show and the Length to 30. Or if you normally record an extra 3 minutes of each show, you might want to set the End as Commercial and the Length to 180. This is actually even more useful than Smoother and also included in our example above.
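As a rough picture of what AutoMark does, this Python sketch overrides whatever was detected for the opening and closing stretch of a recording. It is only an illustration under our own assumptions; the function name, the (mark_as, length) pairs, and the label strings are hypothetical and not taken from ShowAnalyzer.

```python
def apply_automark(label, time, duration, begin=None, end=None):
    """Return the label for a moment `time` seconds into a `duration` second recording.

    `begin` and `end` are optional (mark_as, length_in_seconds) pairs that override
    the detected `label`. Hypothetical helper for illustration only.
    """
    if begin is not None and time < begin[1]:
        return begin[0]
    if end is not None and time >= duration - end[1]:
        return end[0]
    return label


# Force the first 100 seconds to be show, as in the sitcom example.
print(apply_automark("commercial", 50, 1800, begin=("show", 100)))
# Force the last 180 seconds (3 minutes of padding) to be commercial.
print(apply_automark("show", 1700, 1800, end=("commercial", 180)))
```

Anything outside those two windows keeps whatever label the normal detection gave it.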

Conclusion


As you've already figured out, automatically detecting commercials is not an easy thing to do, and while tweaking your way to 100 percent accuracy is not for the faint of heart, it is technically possible thanks to the new profiles feature in ShowAnalyzer. And while many of these settings aren't terribly useful until SchoolHouse is released, other settings like AutoMark and Smoother are very simple and effective at correcting specific detection problems.