What BARB’s error reveals about the bizarre world of TV ratings

UPDATE: Jack Knight has written a fantastic comment on this post, giving a lot more background to BARB and sampled ratings in general. I highly recommend reading the comments after the post. If you’re interested in the history of audience ratings, the book ‘Rating The Audience: The Business of Media’ is also well worth reading.

BARB – the organisation that measures ratings for UK TV channels – has admitted that there were errors in its tracking system, and as a result some Channel 4 and ITV shows were given false ratings. Broadcast Magazine’s article says that one of the programmes affected was ITV’s X Factor – its highest-rated show, and one of the biggest advertising targets in broadcast television:

“The entertainment show originally recorded an overnight audience of 8.96m (33.6%) on ITV1 and ITV1 HD in figures released last Monday, but this has now grown to 9.84m (36.91%) under the revised data. Meanwhile, C4 shows including 999: What’s Your Emergency and Grand Designs have also experienced audience uplifts. However, others have fallen, such as the 22 September episode of The Comedy World Cup, which dropped from 1.8m (7.9%) to 631k (2.76%) on the back of the gaffe.”

How can mistakes like this happen in a multi-billion pound industry reliant on accurate audience metrics? Looking for an answer to that question opens up lots more questions about why such a huge and influential industry relies on relatively crude measuring techniques that haven’t changed much in decades.

TV ratings are measured using metering devices that record the presence of viewers in the room when the TV is on, usually by the viewers pressing a button to register that they’ve entered the room. So it really registers presence rather than attention – the viewer could be reading a newspaper, doing the ironing or using their iPhone, but for the sake of the ratings they count as an avid viewer.

Ratings technologies have been refined over time, but the basic concept hasn’t changed since it was invented by Arthur C. Nielsen to measure radio audiences in the 1930s. BARB is the UK version of TV ratings, using a panel of 5,100 homes to represent the UK TV viewing public. So each percentage point in the examples above stands for a measurement sample of just 51 homes. The number of people in these homes is around 11,300, so each percentage point stands for a maximum of 113 people pressing their buttons when they walk into the living room. It’s often a lot less, as the percentages above are shares of the total viewing audience (BARB calls this the ‘universe’) at that time – many BARB panellists might be out of their homes, or might not have the TV on at that time.

If we take the numbers of viewers in the sample above, we can work out the size of the TV viewing universe watching when these errors occurred. For example, the 8.96m audience originally reported for X Factor was 33.6% of total viewers that night, so one share point of that audience is 8.96m/33.6 – roughly 267,000 viewers. This means that BARB’s estimate for the total UK TV viewing audience on a Saturday night is around 26.7m people, which is about 43% of the UK population of 62m people. So we can roughly work out that the number of BARB panellists registering themselves as viewers that night was 43% of 11,300 – around 4,860 people.

Still with me? Let’s now take the share of X Factor’s reported viewing to work out how many BARB panellists registered themselves as watching that programme. The original share reported was 33.6% of total viewers. We know roughly 4,860 panellists were watching TV, so the number watching X Factor according to the original report was 33.6% of 4,860 – around 1,633 people. So BARB measures some 1,600 people watching a TV programme, and extrapolates that number to report an audience rating of 8.96m viewers. No matter how scientific and representative the survey, it is remarkable to think that multi-billion pound creative decisions are made on such a small sample size.
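The chain of estimates above can be sanity-checked in a few lines of Python. The inputs are the rounded figures quoted in this post, not official BARB data, so treat the outputs as ballpark numbers:

```python
# Back out the TV "universe" from X Factor's originally reported rating and share.
reported_viewers = 8_960_000   # originally reported overnight audience
reported_share = 0.336         # 33.6% share of everyone watching TV

universe = reported_viewers / reported_share            # total people watching TV
uk_population = 62_000_000
panel_people = 11_300                                   # people in BARB's 5,100 homes

fraction_watching = universe / uk_population            # share of the UK watching TV
panellists_watching = fraction_watching * panel_people  # panel button-pressers
xfactor_panellists = panellists_watching * reported_share

print(f"Universe: {universe / 1e6:.1f}m ({fraction_watching:.0%} of the UK)")
print(f"Panellists watching TV: {panellists_watching:.0f}")
print(f"Panellists watching X Factor: {xfactor_panellists:.0f}")
```

This assumes the panel scales to the population by a single flat ratio; in reality BARB weights by demographics, so the true panellist counts will differ somewhat.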

Now let’s look at the error size. BARB under-reported X Factor’s share by 3.31 percentage points, which was a difference of 880,000 viewers in the reported ratings. Each panellist stands in for roughly 62m/11,300 – about 5,500 real viewers – so an 880,000-viewer correction corresponds to only around 160 panellists.

An error in measuring around 160 people pressing a button when they walk into a room means that one of the UK’s largest media businesses under-reported the performance of its most important programme by 880,000 viewers. Is it just me, or is that completely insane?
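The same flat-ratio arithmetic puts a number on how few panellists that correction amounts to (again, rounded and unofficial estimates from this post, not BARB’s weighted methodology):

```python
# How many panel members does an 880,000-viewer correction correspond to?
uk_population = 62_000_000
panel_people = 11_300

# Each panellist stands in for roughly this many real viewers.
viewers_per_panellist = uk_population / panel_people    # ~5,500

correction = 9_840_000 - 8_960_000                      # revised minus original rating
panellists_in_error = correction / viewers_per_panellist

print(f"Each panellist represents ~{viewers_per_panellist:,.0f} viewers")
print(f"The 880k correction is ~{panellists_in_error:.0f} panellists")
```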

8 comments

  1. Rob Whiting

    It is insane, and it’s not just TV ratings where this happens.

    Look at some of the adverts where some sort of survey is quoted. I’ve seen sample sizes of less than 100 on many occasions.

  2. James Cherkoff

    It’s completely insane. In my experience people talk about BARB and then raise their eyebrows knowingly as everyone knows it’s a house of cards. ‘We know the bike is broken but it’s the only bike we’ve got’, a media big cheese once commented to me.

  3. Jack Knight

    Obviously you know nothing about statistics, the only thing insane is this ignorant article. At least I had a laugh reading it, thanks for that.

    • mattlocke

      Thanks for commenting Jack. I’ve been researching how we have measured attention in the media industries for the last few years (and I worked in broadcasting at the BBC and Channel 4 for a decade), so I’d be interested in knowing why you think the article is ignorant. I know that a lot of market research has small sample bases, but the development of very granular metrics around digital media consumption is starting to make BARB look very anachronistic, especially as it is used as the main metric for a multi-billion pound advertising industry.

  4. Jack Knight

    Firstly, this idea that a sample cannot explain the population is the naïve notion that I take issue with. Your article doesn’t mention the idea of granular digital measurement, and that is a good point; however, the article simply makes the assertion that small samples cannot explain the behaviours of large populations, and that is my beef to be honest. As an example, opinion polls of just 1,000 respondents are time and time again shown to be incredibly accurate at predicting the voting intentions of millions. The market research industry is worth billions of pounds around the world and harnesses incredible insights into behaviours, consumption and attitudes. It would not exist if it didn’t work and was not accurate, and the majority of this research is based on small samples explaining the behaviours of millions.

    The BARB panel, from what I understand, is one of the more complicated and robust pieces of research carried out in this country. A large-scale survey of 53,000 randomly selected households in the UK is carried out each year, and that’s just to find out what people have in their homes and who they are. From this, a panel of 5,100 homes is created that reflects the ownership and demographics of people in the UK and regionally. In producing the data, demographics are weighted to ensure that any imbalances that may occur in sampling do not bias or skew the data. How they get the data seems to be from a clever box, but I ask you this, and this is the key question… what on earth is the alternative?

    The majority of television viewing, by a country mile, is over-the-air broadcast, so there literally is no return path data (digital measurement) on what you in your home watched last night if it was from an over-the-air broadcast. Without a survey, and a very well designed survey, how can we possibly estimate the viewing? At a recent Mediatel event, Jamie West from Sky said that they get set-top box data back from some 100,000 homes that agree to send their data back to Sky (though they don’t know whether the TV is on, or who is there), and this data robustly backs up the data coming from BARB, a sample that is a fraction of theirs. Yes, of course there will be sampling errors – it’s sample research – but it’s the best research there is. Until everything is watched via IP (and I’m sure Freeview will have something to say about that) I’m not sure what the alternative is.

    As an additional aside, why would the mighty all-knowing Google be launching their own rival to BARB, with a panel of homes even smaller than BARB’s, if this methodology was so suspect as you infer? It isn’t – it works, and Google know this.

    Back to your point about the granular nature of digital measurement: you are right, there are huge possibilities, which is great, and surely it will be part of future measurement services – BARB would be crazy not to embrace it. But let’s be frank, it is still a tiny fraction of what is being consumed in total currently (iPlayer requests are dwarfed by normal BBC TV consumption), and ultimately, if I’m buying advertising airtime I want to be pretty darn sure I know exactly who is watching, so that I can hit my ABC1 Men and not get C2DE Women – something the world of digital analytics continues to struggle with (who’s at the end of that mouse click?!)

    Lastly, on the website, it says BARB is owned by all the major broadcasters, so if they didn’t think it works….surely they would just change it?

    • mattlocke

      This is a great response, and I think I agree with you on nearly everything – thanks for taking the time to comment. I agree that sampling over-the-air viewing with no return path is always going to be hugely difficult, but in the time I spent in broadcasting there was no real debate about metrics amongst TV commissioning teams. I was commissioning across TV, web and games, and was very frustrated by the paucity of data we were getting from BARB compared to the reports we were getting from games and web projects. This was partly the reason I started looking into the history of BARB/Nielsen: to understand how the metrics industry has evolved, and in particular how its sampling methodology works.

      There’s always been controversy about how people are chosen for ratings panels, in particular how panels tend to struggle to represent more transient communities, who are not easy to build a relationship with for a sample or survey. This is probably less of an issue for ad buyers, but if you’re in a public sector broadcaster, it’s a real problem. For example, Nielsen specifically tries to recruit students, as they’re valuable to advertisers but under-represented in Nielsen panels:

      http://www.nielsen.com/us/en/about-us/nielsen-families.html

      We’re in the middle of a significant shift of attention amongst audiences, and as a result we need to look at how we measure attention and question traditional models. Sampling will always be part of this mix, but I think we need to really understand the models and margins of error in every method we use to measure attention. It could be worse – in the early years of the radio industry, fan mail was used as a metric for audience size…

      BTW – this is a fantastic book about the history of the ratings industry – well worth reading if you’re interested:

      http://www.amazon.com/Rating-Audience-The-Business-Media/dp/1849663416

  5. James Cherkoff

    The issue is less about theories of statistical modelling and more that BARB has diminishing credibility in the marketplace.

    This leaves the door open for technology companies to build more effective solutions for big-spending brands that feel nervous about the reliability of BARB.

    In the US, there are plenty of services appearing such as Bluefin, Dish, Simulmedia, TRA, Datalogix that mix data sources, including grocery shopping, to provide media investment information that’s richer than a single source of extrapolated panel data.

    As for your final point Jack. BARB works for the broadcasters because it gives them control of the valuable TV marketplace. Why on earth would they want to change that?

  6. Jenny Powell

    That the news gets 4 million viewers, but soaps get nearer 8 million viewers is unlikely to say the least. I wonder what the truth is?
