TNW at IBC: The BBC Tagging the World Service Archive

TNW at IBC: The BBC describes the automatic tagging of the World Service archive

The International Broadcasting Convention (IBC) took place recently in Amsterdam. Throughout the event, The Next Web streamed live interviews with industry players with the help of LiveU.

The BBC was present at the convention doing some great work with archives. George Wright, Head of Prototyping at BBC Research and Development, and his team are working with sixty years' worth of World Service archives, which amounts to around 500 terabytes of audio.

The aim of the work is to help users find what they need more easily. The historical importance of the collection is considerable, and it will become far more useful once it has been properly tagged with searchable data.

The archives have almost no metadata, so the team has created a speech recognition system that goes through the archive and adds tags so that users can navigate it. Alongside the machine recognition, listeners are volunteering to correct and add tags to improve their accuracy.

The R&D team at the BBC built its speech recognition system on top of existing open source software. The audio from BBC World Service has its own idiosyncrasies that make accurate speech recognition a tricky prospect. If you have heard past broadcasts from the global radio network, you’ll have noticed that people spoke English quite differently in the 50s compared with today. Add to this the difficulties in recognising proper nouns and foreign words and you can see what the software is up against.

Once the material is tagged properly, it can be used in a number of ways: for re-broadcasting, for primary source research and to add value to future broadcasts. The archive material is mostly programming and features rather than news reports, but the standard of the interviews and the historical value of the archive are unquestionable.

Check out the video where Wright describes the process of tagging audio and how collaboration and engineering are a strong tradition in both the BBC’s history and its future.

You can find more of our video coverage and catch up with the future of broadcasting from IBC here.

Story by Jamillah Knowles

Jamillah is the UK Editor for The Next Web. She's based in London. You can hear her on BBC Radio 5Live's Outriders. Follow on Twitter @jemimah_knight or drop a line to [email protected]
