TNW at IBC: The BBC describes the automatic tagging of the World Service archive

The International Broadcasting Convention (IBC) took place recently in Amsterdam. Throughout the event, The Next Web streamed live interviews with industry players with the help of LiveU.

The BBC was present at the convention, showing some impressive work with archives. George Wright, Head of Prototyping at BBC Research and Development, and his team are working with sixty years' worth of World Service archives. That's around 500 terabytes of audio.

The aim of the work is to help users find what they need more easily. The collection is of considerable historical importance, and it will become far more useful once it has been properly tagged with searchable data.

The archives have almost no metadata, so the team has created a speech recognition system that goes through the archive and adds tags so that users can navigate it. Alongside the machine recognition, listeners are volunteering to correct existing tags and add new ones, improving the accuracy of the results.
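As a rough illustration of that workflow, here is a minimal Python sketch. It is not the BBC's system: it assumes a transcript has already been produced by speech recognition, picks candidate tags by simple word frequency, and then applies listener feedback as up and down votes. The function names, the tiny stopword list and the voting scheme are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was",
             "on", "for", "with", "that", "it"}


def machine_tags(transcript, top_n=3):
    """Pick the most frequent non-stopword terms as candidate tags."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def apply_listener_votes(tags, votes):
    """Drop tags with a negative net vote and add tags listeners voted up."""
    kept = [t for t in tags if votes.get(t, 0) >= 0]
    added = [t for t, v in votes.items() if v > 0 and t not in kept]
    return kept + added


# Example: an assumed transcript and an assumed set of listener votes.
transcript = ("The election in Ghana was close. "
              "Ghana election observers praised the election.")
tags = machine_tags(transcript, top_n=2)
final_tags = apply_listener_votes(tags, {"ghana": 2, "election": -1, "accra": 1})
```

In this toy version, listeners can remove a tag the machine got wrong and add one it missed, which mirrors the division of labour between software and volunteers described above.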

The R&D team at the BBC built its speech recognition system on top of existing open source software. The audio from the BBC World Service has its own idiosyncrasies that make accurate speech recognition tricky. If you have heard past broadcasts from the global radio network, you'll have noticed that people spoke English quite differently in the 1950s compared with today. Add to this the difficulty of recognising proper nouns and foreign words, and you can see what the software is up against.

Once the material is properly tagged, it can be used in a number of ways: for re-broadcasting, for primary source research and for adding value to future broadcasts. The archive material is mostly programming and features rather than news reports, but the standard of the interviews and the historical value of the archive are unquestionable.

Check out the video, in which Wright describes the process of tagging audio and explains how collaboration and engineering form a strong tradition running through the BBC's history and into its future.

You can find more of our video coverage and catch up with the future of broadcasting from IBC here.
