This article was published on September 17, 2012

    TNW at IBC: The BBC describes the automatic tagging of the World Service archive

    Story by Jamillah Knowles

    Jamillah is the UK Editor for The Next Web. She's based in London. You can hear her on BBC Radio 5Live's Outriders. Follow on Twitter @jemimah_knight or drop a line to [email protected]

    The International Broadcasting Convention (IBC) took place recently in Amsterdam. Throughout the event, The Next Web streamed live interviews with industry players with the help of LiveU.

    The BBC was present at the convention, doing some great work with archives. George Wright, Head of Prototyping at BBC Research and Development, and his team are working with sixty years' worth of World Service archives. That's around 500 terabytes of audio.

    The aim of the work is to help users find what they need more easily. The historical importance of the collection is considerable and it will become far more useful once it has been properly tagged with data that is searchable.

    The archives have almost no metadata, so the team has created a speech recognition system that goes through the archive and adds tags so that users can navigate it. Alongside the machine recognition, listeners are volunteering to correct existing tags and add new ones to keep the tagging accurate.
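    As a rough illustration of how machine-generated tags and listener corrections might be combined, here is a minimal Python sketch. It is not the BBC's implementation: the function name `merge_tags`, the per-tag confidence scores, the up/down vote format and the 0.5 threshold are all assumptions made for the example.

```python
from collections import Counter


def merge_tags(machine_tags, listener_votes, min_confidence=0.5):
    """Combine machine tags with listener feedback (hypothetical scheme).

    machine_tags: dict mapping tag -> confidence between 0 and 1 (assumed format)
    listener_votes: iterable of (tag, vote) pairs, vote is +1 or -1 (assumed format)
    """
    # Net listener score per tag; tags nobody voted on count as 0.
    votes = Counter()
    for tag, vote in listener_votes:
        votes[tag] += vote

    accepted = set()

    # Keep confident machine tags unless listeners voted them down on balance.
    for tag, confidence in machine_tags.items():
        if confidence >= min_confidence and votes[tag] >= 0:
            accepted.add(tag)

    # Keep any tag listeners approved on balance, even if the machine
    # missed it or was unsure about it.
    for tag, score in votes.items():
        if score > 0:
            accepted.add(tag)

    return sorted(accepted)


machine = {"apollo": 0.9, "moon": 0.4, "paris": 0.8}
feedback = [("paris", -1), ("moon", 1), ("nasa", 1)]
print(merge_tags(machine, feedback))  # ['apollo', 'moon', 'nasa']
```

    In this sketch a listener vote always outweighs the machine's guess, which reflects the idea of volunteers correcting the automatic output. A real system would likely weigh several votes before overriding a tag.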

    The R&D team at the BBC built its speech recognition system on top of existing open source software. The audio from BBC World Service has its own idiosyncrasies that make accurate speech recognition tricky. If you have heard past broadcasts from the global radio network, you'll have noticed that people spoke English quite differently in the 50s compared with today. Add to this the difficulty of recognising proper nouns and foreign words and you can see what the software is up against.

    Once the material is tagged properly, it can be used in a number of ways: for re-broadcasting, for primary source research and to add value to future broadcasts. The archive material is mostly programming and features rather than news reports, but the standard of the interviews and the historical value of the archive are unquestionable.

    Check out the video, in which Wright describes the process of tagging audio and explains how collaboration and engineering are a strong tradition in both the BBC's history and its future.

    You can find more of our video coverage and catch up with the future of broadcasting from IBC here.