The BBC was at the convention showing some great work with archives. George Wright, Head of Prototyping at BBC Research and Development, and his team are working with sixty years’ worth of World Service archives, which amounts to around 500 terabytes of audio.
The aim of the work is to help users find what they need more easily. The collection is of considerable historical importance, and it will become far more useful once it has been properly tagged with searchable data.
The archives have almost no metadata, so the team has created a speech recognition system which goes through the archive and adds tags so that users can navigate it. Alongside the machine recognition, listeners are volunteering to correct existing tags and add new ones to keep the results accurate.
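To make the idea of machine tags refined by listeners a little more concrete, here is a minimal sketch in Python. It is an illustration only and not the BBC's system: the term-frequency tagging, the vote-based correction rule, and the function names `extract_tags` and `apply_listener_votes` are all assumptions made for this example.

```python
from collections import Counter


def extract_tags(transcript, stopwords, top_n=3):
    """Pick the most frequent non-stopword terms as candidate tags.

    Assumption: a real system would use far richer signals than raw
    word counts; this only stands in for the automatic tagging step.
    """
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]


def apply_listener_votes(machine_tags, votes):
    """Combine machine tags with net listener votes (hypothetical rule).

    A machine tag is dropped if its net vote is negative; a tag
    suggested only by listeners is added if its net vote is positive.
    """
    kept = [t for t in machine_tags if votes.get(t, 0) >= 0]
    added = [t for t, score in votes.items() if score > 0 and t not in kept]
    return kept + sorted(added)


if __name__ == "__main__":
    transcript = (
        "The president spoke about the harvest. "
        "The harvest was poor, the president said."
    )
    stopwords = {"the", "about", "was", "said", "spoke"}
    machine_tags = extract_tags(transcript, stopwords)
    # Listeners vote down a weak tag and suggest a missing place name.
    votes = {"poor": -2, "ghana": 3}
    print(apply_listener_votes(machine_tags, votes))
```

In this sketch the automatic pass proposes tags from a transcript, and listener votes then remove weak tags and add ones the machine missed, such as proper nouns.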
The R&D team at the BBC built its speech recognition system on top of existing open source software. The audio from BBC World Service has its own idiosyncrasies that make accurate speech recognition tricky. If you have heard past broadcasts from the global radio network, you’ll have noticed that people spoke English quite differently in the 1950s compared with today. Add to this the difficulty of recognising proper nouns and foreign words, and you can see what the software is up against.
Once the material is tagged properly, it can be used in a number of ways: for re-broadcasting, for primary source research and to add value to future broadcasts. The archive material is mostly programming and features rather than news reports, but the standard of the interviews and the historical value of the archive are unquestionable.
Check out the video where Wright describes the process of tagging audio and how a strong tradition of collaboration and engineering runs through both the BBC’s history and its future.
You can find more of our video coverage and catch up with the future of broadcasting from IBC here.