The Tropical Legumes data hub: Reimagining the way researchers interact with their data
It’s been a busy year here at Scriptoria, and we’ve been in our element. Key projects this year have included developing new data architectures to help the Food and Agriculture Organization of the United Nations (FAO) manage major locust outbreaks and building flexible data systems to handle decades of African crop pest and disease data – we’ll tell you more about these projects in a later update.
One job we’ve recently finished, and that we’re very proud of, involved building a data cleaning, warehousing and analytics system to help crop scientists from a series of Bill & Melinda Gates Foundation-funded legume programmes more easily access and analyse the large sets of data they’ve accumulated.
Commissioned by the US$67 million Tropical Legumes programmes, this work wasn’t simple – it involved piecing together and cleaning a very large number of datasets. Ultimately, doing so allowed us to make available online one of the largest agriculture-focused socio-economic snapshots ever produced for Tanzania.
As part of this work, the team also had the opportunity to create a similar, if slightly smaller, data system to handle 12 years of data recording the programmes’ production of improved legume seed. These improved varieties will be key to helping smallholder farmers adapt to the changes being brought by climate change – and we’re proud to have ensured that the data will now be available in the cloud to a much larger group of scientists.
All this work has of course revolved around Scriptoria’s core approach of centralising data to improve research efficiency. Data centralisation, combined with our emphasis on ensuring that researchers are able to dive in and immediately begin analysing data (our near instant analysis of data, or NIAD, approach), is just one of the ways we’re helping the programmes we work with to catapult themselves into the big data age.