Siobhan Green, the founder of Sonjara, attended the Technology Salon exploring "How Can USAID Development Partners Implement IATI?" and was inspired to define eight steps for publishing open data, to prepare for the day when we will all need to comply with the International Aid Transparency Initiative (IATI) standards. Here is her list:
1. Stop the bleeding
Organizations often get caught up in the backlog of information trapped in Word documents and PDFs, and freak out at the cost of migrating it into an open data format. Often this leads firms to say nothing can be done, even for data they are starting to capture NOW.
Our strongest recommendation is to identify where data is currently being captured in these non-open formats and address that immediately. For example, if your staff submit reports in Word, creating a form that captures IATI-type metadata, and then attaching the Word document to that form, already starts you down the path to IATI data.
Find out where your raw data for evaluations and analysis is housed, and make sure the structured data is not lost (many an Excel spreadsheet or database is converted to a PDF or website, and then the source data is lost, leaving only the aggregated data).
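To make the "form plus attachment" idea concrete, here is a minimal sketch of such an intake record in Python. The field names, the `ReportSubmission` class, and the validation rules are all hypothetical illustrations loosely inspired by IATI activity metadata (an identifier, title, recipient country, sector, start date); they are not the IATI schema itself. The point is that the metadata lives in structured fields while the Word report rides along as an attachment.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class ReportSubmission:
    """Hypothetical intake-form record: IATI-style metadata captured as
    structured fields, with the original Word report attached as a file."""

    activity_identifier: str  # hypothetical ID, e.g. org prefix + project code
    title: str
    recipient_country: str    # ISO 3166-1 alpha-2 code, e.g. "KE"
    sector_code: str          # 5-digit DAC/CRS-style purpose code (assumed)
    start_date: date
    attachment_path: str      # the unstructured report, kept alongside the data
    description: str = ""

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record looks usable."""
        problems = []
        if not re.fullmatch(r"[A-Z]{2}", self.recipient_country):
            problems.append("recipient_country should be a 2-letter ISO code")
        if not re.fullmatch(r"\d{5}", self.sector_code):
            problems.append("sector_code should be a 5-digit code")
        if not self.attachment_path.lower().endswith((".doc", ".docx")):
            problems.append("attachment should be the original Word report")
        return problems

    def to_json(self) -> str:
        """Serialize the metadata (not the attachment contents) as JSON."""
        record = asdict(self)
        record["start_date"] = self.start_date.isoformat()
        return json.dumps(record, sort_keys=True)


submission = ReportSubmission(
    activity_identifier="XM-EXAMPLE-PROJ-001",
    title="Q3 field report",
    recipient_country="KE",
    sector_code="12220",
    start_date=date(2024, 7, 1),
    attachment_path="reports/q3_field_report.docx",
)
print(submission.validate())  # []
print(submission.to_json())
```

Even this small step means every new report arrives with machine-readable country, sector, and date fields, so later mapping to the full IATI standard starts from structured data rather than from prose.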
2. Look for existing standard structures
In international development, for example, many donors are using IATI as the structure for project descriptions. Many fields also have standard taxonomies, and there are standard data structures for geolocation. Look around at other open data sets, borrow from them as appropriate, and compare them with what you are already capturing.
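One practical way to "compare with what you are already capturing" is a crosswalk from your in-house labels to a standard code list. The sketch below assumes a hypothetical set of local sector labels and maps them to DAC/CRS-style 5-digit purpose codes; the specific codes are illustrative and should be checked against the published codelist before use. Labels that fall through show you exactly where your taxonomy diverges from the standard.

```python
# Hypothetical crosswalk from in-house sector labels to a standard code list.
# The codes are illustrative DAC/CRS-style 5-digit purpose codes; verify them
# against the published codelist before relying on them.
SECTOR_CROSSWALK = {
    "health - clinics": "12220",
    "primary schools": "11220",
    "wash": "14030",
    "agriculture": "31120",
}


def map_sectors(local_labels):
    """Split local labels into {label: standard code} and a list of unmatched labels."""
    mapped, unmapped = {}, []
    for label in local_labels:
        # Normalize case and whitespace before looking up the standard code.
        code = SECTOR_CROSSWALK.get(label.strip().lower())
        if code is None:
            unmapped.append(label)
        else:
            mapped[label] = code
    return mapped, unmapped


mapped, unmapped = map_sectors(["WASH", "Primary Schools", "Youth Sports"])
print(mapped)    # {'WASH': '14030', 'Primary Schools': '11220'}
print(unmapped)  # ['Youth Sports']
```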
3. Design rather than retrofit
It is MUCH cheaper to design for open data before you capture it than to retrofit your data. Beyond making sure your data is structured, IATI also helps identify data points you may not be capturing in a structured way, such as geolocation or sector. Adding this information after the fact can be very hard if the people who know the project are no longer available.
4. Can aggregate up, but cannot disaggregate down
Figure out the lowest reporting level you need (especially for geolocation) and start there. Disaggregate by gender whenever feasible and appropriate, and by other factors as well.
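The asymmetry behind "aggregate up, but not down" is easy to see in code. This minimal sketch assumes hypothetical beneficiary counts captured at district level and disaggregated by gender; rolling them up to country totals is a single grouping call, but once only the country totals are stored, nothing can recover the district or gender split.

```python
from collections import defaultdict
from typing import NamedTuple


class Count(NamedTuple):
    """One row captured at the lowest reporting level (hypothetical fields)."""
    country: str
    district: str
    gender: str
    beneficiaries: int


# Made-up illustrative figures.
records = [
    Count("KE", "Kisumu", "female", 120),
    Count("KE", "Kisumu", "male", 95),
    Count("KE", "Nakuru", "female", 80),
    Count("KE", "Nakuru", "male", 110),
    Count("UG", "Gulu", "female", 60),
    Count("UG", "Gulu", "male", 40),
]


def aggregate(rows, *fields):
    """Sum beneficiaries grouped by the named fields; no fields gives a grand total."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(getattr(row, f) for f in fields)
        totals[key] += row.beneficiaries
    return dict(totals)


by_country = aggregate(records, "country")
by_country_gender = aggregate(records, "country", "gender")
print(by_country)         # {('KE',): 405, ('UG',): 100}
print(by_country_gender)  # {('KE', 'female'): 200, ('KE', 'male'): 205, ('UG', 'female'): 60, ('UG', 'male'): 40}
```

Every coarser view is derivable from `records`, while `by_country` alone could have come from many different district and gender breakdowns.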
5. Structure is always better than unstructured
A pattern I see a lot: teams build structured data for analysis, like a spreadsheet or database, then export that data (or a summary of it) into a PDF for the final report, and then the raw data disappears. The data has gone from structured (a database) to unstructured (a PDF), which is a real loss for open data and interoperability.
For data management purposes, it is significantly easier, cheaper, and faster to migrate data from one structure to another than from unstructured to structured. So if your data is structured, protect it! Keep the raw data somewhere, and don't rely on the PDF as the archive of the data.
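As an illustration of how cheap structure-to-structure migration is, here is a sketch that converts a raw CSV export into JSON using only the Python standard library. The column names and values are hypothetical. Recovering the same rows from a table inside a PDF would typically require extraction tooling and manual checking.

```python
import csv
import io
import json

# Hypothetical raw export, kept alongside the PDF report rather than discarded.
RAW_CSV = """project_id,country,district,indicator,value
P-001,KE,Kisumu,households_reached,215
P-001,KE,Nakuru,households_reached,190
"""


def csv_to_json(csv_text):
    """Convert CSV rows into a JSON array of objects, one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["value"] = int(row["value"])  # CSV stores text; restore the numeric type
    return json.dumps(rows, indent=2)


print(csv_to_json(RAW_CSV))
```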
6. Structure is better than unstructured, Part II
Structured data gives you thousands more options than unstructured data. With structured data, you can aggregate up and summarize. You can slice and dice the data to make comparisons within it. You can combine it with other data sets to compare, contrast, and analyze. You can feed it into dashboards and automatic graphing and tracking tools, and even export it to PDF.
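A small sketch of "slice and dice" plus combining with another data set: the project results and the target-population figures below are made up for illustration, and the helper names are hypothetical. Because both data sets share a country key, filtering, totaling, and computing coverage are each a line or two.

```python
from collections import defaultdict

# Hypothetical project results, kept as raw structured rows.
results = [
    {"country": "KE", "sector": "health", "year": 2023, "people_reached": 1200},
    {"country": "KE", "sector": "water", "year": 2023, "people_reached": 800},
    {"country": "UG", "sector": "health", "year": 2023, "people_reached": 500},
    {"country": "UG", "sector": "health", "year": 2022, "people_reached": 300},
]

# Hypothetical second data set from another source, keyed by the same country code.
target_population = {"KE": 10_000, "UG": 4_000}


def slice_rows(rows, **criteria):
    """Keep only rows matching every field=value pair."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def total_by(rows, field):
    """Sum people_reached grouped by one field."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[field]] += r["people_reached"]
    return dict(totals)


health_2023 = slice_rows(results, sector="health", year=2023)
reached = total_by(health_2023, "country")
coverage = {c: reached[c] / target_population[c] for c in reached}
print(reached)   # {'KE': 1200, 'UG': 500}
print(coverage)  # {'KE': 0.12, 'UG': 0.125}
```

None of this is possible with numbers that exist only inside a PDF table.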
7. Prioritize data, not platform
Technology is changing so rapidly that software platforms become obsolete quickly. If you are lucky, you will get five years out of your platform, maybe ten with an entire platform approach. But your data should outlast your platform. It is incredibly important that whatever platform you select is open-data friendly (even if none of the data is ever shared outside your organization). Open-data friendly means the data can be migrated into other platforms and can be aggregated and mixed with other data.
Sonjara's rapid prototyping CMS and web application framework, Fakoli, is open-data friendly by design. We built it to respect the data model, which makes it easy to pull the data out of the system in a variety of formats.
8. Politics will thwart you before the technology does
The major barriers we have seen to open data structures are battles over taxonomy, rights of access, concerns over privacy and security, and concern about cost. These are all important questions that need to be tackled, but none of them is a deal-breaker. Not all data needs to be open to the world, or even to everyone in the same organization. Taxonomies need to be fluid enough to capture emerging trends, but meaningful enough to be used. And open data does not have to cost a fortune, if done thoughtfully.