Projects supported by the Research Data Alliance and the European Open Science Cloud aim to simplify data discovery and free researchers from tedious work.
The scientific research process should be as well organised and smooth as the service in a high-end restaurant. That’s the view of Gavin Chait, lead consultant and data scientist at Whythawk. In practice, however, researchers often end up working with messy spreadsheets and spend hours or even days cleaning the data by hand.
Through a project funded by the European Open Science Cloud (EOSC) and supported by the Research Data Alliance (RDA), Chait aims to reduce the repetitive work involved in standardising tabular data. He has developed what he describes as an intuitive “no-code” approach to schema-to-schema data transformations. The service, called Whyqd, helps convert non-interoperable data into machine-readable formats and is underpinned by a well-defined protocol intended to keep the data consistent and reliable. As the “no-code” label suggests, Whyqd does not require users to have any programming skills.
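To make the idea of a schema-to-schema transformation more concrete, here is a minimal Python sketch. It is not Whyqd’s API: the `CROSSWALK` mapping, the `transform` function and the column names are all hypothetical. What it illustrates is the general principle of a declarative crosswalk: the user states which source column maps to which target field and how each value is converted, and a generic engine applies that declaration to every row.

```python
import csv
import io

# Hypothetical messy source: a header with a trailing space,
# thousands separators in numbers, and stray whitespace in values.
SOURCE_CSV = """Ward Name ,Pop. 2021,Area (km2)
 North ,"12,400",3.5
South,"9,050",2.25
"""

# Hypothetical crosswalk: target field -> (source column, converter).
# In a no-code tool, a declaration like this would be assembled through
# a user interface rather than written by hand.
CROSSWALK = {
    "ward": ("Ward Name", str.strip),
    "population": ("Pop. 2021", lambda v: int(v.replace(",", ""))),
    "area_km2": ("Area (km2)", float),
}


def transform(source_text, crosswalk):
    """Apply a declarative crosswalk to every row of a CSV string."""
    reader = csv.DictReader(io.StringIO(source_text))
    # Strip whitespace from headers so "Ward Name " matches "Ward Name".
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    output = []
    for row in reader:
        record = {}
        for target, (source, convert) in crosswalk.items():
            record[target] = convert(row[source])
        output.append(record)
    return output


if __name__ == "__main__":
    for record in transform(SOURCE_CSV, CROSSWALK):
        print(record)
```

Because the crosswalk is plain data kept separate from the engine, it can be stored, reviewed and rerun on new files, which is the kind of reusable transformation a shared library of schema-to-schema mappings would collect.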
“There is a massive skills shortage when it comes to data curation,” Chait says. “We need to have a process to ensure that data discovery and sharing are easy.” He divides the data science process into four parts: selecting the methodology and approach, data curation, analysis, and presentation.
“To achieve a valid and replicable outcome, it’s crucial to get the first two steps right,” Chait emphasises. Whyqd is designed to simplify and formalise the transformation and validation of source data, making the process more efficient and helping to ensure unbiased results.
About a year ago, after receiving the EOSC grant to develop Whyqd, Chait began working closely with the Research Data Alliance (RDA), a global, community-driven initiative for open data sharing. Drawing on the RDA network, he is now integrating Whyqd with other initiatives that support the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
As Whyqd gains wider adoption among researchers, Chait aims to assemble a library of common schema-to-schema transformations to encourage knowledge sharing and collaboration. He also hopes to deploy Whyqd in university departments to simplify research processes and teach students about data curation. “Although Whyqd can be seen as a tool, it is also a part of a FAIR process,” he concludes.
Creating interdisciplinary metadata schemas
To facilitate knowledge exchange across borders, RDA is supporting data exchange projects in a range of countries at different stages of development. Magdalena Szuflita-Żurawska, the head of the Scientific and Technical Information Services and Open Science Competence Center at the Gdańsk University of Technology Library, is working on making research more interdisciplinary in Poland, where data-sharing practices are still emerging.
Backed by RDA and EOSC, she is working on a project with two interconnected parts. The first is an investigation of data-sharing practices in four disciplines: architecture, civil engineering, economics, and natural language processing. The second is the registration of the Gdańsk University of Technology and its services as a research data provider in the EOSC Portal and Marketplace.
Through four semi-structured interviews, covering an overview of each discipline, data-sharing difficulties, and best practices, the project explored how researchers in the different disciplines share data, the challenges they encounter, and their requirements for standards, vocabulary, and technical tools. In architecture, for example, data is shared in diverse formats such as drawings, photos, tables, and 3D models.
The project authors also ran a focus group in which participants prepared data sets using the metadata schema of the Gdańsk University of Technology’s data repository. “Our repository is situated in the university, which means that we cannot build a subject repository for specific disciplines,” Szuflita-Żurawska explains. “This is why our challenge is to satisfy researchers from all disciplines, and we want to make our service more flexible and interoperable.”
The project authors investigated which data attributes matter across all disciplines and where the metadata schema had gaps. One common gap was the absence of a geolocation attribute, which they subsequently added to the schema.
Szuflita-Żurawska emphasises the importance of national context in data exchange, noting, for instance, that copyright rules and regulations vary from country to country. She points out that both RDA and EOSC have recently been striving to take a more country-specific approach.
Szuflita-Żurawska, who joined RDA in 2017, says the platform offers her numerous networking opportunities. “I meet people at plenaries and conferences, and I ask them about their experiences and challenges,” she adds. “While working on this project, I also got support in resolving an issue. Every time I face a problem, I can contact RDA members, and if they do not know the solution, they can connect me with someone who will help.”
Looking ahead, Szuflita-Żurawska and her colleagues aim to extend their investigation to other disciplines and to focus more on data quality. To this end, they have applied to the FAIRness assessment challenge under the FAIR-IMPACT project.
More about the programme, including the funded projects and Ambassadors, can be found on the RDA website.
This article has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017536 and is supported by EOSC Future through the RDA Open Call mechanism.