Towards Multi-Lingual Pneumonia Research Data Collection Using the Community-Acquired Pneumonia International Cohort Study Database

Background: Although multilingual interfaces are preferred by most users when they have a choice, organizations are often unable to support and troubleshoot problems involving multiple user languages. Software that has been structured with multiple languages and data interlinking considerations early in its development is more likely to be easily maintained. We describe the process of adding multilingual support to the CAPO International Cohort Study database using REDCap.
Methods: Using the Google Translate API, we extend the supported Spanish language version of REDCap to the most recent version used by CAPO, 8.1.4. We then translate the English data dictionary for CAPO to Spanish and link the two projects together using REDCap's hook feature.
Results: The Community Acquired Pneumonia Organization database now supports data collection in Spanish for its international collaborators. REDCap's program hook functionality facilitates both databases staying up to date. When a new case is added to the Spanish project, the case is also added to the English project and vice versa.
Conclusions: We describe the implementation of multilingual functionality in a data repository for community-acquired pneumonia and describe how similar projects could be structured using REDCap as an example software environment.

DOI: 10.18297/jri/vol3/iss1/2
Received Date: November 26, 2018
Accepted Date: December 20, 2018
https://ir.library.louisville.edu/jri/vol3/iss1/
Affiliations: Division of Infectious Diseases, University of Louisville
This original article is brought to you for free and open access by ThinkIR: The University of Louisville's Institutional Repository. It has been accepted for inclusion in The University of Louisville Journal of Respiratory Infections by an authorized editor of ThinkIR. For more information, please contact thinkir@louisville.edu.
Recommended Citation: Mattingly, William A.; Buckner, Kimberley A.; and Pena, Senen (2019) "Towards Multi-Lingual Pneumonia Research Data Collection Using the Community-Acquired Pneumonia International Cohort Study Database," The University of Louisville Journal of Respiratory Infections: Vol. 3: Iss. 1, Article 2.
*Correspondence To: William A Mattingly, PhD
Work Address: 501 E Broadway, Suite 120, Louisville, KY, United States, 40202
Work Email: bill.mattingly@louisville.edu
ORIGINAL RESEARCH


Introduction
Sharing the results of clinical research continues to play a large role in scientific discovery [1], and more and more funders and publishers are encouraging investigators to adopt sustainable means to share the data used along with their results. For ad-hoc studies the structure of data is not crucial beyond the needs of accurate analysis. However, when data sharing and reproducibility of research are important, time and effort must be invested to structure a dataset so that it can be efficiently combined with other datasets. Efficient means of integrating data from different sources are needed to leverage the full benefits of data sharing because new research questions can be examined using integrated, multi-source data.
Studies that are international in scope can provide large, diverse samples of patient populations, but require investment in translation and in the appropriate ontological structuring of data. In this paper we describe the process of extending the Community Acquired Pneumonia Organization (CAPO) clinical research database to support data collection in multiple languages. After English, Spanish is the most commonly spoken language among CAPO members, with almost 40% of member sites located in Spanish-speaking countries. Starting with Spanish, we establish a general multi-language workflow for data entry into the CAPO database, with the eventual goal of supporting all CAPO member languages.

Methods
The study database for CAPO currently resides in a web-hosted instance of the REDCap electronic data capture software. REDCap is a secure web application for building and managing online surveys and databases. Members of CAPO access the REDCap instance remotely through its web URL, https://id.research.louisville.edu/capo, using a web browser and their assigned user credentials. Demographic and clinical history information can then be entered for new cases in the database.

System Language
REDCap was designed to support a multilingual setup with two levels: system-level and project-level. System-level languages are stored as files in REDCap's web directory. The default is English, but other files can be added from the REDCap community consortium page. If the default language is changed in REDCap's settings, then the entire REDCap instance, including technical configuration pages, will be in the new language. Alternatively, the language can be changed at the project level: the home page of REDCap and the configuration pages remain in the default language, but an individual project is configured to use a language different from the system-level default. While it is not currently possible for REDCap users to have different language preferences in the same project, we describe how to duplicate and link a project to allow similar functionality.
REDCap language files store each segment of text instructions for every page of the software, one per line in a text document. The REDCap software is consistently updated with new features, and these features are pushed out to REDCap users in new versions of the software. All text accompanying new software features is added to the structured REDCap English language file. REDCap users can then translate the English language file and add the translation to their REDCap language files. All REDCap users are encouraged to share their translation files with other members on the REDCap community page. There are currently nine translations that have been tested and approved by the REDCap consortium, and these are listed in Table 2.
The CAPO REDCap database is currently using version 8.1.4 of REDCap. After installing the latest Spanish translation from the REDCap consortium, much of the descriptive text for REDCap will display correctly, but all new features added since version 6.4.1 will still have English text. To complete the translation, all text for those features must be added to REDCap's Spanish language file. We translate the remaining phrases with a first-pass translation using Google's translation application programming interface (API) and a second pass with Spanish-speaking collaborators.
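The gap-filling step described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes each language-file line has the form key = "phrase", and the translate argument is a hypothetical stand-in for whatever first-pass machine translation call is used. The output still requires the second-pass review by Spanish-speaking collaborators.

```python
import re
from typing import Callable, Dict

# Assumed line format: key = "phrase". Lines that do not match are ignored.
LINE_PATTERN = re.compile(r'^\s*([A-Za-z0-9_]+)\s*=\s*"(.*)"\s*$')


def parse_language_file(text: str) -> Dict[str, str]:
    """Parse language-file text into a {key: phrase} dictionary."""
    phrases: Dict[str, str] = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            phrases[match.group(1)] = match.group(2)
    return phrases


def missing_phrases(english: Dict[str, str], spanish: Dict[str, str]) -> Dict[str, str]:
    """Return the English phrases whose keys have no Spanish entry yet."""
    return {key: text for key, text in english.items() if key not in spanish}


def complete_translation(
    english: Dict[str, str],
    spanish: Dict[str, str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """First-pass fill of the missing phrases; existing entries are kept."""
    completed = dict(spanish)
    for key, text in missing_phrases(english, spanish).items():
        completed[key] = translate(text)  # hypothetical translation callable
    return completed
```

Keeping the translation call behind a plain callable means the same routine can be run with a machine translation client for the first pass and with a lookup of reviewed phrases for the second.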

Project Language
REDCap supports assigning a translated language file to a specific project. This allows a multilingual setup by having one REDCap project per language for a multilingual database or repository. For the CAPO database, we duplicate the CAPO English REDCap project, set the new project to the Spanish language, and link the two projects so that data entered in the English project will populate in the Spanish project and vice versa. A diagram for this process is shown in Figure 1.

Data Dictionary Changes
Changing the REDCap system and project languages will not affect any data entered by the user when designing a case report form (CRF) or entering data. Since the CAPO database and its associated variable labels were created in English, these must also be translated for a project with a different language. It is important for variable names to remain identical across projects, as this allows data analysis and reporting to be consistent for the entire database.
Each language needs to have translated versions of the Field Label column, the Choices Labels column and the Field Note column. Each of these columns contains text that is displayed on screen during data collection to assist in the collection of accurate information. Table 3 shows a comparison of the two languages for a selection of variables from the CAPO data dictionary.
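As a rough sketch of this step, the routine below translates only the three displayed columns of a data dictionary in CSV form and leaves variable names and choice codes untouched. The column headers, the "code, label | code, label" choice format, and the list of choice field types are assumptions made for illustration and should be checked against the actual dictionary export; translate is again a hypothetical callable.

```python
import csv
import io
from typing import Callable

# Column headers assumed for illustration; adjust to the actual export.
NAME_COL = "Variable / Field Name"
TYPE_COL = "Field Type"
LABEL_COL = "Field Label"
CHOICES_COL = "Choices, Calculations, OR Slider Labels"
NOTE_COL = "Field Note"
CHOICE_TYPES = {"radio", "dropdown", "checkbox"}  # assumed choice field types


def translate_choices(choices: str, translate: Callable[[str], str]) -> str:
    """Translate the labels in 'code, label | code, label', keeping the codes."""
    translated = []
    for choice in choices.split("|"):
        code, sep, label = choice.partition(",")
        if not sep:  # no code/label pair found; leave this piece unchanged
            translated.append(choice.strip())
            continue
        translated.append(f"{code.strip()}, {translate(label.strip())}")
    return " | ".join(translated)


def translate_dictionary(csv_text: str, translate: Callable[[str], str]) -> str:
    """Return a copy of the dictionary with only displayed text translated."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in (LABEL_COL, NOTE_COL):
            if row[col]:
                row[col] = translate(row[col])
        if row[TYPE_COL] in CHOICE_TYPES and row[CHOICES_COL]:
            row[CHOICES_COL] = translate_choices(row[CHOICES_COL], translate)
        writer.writerow(row)  # the variable name column is never modified
    return out.getvalue()
```

Restricting choice translation to choice-type fields avoids rewriting calculation expressions that share the same column.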

Figure 1. The data collection and analysis pipeline for multilingual projects.
A REDCap system with two system-supported languages, English and Spanish, is shown. The master project, English in this case, is used to perform analysis. All other projects are used for data collection. Projects are synchronized at the time of data entry.

Data Synchronization
As mentioned above, user-specific languages are not yet supported in REDCap, but two separate projects can be linked, allowing users preferring English data entry to be assigned to the English project and users preferring Spanish data entry to be assigned to the linked Spanish project. Keeping data consistent requires a synchronization operation to be performed between the two projects. This is supported in CAPO by means of a REDCap hook created by the authors.
It is often desirable to link records in REDCap projects to similar information that may exist in other projects on the same REDCap instance. For example, two studies may be performed on an overlapping patient population. Following the completion of the studies, it may be beneficial to study the overlap or to consider a third study which could involve the same population. In the past this has been managed using REDCap hooks.
A REDCap hook is software code saved on the REDCap instance that can perform an action under certain circumstances. These circumstances can include saving a record, opening a data entry form, or completing a survey. After creating a hook, the custom action is performed whenever one of the above conditions occurs. For linking the English and Spanish language CAPO projects, the hook condition is whenever a record is saved.
For the English project hook, the action performed is to save all record information from the current record to the Spanish project data fields. The Spanish language hook is identical except that it saves the information entered to the English project.
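The mirrored save-on-record behaviour can be modelled in a few lines. This is a language-neutral sketch using in-memory stand-ins; REDCap hooks themselves are written in PHP, and the Project class, its methods, and the equality guard shown here are illustrative assumptions rather than the authors' hook code.

```python
from typing import Dict, Optional

Record = Dict[str, str]


class Project:
    """In-memory stand-in for one language project (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, Record] = {}
        self.linked: Optional["Project"] = None

    def save_record(self, record_id: str, data: Record) -> None:
        """Store the record, then run the save hook."""
        self.records[record_id] = dict(data)
        self._on_save(record_id)

    def _on_save(self, record_id: str) -> None:
        """Hook action: copy the saved record to the linked project."""
        target = self.linked
        if target is None:
            return
        data = self.records[record_id]
        # In this model a save in the target fires the target's own hook,
        # so skip the write when the target already holds identical data.
        if target.records.get(record_id) == data:
            return
        target.save_record(record_id, data)


def link(first: Project, second: Project) -> None:
    """Link two projects so that each mirrors saves into the other."""
    first.linked, second.linked = second, first
```

Because variable names are identical across the two projects, the record can be copied field for field without any mapping table.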

Organisms and Medications
Another section of the CAPO database requiring translation is the list of organisms commonly associated with CAP and the list of medications prescribed. These are stored in the CAPO REDCap instance separately from the CAPO project. The names in these lists will generally be the same across languages, but the lists can be extended to support localized names of organisms and medications. In these instances, the numeric codes would remain the same and serve as the link between translations. Tables 4 and 5 show a selection of medications and organisms in CAPO, respectively, along with associated codes in the RxNorm and SNOMED CT ontologies.
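One way to picture the code-as-link idea is a lookup keyed by the ontology code, with one display name per language. The sketch below uses the two SNOMED CT codes reported in the Results section; the Spanish label and the table layout are illustrative assumptions, not entries taken from the CAPO tables.

```python
from typing import Dict

# code -> {language: display name}. The Spanish label is illustrative only.
ORGANISMS: Dict[str, Dict[str, str]] = {
    "116399000": {"en": "Burkholderia pseudomallei"},
    "69239002": {"en": "Human enterovirus", "es": "Enterovirus humano"},
}


def display_name(code: str, language: str, fallback: str = "en") -> str:
    """Return the localized name for a code, or the fallback-language name."""
    names = ORGANISMS[code]
    return names.get(language, names[fallback])
```

Storing only the code in each record means a localized name can be added later without touching collected data.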

Results
The Community Acquired Pneumonia Organization database now supports data collection in Spanish for its international collaborators, as shown in Figure 2. REDCap's program hook functionality facilitates both databases staying up to date. When a new case is added to the Spanish project, the case is also added to the English REDCap CAPO project and vice versa.
Several beneficial changes to the database occurred because of the addition of multilingual functionality. An incorrect antibiotic entry of Defixime in the CAPO medications table was discovered when there were no corresponding entries in either the RxNorm or SNOMED CT databases. We determined this was a duplicate of Cefixime that had been entered as a typo.
Several potentially ambiguous entries for organisms were made more specific by linking them to the standardized organisms listed in SNOMED CT. Pseudomonas pseudomallei was linked to the more standardized Burkholderia pseudomallei (116399000) and Rhinovirus/Enterovirus was linked to Human enterovirus (69239002). All new additions to the CAPO organisms and medications tables are entered in their own REDCap projects and verified with SNOMED CT and RxNorm lookups before being finalized. This prevents the usage of ambiguous or nonstandard names.
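A verification step of this kind could look like the sketch below, which checks a proposed name against a reference vocabulary and suggests the closest standard term when there is no exact match. The local vocabulary list is a hypothetical stand-in for an RxNorm or SNOMED CT lookup, and the similarity cutoff is an arbitrary assumption.

```python
import difflib
from typing import Iterable, Optional, Tuple


def check_name(
    name: str, vocabulary: Iterable[str], cutoff: float = 0.8
) -> Tuple[bool, Optional[str]]:
    """Return (is_standard, term): the exact match or the closest suggestion."""
    lookup = {term.lower(): term for term in vocabulary}
    key = name.strip().lower()
    if key in lookup:
        return True, lookup[key]
    # No exact entry: look for a near match that may indicate a typo.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return False, (lookup[close[0]] if close else None)


# Hypothetical reference list standing in for an ontology lookup.
REFERENCE_MEDICATIONS = ["Cefixime", "Ceftriaxone", "Azithromycin"]
```

With this reference list, check_name("Defixime", REFERENCE_MEDICATIONS) reports the entry as nonstandard and suggests "Cefixime", mirroring the typo found in the medications table.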

Discussion
Besides the organizational challenges in managing multisite studies, differences in languages pose a problem to the consistency of collected data. Case report forms that have been designed to collect accurate information must be carefully translated so as not to lose any validity. A study found that the translation of parenting programs to help Hispanic mothers was successful when translated from English to Spanish, but cultural adaptation, which is more complex and time consuming than simple translation, also played a large role [2]. A study on the interpreting practices in multilingual healthcare found that the availability of professional interpreters was very limited and needed to be planned far in advance. The ability of an individual to interpret language correctly is still highly related to their socio-economic status and related cultural factors [3]. One study showed differing attitudes towards multilingual patient engagement at the organizational and patient level. Patients are generally enthusiastic whereas organizational and institutional members are concerned about cost and maintenance of complex systems [4].
The translation of focused data collection instruments may be more successful across cultural divides within a language. The Stroke Impact Scale (SIS) was translated to Portuguese and showed high consistency across different interviewers [5]. Surveys and data collection concepts which map to visual concepts are also easier to translate, and in some cases do not require translation to be effective [6]. Systems that use pictograms and visual icons can successfully bridge language barriers and sometimes achieve a language-neutral solution for health applications [7]. Systems which provide online multilingual data collection and patient care are becoming more common.
In addition to effort in translating data collection forms, multilingual research requires adjustment in the structure of data. The CDISC Operational Data Model (ODM) was implemented as a way for physician researchers to upload, comment, rate and download medical data models [8]. The system includes forms from clinical and research systems and supports multi-lingual data files. A later examination of the ODM model found that it is being used as a metadata standard for many use cases including Electronic Data Capture and Electronic Medical Records, data collection, data tabulation and analysis, and data archival [9]. Systems needing more features than the ODM model can be developed keeping many of the fundamental metadata design principles of multi-lingual and flexible communication in mind. Previous efforts to this end were successful in developing a shared research environment for radiotherapy clinical data mining [10]. The Pediatric Oncology Network Database (POND4KIDS) was designed to be a multilingual hematology/oncology database for use in countries with limited resources [11]. The database recently reported over 1000 users in 66 different countries [12].
A recent study suggests automated translation services, like Google Translate, may begin to be efficient tools for medical translation, as they were able to extract consistent translations and data from Latin-based language manuscripts [13]. Google Translate was also used to provide an effective translation of the Gene Ontology from English to German [14]. The Global Public Health Intelligence Network (GPHIN) was designed to process data from news sources across the globe and uses automated translation across nine languages to provide aggregate information regarding disease outbreaks [15]. Translation of a nutrition survey in Switzerland from German to French and Italian used a software tool called GloboDiet to design multilingual surveys [16].
Medical ontologies provide ways to organize both the conceptualization and sharing of data among different researchers. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a widely used ontology [17] and can be used as a structure to link multilingual medical terms. The Wikimedia Foundation maintains a large set of multilingual data in the form of articles across multiple Wikipedia projects. Garcia et al. previously used the interlanguage links of Wikipedia to apply concept mapping in their efforts to develop a classifier for multilingual biomedical documents [18].
Although people will prefer to enter data or interact with electronic systems in their preferred language, the cost associated with supporting more than one language will continue to be a barrier for adoption of multilingual systems. Systems like REDCap, which are available free of charge to non-profit researchers, struggle to keep up with maintaining multiple languages and depend on volunteer submissions to stay up to date.
One clear advantage to multilingual capability is the emphasis placed on structure. When a new feature is proposed for REDCap, time and thought is invested in understanding the different effects a new feature will have. Is there already a similar feature that performs the same function? Where will the button for the feature be placed? Will there be more than one way to access it? Will it conflict with any other current features? This same approach to structure was taken with REDCap's language feature list and systems like CDISC ODM and Wikidata. This places emphasis on how concepts will link together and interact, which is considered by some to be the most difficult aspect of translation.
It is important to invest time early in the design of data repositories and data warehouses to ensure they will support future features like data interlinking and multilingualism.

Limitations
Currently the synchronization between databases is triggered when data collection is done directly into REDCap fields and not when data is imported using REDCap's data import tool or the REDCap API. This is a limitation of REDCap's hook feature. Future synchronization schemes will involve a centralized data project or data warehouse that periodically pulls records from language-specific projects to catch all data contribution methods.
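The proposed periodic pull could be organized roughly as below. This is only a sketch of one possible scheme: projects are represented as in-memory dictionaries keyed by record ID, and the policy of flagging differing records for review instead of overwriting them is an assumption, since the conflict handling is not specified here.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def pull_into_central(
    central: Dict[str, Record],
    sources: Dict[str, Dict[str, Record]],
) -> List[Tuple[str, str]]:
    """Copy new records into the central store and report conflicts.

    Returns (record_id, project_name) pairs for records whose content
    differs from the copy already held centrally.
    """
    conflicts: List[Tuple[str, str]] = []
    for project_name, records in sources.items():
        for record_id, data in records.items():
            existing = central.get(record_id)
            if existing is None:
                central[record_id] = dict(data)
            elif existing != data:
                # Assumed policy: keep the central copy, flag for review.
                conflicts.append((record_id, project_name))
    return conflicts
```

Because the pull reads whatever is stored in each project, it would pick up records regardless of whether they arrived through data entry forms, the import tool, or the API.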

Conclusion
We describe the implementation of multilingual functionality in a data repository for community-acquired pneumonia and describe how similar projects could be structured using REDCap as an example software environment. We believe there is a strong future in cross-language collaborative studies with tools like REDCap and Google Translate that would extend the reach of multi-site clinical and observational trials.
Copyright: © 2019 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Figure 2
The project-level user interface for the database. The English version is shown above, with the corresponding interface for the Spanish translation below.

Table 1
Glossary of terms

Table 2
List of supported languages for REDCap and their currently supported REDCap software version

Table 3
Sample of labels and choices for variables in CAPO data dictionary shown in English and Spanish

Table 4
Sample list of medications in CAPO with associated RxNorm and SNOMED CT codes

Table 5
Sample list of organisms in CAPO with associated SNOMED CT codes