Components of MIGen
MIGen was designed to be consistent with other well-established MIBBI projects, e.g. MIAME, MIFlowCyt and MISFISHIE [6].
MIGen consists of four sections:
1. Experiment Overview specifies general information that should be provided for the overall experiment, e.g., experiment purpose, study personnel, study centers, etc.
2. Experiment Subjects Description specifies information that should be provided to unambiguously interpret how the experimental subjects were recruited and selected, as well as subject and population characteristics collected during the study.
3. Genotyping Procedure provides guidelines to report how biological samples were collected and processed from experimental subjects and how the raw data was generated. This section includes descriptions of the genomic variants assayed and descriptions of genotyping procedures and technologies.
4. Data Transformation specifies how to report the data processing and analysis methods.
The first version of MIGen can be found on the MIGen website [7]. Like other MIBBI standards, MIGen specifies the minimum information that needs to be reported, but not the order or format in which that information is provided. Therefore, when using the MIGen standard to guide the reporting of genotyping experiments, it is not necessary to organize the report following the MIGen document structure. Moreover, MIGen states that if a minimum information document exists for a specific experimental technique involved in the genotyping procedure, one shall refer to that document, e.g., refer to the MIFlowCyt standard if a flow cytometry technique was used, or to the MIAME standard if a microarray technique was used.
Application of the Ontology for Biomedical Investigations in MIGen
Of the four sections covered in MIGen, the first two are applicable to essentially all types of genotyping experiments and were relatively straightforward to develop. In contrast, because of the complex nature of genotyping assay techniques and the variety of data analysis methods employed, the Genotyping Procedure and Data Transformation sections were the major challenge in MIGen development. To capture the common features of the Genotyping Procedure and Data Transformation sections for all genotyping experiments, MIGen applies the “planned process” concept from the Ontology for Biomedical Investigations (OBI) [8]. A planned process is a processual entity that realizes a plan, which is the concretization of a plan specification (ID: “obo:OBI_0000011”) [8]. There are three basic types of planned processes in OBI: biomaterial transformation, assay and data transformation, each of which is a process with three components: input, other participants and output, as illustrated in Figure 1.
The biomaterial transformation process is defined as an event with one or more biomaterials as inputs and outputs. For example, DNA extraction from a blood sample is a biomaterial transformation process, where blood is the input biological material, DNA is the output material and the DNA extraction reagents and devices used in the process are other participants. An assay is a planned process with the objective to produce information about some evaluant (ID: “obo:OBI_0000070”) [8]. It has biological material as input and data as output. For example, a microarray-based genotyping assay has DNA as input and raw image data as output, where the reagents, instruments and software utilized in the process are other participants. Starting with the raw data generated from the assay, we move to the data transformation processes. A data transformation process is a protocol application that produces output data from input data (ID: “obo:OBI_0200000”) [8].
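The three process types and their input/output constraints can be sketched as a small data model. The following Python fragment is an illustration only and is not part of MIGen or OBI: the class names, field names and the `io_is_consistent` check are assumptions introduced here, while the two example processes are taken from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProcessKind(Enum):
    BIOMATERIAL_TRANSFORMATION = "biomaterial transformation"
    ASSAY = "assay"
    DATA_TRANSFORMATION = "data transformation"


class EntityKind(Enum):
    BIOMATERIAL = "biomaterial"
    DATA = "data"


# Expected (input kind, output kind) for each process type, as described above.
EXPECTED_IO = {
    ProcessKind.BIOMATERIAL_TRANSFORMATION: (EntityKind.BIOMATERIAL, EntityKind.BIOMATERIAL),
    ProcessKind.ASSAY: (EntityKind.BIOMATERIAL, EntityKind.DATA),
    ProcessKind.DATA_TRANSFORMATION: (EntityKind.DATA, EntityKind.DATA),
}


@dataclass
class Entity:
    """An input or output of a process (hypothetical helper type)."""
    name: str
    kind: EntityKind


@dataclass
class PlannedProcess:
    """A process with inputs, outputs and other participants."""
    name: str
    kind: ProcessKind
    inputs: list
    outputs: list
    other_participants: list = field(default_factory=list)

    def io_is_consistent(self) -> bool:
        """True if all inputs and outputs have the kind expected for this process type."""
        expected_in, expected_out = EXPECTED_IO[self.kind]
        return (
            bool(self.inputs)
            and bool(self.outputs)
            and all(e.kind is expected_in for e in self.inputs)
            and all(e.kind is expected_out for e in self.outputs)
        )


# The two examples given in the text.
dna_extraction = PlannedProcess(
    name="DNA extraction",
    kind=ProcessKind.BIOMATERIAL_TRANSFORMATION,
    inputs=[Entity("blood sample", EntityKind.BIOMATERIAL)],
    outputs=[Entity("DNA", EntityKind.BIOMATERIAL)],
    other_participants=["DNA extraction reagents", "extraction devices"],
)

genotyping_assay = PlannedProcess(
    name="microarray-based genotyping assay",
    kind=ProcessKind.ASSAY,
    inputs=[Entity("DNA", EntityKind.BIOMATERIAL)],
    outputs=[Entity("raw image data", EntityKind.DATA)],
    other_participants=["reagents", "instruments", "software"],
)
```

Under this sketch, a process labeled as an assay whose output is a biomaterial would fail `io_is_consistent()`, which mirrors the distinction between assays and biomaterial transformations drawn above.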
With the application of OBI concepts in MIGen, the genotyping procedure and data analysis components of a genotyping experiment are treated as a sequence of planned processes, each of which can be categorized as a biomaterial transformation, assay, or data transformation process. With this abstract view, virtually all steps executed in any genotyping experiment can be specified explicitly at a high level, describing what information must be reported without enumerating all the variations of any given step. For example, MIGen specifies that if the input is a biomaterial, one must provide information on its type, its amount as a value-unit pair, and other significant attributes. For the detailed specification, please refer to the MIGen documentation.
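As an illustration of the biomaterial-input requirement mentioned above, a report could be checked for the type and the value-unit amount as follows. This is a minimal sketch under stated assumptions: the record layout, the key names (`type`, `amount`, `value`, `unit`) and the sample values are hypothetical and are not defined by MIGen, whose documentation remains the authoritative list of required items.

```python
def missing_biomaterial_fields(record: dict) -> list:
    """Return the names of required items absent from a biomaterial input record.

    Only the two items named in the text are checked: the biomaterial type
    and its amount as a value-unit pair. Key names are hypothetical.
    """
    missing = []
    if not record.get("type"):
        missing.append("type")

    amount = record.get("amount")
    if not isinstance(amount, dict):
        missing.append("amount")
    else:
        value = amount.get("value")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            missing.append("amount.value")
        if not amount.get("unit"):
            missing.append("amount.unit")
    return missing


# Illustrative records; the values are invented for this example.
complete = {"type": "genomic DNA", "amount": {"value": 50, "unit": "ng"}}
no_unit = {"type": "genomic DNA", "amount": {"value": 50}}

print(missing_biomaterial_fields(complete))  # []
print(missing_biomaterial_fields(no_unit))   # ['amount.unit']
```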
The application of OBI within MIGen provides an abstract, general framework for defining the necessary reporting standards for any process in a genotyping experiment. However, genotyping experiments are complex and involve many steps or processes, which can be reported at different levels of granularity depending on the experimenter’s definition of a process. For example, starting from a DNA sample, a PCR genotyping experiment can be broken down into the following sequential processes: 1) a biomaterial transformation where DNA is the input and the assembled PCR reaction mix is the output, 2) a biomaterial transformation where the DNA sample in the PCR reaction mix is the input and the amplified DNA amplicon is the output of the thermocycler reaction, 3) an assay process where DNA amplicons are the input and the gel image is the output of the electrophoresis assay and 4) a data transformation process where the gel image is analyzed to determine the size of each sample’s amplicon. Alternatively, an experimenter can define the entire chain of processes as a single assay process where DNA (biological material) is the input and the size of each sample’s amplicon (data) is the output.
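The two levels of granularity in the PCR example can be made concrete with a short sketch. The `Step` type and the `is_connected` and `collapse` helpers below are hypothetical names introduced for illustration; the step labels paraphrase the example above, with the input of step 2 abbreviated to "PCR reaction mix".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One reported process: its type plus a single named input and output."""
    kind: str
    input_name: str
    output_name: str


# Fine-grained view: the four sequential processes from the PCR example.
pcr_steps = [
    Step("biomaterial transformation", "DNA sample", "PCR reaction mix"),
    Step("biomaterial transformation", "PCR reaction mix", "DNA amplicons"),
    Step("assay", "DNA amplicons", "gel image"),
    Step("data transformation", "gel image", "amplicon sizes"),
]


def is_connected(steps: list) -> bool:
    """True if each step's output is the next step's input."""
    return all(a.output_name == b.input_name for a, b in zip(steps, steps[1:]))


def collapse(steps: list, kind: str) -> Step:
    """Report a connected chain as one process of the given type."""
    if not steps or not is_connected(steps):
        raise ValueError("steps must form a non-empty connected chain")
    return Step(kind, steps[0].input_name, steps[-1].output_name)


# Coarse-grained view: the whole chain reported as a single assay.
single_assay = collapse(pcr_steps, "assay")
print(single_assay.input_name, "->", single_assay.output_name)
```

Both views describe the same experiment; they differ only in how many intermediate inputs and outputs are reported explicitly.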
MIGen does not constrain how the experimenters break down the experimental or data analysis procedures, but rather specifies the list of key information that needs to be included to ensure the unambiguous interpretation, reproduction and reuse of the data.