Identification of Molecular Markers Associated with COPD in Non-Smokers and Smokers: A Bioinformatics Analysis

Background: Even though the proportional burden of chronic obstructive pulmonary disease (COPD) among never-smokers is significant in both developing and developed nations, accounting for around 30% of all COPD in the community, there is little awareness of the prevalence of COPD in this population. Understanding the molecular processes that underlie COPD in nonsmokers is essential. Methods: A dataset (GSE146560) was acquired from the Gene Expression Omnibus (GEO). The limma and clusterProfiler software tools were used to identify differentially expressed genes (DEGs) and to conduct a functional enrichment analysis, respectively. Results: In all, 10,583 DEGs were found, of which 1,065 were up-regulated and 9,518 were down-regulated. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways such as neuroactive ligand-receptor interaction, taste transduction, maturity-onset diabetes of the young, Hippo signaling pathway, insulin secretion, dilated cardiomyopathy, morphine addiction


Introduction
Although the proportional burden of chronic obstructive pulmonary disease (COPD) in never-smokers is significant in both developing and industrialized nations, accounting for roughly 30% [1][2][3][4][5] of all COPD in the population, the prevalence of COPD in never-smokers is not commonly recognized.
There is a dearth of population-based data on the risk factors associated with spirometrically confirmed COPD in never-smokers [6,7], and further information is required. [8] Exposures to risk factors may vary by gender. [9] Exposure to biomass fuel has repeatedly been associated with chronic bronchitis and spirometrically defined COPD in women in developing nations. [10] According to limited data from population-based studies, smoking and non-smoking COPD may have distinct clinical and gender-related risk exposure profiles. [11] It remains uncertain whether never-smokers with COPD have the same characteristics as smokers with COPD. Only a small number of studies have evaluated COPD in never- and ever-smokers simultaneously. [1] Such an assessment would make it easier to compare COPD in never-smokers with COPD in current smokers [8] and would shed light on potential phenotypic differences between tobacco- and non-tobacco-related COPD at the community level. To fill this knowledge gap on COPD in never-smokers, population-based spirometry studies are required that account for gender and systematically compare never-smokers with ever-smokers.
Spirometry is used to assess the degree of airflow limitation in order to determine the severity of COPD. The forced expiratory volume in one second (FEV1) is expressed as a percentage of the predicted normal value, which is based on a person's gender, age, weight, and height. [12] According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) recommendations, COPD is divided into four stages (I to IV) of increasing severity based on FEV1 [12], and COPD exacerbations are defined as episodes of worsening symptoms. [13] FEV1 is the basis for treatment recommendations for medical professionals in Europe and America [14], although FEV1 does not fully reflect the systemic manifestations of COPD. [15]
Numerous investigations have shown that COPD susceptibility has a genetic component. [16] Thus, identifying relevant genetic markers may help to develop diagnostic and therapeutic targets for COPD. Previous genome-wide association studies have identified several genetic loci related to COPD susceptibility. [17] A recent study investigated gene expression profiles in patients with frequent COPD exacerbations and identified three genes (ARHGEF10, LAF4, and B3GNT) as predictors of exacerbations. [18] However, that study did not investigate whether these genes could be used as biomarkers for the clinical classification of COPD.
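The GOLD spirometric staging described above reduces to a few threshold comparisons on FEV1 expressed as a percentage of predicted. The sketch below is a minimal illustration, assuming the standard GOLD spirometric cut-offs (post-bronchodilator FEV1/FVC < 0.70; grade 1 at FEV1 ≥ 80% predicted, grade 2 at 50-79%, grade 3 at 30-49%, grade 4 below 30%). These thresholds are not stated in the text, the function names are hypothetical, and the predicted FEV1 is taken as an input rather than computed from a reference equation.

```python
from typing import Optional


def fev1_percent_predicted(fev1_measured: float, fev1_predicted: float) -> float:
    """Express measured FEV1 (litres) as a percentage of the predicted normal value.

    The predicted value is assumed to come from a reference equation
    (based on sex, age, height, etc.); here it is simply passed in.
    """
    if fev1_predicted <= 0:
        raise ValueError("predicted FEV1 must be positive")
    return 100.0 * fev1_measured / fev1_predicted


def gold_grade(fev1_measured: float, fev1_predicted: float,
               fev1_fvc_ratio: float) -> Optional[int]:
    """Return the GOLD airflow-limitation grade (1-4).

    Returns None when the post-bronchodilator FEV1/FVC ratio is >= 0.70,
    i.e. when the spirometric criterion for obstruction is not met.
    """
    if fev1_fvc_ratio >= 0.70:
        return None
    pct = fev1_percent_predicted(fev1_measured, fev1_predicted)
    if pct >= 80:
        return 1  # mild
    if pct >= 50:
        return 2  # moderate
    if pct >= 30:
        return 3  # severe
    return 4      # very severe


# Hypothetical patient: FEV1 1.8 L vs 3.0 L predicted (60%), FEV1/FVC 0.62
print(gold_grade(1.8, 3.0, 0.62))  # prints 2
```

Keeping the percent-predicted calculation separate from the grading makes it easy to swap in a specific reference equation later without touching the thresholds.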

Microarray data and data pre-processing
The microarray dataset GSE146560 was downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) in 2022. [19] The platform was GPL6480 (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F). The 16 available samples comprised four control smokers, four control never-smokers, four COPD smokers, and four COPD ex-smokers. In this study, the four control never-smokers were compared with the four smokers with COPD (Figure 1). The expression profile data were preprocessed using the Affy Bioconductor package [20] and Affymetrix annotation files from the Brain Array Lab (Affymetrix, Santa Clara, CA; http://www.affymetrix.com/analysis/). Background correction, quantile normalization, and probe summarization were carried out using the robust multiarray average (RMA) algorithm (http://www.bioconductor.org).
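The RMA preprocessing described above includes a quantile normalization step, which forces every array to share the same intensity distribution. The following plain-Python sketch illustrates only that step on a toy probes-by-arrays example. It is not the affy package's implementation, it omits background correction and probe summarization, ties are handled naively, and the function name is hypothetical.

```python
from statistics import mean
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize expression values.

    `samples` holds one list per array, each with one value per probe (same
    probe order in every array). Afterwards every array has the same
    distribution: the mean of the sorted arrays. Ties keep their original
    order (a simplification of what production implementations do).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all arrays must have the same number of probes")

    # For each array, probe indices ordered from smallest to largest value.
    orders = [sorted(range(n_probes), key=s.__getitem__) for s in samples]

    # Reference distribution: mean of the rank-k values across arrays.
    reference = [mean(s[order[k]] for s, order in zip(samples, orders))
                 for k in range(n_probes)]

    # Give each probe the reference value at its rank within its array.
    normalized = []
    for order in orders:
        values = [0.0] * n_probes
        for rank, probe in enumerate(order):
            values[probe] = reference[rank]
        normalized.append(values)
    return normalized


# Two toy arrays with three probes each (hypothetical values).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# prints [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the values 1.5, 3.5, and 5.5; only the assignment of those values to probes differs, which is what preserves each array's within-sample ranking.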

Identification of differentially expressed genes (DEGs)
Expression values for the normalized data were analyzed using the limma R package. [21] Student's t-test was used to identify DEGs. The Benjamini-Hochberg method was used to convert raw P-values into false discovery rates (FDR). [22] An FDR threshold of 0.1 was chosen as the cutoff.
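The two statistical steps named above can be sketched in a few lines. This is a minimal illustration, not limma's code: limma's standard workflow uses an empirical-Bayes moderated t-statistic, whereas this sketch uses the ordinary pooled-variance Student's t described in the text, and it leaves the conversion from t to a P-value (which requires the t-distribution CDF) to a statistics library. The function names and all input values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def student_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so the FDR is monotone
    # and never exceeds 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        fdr[i] = running_min
    return fdr


# Hypothetical log2 expression of one gene in two groups of four samples.
t, df = student_t([7.1, 7.4, 6.9, 7.3], [8.0, 8.2, 7.9, 8.4])
print(f"t = {t:.2f}, df = {df}")

# Hypothetical raw P-values for four genes; keep genes with FDR < 0.1.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
degs = [i for i, q in enumerate(fdr) if q < 0.1]
print(degs)  # prints [0, 1, 2]
```

The adjustment multiplies each sorted P-value by m/rank and then takes a running minimum from the top down, which is why the second and third genes end up sharing the same FDR.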

Enrichment analysis for DEGs
Functional enrichment of the DEGs in the biological process, molecular function, and cellular component categories was performed using the Gene Ontology (GO) database (http://geneontology.org/). [23] Pathway enrichment of the DEGs was performed using the KEGG database (http://www.genome.jp/kegg/pathway.html). [24] A P-value of 0.05 was chosen as the cutoff.
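Over-representation analysis of the kind described above scores each GO term or KEGG pathway with a hypergeometric test, which is the basis of clusterProfiler's enrichGO and enrichKEGG functions. A minimal sketch of that single test follows, in plain Python with a hypothetical function name and toy counts; it does not reproduce clusterProfiler's gene-universe handling or its multiple-testing adjustment.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation P-value, P(X >= k) with X ~ Hypergeometric.

    N: genes in the background universe
    K: universe genes annotated to the term (a GO term or KEGG pathway)
    n: DEGs that have annotations in the universe
    k: DEGs annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing k or more term genes among n DEGs.
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 4 DEGs drawn from a 10-gene universe, 3 genes in the
# term, 2 of the DEGs annotated to it.
p = hypergeom_enrichment_p(k=2, n=4, K=3, N=10)
print(round(p, 4), p < 0.05)  # prints 0.3333 False
```

Exact integer binomial coefficients avoid floating-point underflow for the large gene universes typical of whole-genome arrays; the final division is the only floating-point step.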

Data preprocessing and DEG screening
A dataset of four control never-smokers and four smokers with COPD was created by preprocessing the Gene Expression Omnibus (GEO) dataset GSE146560. Overall, 10,583 DEGs were identified, including 1,065 up-regulated and 9,518 down-regulated genes; a comprehensive list of the DEGs is provided in Supplemental Table 1. The most statistically significant up- and down-regulated genes are listed in Table 1.
A volcano plot was used to display the distribution of DEGs (Figure 2).
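The up/down split reported above (1,065 + 9,518 = 10,583 DEGs) and the volcano plot both derive from per-gene fold changes and P-values. A minimal sketch, assuming each gene has a log2 fold change, a raw P-value, and an FDR; the text does not state a fold-change threshold, so this sketch classifies direction by the sign of the fold change only. All names and values are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene: str
    log2_fc: float   # log2 fold change, COPD smokers vs control never-smokers
    p_value: float   # raw P-value
    fdr: float       # Benjamini-Hochberg adjusted P-value


def volcano_points(results: List[GeneResult]) -> List[Tuple[float, float]]:
    """x = log2 fold change, y = -log10(P); more significant genes sit higher."""
    return [(r.log2_fc, -math.log10(r.p_value)) for r in results]


def count_directions(results: List[GeneResult], fdr_cutoff: float = 0.1) -> Dict[str, int]:
    """Split genes passing the FDR cutoff into up- and down-regulated by sign of log2 FC."""
    counts = {"up": 0, "down": 0}
    for r in results:
        if r.fdr < fdr_cutoff and r.log2_fc != 0:
            counts["up" if r.log2_fc > 0 else "down"] += 1
    return counts


toy = [  # hypothetical genes, not data from GSE146560
    GeneResult("gene_a", 1.5, 1e-4, 0.01),
    GeneResult("gene_b", -2.0, 1e-3, 0.05),
    GeneResult("gene_c", -0.5, 0.2, 0.40),
]
print(count_directions(toy))  # prints {'up': 1, 'down': 1}
print(volcano_points(toy))
```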
We observed under-expression of the hedgehog interacting protein (HHIP) variants (Table 2) and of the 5-hydroxytryptamine receptor 2A (HTR2A) gene (Table 3) when we examined the expression of previously reported COPD-related genes in smokers. [35,36]

GO and KEGG analysis
Biological pathway and function analyses were carried out using the R package clusterProfiler. [25] Enrichment was performed for each of the GO categories "biological process," "molecular function," and "cellular component." The upregulated and downregulated DEGs were enriched in 11 and 114 molecular function (MF) terms, respectively (Supplemental Tables 1 and 2, supplemental .csv file). As shown in Figure 3, four MF terms were shared by both sets of DEGs: ion binding (MF, GO:0043167), metal ion binding (MF, GO:0046872), cation binding (MF, GO:0043169), and protein binding (MF, GO:0005515) (Table 4).
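The counts of terms shared between the up- and down-regulated enrichment results (four MF, nine BP, and two CC terms in this section) are set intersections of the two enriched-term lists. A minimal sketch using the four shared MF term IDs named in the text; the remaining entries are hypothetical placeholders, not real results, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def compare_enriched_terms(up_terms: Iterable[str],
                           down_terms: Iterable[str]) -> Dict[str, List[str]]:
    """Split enriched GO term IDs into the three regions of a two-set Venn diagram."""
    up, down = set(up_terms), set(down_terms)
    return {
        "both": sorted(up & down),        # enriched for up- and down-regulated DEGs
        "up_only": sorted(up - down),
        "down_only": sorted(down - up),
    }


# The four shared MF terms listed in the text.
shared_mf = ["GO:0043167", "GO:0046872", "GO:0043169", "GO:0005515"]
up_mf = shared_mf + ["up_only_term"]                            # hypothetical placeholder
down_mf = shared_mf + ["down_only_term_1", "down_only_term_2"]  # hypothetical placeholders

regions = compare_enriched_terms(up_mf, down_mf)
print(len(regions["both"]), regions["both"])
# prints 4 ['GO:0005515', 'GO:0043167', 'GO:0043169', 'GO:0046872']
```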
Likewise, 84 and 607 biological process (BP) terms were linked to the upregulated and downregulated DEGs, respectively (Supplemental Tables 1 and 2, supplemental .csv file). In this instance, nine BP terms were associated with both sets of DEGs (Figure 4). They include organic cyclic compound biosynthesis, response to organic substances, and regulation of nitrogen compound metabolism (BP, GO:0031323), which is more significantly enriched among the upregulated DEGs (P = 5.64 × 10⁻⁶) than among the downregulated DEGs (P = 0.002651) (Table 5).

The upregulated and downregulated DEGs were linked to 39 and 120 cellular component (CC) terms, respectively (Supplemental Tables 1 and 2, supplemental .csv file). As shown in Figure 5, two terms were linked to both sets of DEGs: cytoplasm (CC, GO:0005737) and vesicle (CC, GO:0031982) (Table 6).
The DEGs were primarily enriched in KEGG pathways, including pathways for the Byzantine arch palate, inflammation, infection, and feeding difficulties, as well as pathways for maturity-onset diabetes of the young, insulin secretion, taste transduction, dilated cardiomyopathy, morphine addiction, and the Hippo signaling pathway (Figure 6, Table 7).

Discussion
In the present investigation, the DEG analysis showed that the antisense genes FBXL19-AS1, KRTAP5-AS1, and HAGLR were downregulated in COPD patients. The non-coding DNA strand of a gene is known as the "antisense" strand. [26] Smoking alone may not cause COPD, because molecular factors also play important roles. [27] COPD development is accompanied not only by changes in protein-coding genes but also by changes in the expression patterns of many long (>200 nt) non-coding RNAs (lncRNAs). [28] Several differentially expressed lncRNAs play significant roles in COPD and have potential therapeutic value. [29,30] A prior study found that MCM3AP-AS1 expression was downregulated in the plasma of COPD patients compared with controls. [31] Among the controls, MCM3AP-AS1 expression was lower in smokers than in never-smokers. In a 3-year follow-up, smokers with lower MCM3AP-AS1 expression had a higher incidence of COPD, and MCM3AP-AS1 expression increased considerably following COPD therapy. [31] Additionally, among the upregulated genes, PRAC1, a potential prostate cancer susceptibility gene, was overexpressed in COPD patients. Patients with COPD who experience associated complications, such as acute respiratory failure, cardiopulmonary arrest, pneumonia, and acute exacerbation, have a higher chance of developing prostate cancer. [32]
Many studies have been conducted on the molecular basis of hereditary susceptibility to COPD. [37] Genome-wide association studies (GWASs) have identified susceptibility loci at FAM13A, CHRNA5/3, IREB2, MMP3/MMP12, TGFB2, and HHIP. The HHIP gene has been linked to the lung function parameters FEV1 and the FEV1/FVC ratio, which are critical for diagnosis and for COPD classification in accordance with GOLD recommendations. [35] The HHIP gene encodes hedgehog interacting protein (HHIP), a component of the hedgehog signaling pathway that is involved in lung morphogenesis and development. [38] The HHIP gene, which has 13 exons, is located on chromosome 4q31.21, is almost 91 kb long, and encodes a 700-amino acid protein. [39] Many single nucleotide polymorphisms (SNPs) related to COPD susceptibility and other pulmonary function traits have been found in the non-coding regions of the HHIP gene, primarily in European and Asian individuals. [35] In our work, we detected expression of the non-coding HHIP-AS1 gene, which may serve as a primary diagnostic marker for COPD in smokers.
Moreover, Ortega-Martínez et al. demonstrated a connection between changes in blood and sputum HHIP protein levels and variations in the HHIP gene linked to genetic susceptibility. [35]
Nicotine, the main psychoactive component of cigarettes, acts on the central nervous system and alters the smoker's brain. One of nicotine's main effects is the alteration of key neurotransmitter levels, particularly dopamine and serotonin (5-hydroxytryptamine, 5-HT). Hence, genes encoding the receptors or transporters involved in the dopaminergic and serotonergic pathways are potential candidates for the mechanisms underlying nicotine addiction. [40] The neurotransmitter 5-HT is also important in pulmonary function. Elevated 5-HT levels cause bronchoconstriction and pulmonary vasoconstriction and increase the risk of pulmonary hypertension (PH). [41,42] The primary causes of PH in COPD are alveolar hypoxia, pulmonary vasoconstriction, vascular remodeling, and increased pulmonary vascular resistance. [43] Verde et al. showed that plasma 5-HT levels are linked to COPD, with higher plasma 5-HT levels in COPD patients than in controls. [36] Smoking and the HTR2A genotype may up-regulate the serotonergic system, which may increase COPD risk and the inflammatory response of airway epithelial cells.
A functional enrichment study found transmembrane transporter activity to be highly enriched in COPD patients. [44] Current theories suggest that COPD is a chronic inflammatory disease and that macrophages are important in its pathogenesis. The immunometabolic profile of macrophages and the characteristics of lipid homeostasis, in which the ATP-binding cassette transporter A1 (ABCA1) plays a crucial role, together determine the diversity of macrophage functions. [33] After the liver, ABCA1 gene expression is highest in lung tissue, indicating the importance of the ABCA1 transporter in lung function. The transporter contributes to the development of COPD by regulating inflammation, phagocytosis, apoptosis, and lipid metabolism. Disruption of the processes in which ABCA1 is involved may be among the pathophysiological mechanisms that produce the variable clinical course of the disease. [33] The molecular function analysis suggests that certain cellular and biological regulatory systems are elevated in COPD patients. An earlier work supported the view that the onset and progression of COPD is a multifactorial pathological process involving several inflammatory cells, inflammatory mediators, and associated cell signaling pathways, which also regulate mucus secretion, mucin (MUC) production, and goblet cell proliferation in COPD. [34]
The neuroactive ligand-receptor interaction, taste transduction, and maturity-onset diabetes of the young pathways were the three most enriched pathways in the KEGG analysis. These annotation results offer important leads for identifying molecular connections in the progression of COPD. Indeed, it has been suggested that diabetes, which is more common in COPD patients than in the general population, plays a significant role in the pathogenesis of the disease. Both type 1 and type 2 diabetes mellitus are linked to pulmonary complications, and similar therapeutic approaches have been suggested for both.

Conclusion
A total of 10,583 DEGs were found in the gene expression analysis of the current study, of which 1,065 were upregulated and 9,518 were downregulated. Among the newly identified DEGs that may be involved in the etiology of COPD are the antisense genes FBXL19-AS1, KRTAP5-AS1, and HAGLR, which are downregulated and implicated as biomarkers of COPD. To validate the results of the current investigation, additional