The original CNV dataset was merged into a seg file, and the corresponding marker
file was downloaded for preprocessing. These two files were then processed using the
GISTIC software (Version 3.9.11 prerelease; https://cloud.genepattern.org/gp/pages/index.jsf,
accessed on 19 March 2025) to obtain gene-level CNV data for all patients.
The values 0, 1, and 2 represent the amplitude threshold categories in CNVs that
describe the extent of copy number changes. They are interpreted as follows:
0: t < 0.1 indicates little to no significant copy number changes at this location. This
typically indicates that the gene copy number is close to normal.
1: 0.1 < t < 0.9 indicates a moderate degree of copy number change at this location.
Changes within this range indicate a slight increase or decrease in the gene copy number.
2: t > 0.9 indicates a significant copy number change at this location. This often implies
substantial copy number amplification or large-scale gene loss, which can greatly affect
gene expression or function.
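The three categories above can be illustrated with a minimal sketch. It assumes that t is a per-gene amplitude whose magnitude is compared against the two thresholds, that gains and losses are treated symmetrically, and that the boundary values 0.1 and 0.9 (which the text leaves unassigned) fall into the higher category. The function name and the gene labels are hypothetical.

```python
def cnv_category(t, low=0.1, high=0.9):
    """Map a CNV amplitude t to category 0, 1, or 2."""
    # Assumption: only the magnitude matters, so gains and losses are symmetric.
    magnitude = abs(t)
    if magnitude < low:
        return 0  # little to no copy number change
    if magnitude < high:
        return 1  # moderate change (assumption: t == 0.1 lands here)
    return 2      # substantial change (assumption: t == 0.9 lands here)


# Hypothetical amplitudes for three placeholder genes.
amplitudes = {"GENE_A": 0.03, "GENE_B": -0.45, "GENE_C": 1.2}
categories = {gene: cnv_category(t) for gene, t in amplitudes.items()}
print(categories)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive downstream results are to the chosen cut-offs.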
SNV Parameters
SNVs are mutations involving single-nucleotide changes in the normal human
genome that lead to deletions, insertions, or substitutions. Tumorigenesis is closely
associated with SNVs [41,42].
We extracted the following information for SNVs: TCGA Identity (TCGAID)
(Tumor_Sample_Barcode), gene (Hugo_Symbol), and variant classification
(Variant_Classification). The variant classification was then binarized. The mutation
types “Missense”, “Nonsense”, “Nonstop”, “Translation_Start_Site”, “Frame_Shift_Del”,
“Frame_Shift_Ins”, “In_Frame_Del”, “In_Frame_Ins”, and “Splice_Site” were treated
as non-synonymous variants and assigned a value of 1. The mutation types “3′UTR”, “5′UTR”,
“3′Flank”, “5′Flank”, “Silent”, “Intron”, “IGR”, “RNA”, and “Splice region” were
treated as synonymous variants and assigned a value of 0.
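The binarization described above could be sketched as follows. This is an illustration rather than the pipeline actually used: the labels are copied from the text (with an ASCII apostrophe in place of the prime symbol), whereas real MAF files may spell some of them differently, for example "Missense_Mutation". The function names are hypothetical, and the rule that a gene is scored 1 for a patient if any of its variants is non-synonymous is an assumption, since the text does not say how multiple variants in one gene are combined.

```python
NON_SYNONYMOUS = {
    "Missense", "Nonsense", "Nonstop", "Translation_Start_Site",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site",
}
SYNONYMOUS = {
    "3'UTR", "5'UTR", "3'Flank", "5'Flank", "Silent", "Intron", "IGR",
    "RNA", "Splice region",
}


def encode_variant(classification):
    """Return 1 for a non-synonymous label and 0 for a synonymous one."""
    if classification in NON_SYNONYMOUS:
        return 1
    if classification in SYNONYMOUS:
        return 0
    # Fail loudly on labels the text does not cover instead of guessing.
    raise ValueError(f"Unrecognized variant classification: {classification!r}")


def build_snv_matrix(records):
    """Collapse MAF-like rows into {(tcga_id, gene): 0 or 1}."""
    matrix = {}
    for row in records:
        key = (row["Tumor_Sample_Barcode"], row["Hugo_Symbol"])
        value = encode_variant(row["Variant_Classification"])
        # Assumption: one non-synonymous variant is enough to score the gene 1.
        matrix[key] = max(matrix.get(key, 0), value)
    return matrix


# Hypothetical rows; the barcodes and gene names are placeholders.
records = [
    {"Tumor_Sample_Barcode": "TCGA-XX-0001", "Hugo_Symbol": "GENE_A",
     "Variant_Classification": "Missense"},
    {"Tumor_Sample_Barcode": "TCGA-XX-0001", "Hugo_Symbol": "GENE_A",
     "Variant_Classification": "Silent"},
    {"Tumor_Sample_Barcode": "TCGA-XX-0001", "Hugo_Symbol": "GENE_B",
     "Variant_Classification": "Intron"},
    {"Tumor_Sample_Barcode": "TCGA-XX-0002", "Hugo_Symbol": "GENE_A",
     "Variant_Classification": "3'UTR"},
]
snv_matrix = build_snv_matrix(records)
print(snv_matrix)
```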
mRNA Parameters
mRNA directly or indirectly influences gene translation, reflecting the pathological
state of tissues. Therefore, the detection of changes in intracellular mRNA levels can
provide physiological evidence for early disease detection.
After the preprocessing described above, we merged the clinical, nuclear, and genetic
features of LUAD and LUSC on the unique TCGAID to generate a comprehensive
dataset.
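A merge keyed on TCGAID might look like the sketch below. It assumes each feature group is held as a mapping from TCGAID to a dictionary of features, and that only patients present in every group are kept (an inner join). The text does not state how patients with missing modalities were handled, so that choice is an assumption. All identifiers and feature names are hypothetical.

```python
def merge_by_tcga_id(*tables):
    """Inner-join several {tcga_id: {feature: value}} tables on tcga_id."""
    # Assumption: keep only patients that appear in every table.
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)

    merged = {}
    for tcga_id in sorted(shared_ids):
        row = {}
        for table in tables:
            # On a duplicate feature name, the later table overwrites the earlier one.
            row.update(table[tcga_id])
        merged[tcga_id] = row
    return merged


# Hypothetical feature tables for three patients.
clinical = {
    "TCGA-XX-0001": {"age": 63, "subtype": "LUAD"},
    "TCGA-XX-0002": {"age": 71, "subtype": "LUSC"},
    "TCGA-XX-0003": {"age": 58, "subtype": "LUAD"},
}
nuclear = {
    "TCGA-XX-0001": {"nuclear_area_mean": 41.2},
    "TCGA-XX-0002": {"nuclear_area_mean": 37.9},
}
genetic = {
    "TCGA-XX-0001": {"GENE_A_snv": 1, "GENE_A_cnv": 2},
    "TCGA-XX-0002": {"GENE_A_snv": 0, "GENE_A_cnv": 0},
    "TCGA-XX-0003": {"GENE_A_snv": 0, "GENE_A_cnv": 1},
}

dataset = merge_by_tcga_id(clinical, nuclear, genetic)
print(sorted(dataset))
```

Prefixing feature names by modality, as in the hypothetical `GENE_A_snv` and `GENE_A_cnv`, avoids silent overwrites when two tables use the same column name.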
We used PCCs to analyze the relationships between all factors and survival times at
1, 2, and 3 years, as well as between factors and lung cancer subtypes. Factors with
minimal impact were excluded on the basis of their correlation coefficients: we retained
only features with an absolute correlation coefficient (|r|) greater than 0.05 and
p < 0.05 to ensure statistical significance.
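The filtering rule can be sketched with the standard library alone. Note the substitution: the p-value here comes from the Fisher z-transformation with a normal approximation, not from the exact t-distribution test. A library routine such as `scipy.stats.pearsonr` returns the exact p-value and would be the more natural choice in practice. The function names and example data are hypothetical, and the text does not state which test was used.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (requires n > 3)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would diverge
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_features(features, target, min_abs_r=0.05, alpha=0.05):
    """Keep features with |r| > min_abs_r and p < alpha against the target."""
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if r is None:
            continue  # a constant feature carries no correlation information
        p = approx_p_value(r, len(target))
        if abs(r) > min_abs_r and p < alpha:
            kept[name] = (r, p)
    return kept


# Hypothetical target (e.g. a survival-related value) and candidate features.
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
features = {
    "strong": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 10.1],
    "weak": [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    "constant": [3] * 10,
}
kept = select_features(features, target)
print(sorted(kept))
```

With a threshold as low as |r| > 0.05, the p < 0.05 condition does most of the filtering in small samples, and the |r| condition matters mainly when the cohort is large.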