The document describes the processing of CNV and SNV datasets using GISTIC software to analyze copy number variations and single-nucleotide variations in lung cancer patients. It outlines the classification of CNVs into three amplitude threshold categories and details the extraction of SNV information, including mutation types. Additionally, it discusses the correlation coefficient analysis to assess relationships between various factors and survival times in lung cancer subtypes, retaining statistically significant features for further analysis.

Uploaded by

emilywong0304

Diagnostics 2025, 15, 872 9 of 26

The original CNV dataset was merged into a seg file, and the corresponding marker
file was downloaded for preprocessing. These two files were then processed using the
GISTIC software (Version 3.9.11 prerelease; https://cloud.genepattern.org/gp/pages/index.jsf,
accessed on 19 March 2025) to obtain CNV data for different genes in all patients.
The values 0, 1, and 2 represent the amplitude threshold categories in CNVs that
describe the extent of copy number changes. They are interpreted as follows:
0: t < 0.1 indicates little to no copy number change at this location. This
typically indicates that the gene copy number is close to normal.
1: 0.1 < t < 0.9 indicates a moderate degree of copy number change at this location.
Changes within this range indicate a slight increase or decrease in the gene copy number.
2: t > 0.9 indicates a significant copy number change at this location. This often implies
substantial copy number amplification or large-scale gene loss, which can greatly affect
gene expression or function.
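The three categories above can be sketched as a small function. This is only an illustration, not the authors' code: the name `cnv_category` is hypothetical, applying the thresholds to the absolute value of t is an assumption (the text mentions both increases and decreases), and assigning the exact boundary values 0.1 and 0.9 to category 1 is also an assumption, since the text leaves them unspecified.

```python
def cnv_category(t: float) -> int:
    """Map a CNV amplitude t to the category 0, 1, or 2 described in the text."""
    # Assumption: gains and losses are treated symmetrically via the magnitude.
    magnitude = abs(t)
    if magnitude < 0.1:
        return 0  # little to no copy number change
    if magnitude > 0.9:
        return 2  # substantial amplification or loss
    # Assumption: the boundary values 0.1 and 0.9 themselves fall here.
    return 1  # moderate change


# Hypothetical amplitudes, for illustration only.
categories = [cnv_category(t) for t in (0.02, 0.4, -1.3)]
```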
SNV Parameters
SNVs are mutations involving single-nucleotide changes in the normal human
genome that lead to deletions, insertions, or substitutions. Tumorigenesis is closely
associated with SNVs [41,42].
We extracted the following information for SNVs: TCGA Identity (TCGAID)
(Tumor_Sample_Barcode), gene (Hugo_Symbol), and variant classification
(Variant_Classification). The variant classification was processed, where mutation types
“Missense”, “Nonsense”, “Nonstop”, “Translation_Start_Site”, “Frame_Shift_Del”,
“Frame_Shift_Ins”, “In_Frame_Del”, “In_Frame_Ins”, and “Splice_Site” were considered
as non-synonymous variants and assigned a value of 1. Mutation types “3′UTR”, “5′UTR”,
“3′Flank”, “5′Flank”, “Silent”, “Intron”, “IGR”, “RNA”, and “Splice region” were
considered synonymous variants and assigned a value of 0.
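The binary encoding described above could look like the following minimal sketch. The function name `encode_variant` is hypothetical, and the label spellings are copied from the text; actual MAF files may spell some classifications differently, so the two sets would need to be adapted to the data. Raising an error on unlisted labels is an assumption, since the text does not say how they were handled.

```python
# Labels as listed in the text (spellings in real MAF files may differ).
NON_SYNONYMOUS = {
    "Missense", "Nonsense", "Nonstop", "Translation_Start_Site",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site",
}
SYNONYMOUS = {
    "3'UTR", "5'UTR", "3'Flank", "5'Flank", "Silent", "Intron", "IGR",
    "RNA", "Splice region",
}


def encode_variant(label: str) -> int:
    """Return 1 for a non-synonymous label and 0 for a synonymous one."""
    # Normalize the typographic prime (U+2032) to a plain apostrophe.
    normalized = label.strip().replace("\u2032", "'")
    if normalized in NON_SYNONYMOUS:
        return 1
    if normalized in SYNONYMOUS:
        return 0
    # Assumption: labels outside both lists are reported, not silently coded.
    raise ValueError(f"unlisted variant classification: {label!r}")
```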
mRNA Parameters
mRNA directly or indirectly influences gene translation, reflecting the pathological
state of tissues. Therefore, detecting changes in intracellular mRNA levels can
provide physiological evidence for early disease detection.
After the processing described above, we merged the clinical, nuclear, and genetic
features of LUAD and LUSC on the unique TCGAID to generate a comprehensive
dataset.
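A merge keyed on TCGAID might be sketched as follows. Everything here is illustrative: the function name `merge_on_tcgaid`, the sample identifiers, and the feature names are hypothetical, and keeping only patients present in every table (an inner join) is an assumption, because the text does not state how patients missing from one source were handled.

```python
def merge_on_tcgaid(*tables: dict) -> dict:
    """Combine per-patient feature tables that are keyed by TCGAID."""
    if not tables:
        return {}
    # Assumption: keep only patients that appear in every table.
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)
    merged = {}
    for tcgaid in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[tcgaid])  # later tables win on name clashes
        merged[tcgaid] = row
    return merged


# Hypothetical toy tables, for illustration only.
clinical = {"TCGA-XX-0001": {"age": 63}, "TCGA-XX-0002": {"age": 58}}
genetic = {"TCGA-XX-0001": {"GENE_A_cnv": 2}}
dataset = merge_on_tcgaid(clinical, genetic)
```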

2.5. Correlation Coefficient Analysis


The PCC (r) reflects the strength of the linear relationship between two variables,
with values ranging from −1 to 1. An r value of 1 denotes a perfect positive correlation, 0
denotes no correlation, and −1 denotes a perfect negative correlation [43]. The formula for
the PCC is below, where r represents the PCC; $x_i$ and $y_i$ are the ith data points of
variables X and Y, respectively; $\bar{x}$ and $\bar{y}$ are the mean values of X and Y,
respectively; and n is the number of data points:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{17}$$
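As a minimal sketch, the PCC can be computed directly from its definition: the sum of products of deviations from the means, divided by the product of the square roots of the summed squared deviations. The function name `pearson_r` is hypothetical, and rejecting constant inputs is an assumption made here because the denominator would be zero.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: summed squared deviations of each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cross / (math.sqrt(ss_x) * math.sqrt(ss_y))
```

For perfectly linear toy data such as `pearson_r([1, 2, 3], [2, 4, 6])`, the result is 1 up to floating-point rounding.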

We used PCCs to analyze the relationships between all factors and survival times at
1, 2, and 3 years, as well as between factors and lung cancer subtypes. Factors with
minimal impact were excluded based on the correlation coefficient values: we retained
only features with an absolute correlation coefficient (|r|) greater than 0.05 and
p < 0.05 to ensure statistical significance.
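The retention rule can be sketched as a simple filter over precomputed statistics. The function name `select_features`, the feature names, and the numbers below are hypothetical; the (r, p) pairs are assumed to have been computed beforehand, for example with a routine such as `scipy.stats.pearsonr`, which returns both the coefficient and its p-value.

```python
def select_features(stats: dict, r_min: float = 0.05, alpha: float = 0.05) -> list:
    """Keep features with |r| > r_min and p < alpha.

    `stats` maps a feature name to a precomputed (r, p) pair.
    """
    return [
        name
        for name, (r, p) in stats.items()
        if abs(r) > r_min and p < alpha
    ]


# Hypothetical values, for illustration only.
example_stats = {
    "feature_a": (0.21, 0.001),   # kept: strong enough and significant
    "feature_b": (-0.12, 0.030),  # kept: the sign of r does not matter
    "feature_c": (0.03, 0.010),   # dropped: |r| not above 0.05
    "feature_d": (0.30, 0.200),   # dropped: p not below 0.05
}
kept = select_features(example_stats)
```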
