Transcriptomics
My transcriptomics portfolio :)
Introduction
I began learning transcriptomics analysis as my first step into computational biology, aiming to explore the world of cancer biology through non-hypothesis-driven research. I hope that transcriptomics will whisper me toward a fulfilling and exciting journey in the field of oncology. My heart shakes when analyzing wet lab data, and now it also shakes when I press Enter while analyzing dry lab data.
1. Data set 1: GSE197576
1.1 Analysis overview:
1.1.1 Analysis set 1: From FASTQ file to visualization
1.1 From FASTQ file to gene counts
1.2 From gene count matrix to differentially expressed genes with DESeq2, followed by functional analysis and visualization, including:
- Volcano plot
- PCA
- Heatmap
- Over-Representation Analysis (ORA) using GO terms and MSigDB
- Gene Set Enrichment Analysis (GSEA) using fast GSEA
1.1.2 Analysis set 2: From gene count matrix (downloaded from GEO) to visualization
2.1 From gene count matrix to transcripts per million (TPM)
2.2 From gene count matrix to differentially expressed genes with DESeq2, followed by functional analysis and visualization
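As a rough illustration of step 2.1, the counts-to-TPM conversion can be sketched in a few lines of Python. This is a minimal sketch, not the code used in this portfolio: the function name `counts_to_tpm`, the gene names, and the gene lengths are all hypothetical, and it assumes a gene length (in base pairs) is available for every gene.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for ONE sample into transcripts per million (TPM).

    counts     : dict mapping gene -> raw read count
    lengths_bp : dict mapping gene -> gene length in base pairs (assumed input)
    """
    # Step 1: reads per kilobase (RPK), i.e. count divided by gene length in kb
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}

    # Step 2: per-million scaling factor for this sample
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        raise ValueError("All counts are zero; TPM is undefined for this sample.")

    # Step 3: divide each RPK by the scaling factor
    return {gene: value / scale for gene, value in rpk.items()}


# Toy data (hypothetical genes and lengths, for illustration only)
example_counts = {"GENE_A": 100, "GENE_B": 200, "GENE_C": 300}
example_lengths = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 3000}

tpm = counts_to_tpm(example_counts, example_lengths)
```

In this toy sample the counts scale exactly with gene length, so all three genes end up with the same TPM, and the TPM values of a sample always sum to one million.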
1.2 Result Discussion
Analysis set 1 and analysis set 2 are derived from the same wet-lab data. However, the differences in results and visualizations between the two analysis sets may arise from two points:
- The upstream processes I used did not employ the same tools as those used in the publication, because of limitations of the HPC. Different tools and processes might lead to variations in the gene counts.
- The gene counts from GEO are presented as gene symbols, which can be duplicated. Several methods can be used to handle these duplicates, such as summing, averaging, or taking the maximum count. In this case, the gene counts from GEO have been processed to address duplicated genes in a way that may differ from my approach.
In my approach, I keep the Ensembl IDs from the upstream analysis and convert them to gene symbols only for readability in visualizations.
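The three ways of handling duplicated gene symbols mentioned above can be sketched as follows. This is only an illustration under my own assumptions: the function name `collapse_duplicates`, the gene names, and the counts are hypothetical, and it says nothing about which method was actually applied to the GEO matrix.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(rows, how="sum"):
    """Collapse rows that share a gene symbol into a single row.

    rows : list of (gene_symbol, [count_sample1, count_sample2, ...])
    how  : "sum", "mean", or "max"
    """
    reducers = {"sum": sum, "mean": mean, "max": max}
    if how not in reducers:
        raise ValueError(f"Unknown method: {how}")
    reduce_fn = reducers[how]

    # Group the count vectors by gene symbol
    grouped = defaultdict(list)
    for symbol, counts in rows:
        grouped[symbol].append(counts)

    # Reduce sample by sample (column by column) within each gene symbol
    return {
        symbol: [reduce_fn(column) for column in zip(*vectors)]
        for symbol, vectors in grouped.items()
    }


# Toy matrix: GENE_A appears twice (hypothetical data, two samples)
rows = [
    ("GENE_A", [10, 20]),
    ("GENE_B", [5, 7]),
    ("GENE_A", [2, 40]),
]

by_sum = collapse_duplicates(rows, "sum")
by_mean = collapse_duplicates(rows, "mean")
by_max = collapse_duplicates(rows, "max")
```

Even on this tiny example the three methods give different counts for `GENE_A`, which is one way the same raw data can end up producing slightly different downstream results.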
1.3 What I have learned from this data set
This is the first transcriptomics analysis in my computational biology portfolio, where I reflect on and explore every step I took. I have mostly explained these steps in each file and wrote a blog post that originated from a question I had during the analysis: "What are GO, MSigDB, KEGG, ORA, and GSEA in transcriptomics analysis?".
One unique aspect of this data set is the data filtering process. The p-value distribution from the differential expression analysis did not have the shape that is normally expected. I learned how to filter the data from a post by Ming (Tommy) Tang.
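To make the idea concrete, here is a minimal sketch of one common way to approach this kind of check: remove genes with very low counts, then bin the p-values to inspect the shape of their distribution. The threshold values, function names, and toy data are my own assumptions for illustration; they are not necessarily the criteria used in this analysis or in the post mentioned above.

```python
def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    counts : dict mapping gene -> list of raw counts, one per sample
    Both thresholds are illustrative assumptions, not recommended values.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(c >= min_count for c in values) >= min_samples
    }


def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in equal-width bins on [0, 1]; missing values (None) are skipped."""
    bins = [0] * n_bins
    for p in pvalues:
        if p is None:
            continue
        # A p-value of exactly 1.0 goes into the last bin
        index = min(int(p * n_bins), n_bins - 1)
        bins[index] += 1
    return bins


# Toy data (hypothetical genes, four samples)
toy_counts = {
    "GENE_A": [0, 1, 0, 2],      # low in every sample, so it is removed
    "GENE_B": [50, 60, 0, 80],   # passes the filter
}
kept = filter_low_counts(toy_counts)

toy_hist = pvalue_histogram([0.01, 0.02, 0.5, 1.0, None])
```

Comparing the histogram before and after filtering is a simple way to see whether low-count genes are distorting the p-value distribution.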
Other related posts
Acknowledgment
- I would like to acknowledge Ming (Tommy) Tang for his encouragement and the valuable knowledge he provided through various online platforms.
- I would like to thank Dr. Patipark Kueanjinda, who kindly supported my learning in transcriptomics.
- I would like to thank all my colleagues for their comments, suggestions, and assistance.
- Lastly, I would like to thank my Ph.D. advisor. Without him, I would not have discovered the exciting world of cancer biology. From that point, I expanded my interest to the -omic fields for a deeper understanding. 🤩🤩
I'm very happy 🥰 that you are visiting my computational biology portfolio and would be even happier if you could provide suggestions or feedback 🤩.
You can contact me through various online platforms here or leave a comment below using your GitHub account.