Research Article

[Retracted] High-Throughput Screen of Natural Compounds and Biomarkers for NSCLC Treatment by Differential Expression and Weighted Gene Coexpression Network Analysis (WGCNA)

Table 2

Detailed information on the RNA-seq analysis pipeline.

Step | Analysis | Software | Script or parameters | Input | Main output
---|---|---|---|---|---
1 | Prepare raw data | Trimmomatic ver. 0.36 | trimmomatic-0.36.jar PE -threads 12 -phred33 -trimlog xx.log xx.fq1 xx.fq2 -baseout | FASTQ file | FASTQ file
2 | Quality control | FastQC | fastqc -t 10 -f fastq -o out xx.fq | FASTQ file from step 1 | FASTQ file, multiqc_report.html
3 | Alignment | HISAT2 | hisat2 -p 20 -x hg38.fa -1 xx_clean_1P.fq.gz -2 xx_clean_2P.fq.gz -S xx.sam | FASTQ file from step 2 | SAM file
4 | Sort | Samtools | samtools sort -@ 20 -O bam -o xx.bam xx.sam | SAM file from step 3 | BAM file
5 | Transcript assembly | StringTie | stringtie -e -p 30 -G hg38.gff -o xx.gtf xx.bam | BAM file from step 4 | GTF file
6 | Merge transcripts | StringTie --merge | stringtie --merge -o merged.gtf gtflist | All GTF files from step 5 | GTF file
7 | Read counting | StringTie -e -B | stringtie -B -e -p 30 -G merged.gtf -o xx.gtf xx.bam | GTF file from step 6 and BAM file from step 4 | Ballgown input files and GTF file
8 | Generate gene expression (FPKM) | Ballgown | gene_expression = gexpr(bg) | Output files from StringTie -e -B | Gene expression tables
9 | Differential expression analysis | DESeq2 wrapper in TBtools | Selection criteria of DEGs: … and … | Gene expression tables | Volcano plot and differential gene list
10 | KEGG pathway analysis | R package clusterProfiler and org.Hs.eg.db data set | Parameters: …, …, … | Differential gene list | KEGG pathway tables
11 | Gene expression visualization | TBtools | Default parameters | Expression of top genes | Heatmaps
12 | General WGCNA analysis | “WGCNA” R package | Power of … (scale free …) | FPKM table | Plots and tables of module connectivity
13 | Selection of modules and hub genes | “WGCNA” R package | … and … | Output files from WGCNA | Module and gene list
14 | Network | Cytoscape | Default parameters; edge filter criteria: weight … and weight … | Output files from WGCNA | Network plots

The table lists the analysis steps, software, and main scripts of our pipeline, which starts from the FASTQ files produced by sequencing and ends with candidate medicines and genes for NSCLC research.
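Steps 3 to 7 of Table 2 pass per-sample files along through a fixed naming scheme, with the placeholder xx standing for a sample name. The Python sketch below only assembles those command strings for a dry run and executes nothing. The helper names (per_sample_commands, merge_command) and the sample names are hypothetical, the flags and reference file names are copied from Table 2, and writing the step-7 output into a per-sample ballgown/ directory is an added assumption: StringTie -B writes the Ballgown *.ctab tables next to the -o file, and -o xx.gtf in the same directory would overwrite the step-5 GTF. Step 1 is left out because its -baseout target and trimming settings are not given in the table.

```python
import re

# Reference files as named in Table 2 (replace with your own paths).
# Note: hisat2 -x expects the basename of an index built with hisat2-build.
HISAT2_INDEX = "hg38.fa"
ANNOTATION = "hg38.gff"
MERGED_GTF = "merged.gtf"

_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def per_sample_commands(sample: str) -> dict:
    """Return the Table 2 shell commands for one paired-end sample, keyed by step.

    The sample name replaces the "xx" placeholder used in Table 2.
    Commands are returned as strings for a dry run; nothing is executed.
    """
    if not _SAFE_NAME.fullmatch(sample):
        raise ValueError(f"unsafe sample name: {sample!r}")
    return {
        3: (f"hisat2 -p 20 -x {HISAT2_INDEX} -1 {sample}_clean_1P.fq.gz "
            f"-2 {sample}_clean_2P.fq.gz -S {sample}.sam"),
        4: f"samtools sort -@ 20 -O bam -o {sample}.bam {sample}.sam",
        5: f"stringtie -e -p 30 -G {ANNOTATION} -o {sample}.gtf {sample}.bam",
        # Assumption: one directory per sample, because -B writes the Ballgown
        # *.ctab files next to the -o GTF and a plain -o xx.gtf would clash with step 5.
        7: (f"stringtie -B -e -p 30 -G {MERGED_GTF} "
            f"-o ballgown/{sample}/{sample}.gtf {sample}.bam"),
    }


def merge_command(samples: list, gtflist: str = "gtflist") -> tuple:
    """Return (contents of the GTF list file, step-6 merge command)."""
    listing = "".join(f"{s}.gtf\n" for s in samples)
    return listing, f"stringtie --merge -o {MERGED_GTF} {gtflist}"


if __name__ == "__main__":
    samples = ["S1", "S2"]  # hypothetical sample names
    per_sample = {s: per_sample_commands(s) for s in samples}
    for step in (3, 4, 5):
        for s in samples:
            print(per_sample[s][step])
    _, merge = merge_command(samples)
    print(merge)  # step 6 runs once, after step 5 has finished for every sample
    for s in samples:
        print(per_sample[s][7])
```

Keeping the commands as data rather than running them makes the ordering constraint explicit: steps 3 to 5 are independent per sample, step 6 is a single join over all samples, and step 7 depends on its output.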
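Step 8 of Table 2 reports gene expression as FPKM (fragments per kilobase of transcript per million mapped fragments). As a worked example of the unit only, the textbook definition is FPKM = C × 10^9 / (L × N), where C is the number of fragments assigned to a feature, L its length in base pairs and N the total number of mapped fragments. StringTie estimates abundances from read coverage rather than from this naive count, so the values it hands to Ballgown will not in general equal this calculation; the function name and the numbers below are illustrative.

```python
def fpkm(fragments: float, length_bp: int, total_mapped: int) -> float:
    """Textbook FPKM: fragments per kilobase of feature per million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("feature length and library size must be positive")
    # Dividing by length in kb (length_bp / 1e3) and by library size in millions
    # (total_mapped / 1e6) equals multiplying by 1e9 / (length_bp * total_mapped).
    return fragments * 1e9 / (length_bp * total_mapped)


# Illustrative numbers: 500 fragments on a 2,000 bp gene, 20 million mapped fragments.
print(fpkm(500, 2_000, 20_000_000))  # 500 fragments / 2 kb / 20 M fragments = 12.5
```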
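Step 9 of Table 2 keeps genes that pass two selection criteria whose threshold values did not survive in the table. A common convention, assumed here rather than taken from the table, is to require an adjusted p-value below one cutoff and an absolute log2 fold change at or above another, then split the survivors into up- and down-regulated lists for the volcano plot. The record type DERow, the function split_degs and the cutoff values are hypothetical. Genes whose adjusted p-value is missing are dropped, since DESeq2 reports NA for some genes (for example those removed by independent filtering).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DERow:
    """One row of a DESeq2-style result table (hypothetical field names)."""
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for an NA adjusted p-value


def split_degs(rows: List[DERow], lfc_cutoff: float,
               padj_cutoff: float) -> Tuple[List[str], List[str]]:
    """Return (up-regulated, down-regulated) gene names.

    Assumed rule: padj < padj_cutoff and |log2FC| >= lfc_cutoff (lfc_cutoff >= 0).
    """
    up, down = [], []
    for row in rows:
        if row.padj is None or row.padj >= padj_cutoff:
            continue  # not significant, or adjusted p-value unavailable
        if row.log2_fold_change >= lfc_cutoff:
            up.append(row.gene)
        elif row.log2_fold_change <= -lfc_cutoff:
            down.append(row.gene)
    return up, down


# Placeholder data and cutoffs for illustration only; they are not the study's values.
rows = [
    DERow("GENE_A", 2.0, 0.01),
    DERow("GENE_B", -1.5, 0.001),
    DERow("GENE_C", 3.0, 0.2),
    DERow("GENE_D", 0.5, 0.001),
    DERow("GENE_E", -2.0, None),
]
print(split_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05))  # (['GENE_A'], ['GENE_B'])
```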
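Step 12 of Table 2 chooses a soft-thresholding power for the WGCNA network; the chosen power and its scale-free fit value are missing from the table. The idea behind that choice can be sketched in plain Python: build an unsigned adjacency a_ij = |cor(x_i, x_j)|^β, take each gene's connectivity k_i as the sum of its adjacencies, and measure how well log10 p(k) lies on a straight line against log10 k. The function returns the signed fit -sign(slope) × R², which is positive when the slope is negative, as in a scale-free network. This is a simplified sketch, not the WGCNA package's implementation: bins are equal-width and empty bins are skipped, so its numbers will differ from the package's, and the O(n²) pure-Python loop only suits toy data. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile; filter such genes first")
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr: List[Sequence[float]], beta: float) -> List[float]:
    """Connectivity k_i = sum over j != i of |cor(x_i, x_j)| ** beta (unsigned network)."""
    k = [0.0] * len(expr)
    for i in range(len(expr)):
        for j in range(i + 1, len(expr)):
            a = abs(pearson(expr[i], expr[j])) ** beta
            k[i] += a
            k[j] += a
    return k


def signed_scale_free_r2(k: Sequence[float], n_bins: int = 10) -> float:
    """Simplified scale-free fit: regress log10 p(k) on log10 mean k over equal-width bins."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal connectivities
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    xs, ys = [], []
    for b in bins:
        mean_k = sum(b) / len(b) if b else 0.0
        if mean_k > 0:  # simplification: empty or zero-connectivity bins are skipped
            xs.append(math.log10(mean_k))
            ys.append(math.log10(len(b) / len(k)))
    if len(xs) < 3:
        raise ValueError("too few non-empty bins for a meaningful fit")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("degenerate fit")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    # Many weakly connected genes and few hubs give a negative slope, hence a positive value.
    return -math.copysign(r2, slope)
```

In practice this value is computed for a range of candidate powers, and the lowest power whose signed R² reaches the chosen cutoff is kept; the cutoff used in this study cannot be recovered from Table 2.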