SAMPLE REPORT
The data for this sample report come from the article "A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae" by Nookaew et al., which studies the S. cerevisiae (yeast) strain CEN.PK 113-7D under two distinct metabolic conditions: glucose excess (batch) and glucose limitation (chemostat).
- Name: Thalassomics
- Contact: info@thalassomics.com
- Website: https://thalassomics.com
- Project type: RNA-seq
- Platform: HiSeq 2500 High Output V4
- Setup: 2x125
General summary
| Sample Name | % Duplication | % > Q30 | Mb Q30 bases | Reads After Filtering | GC content | % PF | % Adapter | Error rate | Non-primary | Reads mapped | % Mapped | % Proper pairs | % MapQ 0 reads | Total seqs | Mean insert | % Aligned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| batch1 | | | | | | | | 0.16% | 0.0M | 0.1M | 99.9% | 99.2% | 0.0% | 0.1M | 167.6bp | 99.9% |
| batch1_raw | 0.0% | 92.8% | 7.6Mb | 0.1M | 43.4% | 99.4% | 1.5% | | | | | | | | | |
| batch2 | | | | | | | | 0.15% | 0.0M | 0.1M | 99.9% | 99.4% | 0.0% | 0.1M | 172.8bp | 99.9% |
| batch2_raw | 0.0% | 92.9% | 10.6Mb | 0.1M | 43.4% | 100.0% | 0.6% | | | | | | | | | |
| batch3 | | | | | | | | 0.16% | 0.0M | 0.1M | 99.9% | 99.3% | 0.0% | 0.1M | 168.8bp | 99.8% |
| batch3_raw | 0.0% | 92.8% | 8.1Mb | 0.1M | 43.6% | 99.5% | 1.4% | | | | | | | | | |
| chem1 | | | | | | | | 0.26% | 0.0M | 0.1M | 99.9% | 98.9% | 0.0% | 0.1M | 166.1bp | 99.9% |
| chem1_raw | 0.0% | 92.9% | 6.7Mb | 0.1M | 43.4% | 100.0% | 0.7% | | | | | | | | | |
| chem2 | | | | | | | | 0.25% | 0.0M | 0.1M | 99.9% | 99.0% | 0.0% | 0.1M | 172.7bp | 99.9% |
| chem2_raw | 0.0% | 93.1% | 8.4Mb | 0.1M | 43.3% | 100.0% | 0.5% | | | | | | | | | |
| chem3 | | | | | | | | 0.25% | 0.0M | 0.1M | 99.9% | 99.1% | 0.0% | 0.1M | 172.3bp | 99.9% |
| chem3_raw | 0.0% | 93.1% | 10.1Mb | 0.1M | 43.4% | 100.0% | 0.6% | | | | | | | | | |
fastp
0.23.4
Read quality analysis with fastp.
URL: https://github.com/OpenGene/fastp
DOI: 10.1093/bioinformatics/bty560
This module analyzes read quality using fastp and reports detailed statistics on sequence quality.
Filtered Reads
Filtering statistics of sampled reads.
Insert Sizes
Insert size estimation of sampled reads.
Sequence Quality
Average sequencing quality over each base of all reads.
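As an illustration of what a per-position quality curve summarizes, the following minimal sketch averages Phred scores by read position and computes the share of bases at or above Q30. It assumes Phred+33 encoded quality strings; the function names and the two toy quality strings are hypothetical and are not taken from fastp or from this dataset.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Reads may differ in length; each position is averaged over the
    reads that are long enough to cover it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def fraction_q30(quality_strings, offset=33):
    """Return the fraction of all bases with a Phred score of at least 30."""
    scores = [ord(ch) - offset for qual in quality_strings for ch in qual]
    return sum(s >= 30 for s in scores) / len(scores) if scores else 0.0


# Toy input (made up): 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
quals = ["III5", "I5I"]
print(mean_quality_per_position(quals))
print(fraction_q30(quals))
```

A drop in the curve toward the 3' end of the reads is the usual pattern to look for in this kind of plot.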
GC Content
Average GC content over each base of all reads.
N content
Average N content over each base of all reads.
Samtools
1.20
HTSlib: 1.21
BAM file analysis with Samtools.
URL: http://www.htslib.org
DOI: 10.1093/bioinformatics/btp352
This module uses Samtools to analyze BAM files and reports statistics on read alignment and coverage.
Percent mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.
For a set of samples that come from the same multiplexed library, similar numbers of reads are expected for each sample. Large differences might indicate issues during library preparation. Whilst large differences in read numbers can be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple locations in the reference sequence. This can be due to repetitive regions in the genome, the presence of alternative contigs in the reference, or due to reads that are too short to be uniquely mapped. These reads are often filtered out in downstream analyses.
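The three-way split described above (mapped, mapped with MQ0, unmapped) can be sketched from the summary-number (`SN`) lines of `samtools stats` output. This is a minimal sketch, not the report's own code: the helper names are hypothetical, the input values are made up, and the percentages are taken relative to `raw total sequences`, which is an assumption about the denominator.

```python
def parse_samtools_sn(text):
    """Collect the tab-separated 'SN' lines of `samtools stats` output."""
    stats = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        key = fields[1].rstrip(":")
        try:
            stats[key] = float(fields[2])
        except (IndexError, ValueError):
            continue  # skip anything that is not a numeric summary value
    return stats


def mapping_breakdown(stats):
    """Split reads into mapped (MQ > 0), mapped with MQ0, and unmapped (percent)."""
    total = stats["raw total sequences"]
    mapped = stats["reads mapped"]
    mq0 = stats["reads MQ0"]
    return {
        "mapped_mq_gt0_pct": 100 * (mapped - mq0) / total,
        "mq0_pct": 100 * mq0 / total,
        "unmapped_pct": 100 * stats["reads unmapped"] / total,
    }


# Made-up example lines in the layout written by `samtools stats`.
example = "\n".join([
    "SN\traw total sequences:\t1000\t# excluding supplementary and secondary reads",
    "SN\treads mapped:\t990",
    "SN\treads unmapped:\t10",
    "SN\treads MQ0:\t5\t# mapped and MQ=0",
])
print(mapping_breakdown(parse_samtools_sn(example)))
```

Subtracting the MQ0 reads from the mapped reads keeps the three categories mutually exclusive, so the percentages add up to 100.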
Alignment stats
This module parses the output from samtools stats. All numbers in millions.
Bowtie 2 / HISAT2
Read alignment with Bowtie 2.
URL: http://bowtie-bio.sourceforge.net/bowtie2; https://ccb.jhu.edu/software/hisat2
DOI: 10.1038/nmeth.1923; 10.1038/nmeth.3317; 10.1038/s41587-019-0201-4
This module uses Bowtie 2 to align reads to a reference genome and reports alignment statistics.
Paired-end alignments
This plot shows the number of reads aligning to the reference in different ways.
There are 6 possible types of alignment:
- PE mapped uniquely: Pair has only one occurrence in the reference genome.
- PE mapped discordantly uniquely: Pair has only one occurrence, but not as a proper pair.
- PE one mate mapped uniquely: One read of a pair has one occurrence.
- PE multimapped: Pair has multiple occurrences.
- PE one mate multimapped: One read of a pair has multiple occurrences.
- PE neither mate aligned: Pair has no occurrence.
GffCompare
0.12.9
Tool to compare, merge and annotate one or more GFF files with a reference annotation in GFF format.
URL: https://ccb.jhu.edu/software/stringtie/gffcompare.shtml
DOI: 10.12688/f1000research.23297.1
Accuracy values
Displayed are the accuracy values, precision and sensitivity, for different levels of genomic features. The metrics are calculated by comparing a query GTF file against a reference GTF file.
Accuracy metrics are calculated as described in Burset et al. (1996). Sensitivity is the true positive rate, TP / (TP + FN), and precision is the proportion of query features that are true positives, TP / (TP + FP). True positives (TP) are query features that agree with features in the reference. The exact definition depends on the feature level:
- Base: True positives are bases reported at the same coordinates.
- Exon: Comparison units are exons that overlap in query and reference with same coordinates.
- Intron chain: True positives are query transcripts for which all introns coordinates match those in the reference.
- Transcript: More stringent than intron chain; all exon coordinates need to match. Outer exon coordinates (start + end) can vary by 100 bases in default settings.
- Locus: The cluster of exons needs to match.
A more in-depth description can be found in the GffCompare documentation.
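To make the two metrics concrete, here is a minimal sketch at the exon level, where a true positive is a query exon with exactly the same coordinates as a reference exon. It is not GffCompare's implementation: the function name, the tuple layout `(seqid, start, end, strand)` and the coordinates are all hypothetical.

```python
def exon_accuracy(query_exons, reference_exons):
    """Return (sensitivity, precision) for exact exon coordinate matches."""
    query, reference = set(query_exons), set(reference_exons)
    tp = len(query & reference)   # query exons that match the reference
    fp = len(query - reference)   # query exons absent from the reference
    fn = len(reference - query)   # reference exons missed by the query
    sensitivity = tp / (tp + fn) if reference else 0.0
    precision = tp / (tp + fp) if query else 0.0
    return sensitivity, precision


# Made-up coordinates for illustration only.
reference = [
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 500, 600, "+"),
    ("chrII", 50, 150, "-"),
]
query = [
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 500, 610, "+"),  # end coordinate differs, so this is not a match
]
sensitivity, precision = exon_accuracy(query, reference)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

In this toy case two of four reference exons are recovered (sensitivity 0.50) and two of three query exons are correct (precision 0.67).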
Novel features
Number of novel features, present in the query data but not found in the reference annotation.
Missing features
False negative features, which are found in the reference annotation but missed (not present) in the query data.
Distance matrix
Abundance heatmap
Heatmap of abundance values for the differentially expressed genes/transcripts (normalized values).
PCA analysis
This plot shows the results of the Principal Component Analysis (PCA) on normalized values (VSD), colored by group.
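For readers unfamiliar with how sample coordinates on a PCA plot are obtained, the sketch below computes the first principal component from a samples-by-genes matrix of already normalized values. It is a pure-Python illustration using power iteration on the centered Gram matrix, not the method used to produce this report; the function name and the toy matrix are hypothetical.

```python
import math


def pc1_scores(matrix, iterations=200):
    """Return (PC1 score per sample, fraction of variance explained by PC1).

    matrix: rows are samples, columns are genes (normalized values).
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Gram matrix of centered samples; its top eigenvector gives PC1 scores.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    total = sum(gram[i][i] for i in range(n))
    vec = [float(i + 1) for i in range(n)]  # arbitrary fixed starting vector
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return [0.0] * n, 0.0  # no variance in the data
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)
    )
    scores = [math.sqrt(eigenvalue) * v for v in vec]
    return scores, eigenvalue / total


# Toy matrix (made up): two "batch"-like and two "chemostat"-like samples.
samples = [
    [10.0, 5.0, 1.0],
    [11.0, 5.0, 1.2],
    [2.0, 5.0, 8.0],
    [1.0, 5.0, 8.3],
]
scores, explained = pc1_scores(samples)
print([round(s, 2) for s in scores], round(explained, 3))
```

Samples from the same group end up with PC1 scores of the same sign, which is the separation by condition that a PCA plot is meant to reveal. The overall sign of a principal component is arbitrary.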
Volcano plot
Volcano plot of the differential expression analysis results.
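A volcano plot places each gene at its log2 fold change (x axis) and at the negative log10 of its adjusted p-value (y axis). The sketch below shows that transformation and a simple up/down classification. The function name, the cutoffs (|log2FC| >= 1, adjusted p < 0.05) and the direction of the comparison are assumptions made for illustration; the report does not state the thresholds it used.

```python
import math


def volcano_point(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (x, y, label) for one gene on a volcano plot."""
    # Clamp very small p-values so that log10 is always defined.
    y = -math.log10(max(padj, 1e-300))
    if padj < padj_cutoff and log2_fold_change >= lfc_cutoff:
        label = "up"
    elif padj < padj_cutoff and log2_fold_change <= -lfc_cutoff:
        label = "down"
    else:
        label = "not significant"
    return log2_fold_change, y, label


# Made-up values for three hypothetical genes.
for name, lfc, padj in [("geneA", 2.5, 1e-6), ("geneB", -1.8, 0.001), ("geneC", 0.3, 0.4)]:
    x, y, label = volcano_point(lfc, padj)
    print(name, x, round(y, 2), label)
```

Genes in the upper left and upper right corners of the plot are those that pass both cutoffs.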
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Group | Software | Version |
|---|---|---|
| GffCompare | GffCompare | 0.12.9 |
| Samtools | HTSlib | 1.21 |
| Samtools | Samtools | 1.20 |
| fastp | fastp | 0.23.4 |