Factor analysis report for Gut microbiome of Yakut patients with Viliuisk encephalomyelitis and healthy controls

Report summary

Explore the associations between the microbiota composition of the samples and the external factors provided by the user.

Created: 01/08/2019
Updated: 06/04/2020
Type: Factor analysis report
Project: Gut microbiome of Yakut patients with Viliuisk encephalomyelitis and healthy controls
Uploaded samples: 17

Taxonomic composition

Heatmap of taxonomic composition

The interactive heatmap shows the relative abundance of major microbial taxa (columns) in the samples (rows). Use the “Heatmap settings” drop-down list to the right of the heatmap to select the taxonomic rank of interest. To make close values easier to compare, clicking a cell “freezes” its value on the legend, along with the abundances of the top 10 taxa of the corresponding sample (click again, or click the cross next to the sample name, to “unfreeze”). Use the Top control to switch between showing the top features of the selected sample and the top features averaged across all samples. The controls at the top and bottom right change how rows and columns are displayed.

Analysis of outliers

User samples with extreme taxonomic composition are filtered automatically, based on a combined analysis of the user's and all available external datasets. For each sample, the median distance to the closest 50% of its neighbours is computed; the distribution of these values is approximated by a normal distribution, and samples that fall in its upper 1% tail are flagged as outliers. List of outliers:

No outliers detected.
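The outlier rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function name `flag_outliers`, the use of the sample mean and standard deviation to fit the normal approximation, and the rounding of the "closest 50%" to a whole number of neighbours are all choices made for this example.

```python
from statistics import NormalDist, mean, median, stdev


def flag_outliers(dist, tail=0.01, neighbour_fraction=0.5):
    """Return indices of samples in the upper `tail` of the score distribution.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    The score of a sample is the median distance to its closest neighbours.
    """
    n = len(dist)
    k = max(1, int((n - 1) * neighbour_fraction))  # size of the neighbourhood
    scores = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(median(others[:k]))

    spread = stdev(scores)
    if spread == 0:
        return []  # all scores identical: nothing can be extreme

    # Normal approximation of the scores (assumption: fitted by mean and SD).
    cutoff = NormalDist(mean(scores), spread).inv_cdf(1 - tail)
    return [i for i, s in enumerate(scores) if s > cutoff]


if __name__ == "__main__":
    # Hypothetical 1-D toy data: eleven similar samples and one distant one.
    coords = [i / 10 for i in range(11)] + [100.0]
    toy = [[abs(a - b) for b in coords] for a in coords]
    print(flag_outliers(toy))
```

With very few samples, no single point can reach the upper 1% tail of a normal distribution fitted to the same points. This may be one reason the report combines user data with external datasets.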

PCoA visualization based on taxonomic composition

Distribution of the samples by their taxonomic composition in reduced dimensionality. The closer the samples (points) on the plot, the more similar their composition. Vectors show the directions in which the levels of the respective major taxa increase. Method of dimension reduction: PCoA (Principal Coordinate Analysis); dissimilarity metric: Bray-Curtis. Clicking on a dot “freezes” the detailed information about the sample on the right of the plot (click again or on the cross near sample name to “unfreeze”). Switch between the display modes with or without outliers and with or without vectors showing major microbial “drivers” using the respective controls.
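For reference, the Bray-Curtis dissimilarity that feeds the PCoA can be computed for two abundance profiles as shown below. The function name and the toy profiles are illustrative. The ordination step itself (eigendecomposition of the centred distance matrix) is not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    0 means identical composition; 1 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical relative abundances of three taxa in two samples.
    sample_a = [0.5, 0.5, 0.0]
    sample_b = [0.5, 0.0, 0.5]
    print(bray_curtis(sample_a, sample_b))
```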

Alpha-diversity

Alpha-diversity describes the diversity of taxa within each sample, taking into account both the number of taxa and the evenness of their abundances. Metric: Shannon index. Clicking a dot “freezes” the displayed value on the Y axis, along with the abundances of the top 10 taxa (click the dot again, or click the cross next to the sample name, to “unfreeze”). The mean and confidence interval appear when the mouse hovers over the boxplot. The controls at the top and bottom right change the displayed data.
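A minimal sketch of the Shannon index for a single sample is given below. The report does not state the logarithm base, so base 2 is an assumption here, and the function name is illustrative.

```python
import math


def shannon_index(abundances, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero abundance.

    abundances: counts or relative abundances of the taxa in one sample.
    base: logarithm base (assumed to be 2; the report does not specify it).
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no reads")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p, base) for p in proportions)


if __name__ == "__main__":
    # Four equally abundant taxa: the index equals log2(4).
    print(shannon_index([25, 25, 25, 25]))
```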

Reconstruction of metabolic potential

Predicted functional composition of microbiota.

Vitamins synthesis

Plots of relative abundance by factor: gender, a_b, origin, settlement_size, VE_status

For each factor, the plot shows the total relative abundance of the genes involved in vitamin biosynthesis, summed across the respective KEGG pathways.

Synthesis of short-chain fatty acids (SCFAs)

Gut microbes are known to produce SCFAs. The boxplots show the median, standard deviation and quartiles of the relative abundance of SCFA biosynthesis pathways in the samples.

Synthesis of butyrate

Plots of relative abundance by factor: gender, a_b, origin, settlement_size, VE_status

For each factor, the plot shows the total relative abundance of the genes involved in butyrate synthesis, summed across the respective KEGG pathways.

Synthesis of propionate

Plots of relative abundance by factor: gender, a_b, origin, settlement_size, VE_status

For each factor, the plot shows the total relative abundance of the genes involved in propionate synthesis, summed across the respective KEGG pathways.

Statistical analysis

General difference of community structure between two groups

Tests whether overall community composition differs significantly between the samples of two groups. Method: permutational multivariate analysis of variance (PERMANOVA); beta-diversity metric: weighted UniFrac. The result includes the total number of samples, the number of PERMANOVA permutations, and the p-value for the null hypothesis of no difference between the groups, as well as information on the equality of group dispersions (obtained using the PERMDISP method with the same number of permutations). If the group dispersions are not equal, the results should be interpreted with caution. Outlier samples listed in the taxonomic composition section are excluded from this analysis.

factor     p-value   adjusted p-value  significant  R-squared
VE_status  0.187741  0.187741          False        0.073656

parameter               value
sample size             17
number of permutations  20000
significance level      0.05
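The PERMANOVA test summarised above can be illustrated with a short, self-contained sketch. It takes any precomputed distance matrix. Weighted UniFrac itself needs a phylogenetic tree and is not computed here, and the PERMDISP dispersion check is also omitted. The function names, the toy data and the default number of permutations are assumptions made for this example.

```python
import random


def pseudo_f(dist, labels):
    """Return (pseudo-F, R-squared) for one grouping of the samples."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pair_sum = sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        ss_within += pair_sum / len(idx)
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f_stat, ss_between / ss_total


def permanova(dist, labels, permutations=999, seed=0):
    """Permutation p-value for the observed pseudo-F (labels are shuffled)."""
    rng = random.Random(seed)
    f_obs, r_squared = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    # The observed labelling counts as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return f_obs, r_squared, p_value


if __name__ == "__main__":
    # Hypothetical 1-D toy data: two well separated groups of three samples.
    coords = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    toy = [[abs(a - b) for b in coords] for a in coords]
    print(permanova(toy, ["no"] * 3 + ["yes"] * 3, permutations=199))
```

In this formulation R-squared is the share of the total sum of squares explained by the grouping. Read this way, the value 0.073656 in the table corresponds to about 7% of the variation in community composition being associated with VE_status.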

General difference of metabolic potential structure between two groups

Tests whether overall metabolic structure differs significantly between the samples of two groups. Method: permutational multivariate analysis of variance (PERMANOVA); beta-diversity metric: Bray-Curtis distance. The result includes the total number of samples, the number of PERMANOVA permutations, and the p-value for the null hypothesis of no difference between the groups, as well as information on the equality of group dispersions (obtained using the PERMDISP method with the same number of permutations). If the group dispersions are not equal, the results should be interpreted with caution. Outlier samples listed in the taxonomic composition section are excluded from this analysis.

factor     p-value   adjusted p-value  significant  R-squared
VE_status  0.358932  0.358932          False        0.067867

parameter               value
sample size             17
number of permutations  20000
significance level      0.05

Taxonomic composition

Individual microbial taxa for which relative abundance is significantly different between two groups are identified.

Generalized linear mixed effect model

A generalized linear mixed-effects model is fitted for each taxon to identify associations with each factor from the metadata. If there are, on average, more than 50 samples per fixed-effect coefficient, a zero-inflated negative binomial distribution family is used; otherwise, a negative binomial family is used. Rare taxa are excluded from the analysis: a taxon must be present at a relative abundance of >0.2% in at least 10% of the samples. Multiple testing adjustment is performed using the Benjamini–Hochberg procedure. The distribution family, model terms and sample size are shown in the "Model details" section.
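Two of the steps named above, the rare-taxon filter and the Benjamini–Hochberg adjustment, are simple enough to sketch directly. The function names and the example p-values are illustrative and are not taken from the report's pipeline; the thresholds are those quoted in the text.

```python
def keep_taxon(rel_abundances, min_abundance=0.002, min_prevalence=0.10):
    """True if the taxon exceeds 0.2% abundance in at least 10% of the samples."""
    present = sum(1 for a in rel_abundances if a > min_abundance)
    return present >= min_prevalence * len(rel_abundances)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four taxa.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    # With 17 samples, 10% prevalence means at least two samples.
    print(keep_taxon([0.01, 0.01] + [0.0] * 15))
```

The monotonicity step explains why two features with different raw p-values can share the same adjusted p-value.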

Significant results

The 'coefficient' column contains the value of the linear model coefficient. Its sign shows the direction of the association between a microbial taxon and a factor: positive or negative. A categorical factor (for example, group) is first decomposed into several factors, one per value/group. Each of these is treated as a separate factor compared with the first group (in alphabetical order).

VE_status

taxon                       taxa level  covariate     group, compared with  coefficient  p-value   adjusted p-value
p__Euryarchaeota            phylum      VE_statusyes  no                    2.087740     0.000289  0.002313
c__Methanobacteria          class       VE_statusyes  no                    2.087740     0.000289  0.004048
o__Methanobacteriales       order       VE_statusyes  no                    2.087740     0.000289  0.005783
f__Methanobacteriaceae      family      VE_statusyes  no                    2.087740     0.000289  0.010698
f__u(o__Clostridiales)      family      VE_statusyes  no                    1.205089     0.000773  0.014298
g__Methanobrevibacter       genus       VE_statusyes  no                    2.133336     0.000226  0.012647
g__u(o__Clostridiales)      genus       VE_statusyes  no                    1.205089     0.000773  0.021640
s__u(g__Methanobrevibacter) species     VE_statusyes  no                    2.133336     0.000226  0.015809
s__u(o__Clostridiales)      species     VE_statusyes  no                    1.205089     0.000773  0.027050
s__u(g__Streptococcus)      species     VE_statusyes  no                    -1.928370    0.001641  0.038280

All results of the test

Data filtration summary

Information about filtration of factors and features during the analysis

Metadata after NA removal

Metadata after removal of NAs and of factors that have a single value or all distinct values

Download metadata_after_na_filtration.csv

Excluded features

Model details

trait              state
distribution       negative binomial
formula            feature_abundance ~ VE_status + (1|a_b)
link function      log
number of samples  17
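Because the model uses a log link, a coefficient can be read as the natural logarithm of the ratio of expected abundances between the compared groups (here VE_status "yes" versus "no"). This is the conventional reading of a log-link coefficient and is assumed here; the report does not spell it out. The sketch below converts three coefficients from the significant results table into fold changes; the function name is illustrative.

```python
import math


def fold_change(coefficient):
    """Ratio of expected abundances implied by a log-link model coefficient."""
    return math.exp(coefficient)


if __name__ == "__main__":
    # Coefficients copied from the significant results table above.
    coefficients = {
        "g__Methanobrevibacter": 2.133336,
        "f__u(o__Clostridiales)": 1.205089,
        "s__u(g__Streptococcus)": -1.928370,
    }
    for taxon, coef in coefficients.items():
        print(f"{taxon}: {fold_change(coef):.2f}x")
```

A positive coefficient gives a ratio above 1 (higher expected abundance in the "yes" group), and a negative one gives a ratio below 1.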

Functional composition

Individual functional features whose relative abundance differs significantly between the two groups are identified.

Generalized linear mixed effect model

A mixed-effects model is fitted for each feature to identify associations with each factor from the metadata, following the procedure described for taxonomic composition above. Multiple testing adjustment is performed using the Benjamini–Hochberg procedure. The distribution family and transform actually used, the model terms and the sample size are shown in the "Model details" section.

Significant results

The 'coefficient' column contains the value of the linear model coefficient; its sign shows the direction of the association between a feature and a factor. A categorical factor is decomposed into one factor per group, each compared with the first group (in alphabetical order).

Nothing to show

Data filtration summary

Information about filtration of factors and features during the analysis

Metadata after NA removal

Metadata after removal of NAs and of factors that have a single value or all distinct values

Download metadata_after_na_filtration.csv

Excluded features

Model details

trait              state
distribution       gaussian
formula            feature_abundance ~ VE_status + (1|a_b)
number of samples  17
transform          arcsin(sqrt)
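The arcsin(sqrt) transform listed above is a standard variance-stabilising transform for proportions. A minimal sketch follows; it assumes the input is a relative abundance expressed as a fraction between 0 and 1, which the report does not state explicitly, and the function name is illustrative.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square-root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


if __name__ == "__main__":
    # Hypothetical relative abundances of one pathway in three samples.
    for p in (0.001, 0.01, 0.25):
        print(p, arcsine_sqrt(p))
```

The transform stretches values near 0 and 1, which brings low-abundance features closer to the constant-variance assumption of a gaussian model.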

Specific pathways

Individual pathways whose relative abundance differs significantly between the two groups are identified.

Generalized linear mixed effect model

A mixed-effects model is fitted for each pathway to identify associations with each factor from the metadata, following the procedure described for taxonomic composition above. Multiple testing adjustment is performed using the Benjamini–Hochberg procedure. The distribution family and transform actually used, the model terms and the sample size are shown in the "Model details" section.

Significant results

The 'coefficient' column contains the value of the linear model coefficient; its sign shows the direction of the association between a pathway and a factor. A categorical factor is decomposed into one factor per group, each compared with the first group (in alphabetical order).

Nothing to show

Data filtration summary

Information about filtration of factors and features during the analysis

Metadata after NA removal

Metadata after removal of NAs and of factors that have a single value or all distinct values

Download metadata_after_na_filtration.csv

Excluded features

Model details

trait              state
distribution       gaussian
formula            feature_abundance ~ VE_status + (1|a_b)
number of samples  17
transform          arcsin(sqrt)

Taxa co-occurrence analysis

Groups of co-occurring taxa (cooperatives) whose relative abundance differs significantly between the two groups are identified.

Generalized linear mixed effect model

A generalized linear mixed-effects model is fitted for each cooperative to identify associations with each factor from the metadata, following the procedure described for taxonomic composition above. Multiple testing adjustment is performed using the Benjamini–Hochberg procedure. The distribution family, model terms and sample size are shown in the "Model details" section.

Significant results

The 'coefficient' column contains the value of the linear model coefficient; its sign shows the direction of the association between a cooperative and a factor. A categorical factor is decomposed into one factor per group, each compared with the first group (in alphabetical order).

VE_status

taxon   taxa level    covariate     group, compared with  coefficient  p-value   adjusted p-value
coop_6  cooperatives  VE_statusyes  no                    4.401757     0.000498  0.00263
coop_8  cooperatives  VE_statusyes  no                    1.192037     0.000657  0.00263

Data filtration summary

Information about filtration of factors and features during the analysis

Metadata after NA removal

Metadata after removal of NAs and of factors that have a single value or all distinct values

Download metadata_after_na_filtration.csv

Excluded features

Model details

trait              state
distribution       negative binomial
formula            feature_abundance ~ VE_status + (1|a_b)
link function      log
number of samples  17

Alpha-diversity

A linear mixed-effects model is applied to find associations of alpha-diversity with each factor from the metadata. Normality of the residuals is tested using the Shapiro-Wilk test; if p < 0.05, the results of the linear mixed-effects model may be unreliable.

Model details

parameter                                  value
number of samples                          17
formula                                    alpha_diversity ~ VE_status + (1|a_b)
Shapiro-Wilk test for residuals, p-value   0.854

Summary

covariate     Estimate  2.5_ci     97.5_ci   SE        DF    T-stat     p-value       Sig
(Intercept)   6.146759  5.930268   6.363251  0.110457  15.0  55.648433  8.525863e-19  ***
VE_statusyes  0.074708  -0.289702  0.439118  0.185927  15.0  0.401814   6.934876e-01
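The interval and T-stat columns can be cross-checked from the Estimate and SE columns. The reported bounds appear consistent with a normal-quantile (Wald) interval, Estimate ± z × SE with z ≈ 1.96. This is an inference from the numbers in the table, not something the report states, and the function name is illustrative.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Wald confidence interval and t-statistic for one model coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * std_error
    return {
        "ci_low": estimate - half_width,
        "ci_high": estimate + half_width,
        "t_stat": estimate / std_error,
    }


if __name__ == "__main__":
    # Estimate and SE of VE_statusyes, copied from the summary table above.
    row = wald_summary(0.074708, 0.185927)
    print({name: round(value, 6) for name, value in row.items()})
```

The interval for VE_statusyes spans zero, in line with its p-value of about 0.69: the data show no detectable association between Shannon diversity and VE status.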

knb_interactive: 2.0.2
datalab: 3.10.0
knb_lib: 4.8.50