Last updated: 2022-01-25

Checks: 2 passed, 0 failed

Knit directory: Embryoid_Body_Pilot_Workflowr/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results on this page were generated with repository version 2208c96. See the Past versions tab for a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    output/.Rhistory

Untracked files:
    Untracked:  GSE122380_raw_counts.txt.gz
    Untracked:  UTF1_plots.Rmd
    Untracked:  analysis/DoubletFinderTest.Rmd
    Untracked:  analysis/IntegrateReference_SCTregressCaoPlusScHCL_5newAndOriginal3.Rmd
    Untracked:  analysis/OLD/
    Untracked:  analysis/child/
    Untracked:  build_refint_5new.R
    Untracked:  build_refint_scale.R
    Untracked:  build_refint_sct.R
    Untracked:  build_stuff.R
    Untracked:  build_varpart_sc.R
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/CellRangerPreprocess.Rmd
    Untracked:  code/GEO_processed_data.Rmd
    Untracked:  code/GEO_processed_data_additionalLines.Rmd
    Untracked:  code/PowerAnalysis_NoiseRatio.ipynb
    Untracked:  code/Rplots.pdf
    Untracked:  code/Untitled.ipynb
    Untracked:  code/Untitled1.ipynb
    Untracked:  data/HCL_Fig1_adata.h5ad
    Untracked:  data/HCL_Fig1_adata.h5seurat
    Untracked:  data/dge/
    Untracked:  data/dge_raw_data.tar.gz
    Untracked:  data/ref.expr.rda
    Untracked:  figure/
    Untracked:  output/5NEWLINES.BroadCellTypeCatAssignment.basedonclustersres0.15.csv
    Untracked:  output/5NEWLINES.Frequency.MostCommonAnnotation.FiveNearestRefCells.csv
    Untracked:  output/5NEWLINES.MostCommonAnnotation.FiveNearestRefCells.csv
    Untracked:  output/5NEWLINES.NearestReferenceCell.Cao.hESC.EuclideanDistanceinHarmonySpace.csv
    Untracked:  output/5NEWLINES.NearestReferenceCell.Cao.hESC.FrequencyofEachAnnotation.csv
    Untracked:  output/5newlines.merge.all.SCTwRegressOrigIdent.Harmony.rds
    Untracked:  output/CR_sampleQCrds/
    Untracked:  output/CaoEtAl.Obj.CellsOfAllClusters.ProteinCodingGenes.rds
    Untracked:  output/CaoEtAl.Obj.rds
    Untracked:  output/ClusterInfo_res0.1.csv
    Untracked:  output/DGELists/
    Untracked:  output/DoubletFinderTestSweepResults.RData
    Untracked:  output/DoubletFinderTestWITHknowndoublets.RData
    Untracked:  output/DownSampleVarPart.rds
    Untracked:  output/FiveNewLinesBarcodes.csv
    Untracked:  output/Frequency.MostCommonAnnotation.FiveNearestRefCells.csv
    Untracked:  output/GEOsubmissionProcessedFiles/
    Untracked:  output/GEOsubmission_additionalLines/
    Untracked:  output/GeneLists_by_minPCT/
    Untracked:  output/MostCommonAnnotation.FiveNearestRefCells.csv
    Untracked:  output/NEW_Pseudobulk_Limma_res0.1_OnevAllTopTables.csv
    Untracked:  output/NEW_Pseudobulk_Limma_res0.5_OnevAllTopTables.csv
    Untracked:  output/NEW_Pseudobulk_Limma_res0.8_OnevAllTopTables.csv
    Untracked:  output/NEW_Pseudobulk_Limma_res1_OnevAllTopTables.csv
    Untracked:  output/NearestReferenceCell.Cao.hESC.EuclideanDistanceinHarmonySpace.csv
    Untracked:  output/NearestReferenceCell.Cao.hESC.FrequencyofEachAnnotation.csv
    Untracked:  output/NearestReferenceCell.SCTregressRNAassay.Cao.hESC.EuclideanDistanceinHarmonySpace.csv
    Untracked:  output/NearestReferenceCell.SCTregressRNAassay.Cao.hESC.FrequencyofEachAnnotation.csv
    Untracked:  output/NewAndOriginal.merge.all.SCTwRegressOrigIdent.Harmony.rds
    Untracked:  output/Pseudobulk_Limma_res0.1_OnevAllTopTables.csv
    Untracked:  output/Pseudobulk_Limma_res0.1_OnevAll_top10Upregby_adjP.csv
    Untracked:  output/Pseudobulk_Limma_res0.1_OnevAll_top10Upregby_logFC.csv
    Untracked:  output/Pseudobulk_Limma_res0.5_OnevAllTopTables.csv
    Untracked:  output/Pseudobulk_Limma_res0.8_OnevAllTopTables.csv
    Untracked:  output/Pseudobulk_Limma_res1_OnevAllTopTables.csv
    Untracked:  output/Pseudobulk_VarPart.ByCluster.Res0.1.rds
    Untracked:  output/ResidualVariances_fromDownSampAnalysis.csv
    Untracked:  output/SingleCell_VariancePartition_RNA_Res0.1_minPCT0.2.rds
    Untracked:  output/SingleCell_VariancePartition_Res0.1_minPCT0.2.rds
    Untracked:  output/SingleCell_VariancePartition_SCT_Res0.1_minPCT0.2.rds
    Untracked:  output/TopicModelling_k10_top10drivergenes.byBeta.csv
    Untracked:  output/TopicModelling_k6_top10drivergenes.byBeta.csv
    Untracked:  output/TopicModelling_k6_top15drivergenes.byZ.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustEarlyEcto.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustEndoderm.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustMeso.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustNeuralCrest.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustNeuron.csv
    Untracked:  output/TranferredAnnotations_ReferenceInt_JustPluripotent.csv
    Untracked:  output/VarPart.ByCluster.Res0.1.rds
    Untracked:  output/azimuth/
    Untracked:  output/downsamp_10800cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/downsamp_16200cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/downsamp_21600cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/downsamp_2700cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/downsamp_2700cells_10subreps_medianexplainedbyresiduals_varpart_scres.rds
    Untracked:  output/downsamp_5400cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/downsamp_7200cells_10subreps_medianexplainedbyresiduals_varpart_PsB.rds
    Untracked:  output/fasttopics/
    Untracked:  output/figs/
    Untracked:  output/merge.Cao.SCTwRegressOrigIdent.rds
    Untracked:  output/merge.all.SCTwRegressOrigIdent.Harmony.rds
    Untracked:  output/merged.SCT.counts.matrix.rds
    Untracked:  output/merged.raw.counts.matrix.rds
    Untracked:  output/mergedObjects/
    Untracked:  output/pdfs/
    Untracked:  output/sampleQCrds/
    Untracked:  output/splitgpm_gsea_results/
    Untracked:  publish_stuff1.R
    Untracked:  publish_stuff2.R
    Untracked:  slurm-14911987.out

Unstaged changes:
    Deleted:    analysis/IntegrateAnalysis.afterFilter.HarmonyBatch.Rmd
    Deleted:    analysis/IntegrateAnalysis.afterFilter.HarmonyBatchSampleIDindividual.Rmd
    Deleted:    analysis/IntegrateAnalysis.afterFilter.NOHARMONYjustmerge.Rmd
    Deleted:    analysis/IntegrateAnalysis.afterFilter.SCTregressBatchIndividual.Rmd
    Deleted:    analysis/IntegrateAnalysis.afterFilter.SCTregressBatchIndividualHarmonyBatchindividual.Rmd
    Deleted:    analysis/RunscHCL_HarmonyBatchInd.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 2208c96 KLRhodes 2022-01-25 Add eLife DOI
html ffe1aab KLRhodes 2022-01-25 Build site.
Rmd 49176b1 KLRhodes 2022-01-25 README.md
html 014c890 KLRhodes 2021-12-08 Build site.
Rmd 4236ad8 KLRhodes 2021-12-08 update site with new analysis and aesthetically updated figs
html caf406f KLRhodes 2021-12-08 Build site.
Rmd 6ef914a KLRhodes 2021-12-08 update site with new analysis and aesthetically updated figs
html 6ca558e KLRhodes 2021-12-08 Build site.
html be8515f KLRhodes 2021-07-05 Build site.
html b3bdaa0 KLRhodes 2021-07-05 Build site.
Rmd 8bc36eb KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html 267d6ae KLRhodes 2021-07-05 Build site.
Rmd 7a07574 KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html f46667d KLRhodes 2021-07-05 Build site.
Rmd 4805034 KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html be3f766 KLRhodes 2021-07-05 Build site.
Rmd 8c34ca9 KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html ba90a38 KLRhodes 2021-07-05 Build site.
Rmd 4123658 KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html 69e2824 KLRhodes 2021-07-05 Build site.
Rmd 74c2914 KLRhodes 2021-07-05 wflow_publish("analysis/index.Rmd")
html 49d5ea0 KLRhodes 2020-10-19 Build site.
Rmd 1638e40 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html d1ee77e KLRhodes 2020-10-19 Build site.
Rmd 747f854 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html 6673196 KLRhodes 2020-10-19 Build site.
Rmd c2b8b19 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html a75233d KLRhodes 2020-10-19 Build site.
Rmd 3b424e0 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html 06d1fce KLRhodes 2020-10-19 Build site.
Rmd 1abb092 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html bae15ab KLRhodes 2020-10-19 Build site.
Rmd c4d03aa KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html 968ead0 KLRhodes 2020-10-19 Build site.
Rmd 77fd6c3 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html 51b43d3 KLRhodes 2020-10-19 Build site.
Rmd 2fc2a64 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html 40eb61e KLRhodes 2020-10-19 Build site.
Rmd 5af5a73 KLRhodes 2020-10-19 wflow_publish("analysis/index.Rmd")
html dcceb2e KLRhodes 2020-08-31 Build site.
html 20aa37b KLRhodes 2020-08-31 Build site.
Rmd e50c78b KLRhodes 2020-08-31 wflow_publish("analysis/index.Rmd")
html 63a84ab KLRhodes 2020-08-14 Build site.
Rmd 7ccb0e1 KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
html c780d5b KLRhodes 2020-08-14 Build site.
Rmd c29c301 KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
html e0acf54 KLRhodes 2020-08-14 Build site.
Rmd a7fd3ea KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
html 40a7b23 KLRhodes 2020-08-14 Build site.
Rmd f36c13d KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
html 7055bce KLRhodes 2020-08-14 Build site.
Rmd 8fcb48a KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
html 508974d KLRhodes 2020-08-14 Build site.
Rmd 7a220f3 KLRhodes 2020-08-14 wflow_publish("analysis/index.Rmd")
Rmd d9b626c KLRhodes 2020-08-14 wflow_git_commit(all = TRUE)
html d9b626c KLRhodes 2020-08-14 wflow_git_commit(all = TRUE)
html 218f095 KLRhodes 2020-08-04 Build site.
Rmd 87c7292 KLRhodes 2020-08-04 Start workflowr project.

Welcome! This is a research website detailing analyses performed on scRNA-seq data from human embryoid bodies (EBs), aggregates of spontaneously differentiating cells generated from induced pluripotent stem cells (iPSCs).

eLife article DOI: https://doi.org/10.7554/eLife.71361

Preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/2021.06.16.448714v1

The overarching goals of this work are to:

  1. Characterize the diversity of cell types resulting from this particular differentiation protocol.
  2. Understand the contribution of biological and technical variation present at all levels of this type of data:
     i) What is the relative contribution of individual and replicate to variation in cell composition?
     ii) What is the relative contribution of individual and replicate to variation in gene expression?
     iii) What is the relative contribution of individual and replicate to variation in gene expression dynamics across distinct developmental trajectories?
  3. Evaluate the future prospects of this model system (EBs + single cell) for identification of dynamic QTLs.

Study Design:

Embryoid bodies were formed from 3 human iPSC lines in 3 replicates. Three weeks after formation, EBs were dissociated and scRNA-seq data were collected using the 10x platform. After quality control and filtering, this dataset consists of 42,488 cells.

We later collected data from 5 additional lines in a single replicate. This data was used to evaluate the robustness of EB cell type composition among YRI lines.

Alignment and Preprocessing

See the full alignment and preprocessing pipeline here: https://github.com/kennethabarr/HumanChimp

Run EmptyDrops, Add Metadata to Seurat object from each 10x lane

See code directory: https://github.com/KLRhodes/Embryoid_Body_Pilot_Workflowr/blob/master/code/EB.getHumanMetadata.Rmd

Merging and integration, Seurat clustering

Reference Integration

Here, we integrate our EB data with 1) scRNA-seq data from human fetal tissues (Cao et al. 2020) and 2) embryoid body cells and human embryonic stem cells from the Human Cell Landscape (Han et al. 2020).

We then checked that the integration and annotation procedure was robust by subsetting our EB data to only the cells of a particular type, re-integrating that subset, and confirming that most cells received the same annotation as in the full integration.
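The robustness check above comes down to comparing two label assignments for the same cells. The sketch below is a minimal illustration, not the code used for this analysis: it assumes each annotation is available as a plain mapping from cell barcode to label, and the function name `annotation_concordance` and the toy barcodes are hypothetical. It reports the fraction of shared cells whose label agrees, plus a tally of (full, subset) label pairs so that disagreements can be inspected.

```python
from collections import Counter


def annotation_concordance(full_labels, subset_labels):
    """Compare per-cell annotations from the full integration with those from
    a re-integration of a subset (hypothetical helper, for illustration only).

    Returns (fraction of shared cells with identical labels,
             Counter of (full_label, subset_label) pairs over shared cells).
    """
    # Only cells annotated in both runs can be compared.
    shared = [cell for cell in subset_labels if cell in full_labels]
    if not shared:
        raise ValueError("no cells in common between the two annotations")
    pairs = Counter((full_labels[c], subset_labels[c]) for c in shared)
    agree = sum(n for (a, b), n in pairs.items() if a == b)
    return agree / len(shared), pairs


if __name__ == "__main__":
    # Toy barcodes and labels, not data from this study.
    full = {"AAAC-1": "Neuron", "AAAG-1": "Neuron",
            "AACT-1": "Neural crest", "ACGT-1": "Neuron"}
    subset = {"AAAC-1": "Neuron", "AAAG-1": "Neuron", "ACGT-1": "Astrocyte"}
    frac, pairs = annotation_concordance(full, subset)
    print(f"{frac:.2f}")  # 0.67
```

Returning the pair counts alongside the single agreement fraction makes it easy to see whether discordant cells are scattered or consistently moved to one neighbouring type.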

Topic Modelling with fastTopics

Topic modelling scripts can be found in the code directory.

Variance Partition (to determine the relative contribution of Cluster, Batch, and Individual to variation in gene expression)
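To give intuition for what "relative contribution to variation in gene expression" means, the sketch below computes, for one gene, the fraction of its total variance that lies between the groups of a single categorical factor (eta squared from a one-way ANOVA). This is a deliberately simpler, swapped-in technique: variance partitioning tools such as the variancePartition R package fit a linear mixed model with cluster, batch, and individual jointly, and the analysis linked here may use that or a different model. The one-factor-at-a-time fractions below do not account for the other variables and need not sum to 1 across factors. The function name `fraction_variance_explained` is hypothetical.

```python
from collections import defaultdict


def fraction_variance_explained(values, groups):
    """Eta squared for one gene and one categorical factor:
    between-group sum of squares divided by total sum of squares.
    (Illustrative one-way decomposition, not a mixed model.)"""
    if not values or len(values) != len(groups):
        raise ValueError("values and groups must be non-empty and equal length")
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variance to attribute
    # Collect expression values per level of the factor.
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    # Each group contributes n_g * (group mean - grand mean)^2.
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return between_ss / total_ss


if __name__ == "__main__":
    # Toy values: expression tracks cluster and is identical across individuals.
    expr = [1.0, 1.0, 3.0, 3.0]
    cluster = ["c0", "c0", "c1", "c1"]
    individual = ["ind1", "ind2", "ind1", "ind2"]
    print(fraction_variance_explained(expr, cluster))     # 1.0
    print(fraction_variance_explained(expr, individual))  # 0.0
```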

Down Sampling and Power Analysis

Trajectory Inference with Scanpy/PAGA and Identification/Exploration of Dynamic Gene Modules with Split-GPM

See Josh Popp's GitHub: https://github.com/jmp448/ebpilot

Additional Processing and Analyses that were tested but ultimately not included in the paper:

Other integrations with and without Harmony:

Cell type identification using scHCL