Jeffrey Rosenfeld, PhD, MBA 🚲
New York City Metropolitan Area
5K followers
500+ connections
About
20+ years of progressive leadership in Genomics, Bioinformatics, BioTech, and Laboratory…
Activity
-
Highlighting my previous colleague's newly updated consulting and partnership portfolio for bioinformatics solutions! If this is something in your…
Liked by Jeffrey Rosenfeld, PhD, MBA 🚲
-
Last week, during a fireside chat at the Association for Molecular Pathology (AMP) annual meeting, Dr. Annette Kim, Leanne Deptula & I had the chance…
Liked by Jeffrey Rosenfeld, PhD, MBA 🚲
-
Excited to announce the acquisition of Bioreach, a state-of-the-art diagnostics lab. This acquisition will enable us to reduce cost while driving…
Liked by Jeffrey Rosenfeld, PhD, MBA 🚲
Volunteer Experience
-
Research Associate
American Museum of Natural History
- Present 12 years 3 months
Science and Technology
• Specialized in de novo sequencing and annotation of exotic invertebrate genomes
• Primary author and leader of the bedbug genome project
• Analyzed the early evolutionary history of animals, constructed a Tree of Life, investigated the dynamics of e-value and tree morphology for presence-absence phylogenies, and built the Venninator pipeline for presence-absence phylogenetics
• Published numerous collaborative papers annually alongside Museum Curators. Authored the textbook Phylogenomics: A Primer, now published in its 2nd edition.
Publications
-
The impact of read length on quantification of differentially expressed genes and splice junction detection
Genome Biology
Background The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read-length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. Results We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results confirming that our conclusions have broad application. Conclusions A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study.
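The trimming design described in the abstract can be illustrated with a minimal sketch. It assumes reads are held in memory as (bases, qualities) tuples and that trimming keeps the first N bases of each read; the abstract does not say which end was trimmed, and the function names here are hypothetical, not the authors' pipeline.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (bases, per-base quality string)


def trim_read(read: Read, length: int) -> Read:
    """Keep only the first `length` bases and their qualities (assumed 3' trimming)."""
    bases, quals = read
    return bases[:length], quals[:length]


def simulate_library(pairs: List[Tuple[Read, Read]], length: int, paired: bool = True):
    """Trim every read to `length`; keep only mate 1 when `paired` is False."""
    library = []
    for mate1, mate2 in pairs:
        trimmed1 = trim_read(mate1, length)
        if paired:
            library.append((trimmed1, trim_read(mate2, length)))
        else:
            library.append((trimmed1,))
    return library


# Toy 101 bp paired-end fragment standing in for real sequencing data.
pairs = [(("A" * 101, "I" * 101), ("C" * 101, "I" * 101))]
se50 = simulate_library(pairs, 50, paired=False)  # simulated 50 bp single-end
pe75 = simulate_library(pairs, 75, paired=True)   # simulated 75 bp paired-end
```

Each simulated library would then go through the same alignment and differential-expression steps, so that read length and pairing are the only variables that change.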
-
Response to ‘pervasive sequence patents cover the entire human genome’-authors’ reply
Genome Medicine
In our previous work, we concluded that patents were claimed on 21 to 41% of human genes based on long (over 150 nucleotide) fragments, whereas for short (under 150 nucleotide) fragments, we showed their non-specificity meant that 100% of human genes had some portion patented. For this analysis, we relied on databases of DNA sequences included in patents provided by CAMBIA and the NCBI because we concluded that it is impractical to manually look through sequence files and patent applications to determine whether a specific nucleotide sequence is covered by a patent. While the CAMBIA database includes all sequences present in claims, it does not distinguish between sequences that are specifically claimed from those sequences that are merely mentioned in the claims. Work by Graff et al. estimated that only 8,703 patents on naturally occurring DNA sequences are still in force. Of those, they estimate that only 3,535 (41%) are human, indicating that our previous conclusions may be too broad or could lead to legal conclusions that are based on an ‘incorrect view of the law’. Yet, subsequent analysis with the data from CAMBIA has shown many patents are still in force and leave legal ambiguity.
-
Inflammatory Monocytes Orchestrate Innate Antifungal Immunity in the Lung
PLOS Pathogens
Aspergillus fumigatus is an environmental fungus that causes invasive aspergillosis (IA) in immunocompromised patients. Although C-C chemokine receptor-2 (CCR2)- and Ly6C-expressing inflammatory monocytes (CCR2+Mo) and their derivatives initiate adaptive pulmonary immune responses, their role in coordinating innate immune responses in the lung remains poorly defined. Using conditional and antibody-mediated cell ablation strategies, we found that CCR2+Mo and monocyte-derived dendritic cells (Mo-DCs) are essential for innate defense against inhaled conidia. By harnessing fluorescent Aspergillus reporter (FLARE) conidia that report fungal cell association and viability in vivo, we identify two mechanisms by which CCR2+Mo and Mo-DCs exert innate antifungal activity. First, CCR2+Mo and Mo-DCs condition the lung inflammatory milieu to augment neutrophil conidiacidal activity. Second, conidial uptake by CCR2+Mo temporally coincided with their differentiation into Mo-DCs, a process that resulted in direct conidial killing. Our findings illustrate both indirect and direct functions for CCR2+Mo and their derivatives in innate antifungal immunity in the lung.
-
Genome-wide association study implicates NDST3 in schizophrenia and bipolar disorder
Nature Communications
Schizophrenia and bipolar disorder are major psychiatric disorders with high heritability and overlapping genetic variance. Here we perform a genome-wide association study in an ethnically homogeneous cohort of 904 schizophrenia cases and 1,640 controls drawn from the Ashkenazi Jewish population. We identify a novel genome-wide significant risk locus at chromosome 4q26, demonstrating the potential advantages of this founder population for gene discovery. The top single-nucleotide polymorphism (SNP; rs11098403) demonstrates consistent effects across 11 replication and extension cohorts, totalling 23,191 samples across multiple ethnicities, regardless of diagnosis (schizophrenia or bipolar disorder), resulting in Pmeta=9.49 × 10−12 (odds ratio (OR)=1.13, 95% confidence interval (CI): 1.08–1.17) across both disorders and Pmeta=2.67 × 10−8 (OR=1.15, 95% CI: 1.08–1.21) for schizophrenia alone. In addition, this intergenic SNP significantly predicts postmortem cerebellar gene expression of NDST3, which encodes an enzyme critical to heparan sulphate metabolism. Heparan sulphate binding is critical to neurite outgrowth, axon formation and synaptic processes thought to be aberrant in these disorders.
-
Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics
Science
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
-
Crowdfunding genomics and bioinformatics
Genome Biology
An assessment of the use of crowdfunding for genetics companies, using our startup, Genome Liberty, as an example.
-
Pervasive sequence patents cover the entire human genome
Genome Medicine
The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
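The idea of counting how many other genes share a 15mer with a given gene can be sketched as follows. This is only an illustration under stated assumptions: genes are plain strings in a hypothetical dictionary, only exact forward-strand matches are counted, and the toy sequences are invented; it is not the analysis code behind the paper.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 15) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genes_sharing_kmers(genes: Dict[str, str], k: int = 15) -> Dict[str, Set[str]]:
    """Map each gene to the set of other genes sharing at least one exact k-mer."""
    index: Dict[str, Set[str]] = {}
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    matches: Dict[str, Set[str]] = {name: set() for name in genes}
    for names in index.values():
        for name in names:
            matches[name] |= names - {name}
    return matches


# Invented toy sequences; geneA and geneB share one 15mer, geneC shares none.
shared = "ACGTTGCATGCCGTA"
toy_genes = {
    "geneA": "TTT" + shared + "GGG",
    "geneB": "CCC" + shared + "AAA",
    "geneC": "G" * 30,
}
matches = genes_sharing_kmers(toy_genes)
```

The size of each gene's match set is the kind of per-gene count that the abstract reports as an average of 364 and as at least 689 for BRCA1.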
-
Implication of a rare deletion at distal 16p11.2 in schizophrenia
JAMA Psychiatry
Context: Large genomic copy number variations have been implicated as strong risk factors for schizophrenia. However, the rarity of these events has created challenges for the identification of further pathogenic loci, and extremely large samples are required to provide convincing replication.
Objective: To detect novel copy number variations that increase the susceptibility to schizophrenia by using 2 ethnically homogeneous discovery cohorts and replication in large samples.
Design: Genetic association study of microarray data.
Setting: Samples of DNA were collected at 9 sites from different countries.
Participants: Two discovery cohorts consisted of 790 cases with schizophrenia and schizoaffective disorder and 1347 controls of Ashkenazi Jewish descent and 662 parent-offspring trios from Bulgaria, of which the offspring had schizophrenia or schizoaffective disorder. Replication data sets consisted of 12 398 cases and 17 945 controls.
Main Outcome Measures: Statistically increased rate of specific copy number variations in cases vs controls.
Results: One novel locus was implicated: a deletion at distal 16p11.2, which does not overlap the proximal 16p11.2 locus previously reported in schizophrenia and autism. Deletions at this locus were found in 13 of 13 850 cases (0.094%) and 3 of 19 954 controls (0.015%) (odds ratio, 6.25 [95% CI, 1.78-21.93]; P = .001, Fisher exact test).
Conclusions: Deletions at distal 16p11.2 have been previously implicated in developmental delay and obesity. The region contains 9 genes, several of which are implicated in neurological diseases, regulation of body weight, and glucose homeostasis. A telomeric extension of the deletion, observed in about half the cases but no controls, potentially implicates an additional 8 genes. Our findings add a new locus to the list of copy number variations that increase the risk for development of schizophrenia.
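As a quick check on the Results figures above, the carrier rates and the odds ratio can be recomputed from the reported counts. This is a minimal sketch with a hypothetical helper name; it reproduces only the point estimate, not the confidence interval or the Fisher exact P value.

```python
def odds_ratio(case_hits: int, case_total: int,
               control_hits: int, control_total: int) -> float:
    """Odds ratio from a 2x2 table given carrier counts and group sizes."""
    case_odds = case_hits / (case_total - case_hits)
    control_odds = control_hits / (control_total - control_hits)
    return case_odds / control_odds


# Counts reported in the abstract: 13 of 13 850 cases, 3 of 19 954 controls.
case_rate_pct = 100 * 13 / 13850
control_rate_pct = 100 * 3 / 19954
estimate = odds_ratio(13, 13850, 3, 19954)

print(f"cases: {case_rate_pct:.3f}%  controls: {control_rate_pct:.3f}%  OR: {estimate:.2f}")
```

The recomputed values agree with the reported 0.094%, 0.015%, and odds ratio of 6.25.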
-
A Primer on Phylogenomics
Taylor and Francis
A book for advanced undergraduate students or beginning graduate students introducing them to the basics of phylogenetics along with DNA sequencing, microarrays, systems biology and genomics
-
An integrated map of genetic variation from 1,092 human genomes
Nature
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
-
Limitations of the Human Reference Genome for Personalized Genomics
PLOS One
Data from the 1000 genomes project (1KGP) and Complete Genomics (CG) have dramatically increased the numbers of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the numbers of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.
-
Mycobacterium tuberculosis spoligotypes that may derive from mixed strain infections are revealed by a novel computational approach
Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases
Global control of tuberculosis is increasingly dependent on rapid and accurate genetic typing of Mycobacterium tuberculosis. Spoligotyping is a first-line genotypic fingerprinting method for M. tuberculosis isolates. An international online database (SpolDB4) of spoligotype patterns has been established wherein a clustered pattern (shared by ≥2 isolates) is designated a shared international type (SIT). Dual infections of single patients by distinct strains of M. tuberculosis are increasingly reported in high tuberculosis incidence areas, raising the possibility of false composite spoligotype patterns if performed upon mixed strain samples. A computational approach was applied to SpolDB4 and found that of the reported 1939 SITs, 54% could be a composite of two other SITs. Although many of the spoligotypes listed in SpolDB4 may be the product of admixing, the majority of patterns were reported with a corresponding low case frequency and so the effect of misclassification upon database integrity with these is likely minimal. Phylogenetic analysis of the five SITs most prone to be a composite demonstrated that these patterns designate nodes from which the ramifications of large families T, MANU, LAM, and EAI emerged. We illustrate how geographic context may indicate when an observed pattern could be the product of mixed infection. Importantly, when one of the most composite-prone SITs is obtained, further genetic testing by alternate methods is prudent to rule out mixed infection, especially in high tuberculosis prevalence areas. These findings have broad practical implications for tuberculosis control and surveillance, as well as highlight the utility of a computational approach in providing solutions to biological questions in which the information can be digitalized.
-
E value cutoff and eukaryotic genome content phylogenetics
Molecular Phylogenetics and Evolution
Genome content analysis has been used as a source of phylogenetic information in large prokaryotic tree of life studies. Recently the sequencing of many eukaryotic genomes has allowed for the similar use of genome content analysis for these organisms too. In this communication we examine the utility of genome content analysis for recovering phylogenetic patterns in several eukaryotic groups. By constructing multiple matrices using different e value cutoffs we examine the dynamics of altering the e value cutoff on five eukaryotic genome data sets. Our analysis indicates that the e value cutoff that is used as a criterion in the construction of the genome content matrix is a critical factor in both the accuracy and information content of the analysis. Strikingly, genome content by itself is not a reliable or accurate source of characters for phylogenetic analysis of the taxa in the five data sets we analyzed. We discuss two problems--small genome attraction and genome duplications as being involved in the rather poor performance of genome content data in recovering eukaryotic phylogeny.
-
Random roots and lineage sorting
Molecular Phylogenetics and Evolution
Lineage sorting has been suggested as a major force in generating incongruent phylogenetic signal when multiple gene partitions are examined. The degree of lineage sorting can be estimated using the coalescent process and simulation studies have also pointed to a major role for incomplete lineage sorting as a factor in phylogenetic inference. Some recent empirical studies point to an extreme role for this phenomenon with up to 50-60% of all informative genes showing incongruence as a result of lineage sorting. Here, we examine seven large multi-partition genome level data sets over a large range of taxonomic representation. We took the approach of examining outgroup choice and its impact on tree topology, by swapping outgroups into analyses with successively larger genetic distances to the ingroup. Our results indicate a linear relationship of outgroup distance with incongruence in the data sets we examined suggesting a strong random rooting effect. In addition, we attempted to estimate the degree of lineage sorting in several large genome level data sets by examining triads of very closely related taxa. This exercise resulted in much lower estimates of incongruent genes that could be the result of lineage sorting, with an overall estimate of around 10% of the total number of genes in a genome showing incongruence as a result of true lineage sorting. Finally we examined the behavior of likelihood and parsimony approaches on the random rooting phenomenon. Likelihood tends to stabilize incongruence as outgroups get further and further away from the ingroup. In one extreme case, likelihood overcompensates for sequence divergence but increases random rooting causing long branch repulsion.
-
Implications for health and disease in the genetic signature of the Ashkenazi Jewish population
Genome Biology
BACKGROUND: Relatively small, reproductively isolated populations with reduced genetic diversity may have advantages for genomewide association mapping in disease genetics. The Ashkenazi Jewish population represents a unique population for study based on its recent (< 1,000 year) history of a limited number of founders, population bottlenecks and tradition of marriage within the community. We genotyped more than 1,300 Ashkenazi Jewish healthy volunteers from the Hebrew University Genetic Resource with the Illumina HumanOmni1-Quad platform. Comparison of the genotyping data with that of neighboring European and Asian populations enabled the Ashkenazi Jewish-specific component of the variance to be characterized with respect to disease-relevant alleles and pathways.
RESULTS: Using clustering, principal components, and pairwise genetic distance as converging approaches, we identified an Ashkenazi Jewish-specific genetic signature that differentiated these subjects from both European and Middle Eastern samples. Most notably, gene ontology analysis of the Ashkenazi Jewish genetic signature revealed an enrichment of genes functioning in transepithelial chloride transport, such as CFTR, and in equilibrioception, potentially shedding light on cystic fibrosis, Usher syndrome and other diseases over-represented in the Ashkenazi Jewish population. Results also impact risk profiles for autoimmune and metabolic disorders in this population. Finally, residual intra-Ashkenazi population structure was minimal, primarily determined by class 1 MHC alleles, and not related to host country of origin.
CONCLUSIONS: The Ashkenazi Jewish population is of potential utility in disease-mapping studies due to its relative homogeneity and distinct genomic signature. Results suggest that Ashkenazi-associated disease genes may be components of population-specific genomic differences in key functional pathways.
-
Condensin dysfunction in human cells induces nonrandom chromosomal breaks in anaphase, with distinct patterns for both unique and repeated genomic regions
Chromosoma
Condensin complexes are essential for chromosome condensation and segregation in mitosis, while condensin dysfunction, among other pathways leading to chromosomal bridging in mitosis, may play a role in tumor genomic instability, including recently discovered chromothripsis. To characterize potential double-strand breaks specifically occurring in late anaphase, human chromosomes depleted of condensin were analyzed by γ-H2AX ChIP followed by high-throughput sequencing (ChIP-seq). In condensin-depleted cells, the nonrepeated parts of the genome were shown to contain distinct γ-H2AX enrichment zones, 75% of which overlapped with known hemizygous deletions in cancers. Furthermore, some tandemly repeated DNA sequences, analyzed separately from the rest of the genome, showed significant γ-H2AX enrichment in condensin-depleted anaphases. The most commonly occurring targets of such enrichment included simple repeats, centromeric satellites, and rDNA. The two latter categories indicate that acrocentric human chromosomes are especially susceptible to breaks upon condensin deficiency. The genomic regions that are specifically destabilized upon condensin dysfunction may constitute a condensin-specific chromosome destabilization pattern.
-
The mega‐matrix tree of life: using genome‐scale horizontal gene transfer and sequence evolution data as information about the vertical history of life
Cladistics
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information…
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination.
-
Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing
Nucleic Acids Research
Genomic sequence comparisons between individuals are usually restricted to the analysis of single nucleotide polymorphisms (SNPs). While the interrogation of SNPs is efficient, they are not the only form of divergence between genomes. In this report, we expand the scope of polymorphism detection by investigating the occurrence of double nucleotide polymorphisms (DNPs) and triple nucleotide polymorphisms (TNPs), in which two or three consecutive nucleotides are altered compared to the reference sequence. We have found such DNPs and TNPs throughout two complete genomes and eight exomes. Within exons, these novel polymorphisms are over-represented amongst protein-altering variants; nearly all DNPs and TNPs result in a change in amino acid sequence and, in some cases, two adjacent amino acids are changed. DNPs and TNPs represent a potentially important new source of genetic variation which may underlie human disease and they should be included in future medical genetics studies. As a confirmation of the damaging nature of xNPs, we have identified changes in the exome of a glioblastoma cell line that are important in glioblastoma pathogenesis. We have found a TNP causing a single amino acid change in LAMC2 and a TNP causing a truncation of HUWE1.
-
Investigating repetitively matching short sequencing reads: the enigmatic nature of H3K9me3
Epigenetics
Most histone modifications can easily be characterized as either activating or repressive. For example, histone H3 lysine 4 trimethylation (H3K4me3) is generally considered a distinct sign of actively transcribed promoters, while H3K27me3 is generally found at repressed genes. This is not the case for H3K9me3, the subject of this communication, which is a modification that has traditionally been considered a mark of constitutive heterochromatin, but has also been found at significant levels in expressed genes. We therefore sought to use new high-throughput genome-wide maps of H3K9me3 localization to investigate the conflicting hypotheses concerning the nature of this modification. Before we could accurately analyze the locations of H3K9me3 along the genome, and especially in repetitive locations, we developed a method for accurately utilizing short sequencing reads that do not map uniquely to a location in the genome. Investigating the locations of H3K9me3 along the genome allowed us to determine that, while there are high levels of H3K9me3 outside of genes, this modification is not absent from genes. Therefore, we suggest that H3K9me3 may have a role in chromatin organization rather than being directly related to gene expression. In addition, we have found that there is a need to include repetitively matching reads in any high-throughput sequencing experiment.
-
The use of high-throughput sequencing to investigate histone modifications in the non-coding portions of the human genome
NYU Dissertation
The higher-order structure of eukaryotic chromosomes is complex because nearly six feet of DNA needs to be packaged into the nucleus of a cell. This packaging requires multiple levels of organization from the raw double helix up to a completely folded chromosome. The fundamental level of this organization is the wrapping of DNA around nucleosomes consisting of histone proteins. These histones can be post-translationally modified through the addition of acetyl or methyl groups to individual amino acid residues. I have investigated the enrichment of specific histone lysine methylation states throughout the human genome in hematopoietic cells.
-
Determination of enriched histone modifications in non-genic portions of the human genome
BMC Genomics
Background
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has recently been used to identify the modification patterns for the methylation and acetylation of many different histone tails in genes and enhancers.
Results
We have extended the analysis of histone modifications to gene deserts, pericentromeres and subtelomeres. Using data from human CD4+ T cells, we have found that each of these non-genic regions has a particular profile of histone modifications that distinguish it from the other non-coding regions. Different methylation states of H4K20, H3K9 and H3K27 were found to be enriched in each region relative to the other regions. These findings indicate that non-genic regions of the genome are variable with respect to histone modification patterns, rather than being monolithic. We furthermore used consensus sequences for unassembled centromeres and telomeres to identify the significant histone modifications in these regions. Finally, we compared the modification patterns in non-genic regions to those at silent genes and genes with higher levels of expression. For all tested methylations with the exception of H3K27me3, the enrichment level of each modification state for silent genes is between that of non-genic regions and expressed genes. For H3K27me3, the highest levels are found in silent genes.
Conclusion
In addition to the difference in histone modification patterns between euchromatin and heterochromatin regions, illustrated by the enrichment of H3K9me2/3 in non-genic regions while H3K9me1 is enriched at active genes, the chromatin modifications within non-genic (heterochromatin-like) regions (e.g. subtelomeres, pericentromeres and gene deserts) are also quite different.
-
Using whole genome presence/absence data to untangle function in 12 Drosophila genomes
Fly
The Drosophila 12 genome data set was used to construct whole genome, gene family presence/absence matrices using a broad range of e-value cutoffs as criteria for gene family inclusion. The various matrices generated behave differently in phylogenetic analyses as a function of the e-value employed. Based on an optimality criterion that maximizes internal corroboration of information, we show that values of e-105 to e-125 extract the most internally consistent phylogenetic signal. The functional class of most genes and gene families can be accurately determined based on the D. melanogaster genome annotation. We used the gene ontology (GO) system to create partitions based on gene function. Several measures of phylogenetic congruence (diagnosis, consistency, partitioned support, hidden support) for different higher and lower level GO categories were used to mine the data set for genes and gene families that show strong agreement or disagreement with the overall combined phylogenetic hypothesis. We propose that measures of phylogenetic congruence can be used as criteria to identify loci with related GO terms that have a significant impact on cladogenesis.
-
Description of Freshwater Bacterial Assemblages from the Upper Paraná River Floodpulse System, Brazil
Microbial Ecology
Bacteria were identified from a large, seasonally flooded river (Paraná River, Brazil) and two floodplain habitats that were part of the same river system yet very different in nature: clearwater Garças Lagoon and the highly humic waters of Patos Lagoon. Bacterioplankton were collected during mid-summer (Jan. 2002) from water samples (2 l) filtered first through a 1.2-μm filter then a 0.2-μm membrane filter representing the particle-attached and free-living sub-communities, respectively. DNA was extracted from filters and purified and a 16S rRNA clone library established for each habitat. Over 300 clones were sequenced and checked for similarity to existing 16S sequences in GenBank using the BLAST algorithm with default parameters. Further classification of clones was done using a species “backbone” attachment followed by parsimony analysis. The majority (85%) of sequences, referred to here as operational taxonomic units (OTUs), were most similar to uncultured bacterium 16S sequences. OTUs from each Proteobacteria sub-phylum (α, β, γ, δ, ɛ) were present in the Upper Paraná River system, as well as members of the Bacteroidetes. The microbial assemblage from Patos Lagoon was least like other samples in that it had no Firmicutes present and was dominated by Actinobacteria. Verrucomicrobia OTUs were only found in the free-living assemblage. This study documents the presence of globally distributed phyla in Upper Paraná River and taxa unique to habitat and particle attachment.
-
Combinatorial patterns of histone acetylations and methylations in the human genome
Nature Genetics
Histones are characterized by numerous posttranslational modifications that influence gene transcription [1,2]. However, because of the lack of global distribution data in higher eukaryotic systems [3], the extent to which gene-specific combinatorial patterns of histone modifications exist remains to be determined. Here, we report the patterns derived from the analysis of 39 histone modifications in human CD4+ T cells. Our data indicate that a large number of patterns are associated with promoters and enhancers. In particular, we identify a common modification module consisting of 17 modifications detected at 3,286 promoters. These modifications tend to colocalize in the genome and correlate with each other at an individual nucleosome level. Genes associated with this module tend to have higher expression, and addition of more modifications to this module is associated with further increased expression. Our data suggest that these histone modifications may act cooperatively to prepare chromatin for transcriptional activation.
-
Reciprocal illumination in the gene content tree of life
Systematic Biology
Phylogenies based on gene content rely on statements of primary homology to characterize gene presence or absence. These statements (hypotheses) are usually determined by techniques based on threshold similarity or distance measurements between genes. This fundamental but problematic step can be examined by evaluating each homology hypothesis by the extent to which it is corroborated by the rest of the data. Here we test the effects of varying the stringency for making primary homology statements using a range of similarity (e-value) cutoffs in 166 fully sequenced and annotated genomes spanning the tree of life. By evaluating each resulting data set with tree-based measurements of character consistency and information content, we find a set of homology statements that optimizes overall corroboration. The resulting data set produces well-resolved and well-supported trees of life and greatly ameliorates previously noted inconsistencies such as the misclassification of small genomes. The method presented here, which can be used to test any technique for recognizing primary homology, provides an objective framework for evaluating phylogenetic hypotheses and data sets for the tree of life. It also can serve as a technique for identifying well-corroborated sets of homologous genes for functional genomic applications.
-
ORFcurator: molecular curation of genes and gene clusters in prokaryotic organisms
Bioinformatics
The ability to detect clusters of functionally related genes in multiple microbial genomes has enormous potential for enhancing studies on gene function and microbial evolution. The staggering amount of new genome sequence data presents a largely untapped resource for gene cluster discovery. To date, gene cluster analysis has not been fully automated, and one must rely on manual, tedious and time-consuming manipulation of sequences. To facilitate accurate and rapid identification of conserved gene clusters, we developed a database-driven web application, called ORFcurator. We used ORFcurator to find clusters containing any genes similar to those of the 14-gene Widespread Colonization Island of Actinobacillus actinomycetemcomitans. From 126 genomes, ORFcurator identified all 73 clusters previously determined by manual searching.
Patents
-
Development of a single sperm sequencing assay for prediction of autism risk and as a substitute for manual sperm analysis
US S2017-067
This patent describes a method for using single-cell sequencing to test the RNA in sperm. We hope to refine the technique into an assay that can be used to screen sperm to prevent genetic diseases in offspring.
-
Single sperm gene expression and mutation analysis for prediction of diseases
US20200010896A1
The present disclosure is directed to methods for typing and characterizing sperm. The technology employs a sequencing-based method for detecting and measuring RNA transcripts from single sperm cells and the analysis of the sequencing data for the prediction of male parent contribution to autism.
Courses
-
Next-generation sequencing Workshop
-
Languages
-
Hebrew
-