Dr. Markus Schmidberger

County Kerry, Ireland
7K followers 500+ connections

About

Wondering how to unlock your organization's full potential? I help data leaders age 35-50…

Articles by Dr. Markus

  • ServerLess BigData in Action

    At Glomex GmbH I built a big data platform on AWS to process billions of events per day. The architecture is based…

  • 2016 was a crazy year for creating a new Glomex Data Platform

    In 2016 I joined an amazing team at Glomex GmbH to create the best data platform ever. In April 2016 at AWS Summit…

Experience

  • Turtle Transformation Limited
  • Ireland
  • Glenbeigh, Co. Kerry, Ireland
  • Europe
  • Munich Area, Germany
  • Germany
  • Munich Area, Germany
  • Munich Area, Germany
  • Unterföhring, Munich
  • Munich Area, Germany
  • Munich Area, Germany
  • Germany
  • Berlin Area, Germany
  • Munich Area, Germany
  • Munich Area, Germany
  • Greater Seattle Area

Education

  • Homodea - Veit Lindau

    In one year, learn everything you need to take your life to a whole new level and be successful as an integral Life Trust Coach TM. Even if your goal is still fuzzy, in a year you will know exactly how to serve the world as a coach.

  • This PhD thesis demonstrates the usefulness of the R language and parallel computing for biological research.

Publications

  • Beziehungsmagie für Männer: 100 Impulse für mehr Verbindung und Liebe

    Self Publishing

    Book is written in German!

    In today's hectic world, strong and fulfilling relationships are crucially important. This workbook is your ticket to a fulfilling relationship with yourself and to a partnership built on love, connection, and magic.

    A book for men looking for ways to deepen and strengthen their relationship with themselves and their partner. It offers field-tested impulses and steps for building a lasting connection. From communication and intimacy to conflict resolution, you will discover how to shape a relationship that does not merely survive but flourishes.

    In "Beziehungsmagie für Männer" you will find:
    * One personal-development impulse and one relationship impulse for each of 50 weeks. Change begins with yourself, and you are invited to start a journey toward more clarity and authenticity in your relationships.
    * Proven techniques for better communication: learn to communicate more effectively, handle conflicts, and minimize misunderstandings.
    * Practical exercises and guides: step-by-step instructions and exercises that help you put what you have learned into practice and bring about change in your relationships.
    * Prompts for inner processes to build a deeper bond: discover how to build a strong emotional connection with yourself and your partner and strengthen your relationship for the long term.

    "Beziehungsmagie für Männer" is a book that can transform not only your life but also your partnership. Discover how to restore and deepen the love, connection, and magic in your relationship.

    Take the first step toward a fulfilling partnership - today.

  • An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia

    Leukemia

    The clinical course of chronic lymphocytic leukemia (CLL) is highly variable, ranging from slow progression and survival for several decades to rapidly progressive and chemotherapy-resistant disease with death within 1 year of diagnosis. The hierarchical model of common genomic aberrations determined by interphase fluorescence in situ hybridization (FISH) and the analysis of the mutational status of the immunoglobulin heavy-chain variable region genes (IGVH status) are broadly used molecular markers to predict the prognosis of CLL patients.

  • Conceptual aspects of large meta-analyses with publicly available microarray data: A case study in oncology

    Bioinformatics and Biology Insights

    Abstract: Large public repositories of microarray experiments offer an abundance of biological data. It is of interest to use and to combine the available material to create new biological information and to develop a broader view on biological phenomena.

    Meta-analyses recombine similar information over a series of experiments to sketch scientific aspects which were not accessible by each of the single experiments. Meta-analysis of high-throughput experiments has to handle methodological as well as technical challenges. Methodological aspects concern the identification of homogeneous material which can be combined by appropriate statistical procedures. Technical challenges come from the data management of large amounts of high-dimensional data, long computation time, as well as the handling of the stored phenotype data.

    This paper compares in a meta-analysis of a large series of microarray experiments the interaction structure within selected pathways between different tumour entities. The feasibility of such a study is explored and a technical as well as a statistical framework for its completion is presented. Multiple obstacles were met during completion of this project. They are mainly related to the quality of the available data and influence the biological interpretation of the results derived.

    The sobering experience of our study asks for combined efforts to improve the data quality in public repositories of high-throughput data. The exploration of the available data in large meta-analyses is limited by incomplete documentation of essential aspects of experiments and studies, by technical deficiencies in the data stored, and by careless duplications of data.

    Other authors
    • Ulrich Mansmann
    • Sabine Lennert
  • Hands-on Tutorial for Parallel Computing with R

    Springer - Computational Statistics

    Due to the increasing availability of powerful hardware resources, parallel computing is becoming an important issue, as a noticeable speedup may be achieved. The statistical programming language R allows for parallel computing on computer clusters as well as multicore systems through several packages. This tutorial gives a short, practical overview of four, in view of the authors, important packages for parallel computing in R, namely multicore, snow, snowfall and nws. First, the general principle of parallelizing simple tasks is briefly illustrated based on a statistical cross-validation example. Afterwards, the usage of each of the introduced packages is being demonstrated on the example. Furthermore, we address some specific features of the packages and provide guidance for selecting an adequate package for the computing environment at hand.

  • Indirect Comparison of Interaction Graphs

    Book Chapter: Statistical Modelling and Regression Structures -- Festschrift in Honour of Ludwig Fahrmeir; Kneib, Thomas; Tutz, Gerhard (Eds.)

    A strategy for testing differential conditional independence structures (CIS) between two graphs is introduced. The graphs have the same set of nodes and are estimated from data sampled under two different conditions. The test uses the entire pathplot in a Lasso regression as the information on how a node connects with the remaining nodes in the graph.
    The interpretation of the paths as random processes allows defining stopping times which make the statistical properties of the test statistic accessible to analytic reasoning. A resampling approach is proposed to calculate p-values simultaneously for a hierarchical testing procedure. The hierarchical testing steps through a given hierarchy of clusters. First, collective effects are measured at the coarsest level possible (the global null hypothesis that no node in the graph shows a differential CIS). If the global null hypothesis can be rejected, finer resolution levels are tested for an effect until the level of individual nodes is reached.
    The strategy is applied to association patterns of categories from the ICF in patients under post-acute rehabilitation. The patients are characterized by two different conditions. A comprehensive understanding of differences in the conditional independence structures between the patient groups is pivotal for evidence-based intervention design on the policy, the service and the clinical level related to their treatment. Due to extensive computation, parallel computing offers an effective approach to implement our explorative tool and to locate nodes in a graph which show differential CIS between two conditions.

    Other authors
    • Ulrich Mansmann
    • Ralf Strobl
    • Vindi Jurinovic
  • Simulation Study for the Agreement between Statistical Methods in Quality Assessment and Control of Microarray Data

    Springer - Computational Statistics

    As microarray data quality can affect each step of the microarray analysis process, quality assessment and control is an integral part. It detects divergent measurements beyond the acceptable level of random fluctuations. This empirical study identifies association and correlation between the six quality assessment methods for microarray outlier detection used in the arrayQualityMetrics package version 2.2.2. For evaluation two different agreement tests—Cohen’s Kappa, after a homogeneity marginal criteria, and AC1 Statistic—, the Pearson Correlation Coefficient and realistic microarray data from the public ArrayExpress database have been used. It is possible to assess the quality of a data set using only four of the six currently proposed statistical methods to comprehensively quantify the quality information in large series of microarrays. This saves computation time and reduces decision complexity for the analyst. The new proposed rule is validated with data sets from biomedical studies.

  • affyPara - a Bioconductor Package for Parallelized Preprocessing Algorithms of Affymetrix Microarray Data

    Bioinformatics and Biology Insights

    Microarray data repositories as well as large clinical applications of gene expression allow several hundred microarrays to be analysed at one time. The preprocessing of large numbers of microarrays is still a challenge. The algorithms are limited by the available computer hardware. For example, building classification or prognostic rules from large microarray sets will be very time-consuming. Here, preprocessing has to be a part of the cross-validation and resampling strategy which is necessary to estimate the rule’s prediction quality honestly. This paper proposes the new Bioconductor package affyPara for parallelized preprocessing of Affymetrix microarray data. Partition of data can be applied on arrays and parallelization of algorithms is a straightforward consequence. The partition of data and distribution to several nodes solves the main memory problems and accelerates preprocessing by up to a factor of 20 for 200 or more arrays. affyPara is a free and open source package, under GPL license, available from the Bioconductor project at www.bioconductor.org. A user guide and examples are provided with the package.

  • Parallel Computing with the R Language in a Supercomputing Environment

    Book Chapter: High Performance Computing in Science and Engineering, Garching 2009, Springer

    R is an open-source programming language and software environment for statistical computing and graphics. During the last decade a great deal of research has been conducted on parallel computing techniques with the R language. Two packages (snow and Rmpi) stand out as particularly useful for general use on computer clusters, and the multicore package for use on multi-core machines.
    This article describes the operation of the R language at the supercomputer HLRB2 hosted at the Leibniz-Rechenzentrum in Munich, Germany. Additionally, a small benchmark is provided, and the article explains and discusses two parallel biostatistical applications calculated at the HLRB2. The indirect comparison of interaction graphs example outlines the requirements for more than 10,000 processors.

    Other authors
    • Ulrich Mansmann
  • State of the Art in Parallel Computing with R

    Journal of Statistical Software

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing.

    This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance.

    Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.

  • Parallelized preprocessing algorithms for high-density oligonucleotide array data

    22nd International Parallel and Distributed Processing Symposium (IPDPS 2008)

    Studies of gene expression using high-density oligonucleotide microarrays have become standard in a variety of biological contexts. The data recorded using the microarray technique are characterized by high levels of noise and bias. These failures have to be removed, therefore preprocessing of raw data has been a research topic of high priority over the past few years. Actual research and computations are limited by the available computer hardware. Furthermore, most of the existing preprocessing methods are very time-consuming. To solve these problems, the potential of parallel computing should be used. For parallelization on multicomputers, the communication protocol MPI (message passing interface) and the R language will be used. This paper proposes the new R language package affyPara for parallelized preprocessing of high-density oligonucleotide microarray data. Data can be partitioned by array, so parallelizing the algorithms becomes intuitive. The partition of data and distribution to several nodes solves the main memory problems and accelerates the methods by up to a factor of ten.

    Other authors
    • Ulrich Mansmann

Honors & Awards

  • BARC Best Practice Award für Business Intelligence und Analytics 2018

    BARC

    * Scout24 wins in the category "medium-sized businesses" with a comprehensive transformation approach to data-driven work
    * Expert jury recognizes approach that encompasses change in technology, organization and corporate culture
    * Markus Schmidberger, Head of Data Technology at Scout24: "Data-controlled work is one of the central corporate values of Scout24. Thanks to our new data organisation, we are enabling more and more employees to evaluate data independently. This also benefits our users, customers and partners".

    https://www.scout24.com/en/Press-Media/News-Archive/Scout24-wins-BARC-Best-Practice-Award.aspx

    https://barc.de/news/wmf-und-scout24-gewinnen-den-barc-best-practice-award-fur-business-intelligence-und-analytics-2018

  • Gartner Data & Analytics Excellence Award for Best Data Management and Infrastructure

    Gartner Inc.

    Gartner, Inc. has announced the winners of the Gartner Data & Analytics Excellence Awards. The awards recognize excellence in data and analytics technology to drive best-in-class initiatives.

    Six winners and 12 finalists were chosen from a pool of 152 submissions across all six categories. While the criteria were different for each category, all submissions were assessed by a team of Gartner analysts, and honorees were selected by benchmarking against world-class performance standards. Gartner looked for submissions with a strong organizational and leadership component, effective use of modern technologies, and most of all, clear business outcomes.

    glomex —The global media exchange based in Germany, glomex built a scalable data management infrastructure through big data investments enabling it to meet customer demand.

    http://www.gartner.com/newsroom/id/3591318

Recommendations received

17 people have recommended Dr. Markus
