Advanced Data Visualization using R. Edition (I)
Data Visualization has become one of the most important components of almost every industry. Research, academia, commerce, data platforms, Machine Learning services and business services all rely heavily on data storytelling: using Data Visualization to make decisions, to interpret data analyses, or simply to improve the experience of anyone using a platform. R is ideal for all of this thanks to its thousands of packages, especially when used in RStudio, which makes plots faster to build and export.
In this edition I will show why R is one of the best programming languages for Data Visualization (Data Viz), which has now become a whole specialization within Data Science.
The visualization above was created using the 'pheatmap' package. The plot combines hierarchical clustering with a heatmap of the given data.
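The figure itself was made with 'pheatmap'. The sketch below is a dependency-free stand-in that uses base R's `stats::heatmap()` instead. That function also clusters rows and columns hierarchically (via `dist()` and `hclust()`) and draws the dendrograms beside the heatmap. The built-in `mtcars` data, Euclidean distance, complete linkage, the colour palette and the PDF file name are all assumptions chosen for illustration. They are not the settings behind the figure above.

```r
# Hierarchical clustering + heatmap in base R (a stand-in for 'pheatmap').
# Assumptions: built-in mtcars data, Euclidean distance, complete linkage.
mat <- scale(as.matrix(mtcars))   # standardise each column so variables are comparable

# Cluster rows (cars) and columns (variables) explicitly so the trees can be reused
row_tree <- hclust(dist(mat), method = "complete")
col_tree <- hclust(dist(t(mat)), method = "complete")

out_file <- file.path(tempdir(), "mtcars_heatmap.pdf")
pdf(out_file, width = 7, height = 8)          # vector output, see the section on resolution
hm <- heatmap(mat,
              Rowv = as.dendrogram(row_tree), # use our trees as-is, no reordering
              Colv = as.dendrogram(col_tree),
              scale = "none",                 # data were already scaled above
              keep.dendro = TRUE,             # return the dendrograms with the result
              col = hcl.colors(50, "Blue-Red 3"),
              margins = c(6, 10))
invisible(dev.off())

# First few cars in the order they appear in the clustered heatmap
head(rownames(mat)[hm$rowInd])
```

Passing the dendrograms explicitly makes the clustering step visible. If you leave `Rowv`/`Colv` at their defaults, `heatmap()` computes and reorders them itself.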
Spectrum~
R has thousands of Data Visualization packages, and even base R has very advanced plotting. Being able to plot almost anything in Data Science or research was one of the main reasons I chose R years ago. From bioinformatics to Bayesian statistics, meta-analysis, Machine Learning, GIS maps, interactive plots, apps, the web and many other areas, almost anything can be plotted in R.
Esthetics and Resolution ~
In Data Visualization, esthetics are key. A good Data Viz practitioner masters the technical aspects of graphic design. In this area, R is probably one of the best programming languages, because it enables fully customizable plots with complete technical control. Did you know that most plots in R can have virtually limitless resolution? That's right. When saved in a vector format, R plots are not plain images made of pixels. They are vector graphics: shapes, lines and text described mathematically and computed when rendered, so they can be scaled to any resolution.
Why is this important? When plots are made by an experienced R user, there is no blurry image problem, not even when the plots are zoomed, scaled or otherwise modified. The combination of full customization and virtually limitless resolution is a powerful one. Tip: how do we get limitless resolution? By exporting the plots from R as SVG or PDF files. Vector plots in these formats can be generated in most situations and stay sharp at any size. This visualization was created using the 'ggplot2' and 'ggridges' packages.
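The tip above can be sketched with base R's `pdf()` and `svg()` devices. Both write vector files whose size is given in inches rather than pixels, so no dpi setting is needed. The density plot of the built-in `faithful` data and the file names are placeholders. `svg()` only works in R builds with cairo support, which is why the code checks `capabilities("cairo")` first. With ggplot2, passing a file name ending in `.pdf` to `ggsave()` gives a similar vector result.

```r
# Export one plot to vector formats (PDF always, SVG when cairo is available).
draw_plot <- function() {
  d <- density(faithful$eruptions)          # built-in dataset, used only as an example
  plot(d, main = "Old Faithful eruption durations",
       xlab = "Duration (minutes)", lwd = 2)
  polygon(d, col = adjustcolor("steelblue", alpha.f = 0.4), border = NA)
}

pdf_file <- file.path(tempdir(), "eruptions.pdf")
pdf(pdf_file, width = 7, height = 5)        # width/height in inches; no pixel grid
draw_plot()
invisible(dev.off())

svg_file <- file.path(tempdir(), "eruptions.svg")
if (capabilities("cairo")) {                # svg() needs cairo support in this R build
  svg(svg_file, width = 7, height = 5)
  draw_plot()
  invisible(dev.off())
}
```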
Documentation~
It is important to know the methods behind a plot, and documentation is key here. R plotting is so well documented that, in many packages, most of the documentation covers plots and the methods applicable to them. Packages often reserve tens or even hundreds of pages just for plots. Plots and methods used in R often need to comply with regulations and validation requirements, and in my opinion such extensive documentation puts R at the top of Data Viz.
Data Viz in specific areas ~
R has a very wide range of plot options for specific methods such as Bayesian statistics, Markov chain Monte Carlo (MCMC) methods, ARIMA models, hidden Markov models, differential gene expression, transcriptomics, genomics, meta-analysis, clustering and many more. R even supports 3D graphics based on statistical methods. The example below shows the posterior distributions of a hierarchical Bayesian model (created using the 'brms' package).
Journal Standards~
When publishing research, complying with journal standards becomes essential. With so many Data Viz packages in R, it is quite easy to find one that meets a given journal's requirements. One example I mentioned in previous articles is running a meta-analysis and creating forest plots that follow the standards of different journals while also being esthetically advanced. The same approach works for most other journals.
This is an example of a meta-analysis plot made with the 'meta' package. It uses the RevMan 5 layout from Cochrane, the community behind systematic reviews and meta-analyses.
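For readers who want to try this, the sketch below shows the general shape of a 'meta' workflow. It pools binary outcomes with `metabin()` and then draws the forest plot with `forest()` using `layout = "RevMan5"`. The three studies and their event counts are made up purely for illustration. The risk ratio (`sm = "RR"`) is also an assumed choice. Check the documentation of your installed 'meta' version, because defaults such as the random-effects settings have changed between releases.

```r
library(meta)

# Hypothetical 2x2 data for three made-up studies (illustration only, not real results)
dat <- data.frame(
  study = c("Study A", "Study B", "Study C"),
  ev.e  = c(12, 8, 20),  n.e = c(100, 80, 150),   # events / total, experimental arm
  ev.c  = c(18, 10, 30), n.c = c(100, 85, 150)    # events / total, control arm
)

# Pool the studies on the risk-ratio scale
m <- metabin(event.e = ev.e, n.e = n.e, event.c = ev.c, n.c = n.c,
             studlab = study, data = dat, sm = "RR")

# Forest plot in the RevMan 5 layout, written to a vector PDF
pdf(file.path(tempdir(), "forest_revman5.pdf"), width = 10, height = 5)
forest(m, layout = "RevMan5")
invisible(dev.off())
```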
File Formats~
R can export plots in virtually any relevant image or vector graphics format. SVG, PDF, PNG, TIFF, Metafile, EPS, BMP, JPEG, PPT and other relevant formats are all easy to export from R. In RStudio, most of these are available from the Export menu of the Plots pane. There are also packages and functions that produce each of these formats in different workflows. I would like to emphasize the importance of these formats for plotting in R, with special focus on SVG, PDF and PPT files, which can all hold vector graphics and therefore high-resolution images.
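The loop below is a minimal sketch of exporting one plot to several of these formats with base R graphics devices. Vector devices (`pdf()`, `postscript()` for EPS, `svg()`) take a size in inches. Raster devices (`png()`, `tiff()`, `jpeg()`) also need a resolution (`res`). The histogram, file names, 7 x 5 inch size and 300 dpi are assumptions for illustration. Devices that the current R build lacks are skipped. BMP (`bmp()`) and, on Windows only, metafiles (`win.metafile()`) follow the same pattern. PowerPoint export needs add-on packages and is not shown.

```r
# Export the same plot to several formats using base R graphics devices.
out_dir <- tempdir()

draw_plot <- function() {
  hist(faithful$waiting, breaks = 20, col = "grey80", border = "white",
       main = "Waiting time between eruptions", xlab = "Minutes")
}

# One opener per format: vector formats take inches, raster formats also need a dpi (res)
openers <- list(
  pdf  = function(f) pdf(f, width = 7, height = 5),
  eps  = function(f) postscript(f, width = 7, height = 5, horizontal = FALSE,
                                onefile = FALSE, paper = "special"),
  svg  = function(f) svg(f, width = 7, height = 5),
  png  = function(f) png(f, width = 7, height = 5, units = "in", res = 300),
  tiff = function(f) tiff(f, width = 7, height = 5, units = "in", res = 300),
  jpeg = function(f) jpeg(f, width = 7, height = 5, units = "in", res = 300, quality = 95)
)

# Capability each device depends on (NA = always available)
needs <- c(pdf = NA, eps = NA, svg = "cairo", png = "png", tiff = "tiff", jpeg = "jpeg")

written <- character(0)
for (fmt in names(openers)) {
  cap <- needs[[fmt]]
  if (!is.na(cap) && !capabilities(cap)) next   # skip devices this R build lacks
  f <- file.path(out_dir, paste0("waiting_hist.", fmt))
  openers[[fmt]](f)
  draw_plot()
  invisible(dev.off())
  written <- c(written, f)
}
print(basename(written))
```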
Interactive plots and model deployment~
The development of the 'Shiny' package has probably had the largest impact on Data Visualization. Data Viz is at the core of this package, which is used to deploy Machine Learning models as apps. The main interface is often made of advanced, esthetically polished visualizations: users change the input parameters and observe the changes in the visualization in real time. In addition to Shiny, there are hundreds of other interactive plotting packages for R, and I would recommend them to any Data Viz expert.
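Here is a minimal sketch of the reactive pattern described above: a slider input drives a plot, and Shiny re-renders the plot whenever the slider moves. The histogram of the built-in `faithful` data and the input/output IDs (`bins`, `hist`) are illustrative assumptions. The app only launches in an interactive session, so the script can also be sourced non-interactively.

```r
library(shiny)

# UI: one slider input and one plot output
ui <- fluidPage(
  titlePanel("Interactive histogram"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("hist"))
  )
)

# Server: renderPlot() re-runs automatically whenever input$bins changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = breaks, col = "steelblue", border = "white",
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)   # opens the app in a browser or the RStudio viewer
```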
Quality and accuracy~
Last but not least in this edition, advanced plotting in R helps ensure the quality and accuracy of what is being plotted. The large number of packages and implemented methods, together with the detailed documentation of many R packages, also allows their plotting functions to be validated. Most of these advanced, publication-ready plots have been used millions of times and validated in the process. This validation through real-world use is one of the best ways to ensure their quality and accuracy. There are many R communities that specialize in validating R visualizations.
By Darko Medin, a Data Scientist, R programmer and a Statistician
References:
R : https://www.r-project.org/
RStudio : https://www.rstudio.com/
Package 'ggplot2' : https://cran.r-project.org/web/packages/ggplot2/index.html
Package 'ggridges' : https://www.rdocumentation.org/packages/ggridges/versions/0.5.3
Package 'pheatmap' : https://cran.r-project.org/web/packages/pheatmap/index.html
Package 'brms' : https://cran.r-project.org/web/packages/brms/index.html
Package 'meta' : https://cran.r-project.org/web/packages/meta/index.html
Cochrane community : https://community.cochrane.org/
In this edition, the focus was on key aspects of Advanced Data Visualization in R. In the next editions I will be writing R programming tutorials with specific coding examples and projects.