
Introduction to ggplot2 for Biological Datasets








Yusra Karim


ggplot2 is a data visualization package for R, created by Hadley Wickham and based on the ‘Grammar of Graphics’. Although R provides built-in plotting functions, ggplot2 lets you build plots and charts declaratively: you describe how the variables in a dataset map to visual properties, and the package draws the result. Among the data visualization packages available for biological data analysis, ggplot2 is one of the most frequently used, because it offers a wide range of plot types and a consistent syntax that is easy to learn. It can produce nearly any kind of static data visualization, customized to your exact specifications, and it can efficiently visualize large and complex datasets.


  • The ggplot2 package page on CRAN can be accessed from here.

  • On the CRAN page, you can see details about the package, such as its version, license, citation information and reference manual. Vignettes are also available, with worked examples that help you understand how to use ggplot2.

  • If you are working offline, you can download the Windows binaries from the same page and install the package from the downloaded file.
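
Once ggplot2 is installed, much of the same information can also be read from within R. The following is a minimal sketch using base R utilities (packageDescription(), citation() and vignette()); it assumes the package is already installed, and the variable names are only illustrative.

```r
# Assumes ggplot2 is already installed in the current R library.
pkg_info <- packageDescription("ggplot2")

pkg_version <- pkg_info$Version   # installed version, as a string
pkg_license <- pkg_info$License   # license field from the DESCRIPTION file

print(pkg_version)
print(pkg_license)

# How to cite the package in a publication
pkg_citation <- citation("ggplot2")
print(pkg_citation)

# List the vignettes shipped with the installed package
pkg_vignettes <- vignette(package = "ggplot2")
print(pkg_vignettes$results[, c("Item", "Title"), drop = FALSE])
```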


  • There are two ways to install ggplot2 in RStudio.

Using Command line:

  • You can install ggplot2 directly using the following command:


[This starts installing ggplot2.]
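
The command itself is missing from this page. The standard way to install a CRAN package from the R console is install.packages(), so the step above was presumably along these lines. This is a sketch, not necessarily the exact command used in the video; the check for an existing installation and the explicit mirror URL are additions made here for convenience.

```r
# Install ggplot2 from CRAN only if it is not installed yet.
# The requireNamespace() guard and the explicit mirror are assumptions
# added here; plain install.packages("ggplot2") also works interactively.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
```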

Using Package tab:

  • Another way to install ggplot2 is through the ‘Packages’ tab, which lists the packages that are already installed.

  • Click ‘Install’ to open the installation dialog, make sure ‘Repository (CRAN)’ is selected, and type ‘ggplot2’ in the packages field. Confirming the dialog starts the installation.

Initializing Library:

  • Initializing (loading) ggplot2 is important: until the library is loaded, its functions are not available in your R session.

  • There are two ways to initialize ggplot2.

  • You can tick the checkbox next to ‘ggplot2’ in the ‘Packages’ tab.

  • Alternatively, you can simply run the following command:


[This initializes the library so that you can use its functions to visualize datasets.]

  • The latter method is more efficient when the package list contains many libraries.
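
The loading command is also missing from this page. Packages are normally attached with library(), so it was presumably the call below. The final check is an addition that simply confirms the package is attached; it is not part of the lecture.

```r
# Attach ggplot2 so its functions (ggplot(), aes(), geom_*()) are available.
library(ggplot2)

# Optional check (not part of the original lecture):
# TRUE if ggplot2 is on the search path of the current session.
is_attached <- "package:ggplot2" %in% search()
print(is_attached)
```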

  • We’ve used the iris plant dataset, which ships with R, to see how ggplot2 visualizes a dataset. Run the following command:

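library(ggplot2)  # load the package first

# Jittered points: sepal width for each species, coloured by species.
# Column names in iris are case-sensitive: Sepal.Width and Species.
ggplot(data = iris) +
  geom_jitter(aes(x = Sepal.Width, y = Species, color = Species),
              width = 0.05, height = 0.1, alpha = 0.5)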

[The above code will be discussed thoroughly in the next section of this video. Here we use it only to see how ggplot2 draws a visualization of the dataset.]

  • You’ll see a visualization of the plant species, with each species coloured differently.
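
To connect the plot above with the ‘Grammar of Graphics’ idea from the introduction, the same figure can be assembled in separate steps: the data and aesthetic mappings first, then the geometry layer, then the labels. This is only an illustrative sketch; the object names and the label text are assumptions and do not come from the lecture.

```r
library(ggplot2)

# Step 1: data plus aesthetic mappings (no layer yet, so no points are drawn)
base_plot <- ggplot(data = iris,
                    aes(x = Sepal.Width, y = Species, color = Species))

# Step 2: add a jittered point layer, as in the command above
jitter_plot <- base_plot +
  geom_jitter(width = 0.05, height = 0.1, alpha = 0.5)

# Step 3: add human-readable labels (label text is illustrative)
final_plot <- jitter_plot +
  labs(x = "Sepal width (cm)", y = "Species", color = "Species",
       title = "Sepal width by iris species")

# Printing the object renders the plot in the active graphics device
print(final_plot)
```

Each step returns a plot object, so intermediate versions can be inspected or reused with a different layer.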


In this introductory video, we learned about the ggplot2 package and went through its installation and initialization. We also saw how ggplot2 can be used to visualize a particular dataset.
