Descriptive Statistics with Basic Data Visualization in R programming

Anushka Agrawal
4 min read · May 14, 2021

The R programming language is an extremely versatile and user-friendly tool that is commonly used for statistical analysis. Let's go over some of the most basic yet frequently used functions in R.

Functions covered:

  • Basic Statistical Functions
  • Other important functions in different libraries
  • Important Plots in R

Basic Statistical Functions

You must've come across many of the concepts used in Descriptive Statistics. Descriptive statistics help us understand the key features that describe a given dataset. This vast topic includes concepts like the Mean of the sample, Median, Mode, Skewness, Kurtosis and Moments (raw and central). Let's look at how these statistics are computed with simple functions in R:

Assuming “data1” is the dataset that contains two variables, marks in mathematics and marks in statistics, we can establish the following commands.

Assigning the variables in R:

> maths <- data1$Mathematics #where "Mathematics" is the name of the column containing the marks in Mathematics
> stats <- data1$Statistics #where "Statistics" is the name of the column containing the marks in Statistics

Note: We can use the "$" sign to call one specific variable from a dataframe. So, we can obtain a specific column of the entire data using the syntax
> datasetname$columnname

Also, we can use the “#” sign to give comments in R.
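
To try the commands in this section without a data file of your own, a small stand-in for data1 can be built by hand. This is only a sketch: the marks below are made up for illustration, and only the column names Mathematics and Statistics follow the text above.

```r
# Hypothetical marks for six students (invented values, for illustration only)
data1 <- data.frame(
  Mathematics = c(72, 85, 64, 90, 85, 58),
  Statistics  = c(68, 74, 74, 88, 91, 60)
)

# Extract single columns with the $ operator
maths <- data1$Mathematics
stats <- data1$Statistics

str(data1)     # structure of the data frame: rows, columns and column types
length(maths)  # number of marks in the Mathematics column
```

With these vectors in place, every command in this section can be run as written.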

Mean:
> mean(maths) #finding the average marks in Mathematics

Variance and Standard Deviation:
> var(stats) #finding the sample variance of the marks in Statistics
> sd(stats) #finding the standard deviation (the square root of the variance) of the marks in Statistics
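
It can help to see what var() and sd() compute. The sketch below, using made-up marks, rebuilds the sample variance by hand with the n - 1 denominator and checks it against the built-in functions.

```r
stats <- c(68, 74, 74, 88, 91, 60)  # hypothetical marks, for illustration only

n <- length(stats)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((stats - mean(stats))^2) / (n - 1)

all.equal(manual_var, var(stats))        # TRUE
all.equal(sqrt(var(stats)), sd(stats))   # TRUE
```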

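Mode:
Base R has no built-in function for the statistical mode. The function mode() returns the storage type of an object (for example, "numeric"), not the most frequent value. One way to get the most frequent value is to tabulate the data:
> names(which.max(table(stats))) #finding the most frequent value (the Mode) of the marks in Statistics, returned as text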
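
Since base R's mode() reports the storage type of an object rather than the most frequent value, a small helper is one way to get the statistical mode. The function name stat_mode is hypothetical (it is not part of R), and the sketch assumes numeric data.

```r
# Hypothetical helper (not part of base R): returns every value that is
# tied for the highest frequency, assuming x is numeric
stat_mode <- function(x) {
  freq <- table(x)                            # count each distinct value
  as.numeric(names(freq)[freq == max(freq)])  # keep the most frequent ones
}

stats <- c(68, 74, 74, 88, 91, 60)  # hypothetical marks, for illustration only

stat_mode(stats)  # 74
mode(stats)       # "numeric"
```

Returning all tied values avoids silently picking one when the data has more than one mode.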

Median:
> median(maths) #finding the Median of the marks in Mathematics

Interquartile Range:
> IQR(maths) #gives the interquartile range of the marks in Mathematics
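
The interquartile range is the distance between the third and first quartiles. As a quick sketch with made-up marks, the same number can be obtained from quantile(), and summary() prints the quartiles together with the mean, minimum and maximum.

```r
maths <- c(72, 85, 64, 90, 85, 58)  # hypothetical marks, for illustration only

# First and third quartiles
q <- quantile(maths, probs = c(0.25, 0.75))

manual_iqr <- unname(q[2] - q[1])  # Q3 - Q1
manual_iqr
IQR(maths)                         # same value as manual_iqr

summary(maths)  # minimum, quartiles, median, mean and maximum in one call
```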


Other Statistics Functions in R Libraries

Skewness: Base R has no function for skewness, but one is available in the e1071 package (install it once with install.packages("e1071")).
> library(e1071)
> skewness(stats) #gives the skewness of the marks in Statistics

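Kurtosis: This is also provided by the e1071 package. Its kurtosis() function reports excess kurtosis, i.e., kurtosis minus 3, so data from a normal distribution scores close to 0.
> library(e1071)
> kurtosis(maths) #gives the (excess) Kurtosis of the marks in Mathematics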

Moments: These are quantities that give us important and relevant information about the distribution of the data, such as its location, spread and shape. There are two types of moments: raw moments and central moments. Central moments are measured around the mean of the sample, whereas raw moments are measured around 0. In the e1071 package the function is called moment().
> library(e1071)
> moment(maths, order=3) #this gives the 3rd raw moment for the marks in Mathematics
> moment(stats, order=4, center=TRUE) #this gives the 4th moment around the mean, i.e., the 4th central moment
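
To make the difference between raw and central moments concrete, here is a minimal base-R sketch that needs no extra package. The helper name sample_moment and the marks are hypothetical. The skewness and kurtosis at the end are the plain moment-based definitions; packages may apply small-sample adjustments by default, so their results can differ slightly.

```r
maths <- c(72, 85, 64, 90, 85, 58)  # hypothetical marks, for illustration only

# Hypothetical helper: r-th moment about 0 (raw) or about the mean (central)
sample_moment <- function(x, r, central = FALSE) {
  if (central) x <- x - mean(x)  # shift the data so the mean becomes 0
  mean(x^r)                      # average of the r-th powers
}

sample_moment(maths, 1)                   # 1st raw moment, equal to the mean
sample_moment(maths, 1, central = TRUE)   # 1st central moment, 0 up to rounding

m2 <- sample_moment(maths, 2, central = TRUE)  # variance with an n denominator
m3 <- sample_moment(maths, 3, central = TRUE)
m4 <- sample_moment(maths, 4, central = TRUE)

skew        <- m3 / m2^1.5    # moment-based skewness
excess_kurt <- m4 / m2^2 - 3  # moment-based excess kurtosis
```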

Basic Plots in R

There are numerous plots and chart types we can obtain with R programming. As we move on to more advanced functions and packages, the plots can become more sophisticated and beautiful.

As important as it is to be aware of the plots in R, it is important to understand their relevance to the data we want to represent.

Let's go over the basic plots and the situations in which they are used!

Plot function: This is one of the most used functions. We can plot two variables against each other by passing the values for the x and y axes, and we can also specify characteristics such as the type of plot we want.

Here, we use the inbuilt dataset mtcars and make a scatter plot of the Displacement against the Gross Horsepower of the various cars.

> plot(mtcars$disp, mtcars$hp, type="p", col="blue", main="Scatter plot of the Displacement and Gross Horsepower", xlab="Displacement", ylab="Gross Horsepower")

The type argument takes the following letters:
"p" = points (the default), as used above
"l" = line chart
"s" = step graph
"n" = no plotting; only the x-axis and y-axis are drawn
"o" = points and lines of the chart, overplotted

Output Screen to the command above, Image by author
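
The type values listed above can be compared side by side. The sketch below draws the same mtcars data once per type in a grid of panels. Sorting the rows by displacement first is my own addition, so that the line-based types do not zig-zag across the plot.

```r
# Sort the cars by displacement so that lines run from left to right
cars <- mtcars[order(mtcars$disp), ]

types <- c("p", "l", "s", "n", "o")

old_par <- par(mfrow = c(2, 3))  # 2 x 3 grid of panels
for (ty in types) {
  plot(cars$disp, cars$hp, type = ty, col = "blue",
       main = paste0('type = "', ty, '"'),
       xlab = "Displacement", ylab = "Gross Horsepower")
}
par(old_par)                     # restore the previous plot layout
```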

Lines function: It is used to add a line through another set of observations to a plot that has already been created.

> x <- seq(-pi, pi, 0.1) #defining the sequence over which we want to plot

> plot(x, sin(x), type="l", lty=3, ylab="Sin(x) and Cos(x)", main="Plot of Sin(x) and Cos(x) against x") #plotting x vs Sin(x) as a dotted line
> lines(x, cos(x), col="red", lty=4) #overplotting the line of Cos(x) on the earlier plot

> legend("topright", legend=c("Sin(x)","Cos(x)"), lty=c(3,4), col=c("black","red")) #making a legend to describe what each line represents

Bar plot: This visualization is used to represent a discrete variable, i.e., a variable that takes categories. Each bar stands for one category, and its height gives the frequency with which that category occurs. In the command below, sample is a placeholder for a vector of frequencies, for example the output of table().

> barplot(sample, xlab="x-axis label", ylab="y-axis label", main="main heading of the plot")
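
As a concrete case, the inbuilt mtcars data can be used again. In this sketch, table() counts how many cars fall into each cylinder category, and those counts become the bar heights. The choice of the cyl column and the labels are mine, for illustration.

```r
# Frequency of each cylinder category in the inbuilt mtcars data
cyl_counts <- table(mtcars$cyl)
cyl_counts

# barplot() draws one bar per category and returns the bar midpoints
mids <- barplot(cyl_counts,
                xlab = "Number of cylinders",
                ylab = "Number of cars",
                main = "Cars by number of cylinders")
```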

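Histogram: This is the counterpart of the bar chart for a continuous variable. The values of sample are grouped into class intervals, and each bar shows how many observations fall into its interval.

> hist(sample, xlab="x-axis label", ylab="y-axis label", main="main heading of the plot")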
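
To see the class intervals behind a histogram, the object that hist() returns can be inspected. This sketch uses the miles-per-gallon column of mtcars, which is my choice for illustration. Note that breaks is only a suggestion to R, which picks "pretty" interval boundaries near the requested number.

```r
# Histogram of miles per gallon from the inbuilt mtcars data
h <- hist(mtcars$mpg, breaks = 5,
          xlab = "Miles per gallon", ylab = "Number of cars",
          main = "Distribution of fuel efficiency")

h$breaks  # boundaries of the class intervals
h$counts  # number of cars in each interval
```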

Pie chart: It is used to represent the share of each category in the total.

> pie(sample, clockwise=TRUE) #clockwise=TRUE draws the slices clockwise; the default is counter-clockwise
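
A pie chart is easier to read when each slice is labelled with its share. The sketch below, again on mtcars (the gear column is my choice for illustration), turns counts into percentages with prop.table() and passes them to pie() as labels.

```r
# Number of cars for each gear count in the inbuilt mtcars data
gear_counts <- table(mtcars$gear)

# Share of each category in the total, as a percentage
share <- as.vector(round(100 * prop.table(gear_counts), 1))

slice_labels <- paste0(names(gear_counts), " gears: ", share, "%")

pie(gear_counts, labels = slice_labels, clockwise = TRUE,
    main = "Share of cars by number of gears")
```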

I hope this article got you excited about R programming.

Spend half an hour a day with this software and you will learn so much more every day :)
