Tuesday 19 January 2016

My Journey with plotting in R - Basic Plotting



Today, I made my first plot in R. I felt lost at first, but things started to make sense after I tried a few examples. I practised with R's built-in data sets, but in this post I am going to work with the Nile data set, which can be found here.
It comes as two files: a CSV file containing the data (100 records) and a doc file describing it.

So let's begin.

Before we plot anything, let's try to understand the data. The head function shows the first few records (six by default) of a data frame.

df = read.csv(file = 'Nile.csv', header = TRUE)
head(df)
##   X time Nile
## 1 1 1871 1120
## 2 2 1872 1160
## 3 3 1873  963
## 4 4 1874 1210
## 5 5 1875 1160
## 6 6 1876 1160

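X is just a row number, time is the year of the observation, and Nile is the river's annual flow in that year (R's documentation for its built-in Nile data set gives the unit as 10^8 cubic metres, measured at Aswan).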
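If you don't have the CSV at hand, R ships the same measurements as the built-in time series Nile (datasets package). Here is a minimal sketch, assuming the CSV was exported from that object (its first rows match the output above), that rebuilds an equivalent data frame and takes a closer look with str() and summary():

```r
# Assumption: Nile.csv is an export of R's built-in Nile time series,
# so we rebuild the time and Nile columns from it (the X row-number column is skipped)
df <- data.frame(time = as.numeric(time(Nile)),  # years 1871 to 1970
                 Nile = as.numeric(Nile))        # annual flow values

head(df)           # same first rows as the CSV version
str(df)            # column types and number of rows
summary(df$Nile)   # min, quartiles, mean and max of the flow
```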

plot(df$time, df$Nile, xlab = 'Year', ylab = 'Flow', type = 'l')


With the above line, the following plot is generated:

The plot function is versatile and can draw many types of plots, which we will look at in a later post; here type = 'l' tells it to draw a line instead of the default points. The plot above looks good, but it would be better with a line marking the average flow of the Nile.

meanWater = mean(df$Nile)
abline(h = meanWater)
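Note that abline() only adds to a plot that is already open, so it has to come after plot(). Below is a minimal sketch of the same mean line with its value written on the chart via text(). It uses the built-in Nile series instead of the CSV and draws into a temporary PDF file; both are assumptions made so the snippet runs on its own, for example with Rscript.

```r
# Assumption: the built-in Nile series stands in for Nile.csv
df <- data.frame(time = as.numeric(time(Nile)), Nile = as.numeric(Nile))

# Draw into a temporary PDF so the sketch also works without a screen
outFile <- tempfile(fileext = '.pdf')
pdf(outFile)

plot(df$time, df$Nile, xlab = 'Year', ylab = 'Flow', type = 'l')
meanWater <- mean(df$Nile)
abline(h = meanWater, lty = 2)                # dashed horizontal line at the mean
text(x = median(df$time), y = meanWater,      # label placed just above the line
     labels = paste('mean =', round(meanWater, 1)), pos = 3)

invisible(dev.off())                          # close the PDF device
```

In an interactive session you can drop the pdf() and dev.off() calls and the plot appears in the plot window as before.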
 
Looks good, but what if we could also mark a minimum and a maximum threshold, beyond which the flow of the Nile could cause trouble for the people living nearby? Let's use one standard deviation below and above the mean as those thresholds. Let's also draw the average flow as a green line and the thresholds as red lines, since they mark the limits beyond which problems may arise.
sdWater = sd(df$Nile)
abline(h = meanWater, col = 'green')
abline(h = meanWater + sdWater, col = 'red')
abline(h = meanWater - sdWater, col = 'red')

And the final plot looks something like this:





Now the plot is much easier to read. The years around 1915 were very dry, while the years around 1880 and 1895 had unusually high flow, which suggests there may have been floods then.
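To check readings like these against the numbers instead of by eye, we can list the years whose flow lies outside the red lines. A minimal sketch, again assuming the built-in Nile series matches the CSV; keep in mind that one standard deviation is just the rough threshold chosen in this post, not an official flood or drought criterion.

```r
# Assumption: the built-in Nile series stands in for Nile.csv
df <- data.frame(time = as.numeric(time(Nile)), Nile = as.numeric(Nile))
meanWater <- mean(df$Nile)
sdWater <- sd(df$Nile)

highYears <- df$time[df$Nile > meanWater + sdWater]  # above the upper red line
lowYears  <- df$time[df$Nile < meanWater - sdWater]  # below the lower red line
driestYear <- df$time[which.min(df$Nile)]            # year with the lowest flow

print(highYears)
print(lowYears)
print(driestYear)
```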

Here's the complete script:

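# Read the data (Nile.csv must be in the working directory)
df = read.csv(file = 'Nile.csv', header = TRUE)

# Line plot of the annual flow against the year (plot() starts a new page itself)
plot(df$time, df$Nile, xlab = 'Year', ylab = 'Water in Nile', type = 'l')

# Mean and standard deviation of the flow
meanWater = mean(df$Nile)
sdWater = sd(df$Nile)

# Green line at the mean, red lines one standard deviation above and below it
abline(h = meanWater, col = 'green')
abline(h = meanWater + sdWater, col = 'red')
abline(h = meanWater - sdWater, col = 'red')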
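If you want to try other thresholds, the same steps can be wrapped in a small function. In this sketch, plotWithBands and its parameter k (how many standard deviations the red lines sit from the mean) are hypothetical names of my own, not part of R; it also adds a legend and again uses the built-in Nile series and a temporary PDF file as stand-ins.

```r
# plotWithBands() is a hypothetical helper written for this sketch, not a base R function
plotWithBands <- function(years, flow, k = 1, ...) {
  m <- mean(flow)
  s <- sd(flow)
  plot(years, flow, type = 'l', ...)             # extra arguments (xlab, ylab, ...) go to plot()
  abline(h = m, col = 'green')                   # mean
  abline(h = m + c(-1, 1) * k * s, col = 'red')  # abline() draws one line per h value
  legend('bottomleft', legend = c('Mean', paste('Mean +/-', k, 'SD')),
         col = c('green', 'red'), lty = 1)
  invisible(c(mean = m, lower = m - k * s, upper = m + k * s))
}

# Assumption: the built-in Nile series stands in for Nile.csv
df <- data.frame(time = as.numeric(time(Nile)), Nile = as.numeric(Nile))

pdf(tempfile(fileext = '.pdf'))                  # temporary file so the sketch runs headless
limits <- plotWithBands(df$time, df$Nile, k = 2, xlab = 'Year', ylab = 'Water in Nile')
invisible(dev.off())
print(limits)
```

Returning the limits invisibly means the function still just draws when called on its own, but the numbers are there if you assign the result.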


Find out about the different types of plots in my next blog post. ;)