In this post we will build a basic exploratory plot using ggplot2. The data was obtained from the World Bank. You can download it from here.
The data has only two columns: the year and the inflation for that year. Plotting it as is gives this plot:
Listing for the above plot:
inflation = read.csv('india_inflation.csv')
plot(inflation, type = "l")
Though this plot conveys all the information required, we can improve it in several ways. The first thing we notice is that the plot has some sharp peaks and dips. These inflation values are very high or very low, and they should be highlighted. So let's sort the inflation values into three categories: Low, Normal and High.
inflation = read.csv('india_inflation.csv')
inf.mean = mean(inflation$Inflation)
inf.sd = sd(inflation$Inflation)
high = inf.mean + inf.sd
low = inf.mean - inf.sd
# Categorize the data
inflation$Category = "NORMAL"
inflation$Category[inflation$Inflation > high] = "HIGH"
inflation$Category[inflation$Inflation < low] = "LOW"
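This code calculates the mean and the standard deviation of the inflation values. Any value more than one standard deviation above the mean is labeled HIGH, any value more than one standard deviation below it is labeled LOW, and everything else stays NORMAL. We can then use the category to color the points in the graph.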
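As a side note, the same labeling can be done in one step with base R's cut(). The sketch below is only an illustration: the data frame holds made-up numbers, not the World Bank data, and it assumes the same Year and Inflation column names. One small difference: cut() closes intervals on the right by default, so a value exactly equal to mean - sd would land in LOW here, while the strict comparison above would leave it NORMAL.

```r
# Made-up values for illustration only; this is NOT the World Bank data.
inflation = data.frame(
  Year = 2001:2008,
  Inflation = c(4.0, 4.5, 3.8, 12.0, 4.2, -1.0, 5.0, 4.4)
)

inf.mean = mean(inflation$Inflation)
inf.sd = sd(inflation$Inflation)

# cut() bins each value into one of three intervals:
# (-Inf, mean - sd], (mean - sd, mean + sd], (mean + sd, Inf)
inflation$Category = cut(
  inflation$Inflation,
  breaks = c(-Inf, inf.mean - inf.sd, inf.mean + inf.sd, Inf),
  labels = c("LOW", "NORMAL", "HIGH")
)

# Count how many years fall into each category
table(inflation$Category)
```

With cut() the Category column is a factor rather than a character vector, which ggplot2 also accepts for a discrete color scale.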
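library(ggplot2)
g = ggplot(inflation, aes(Year, Inflation)) +
  geom_line() +
  # Horizontal line at the mean; constants go outside aes()
  # (use size = 1 instead of linewidth on ggplot2 older than 3.4.0)
  geom_hline(yintercept = inf.mean, color = "gold", linewidth = 1) +
  geom_point(aes(color = Category), size = 4) +
  # HIGH and LOW share one color so both kinds of extreme value stand out
  scale_color_manual(values = c(HIGH = "firebrick4", LOW = "firebrick4", NORMAL = "forestgreen")) +
  theme_bw() + theme(legend.position = "none")
g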