With the advent of the digital world, almost every arena of life is now mapped and recorded in the digital realm. This has produced a plethora of data that needs to be analysed to discover hidden patterns that are far from random. Data science addresses this need. Several definitions of data science are in circulation; broadly, it is an amalgamation of statistics, computer science and domain applications in various sectors. Large datasets of spatial and temporal information are assembled and manipulated in order to improve the understanding and management of complex systems. At its core, data science involves the assembly, management and analysis of large spatio-temporal datasets and their translation into information that decision makers can eventually use for policy making. The application of data analytics is therefore very wide and is required across almost all disciplines. Two of the most common tools used for data analytics are the R programming language and Hadoop.
Both Hadoop and the R programming language are open source and have become very powerful over time through packages contributed by many experts in the field of data science. As a result, the term data science has become almost synonymous with R and Hadoop as its tools.