Parallel computing of big data using MapReduce

Swetha, G and Radhika, V

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Extracting useful information from data sets measured in gigabytes and terabytes is a real challenge for data miners. In this paper we discuss and analyze the opportunities and challenges of efficient parallel data processing. Big Data is the next frontier for innovation, competition, and productivity, and many solutions continue to appear, partly supported by the considerable enthusiasm around the MapReduce paradigm for large-scale data analysis. MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. We review various parallel and distributed programming paradigms, analyze how they fit into the Big Data era, and present modern emerging paradigms and frameworks. With "Big Data" now a reality, more programmers are interested in building programs on the parallel model, and they often find SQL an unfamiliar and restrictive way to wrangle data and write code. The biggest game-changer to come along is MapReduce, the parallel programming framework that has gained prominence thanks to its use at web search companies.
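To make the programming model concrete, the classic word-count example can be sketched as a sequential, in-memory simulation of the map, shuffle, and reduce phases. This is a minimal illustration of the paradigm only, not the distributed implementation discussed in the paper; all function names here are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the grouped values for one key (here, sum the counts).
    return key, sum(values)

def mapreduce(documents):
    # Run the three phases in sequence over a list of input documents.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = mapreduce(["big data big", "data cluster"])
# counts == {"big": 2, "data": 2, "cluster": 1}
```

In a real cluster framework such as Hadoop, the map and reduce functions keep this shape, but the runtime partitions the input across machines, performs the shuffle over the network, and handles failures transparently.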
