- Apache Pig is a platform for analyzing large data sets
- Analysis programs are written in a high-level language called Pig Latin
- Pig's infrastructure layer consists of a compiler that turns Pig Latin scripts into sequences of MapReduce programs
- It can be extended with user-defined functions written in other languages such as Java and JavaScript
pig -x mapreduce
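-- Load the file, splitting each line into fields on ':'
A = load '/user/cloudera/passwd' using PigStorage(':');
-- Keep only the fields at positions 0, 4 and 5 (positions start at 0)
B = foreach A generate $0, $4, $5;
-- Run the job and print the result to the console
dump B;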
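The code above prints the first, fifth, and sixth colon-separated fields of each line of the passwd file; in the standard passwd layout these are the user name, the comment field, and the home directory. Because Pig was started in mapreduce mode, /user/cloudera/passwd is a path in HDFS, not on the local file system, so the file (presumably a copy of the local /etc/passwd) must already have been uploaded to HDFS.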
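To make the positional references $0, $4 and $5 concrete, here is a minimal sketch in plain Python (not Pig) of the same split-and-project step. The function name project_fields and the two sample lines are made up for illustration, and using None for a missing position is an assumption of this sketch rather than a statement about Pig's behavior.

```python
def project_fields(lines, delimiter=":", positions=(0, 4, 5)):
    """Split each line on the delimiter and keep only the given positions."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # None marks a position the line does not have (assumption of this sketch).
        rows.append(tuple(fields[i] if i < len(fields) else None
                          for i in positions))
    return rows


# Hypothetical sample lines in the usual passwd layout:
# name:password:uid:gid:comment:home:shell
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "cloudera:x:501:501:Cloudera User:/home/cloudera:/bin/bash",
]

for row in project_fields(sample):
    print(row)
```

Each printed tuple corresponds roughly to one row that dump B would show for the same input line. Unlike this sketch, Pig compiles the script into MapReduce jobs and runs the step across the cluster.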