A simple usage of Pig


  1. Pig is a platform for analyzing large data sets.
  2. It provides a high-level language, Pig Latin, for expressing data analysis programs.
  3. Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs.
  4. It can be extended with user-defined functions written in other languages such as Java and JavaScript.



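pig -x mapreduce
-- Load the file from HDFS, splitting each line on ':'
A = load '/user/cloudera/passwd' using PigStorage(':');
-- Keep only fields 0, 4 and 5 of each record
B = foreach A generate $0, $4, $5;
-- Run the job and print the result to the console
dump B;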

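The code above reads a copy of the /etc/passwd file stored in HDFS at /user/cloudera/passwd and prints three of its colon-separated columns for every line: the user name ($0), the comment or full-name field ($4), and the home directory ($5). Because Pig runs in mapreduce mode here, the path refers to HDFS, not to the local file system.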
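To make the load / foreach / generate steps concrete, here is a minimal plain-Python sketch of the same projection. It only mirrors the logic of the script and says nothing about how Pig actually executes it. The sample data, the function name project_fields, and the choice to return None for a missing field are assumptions made for illustration.

```python
# Plain-Python sketch of what the Pig script computes (illustrative only).
# SAMPLE_PASSWD and project_fields are hypothetical names, not part of Pig.

SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
cloudera:x:501:501:Cloudera User:/home/cloudera:/bin/bash
"""


def project_fields(text, delimiter=":", indexes=(0, 4, 5)):
    """Split each non-empty line on `delimiter` and keep the fields at `indexes`."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        # Comparable to: load ... using PigStorage(':')
        fields = line.split(delimiter)
        # Comparable to: foreach A generate $0, $4, $5
        # Assumption: a field that is missing from a short line becomes None.
        rows.append(tuple(fields[i] if i < len(fields) else None for i in indexes))
    return rows


if __name__ == "__main__":
    # Comparable to: dump B
    for row in project_fields(SAMPLE_PASSWD):
        print(row)
```

The difference in practice is scale: the Python loop runs on one machine over one string, while Pig compiles the same three steps into MapReduce jobs that run across the cluster.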
Thanks to the course for the intro.

Other resources to learn about PIG in detail

SQOOP

A simple way to remember the name:

SQl-to-hadOOP: Sqoop imports data from SQL (relational) databases into HDFS.


Hive and Impala can both read data from HDFS.

Hive executes queries by using MapReduce, whereas Impala runs queries on its own massively parallel engine.

