Parallel BASH (parbash) is a modified version of BASH intended for text processing on computer clusters. It lets common UNIX text processing tools (e.g., awk, perl, grep) run across multicore or distributed systems, and is particularly suited to scalable processing of large files (multiple GB or more).
parbash interprets scripts in the same way as BASH does except when a structure similar to the following is encountered:
cat hdfs:/student_marks | grep ^A | sort | uniq -c > hdfs:/out
In this case, parbash translates the bash pipeline into a map-reduce program and executes it using Hadoop or Amazon Elastic MapReduce.
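To make the idea concrete, the following is a minimal local sketch of how a pipeline of this shape can be split into a map phase and a reduce phase. It is not parbash's actual generated job, which this page does not show. The sample data, the `split1`/`split2` file names, and the `mapper`/`reducer` function names are all hypothetical, and the sketch assumes each record is a single grade per line.

```shell
#!/usr/bin/env bash
# Local illustration only; this is NOT parbash's actual generated job.
set -eu
export LC_ALL=C   # keep sort order deterministic across systems

# Hypothetical sample data: two splits standing in for hdfs:/student_marks.
# Assumption: each record is one grade per line.
workdir=$(mktemp -d)
printf 'A\nB\nA\nC\nA+\n' > "$workdir/split1"
printf 'A\nB\nA+\nA\n'    > "$workdir/split2"

# Map phase: the filter is stateless, so every split can be processed
# independently. grep exits 1 when nothing matches, hence the "|| true".
mapper() { grep '^A' || true; }

# Reduce phase: counting needs identical records brought together,
# which is what sort does before uniq -c counts them.
reducer() { sort | uniq -c; }

# Run one mapper per split in parallel, then wait for all of them.
for split in "$workdir"/split*; do
  mapper < "$split" > "$split.mapped" &
done
wait

# Feed the combined mapper output to a single reducer.
result=$(cat "$workdir"/*.mapped | reducer)
printf '%s\n' "$result"

rm -rf "$workdir"
```

With this sample data the reducer reports a count of 4 for `A` and 2 for `A+`. The point of the split is that the `grep` stage needs no coordination between machines, while `sort | uniq -c` only works once matching records have been gathered in one place.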
See IntroductionToParbash and ParbashExamples for more examples.
Read about common ParbashPatterns through examples.
See AmazonQuickStart for how to run parbash
bash
hadoop
amazon
map-reduce