Jaql
Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data. It started as an open source project at Google[1] but the latest release was on 2010-07-12. IBM[2] took it over as primary data processing language for their Hadoop software package BigInsights. Although having been developed for JSON it supports a variety of other data sources like CSV, TSV, XML. A comparison[3] to other BigData query languages like PIG Latin and Hive QL illustrates performance and usability aspects of these technologies. Jaql supports[4] lazy evaluation, so expressions are only materialized when needed. SyntaxThe basic concept of Jaql is source -> operator(parameter) -> sink ;
where a sink can be a source for a downstream operator. So typically a Jaql program has to following structure, expressing a data processing graph: source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;
Most commonly for readability reasons Jaql programs are linebreaked after the arrow, as is also a common idiom in Twitter Scalding: source -> operator1(parameter)
-> operator2(parameter)
-> operator2(parameter)
-> operator3(parameter)
-> operator4(parameter)
-> sink ;
Core operatorsSource:[5] ExpandUse the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [[T]] and produces an output array [T], by promoting the elements of each nested array to the top-level output array. FilterUse the FILTER operator to filter away elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example: data = [
{name: "Jon Doe", income: 20000, manager: false},
{name: "Vince Wayne", income: 32500, manager: false},
{name: "Jane Dean", income: 72000, manager: true},
{name: "Alex Smith", income: 25000, manager: false}
];
data -> filter $.manager;
[
{
"income": 72000,
"manager": true,
"name": "Jane Dean"
}
]
data -> filter $.income < 30000;
[
{
"income": 20000,
"manager": false,
"name": "Jon Doe"
},
{
"income": 25000,
"manager": false,
"name": "Alex Smith"
}
]
GroupUse the GROUP expression to group one or more input arrays on a grouping key and applies an aggregate function per group. JoinUse the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins. SortUse the SORT operator to sort an input by one or more fields. TopThe TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements. TransformUse the TRANSFORM operator to realize a projection or to apply a function to all items of an output. See alsoReferences
External links |
Portal di Ensiklopedia Dunia