Aggregate invocation
Syntax
In a Data Set expression:
aggregateOperator ( firstOperand { , additionalOperand }* { groupingClause } )
In a Component expression within an aggr clause:
aggregateOperator ( firstOperand { , additionalOperand }* ) { groupingClause }
aggregateOperator ::= avg | count | max | median | min | stddev_pop | stddev_samp | sum | var_pop | var_samp
- groupingClause ::=
- { group by groupingId {, groupingId}*| group except groupingId {, groupingId}*| group all conversionExpr }¹{ having havingCondition }
Input parameters
aggregateOperator |
the keyword of the aggregate operator to invoke (e.g., avg, count, max…) |
firstOperand |
the first operand of the invoked aggregate operator (a Data Set for an invocation
at Data Set level or a Component of the input Data Set for an invocation at
Component level within an aggr operator or an aggr clause in a join operation)
|
additionalOperand |
an additional operand (if any) of the invoked operator. The various operators
can have a different number of parameters. The number of parameters, their
types and if they are mandatory or optional depend on the invoked operator
|
groupingClause |
the following alternative grouping options:
· group by: the Data Points are grouped by the values of the specified Identifiers
(groupingId). The Identifiers not specified are dropped in the result.
· group except: the Data Points are grouped by the values of the Identifiers not
specified as groupingId. The Identifiers specified as groupingId are
dropped in the result.
· group all: converts the values of an Identifier Component using conversionExpr
and keeps all the resulting Identifiers.
|
groupingId |
Identifier Component to be kept (in the group by clause) or dropped (in the
group except clause).
|
conversionExpr |
specifies a conversion operator (e.g., time_agg) to convert data from finer to
coarser granularity. The conversion operator is applied on an Identifier of the
operand Data Set op.
|
havingCondition |
a condition (boolean expression) at component level, having only Components of
the input Data Sets as operands (and possibly constants), to be fulfilled by the
groups of Data Points: only groups for which havingCondition evaluates to TRUE
appear in the result. The havingCondition refers to the groups specified through
the groupingClause, therefore it must invoke aggregate operators (e.g. avg,
count, max, …, see also the corresponding sections). A correct example
of havingCondition is: max(obs_value) < 1000, while the condition
obs_value < 1000 is not a right havingCondition, because it refers to the values
of single Data Points and not to the groups. The count operator is used in a
havingCondition without parameters, e.g.:
sum ( ds group by id1 having count () >= 10 ) .
|
Examples of valid syntaxes
avg ( DS_1 )
avg ( DS_1 group by Id_1, Id_2 )
avg ( DS_1 group except Id_1, Id_2 )
avg ( DS_1 group all time_agg ( "Q" ) )
Semantics for scalar operations
The aggregate operators cannot be applied to scalar values.
Input parameters type
firstOperand
dataset
| component
additionalOperand
see the type of the additional parameter (if any) of the invoked
aggregateOperator. The aggregate operators and their parameters are
described in the following sections.
groupingId
name < identifier >
conversionExpr
identifier
havingCondition
component < boolean >
Result type
result
dataset
| component
Additional Constraints
The Aggregate invocation cannot be nested in other Aggregate or Analytic invocations.
The aggregate operations at component level can be invoked within the aggr clause, both as part of a join operator and the aggr operator (see the parameter aggrExpr of those operators).
The basic scalar types of firstOperand and additionalOperand (if any) must be compliant with the specific basic scalar types required by the invoked operator (the required basic scalar types are described in the table at the beginning of this chapter and in the sections of the various operators below).
The conversionExpr parameter applies just one conversion operator to just one Identifier belonging to the input Data Set. The basic scalar type of the Identifier must be compatible with the basic scalar type of the conversion operator.
If the grouping clause is omitted, then all the input Data Points are aggregated in a single group and the clause returns a Data Set that contains a single Data Point and has no Identifiers.
Behaviour
The aggregateOperator is applied as usual to all the measures of the firstOperand Data Set (if invoked at Data Set level) or to the firstOperand Component of the input Data Set (if invoked at Component level). In both cases, the operator calculates the required aggregated values for groups of Data Points of the input Data Set. The groups of Data Points to be aggregated are specified through the groupingClause, which allows the following alternative options.
group by |
the Data Points are grouped by the values of the specified Identifiers.
The Identifiers not specified are dropped in the result.
|
group except |
the Data Points are grouped by the values of the Identifiers not specified
in the clause. The specified Identifiers are dropped in the result.
|
group all |
converts an Identifier Component using conversionExpr and keeps all the Identifiers. |
The having clause is used to filter groups in the result by means of an aggregate condition evaluated on the single groups (for example the minimum number of rows in the group).
If no grouping clause is specified, then all the input Data Points are aggregated in a single group and the operator returns a Data Set that contains a single Data Point and has no Identifiers.
For the invocation at Data Set level, the resulting Data Set has the same Measures as the operand. For the invocation at Component level, the resulting Data Set has the Measures explicitly calculated (all the other Measures are dropped because no aggregation behaviour is specified for them).
For invocation at Data Set level, the Attribute propagation rule is applied. For invocation at Component level, the Attributes calculated within the aggr clause are maintained in the result; for all the other Attributes that are defined as viral, the Attribute propagation rule is applied (for the semantics, see the Attribute Propagation Rule section in the User Manual).
As mentioned, the Aggregate invocation at component level can be done within the aggr clause, both as part of a Join operator and the aggr operator (see the parameter aggrExpr of those operators), therefore, for a better comprehension fo the behaviour at Component level, see also those operators.
Examples
Given the operand datasets DS_1 and DS_2:
Input DS_1 (see structure)
Id_1 |
Id_2 |
Id_3 |
Me_1 |
At_1 |
---|---|---|---|---|
2010 |
E |
XX |
20 |
|
2010 |
B |
XX |
1 |
H |
2010 |
R |
XX |
1 |
A |
2010 |
F |
YY |
23 |
|
2011 |
E |
XX |
20 |
P |
2011 |
B |
ZZ |
1 |
N |
2011 |
R |
YY |
-1 |
P |
2011 |
F |
XX |
20 |
Z |
2012 |
L |
ZZ |
40 |
P |
2012 |
E |
YY |
30 |
P |
Input DS_2 (see structure)
Id_1 |
Id_2 |
Id_3 |
Me_1 |
At_1 |
---|---|---|---|---|
2010 |
E |
XX |
20 |
|
2010 |
B |
XX |
1 |
H |
2010 |
R |
XX |
1 |
A |
2010 |
F |
YY |
23 |
|
2011 |
E |
XX |
20 |
P |
2011 |
B |
ZZ |
1 |
N |
2011 |
R |
YY |
-1 |
P |
2011 |
F |
XX |
20 |
Z |
2012 |
L |
ZZ |
40 |
P |
2012 |
E |
YY |
30 |
P |
Example 1
DS_r := avg ( DS_1 group by Id_1 );
results in (see structure):
Id_1 |
Me_1 |
---|---|
2010 |
11.25 |
2011 |
10.0 |
2012 |
35.0 |
Example 2
DS_r := sum ( DS_1 group by Id_1, Id_3 );
results in (see structure):
Id_1 |
Id_3 |
Me_1 |
---|---|---|
2010 |
XX |
22.0 |
2010 |
YY |
23.0 |
2011 |
XX |
40.0 |
2011 |
ZZ |
1.0 |
2011 |
YY |
-1.0 |
2012 |
ZZ |
40.0 |
2012 |
YY |
30.0 |
Example 3
DS_r := avg ( DS_1 );
results in (see structure):
Me_1 |
---|
15.5 |
Example 4
DS_r := DS_1 [ aggr Me_2 := max ( Me_1 ) , Me_3 := min ( Me_1 ) group by Id_1 ];
/*provided that At_1 is viral and the first letter in alphabetic order prevails
and NULL prevails on all the other characters.*/
results in (see structure):
Id_1 |
Me_2 |
Me_3 |
At_1 |
---|---|---|---|
2010 |
23 |
1 |
|
2011 |
20 |
-1 |
N |
2012 |
40 |
30 |
P |
Note: the first example can be rewritten equivalently in the following forms:
DS_r := avg ( DS_1 group except Id_2, Id_3 )
DS_r := avg ( DS_1#Me_1 group by Id_1 )