Aggregation: aggr

Syntax

op [ aggr aggrClause { groupingClause } ]

aggrClause ::= { aggrRole } aggrComp := aggrExpr { , { aggrRole } aggrComp := aggrExpr }*

groupingClause ::= { group by groupingId {, groupingId }*
                   | group except groupingId {, groupingId }*
                   | group all conversionExpr }¹
                   { having havingCondition }

aggrRole ::= measure | attribute | viral attribute

Input parameters

op	the operand
aggrClause	clause that specifies the required aggregations, i.e., the aggregated Components to be calculated, their roles and their calculation algorithm, to be applied on the joined and filtered Data Points
aggrRole	the role of the aggregated Component to be calculated
aggrComp	the name of the aggregated Component to be calculated; this is a dependent Component of the result (Measure or Attribute, not Identifier)
aggrExpr	expression at Component level, having only Components of the input Data Sets as operands, which invokes an aggregate operator (e.g. avg, count, max…; see also the corresponding sections) to perform the desired aggregation. Note that the count operator is used in an aggrClause without parameters, e.g.: DS_1 [ aggr Me_1 := count ( ) group by Id_1 ]
groupingClause	the following alternative grouping options: · group by: the Data Points are grouped by the values of the specified Identifiers (groupingId). The Identifiers not specified are dropped in the result. · group except: the Data Points are grouped by the values of the Identifiers not specified as groupingId. The Identifiers specified as groupingId are dropped in the result. · group all: converts the values of an Identifier Component using conversionExpr and keeps all the resulting Identifiers.
groupingId	Identifier Component to be kept (in the group by clause) or dropped (in the group except clause).
conversionExpr	specifies a conversion operator (e.g., time_agg) to convert an Identifier from finer to coarser granularity. The conversion operator is applied on an Identifier of the operand Data Set op.
havingCondition	a condition (boolean expression) at Component level, having only Components of the input Data Sets as operands (and possibly constants), to be fulfilled by the groups of Data Points: only groups for which havingCondition evaluates to TRUE appear in the result. Since the havingCondition refers to the groups specified through the groupingClause, it must invoke aggregate operators (e.g. avg, count, max…; see also the section Aggregate invocation). For example, max(obs_value) < 1000 is a correct havingCondition, whereas obs_value < 1000 is not, because it refers to the values of the single Data Points rather than to the groups. The count operator is used in a havingCondition without parameters, e.g.: sum ( DS_1 group by id1 having count ( ) >= 10 )
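To illustrate how count ( ) without parameters behaves in an aggrClause and in a havingCondition, here is a small Python sketch that emulates DS_1 [ aggr Me_1 := count ( ) group by Id_1 ] on the DS_1 data from the Examples section, with an optional filter equivalent to having count ( ) >= n. It is only an emulation of the described semantics on plain dictionaries, not a VTL implementation; the helper names group_rows and aggr_count are hypothetical.

```python
from collections import defaultdict

# DS_1 from the Examples section, one dict per Data Point.
DS_1 = [
    {"Id_1": 1, "Id_2": "A", "Id_3": "XX", "Me_1": 0},
    {"Id_1": 1, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
    {"Id_1": 1, "Id_2": "B", "Id_3": "XX", "Me_1": 3},
    {"Id_1": 1, "Id_2": "B", "Id_3": "YY", "Me_1": 5},
    {"Id_1": 2, "Id_2": "A", "Id_3": "XX", "Me_1": 7},
    {"Id_1": 2, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
]


def group_rows(rows, keys):
    """Partition the Data Points by the values of the grouping Identifiers."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups


def aggr_count(rows, keys, min_count=None):
    """Hypothetical emulation of 'aggr Me_1 := count ( ) group by <keys>'.

    count() takes no argument: it counts the Data Points of each group.
    If min_count is given, groups are filtered as by
    'having count ( ) >= min_count'.
    """
    result = []
    for key, members in group_rows(rows, keys).items():
        n = len(members)
        if min_count is not None and n < min_count:
            continue  # the group fails the having condition and is dropped
        result.append({**dict(zip(keys, key)), "Me_1": n})
    return result


print(aggr_count(DS_1, ["Id_1"]))
# [{'Id_1': 1, 'Me_1': 4}, {'Id_1': 2, 'Me_1': 2}]
print(aggr_count(DS_1, ["Id_1"], min_count=3))
# [{'Id_1': 1, 'Me_1': 4}]
```

Note that the having filter discards whole groups; it never removes individual Data Points from a group that is kept.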

Examples of valid syntaxes

DS_1 [ aggr M1 := min ( Me_1 ) group by Id_1, Id_2 ]
DS_1 [ aggr M1 := min ( Me_1 ) group except Id_1, Id_2 ]

Semantics for scalar operations

This operator cannot be applied to scalar values.

Input parameters type

op	dataset
aggrComp	name<component>
aggrExpr	component<scalar>
groupingId	name<identifier>
conversionExpr	identifier<scalar>
havingCondition	component<boolean>

Result type

result	dataset

Additional Constraints

The aggrComp parameter cannot be the name of an Identifier component.

All the components used in aggrExpr must belong to the operand Data Set op.

The conversionExpr parameter applies exactly one conversion operator to exactly one Identifier of the input Data Set. The basic scalar type of that Identifier must be compatible with the basic scalar type of the conversion operator.
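The effect of group all can be sketched in Python under explicit assumptions: a hypothetical monthly Data Set with a time Identifier Id_time written as "YYYY-MM", and a hypothetical function to_year standing in for a conversion operator such as time_agg (it is not time_agg and does not follow VTL time formats). Exactly one Identifier is converted, all Identifiers are kept, and the Data Points whose converted keys coincide are aggregated together.

```python
from collections import defaultdict

# Hypothetical monthly Data Set: Identifiers Id_1 and Id_time ("YYYY-MM"), Measure Me_1.
DS_m = [
    {"Id_1": 1, "Id_time": "2020-01", "Me_1": 10},
    {"Id_1": 1, "Id_time": "2020-02", "Me_1": 5},
    {"Id_1": 1, "Id_time": "2021-01", "Me_1": 4},
    {"Id_1": 2, "Id_time": "2020-03", "Me_1": 1},
]


def to_year(period):
    """Hypothetical stand-in for a conversion operator: maps a finer
    "YYYY-MM" value to the coarser "YYYY" value."""
    return period[:4]


def aggr_sum_group_all(rows, identifiers, converted_id, convert, measure):
    """Sketch of 'aggr <measure> := sum ( <measure> ) group all convert(<converted_id>)'.

    Exactly one Identifier is converted; all Identifiers are kept, and the
    Data Points sharing the same converted key are summed together.
    """
    if converted_id not in identifiers:
        raise ValueError(f"{converted_id} is not an Identifier of the operand")
    totals = defaultdict(int)
    for row in rows:
        key = tuple(
            convert(row[i]) if i == converted_id else row[i] for i in identifiers
        )
        totals[key] += row[measure]
    return [
        {**dict(zip(identifiers, key)), measure: total}
        for key, total in totals.items()
    ]


result = aggr_sum_group_all(DS_m, ["Id_1", "Id_time"], "Id_time", to_year, "Me_1")
# result: (1, "2020") -> 15, (1, "2021") -> 4, (2, "2020") -> 1
```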

Behaviour

The operator aggr calculates aggregations of dependent Components (Measures or Attributes) by means of sub-expressions at Component level, one independent sub-expression for each calculated Component. The role of each calculated Component can be specified as measure, attribute, or viral attribute; the keyword viral controls the virality of Attributes when the Attribute propagation rule is adopted (see the User Manual). When the role is omitted, the following rule applies: if the Component exists in the operand Data Set, it keeps its role; if it does not exist in the operand Data Set, its role is Measure.
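The role rule above, together with the constraint that aggrComp cannot name an Identifier, can be written as a small decision function. This Python sketch is illustrative only: the function resolve_role and the string encoding of roles are assumptions, not part of VTL.

```python
ROLES = ("measure", "attribute", "viral attribute")


def resolve_role(component, declared_role, operand_roles):
    """Return the role of an aggregated Component (hypothetical helper).

    declared_role: one of ROLES, or None when the aggrRole is omitted.
    operand_roles: maps each Component name of the operand Data Set to its
    role, e.g. "identifier", "measure" or "attribute".
    """
    if operand_roles.get(component) == "identifier":
        # Additional Constraints: aggrComp cannot be the name of an Identifier.
        raise ValueError(f"{component} is an Identifier and cannot be an aggrComp")
    if declared_role is not None:
        if declared_role not in ROLES:
            raise ValueError(f"unknown role: {declared_role}")
        return declared_role
    # Role omitted: an existing Component keeps its role, a new one is a Measure.
    return operand_roles.get(component, "measure")


operand_roles = {"Id_1": "identifier", "Me_1": "measure", "At_1": "attribute"}
print(resolve_role("At_1", None, operand_roles))  # attribute
print(resolve_role("Me_2", None, operand_roles))  # measure
```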

The aggrExpr sub-expressions are independent of one another: they can reference only Components of the input Data Set and cannot use Components generated elsewhere, for example by other aggrExpr sub-expressions. The Measures and Attributes computed by aggr are the only Measures and Attributes returned in the output Data Set (plus any viral Attributes). The sub-expressions must contain only aggregate operators, which compute an aggregated Value for a group of Data Points. The groups of Data Points to be aggregated are specified through the groupingClause, which offers the following alternative options:

group by	the Data Points are grouped by the values of the specified Identifiers. The Identifiers not specified are dropped in the result.
group except	the Data Points are grouped by the values of the Identifiers not specified in the clause. The specified Identifiers are dropped in the result.
group all	converts an Identifier Component using conversionExpr and keeps all the other Identifiers.
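The difference between group by and group except comes down to which Identifiers form the grouping key. The Python sketch below computes that key under the assumption that the result keeps the operand's Identifier order; the function name grouping_keys is hypothetical. It also shows why Example 2 (group except Id_3) groups by Id_1 and Id_2.

```python
def grouping_keys(identifiers, clause, grouping_ids):
    """Return the Identifiers kept in the result (hypothetical helper).

    identifiers: Identifier names of the operand, in their original order.
    clause: "group by" or "group except".
    grouping_ids: the groupingId list written in the clause.
    """
    unknown = set(grouping_ids) - set(identifiers)
    if unknown:
        raise ValueError(f"not Identifiers of the operand: {sorted(unknown)}")
    if clause == "group by":
        # Keep only the listed Identifiers; the others are dropped.
        return [i for i in identifiers if i in grouping_ids]
    if clause == "group except":
        # Drop the listed Identifiers; group by all the remaining ones.
        return [i for i in identifiers if i not in grouping_ids]
    raise ValueError(f"unsupported clause: {clause}")


ids = ["Id_1", "Id_2", "Id_3"]
print(grouping_keys(ids, "group by", ["Id_1", "Id_2"]))  # ['Id_1', 'Id_2']
print(grouping_keys(ids, "group except", ["Id_3"]))      # ['Id_1', 'Id_2']
```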

The having clause filters the groups in the result by means of an aggregate condition evaluated on each group (for example, a minimum number of Data Points in the group).

If no grouping clause is specified, then all the input Data Points are aggregated in a single group and the clause returns a Data Set that contains a single Data Point and has no Identifiers.
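As a minimal Python sketch of the no-grouping case, the following emulates DS_1 [ aggr Me_1 := sum ( Me_1 ) ] on the DS_1 data from the Examples section: all Data Points fall into one group, and the result is a single Data Point without Identifier Components. The function name is hypothetical and the data model (a list of dicts) is an assumption.

```python
# DS_1 from the Examples section, one dict per Data Point.
DS_1 = [
    {"Id_1": 1, "Id_2": "A", "Id_3": "XX", "Me_1": 0},
    {"Id_1": 1, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
    {"Id_1": 1, "Id_2": "B", "Id_3": "XX", "Me_1": 3},
    {"Id_1": 1, "Id_2": "B", "Id_3": "YY", "Me_1": 5},
    {"Id_1": 2, "Id_2": "A", "Id_3": "XX", "Me_1": 7},
    {"Id_1": 2, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
]


def aggr_sum_without_grouping(rows, measure):
    """Hypothetical emulation of 'aggr <measure> := sum ( <measure> )' with no
    grouping clause: one group holding every Data Point, so the result is one
    Data Point that carries no Identifier Components."""
    return [{measure: sum(row[measure] for row in rows)}]


result = aggr_sum_without_grouping(DS_1, "Me_1")
print(result)  # [{'Me_1': 19}]
```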

The Attributes calculated through the aggr clauses are maintained in the result. For all the other Attributes that are defined as viral, the Attribute propagation rule is applied (for the semantics, see the Attribute Propagation Rule section in the User Manual).

Examples

Given the Data Set DS_1:

Input DS_1

Id_1	Id_2	Id_3	Me_1
1	A	XX	0
1	A	YY	2
1	B	XX	3
1	B	YY	5
2	A	XX	7
2	A	YY	2

Example 1

DS_r := DS_1 [ aggr Me_1:= sum( Me_1 ) group by Id_1 , Id_2 ];

results in:

DS_r
Id_1	Id_2	Me_1
1	A	2
1	B	8
2	A	9
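Example 1 can be reproduced with a short Python sketch that sorts the Data Points by the grouping Identifiers and sums Me_1 within each group. It only emulates the documented result on plain dictionaries; the function name sum_group_by is hypothetical and nothing here belongs to a VTL engine.

```python
from itertools import groupby

# DS_1 from the Examples section, one dict per Data Point.
DS_1 = [
    {"Id_1": 1, "Id_2": "A", "Id_3": "XX", "Me_1": 0},
    {"Id_1": 1, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
    {"Id_1": 1, "Id_2": "B", "Id_3": "XX", "Me_1": 3},
    {"Id_1": 1, "Id_2": "B", "Id_3": "YY", "Me_1": 5},
    {"Id_1": 2, "Id_2": "A", "Id_3": "XX", "Me_1": 7},
    {"Id_1": 2, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
]


def sum_group_by(rows, keys, measure):
    """Emulates 'aggr <measure> := sum ( <measure> ) group by <keys>'.
    Identifiers not listed in keys (here Id_3) do not appear in the output."""
    def key_fn(row):
        return tuple(row[k] for k in keys)

    out = []
    # groupby merges only adjacent rows with equal keys, so sort first.
    for key, members in groupby(sorted(rows, key=key_fn), key=key_fn):
        out.append({**dict(zip(keys, key)), measure: sum(r[measure] for r in members)})
    return out


DS_r = sum_group_by(DS_1, ["Id_1", "Id_2"], "Me_1")
# DS_r: (1, A) -> 2, (1, B) -> 8, (2, A) -> 9, as in the table above
```

Sorting first is required because itertools.groupby only merges adjacent items with equal keys; a dictionary keyed by the grouping tuple would work equally well.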

Example 2

DS_r := DS_1 [ aggr Me_3:= min( Me_1 ) group except Id_3 ];

results in:

DS_r
Id_1	Id_2	Me_3
1	A	0
1	B	3
2	A	2

Example 3

DS_r := DS_1 [ aggr Me_1:= sum( Me_1 ), Me_2 := max( Me_1) group by Id_1 , Id_2 having avg (Me_1 ) > 2 ];

results in:

DS_r
Id_1	Id_2	Me_1	Me_2
1	B	8	5
2	A	9	7
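A Python sketch of Example 3, again emulating the documented result on plain dictionaries with a hypothetical function name, makes two points visible. The having condition is evaluated on the input Me_1 values of each group, so the group (1, A), whose average is 1, is discarded. And Me_2 := max ( Me_1 ) reads the input Me_1, not the summed Me_1 computed in the same clause, because the aggrExpr sub-expressions are independent.

```python
from collections import defaultdict
from statistics import mean

# DS_1 from the Examples section, one dict per Data Point.
DS_1 = [
    {"Id_1": 1, "Id_2": "A", "Id_3": "XX", "Me_1": 0},
    {"Id_1": 1, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
    {"Id_1": 1, "Id_2": "B", "Id_3": "XX", "Me_1": 3},
    {"Id_1": 1, "Id_2": "B", "Id_3": "YY", "Me_1": 5},
    {"Id_1": 2, "Id_2": "A", "Id_3": "XX", "Me_1": 7},
    {"Id_1": 2, "Id_2": "A", "Id_3": "YY", "Me_1": 2},
]


def example_3(rows):
    """Emulates 'aggr Me_1 := sum ( Me_1 ), Me_2 := max ( Me_1 )
    group by Id_1, Id_2 having avg ( Me_1 ) > 2'."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["Id_1"], r["Id_2"])].append(r["Me_1"])  # input Me_1 values
    out = []
    for (id_1, id_2), values in groups.items():
        if not mean(values) > 2:  # having avg ( Me_1 ) > 2, on the input values
            continue
        # Both expressions read the same input values, never each other's output.
        out.append({"Id_1": id_1, "Id_2": id_2, "Me_1": sum(values), "Me_2": max(values)})
    return out


DS_r = example_3(DS_1)
# DS_r: (1, B) -> Me_1 8, Me_2 5; (2, A) -> Me_1 9, Me_2 7
```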