Subspace: sub

Syntax

op [ sub identifier = value { , identifier = value }* ]

Input parameters

op	dataset
identifier	Identifier Component of the input Data Set op
value	valid value for identifier

Examples of valid syntaxes

DS_r := DS_1 [sub Id_2 = "A", Id_5 = 1 ]

Semantics for scalar operations

This operator cannot be applied to scalar values.

Input parameters type

op

dataset

identifier

name<identifier>

value

scalar

Result type

result

dataset

Additional Constraints

The specified Identifier Components identifier (s) must belong to the input Data Set op.

Each Identifier Component can be specified only once.

The specified value must be an allowed value for identifier.

Behaviour

The operator returns a Data Set in a subspace of the one of the input Dataset. Its behaviour can be procedurally described as follows:

It creates a virtual Data Set VDS as a copy of op
It maintains the Data Points of VDS for which identifier = value (for all the specified identifier) and eliminates all the Data Points for which identifier <> value (even for only one specified identifier)
It projects out (“drops”, in VTL terms) all the identifier (s)

The result of the last step is the output of the operation.

The resulting Data Set has the Identifier Components that are not specified as identifier (s) and has the same Measure and Attribute Components of the input Data Set.

The result Data Set does not violate the functional constraint because after the filter of the step 2, all the remaining identifier (s) do not contain the same Values for all the Data Points. In other words, given that the input Data Set is a 1st order function and therefore does not contain duplicates, the result Data Set is a 1st order function as well. To show this, let K₁,…,Kₘ,…,Kₙ be the Identifier components for the generic input Data Set DS. Let us suppose that K₁,…,Kₘ are assigned to fixed values by using the subspace operator. A duplicate could arise only if in the result there are two Data Points DPᵣ₁ and DPᵣ₂ having the same value for Kₘ₊₁,…,Kₙ , but this is impossible since such Data Points had same K₁,…,Kₘ in the original Data Set DS, which did not contain duplicates.

If we consider the vector space of Data Points individuated by the n-uples of Identifier components of a Data Set DS(K₁,…,Kₙ,…) (along, e.g., with the operators of sum and multiplication), we have that the subspace operator actually performs a subsetting of such space into another space with fewer Identifiers. This can be also seen as the equivalent of a dice operation performed on hyper-cubes in multi-dimensional data warehousing.

Examples

Given the Data Set DS_1:

Input DS_1 (see structure)

Id_1	Id_2	Id_3	Me_1	At_1
1	A	XX	20	F
1	A	YY	1	F
1	B	XX	4	E
1	B	YY	9	F
2	A	XX	7	F
2	A	YY	5	E
2	B	XX	12	F
2	B	YY	15	F

Example 1

DS_r := DS_1 [ sub Id_1 = 1, Id_2 = "A" ];

results in (see structure):

DS_r
Id_3	Me_1	At_1
XX	20	F
YY	1	F

Example 2

DS_r := DS_1 [ sub Id_1 = 1, Id_2 = "B", Id_3 = "YY" ];

results in (see structure):

DS_r
Me_1	At_1
9	F

Example 3

DS_r := DS_1 [ sub Id_2 = "A" ] + DS_1 [ sub Id_2 = "B" ];

results in (see structure):

DS_r
Id_1	Id_3	Me_1
1	XX	24
1	YY	10
2	XX	19
2	YY	20