Subspace: sub
Syntax
op [ sub identifier = value { , identifier = value }* ]
Input parameters
op | the input Data Set |
identifier | Identifier Component of the input Data Set op |
value | valid value for identifier |
Examples of valid syntaxes
DS_r := DS_1 [ sub Id_2 = "A", Id_5 = 1 ];
Semantics for scalar operations
This operator cannot be applied to scalar values.
Input parameters type
op | dataset |
identifier | name<identifier> |
value | scalar |
Result type
result | dataset |
Additional Constraints
Each specified Identifier Component identifier must belong to the input Data Set op.
Each Identifier Component can be specified only once.
The specified value must be an allowed value for identifier.
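The constraints above can be checked before the operator is evaluated. The following is a minimal Python sketch, assuming that the Identifier names of op and a set of allowed values for each Identifier are available; the function `check_subspace_args` and the `allowed_values` mapping are hypothetical. The specification is passed as a list of pairs rather than a dict so that a repeated Identifier can be detected.

```python
from typing import Any, Mapping, Sequence


def check_subspace_args(
    identifiers: Sequence[str],
    allowed_values: Mapping[str, set],
    spec: Sequence[tuple[str, Any]],
) -> None:
    """Raise ValueError if a subspace specification violates a constraint.

    identifiers:    names of the Identifier Components of op
    allowed_values: hypothetical set of allowed values for each Identifier
    spec:           (identifier, value) pairs, in the order written
    """
    seen: set[str] = set()
    for name, value in spec:
        # The identifier must belong to the input Data Set op.
        if name not in identifiers:
            raise ValueError(f"{name} is not an Identifier Component of op")
        # Each Identifier Component can be specified only once.
        if name in seen:
            raise ValueError(f"{name} is specified more than once")
        # The value must be an allowed value for the identifier.
        if value not in allowed_values[name]:
            raise ValueError(f"{value!r} is not an allowed value for {name}")
        seen.add(name)


# Identifiers and values of DS_1 from the Examples.
ids = ["Id_1", "Id_2", "Id_3"]
domains = {"Id_1": {1, 2}, "Id_2": {"A", "B"}, "Id_3": {"XX", "YY"}}
check_subspace_args(ids, domains, [("Id_1", 1), ("Id_2", "A")])  # accepted
```

Passing `[("Me_1", 1)]`, `[("Id_2", "A"), ("Id_2", "B")]` or `[("Id_2", "C")]` instead raises `ValueError`, one case per constraint.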
Behaviour
The operator returns a Data Set in a subspace of the space of the input Data Set. Its behaviour can be described procedurally as follows:
1. It creates a virtual Data Set VDS as a copy of op.
2. It keeps the Data Points of VDS for which identifier = value holds for every specified identifier, and eliminates every Data Point for which identifier <> value for at least one specified identifier.
3. It projects out (“drops”, in VTL terms) all the specified identifiers.
The result of the last step is the output of the operation.
The resulting Data Set has as Identifier Components those of the input Data Set that are not specified as identifier, and the same Measure and Attribute Components as the input Data Set.
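The three steps above can be sketched in Python. This is a minimal illustration, not a VTL implementation: it assumes a Data Set is represented as a list of Data Points, each a dict mapping component names to values, and the function name `subspace` is hypothetical. The data reproduce DS_1 from the Examples.

```python
from typing import Any, Mapping

# Hypothetical representation: a Data Point is a dict of component -> value.
DataPoint = dict[str, Any]


def subspace(dataset: list[DataPoint], spec: Mapping[str, Any]) -> list[DataPoint]:
    """Sketch of op [ sub identifier = value, ... ] on a list of Data Points."""
    # Step 1: create a virtual copy VDS of op.
    vds = [dict(dp) for dp in dataset]
    # Step 2: keep only the Data Points matching every specified identifier.
    vds = [dp for dp in vds if all(dp[k] == v for k, v in spec.items())]
    # Step 3: drop the specified identifiers; all other components are kept.
    return [{k: v for k, v in dp.items() if k not in spec} for dp in vds]


# DS_1 from the Examples, one dict per Data Point.
COLS = ("Id_1", "Id_2", "Id_3", "Me_1", "At_1")
DS_1 = [dict(zip(COLS, row)) for row in [
    (1, "A", "XX", 20, "F"), (1, "A", "YY", 1, "F"),
    (1, "B", "XX", 4, "E"), (1, "B", "YY", 9, "F"),
    (2, "A", "XX", 7, "F"), (2, "A", "YY", 5, "E"),
    (2, "B", "XX", 12, "F"), (2, "B", "YY", 15, "F"),
]]

print(subspace(DS_1, {"Id_1": 1, "Id_2": "A"}))
# [{'Id_3': 'XX', 'Me_1': 20, 'At_1': 'F'}, {'Id_3': 'YY', 'Me_1': 1, 'At_1': 'F'}]
print(subspace(DS_1, {"Id_1": 1, "Id_2": "B", "Id_3": "YY"}))
# [{'Me_1': 9, 'At_1': 'F'}]
```

The explicit copy in step 1 only mirrors the procedural description; the filter in step 2 already leaves the input list unchanged.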
The result Data Set does not violate the functional constraint: after the filter of step 2, every specified identifier has the same value in all the remaining Data Points, so dropping these identifiers cannot make two distinct Data Points coincide. In other words, given that the input Data Set is a 1st order function and therefore does not contain duplicates, the result Data Set is a 1st order function as well. To show this, let K₁,…,Kₘ,…,Kₙ be the Identifier Components of a generic input Data Set DS, and suppose that K₁,…,Kₘ are assigned fixed values by the subspace operator. A duplicate could arise only if the result contained two Data Points DPᵣ₁ and DPᵣ₂ with the same values for Kₘ₊₁,…,Kₙ. This is impossible: DPᵣ₁ and DPᵣ₂ also had the same (fixed) values for K₁,…,Kₘ in DS, so they would have had identical values for all of K₁,…,Kₙ, whereas DS did not contain duplicates.
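The argument can be checked mechanically on the Identifier tuples (K₁, K₂, K₃) of DS_1. The Python sketch below (the helper names `drop_positions` and `subspace_keys` are hypothetical) fixes every non-empty combination of Identifier positions to values that occur in the data and asserts that the remaining tuples are still distinct. It also shows that dropping K₁ and K₂ without the filter of step 2 would produce duplicates.

```python
import itertools
from typing import Any, Collection


def drop_positions(key: tuple, positions: Collection[int]) -> tuple:
    """Remove the components at the given positions from an identifier tuple."""
    return tuple(x for i, x in enumerate(key) if i not in positions)


def subspace_keys(keys: list, fixed: dict[int, Any]) -> list:
    """Filter on the fixed positions (step 2), then drop them (step 3)."""
    kept = [k for k in keys if all(k[i] == v for i, v in fixed.items())]
    return [drop_positions(k, fixed) for k in kept]


# Identifier tuples (K1, K2, K3) of DS_1: all distinct, so DS_1 has no duplicates.
keys = list(itertools.product([1, 2], ["A", "B"], ["XX", "YY"]))
assert len(set(keys)) == len(keys)

# Fix every non-empty subset of positions to values taken from existing tuples.
for r in range(1, 4):
    for positions in itertools.combinations(range(3), r):
        for key in keys:
            fixed = {i: key[i] for i in positions}
            result = subspace_keys(keys, fixed)
            assert len(set(result)) == len(result), fixed

# Without the filter of step 2, dropping K1 and K2 yields duplicates.
projected = [drop_positions(k, {0, 1}) for k in keys]
print(len(projected), len(set(projected)))  # 8 2
```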
If we consider the vector space of Data Points identified by the n-tuples of Identifier Component values of a Data Set DS(K₁,…,Kₙ,…) (equipped, e.g., with the operators of sum and multiplication), the subspace operator restricts this space to a subspace with fewer Identifiers. This can also be seen as the equivalent of a dice operation performed on hypercubes in multidimensional data warehousing.
Examples
Given the Data Set DS_1:
Input DS_1
Id_1 | Id_2 | Id_3 | Me_1 | At_1 |
---|---|---|---|---|
1 | A | XX | 20 | F |
1 | A | YY | 1 | F |
1 | B | XX | 4 | E |
1 | B | YY | 9 | F |
2 | A | XX | 7 | F |
2 | A | YY | 5 | E |
2 | B | XX | 12 | F |
2 | B | YY | 15 | F |
Example 1
DS_r := DS_1 [ sub Id_1 = 1, Id_2 = "A" ];
results in:
Id_3 | Me_1 | At_1 |
---|---|---|
XX | 20 | F |
YY | 1 | F |
Example 2
DS_r := DS_1 [ sub Id_1 = 1, Id_2 = "B", Id_3 = "YY" ];
results in:
Me_1 | At_1 |
---|---|
9 | F |
Example 3
DS_r := DS_1 [ sub Id_2 = "A" ] + DS_1 [ sub Id_2 = "B" ];
results in:
Id_1 | Id_3 | Me_1 |
---|---|---|
1 | XX | 24 |
1 | YY | 10 |
2 | XX | 19 |
2 | YY | 20 |
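Example 3 combines two subspaces of DS_1 with +. The Python sketch below reproduces its result under a simplified model of Data Set addition, which is an assumption made for illustration only: Data Points are paired when their remaining Identifiers (Id_1, Id_3) are equal, the Measure Me_1 is summed, and the Attribute At_1 is not carried into the result, matching the output shown above. The helper names `subspace` and `add` are hypothetical.

```python
from typing import Any, Mapping, Sequence

# DS_1 from the Examples, one dict per Data Point.
COLS = ("Id_1", "Id_2", "Id_3", "Me_1", "At_1")
DS_1 = [dict(zip(COLS, row)) for row in [
    (1, "A", "XX", 20, "F"), (1, "A", "YY", 1, "F"),
    (1, "B", "XX", 4, "E"), (1, "B", "YY", 9, "F"),
    (2, "A", "XX", 7, "F"), (2, "A", "YY", 5, "E"),
    (2, "B", "XX", 12, "F"), (2, "B", "YY", 15, "F"),
]]


def subspace(ds: list[dict], spec: Mapping[str, Any]) -> list[dict]:
    """Keep the Data Points matching spec, then drop the specified identifiers."""
    return [{k: v for k, v in dp.items() if k not in spec}
            for dp in ds if all(dp[k] == v for k, v in spec.items())]


def add(left: list[dict], right: list[dict],
        ids: Sequence[str], measures: Sequence[str]) -> list[dict]:
    """Simplified Data Set addition (assumption): pair Data Points with equal
    identifier values, sum the measures, and do not carry attributes."""
    index = {tuple(dp[i] for i in ids): dp for dp in right}
    result = []
    for dp in left:
        key = tuple(dp[i] for i in ids)
        if key in index:  # Data Points without a match are left out
            row = dict(zip(ids, key))
            row.update({m: dp[m] + index[key][m] for m in measures})
            result.append(row)
    return result


DS_r = add(subspace(DS_1, {"Id_2": "A"}), subspace(DS_1, {"Id_2": "B"}),
           ids=("Id_1", "Id_3"), measures=("Me_1",))
for dp in DS_r:
    print(dp)
# {'Id_1': 1, 'Id_3': 'XX', 'Me_1': 24}
# {'Id_1': 1, 'Id_3': 'YY', 'Me_1': 10}
# {'Id_1': 2, 'Id_3': 'XX', 'Me_1': 19}
# {'Id_1': 2, 'Id_3': 'YY', 'Me_1': 20}
```

Both operands share the Identifiers Id_1 and Id_3 only because the subspace has already dropped Id_2; without it the two Data Sets would have no matching Data Points to add.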