Learn the basics of ECL, the powerful programming language built for
big data analytics.
File type
ECL has two kinds of operations: definitions and actions (executions).
Once a file contains an EXPORT definition, it can no longer contain
actions.
Correspondingly, ECL has two kinds of files, both with the .ecl
suffix: definition files and BWR (Builder Window Runnable) files. The
differences are:
A definition file always contains EXPORT or SHARED definitions and
contains no actions. Therefore, the file cannot be executed through
Submit.
A BWR file contains at least one action and no EXPORT or SHARED
definitions.
Define variables
A definition name cannot contain spaces, and every definition ends
with a semicolon. The basic format is Name := Expression; with the
value type optionally written before the name.
An integer type is written as [IntType] [UNSIGNED] INTEGER[n], where:
n is the number of bytes the integer occupies, from 1 to 8. The
default is 8.
IntType specifies the byte order, that is, whether the high-order
byte or the low-order byte is stored at the low address. It can be
either BIG_ENDIAN or LITTLE_ENDIAN. The default is LITTLE_ENDIAN.
UNSIGNED makes the integer unsigned. The default is signed.
REAL[n] represents a floating point number, where n can be 4
(7 significant digits) or 8 (15 significant digits).
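As a small sketch of these type options, the definitions below declare integers of different widths and signedness, a record field with an explicit byte order, and both REAL widths. All names (SmallValue, PortNumber, WireRec, and so on) are illustrative and do not come from the original text.

```ecl
// Integer definitions: the value type is written before the name
INTEGER1 SmallValue := 100;              // 1-byte signed integer
UNSIGNED INTEGER2 PortNumber := 8080;    // 2-byte unsigned integer
INTEGER WideValue := 1234567890123;      // n omitted, so 8 bytes

// Byte order is most relevant for record fields read from binary data
WireRec := RECORD
    BIG_ENDIAN UNSIGNED INTEGER4 payloadSize;  // high-order byte first
END;

// Floating point definitions
REAL4 Ratio := 0.25;                     // about 7 significant digits
REAL8 PreciseRatio := 0.123456789012345; // about 15 significant digits

OUTPUT(SmallValue + PortNumber);
OUTPUT(PreciseRatio);
```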
Set
All elements must be of the same type.
Example:
SetInts := [1,2,3,4,5]; // an INTEGER set with 5 elements
SetExp := [1,2+3,45,SomeIntegerDefinition,7*3];
SetSomeField := SET(SomeFile, SomeField);
Set elements can be accessed by index. Indexing starts at 1.
MySet := [5,4,3,2,1];
ReverseNum := MySet[2]; // indexing to MySet's element number 2,
                        // so ReverseNum contains the value 4
A string is treated as a set of 1-character elements, so it can also
be accessed by index:
MyString := 'ABCDE';
MySubString := MyString[2]; // MySubString is 'B'
Strings support range access:
MyString := 'ABCDE';
MySubString1 := MyString[2..4]; // MySubString1 is 'BCD'
MySubString2 := MyString[ ..4]; // MySubString2 is 'ABCD'
MySubString3 := MyString[2.. ]; // MySubString3 is 'BCDE'
The element type of a set can be specified:
SET OF INTEGER1 SetValues := [5,10,15,20];
IsInSetFunction(SET OF INTEGER1 x=SetValues,y) := y IN x;
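The typed set parameter above also has a default value, which can be exercised as in the short sketch below. The two definitions are repeated so that the snippet is self-contained; the OUTPUT calls are an added illustration of passing and omitting the set argument.

```ecl
SET OF INTEGER1 SetValues := [5,10,15,20];
IsInSetFunction(SET OF INTEGER1 x=SetValues, y) := y IN x;

OUTPUT(IsInSetFunction([1,2,3,4], 5)); // false: 5 is not in the set passed
OUTPUT(IsInSetFunction(, 5));          // true: x defaults to SetValues
```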
EXPORT
Usage: EXPORT [VIRTUAL] definition. Only one EXPORT Module is allowed
in each file, and the name of this Module must be the same as the file
name.
VIRTUAL is optional and is valid only inside a MODULE structure. An
exported definition can be used from other files as
Module.Definition.
EXPORT definitions can be nested. To access a value inside a Module
from another file, that value must itself be marked EXPORT.
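A minimal sketch of these rules, assuming a hypothetical definition file named MathUtils.ecl. The module and member names are invented for illustration. The commented lines at the end show how a separate BWR file in the same folder might use the exported members.

```ecl
// MathUtils.ecl: a definition file, so the module name matches the file name
EXPORT MathUtils := MODULE
    SHARED Offset := 10;                        // not reachable as MathUtils.Offset from other files
    EXPORT Twice(INTEGER x) := x * 2;           // reachable as MathUtils.Twice
    EXPORT AddOffset(INTEGER x) := x + Offset;  // may use the SHARED value internally
END;

// In a separate BWR file in the same folder:
// IMPORT $;
// OUTPUT($.MathUtils.Twice(21));
// OUTPUT($.MathUtils.AddOffset(5));
```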
ENUM
An ENUM is useful when a variable or a parameter can take only a
limited set of possible values. For example:
Color := ENUM(RED=1, GREEN=2, BLUE=3);
myColor := Color.RED;
OUTPUT(myColor); // 1
RECORD
A RECORD in ECL represents the structure or format of a
dataset. It is similar to the concept of a "table" in a SQL database,
where each field in the record is similar to a column in the table. It
defines the data types and names of fields.
For example:
ChildRec := RECORD
    UNSIGNED4 person_id;
    STRING20 per_surname;
    STRING20 per_forename;
END;

rec := RECORD
    STRING20 name;
    INTEGER4 age;
    BOOLEAN isEmployed;
END;
A RECORD is usually used together with DATASET.
DATASET
It represents a set of data. A dataset is a group of records with the
same record layout. A record layout is defined using the
RECORD structure, which contains a set of fields, each with
a name and a type.
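As a sketch of how a record layout and a dataset fit together, the snippet below builds a small inline dataset from the rec layout shown in the RECORD section. The sample rows and the name people are made up for illustration.

```ecl
rec := RECORD
    STRING20 name;
    INTEGER4 age;
    BOOLEAN isEmployed;
END;

// Inline dataset: every row follows the rec layout
people := DATASET([{'Alice', 30, TRUE},
                   {'Bob', 45, FALSE},
                   {'Carol', 52, TRUE}], rec);

OUTPUT(people);            // all records
OUTPUT(people(age > 40));  // only the records that pass the filter
```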
OUTPUT
For example, to output the value 1111 as a result named test, you can
write:
OUTPUT(1111, NAMED('test'));
TABLE
Used to create a new dataset. The TABLE function takes a recordset, a
RECORD structure that defines the output format, and optional group-by
expressions, and returns a new dataset.
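A common use of TABLE is a cross-tab report. The sketch below, with invented sample data and names, groups people by city and computes a count and an average per group.

```ecl
personRec := RECORD
    STRING20 name;
    STRING10 city;
    INTEGER4 age;
END;

people := DATASET([{'Alice', 'Paris', 30},
                   {'Bob', 'Paris', 40},
                   {'Carol', 'Rome', 50}], personRec);

// Output format: one record per group, with aggregates over the group
cityStats := RECORD
    people.city;
    UNSIGNED4 headcount := COUNT(GROUP);
    REAL8 averageAge := AVE(GROUP, people.age);
END;

byCity := TABLE(people, cityStats, people.city);  // third argument is the group-by expression
OUTPUT(byCity);
```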
NORMALIZE
PROJECT applies a TRANSFORM to every record and produces a new
dataset with exactly one output record per input record. NORMALIZE
instead expands a single record into multiple records. For example, a
field may contain repeated data, and you may wish to split each
repetition into a separate record. NORMALIZE takes a dataset, an
expression giving the number of times to call the transform for each
record, and a TRANSFORM function, and returns a new dataset. The
TRANSFORM function defines how each original record is split into new
records.
Layout := RECORD
    STRING10 name;
    INTEGER4 times;
END;
ds.times gives the number of times each record is repeated, and the
TRANSFORM function defines how an original record is converted into a
new record. In the output dataset, John and Jane appear 3 and 2 times,
respectively. Note that inside NORMALIZE the current record is
referenced with LEFT.
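The behavior described above can be sketched as follows. This is a reconstruction under assumptions, not the original listing: the inline dataset ds, the output layout OutLayout, the transform name Expand, and the seq field are all illustrative.

```ecl
Layout := RECORD
    STRING10 name;
    INTEGER4 times;
END;

ds := DATASET([{'John', 3}, {'Jane', 2}], Layout);

OutLayout := RECORD
    STRING10 name;
    INTEGER4 seq;   // which repetition this output record is
END;

// Called once per repetition; C is the repetition counter, starting at 1
OutLayout Expand(Layout L, INTEGER C) := TRANSFORM
    SELF.name := L.name;
    SELF.seq := C;
END;

// LEFT.times is the number of transform calls for the current record
expanded := NORMALIZE(ds, LEFT.times, Expand(LEFT, COUNTER));
OUTPUT(expanded);
```

With this input, the result holds five records: three for John and two for Jane.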
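EMBED
EMBED lets a function body be written in another language, such as
Python:
IMPORT Python3 AS Python; // assumes the Python 3 plugin is installed
INTEGER4 MySquare(INTEGER4 val) := EMBED(Python)
  return val * val
ENDEMBED;
OUTPUT(MySquare(5)); // 25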
LOCAL
The LOCAL keyword is mainly used to limit the scope of
data or functions, or to control how data is distributed in the
cluster.
If a definition (for example, a dataset or function) is declared
LOCAL, then this definition is only visible in the ECL
statement in which it is declared. This is similar to local variables in
other programming languages. For example:
myFunction := FUNCTION
    LOCAL myLocalValue := 5; // Only available in this function
    RETURN myLocalValue * 2;
END;
Additionally, when the LOCAL keyword is used on a
dataset, it indicates that the dataset should be computed locally on
each node, rather than across the entire cluster. This can be useful for
reducing network communication and speeding up calculations. For
example, the following ECL statement creates a dataset that is computed
locally on each node:
In this example, myRecordDef is a record definition that
describes the structure of each record in the dataset. Each node will
process a portion of this dataset, not the entire dataset.
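One well-documented way to request node-local processing is the LOCAL option accepted by operations such as SORT. The sketch below is an illustration under assumptions and is not the statement the text refers to: the fields of myRecordDef, the sample rows, and the other names are invented.

```ecl
myRecordDef := RECORD
    STRING20 name;
    INTEGER4 age;
END;

people := DATASET([{'Ann', 31}, {'Bob', 25}, {'Cid', 40}], myRecordDef);

// Spread the records across the nodes by a hash of the name field
distributed := DISTRIBUTE(people, HASH32(people.name));

// LOCAL: each node sorts only the records it holds, with no exchange between nodes
sortedPerNode := SORT(distributed, age, LOCAL);
OUTPUT(sortedPerNode);
```

Because each node sorts independently, the combined result is ordered within each node's portion but not necessarily across the whole dataset.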
ASSERT
ASSERT is often used to check whether a function returns the
expected result.
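A minimal sketch of such a check, using a hypothetical SquareOf function. The second argument to ASSERT is the message reported when the condition is false.

```ecl
INTEGER4 SquareOf(INTEGER4 val) := val * val;

// Passes silently when the condition is true
ASSERT(SquareOf(5) = 25, 'SquareOf(5) should be 25');

// Would report the message, since 3 * 3 is 9
ASSERT(SquareOf(3) = 10, 'SquareOf(3) should be 10');

OUTPUT(SquareOf(5));
```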