

Talend Data Integration Benchmark

Talend Integration Suite 4.2 MPx, July 25, 2011

© Talend 2011 1
Configuration
Hardware
• CPU: 2 x Intel Xeon E5320 (8 cores), 1.8 GHz
• RAM: 14 GB
• HD: 1 TB, 7200 rpm

Software
• Operating System: Red Hat, 64-bit
• JVM: Sun 1.6
• MySQL 5.0 with MyISAM engine
• Edition: Talend Integration Suite 4.2 MPx

Data
• Files have been automatically (randomly) generated

Scenario 1
File Input Delimited > File Output Delimited
Reading X lines from a delimited file and writing them to a delimited file without performing any transformations.

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        0.89            112,994
1,000,000                      5.24            190,694
5,000,000                     24.43            204,641
20,000,000                    95.69            209,017

Flat file read/write: +209K rows/s!
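
The Perf. column appears to be the row count divided by the processing time. The sketch below recomputes it for the Scenario 1 figures under that assumption; the function name `rows_per_second` is introduced here for illustration only. Small differences from the published values are to be expected if the published times were rounded.

```python
def rows_per_second(rows: int, seconds: float) -> float:
    """Throughput as rows divided by elapsed seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows / seconds


# (rows, processing time in seconds), copied from the Scenario 1 table.
scenario_1 = [
    (100_000, 0.89),
    (1_000_000, 5.24),
    (5_000_000, 24.43),
    (20_000_000, 95.69),
]

for rows, seconds in scenario_1:
    rate = rows_per_second(rows, seconds)
    print(f"{rows:>12,} rows in {seconds:>6.2f} s -> {rate:,.0f} rows/s")
```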

Scenario 2 & 3
File Input Delimited > MySQL
Loading X lines from a delimited file into a MySQL database (parallelized).

# Rows        Processing time (sec)   Perf. (rows/sec)
1,000,000                      3.69             27,100
5,000,000                    22.162             45,122
20,000,000                  113.357             44,108

Using the MySQL DB loader

# Rows        Processing time (sec)   Perf. (rows/sec)
1,000,000                      1.78             56,180
5,000,000                     18.83             53,107
20,000,000                    38.33             52,178
Scenario 4
MySQL > File Output Delimited
Dumping the content of a MySQL table into a delimited file.

# Rows        Processing time (sec)   Perf. (rows/sec)
1,000,000                      1.25             80,000
5,000,000                      4.61            108,460
20,000,000                   12.731             78,548

DB to flat file: +108K rows/s!
Scenario 5
File Input Delimited > Transform > File Output Delimited
Reading X lines from a delimited file and writing them into a delimited file after some transformations.

Transformations list:
• the `rate` field is multiplied by 100
• the new field `name` is a concatenation of fields (`firstname` + " " + `lastname`)
• the `address` field is converted to uppercase
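
The three transformations listed above can be sketched in a few lines. This is an illustration only, not the job Talend generates: the semicolon delimiter, the presence of a header row, and the choice to keep `firstname` and `lastname` alongside the new `name` field are all assumptions, and `transform` and `transform_delimited` are hypothetical names.

```python
import csv
import io


def transform(row: dict) -> dict:
    """Apply the three Scenario 5 transformations to one record."""
    out = dict(row)
    out["rate"] = float(row["rate"]) * 100                    # rate multiplied by 100
    out["name"] = f'{row["firstname"]} {row["lastname"]}'     # firstname + " " + lastname
    out["address"] = row["address"].upper()                   # address in uppercase
    return out


def transform_delimited(text: str, delimiter: str = ";") -> str:
    """Read delimited text with a header row and return the transformed text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fieldnames = list(reader.fieldnames or []) + ["name"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(transform(row))
    return buffer.getvalue()


sample = "firstname;lastname;rate;address\nAda;Lovelace;0.25;12 Main St\n"
print(transform_delimited(sample))
```

Rows are processed one at a time, so memory use stays flat regardless of file size, which is the property that matters for a pass-through transformation of this kind.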

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        0.96            104,384
1,000,000                      5.49            182,116
2,000,000                     10.58            188,982
5,000,000                     25.35            197,262
20,000,000                    99.84            200,314

Moderately complex transformations: +200K rows/s!

Scenario 6
File Input Delimited > Sort > File Output Delimited
Reading X lines from a delimited file and writing a sorted delimited file.
The file is sorted ascending according to the `age` (integer) and `firstname` (string) fields.
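
The sort order described above, ascending on `age` as an integer and then on `firstname` as a string, can be expressed as a composite key. This is a minimal in-memory sketch and says nothing about how Talend implements its sort; `sort_rows` is a hypothetical name.

```python
def sort_rows(rows):
    """Sort records ascending by integer age, then by firstname."""
    # Converting age to int matters: as strings, "100" would sort before "25".
    return sorted(rows, key=lambda r: (int(r["age"]), r["firstname"]))


people = [
    {"firstname": "Zoe", "age": "25"},
    {"firstname": "Adam", "age": "100"},
    {"firstname": "Bob", "age": "25"},
]

for person in sort_rows(people):
    print(person["age"], person["firstname"])
```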

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        1.75             57,274
1,000,000                     14.62             68,390
5,000,000                    150.98             33,116
20,000,000                   581.96             34,367

Java Sort: +34K rows/s

Scenario 7
MySQL table > Transformation & lookup > MySQL table
Reading X lines from a MySQL table, performing transformations and a lookup, and writing the result to a MySQL table using the ELT mode.
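
In an ELT approach the transformation and lookup run inside the database as SQL rather than in the integration engine. The sketch below shows the general shape with a single `INSERT ... SELECT` and a join. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and every table name, column name, and transformation in it is hypothetical, since the benchmark does not list the ones it used.

```python
import sqlite3


def run_elt():
    """Transform and look up inside the database with one INSERT ... SELECT."""
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, firstname TEXT,
                                lastname TEXT, rate REAL, country_id INTEGER);
        CREATE TABLE countries (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE customers_out (id INTEGER PRIMARY KEY, name TEXT,
                                    rate_pct REAL, country TEXT);
        INSERT INTO countries VALUES (1, 'France'), (2, 'Germany');
        INSERT INTO customers VALUES (1, 'Ada', 'Lovelace', 0.25, 1),
                                     (2, 'Alan', 'Turing', 0.5, 2);
        """
    )
    # The rows never leave the database: the join is the lookup,
    # and the SELECT expressions are the transformations.
    conn.execute(
        """
        INSERT INTO customers_out (id, name, rate_pct, country)
        SELECT c.id, c.firstname || ' ' || c.lastname, c.rate * 100, k.label
        FROM customers AS c
        JOIN countries AS k ON k.id = c.country_id
        """
    )
    rows = conn.execute(
        "SELECT id, name, rate_pct, country FROM customers_out ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


for row in run_elt():
    print(row)
```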

# Rows        Processing time (sec)   Perf. (rows/sec)
1,000,000                      0.95            104,275
5,000,000                      8.09             61,759
20,000,000                    13.16            151,941

ELT transformation & lookup: +150K rows/s

Scenario 8
File Input Delimited > Sort > File Output Delimited
Reading X lines from a delimited file and writing a sorted delimited file.
The file is sorted ascending according to the `age` (integer) and `firstname` (string) fields.

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        1.75             57,274
1,000,000                     14.62             68,390
5,000,000                    150.98             33,116
20,000,000                   581.96             34,367

Java Sort: +34K rows/s

Scenario 8 / MPx
File Input Delimited > Sort > File Output Delimited
Sorting an X-line delimited file using MPx technology.
The file is sorted ascending according to the `age` (integer) and `firstname` (string) fields.

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        0.51            194,932
1,000,000                      2.81            354,862
5,000,000                     13.60            367,512
20,000,000                    57.22            349,485

MPx Sort: 350K rows/s!

Scenario 9
File Input Delimited > Aggregate > File Output Delimited
Reading X lines from a delimited file, aggregating them, and writing the result to a delimited file.
The aggregation groups rows by the `age` field. Operations: COUNT, SUM(rate), AVG(rate), MIN(rate), MAX(rate)
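
The aggregation described above can be computed in a single pass with one running accumulator per `age` value, so memory grows with the number of distinct ages rather than the number of rows. This is a sketch under the assumption that `age` is an integer and `rate` is numeric; `aggregate_by_age` is a hypothetical name and not part of Talend.

```python
def aggregate_by_age(rows):
    """Group by age and compute COUNT, SUM, AVG, MIN and MAX of rate."""
    groups = {}
    for row in rows:
        age = int(row["age"])
        rate = float(row["rate"])
        acc = groups.get(age)
        if acc is None:
            # First row seen for this age: initialise the accumulator.
            groups[age] = {"count": 1, "sum": rate, "min": rate, "max": rate}
        else:
            acc["count"] += 1
            acc["sum"] += rate
            acc["min"] = min(acc["min"], rate)
            acc["max"] = max(acc["max"], rate)
    # AVG is derived at the end from SUM and COUNT.
    for acc in groups.values():
        acc["avg"] = acc["sum"] / acc["count"]
    return dict(sorted(groups.items()))


sample = [
    {"age": "30", "rate": "1.0"},
    {"age": "40", "rate": "2.0"},
    {"age": "30", "rate": "3.0"},
]

for age, stats in aggregate_by_age(sample).items():
    print(age, stats)
```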

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        0.97            103,093
1,000,000                     4.645            215,285
5,000,000                     20.46            244,379
20,000,000                   79.106            252,825

Aggregation: +250K rows/s!

Scenario 10
File Input Delimited > File Output Delimited
Reading X lines from a delimited file, performing a lookup against a 1-million-row file, filtering, and writing the results to different delimited files.
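
A lookup followed by a filtered split is commonly built by loading the lookup file into an in-memory hash map and then routing each main row to one of several outputs. The sketch below illustrates that pattern only. The benchmark does not state the join key, the looked-up field, or the filter, so the `id` key, the `country` field, the age threshold, and the output names are all hypothetical.

```python
def split_with_lookup(rows, lookup_rows, key="id"):
    """Enrich rows from a lookup table and route them to separate outputs."""
    # Build the lookup once; each main row then costs one dictionary access.
    lookup = {r[key]: r for r in lookup_rows}
    outputs = {"adults": [], "minors": [], "rejects": []}
    for row in rows:
        match = lookup.get(row[key])
        if match is None:
            outputs["rejects"].append(row)  # no lookup match
            continue
        enriched = {**row, "country": match["country"]}
        # Hypothetical filter: split on an age threshold of 18.
        target = "adults" if int(row["age"]) >= 18 else "minors"
        outputs[target].append(enriched)
    return outputs


main_rows = [
    {"id": "1", "firstname": "Ada", "age": "36"},
    {"id": "2", "firstname": "Tim", "age": "12"},
    {"id": "9", "firstname": "Eve", "age": "40"},
]
lookup_rows = [{"id": "1", "country": "UK"}, {"id": "2", "country": "FR"}]

for name, bucket in split_with_lookup(main_rows, lookup_rows).items():
    print(name, bucket)
```

In a real job each bucket would be streamed to its own delimited file rather than collected in a list.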

# Rows        Processing time (sec)   Perf. (rows/sec)
100,000                        0.92            108,696
1,000,000                      6.24            160,256
5,000,000                     26.43            189,179
20,000,000                    99.69            200,622

Flat file split & lookup: +200K rows/s!
