Big Data Benchmark Repository and Performance Analysis

Welcome to the ALOJA project.

ALOJA is an initiative of the BSC-MSR research centre in Barcelona to explore Hadoop's performance under different deployment scenarios. You can find introductory Slides and Papers in the ALOJA Reference menu.

This site is under constant development and is still being documented. For now, see the Slides and Papers in the ALOJA Reference menu. Feel free to browse the site and the code, and send inquiries, feature requests, or bug reports to: hadoop@bsc.es

If you're curious about the name of the project, visit ALOJA


Site's content:

Video DEMO of ALOJA
Brief video showcasing ALOJA's main online features (a bit outdated).
Benchmark Executions
This section presents the benchmark execution repository. It features more than 30,000 executions and counting.
This tool allows you to browse, filter, search, and select distinct executions to compare and analyse their execution details.
Hadoop Job Counters
The Hadoop Job Counters section lets you browse the counters output by each Hadoop execution, filter them, and sort the selected runs (or all runs) by a specific counter.
The section presents a summary of all the Job execution counters, the Map- and Reduce-specific counters, and the I/O subsystem counters.
It also shows per-task details, which help to understand the running time of each Map or Reduce process.
Best Configuration
This section shows the best configuration found for a given benchmark. The search can be narrowed by filtering on parameters such as the kind of cluster, number of mappers, block size, and so on.
Config improvement
The Configuration Improvement section evaluates the speed-up obtained with different hardware and software configurations of the Hadoop executions.
The page lets you filter and group results according to their execution configurations.
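As a rough illustration of the speed-up idea, the sketch below groups runs by configuration and divides a baseline's median execution time by each group's median. The formula, the configuration labels, and the timings are assumptions made for this example; the site's own calculation may differ.

```python
from statistics import median

# Hypothetical execution records: (configuration label, execution time in seconds).
# Labels and timings are invented for illustration, not taken from ALOJA.
executions = [
    ("HDD-ETH", 1200.0),
    ("HDD-ETH", 1000.0),
    ("SSD-ETH", 600.0),
    ("SSD-ETH", 400.0),
    ("SSD-IB", 250.0),
]


def speedups(records, baseline):
    """Return {config: baseline median time / config median time}."""
    by_config = {}
    for config, seconds in records:
        by_config.setdefault(config, []).append(seconds)
    medians = {config: median(times) for config, times in by_config.items()}
    base = medians[baseline]
    # A value above 1.0 means the configuration ran faster than the baseline.
    return {config: base / value for config, value in medians.items()}


result = speedups(executions, baseline="HDD-ETH")
for config, factor in sorted(result.items()):
    print(f"{config}: {factor:.2f}x")
```

Using the median rather than the mean is a design choice of this sketch; it keeps a single unusually slow run from distorting the comparison.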
Cost/Perf Evaluation
The Cost/Performance evaluation tool presents a scatter plot of the different Hadoop executions and evaluates the cost-effectiveness of both:

- the hardware, e.g., SSDs or InfiniBand
- the Hadoop configuration, e.g., number of concurrent mappers, replication, or compression.
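One possible way to reason about cost-effectiveness is sketched below: each run's cost is taken as its execution time multiplied by an hourly cluster price, and a run is kept only if no other run is both faster and cheaper. The cost model, the prices, and the run labels are assumptions for illustration, not the tool's documented method.

```python
# Hypothetical runs: (label, execution time in seconds, cluster price in USD per hour).
# All values are invented for illustration.
runs = [
    ("HDD, 4 mappers", 3600.0, 10.0),
    ("SSD, 4 mappers", 1800.0, 14.0),
    ("SSD + InfiniBand, 8 mappers", 1500.0, 30.0),
    ("HDD, 8 mappers", 4000.0, 10.0),
]


def run_cost(seconds, usd_per_hour):
    """Assumed cost model: hours of execution times the hourly cluster price."""
    return seconds / 3600.0 * usd_per_hour


def pareto_front(points):
    """Keep the (label, seconds, cost) points that no other point dominates."""
    front = []
    for label, seconds, cost in points:
        dominated = any(
            s <= seconds and c <= cost and (s < seconds or c < cost)
            for _, s, c in points
        )
        if not dominated:
            front.append((label, seconds, cost))
    return front


points = [(label, seconds, run_cost(seconds, rate)) for label, seconds, rate in runs]
front = pareto_front(points)
for label, seconds, cost in front:
    print(f"{label}: {seconds:.0f} s, {cost:.2f} USD")
```

In a scatter plot of time against cost, the points kept by this filter are the ones along the lower-left edge of the cloud.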
Parameter Evaluation
The Parameter Evaluation section presents a column chart showing how long executions took to run for each possible value of a selected parameter. The picture shows an example of how the number of mappers affects the execution time.
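The aggregation behind such a chart can be sketched as a group-by on the selected parameter followed by an average of the execution times. The records below are hypothetical, and averaging with the mean is an assumption; the site may aggregate differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical runs: (number of mappers, execution time in seconds).
runs = [(4, 900.0), (4, 1100.0), (6, 800.0), (8, 700.0), (8, 900.0)]


def mean_time_by_value(records):
    """Group execution times by parameter value and average each group."""
    grouped = defaultdict(list)
    for value, seconds in records:
        grouped[value].append(seconds)
    return {value: mean(times) for value, times in sorted(grouped.items())}


averages = mean_time_by_value(runs)
for value, seconds in averages.items():
    # One '#' per 100 seconds gives a crude text version of the column chart.
    print(f"{value} mappers: {'#' * round(seconds / 100)} {seconds:.0f} s")
```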
Performance Charts
The Performance Charts section first gives a visual overview of the full Hadoop execution of the selected runs by analyzing each of the Hadoop phases: map, merge, shuffle, and reduce. It also lets you dig deeper into the performance metrics and results of the executions, in order to understand the amount of resources used and to contrast visually the differences between the selected executions.
The analysis covers CPU utilization, number of processes and context switches, memory and paging, networking, and the I/O subsystem.
Performance Metrics
The Performance Metrics section shows all the performance metrics collected during the benchmark executions, allowing the user to see how much CPU, network, I/O, and memory was used.