Development of solutions and investigation of new methods for resource analysis and optimization in MapReduce/Hadoop and Spark environments. Investigation of performance indicators and optimization parameters of large-scale computing environments applied to real-world problems in data science and machine learning. These problems include classification, clustering, regression, data streaming, and prediction over high volumes of data.
This project aims to analyse the capabilities and weaknesses of current high-performance computing platforms and to apply machine learning techniques such as neural networks, bio-inspired algorithms and classical statistics to optimize system parameters for different classes of problems. Open-source paradigms, frameworks and libraries such as Hadoop/MapReduce, Spark, Flink, Storm, H2O, MLlib, Mahout and SAMOA are investigated. Techniques such as Support Vector Machines, Evolutionary Algorithms, Particle Swarm Optimization, Cuckoo Search, Ant Colony Optimization, Bee Algorithm, Bat Algorithm, Firefly Algorithm and Logistic Regression are also considered for the optimization task.
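As a rough illustration of how a bio-inspired method such as Particle Swarm Optimization could be used to tune system parameters, a minimal sketch in Python follows. It is not the project's implementation. The two parameters (executor memory and number of shuffle partitions), their bounds, and the synthetic_cost function are hypothetical stand-ins chosen for the example; in a real study the cost would be a measured indicator such as job runtime on the cluster.

```python
import random

# Hypothetical search space: (executor memory in GB, number of shuffle partitions).
BOUNDS = [(1.0, 16.0), (8.0, 512.0)]


def synthetic_cost(params):
    """Stand-in for a measured job runtime; NOT a real benchmark.

    By construction the minimum is at 8 GB and 200 partitions.
    """
    mem_gb, partitions = params
    return (mem_gb - 8.0) ** 2 + ((partitions - 200.0) / 50.0) ** 2


def pso(cost, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that minimises `cost` inside box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the swarm-wide (global) best.
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    gbest_pos, gbest_cost = best_pos[g][:], best_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest_pos, gbest_cost = pos[i][:], c
    return gbest_pos, gbest_cost


best, best_value = pso(synthetic_cost, BOUNDS)
print("best parameters:", best, "cost:", best_value)
```

In an actual experiment, synthetic_cost would be replaced by a function that launches the job with the candidate configuration and returns the observed indicator. Since each evaluation is then expensive, the swarm size and iteration count would likely need to be much smaller than the defaults assumed here.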