
Week 6 :- CERN-HSF @ GSoC 2018

This week, I worked on adding Job Attributes such as Owner, Job Group and Running Time to ElasticSearch.

The main idea behind this move is that, in the future, we would like these attributes to be stored as part of the Job Parameters. This would make queries easier and more efficient, since these attributes are accessed in most of the functions.

The changes/additions made to the existing code are:

1. Modified "setJobParameter" to accept these attributes as optional keyword arguments, meaning they can be passed to the function but are not required. The index has been extended with the following attributes:

  • Job Group
  • Owner
  • Proxy
  • Submission Time
  • Running Time
2. Added a "getJobParametersAndAttributes" function that returns the results for a given JobID, containing both Parameters (Name, Value) and Attributes (all five mentioned above).
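
To make the two changes above more concrete, here is a minimal sketch of the idea. It is not the actual DIRAC code: the class InMemoryJobStore, the dictionary layout and the keyword names (JobGroup, Owner, Proxy, SubmissionTime, RunningTime) are assumptions made for illustration, and a plain Python dictionary stands in for the ElasticSearch index.

```python
from typing import Any, Dict


class InMemoryJobStore:
    """Hypothetical stand-in for the ES-backed index (not DIRAC's real class)."""

    # Assumed keyword names for the five attributes listed above.
    ATTRIBUTES = ("JobGroup", "Owner", "Proxy", "SubmissionTime", "RunningTime")

    def __init__(self) -> None:
        # One document per JobID, holding parameters and attributes separately.
        self._docs: Dict[int, Dict[str, Dict[str, Any]]] = {}

    def setJobParameter(self, jobID: int, name: str, value: Any, **attributes: Any) -> None:
        """Store a (name, value) parameter; attributes are optional keywords."""
        unknown = set(attributes) - set(self.ATTRIBUTES)
        if unknown:
            raise ValueError("Unknown attributes: %s" % ", ".join(sorted(unknown)))
        doc = self._docs.setdefault(jobID, {"Parameters": {}, "Attributes": {}})
        doc["Parameters"][name] = value
        # Only the attributes that were actually passed get updated.
        doc["Attributes"].update(attributes)

    def getJobParametersAndAttributes(self, jobID: int) -> Dict[str, Dict[str, Any]]:
        """Return both parameters and attributes for one JobID."""
        doc = self._docs.get(jobID)
        if doc is None:
            return {}
        # Return copies so callers cannot modify the stored document.
        return {
            "Parameters": dict(doc["Parameters"]),
            "Attributes": dict(doc["Attributes"]),
        }


store = InMemoryJobStore()
# Attributes are sent as keywords together with a parameter...
store.setJobParameter(1, "Status", "Running", Owner="alice", JobGroup="prod")
# ...but they can also be left out entirely.
store.setJobParameter(1, "CPUTime", "120")
result = store.getJobParametersAndAttributes(1)
```

Because the attributes are keywords, existing callers that only pass a name and a value keep working without any change.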

Along with this, existing code was modified to use these functions, and some issues were fixed in the tests, Monitoring, etc.

All my commits can be found below:
  1. Add attributes to ES: 677ad2333d031c82e502aba20d48ac3118ec3ddb
  2. Fix issues in Monitoring and MySQL: 61180f8da0a6bd6d4da3c13203a5449a81f93166
That's all done for the week 😄.

