Design and Programming for Multicore Machines: An Empirical Study of the Time and Effort Required by Programmers

Abstract. As the demand for high-performance computing continues to surge, harnessing the full potential of multicore architectures has become paramount. This paper explores a pragmatic approach to the transition from sequential to parallel programming, capitalizing on the computational prowess of modern hardware systems. Recognizing the challenges of enforcing parallelism in early software development phases, we advocate a focus on the implementation stage, where architects, designers, and developers can seamlessly introduce parallel constructs while preserving software integrity. To facilitate this paradigm shift, we introduce the "SDLC model with Parallel Constructs," a modified Software Development Life Cycle (SDLC) framework comprising two additional phases: "Parallel Constructs" and "Test Parallel Constructs." This model empowers development teams to integrate parallel computing efficiently, enhancing performance while maintaining a structured development process. Our observations reveal intriguing dynamics. Initially, the single-threaded program outperforms its parallel counterpart for smaller datasets, but as data sizes grow, the parallel version demonstrates superior performance. We underscore the pivotal role of available CPU cores and task partitioning in determining efficiency. Our analysis also evaluates the programmer's effort, measured in lines of code, needed for the transition. Leveraging OpenMP constructs streamlines this transition, reducing programming complexity.


Introduction
Traditionally, problem-solving through programming has followed a linear path: design a solution, then implement it as a sequence of operations or statements executed from START to END. This approach was well suited to the computing hardware of past decades, which consisted primarily of single-core machines. These machines executed a single instruction stream, allowing only single-threaded execution at any given time.
However, as software applications grew in complexity, demanding increased performance and the ability to perform multiple tasks simultaneously, hardware designers and architects faced a significant challenge. They responded by packing more transistors onto a single chip, a trend famously captured by Moore's Law. Advances in nanotechnology further reduced transistor size, facilitating the integration of even more transistors onto a single chip. Nevertheless, a point was reached where adding more transistors became impractical, resulting in excessive power consumption and limited transistor lifespan. This technological impasse prompted a shift toward a new paradigm: the integration of multiple cores on a single die, giving rise to multicore machines. The last single-core machine was released in 2013, marking the ubiquity of multicore machines across a wide range of electronic devices, from embedded systems and smartphones to workstations and data servers.
Context switching between application threads was introduced in single-core systems to improve application responsiveness, but it proved insufficient for the demands of increasingly complex applications. A single-core machine can execute only one thread at a time, a limitation that became apparent when running large applications: true parallelism was lacking, waiting threads had to take turns, and the processor became a bottleneck relative to faster components elsewhere in the system. Single-core processors still offer advantages, however. They tend to have lower power consumption and manufacturing costs, and they remain well suited to embedded systems designed for specific tasks.
To address the performance requirements of modern software applications, such as big data analysis and weather forecasting, innovative approaches are essential. These include designing more efficient algorithms, introducing domain-specific programming languages, and developing domain-specific chips or systems.
In this paper, we delve into strategies for fully harnessing the power of multicore machines from a software development [6] perspective. We recognize that it is not always practical to force software analysts, designers, and architects to think exclusively in terms of parallel or concurrent execution, as doing so may lead to suboptimal designs. Instead, we explore the use of compilers, open-source tools, and software libraries that provide directives, APIs, and packages for parallel computation [7]. By leveraging these resources and making minor adjustments to traditional sequential programs, we can optimize program performance [8] and make the most of multicore systems.

Related work
In recent years, experiments have been carried out to evaluate the performance of multicore machines and GPUs. The variables measured include the number of CPU cores, the number of threads, execution time, shared CPU resources, and the parallel program constructs themselves. Table 1 summarizes some of these studies.

Proposed Methodology
During software development, the primary emphasis lies on comprehending the problem, collecting requirements, and, subsequently, the design phase. At this juncture, it is often impractical to mandate that architects and designers immediately embrace parallel computation concepts. Instead, we advocate shifting this focus to a later stage: the implementation phase. To facilitate this transition, we have redefined the traditional Software Development Life Cycle (SDLC) model, introducing two additional phases, namely "Parallel Constructs" and "Test Parallel Constructs." We refer to this modified model as the "SDLC model with Parallel Constructs"; Figure 1 below illustrates the updated framework. Under the adapted SDLC model, software development models and methodologies can be suitably revised to incorporate parallel constructs within the implementation and testing phases. During the testing phase, our objectives expand beyond functional testing: we measure and quantify the efficiency gains resulting from the integration of parallel constructs, and we rigorously check for any memory leaks that may arise.
Measuring the effort required by software engineers to transition from a sequential algorithmic design to a parallel execution paradigm is paramount. Two key metrics for this assessment are the number of lines of code (LOC) and the time individuals spend inserting parallel constructs, which is considered a complex, low-level language task.

Methodology Steps
The overarching steps of this methodology are presented below:

I. For any software project, adhere to the standard SDLC process, encompassing problem understanding, requirement collection, and the design of the solution.
II. Document the corresponding algorithm as the proposed solution for the given requirements.
III. During the implementation phase, integrate parallel constructs tailored to specific domains, or those with which the software developers are proficient. These constructs can be sourced from various organizations and open-source communities.
IV. Conduct comprehensive performance assessments to gauge the effectiveness of the application's parallelized components.

It is crucial to note that, within this methodology, modifications are made exclusively during the implementation phase. This ensures that the correctness and completeness of the software solution are maintained throughout the development cycle.
By following these methodological steps, software development teams can efficiently incorporate parallel computing concepts into their projects, optimizing application performance while preserving software integrity and completeness.

Case study
An empirical study was conducted to investigate the impact of integrating parallel constructs into a sequential program. The program selected for this study is scalar multiplication, a classic example involving sequential array operations and multiplication. The speed-up in each case is presented in Table 3; it is determined by dividing the time taken by the original program, T(O), by the time taken by the original program with OpenMP added, T(N). For example, a run that takes 10 s sequentially and 0.2 s with OpenMP achieves a speed-up of 50. The key observations follow.
I. Initially, the single-threaded program outperforms the parallel version when the array size is small. As the array size increases, however, the parallel version performs better.
II. When the array size is substantially large, the parallel version of the program demonstrates a significant speed-up, roughly fifty times faster than the single-threaded program (in the 7th case, where n = 100,000,000).
III. The number of threads that execute in parallel is constrained by the number of CPU cores available in the hardware system.
IV. The efficiency of parallel execution also depends on how the programmer partitions the task into sub-tasks and assigns them to threads. When the sub-tasks have uniform workloads, all threads complete at roughly the same time, fully utilizing the available CPU cores; imbalanced workloads leave some threads finishing early and some cores underutilized.
V. The parallel version of the program effectively leverages all available CPU cores in the hardware system, in contrast to the single-threaded program.
VI. The additional effort required of the programmer is quantified by comparing the lines of code of the single-threaded and parallel implementations.
VII. In this case, OpenMP constructs are employed to move from a single-threaded to a multi-threaded parallel execution model, minimizing the programming effort required and streamlining parallelization.
ITM Web of Conferences 57, 01016 (2023), ICAECT 2023. https://doi.org/10.1051/itmconf/20235701016

These observations provide valuable insight into the performance characteristics and trade-offs between the sequential and parallel implementations of the program.

Conclusion
In this paper, we set out to harness the untapped power of multicore computing architectures from a software development perspective. As software applications continue to grow in complexity and demand heightened performance, the need to embrace parallel computing methodologies becomes increasingly critical. However, transitioning from sequential to parallel programming is not without its challenges. To facilitate this paradigm shift, we introduced the "SDLC model with Parallel Constructs," a modified Software Development Life Cycle model that adds two phases: "Parallel Constructs" and "Test Parallel Constructs." It enables software development teams to integrate parallel computing into their projects while maintaining a structured and well-defined development process. By following the methodologies and insights presented in this paper, software development teams can unlock the latent potential of multicore systems, achieving enhanced performance, efficiency, and competitiveness in a rapidly evolving computing landscape.

Figure 4: Comparison of Execution Times: Original Program vs. Parallel Construct Program.

Table 1: Summary of parallel methodologies adopted by researchers.

Table 2: Execution time of the sequential and parallel programs using OpenMP.

Table 2 columns: execution time T(O), in seconds, of the original program; execution time T(N), in seconds, of the modified program with the OpenMP construct.
Figure 4 displays the data from Table 2 in graphical form. The scale factor is 1e8.

Table 3: Speed-up achieved by varying the array sizes.

Table 4: Comparison of lines of code.