American Journal of Operational Research
p-ISSN: 2324-6537 e-ISSN: 2324-6545
2025; 15(1): 1-14
doi:10.5923/j.ajor.20251501.01
Received: Dec. 16, 2024; Accepted: Jan. 7, 2025; Published: Jan. 13, 2025

Emmanuel Siyawo Mvere1, Devkumar Shahill Callychurn1, Dinesh Kumar Hurreeram2
1Mechanical and Production Engineering Department, University of Mauritius, Moka, Mauritius
2University of Technology Mauritius, La Tour Koenig, Pointe aux Sables, Mauritius
Correspondence to: Emmanuel Siyawo Mvere, Mechanical and Production Engineering Department, University of Mauritius, Moka, Mauritius.
| Email: | ![]() |
Copyright © 2025 The Author(s). Published by Scientific & Academic Publishing.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

The k-median algorithm is a well-known combinatorial optimization problem. k-median problem is essentially the p-median problem that has been around for almost 60 years. However, because of its usefulness it has been found relevant in many modern-day applications such as advanced drone technology, self-driving vehicles, image processing and many other avenues that have to do with efficient distribution of resources, were outliers must be eliminated. Research work on this problem is on-going. The level of hardness of this problem, the approximability lower bound of this problem, has not yet been achieved. To the best of our knowledge all known prior algorithms solving the k-median problem used the input of a distance matrix. The study introduces the ESMVERE k-median problem solver, which replaces traditional distance matrices with GPS-based Euclidean distance calculations, simplifying data input and reducing computational complexity. This research examines the challenge of hazardous waste management from Used Lead Acid Batteries (ULABs) in Mauritius, a pressing public health and environmental concern due to the risks of lead exposure. The study’s key contribution is the practical adaptation of theoretical k-median frameworks to address environmental challenges, offering a replicable model for hazardous waste management in other contexts. It highlights the potential for mathematical modelling to drive practical solutions to global environmental challenges.
Keywords: k-median problem, p-median problem, TSP problem
Cite this paper: Emmanuel Siyawo Mvere, Devkumar Shahill Callychurn, Dinesh Kumar Hurreeram, The Near Optimal Siting of Hazardous Waste (Used Lead Acid Battery (ULAB)) Collection Facilities in the Republic of Mauritius Using the ESMVERE CPLEX k-Median Solver Algorithm, American Journal of Operational Research, Vol. 15 No. 1, 2025, pp. 1-14. doi: 10.5923/j.ajor.20251501.01.
![]() | Figure 1. k-median-instance |
, is a set of potential collection points for used or failed vehicle batteries, the
set of clients, is a set of people with used or failed vehicle batteries which need to be sent to the nearest collection center to be recovered or replaced or recycled, and finally the X data set, is a data set of all the potential collection facility locations and client locations that need to be partitioned into k clusters. If
[13]partitions a set of geometric points into clusters, the objective function of the k-median algorithm thus measures how well the X data set can be partitioned into k clusters. Thus, the dataset X contains the whole set of facilities and the whole set of clients combined. While
is the subset of
that consists of the optimum set facilties to be opened out of the whole set of facilities
, these characterise the solution set. While the modulus of these optimum set of facilities is the upper bound k which is the limit on the number of facilities which are required to be opened, thus the solution set of facilities opened at the given number of k facilities.The Euclidean distance metric is used to solve the k-median problem in this thesis which is also known as the 2-norm distance. This direct line distance is a generalization of the Pythagorean theorem to more than two coordinates. If a straight line calculates the distance between two points, the Euclidean distance or direct line distance is obtained. This is the most common distance measure and, in this case, the most available. ![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
,
), where
is the set of facilities and
is the set of cities [8]. However, in the real world this instance is not yet created. This could be one of the most time-consuming and complex steps in solving this problem practically.Cases in which these algorithms have been used to solve real world problems, and in which they have been analyzed confronting the real challenges in the world and achieving satisfying results that prove their guarantees are missing. The application of these algorithms in a real-world setting will achieve more in terms of revealing the limitations and boundaries of the algorithms. When they are analyzed from the basis of reality a further extension of the qualities of the algorithms may give way, and be revealed, this gives a chance for modifications and improvement.When solving the problems practically the instance has got to be created, and for the algorithms to work on the instance, the instance would have to fit the model and the assumptions under which the algorithm was developed. This could give rise to cases were an instance might be too complex to develop accurately to solve the problem. The choice of data which constitutes the set of facilities and the data which constitutes the set of clients in a real-world environment must be carefully determined from multiple possible of choices to accurately solve the problem. The worst-case analysis might prove to have an even worse case where an instance cannot be created for the specific case, thus the need for worked solutions. Appendix C to E displays the journey that was taken to create the right instance using ArcGIS.
(a)This would be impractical in many cases in the real world where there might be hundreds or thousands of location points, and the distances that make each pair of distances. Thus, a new model is developed that does not need the input of the distance matrix but uses the formular for the Euclidean distance between two points and the input of actual GPS coordinate location points to solve the k-median problem. The formular for the Euclidean distance between two points is:
(b)The k-median Cplex Opl Euclidean distance formular developed for this study
(c)
(d)The new method solves the k-median problem by making a choice between facility locations and client locations, without considering the distance between all the points as a distance matrix [22]. The newly developed method only requires the GPS coordinate location points. This means that it opens the best out of n potential facility locations and assigns only m clients to these open facilities per calculation. It is a heuristic solution technique, an Integer binary linear program with binary variables that solve the k-median problem. The new method could also be a useful contribution in k-median approximation algorithms, where it could contribute to a faster approximation algorithm for the k-median problem. ![]() | (6) |
![]() | (7) |
![]() | (8) |
![]() | (9) |
![]() | (10) |
where
is an element of the set of Facilities
and
is an element of the set of client location points
While for the TSP problem (eq7) there is only one containing set of location points
where
is an element of the set of start points location for the path of an edge and
is an element of the set of the end point locations of an edge. These are all contained in the same set of locations
however in an iteration the same location picked for the beginning point of an edge, node or path cannot be the same location picked for an ending point of the same edge at the same time, there would be no distance, “zero distance” because it starts where it ends**(j≠i). Thus, the objective is to derive the shortest possible distance that runs through all the locations from all the edge distances, these being in the same set. The observations above lead to an experiment in which the Location set A, of the TSP by [4], in which
was separated into two arrays that conform to the k-median problem. Location sets
and
in which
are on Figure 2 renamed to ulabfacilitylocation for
while the array
is labeled ulabcitylocation.The first constraint of the k-median problem (eq2), which states that every client must be connected to a facility, is comparable to the first constraint of the TSP (eq8) which states that every starting point location must be connected to an ending point location. From the similarity and the prior changes made to the sets of location points, it is possible to convert these constraints to the other in the CPLEX Opl language.The second constraint of the k-median problem (eq3), which states that every client must be connected to an open facility, is comparable to the second constraint of the TSP (eq9) which states that every ending point location must be connected to a starting point location which simply reverses the same connection as the first connection. Thus, for the TSP every starting location point in the set must be connected to its closest location ending point while each location ending point must be connected to a starting point through all the locations in the set. The TSP subtour elimination constraint (eq10) is completely removed when converting an algorithm to the k-median problem because it is not relevant or applicable in this case. Thus, remaining only, the non-negativity constraints (eq4) and (eq5) can be included. The similarity in the calculation is that both the k-median and TSP seek to minimize the distance between location points. Thus, from the similarity and the prior changes made to the sets of location points, it is possible to convert these constraints either way as algorithms in the CPLEX Opl language. Arrays of the facility locations and opening costs were added, these enabled the costs of the edges to be computed from each facility to each client. The changes have developed an entirely new software solution to the UFL problem.The mathematical formulation of the p-median problem is [5]:![]() | (11) |
![]() | (12) |
![]() | (13) |
![]() | (14) |
![]() | (15) |
![]() | (16) |
where
is an element of the set of Facilities
and
is an element of the set of client location points
. Thus, the objective of the p-median problem equation (eq11) and that of the k-median problem equation (eq1) are identical in that respect. The p-median (eq11) considers summation of the total distances from each client to a facility nearer to it. However, in this special case (eq11) by [5] they include the demand from each client. The demand in the case of ULABs is unknown as this is the first time that this project is proposed hence in this case it is irrelevant, hence it is removed from the objective of the p-median in CPLEX Opl. Thus, the remaining are the variables are
which calculate the total distances from each client to a facility nearest to it, and the decision variable
which indicates if a facility is open or not.The first constraint of the k-median problem (eq2), which states that every client must be connected to at least 1 facility, is comparable to the first constraint (eq12) for p-median which also states that every client must be connected to a facility. The second constraint of the k-median problem (eq3), which states that every client must be connected to an open facility, is comparable to the second constraint (eq14) of the p-median which states the same. They are identical in that if a client is connected to a facility the decision variable
then that facility must only be open
Thus, in both cases the client cannot connect to a closed facility because a facility only opens if it is in the minimized solution set of facilities.The remaining constraint for the p-median equation (eq13) compares to the k-median constraint (eq 4). The inequality in (eq4) proves their difference where for the k-median it means it can open less than or equal to k facilities. While this constraint (eq13) forces the p-median to open exactly p number of facilities as shown by the equal’s sign. Thus, a solution can be developed which combines the p-median and the TSP problem solution algorithms developed in CPLEX Opl and converts part of their code to produce a k-median problem solution.The p-median constraints where then modified to suit the k-median constraints as shown in Appendix A. Appendix A shows the newly developed algorithm that draws from the data file on Figure 2.
the number of (ulabfacilities) is 7 in the test case, and the value of
the number of ulabcities is 21 in this test case. The value of
on this diagram is given as 6 this is just as an example on this diagram. The other diagrams display a solution for k as 6. It also contains the values of the two arrays, ulabcityLocation and ulabfacilityLocation which host the
and
coordinates of ulabcity locations and ulab collection facility locations. To develop the test coordinates an x axis and a y axis were placed on a printed copy of the diagram using a 30-centimeter ruler.It also contains the value of
in which is the input of the limit of ulab collection facilities. The objective of the k-median problem only considers minimizing the average sum of distances from each (ulab city) to its nearest open (ulab collection facility). The last constraint is a k-median constraint that places a limit on the number of facilities to be opened. If one only considers the distance from a facility to a city and one tries to minimize the total distance without considering the cost of opening a facility, then the more facilities one has, the better, the objective value. With more facilities, there are more chances for the cities to find a facility closer to them. The answer is the more facilities there are, the better. If there are maximum 7 facilities, then the answer for the best k is 7, If there are 20, then the answer is 20. Because there are no costs for opening the facilities, the main reason is to establish the locations which are the median facility.Thus Figure 2 is the data file for input of data.![]() | Figure 2. Prototype data |
. After running the data on Figure 2 in the CPLEX Optimization Engine, the solutions to the decision variables are displayed as on Figure 3. On the left half side of the diagram the CPLEX Engine displays a Solution with objective 45.043, this is the minimum cost solution of the k-median Problem calculation please note it also displays the input value of k as 6.The minimum TotalDistance of 45.043 centimeters. The solution to the decision variable
is displayed on the right half of the diagram Figure 3. This side displays a list of all the edges
This displays edges (size147), meaning in combination there are 147 edges between the facilities and the clients. ![]() | Figure 3. Variable x |
. The solution to the decision variable
is displayed on the left half of the diagram Figure 4 as an array with seven numbers or seven locations
. Thus
is a boolean decision variable, which places a 1 on an open facility and a 0 on a closed facility. Thus, the solution in Figure 4 opens the ulabfacility locations 1,2,4,5,6 and 7.![]() | Figure 4. Variable y |
is also displayed on the right half of the diagram Figure 4. This side displays a list of all the facilities
, this displays ulabfacilities(size7), meaning there are 7 facilities listed down from facility 1 down to facility 7. The corresponding boolean value of the facility is displayed to the right side as either a 0 or a 1, for a closed or open facility respectively. The right half of the diagram Figure 5 displays the list of each edge, this displays edges(size147) and its value value in terms of its size or distance to the facility it is connected to. This is a list of all the possible combinations of distances from the ulabcities to the ulabfacilities.![]() | Figure 5. Cost of each edge |
A comparison of the mean computation time in seconds, for 100 runs of the different algorithms for various sample sizes n =250,n=500and n=2000 and number of clusters k. The computation time is given for one initialization point [37].
|
![]() | Table 2 |
|
![]() | Figure 6. k-median Optimum Solution [14] |
Appendix B The ESMvere CPLEX k-median Solution Model test demonstration.Out of 144 client locations, 125 locations were selected from the Northen densely populated areas, whereas 19 client locations were dropped from the southwest, spacey populated areas. The 152 potential facility locations were computed making a choice of the best locations within each region, using the same set of clients. The ESMvere CPLEX k-median solution model in this case computed the solution by partitioning the whole map of Mauritius into regions. Densely populated regions such as Port Louis were computed first. The division of the whole Republic of Mauritius map into regions was sensible and effective in that some regions to the southwest of Mauritius are too spacey populated, thus in these regions it would be more feasible to open all the ulab collection facilities. Whereas in areas that are too densely populated it would be more effective selecting the few facilities that would serve all clients with a minimum cost.The other method which was adopted as the more efficient and accurate method, was to divide the whole country of Mauritius into districts. The division of clients and facilities into districts meant that the calculation would be more accurate as it would be between the facilities and the clients that are in that district. In this method the computation to establish the optimum network would again, must be computed in series.The capital city Port Louis has a total of 17 gas stations. The company Total service stations own 7 gas stations, whereas the company Shell service stations also own 7 gas stations, 2 gas stations belong to Engen and 1 gas station belongs to Indian Oil. Figure 7 displays the solution to the decision variable
. The solution to the decision variable
is displayed on the left half of the diagram Figure 7 as an array with seven numbers or seven locations
.![]() | Figure 7. Total Gas in Port Louis |
is a boolean decision variable, which places a 1 on an open facility and a 0 on a closed facility. Thus, the solution in Figure 7 opens the ulabfacility locations 2,5 and 7. This clearly is a minimized set of open facilities Thus an optimum minimized TotalDistance value of 16564km. The solution to the decision variable
is also displayed on the right half of the diagram Figure 7. This side displays a list of all the facilities
, this displays ulabfacilities(size7), meaning there are 7 facilities listed down from facility 1 down to facility 7. The corresponding coordinates of facility 7 are also displayed on the right half of the diagram Figure 7.Out of the Seven Total service station facilities in the capital city of Port Louis, the best 3 facilities to open would be the facility 2,5 and 7 with coordinates -20.165595 57.492625, -20.156518 57.508551and -20.145017 57.523959. Typing the solution coordinates on google maps reveals the real and live, location of the feasible gas station Figure 8. The other six Total gas stations in Port Louis are dropped to minimize costs, while a (ulab collection facility) is built on this gas station. The objective function value of 0.16564 translates to 16564 meters (m) is displayed with a TotalDistance of 16564m. This shows that it’s a small island the total distance to cover all the locations is only 16km.![]() | Figure 8. Total k-median Solution. |
![]() | Figure 9. ArcGIS Gas Station Map [14] |
![]() | Figure 10. ArcGIS Population Size [14] |
![]() | Figure 11. Facilities vs Clients [14] |