Cellular users tend to run computation-intensive applications which require many computing resources. To meet this requirement, the idea of cellular edge computing is introduced. First, one or more off-the-shelf computing units or servers are located in the base station (BS) to form a small-scale computing resource pool[1-2]. Then, users’ computing tasks are offloaded to the BS to run. The so-formed system is called the cellular edge computing system[3-4]. In such a system, there are two types of resources: the radio resource and computing resource. Therefore, the resource scheduling in the cellular edge computing system must take both types of resources into consideration. This paper studies the resource scheduling in the cellular edge computing system.
This paper focuses on the delay aspect of the resource scheduling. In cellular edge computing systems, there are four types of delays: transmission delay, communication queueing delay, execution delay and computing queueing delay. According to the way of incorporating delays into resource scheduling, existing works can be classified into three categories. For the first category, only transmission and execution delays are considered, but the communication and computing queueing delays are not considered[5-8]. For the second category, in addition to the transmission and execution delays, the queueing related delays are also considered. Furthermore, this type of algorithm assumed that the queue can be modelled as the traditional queue (e.g., the M/M/1 queue) so that the delay formulae of the queueing theory can be readily applied in the problem formulation[9-12]. For the third category of the algorithm, the queueing related delays are also considered. However, instead of using queueing theory’s delay formulae, Little’s law is used in this category. According to Little’s law, the average delay is proportional to the average queue length. Therefore, the delay minimization problem was transformed into the queue length minimization problem. Then, the scheduling algorithms were derived[13-16].
Generally, the third category of the algorithm is better than the second category, since it does not need any assumption on traffic’s statistics. However, the third category of the algorithm depends heavily on Little’s law. Actually, Little’s law can be inaccurate. Here is an example. Consider a queue where there are two newly arrived tasks at the beginning of each slot and each task needs half slot time to serve. Let Q(n) denote the number of backlogged tasks at the end of slot n. Then, Q(n) is always 2. On the one hand, according to Little’s Law, the average delay is 1 slot. On the other hand, it can be verified that the delay of the odd-numbered and even-numbered tasks is 0.5 and 1 slot, respectively, so the true average delay is 0.75 slot. Therefore, Little’s law is inaccurate for this example. The reason is that the queue length is not sampled frequently enough. Actually, for this example, if the queue length is sampled once every half-slot, the queue length will be 1 or 2 alternately, and then the average delay can be correctly calculated according to Little’s law.
Motivated by the above observations, this work tries to improve the third category of the algorithm by dropping the dependence on Little’s law. The key is to derive the formulae of the delays when there is no assumption on the traffic’s statistics so that the scheduling algorithm can be proposed by solving the delay minimization problem directly without using Little’s law.
Consider a cell consisting of one BS and I users, in which time is slotted and the duration of each slot is Tslot (unit:s). Let Ei denote the number of cycles needed by the task of user i. For simplicity and without loss of generality, assume that E1≤E2≤…≤EI. Assume that there are J computing servers in the BS. Each server j can provide Fj cycles per second. Then, it will take Ei/Fj of server j to execute one task of user i. For convenience, let αij=Ei/Fj/Tslot denote the normalized execution time of one task of user i in server j.
A task experiences two delays: the communication delay which is the sum of communication queue waiting time and transmission time, and the computing delay which is the sum of computing queue waiting time and execution time. The formulae of these two delays are derived as follows.
First, we derive expressions describing the evolution of communication queues. Let Ai(n) denote the number of newly arrived computing tasks of user i during slot n. Let Ni(n) denote the number of tasks of user i selected by the resource scheduler during slot n. Furthermore, let Ui(n) denote the number of backlogged tasks of user i at the end of slot n, which can be expressed as
Ui(n)=Ui(n-1)-Ni(n)+Ai(n)
(1)
where Ui(0)=0. There are three constraints on Ni(n). First, Ni(n) must be the integer. Secondly, Ni(n) must not exceed the backlogged tasks in the queue, that is
0≤Ni(n)≤Ui(n-1)
(2)
Thirdly, Xi(n) must be limited by the communication capability of the air interface. For this constraint, some notations are needed. Let Ri denote the number of subcarriers needed by user i to offload a task request to the BS in one slot. The value of Ri depends on both the number of bits of each task request of user i and the channel condition of user i (e.g., path loss, fading, etc.). Specifically, the better the channel condition of user i, the smaller the value of Ri . Assume that there are a total of R subcarriers in the cellular edge computing system, and then this constraint can be expressed as
(3)
Next, we derive the formula of communication delay. For user i, let denote the total number of arrivals until slot n. Similarly, let Ξi(n)= denote the total number of departures until slot n. In the literature, Ωi(n) and Ξi(n) are also known as the arriving curve and leaving curve, respectively, as illustrated in Fig.1(a). The total area between these two curves up to slot n represents the total time all tasks have spent in the queue during the interval (0, nTslot). Let corresponding to the shadowed area in Fig.1(a). For convenience, let Wi(n) denote the normalized version of i.e., Wi(n)= Then, we can have the formula of Wi(n) as
Wi(n)=Wi(n-1)+Ui(n-1)
(4)
Finally, we define
(5)
as the measure of the communication delay experienced by user i’s tasks.
(a)
(b)
Fig.1 The delay models. (a) The communication delay; (b) The computing delay
First, we derive expressions describing the evolution of computing queues. There are J computing queues. For each j, let Sj(n) denote the workload of backlogged tasks, that is, the number of slots needed to execute all these tasks by server j at the end of slot n, which can be expressed as
(6)
where ()+=max{,0} and xij(n) is 1 if user i’s tasks are allocated to server j and 0 otherwise. There are two constraints on xij(n). Firstly, xij(n) must be 0 or 1. Secondly, since a user can select one and only one server while a server can serve many users (i.e., “many-to-one”), we have
(7)
Next, we derive the formula of computing delay. Let Zi(n) denote the normalized total computing delay experiences by user i until slot n, which can be written as
(8)
where Zij(n) is the normalized total computing delay experienced by user i’s tasks in server j until slot n. For this case, we use different methods to calculate the area between the arriving curve and leaving curve. Let Dijh(n) denote the normalized delay (including execution time) experienced by the h-th task (1≤h≤Ni(n)). Then, we calculate the normalized area between the arriving curve and leaving curve as
(9)
corresponding to the shadowed area in Fig.1(b). Actually, task h experiences four types of delays: 1) It experiences delay Sj(n-1)+ which indicates that the task will not be instantly executed if the queue is not empty at the end of slot n-1; 2) It experiences delay which indicates that user i’s task has to wait the execution of user l’s tasks for l<i; 3) It experiences delay (h-1)αij which indicates that user i’s tasks are executed in sequence so that task h has to wait the execution of previous (h-1) tasks; 4) It experiences delay αij which is the execution time of task h itself. Integrating these together, we obtain the expression of Dijh(n) as
Dijh(n)=
(10)
For convenience, let
(11)
Then, we can have the formula of Zi(n) as
(12)
Finally, we define
(13)
as the measure of the computing delay experienced by user i’s tasks.
Let denote the average communication and computing delay for all users, respectively. Since communication queues are concatenated with computing queues, there are tradeoffs between Wavg and Zavg. If Wavg is small, Zavg will be large; otherwise, if Wavg is large, then Zavg will be small. Therefore, we need to optimize these two delays jointly. To qualify the tradeoff, a weight factor b≥0 is introduced to reflect the relative importance of these two delays. Therefore, the multiobjective optimization can be formulated as
(14)
subject to the constraints in Eqs.(2), (3), (7) and that Ni(n) are integers and xij(n) are binaries. This optimization problem must be transformed into a solvable form. Actually, this optimization problem consists of two subproblems. The first subproblem is min Wavg and the second subproblem is min Zavg. The detailed forms of these two subproblems are derived as follows.
First, we consider the subproblem min Wavg. As indicated in Eq.(4), minimizing Wavg,i is equivalent to minimizing the average communication queue length. Due to the Lyapunov optimization framework established in Ref.[17], define the Lyapunov function Then, the conditional Lyapunov drift is Δ(n)=E{L(n)-L(n-1]U(n-1)}, where U(n-1)=[U1(n-1),…,UI(n-1)] and E{} is the expectation operation. Exploiting that ((a-b)++c)2≤a2+b2+c2+2a(c-b) for any positive a, b, and c, we can have (Ui(n))2-(Ui(n-1))2≤(Ai(n))2+(Ni(n))2+2Ui(n-1)(Ai(n)-Ni(n)). According to the derivations in Ref.[17], we can estimate that Δ(n)≤B- is a constant. The Lyapunov-based algorithm is to make a transmission decision Ni(n) to minimize the right-hand-side of the drift bound. Note that the transmission decision on slot n only affects the final term on the right-hand-side. Thus, we need to design an algorithm minimizing We now use the concept of opportunistically maximizing an expectation. That is, the above expression is maximized by the algorithm that observes the current queues U(n-1) and chooses Ni(n) to minimize Therefore, the subproblem min Wavg can be transformed into the following optimization subproblem for each slot n:
(15)
subject to the constraints in Eqs.(2), (3) and that Ni(n) are integers, where Ui(n-1) is the communication queue length measured for the previous slot.
Secondly, we consider the subproblem min Zavg. In this case, a different method is taken. First, according to the definition of Zavg, this subproblem is equivalent to min On slot n, since Zi(n-1) has been given, the scheduling decision xij(n) only affects the term Dij(n). Therefore, we need to design an algorithm that minimizes
where S(n-1)=[S1(n-1),…,SJ(n-1)].Using the concept of opportunistically maximizing an expectation, the above expression is minimized by the algorithm that observes the current queue S(n-1) and chooses xij(n) to minimize
Therefore, the subproblem min Zavg can be transformed into the following optimization subproblem for each slot n:
(16)
subject to the constraints in Eq.(7) and that xij(n) are binaries, where Sj(n-1) in the expression of Dij(n) is the computing queue length measured for the previous slot.
According to the above derivations, the subproblem min Wavg and min Zavg can be transformed into the optimization problem in Eq.(15) and Eq.(16), respectively. Therefore, the problem in Eq.(14) can be transformed into the following optimization problem for each slot n:
(17)
subject to the constraints in Eqs.(2), (3), (7) and that Ni(n) are integers and xij(n) are binaries.
Let gij denote the partial derivative of the objective function in Eq.(17) with respect to xij(n), which can be calculated as
(18)
where N=[N1(n),N2(n),…,NI(n)] and x=[x11(n),x12(n),…,xIJ(n)], respectively. Let fi denote the partial derivative of the objective function in Eq.(17) with respect to Ni(n), which can be calculated as
(19)
where U=[U1(n-1),U2(n-1),…,UI(n-1)]. Additionally, at any (N,x), if fi≥0, user i does not offload any more task to any server; if Ni(n)=Ui(n-1), user i does not offload any more task to any server; if user i does not offload any more task to any server. Therefore, we define the feasible user set at (x, N) as
C(N,U,f)={i:fi<0,Ni(n)<Ui(n-1),
(20)
where f=[f1,f2,…,fI].
The proposed scheduling algorithm is as follows. First, initialize Ni(n)=0 for each i and U′i(n-1)=Ui(n-1) for each i. The algorithm executes the following steps.
Step 1 The values of x are determined. Initially, xij=0 for each (i,j). For each user i, calculate the gradient gij for each j by substituting N and x into Eq.(18), then select the server j*=arg min gij over all j, and assign user i to this server (i.e., set xij*=1).
Step 2 Update the values of N. First, calculate the values of f(N, U, x); secondly, select the user i*=arg min fi over all feasible iC; thirdly, increase Ni* by one (i.e., let Ni*←Ni*+1); fourthly, update Ui*(n-1)=Ui*(n-1)-1 and C(N, U, f). If C is not empty, go back to Step 1; otherwise, the algorithm stops.
The computation complexity of the proposed algorithm is analyzed as follows. The proposed scheduling algorithm consists of two parts. For the first part, the gradient gij is calculated for each user i and each server j. Thus, the computational complexity of this part is O(IJ). For the second part, the gradient fi is calculated for each user i. Thus, the computational complexity of this part is O(I). Hence, the computational complexity of these two parts is O(IJ). According to the proposed algorithm, these two parts are executed until set C becomes empty. According to the definition of C in Eq.(20), the times that C is not empty will be no more than (R/Ri), which is a constant. Thus, the times that these two parts are executed is no more than a constant. Therefore, the overall computational complexity of the proposed algorithm is O(IJ).
Consider a time-slotted cellular communications system, in which the duration Tslot of each slot is 10 ms. Assume that the total number of subcarriers provided by the cell is R=40. Assume that there are J=2 servers. Unless otherwise stated, the values of Fj are set to be 2×105×[1, 1.5]. For the users in the cell, the parameters are set as follows. Assume that there are I=10 users. For each user i, assume that his/her tasks arrive according to a Poisson process with an average inter-arrival time of Ti. Unless otherwise stated, the value of Ti is set to be 5 slots; the values of Ri are set to be [2, 2, 2, 2, 3, 3, 3, 3, 4, 4]; and the value of Ei is set to be 1.5×103×(1+i/10) for user i.
The main performance metric considered in this paper is the delay, which consists of two parts: the communication delay and computing delay. During the simulations, since it is not convenient to plot the delay of each user individually, we first obtain the average delays Wavg,i and Zavg,i for each user i, then obtain the average delays over all users, and then plot the values of Wavg and Zavg as the final results for each parameter setting. The simulation results are presented as follows.
First, the traditional algorithm is briefly introduced as follows. Define the Lyapunov function as where b is also a weight factor. The traditional algorithm is to make transmission decisions to minimize the conditional drift of the Lyapunov function[17]. Thus, the scheduling algorithm minimizes Sj(n-1) for each n subject to the constraints in Eqs.(2), (3), (7) and Ni(n) are integers and xij(n) are binaries. We use the same algorithm in Section 3.2 to solve this problem, except that the gradients in Eqs.(18) and (19) are replaced by gij(N,x)=bαijNi(n)Sj(n-1) and
Fig.2 shows the simulation results of Wavg, Zavg, and Wavg+Zavg with different b for the proposed algorithm (i.e., Alg1) and the traditional algorithm (i.e., Alg2). For the curves in Fig.2, we have the following observations. First, both algorithms can trade-off between the communication delay and computing delay. With the increase in b, the computing delay decreases, while the communication delay increases. For example, when b is increased from 10-2 to 101, the computing delay of the proposed algorithm decreases from 13.638 6 to 0.570 8 slots, while the communication delay of the proposed algorithm increases from 0.502 8 to 32.674 5 slots, and the total delay increases from 14.141 4 to 33.245 3 slots. Secondly, the total delay of the proposed algorithm (i.e., the line with circle mark) is always better than that of the traditional algorithm (i.e., the line with cross mark). For example, when b=10-1, the total delay of the traditional algorithm is 15.291 0 slot, while the total delay of the proposed algorithm is 13.426 6 slot, with a 12% off. This is due to the fact that the proposed algorithm directly minimizes the delay, while the traditional algorithm does not. Therefore, we can conclude that the delay performance of the proposed scheduling algorithm is better than that of the traditional algorithm.
Fig.2 Impact of the parameter b and comparison with traditional scheduling algorithm
Fig.3 shows the simulation results of Wavg, Zavg, and Wavg+Zavg for different total computing resources. Specifically, in this figure, the horizontal axis represents the value of where Fj is the baseline value of the total number of cycles provided by server j in one second, while decreases from 0.95Fj to 0.85Fj, the communication delay increases from 1.0148 to 3.769 3 slots, the computing delay increases from 0.254 1 to 0.744 5 slots, and the total delay increases from 1.268 9 to 4.513 8 slots. Furthermore, we can observe that, the total delay of the proposed algorithm (i.e., the blue line with circle mark) is always less than that of the traditional algorithm (i.e., the red line with circle mark). This is due to the fact that the proposed algorithm directly minimizes the delay, while the traditional algorithm does not.
Fig.3 Impact of the amount of computing resources provided by the computing server
Fig.4 shows the simulation results of Wavg, Zavg, and Wavg+Zavg for different task arrival rates. Specifically, in this figure, the horizontal axis represents the value of from 0.96Ti to 0.84Ti, the communication delay increases from 0.920 4 to 3.179 5 slots, the computing delay increases from 0.246 8 to 0.731 9 slots, and the total delay increases from 1.167 2 to 3.911 4 slots. Furthermore, we can observe that, the total delay of the propo-sed algorithm (i.e., the blue line with circle mark) is always better than that of the traditional algorithm (i.e., the red line with circle mark). This is due to the fact that the proposed algorithm directly minimizes the delay, while the traditional algorithm does not.
Fig.4 Impact of the average task arrival rate of users
1) The analytical formulae of the communication and computing queueing delays in the many-to-one multi-server cellular edge computing systems are derived without assuming the knowledge of the statistics of user task arrival traffic.
2) A novel resource scheduling algorithm which solves the delay optimization problem directly is proposed for the many-to-one multi-server cellular edge computing systems.
3) Under the considered simulation parameters, the proposed algorithm can achieve 12% less total delay, as compared to the traditional queue length optimization based algorithm. System parameters including the weight, the amount of computing resources provided by servers, and the average user task arrival rate have impact on the percentage of delay reduction.
[1] Checko A, Christiansen H L, Yan Y, et al. Cloud RAN for mobile networks: A technology overview[J]. IEEE Communications Surveys & Tutorials, 2015, 17(1): 405-426. DOI:10.1109/comst.2014.2355255.
[2] Peng M G, Sun Y H, Li X L, et al. Recent advances in cloud radio access networks: System architectures, key techniques, and open issues[J]. IEEE Communications Surveys & Tutorials, 2016, 18(3): 2282-2308. DOI:10.1109/comst.2016.2548658.
[3] Tran T X, Hajisami A, Pandey P, et al. Collaborative mobile edge computing in 5G networks: New paradigms, scenarios, and challenges[J]. IEEE Communications Magazine, 2017, 55(4): 54-61. DOI:10.1109/mcom.2017.1600863.
[4] Guo H Z, Liu J J, Zhang J. Computation offloading for multi-access mobile edge computing in ultra-dense networks[J]. IEEE Communications Magazine, 2018, 56(8): 14-19. DOI:10.1109/mcom.2018.1701069.
[5] Dinh T Q, La Q D, Quek T Q S, et al. Learning for computation offloading in mobile edge computing[J]. IEEE Transactions on Communications, 2018, 66(12): 6353-6367. DOI:10.1109/tcomm.2018.2866572.
[6] Wang F, Xu J, Wang X, et al. Joint offloading and computing optimization in wireless powered mobile-edge computing systems [J]. IEEE Transactions on Wireless Communications, 2018, 17(3): 1784-1797. DOI: 10.1109/TWC.2017.2785305.
[7] Zhou J Z, Zhang X, Wang W B. Joint resource allocation and user association for heterogeneous services in multi-access edge computing networks[J]. IEEE Access, 2019, 7: 12272-12282. DOI:10.1109/access.2019.2892466.
[8] Du J B, Zhao L Q, Chu X L, et al. Enabling low-latency applications in LTE-A based mixed fog/cloud computing systems[J]. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1757-1771. DOI:10.1109/tvt.2018.2882991.
[9] Chen L X, Zhou S, Xu J. Computation peer offloading for energy-constrained mobile edge computing in small-cell networks[J]. ACM Transactions on Networking, 2018, 26(4): 1619-1632. DOI:10.1109/tnet.2018.2841758.
[10] Ko S W, Han K F, Huang K B.Wireless networks for mobile edge computing: Spatial modeling and latency analysis[J]. IEEE Transactions on Wireless Communications, 2018, 17(8): 5225-5240. DOI:10.1109/twc.2018.2840120.
[11] Fan Q, Ansari N. Application aware workload allocation for edge computing-based IoT[J]. IEEE Internet of Things Journal, 2018, 5(3): 2146-2153. DOI:10.1109/jiot.2018.2826006.
[12] Yang L C, Zhang H L, Li M, et al. Mobile edge computing empowered energy efficient task offloading in 5G[J]. IEEE Transactions on Vehicular Technology, 2018, 67(7): 6398-6409. DOI:10.1109/tvt.2018.2799620.
[13] Lyu X C, Ni W, Tian H, et al. Optimal schedule of mobile edge computing for Internet of Things using partial information[J]. IEEE Journal on Selected Areas in Communications, 2017, 35(11): 2606-2615. DOI:10.1109/jsac.2017.2760186.
[14] Kim Y, Kwak J, Chong S. Dual-side optimization for cost-delay tradeoff in mobile edge computing[J]. IEEE Transactions on Vehicular Technology, 2018, 67(2): 1765-1781. DOI:10.1109/tvt.2017.2762423.
[15] Feng H, Llorca J, Tulino A M, et al. Optimal control of wireless computing networks[J]. IEEE Transactions on Wireless Communications, 2018, 17(12): 8283-8298. DOI:10.1109/twc.2018.2875740.
[16] Kim Y, Lee H W, Chong S. Mobile computation offloading for application throughput fairness and energy efficiency[J]. IEEE Transactions on Wireless Communications, 2019, 18(1): 3-19. DOI:10.1109/twc.2018.2868679.
[17] Neely M J. Stochastic network optimization with application to communication and queueing systems[M]//Synthesis Lectures on Communication Networks.Morgan & Claypool, 2010: 1-211. DOI:10.2200/s00271ed1v01y201006cnt007.