A SON solution for cell outage detection using a cooperative prediction approach

With the growth of mobile equipment and the demand for high throughput and user quality of service (QoS), performance optimization and network maintenance have become challenging. Moreover, the deployment of long-term evolution (LTE) networks over traditional networks raises the challenge of handling complex heterogeneous network scenarios. To address these challenges, the concept of the self-organizing network (SON) has been widely researched, since it offers highly automated network management in a cost-efficient manner and will also be essential in future technologies such as LTE-A and 5G.

Traditional automatic cell outage detection methods fall mainly into two categories: statistical analysis[1-2] and data mining[3-5]. Meanwhile, cell outage detection in heterogeneous networks has also been researched extensively. Due to the dense deployment and sparse user statistics of small cells, the algorithms designed for homogeneous networks are mostly no longer suitable. In Ref.[6], Xue et al. proposed a cell outage detection method for a two-tier macro-pico network employing the k-nearest neighbor (KNN) classification algorithm, which is not universally applicable to other kinds of small cells. The outage detection problem in dense 5G networks is studied in Ref.[7], where cells are represented with four states and the transition probabilities among these states are described by a hidden Markov model. The cooperative cell outage detection based on collaborative filtering and sequential hypothesis testing[8] can efficiently trigger the detection procedure without inter-cell communications, but requires macro base stations to collect user statistics for further detection. Onireti et al.[9] proposed a cell outage detection scheme for heterogeneous networks (HetNets) with separated control and data planes. However, this approach performs poorly when there are few users or only slight power attenuation in a serving cell.

In addition, outage detection approaches based on the collaboration of neighbor cells have also been considered. Related characteristic parameters of neighbor cells were first utilized in Ref.[10]. Building on this idea, the problem that the target cell has no access to user data or computation can be solved with the handover statistics of neighbor cells[11]. However, the sequential time series used in the detection process must be transmitted to the management center, which incurs a large communication cost and complicated computation in dense wireless networks.

To solve the outage detection problem in networks with densely distributed small cells, our approach consists of an outage triggering phase based on the spatial correlations and a detecting phase based on the temporal correlations of user statistics. The main conclusions of this paper are as follows: 1) The cooperative prediction approach can achieve a higher detection accuracy even when there are few or no users in the problematic cell, at the cost of computation in neighbor cells; 2) The distributed computing paradigm moves computation to the data instead of collecting large amounts of raw data at central computation units, which greatly reduces the data transmission cost; 3) Both the spatial and temporal correlations of user statistics are considered, which can greatly improve the accuracy of the detection scheme.

1 System Architecture

1.1 System model

We consider a typical heterogeneous network architecture where femtocells are overlaid on other cells. A femtocell operates under the femtocell access point (FAP) and performs the function of automatic neighbor relations (ANR), which maintains the integrity and effectiveness of the neighbor cell list (NCL). The NCL of a femtocell is updated by its connected users and indicates the neighbor femtocells that need to be monitored and reported, according to Ref.[12].

We assume that an FAP in outage experiences a degradation of transmission power, while its computation and communication functions are not affected during operation. We also assume that the transmission powers of FAPs are constant during the detection process. The users are connected to the cells with the strongest reference signal received power (RSRP) and periodically report the RSRP statistics of all neighboring cells to their associated FAPs.

1.2 Detection framework

The overall detection process consists of two phases: a threshold-learning phase and an outage triggering and detecting phase.

In the threshold-learning phase, training data collected from normal operating scenarios are needed to configure the detection model. We generate a specific triggering threshold for each cell and an average detecting threshold for the whole network in the reference scenarios. The 95% highest prediction deviations t1, t2 and the average abnormal rates μ1, μ2 in the reference scenario are computed beforehand and used as the thresholds.
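The threshold-learning step can be sketched as follows. This is only an illustration under stated assumptions: the "95% highest prediction deviation" is interpreted here as the 95th percentile (nearest-rank) of the deviations observed in fault-free reference data, the abnormal rate is the fraction of deviations above that value, and the function names and sample values are hypothetical.

```python
def percentile_threshold(deviations, percent=95):
    """Nearest-rank percentile of deviations observed under normal operation."""
    if not deviations:
        raise ValueError("need at least one reference deviation")
    ordered = sorted(deviations)
    # 1-based nearest rank, computed with integer ceiling division.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


def abnormal_rate(deviations, threshold):
    """Fraction of deviations that exceed the reporting threshold."""
    return sum(d > threshold for d in deviations) / len(deviations)


# Made-up reference deviations (dB) from a fault-free scenario.
reference = [0.5, 1.2, 0.8, 2.0, 1.1, 0.9, 3.5, 1.4, 0.7, 1.0,
             1.3, 0.6, 2.2, 1.8, 0.4, 1.6, 2.8, 1.5, 0.3, 1.7]

t1 = percentile_threshold(reference)   # reporting threshold (assumed definition)
mu1 = abnormal_rate(reference, t1)     # abnormal rate seen in normal operation
```

With these 20 samples the nearest-rank 95th percentile is the 19th smallest value, so only the single largest deviation counts as abnormal in the reference data.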

In the outage triggering phase, each FAP runs the triggering algorithm on the statistics reported by its associated users and reports the results to the corresponding neighboring femtocells to monitor their states. For example, if user a served by cell A is able to receive signals from cells B, C and D, then the RSRP statistics reported by user a are used to monitor the states of neighboring cells B, C and D. Each cell receives report results from its neighboring cells and updates its rate of abnormal results D1; once D1 exceeds the triggering threshold μ1, the cell is triggered for detection. Assume that cell B, with neighboring cells A, C and E, is triggered. In the outage detecting phase, a triggered cell informs its neighboring cells to run the detection algorithm and report the results back for the final decision. A triggered cell whose abnormal result rate D2 is higher than the decision threshold μ2 is considered to be in outage. As shown in Fig.1, the thresholds t1, t2, μ1, μ2 are generated in the training phase before detection, while d1, d2, D1, D2 are calculated during the triggering and detecting processes. The triggering and detecting algorithms, namely collaborative filtering and grey prediction, are described in the next section.

2 Algorithm Description

2.1 Collaborative filtering with KNN (KNN-CF)

In order to predict the normal RSRP statistics, we use collaborative filtering to explore the spatial correlations among the user reports. Treating user equipment (UE) as users and FAPs as items, we utilize user-based collaborative filtering to predict the expected normal RSRP $\hat{r}_{u,f}$ of user u belonging to the target FAP f, which is estimated as follows:

$$\hat{r}_{u,f} = \sum_{v \in C(u)} w_{u,v}\, r_{v,f} \qquad (1)$$

where $r_{v,f}$ is the RSRP collected from FAP f by user v in the collaborator set C(u), who does not necessarily belong to FAP f, and $w_{u,v}$ is the interpolation weight between users u and v, which remains to be computed. In Ref.[8], the collaborator set C(u) is defined as the set of users selected from the benchmark data for correlation computation that can receive signals from the FAP associated with u. To estimate the interpolation weights, we formulate a least squares optimization problem:

$$\min_{w} \; \lVert U - wR \rVert^{2} \qquad (2)$$

Assume that there are m users in set C(u) and n FAPs (excluding f) in the network. In Eq.(2), the 1×n vector U is user u's RSRP, the m×n matrix R is the benchmark users' RSRP, and the 1×m vector w contains the interpolation weights between user u and the benchmark users in C(u). When $RR^{\mathsf T}$ is invertible, the least squares solution is $w = UR^{\mathsf T}(RR^{\mathsf T})^{-1}$. Based on w, Eq.(1) can also be written with the m×1 vector $R_f$, the RSRP from f of the users in C(u), as

$$\hat{r}_{u,f} = w R_f \qquad (4)$$
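The least squares prediction can be illustrated with a short sketch. It assumes that $RR^{\mathsf T}$ is invertible and solves the normal equations $(RR^{\mathsf T})w^{\mathsf T} = RU^{\mathsf T}$ by Gaussian elimination; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R R^T is singular; benchmark rows are dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def cf_predict(U, R, Rf):
    """Return (prediction, w) with w = argmin ||U - wR||^2 and prediction = w . Rf."""
    gram = [[sum(x * y for x, y in zip(ri, rj)) for rj in R] for ri in R]  # R R^T
    rhs = [sum(x * y for x, y in zip(ri, U)) for ri in R]                  # R U^T
    w = solve_linear(gram, rhs)
    return sum(wi * rfi for wi, rfi in zip(w, Rf)), w


# Toy data in arbitrary units: m = 2 benchmark users, n = 3 FAPs other than f.
R = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
U = [2.0, 3.0, 5.0]    # equals 2 * R[0] + 3 * R[1], so w should be [2, 3]
Rf = [10.0, 20.0]      # benchmark users' RSRP from the target FAP f
prediction, w = cf_predict(U, R, Rf)
```

Because U is an exact combination of the benchmark rows in this toy case, the recovered weights reproduce it exactly; with real reports the fit is only approximate.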

However, the benchmark data selected from the whole collaborator set C(u) has low correlations, which degrades the performance of the CF algorithm. Meanwhile, a triggering decision made from only one RSRP statistic is not reliable enough, which may cause a high false alarm rate and wasted computation. Hence, we propose the KNN-CF algorithm, which incorporates the KNN algorithm and statistical analysis into the collaborative filtering algorithm. Our KNN-CF algorithm is summarized in Algorithm 1. First, we select the N (N>K) statistics from C(u) that share the most cells' RSRP with ru, and then choose the K nearest neighbors among them according to the Pearson correlation coefficients; these are treated as the benchmark data of collaborative filtering. The serving cell of user u calculates the predicted normal RSRP $\hat{r}_{u,f}$ and compares it with the actually detected ru,f, reporting abnormal (0) to cell f if the prediction deviation d1 between them is larger than the reporting threshold t1, and normal (1) otherwise. Cell f receives reports from the serving cell of user u and from other neighboring cells, and updates the rate of abnormal results D1 until it is higher than the triggering threshold μ1, at which point cell f is triggered for detection.
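The two-stage benchmark selection described above can be sketched as follows. The sketch assumes that each report is a dictionary mapping a hypothetical FAP identifier to an RSRP value (with the target FAP f already excluded), and that the Pearson correlation is computed over the FAPs common to both reports; these representation choices are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_benchmark(r_u, candidates, N, K):
    """Keep the N candidates sharing the most FAPs with r_u, then the K most correlated."""
    def common(c):
        return sorted(set(r_u) & set(c))

    by_overlap = sorted(candidates, key=lambda c: len(common(c)), reverse=True)[:N]

    def similarity(c):
        faps = common(c)
        return pearson([r_u[f] for f in faps], [c[f] for f in faps])

    return sorted(by_overlap, key=similarity, reverse=True)[:K]


# Made-up RSRP reports (dBm) keyed by hypothetical FAP ids.
r_u = {"A": -80, "B": -90, "C": -100}
c1 = {"A": -82, "B": -91, "C": -101}   # same trend as r_u
c2 = {"A": -100, "B": -90, "C": -80}   # opposite trend
c3 = {"A": -85, "D": -70}              # only one FAP in common
c4 = {"A": -79, "B": -95, "C": -99}    # similar trend, less regular
benchmark = select_benchmark(r_u, [c1, c2, c3, c4], N=3, K=2)
```

Here c3 is dropped in the first stage for lack of common FAPs, and c2 is dropped in the second stage because its correlation with r_u is negative.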

Algorithm 1 KNN-based collaborative filtering (KNN-CF) in triggering phase

Input: N, K, t1, μ1, ru, C(u);

Output: triggering result.

For each targeting cell f detected in ru do

Select N statistics with the largest number of common FAPs with ru in C(u).

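Select K statistics with the highest similarity to ru, calculated according to the Pearson correlation coefficient over the FAPs i common to users u and v, where $\bar{r}_u$ and $\bar{r}_v$ are the mean RSRP of u and v over those FAPs:

$$\rho_{u,v} = \frac{\sum_{i}(r_{u,i}-\bar{r}_u)(r_{v,i}-\bar{r}_v)}{\sqrt{\sum_{i}(r_{u,i}-\bar{r}_u)^2}\,\sqrt{\sum_{i}(r_{v,i}-\bar{r}_v)^2}}$$

Generate U, R, Rf and compute $\hat{r}_{u,f}$ according to Eq.(4).

Compare the prediction deviation with the training threshold t1 and report 0/1 to the targeting cell.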

For each targeting cell f do

Receive 0/1 reports from monitoring cells and update the abnormal report rate: D1=mf/(mf+nf)

if D1>μ1, the detection stage is triggered;

else, continue updating D1.
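The update loop at the target cell can be sketched as a small counter. This is an illustration only: it uses boolean flags rather than the 0/1 report encoding, treats the two counters in D1 as the numbers of abnormal and normal reports, and adds a hypothetical `min_reports` guard (not part of Algorithm 1) so that a single early report cannot trigger detection.

```python
class TriggerMonitor:
    """Running abnormal-report rate D1 for one target cell f."""

    def __init__(self, mu1, min_reports=1):
        self.mu1 = mu1                  # triggering threshold
        self.min_reports = min_reports  # hypothetical guard, not in Algorithm 1
        self.abnormal = 0
        self.normal = 0

    def rate(self):
        """Current D1, or 0.0 before any report has arrived."""
        total = self.abnormal + self.normal
        return self.abnormal / total if total else 0.0

    def triggered(self):
        total = self.abnormal + self.normal
        return total >= self.min_reports and self.rate() > self.mu1

    def add_report(self, is_abnormal):
        """Record one report from a monitoring cell and return the trigger state."""
        if is_abnormal:
            self.abnormal += 1
        else:
            self.normal += 1
        return self.triggered()


monitor = TriggerMonitor(mu1=0.3, min_reports=5)
history = [monitor.add_report(flag) for flag in (False, True, False, True, True)]
```

In this toy run the cell is triggered only after the fifth report, when three of five reports are abnormal and the guard is satisfied.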

2.2 Cooperative grey model

In Ref.[9], the control BS is used to detect the outage of the triggered data BS by predicting the RSRP of all the UEs that were associated with it prior to the outage. The performance of this approach is poor when there are few users served by a small cell. Moreover, an outage cell with no active users is unlikely to be detected and self-healed. To solve these problems, we propose a cooperative grey model (Co-Grey) prediction algorithm, which compares the current RSRP reports of the triggered cell f with historical statistics with the help of the neighboring cells of f. The algorithm is described in Algorithm 2.

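We assume that there are m neighboring cells informed by cell f in the detection phase and each neighboring cell fi (1≤i≤m) has ni users receiving signals from cell f during the most recent l records, where l is the window size of the grey model history sequences. The non-negative RSRP sequence of user uj (1≤j≤ni) prior to the triggered moment is denoted as $r_{i,j} = [r_{i,j}(1), r_{i,j}(2), \ldots, r_{i,j}(l)]$. The first step of grey prediction is the accumulated generating operation (AGO):

$$r^{(1)}_{i,j}(k) = \sum_{t=1}^{k} r_{i,j}(t), \quad k = 1, 2, \ldots, l \qquad (5)$$

The AGO sequence of a grey model takes values proportional to the ramping rate, and it can be fitted with the first-order linear grey differential equation

$$\frac{\mathrm{d} r^{(1)}_{i,j}}{\mathrm{d}t} + a\, r^{(1)}_{i,j} = b \qquad (6)$$

whose coefficients a and b are solved with the least squares method as

$$[a,\ b]^{\mathsf T} = (B^{\mathsf T}B)^{-1}B^{\mathsf T}Y \qquad (7)$$

where row k−1 of the (l−1)×2 matrix B is $[-z_{i,j}(k),\ 1]$ with $z_{i,j}(k) = \tfrac{1}{2}\big(r^{(1)}_{i,j}(k) + r^{(1)}_{i,j}(k-1)\big)$ for k = 2, …, l, and $Y = [r_{i,j}(2), \ldots, r_{i,j}(l)]^{\mathsf T}$. After solving for the coefficients a and b of the grey model in Eq.(6), the predicted normal RSRP statistic at time l+1 can be calculated from the fitted AGO sequence by an inverse accumulated generating operation (IAGO) as

$$\hat{r}^{(1)}_{i,j}(k+1) = \Big(r_{i,j}(1) - \frac{b}{a}\Big)e^{-ak} + \frac{b}{a} \qquad (8)$$

$$\hat{r}_{i,j}(l+1) = \hat{r}^{(1)}_{i,j}(l+1) - \hat{r}^{(1)}_{i,j}(l) \qquad (9)$$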

The neighboring cells fi (1≤i≤m), along with the triggered cell f, calculate the grey prediction results $\hat{r}_{i,j}(l+1)$ and compare them with the actually detected $r_{i,j}(l+1)$, reporting abnormal (0) to cell f if the prediction deviation d2 between them is larger than the reporting threshold t2, and normal (1) otherwise. Cell f receives the 0/1 reports and calculates the rate of abnormal results D2. The triggered cell f is considered to be in outage if D2 is higher than the decision threshold μ2.
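The per-user grey prediction can be sketched with the standard GM(1,1) model: accumulate the history, fit a and b by least squares on the background values, and difference the fitted accumulated sequence to obtain the one-step forecast. The sketch assumes the standard GM(1,1) formulation, takes the prediction deviation to be the absolute difference between forecast and observation, and uses hypothetical function names and sample values.

```python
import math


def gm11_predict(history):
    """One-step-ahead GM(1,1) forecast for a non-negative sequence."""
    l = len(history)
    if l < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 samples")
    # AGO: running sums of the raw sequence.
    ago, total = [], 0.0
    for value in history:
        total += value
        ago.append(total)
    # Background values z(k) and targets y(k) for k = 2..l (1-based).
    z = [0.5 * (ago[k] + ago[k - 1]) for k in range(1, l)]
    y = history[1:]
    # Least squares fit of y = -a * z + b via the 2x2 normal equations.
    n = l - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    det = n * szz - sz * sz
    if abs(det) < 1e-12:
        raise ValueError("degenerate history; cannot fit a and b")
    a = (sz * sy - n * szy) / det
    b = (szz * sy - sz * szy) / det
    if abs(a) < 1e-12:
        return b  # limit a -> 0: the fitted sequence is constant at b

    def ago_hat(k):
        """Fitted AGO value k steps after the first sample."""
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # IAGO: difference of consecutive fitted AGO values.
    return ago_hat(l) - ago_hat(l - 1)


def is_abnormal(history, observed, t2):
    """Flag a report whose absolute deviation from the forecast exceeds t2."""
    return abs(gm11_predict(history) - observed) > t2


growing = [10.0, 11.0, 12.1, 13.31, 14.641]   # made-up, roughly 10% growth per step
forecast = gm11_predict(growing)
```

For a steadily growing toy sequence the forecast lands close to the next geometric term, while a sudden drop in the observed value relative to a flat history is flagged as abnormal.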

Algorithm 2 Cooperative grey (Co-Grey) model prediction in the detecting phase

Output: detecting result.

Let m neighbor cells be able to detect the situation of f cooperatively

Let each neighbor cell fi have ni users receiving signals from f

Grey modeling as Eqs.(6) and (7)

Grey prediction and IAGO as Eqs.(8) and (9)

Report 1 to cell f, mf=mf+1

Report 0 to cell f, nf=nf+1

Compute abnormal rate: D2=mf /(mf+nf)

if D2>μ2, f is determined to be in outage;

else, f is determined to be normal.

3 Simulation Results and Analysis

We consider a HetNet cellular network composed of densely deployed femtocells, which are distributed randomly within an area of 1 000 m×1 000 m. The scenario consists of 100 FAPs and 1 000 users. We assume that the users in the area follow a Poisson point process. In order to eliminate the influence of the network boundary, we assume that moving users who leave through one boundary re-enter at the opposite boundary with the same speed and direction, according to the wrap-around method. The simulation parameters are based on the 3GPP specification[13], and the reduction of the transmission power is taken to range from 40 to 10 dBm in steps of 5 dBm to represent the cell outage. Users are associated with the FAP with the strongest RSRP and send RSRP reports to their serving cells every 0.1 s. Other detailed parameters are listed in Tab.1.
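The wrap-around mobility rule can be sketched with modular arithmetic on the user coordinates. The side length and reporting interval come from the setup above; the function name, the velocity representation and the sample values are hypothetical.

```python
AREA_SIDE_M = 1000.0       # side of the square simulation area
REPORT_INTERVAL_S = 0.1    # interval between RSRP reports


def move_with_wraparound(position, velocity,
                         dt=REPORT_INTERVAL_S, side=AREA_SIDE_M):
    """Advance a user by one step; leaving one edge re-enters at the opposite edge."""
    x, y = position
    vx, vy = velocity
    # Python's % returns a non-negative result for a positive modulus,
    # so both overshoot and undershoot wrap to the other side.
    return ((x + vx * dt) % side, (y + vy * dt) % side)


# A user near the corner moving right and down (m/s); velocity stays unchanged.
new_position = move_with_wraparound((999.9, 0.05), (2.0, -1.0))
```

The velocity vector is left untouched, which keeps the speed and direction the same after the user re-enters the area.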

Fig.2 illustrates the performance of the triggering stage. Prediction deviations are defined as the differences between the RSRP predicted by collaborative filtering and the RSRP reported by the user equipment. The number of neighbor cells is defined as the number of cells within a distance of 100 m, which explains why prediction results exist even when the number of neighbor cells is 0. As shown, the prediction deviation decreases as the number of neighbor cells increases, and the improved KNN-CF algorithm reduces the prediction deviations by around 10% compared with the CF algorithm[8], so it performs better in a dense network environment. The improvement of the CF algorithm leads to more precise triggering, which greatly reduces computational costs and false alarms.

Fig.3 depicts the overall detection rate for various user densities, comparing the performance of the standard collaborative filtering and grey prediction algorithms with our improved KNN-CF and Co-Grey algorithms. We compute the abnormal rates of the triggered cells, namely the true positive rate and false positive rate, before the final decision. A cell with a higher abnormal rate than the threshold μ2 is considered to be in outage. As shown in Fig.4, although there is no false alarm with a high threshold μ2, the false positive rate of detection reports using the standard algorithm is higher than that using our improved algorithm, which indicates a higher probability of false alarm. Notably, with the help of neighbor cells, an outage cell can be detected even when no active user is connected to it, which is not possible with the approach in Ref.[9]. The detection rate increases with the user density, and the improved algorithm outperforms Ref.[9] by around 14%, especially when users are sparse in the network. It can also be seen that the detection rate is lower with a larger shadow fading factor σ, which results in statistics with greater randomness.

4 Conclusion

In this paper, we propose a cell outage detection mechanism for automatic network failure detection based on the distributed computation of cooperative neighbor cells. The improved collaborative filtering and grey prediction algorithms are applied to extract abnormal information from both spatial and temporal correlations. The simulation results show that, with the distributed computing paradigm, our approach can increase the detection accuracy and greatly reduce the overall data transmission, which is beneficial not only for a self-organizing femtocell network but also for a dense 5G network.

[1]Liao Q, Wiczanowski M, Stańczak S. Toward cell outage detection with composite hypothesis testing [C]//IEEE International Conference on Communications (ICC). Ottawa, ON, Canada, 2012: 4883-4887. DOI:10.1109/ICC.2012.6364384.

[2]De-la-Bandera I, Barco R, Munoz P, et al. Cell outage detection based on handover statistics[J]. IEEE Communications Letters, 2015, 19(7): 1189-1192. DOI:10.1109/lcomm.2015.2426187.

[3]Turkka J, Chernogorov F, Brigatti K, et al. An approach for network outage detection from drive-testing databases[J]. Journal of Computer Networks and Communications, 2012, 2012: 1-13. DOI:10.1155/2012/163184.

[4]Zoha A, Saeed A, Ali I, et al. Data-driven analytics for automated cell outage detection in self-organizing networks [C]//2015 11th International Conference on the Design of Reliable Communication Networks (DRCN). Kansas City, MO, USA, 2015: 203-210. DOI:10.1109/DRCN.2015.7149014.

[5]Wang J, Phan N Q, Pan Z W, et al. An improved TCM-based approach for cell outage detection for self-healing in LTE HetNets [C]//2016 IEEE 83rd Vehicular Technology Conference (VTC Spring). Nanjing, China, 2016: 16125563-1-16125563-5. DOI:10.1109/VTCSpring.2016.7504129.

[6]Xue W Q, Peng M G, Ma Y, et al. Classification-based approach for cell outage detection in self-healing heterogeneous networks[C]//2014 IEEE Wireless Communications and Networking Conference (WCNC). Istanbul, Turkey, 2014: 2822-2826. DOI:10.1109/WCNC.2014.6952896.

[7]Alias M, Saxena N, Roy A. Efficient cell outage detection in 5G HetNets using hidden Markov model[J]. IEEE Communications Letters, 2016, 20(3): 562-565. DOI:10.1109/lcomm.2016.2517070.

[8]Wang W, Zhang J, Zhang Q. Cooperative cell outage detection in self-organizing femtocell networks [C]//Proceedings IEEE INFOCOM. Turin, Italy, 2013: 782-790. DOI:10.1109/INFCOM.2013.6566865.

[9]Onireti O, Zoha A, Moysen J, et al. A cell outage management framework for dense heterogeneous networks[J]. IEEE Transactions on Vehicular Technology, 2016, 65(4): 2097-2113. DOI:10.1109/tvt.2015.2431371.

[10]Mueller C M, Kaschub M, Blankenhorn C, et al. A cell outage detection algorithm using neighbor cell list reports[M]//Self-Organizing Systems. Berlin, Germany: Springer, 2008: 218-229. DOI:10.1007/978-3-540-92157-8_19.

[11]Zhang T, Feng L, Yu P, et al. A handover statistics based approach for cell outage detection in self-organized heterogeneous networks [C]//IFIP/IEEE Symposium on Integrated Network and Service Management (IM). Lisbon, Portugal, 2017: 628-631. DOI:10.23919/INM.2017.7987346.

[12]Hamalainen S, Sanneck H, Sartori C. LTE self-organizing networks (SON): Network management automation for operational efficiency[M]. Beijing: China Machine Press, 2013.

[13]3rd Generation Partnership Project. Technical specification group radio access network; Evolved universal terrestrial radio access (E-UTRA); Further advancements for E-UTRA physical layer aspects, 3GPP TR 36.201 [S]. 3rd Generation Partnership Project, 2012.
