OS fingerprinting techniques have some weaknesses as well. The second weakness is that some of the techniques assume that the attacker spoofs the MAC address using tools for Linux-based operating systems. This assumption could allow some attackers to bypass the intrusion detection system, because the Windows operating system also provides a capability to change the MAC address. Finally, vendor information, capability information and other similar fingerprinting features can be easily spoofed using off-the-shelf devices.
RSS approaches also have some limitations. Some researchers have reported that RSS samples from a given sender follow a Gaussian distribution, whilst other researchers revealed that the distribution is not Gaussian [ 29 , 30 ] or that it is not rare to notice non-Gaussian distributions of the samples [ 18 ].
A New MAC Address Spoofing Detection Technique Based on Random Forests
As [ 18 ] reported, we found that it is not rare to find many peaks in the collected RSS samples. This suggests that the detection techniques [ 2 , 3 , 18 , 27 , 28 ] based on clustering algorithms that are closely related to our proposal are not the optimal solutions because these solutions assume that the samples are always Gaussian. Therefore, their solutions generate false alerts or miss some intrusions if the data are not Gaussian distributed. Furthermore, we discovered that in multiple cases, the distribution of the data from a single device constructs two clusters, so it is hard for the clustering algorithm-based approaches to perform well in these situations.
Motivated by these concerns, we utilized a machine learning algorithm that can deal with data that are Gaussian distributed and, more importantly, with data that are not. Thus, in this article, we propose a detection method based on random forests, because it can learn the shape of the dataset in order to obtain better results, and on a hard-to-spoof measurement.
The rest of the article is organized as follows: Section 2 reviews the related work; Section 3 introduces the detection method; Section 4 explains the experimental setup; Section 5 evaluates the proposed technique; Section 6 discusses the proposed method; and Section 7 concludes the article.
Chen et al. assume that the RSS samples form a Gaussian distribution. They assume that the RSS samples collected at N sensors during a given period form an N-dimensional vector, and that the number of clusters is two.
They then compute the Euclidean distance between the two centroids to detect MAC address spoofing. In practice, their approach might not work well when the attacker and the legitimate device are close to each other: the centroids of both devices are then close together, which makes it hard to separate the RSS samples that come from the attacker.
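The centroid-distance idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes K-means with k = 2 as the clustering step, and the function names, the 8 dB threshold and the simulated RSS values are hypothetical.

```python
import math
import random


def two_means(points, iterations=20, seed=0):
    """Run Lloyd's algorithm with k = 2 and return the two centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, 2)  # two observed vectors as starting centroids
    for _ in range(iterations):
        clusters = ([], [])
        for p in points:
            nearest = min((0, 1), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[k]
            for k, cluster in enumerate(clusters)
        ]
    return centroids


def spoofing_suspected(points, threshold_db=8.0):
    """Flag spoofing when the two centroids are farther apart than the threshold."""
    c0, c1 = two_means(points)
    return math.dist(c0, c1) > threshold_db


def simulate(center, n, rng, sigma=1.5):
    """Hypothetical Gaussian RSS vectors (one value per sensor) around a center."""
    return [tuple(rng.gauss(mu, sigma) for mu in center) for _ in range(n)]


rng = random.Random(1)
one_device = simulate((-50.0, -60.0), 200, rng)
two_devices = simulate((-50.0, -60.0), 100, rng) + simulate((-70.0, -45.0), 100, rng)
print(spoofing_suspected(one_device))   # single location: centroids stay close
print(spoofing_suspected(two_devices))  # two distant locations: centroids far apart
```

The sketch also makes the weakness visible: if the second location is moved to within a few dB of the first, the centroid distance drops below any usable threshold.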
In addition, their approach struggles with non-Gaussian data distributions. Finally, one device can form two independent clusters, as we explain in the next sections.
Sheng et al. assume that the RSS samples from a given sender-sensor pair follow a Gaussian distribution and apply a GMM clustering algorithm to detect spoofing. The solution that they propose has some limitations. As a result, their approach would not perform well, especially in high-traffic wireless networks.
Yang et al. use an algorithm that is better than K-means because it is more robust against the noise and outliers that the data might contain. However, they make assumptions similar to those in [2, 3]: they assume that there are two clusters. They also assume that, under normal conditions, the distance between the two medoids should be small because there is only one cluster at a specific location, namely the legitimate device. In contrast, under abnormal behavior, the distance between the two medoids should be large, which suggests the existence of an attacker [27, 28].
In addition, one device can have two independent clusters that could degrade the accuracy of their proposed solution.
Sequence number-based approaches [ 25 , 26 ] have been proposed by several researchers exploiting the fact that every data and management frame has a sequence number in the MAC header. The sequence number typically is incremented by one when the sending device sends a management or data frame. The sensor captures the frames from the same MAC address, and if it finds there is a gap between two consecutive frames, it assumes that MAC address spoofing has occurred.
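The gap check described above can be sketched in a few lines. This is a minimal illustration rather than the method of [25, 26]: the function name and the tolerance `max_gap` (which allows for a frame the sensor failed to capture) are assumptions, while the modulus follows from the 12-bit width of the 802.11 sequence number field.

```python
SEQ_MODULO = 4096  # the 802.11 sequence number field is 12 bits wide


def find_sequence_gaps(sequence_numbers, max_gap=2):
    """Return indexes of frames whose sequence number jumps by more than max_gap."""
    suspicious = []
    for index in range(1, len(sequence_numbers)):
        # Modular difference so that the wrap from 4095 back to 0 counts as a step of 1.
        gap = (sequence_numbers[index] - sequence_numbers[index - 1]) % SEQ_MODULO
        if gap > max_gap:
            suspicious.append(index)
    return suspicious


# Frames captured for one MAC address, in arrival order (hypothetical values).
normal = [4094, 4095, 0, 1, 2]
interleaved = [100, 101, 102, 2050, 103, 104]  # frame 2050 comes from a second device
print(find_sequence_gaps(normal))       # []
print(find_sequence_gaps(interleaved))  # [3, 4]
```

Both the injected frame and the legitimate frame that follows it are flagged, because each breaks the expected increment relative to its predecessor.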
These approaches do not work well when the legitimate station is not sending any frames. In addition, they cannot detect an attacker that only sends control frames, as control frames do not have sequence numbers.
This technique is used to distinguish between the rogue device and the genuine device. The data rates and modulation types are extracted from the physical layer metadata (such as RadioTap and Prism) of each captured frame to detect rogue devices. The modulation types and the data rates depend on the rate adaptation algorithm.
The information that they use to detect spoofing belongs to the physical layer, which makes their approach more robust against spoofing. The only problem with their approach is that it depends on a small number of data rates and modulation types to detect attackers. Thus, it is possible that the attacker uses the same modulation type and data rate as the legitimate device, because the available choices are limited.
Tao et al. The first stage is OS fingerprinting, which can be applied to the network layer in the protocol stack.
The authors extended the synchronization (SYN)-based OS fingerprinting because it is capable of differentiating the attacker from the legitimate device only if the attacker injects data frames into the network. To extend it, they utilized the capability information, the traffic indication map and the tag information, which includes the vendor information. The second stage employs the data link layer, the sequence number field in particular. They utilized the idea that there could be a sequence number gap between consecutive frames of the legitimate device and the attacker.
The third stage brings into play the RSS, which belongs to the physical layer; unfortunately, the authors did not explain this stage in much detail. The authors established some rules to detect MAC address spoofing. They used a simple yet effective technique to combine the outputs from the three stages. Every stage outputs either a normal or an abnormal state for every incoming frame. They then combined the outputs to evaluate how severe the suspicious frame is; if the analyzer finds the outputs of more than one stage to be abnormal, the alert is triggered.
If the OS fingerprinting stage alone is abnormal, the alert is triggered.
This indicates that the MAC address of the AP is masqueraded, because the OS fingerprinting that the authors used depends on fields that are vital to the APs, such as capability information. Some drawbacks exist in such approaches. OS fingerprinting also assumes that most of the tools that attackers use run on Linux-based operating systems.
This is a somewhat valid assumption, but the Windows operating system also provides a capability to change the MAC address of any wireless card in the WLAN. The sequence number techniques have several drawbacks, as explained previously, so combining both SN and OS fingerprinting could miss some intrusions.
RSS has been adopted by researchers for localization for several years because of its correlation to the location of a wireless device [32, 33, 34, 35, 36, 37].
The goal of localization is to focus on RSS samples of a single device. In contrast, in spoofing detection, it is sometimes difficult to distinguish between two devices at different locations that claim to be the owner of a specific wireless device through spatial information alone, especially when they are in close proximity.
We exploit the fact that RSS samples at a specific location are similar while the RSS samples at two different locations are distinctive. To distinguish an attacker, we should first develop the characteristics of normal behavior by building a profile of the legitimate device. The network architecture is assumed to be similar to the one that is in Figure 1a , which consists of sensors monitoring the network. Every sensor captures frames from nearby wireless devices. Each sensor sends the important information of the captured packets, as shown in Figure 1a , to the server for global detection.
The console receives the packets, normalizes the RSS samples using the timestamps or sequence numbers, combines the packets and constructs the sample. Each sample contains the information of the same packet from both sensors.
Figure 1. Network architecture and profiling.
The proposed framework involves two stages, offline and online. In the offline stage, the legitimate device profile is built. During profiling, we label the RSS samples of the legitimate device in the training set as zero and the samples from all other possible locations as one to construct a profile of the legitimate device.
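The sample construction and labeling steps described above might look like the following sketch. The frame dictionaries, their field names (`mac`, `seq`, `rss`) and the RSS values are hypothetical; the article does not specify the record format the sensors send to the server.

```python
def build_samples(sensor1_frames, sensor2_frames):
    """Pair the RSS readings that both sensors recorded for the same frame."""
    rss_at_sensor2 = {(f["mac"], f["seq"]): f["rss"] for f in sensor2_frames}
    samples = []
    for frame in sensor1_frames:
        key = (frame["mac"], frame["seq"])
        if key in rss_at_sensor2:  # skip frames that only one sensor captured
            samples.append((frame["rss"], rss_at_sensor2[key]))
    return samples


def label_for_profiling(legitimate_samples, other_location_samples):
    """Label legitimate-device samples 0 and samples from all other locations 1."""
    rows = legitimate_samples + other_location_samples
    labels = [0] * len(legitimate_samples) + [1] * len(other_location_samples)
    return rows, labels


sensor1 = [{"mac": "aa:bb:cc:dd:ee:ff", "seq": 10, "rss": -48},
           {"mac": "aa:bb:cc:dd:ee:ff", "seq": 11, "rss": -47},
           {"mac": "aa:bb:cc:dd:ee:ff", "seq": 12, "rss": -49}]
sensor2 = [{"mac": "aa:bb:cc:dd:ee:ff", "seq": 10, "rss": -71},
           {"mac": "aa:bb:cc:dd:ee:ff", "seq": 12, "rss": -69}]

legit = build_samples(sensor1, sensor2)
rows, labels = label_for_profiling(legit, [(-70, -45), (-60, -58)])
print(legit)   # [(-48, -71), (-49, -69)]
print(labels)  # [0, 0, 1, 1]
```

Matching on the sequence number alone is only safe over short capture windows, since the counter wraps around; over longer captures the timestamps would also have to be compared.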
Once we are satisfied with our predictor, we can serialize it, as shown in Figure 1b , to predict new unseen data. After serialization, the training procedure depicted in the lower part of the figure is not necessary for real-time prediction. Thus, in the online stage, any new packet can be fed immediately to the predictor. The predictor then predicts if the packet comes from a legitimate device or not. If it finds that the packet is coming from a suspicious device, an alert is triggered. We used the Python library [ 38 ] in our experiment to train and test our detection method.
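To make the offline and online stages concrete, here is a self-contained sketch. It is a simplified stand-in, not the library referenced as [38] and not the authors' Algorithms 1 and 2: it bags small decision trees that split on one randomly chosen feature, and all names, the tree count, the depth limit and the simulated RSS values are assumptions. Serialization uses the standard `pickle` module.

```python
import pickle
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    counts = Counter(labels)
    return 1.0 - sum((c / len(labels)) ** 2 for c in counts.values())


def build_tree(rows, labels, rng, depth=0, max_depth=6):
    """Grow one tree; each split uses a single randomly chosen feature."""
    if depth >= max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
        right = [i for i, row in enumerate(rows) if row[feature] > threshold]
        if not right:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # every row has the same value for this feature
        return Counter(labels).most_common(1)[0][0]
    _, threshold, left, right = best
    children = [build_tree([rows[i] for i in side], [labels[i] for i in side],
                           rng, depth + 1, max_depth) for side in (left, right)]
    return (feature, threshold, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in picks],
                                 [labels[i] for i in picks], rng))
    return forest


def predict(forest, row):
    """Majority vote over all trees: 0 = legitimate, 1 = suspicious."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def simulate(center, n, rng):
    """Hypothetical (sensor 1, sensor 2) RSS pairs in dBm around a center."""
    return [tuple(round(rng.gauss(mu, 2.0)) for mu in center) for _ in range(n)]


# Offline stage: profile the legitimate device (label 0) against other locations (label 1).
rng = random.Random(42)
legit = simulate((-40, -75), 100, rng)
others = simulate((-75, -40), 100, rng) + simulate((-60, -60), 100, rng)
serialized = pickle.dumps(train_forest(legit + others, [0] * 100 + [1] * 200))

# Online stage: only the deserialized predictor is needed.
predictor = pickle.loads(serialized)
for sample in [(-41, -74), (-74, -42)]:
    if predict(predictor, sample) == 1:
        print("ALERT: suspicious frame", sample)
```

Because the predictor is restored from the serialized bytes, the online stage never touches the training data, which mirrors the separation between the two stages described above.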
Algorithm 1 shows how the training set is used to train the random forests ensemble. Random forests uses a specified number of trees. Each tree works on a different random subset of the dataset to create the ensemble [39]. To detect MAC address spoofing, we used the prediction ability of random forests after serialization to predict unseen new samples, as indicated in Algorithm 2.
The new sample is classified as normal, or as abnormal if the predictor finds it to be different from the profiled samples.
We covered an area of m 2 using 15 locations marked by the red dots in Figure 2 to evaluate our proposed method. The distance between any two neighbors is about 4 m (ranging from 3 to 5 m). We tried to simulate the attacker being at every possible place throughout our test-bed.
We placed two sensors, indicated by the triangles, to cover as much ground as possible of the network diameter. We also used some active probing techniques to force the device to respond to specific frames in order to speed up the process of profiling. Each sensor captures enough packets for legitimate device profiling. The total number of combinations is ; we chose one location to be the location of the legitimate device e.
We tested all possible combinations. To avoid high variance and determine whether the dataset is sufficient to train a random forests classifier of trees, we used the learning curve of one of the noisiest datasets, that of Locations 6 and 7, where the distance between the two locations is less than 4 m, shown in Figure 3 a. We started with about samples and determined that we could improve the accuracy and reduce the variance. At about 15, samples, the variance was eliminated and stabilized, indicating that a dataset of 20, observations is more than enough.
Figure 3b shows how random forests of trees separate the data points when the attacker is 10 m away from a genuine user. The random forests ensemble method performed very well in the presence of outliers and can separate data of any shape.
Figure 3. Optimization and data separation.
Figure 4a illustrates the attenuation that signal strength might undergo in wireless networks.
We picked two of the sampled locations to represent this phenomenon and measure consecutive packets at each location. One sampled location is close to the first sensor, and the other one is close to the other sensor. The two subplots show an attenuation of about a 3. It is not rare to see some signal attenuation in our experiments. This phenomenon exists because of several factors, such as multi-path fading and obstacles that could make the signal oscillate, especially when there is a significant distance between the sender and receiving device.
Figure 4. Data distribution and attenuation: (a) signal attenuation; (b) Location 8: Sensor 1 data distribution; (c) Location 8: Sensor 2 data distribution.
The distribution of the data from Location 8 at the two sensors is shown in Figure 4b,c. Some researchers state that the distribution of the transmitter and sensor pair is Gaussian [2, 3], while other researchers report that the distribution is not Gaussian [29, 30] or that it is not rare to see non-Gaussian distributions of RSS samples [18].
We found that non-Gaussian distributions are not rare and have different distribution shapes and peaks. The distribution of 10, RSS samples is shown in the figure. Figure 4b shows a distribution of data that form two Gaussians with one peak that is slightly greater than the other. This suggests that using clustering algorithm-based approaches [2, 3, 18, 27, 28] can generate many false alerts or force the intrusion detection system to allow large margins that permit attackers to harm the network.
To evaluate our proposed solution and compare it to previous work [2, 3, 18, 27, 28], we implemented the four possible GMM kernels, because the kernel that [18] used was not indicated in their article. We considered only the best performing kernel. We first calculated the accuracy of the previously proposed solutions [2, 3, 18, 27, 28] along with our proposed method. The clustering algorithm-based approaches [2, 3, 18, 27, 28] did not work well, as shown in Table 1a, especially when the two locations were close to each other, for the reasons mentioned earlier (see Section 4).
Our proposed method achieved the best accuracy. We tested all of the detection methods where the distances between the two locations were less than 4 m, as shown in Table 1b, between 4 and 8 m, as shown in Table 1c, and between 8 and 13 m, as shown in Table 1d. When the locations were close to each other, the clustering algorithm-based approaches [2, 3, 18, 27, 28] did not perform well. All of the techniques did slightly better when the locations were a little further apart, as shown in Table 1c.
To evaluate our detection method more rigorously, we used the receiver operating characteristic (ROC) curve, shown in Figure 5a, which plots the detection rate (that is, the true positive rate or sensitivity) against the false positive rate (FPR), that is, 1 - specificity.
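For readers unfamiliar with ROC curves, the sketch below shows how the points of such a curve are obtained from classifier scores. It is a generic illustration: the function name, the scores and the labels are made up and are not taken from the experiments reported here.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per decision threshold; label 1 = attacker."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_positives = sum(flagged)
        false_positives = len(flagged) - true_positives
        points.append((false_positives / negatives, true_positives / positives))
    return points


# Hypothetical predictor scores (higher = more likely an attacker) and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves along the curve from (0, 0) to (1, 1); a detector is better the closer its curve passes to the top-left corner, where the detection rate is high and the FPR is low.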
We evaluated our detection method to measure the tradeoff between correct detection and FPR for different distances between the attacker and legitimate device. We also measured the prediction time to see if it is possible to predict the captured frames in real time. Table 2 shows the average testing time, standard deviation, minimum and maximum values for 10, samples of all of the tested locations. The clustering algorithm-based methods, of Chen et al. Figure 5b illustrates the overall performance of our detection method and the existing methods with regard to testing time.
Our detection method performs well in terms of testing time.
Figure 5. ROC curve of the proposed method and testing time of all of the methods.
RSS measurements can be utilized to differentiate wireless devices based on location. Some factors play a vital role in measuring the RSS, such as multi-path fading, absorption effects, transmission power and the distance between the transmitter and the receiver.
Our experiment shows multiple situations where the data form different shapes and peaks. This is probably because WLAN devices interfere with one another. In addition, microwave ovens and Bluetooth devices might cause more collisions and interference in the frequency band. Thus, our proposed method is very effective because, unlike the previous solutions [2, 3, 18, 27, 28], which can deal with the data only if they are Gaussian distributed, our method can handle data of any shape. We tested the proposed method where the distances between the genuine device and the attacker are less than 4 m, from 4 to 8 m and from 8 to 13 m.
The longest distance between any two locations in our test-bed is about 13 m. Although we did not test any two locations where the distance is more than 13 m, we believe that the accuracy would be perfect as the distance between the attacker and the legitimate device increases to more than 13 m.
We also did not test different types of antennas, such as directional or beam antennas, because this research assumes that the attacker uses an omnidirectional antenna, so more sophisticated attacks might remain undetected. Sensor placement is significant in determining the difference between the profiled legitimate device samples and the masquerader frames. Figure 6 shows how important each feature is, after training, in distinguishing the two locations for three different combinations (note that reporting feature importance is a capability provided by almost all ensemble methods).
The first feature comprises the RSS samples captured by the first sensor, and the second feature consists of the RSS samples captured by the second sensor. The figure shows which sensor determines most of the samples of Locations 1 and It appears that the two sensors are close: In this case, the distance between the attacker and the legitimate device is about 12 m. The legitimate device i. The hacker i. Location 4 is about 5 m from Sensor 2 and is about 11 m from Sensor 1. In addition, the distance from the attacker to the legitimate device is about 4 m.
Location 8 is about 2 m away from the first sensor and about 10 m away from the second sensor. Location 9 is about 4 m away from the first sensor and 11 m away from the second sensor. The two locations are about 4 m away from each other.
In this article, we proposed a technique based on the random forests ensemble method, which characterizes the shape of a dataset to detect MAC address spoofing, instead of assuming that the data are Gaussian distributed. All previous methods based on clustering algorithms assume that there are two clusters, which is not a valid assumption, because one device, such as an AP, can form two clusters.
Based on our extensive experiments and evaluations, we determined that our proposed method performs very well: it outperforms all of the previously proposed clustering algorithm-based approaches in terms of accuracy and has a good prediction time. In our future work, we will consider an outlier or novelty detection method to detect MAC address spoofing.
We plan to use an approach that is based on a one-class SVM to build a profile for legitimate devices.
The authors acknowledge the reviewers for their valuable comments that significantly improved the paper.
This research paper was implemented and written by Bandar Alotaibi as a part of his PhD dissertation under the supervision of Khaled Elleithy.
Extensive discussions of the tested algorithms and evaluation metrics to determine the best performing algorithm were carried out by both authors.
Sensors (Basel). Published online Feb. Leonhard M. Reindl, Academic Editor.