Fine-grained Measurement of Performance Metrics in the Internet of Things

Funded By National Science Foundation

Overview

The Internet of Things (IoT) enables an exchange of data that was not previously available, such as temporal and spatial distribution of soil moisture, power consumption of electrical appliances in smart buildings, and integrity of concrete structures. Each IoT application scenario requires unique guarantees on certain performance metrics (such as latency, loss, power consumption etc.) from the IoT infrastructure deployed in that scenario. A small deterioration in these performance metrics can cause violations of service level agreements and result in revenue and functional losses. In order to ensure that the IoT infrastructure delivers the guarantees on these performance metrics, the first step is to measure these metrics. In this project, we are develop frameworks to measure IoT performance metrics, which are required by the IoT network managers to reactively perform troubleshooting and to proactively locate and update potential future bottlenecks. The performance metrics that this project will measure include both quality of service metrics such as latency, loss, and throughput and resource utilization metrics such as power consumption, storage utilization, radio on-time etc. The framework will measure these metrics passively, i.e., without interrupting the regular operations of the IoT network. In developing such a framework for measuring IoT performance metrics, we are working on a three phased approach: a recording phase, during which IoT devices record information about the performance metrics of interest; a transportation phase, during which IoT devices transport the recorded information to a server that has ample storage (such as a data center); and a querying phase, during which a network manager or an automated monitoring service queries the server for the values of the performance metrics of interest.

Team

Publications

2020


ICDCS '20

Characterizing the Impact of TCP Coexistence in Data Center Networks
Anirudh Ganji, Anand Singh, Muhammad Shahzad
Download: [paper]


ICCCN '20

Choosing TCP Variants for Cloud Tenants – A Measurement based Approach
Anirudh Ganji, Anand Singh, Muhammad Shahzad
Download: [paper]


IEEE TPDS '20

SF-sketch: A Two-stage Sketch for Data Streams
Lingtong Liu, Yulong Shen, Yibo Yan, Tong Yang, Muhammad Shahzad, Bin Cui, Gaogang Xie
Impact factor at the time of acceptance: 3.402
Download: [paper]


MASS '20

Distributed and Privacy Preserving Routing of Connected Vehicles to Minimize Congestion
Surabhi R. Boob, Shakir Mahmood, Muhammad Shahzad
Download: [paper]


IFIP Networking '20

Distributed and Privacy Preserving Routing of Connected Vehicles to Minimize Congestion (Extended Abstract)
Surabhi Boob, Shakir Mahmood, Muhammad Shahzad
Download: [paper]

2019


ICCCN '19

Characterizing the Performance of WiFi in Dense IoT Deployments
Anirudh Ganji, Griffin Page, Muhammad Shahzad
Download: [paper]


IFIP Networking '19

A Framework to Secure IoT Networks Against Network Layer Attacks (Extended Abstract)
Raghav H. Venkatnarayan, Prasesh Adina, Shakir Mahmood, Muhammad Shahzad
Download: [paper]


IFIP Networking '19

Characterizing the Performance of WiFi in Dense IoT Deployments (Extended Abstract)
Anirudh Ganji, Griffin Page, Muhammad Shahzad
Download: [paper]

2018


ICNP '18

IoTm: A Lightweight Framework for Fine-grained Measurements of IoT Performance Metrics
Anirudh Ganji, Muhammad Shahzad
Download: [paper]


IEEE/ACM ToN '18

Identifying and Estimating Persistent Items in Data Streams
Haipeng Dai, Muhammad Shahzad
Impact factor at the time of acceptance: 3.11
Download: [paper]


WWW Journal '18

FID-Sketch: an Accurate Sketch to Store Frequencies in Data Streams
Tong Yang, Haowei Zhang, Hao Wang, Muhammad Shahzad, Qin Xin, Xue Liu, Xiaoming Li
Impact factor at the time of acceptance: 1.405
Download: [paper]


Mobile IoT SSP '18

Impacts and Detection of Network Layer Attacks on IoT Networks
Prasesh Adina, Raghav H. Venkatnarayan, Muhammad Shahzad
Download: [paper]

2017


BigData '17

Rectangular Hash Table: Bloom Filter and Bitmap Assisted Hash Table with High Speed
Tong Yang, Binchao Yin, Hang Li, Muhammad Shahzad, Steve Uhlig, Bin Cui, Xiaoming Li
Download: [paper]


IEEE IC '17

Continuous Authentication and Authorization for the Internet of Things (invited; not peer-reviewed)
Muhammad Shahzad, Munindar P. Singh
Impact factor at the time of publishing: 1.4
Download: [paper]


ICDCS '17

Fast and Accurate Tracking of Population Dynamics in RFID Systems
Muhammad Shahzad, Alex X. Liu
Acceptance rate: 16.9%
Download: [paper]


ICDE '17

SF-sketch: A Fast, Accurate, and Memory Efficient Data Structure to Store Frequencies of Data Items (Extended Abstract)
Tong Yang, Lingtong Liu, Yibo Yan, Muhammad Shahzad, Yulong Shen, Xiaoming Li, Bin Cui, Gaogang Xie
Download: [paper]


IEEE/ACM ToN '17

A Shifting Framework for Set Queries
Tong Yang, Alex Liu, Muhammad Shahzad, Yang Dongsheng, Qiaobin Fu, Gaogang Xie, Xiaoming Li
Impact factor at the time of acceptance: 3.376
Download: [paper]

Activities

We proposed a three phased approach to developing the framework to measure performance metrics in IoT networks and systems: a recording phase, during which IoT devices record information about the performance metric(s) of interest; a transportation phase, during which IoT devices transport the recorded information to a server that has ample storage (such as a data center); and a querying phase, during which a network manager or an automated monitoring service queries the server for the values of the performance metrics for the IoT devices of interest. Following are some of the projects supported through this grant.

SF-Sketch: Sketches es are probabilistic data structures designed for recording frequencies of items in a multi-set. They are widely used in various fields, especially for gathering Internet statistics from distributed data streams in network measurements. In distributed and IoT streaming applications, a sketch in each monitoring node "fills up" very quickly and then its content is transported to a remote collector responsible for answering queries. Due to this and due to the often small sizes of on-chip memories in IoT devices, the size of the contents transferred must be kept as small as possible while meeting the desired accuracy requirement. To obtain significantly higher accuracy while keeping the same update and query speed as the best prior sketches, we have developed a new sketch - the Slim-Fat (SF) sketch. The key idea behind the SF-sketch is to maintain two separate sketches: a larger sketch, the Fat-subsketch, and a smaller sketch, the Slim-subsketch. The Fat-subsketch is used for updating and periodically producing the Slim-subsketch, which is then transferred to the remote collector for answering queries quickly and accurately. We have also derived the error bound as well as an accurate model of the correct rate of the SF-sketch, and verify their correctness through experiments. We implemented and extensively evaluated the SF-sketch along with several prior sketches. Our results show that when the size of our Slim-subsketch and of the widely used Count-Min (CM) sketch are kept the same, our SF-sketch outperforms the CM-sketch by up to 33.1 times in terms of accuracy (when the ratio of the sizes of the Fat-subsketch and the Slim-subsketch is 16:1). [ICDE 2017, IEEE TPDS '20]

Transport Protocols in Data Centers: The switch fabrics of today’s data centers carry traffic controlled by a variety of TCP congestion control algorithms. This leads us to ask: how does the coexistence of multiple variants of TCP on shared switch fabric impact the performance achieved by different applications in data centers? To answer this question, we conducted an extensive set of experiments with coexisting TCP variants on Leaf-Spine and Fat-Tree switch fabrics. We executed common IoT and data center workloads, which include streaming, MapReduce, and storage workloads, using four commonly used TCP variants, namely BBR, DCTCP, CUBIC, and New Reno. We also extensively executed iPerf workloads using these 4 TCP variants to purely study the impact of the coexistence of TCP variants on each other’s performance without incorporating the network behavior of the application layer. Our experiments resulted in a large set of network traces comprised of 160 billion packets. We have presented comprehensive observations from these traces that have important implications in ensuring optimal utilization of data center switch fabric and in meeting the network performance needs of application layer workloads. [ICDCS '20]

Transport Protocols in Cloud: Cloud computing has become the de-facto paradigm for fulfilling the computing needs of a myriad of applications, especially the IoT applications. While a significant amount of work exists on how cloud providers can improve the networking performance of their cloud platforms, very little has been done to explore how a cloud tenant can achieve the best performance for their applications. We have studied and presented how the choice of the TCP variant impacts the performance achieved by tenant applications. We have developed a generic measurement based approach to identify the best TCP variant for any given application in a given cloud environment. Our approach is comprised of first measuring several performance metrics including throughput, latency, and loss in the given cloud platform for several TCP variants, and then identifying the best TCP variant based on three things: observations from the measurements, nature of the traffic of the given application, and application requirements such as high throughput or low latency. We studied the effectiveness of our approach by implementing it in two large public clouds, Amazon’s AWS and Google’s GCP, and presenting our observations from several case studies using three common cloud applications, namely streaming, distributed input-output, and sort, and four common TCP variants, namely Cubic, New Reno, BBR, and DCTCP. From our observations, we found that just by changing the TCP variant that an application uses, the average throughput can be increased by up to 13.7% and the round trip time can be decreased by up to 5 times. [ICCCN '20]

Minimize Congestion Through Connected Vehicles: With the internet of things enabling connectivity for a large number of vehicles on the roads, there is an opportunity to leverage their connectivity to minimize congestion on roads by calculating fast routes for vehicles in a way that each vehicle contributes as little to the congestion as possible. The existing commercial and research based approaches of calculating routes for vehicles suffer from one or more of the following two limitations: 1) they are not privacy preserving in the sense that they receive destination addresses from users and may either store and use them for other commercial purposes or are at a risk of getting hacked and exposing these addresses to hackers; and 2) they require expensive infrastructure such as road side units (RSUs). To address these limitations, we have developed a distributed and privacy preserving routing protocol, namely DPR, which the connected vehicles collaboratively and repeatedly execute to calculate fast routes to their destinations such that the overall congestion on the road network is significantly reduced and at the same time the privacy of the vehicles is preserved. The DPR protocol relies on direct vehicle to vehicle communication and does not need any new infrastructure such as RSUs. We have implemented and evaluated our DPR protocol through simulations on a real road network under several traffic conditions. Our results show that DPR reduces the average travel time of vehicles that travel a distance of 1000, 2500, and over 4000 meters by 15%, 32%, and 42%, respectively. This reduction in travel time is significant considering that this improvement results purely from software manipulations and without requiring any new infrastructure. [IFIP Networking '20, MASS '20]

WiFi Characterization in Dense IoT Deployments: With the advent of the internet of things, the number of wireless devices and the amount of wireless data traffic is increasing at an unprecedented rate. By 2020, the number of connected devices per person is expected to exceed 6.58. Due to the ubiquitous availability of IEEE 802.11n/ac (i.e., WiFi) in most modern homes and enterprise environments, the majority of the new home and enterprise focused IoT devices that are entering the market use IEEE 802.11n/ac to connect to the internet. Our literature survey revealed that there are no prior measurement studies on the performance of IEEE 802.11n/ac’s MAC in dense IoT environments. Thus, we conducted a comprehensive measurement study of various aspects of IEEE 802.11n/ac’s MAC using real IoT devices and realistic IoT workloads and presented several useful observations that demonstrate the need to revise some of the key aspects of IEEE 802.11n/ac’s MAC to make it suitable for dense IoT environments. This need for revisions stems from the fact that IEEE 802.11n/ac was primarily designed keeping conventional devices, such as laptops, smart phones, and servers in view. For such conventional devices, the amount of uplink traffic is significantly smaller compared to the amount of downlink traffic. Contrary to this, in dense IoT deployments, the amount of uplink traffic is significantly larger compared to the downlink traffic. We conducted our study on a real test bed comprising a large number of Raspberry Pis deployed in a real world environment and studied the impact of the density and type of IoT traffic on the throughput of the wireless system, bandwidth utilization of RTS and CTS frames, block acknowledgments, and frame aggregation sizes. We further studied the impact of TCP’s congestion control mechanism on the performance of IEEE 802.11n/ac’s MAC. [ICCCN 2019]

IoT Network Layer Security: IoT networks (IoTNs) are being used in a large number of applications including data center monitoring, industrial monitoring, and supply chain. As IoTNs are comprised of networked nodes, their implementations usually follow the 7-layered OSI model. As different layers provide different services, the anatomies of attacks on different layers are also different. Thus, the approach to defend against the attacks on any given layer also needs to be customized according to the anatomy of the attacks on that layer. In this project, our objective was to develop a distributed and lightweight framework that can quickly detect any arbitrary NL attack in any given IoTN, localize the compromised nodes launching the attack, and automatically mitigate the attack by isolating the compromised nodes without human intervention. We developed NLSec, a lightweight framework that achieves this objective. NLSec is completely distributed, i.e., it is implemented entirely in the IoT nodes and does not require any centralized computing. NLSec is also independent of the NL protocol that any given IoTN may be using. To develop intuitions that guided us in designing NLSec, we conducted an exploratory measurement study on a real IoTN test-bed and studied how the effects of various NL attacks manifest. From this study, we made two key observations. First, whenever one or more compromised nodes launch an NL attack, various performance metrics, such as CPU usage and input and output data rates, of the victim nodes change compared to when there is no attack. This is because victim nodes have to do a different amount of work when under attack due to the retransmissions and the processing of new route advertisements. Second, the change in the performance metrics is more significant in the nodes that are fewer hops away from the compromised nodes compared to the nodes that are farther away because the disruptive effects of a compromised node are more dominant on its closer neighbors. To detect that a given IoTN is under NL attack, NLSec leverages the first observation. It continuously measures changes in the performance metrics of all nodes in a distributed manner to find anomalies in them. To localize the compromised nodes, NLSec leverages the second observation and identifies the nodes that are connected to a large number of nodes with anomalous performance metrics as the compromised nodes. To mitigate the attack, NLSec coordinates with the NL protocol of the given IoTN and recalculates routes to isolate the compromised nodes. NLSec eliminates the need for immediate human intervention to mitigate attacks. [IFIP Networking 2019] [ACM Mobile IoT SSP 2018]

IoTm Framework: Most IoT applications require unique guarantees on various performance metrics (such as latency, CPU availability, power fairness, etc.) from the IoT infrastructure. A small deterioration in these performance metrics can cause serious violations of service level agreements. To ensure that the deployed IoT infrastructure delivers the guarantees on these metrics, the first step is to measure these metrics. We have developed IoTm, a framework for measuring IoT performance metrics, which include both IoT network's quality of service (QoS) metrics and IoT node's resource utilization (RU) metrics. IoTm has two key properties: 1) it is lightweight and thus amenable for implementation on resource constrained IoT nodes; and 2) it can perform measurements at fine-grained levels and not just at aggregate levels. IoTm is comprised of two components, a lightweight IoT node unit (INU), which resides in each of the IoT nodes, and a control and query unit (CQU), which resides in a logically centralized management server. The primary role of INU is to record appropriate information about the desired performance metrics in the IoT nodes. To record the information, INU leverages a generic data structure that we proposed. CQU is responsible for identifying the metrics and the IoT nodes on which those metrics should be monitored to achieve a desired measurement objective. CQU also stores the copies of data structures that the INU sends to it for long term storage. Both INU and CQU further contain query processing engines, which operate on the information stored in the data structures to answer measurement queries. [ICNP 2018]

Measurements in Data Streams: With the advent of the Internet of Things, data stream mining is becoming more and more important. Data stream mining is the process of extracting information from a data stream, where a data stream is an ordered sequence of data items that can often be read only once using limited computing and storage resources. One of the heavily studied problems in data stream mining is the fundamental problem of mining frequent items, which deals with finding items that occur most frequently in a given data stream over a period of time. A more generalized version of frequent item mining is the persistent frequent item mining. A persistent frequent item, unlike a frequent item, does not necessarily occur more frequently compared to other items over a short period of time, rather persists and occurs more frequently compared to other items only over a long period of time. We have addressed the fundamental problem of finding persistent items and estimating the number of times each persistent item occurred in a given data stream during a given period of time at any given observation point. We have developed a novel Persistent items Identification and Estimation scheme (PIE) that can not only accurately identify each persistent item with a probability greater than any desired false negative rate (FNR), but can also accurately estimate the number of occurrences of each persistent item. The key idea of PIE is that it uses Raptor codes to encode the ID of each item that appears at the observation point during a measurement period and stores only a few bits of the encoded ID in the memory. The item that is persistent occurs in enough measurement periods that enough encoded bits for the ID can be retrieved from the observation point to decode them correctly and get the ID of the persistent item. To estimate the number of occurrences of any given persistent item, PIE uses maximum likelihood estimation-based statistical techniques on the information already recorded during the measurement periods. [IEEE ToN 2018]

FID-Sketch: We proposed FID-sketch. Sketches are being extensively used in a large number of real world applications to estimate frequencies and other characteristics of data items. Due to the unprecedented increase in the amount of IoT data and a relatively slower increase in the size of on-chip memories, existing sketches are becoming increasingly unable to keep the accuracy of the estimates at an acceptable level. We have designed a new sketch, called FID-sketch, that has a significantly higher accuracy and a much smaller on-chip memory footprint compared to the existing sketches. The key intuition behind the design of the FID-sketch is that before inserting an item, unlike prior sketches, it first estimates the current value of the frequency of that item stored in the sketch, and then increments as few counters as possible instead of incrementing a pre-determined fixed number of counters. We carried out extensive experiments to evaluate and compare the performance of FID-sketch with existing sketches. Our experimental results show that our FID-sketch significantly outperforms the state-of-the-art with 36.7 times lower relative error. We have released the source code of our proposed sketch and other related sketches that we implemented at Github. [FID code] [WWW Journal 2018]

Shifting bloom filter: Set queries, such as membership queries, association queries, and multiplicity queries, are fundamental operations in IoT systems and applications. Membership queries check whether an element is a member of a given set. Association queries identify which set(s) among a pair of sets contain a given element. Multiplicity queries check how many times an element appears in a multi-set. Performance measurement applications, such as measuring flow sizes, latencies, and throughput, often involve these three types of queries. We addressed the fundamental problem of designing a probabilistic data structure that can quickly process set queries using a small amount of memory, which makes it well-suited for IoT devices. We developed a shifting bloom filter (ShBF) framework for representing and querying sets. We demonstrated the effectiveness of ShBF using three popular types of set queries mentioned above, i.e., membership queries, association queries, and multiplicity queries. The key novelty of ShBF is on encoding the auxiliary information of a set element in a location offset. In contrast, prior bloom filter (BF) based set data structures allocate additional memory to store auxiliary information. We further extended our shifting framework from BF-based data structures to sketch-based data structures, which are widely used to store multiplicities of items. We conducted experiments using real-world network traces, and our results show that ShBF significantly advances the state-of-the-art in terms of accuracy, speed, and memory footprint for all three types of set queries that we tested. [ToN 2017]

Dynamic RFID estimation: In the proposal, we proposed to explore both out-of-band and in-band transportation of collected information from IoT devices to the servers. For out-of-band transportation, we proposed to explore the use of RFID systems because the RFID tags can transmit data without requiring any dedicated power source. More specifically, we worked on the problem of estimating the number of arriving and departing tags between any two time instants in dynamically changing RFID tag populations. This work finds direct application in monitoring low power IoT deployments to estimate the number of IoT nodes active at any given point in time. Such estimates are required to perform important tradeoffs in IoT systems, such as minimizing duty cycle of IoT sensing nodes without compromising sensing quality. [ICDCS 2017]

Rectangular Hash Table: Extending our recording phase to efficient hash tables, we proposed rectangular hash table. Hash tables are widely used data structures that achieve an O(1) average lookup speed. Unfortunately, hash tables suffer from collisions and the rate of collisions is largely determined by the load factor. Broadly speaking, existing research has taken two approaches to improve the performance of hash tables. The first approach trades-off collision rate with memory usage, but only works well under low load. The second approach pursues high load and no hash collisions, but comes with update failures. We have designed a practical and efficient hash table that achieves high load factor, low hash collision rate, fast lookup speed, fast update speed, and zero update failures. To achieve this goal, we take a three-step approach. First, we developed a set of hashing techniques that leverage Bloom filters to significantly reduce hash collision rates. Second, we introduced a novel kick mechanism to achieve a high load factor. Last, we developed bitmaps to significantly accelerate the kick mechanism. Theoretical analysis and experimental results show that our hashing schemes significantly outperform the state-of-the-art. Our hash table achieves a high load factor (greater than 95%), a low collision rate (less than 0.56%), and the number of hash buckets almost equals to the number of key-value pairs. Given n key-value pairs, the collision rate is reduced to 0 by either using 1.18 times n buckets or by allowing up to 5 blind kicks. We have released the source code of the implementations of our hash table and of 6 prior hash tables at Github. [RHT code] [IEEE BigData 2017]

Sponsors

Organization: National Science Foundation
Program: Networking Technology and Systems (NeTS)
Sponsor's website of this project