How Netify Works
Netify is a cloud-based service. However, like smart meters or IP-based security systems, it relies on a piece of software that resides on site (the agent) and feeds data scanned on the network to the Netify engine in the cloud. Why architect it like this? The agent is designed to be extremely lightweight and can run on equipment with even modest hardware specs; the ubiquitous wireless router is a good example. This design also allows us to port the agent to almost any device. Collecting, analyzing and displaying data that scales with the amount of captured metadata, even on a small network, can only be done reliably in the cloud. The cloud also provides a single login to manage any number of networks, which is handy if you administer more than one.
We want to be as transparent as possible about how our technology works. Like you, we'd have questions too about installing a device that reports traffic metadata... questions like:
- How does my data get collected?
- What data do you collect?
- How do you protect my privacy and anonymity?
- Is it secure?
- What do you do with my metadata?
To answer these questions (and others), we have compiled a detailed explanation of how Netify works below. If you still have unanswered questions, check out the Additional Reading links, our FAQ or send us a note.
Netify Daemon (Netifyd)
The Netify Daemon (Netifyd) captures network traffic across internal (LAN) and external (WAN) interfaces using libpcap. This is a passive capture interface, which means that Netifyd cannot alter or manipulate the packets; it only 'sees' up to a certain number of bytes per packet. The captured packets are then examined and, if found to be of a suitable size and type (protocol), they enter the Netify nDPI protocol detection engine. The nDPI engine associates packets with packet flows, which can be either unidirectional or bidirectional conversations between two network devices. These flows can originate from an internal host to an external host, an internal host to another internal host, or vice versa. These flows are not to be confused with TCP streams: a flow can be of any IP protocol (UDP, ICMP, etc.) or version (4 or 6). The nDPI engine attempts to identify the flow's protocol or associated service as quickly as possible. For TCP flows, the first 10 packets are examined, and for all other datagram flows, the first 8 packets are processed.
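To make the idea of a flow more concrete, here is a minimal Python sketch of how packets might be grouped into bidirectional flows and how the packet budgets described above (10 for TCP, 8 for everything else) could be applied. It is an illustration only, not Netifyd's actual code; the names flow_key, Flow, FlowTable and needs_inspection are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Packet budgets taken from the text: 10 packets for TCP, 8 for other protocols.
TCP_PACKET_LIMIT = 10
OTHER_PACKET_LIMIT = 8


def flow_key(proto, src_ip, src_port, dst_ip, dst_port):
    """Build a direction-independent key so both directions map to one flow."""
    lower, upper = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (proto, lower, upper)


@dataclass
class Flow:
    proto: str
    packets_seen: int = 0
    detected: Optional[str] = None  # set once a protocol has been identified


class FlowTable:
    """Hypothetical in-memory flow table keyed by the normalized 5-tuple."""

    def __init__(self):
        self.flows = {}

    def add_packet(self, proto, src_ip, src_port, dst_ip, dst_port):
        key = flow_key(proto, src_ip, src_port, dst_ip, dst_port)
        flow = self.flows.setdefault(key, Flow(proto))
        flow.packets_seen += 1
        return flow


def needs_inspection(flow):
    """True while the flow is undetected and still within its packet budget."""
    limit = TCP_PACKET_LIMIT if flow.proto == "tcp" else OTHER_PACKET_LIMIT
    return flow.detected is None and flow.packets_seen <= limit
```

In this sketch a request and its reply land in the same Flow object because the two endpoints are sorted before the key is built.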
Protocols vs. Applications
The Netify Daemon and Netify Cloud/Portal classify traffic as a protocol, an application (service), or in many cases, both. To Netify, a protocol is a real-world protocol, such as FTP, HTTP, SMTP, etc. An application is a more general way to classify a conversation: it tells us what service the packet flow relates to. For example, a conversation can be detected as HTTPS based on the TLS handshake, which sets the protocol to HTTPS. Further analysis of the SSL certificate(s) may reveal that the common name contains the string ".fbcdn.net". This is then used to identify the conversation as HTTPS - Facebook. If no valid certificate is found, or the common name is not recognized (as is the case for most self-signed certificates), the special application ID of "No certificate" is used. It is also possible for a packet flow to be detected by application while the protocol remains unknown. For example, if a proprietary protocol is captured and there is no supported dissector available, but we know that the IPv4 block is owned by Citrix, the application of the conversation will be guessed, in this case as GoToMeeting. Address and port-based detections are not as accurate or reliable as detections made by dissectors (proper DPI detection).
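The certificate example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the real detection code: the lookup table and the function name classify_https are hypothetical, and only the ".fbcdn.net" marker and the "No certificate" label come from the text.

```python
# Hypothetical lookup table; only ".fbcdn.net" is taken from the text.
APPLICATION_BY_MARKER = {
    ".fbcdn.net": "Facebook",
}
NO_CERTIFICATE = "No certificate"


def classify_https(common_name):
    """Return (protocol, application) for a flow already detected as HTTPS.

    common_name is the certificate common name, or None if no valid
    certificate was found.
    """
    if common_name:
        name = common_name.lower()
        for marker, application in APPLICATION_BY_MARKER.items():
            if marker in name:
                return ("HTTPS", application)
    # Missing certificate or unrecognized common name.
    return ("HTTPS", NO_CERTIFICATE)
```

The protocol stays HTTPS in every case; only the application part of the result changes.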
Detection by Dissector
The nDPI engine will always try to perform an initial deep-packet inspection, as this is by far the most accurate and reliable form of analysis. The flow's packets are passed through protocol dissectors. These are small, high-speed (compiled) tests that examine the contents of a packet flow. As an example, for a protocol like SMTP (simple mail transfer protocol) the corresponding dissector will look for SMTP-specific strings, such as:
"HELO *.*" or "MAIL FROM:<*>"
This is just a simple example, as there are further tests that attempt to significantly reduce over-matching (false-positive detections). If all dissectors run without a match, or if the small number of packets to examine is exhausted without a match, then the protocol detection is "guessed".
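As a rough illustration of the SMTP example in this section, the following Python sketch tests a payload against the two string patterns mentioned. Real dissectors are compiled code with additional checks, so treat this as a simplified stand-in; the regular expressions and the function name looks_like_smtp are assumptions made for the example.

```python
import re

# Regex equivalents of the two wildcard patterns quoted in the text.
SMTP_PATTERNS = (
    re.compile(rb"^HELO \S+\.\S+", re.IGNORECASE),      # "HELO *.*"
    re.compile(rb"^MAIL FROM:<[^>]*>", re.IGNORECASE),  # "MAIL FROM:<*>"
)


def looks_like_smtp(payload: bytes) -> bool:
    """Return True if any line of the payload starts with an SMTP command."""
    return any(
        pattern.match(line)
        for line in payload.splitlines()
        for pattern in SMTP_PATTERNS
    )
```

A test this simple would match too eagerly on its own, which is why further checks are needed to limit false positives.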
Classification when Protocol and Application are Unknown
If deep-packet inspection fails, the nDPI engine falls back to a series of educated guesses. The engine knows who is involved in a conversation and which ports they are talking on. Using these criteria, the engine will attempt to make a guess. For example, the IPv4 block 188.8.131.52/18 is registered to Netflix, so packet flows with an endpoint in that address range would be associated with Netflix (as an application, not a protocol). Further guesses can be made by examining the ports in use. For example, an undetected flow using port 20 and/or 21 would be guessed to be FTP data/control.
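The two kinds of guess described above, by address block and by port, might look something like the following Python sketch. The tables are hypothetical: the address block is a reserved documentation range (RFC 5737) mapped to a made-up application name, and only the FTP ports come from the text.

```python
import ipaddress

# Hypothetical table: a documentation range (RFC 5737), not a real registration.
APPLICATION_BY_NETWORK = {
    ipaddress.ip_network("198.51.100.0/24"): "ExampleStreaming",
}
# Well-known FTP ports mentioned in the text.
PROTOCOL_BY_PORT = {20: "FTP data", 21: "FTP control"}


def guess(src_ip, src_port, dst_ip, dst_port):
    """Return a (protocol, application) guess; either part may be None."""
    application = None
    for ip in (src_ip, dst_ip):
        address = ipaddress.ip_address(ip)
        for network, name in APPLICATION_BY_NETWORK.items():
            if address in network:
                application = name
    protocol = PROTOCOL_BY_PORT.get(dst_port) or PROTOCOL_BY_PORT.get(src_port)
    return protocol, application
```

Note that an address match only sets the application and a port match only sets the protocol, mirroring the distinction made in the text.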
Netify Daemon Internals
The Netify Daemon was built to run exclusively on a Linux host. There are no current plans to port to other platforms, as Netify is heavily integrated with Linux-only APIs such as Netfilter/Conntrack, IPTables, and Netlink. As mentioned in the Overview, Netifyd captures traffic from internal and external network interfaces. The ideal location for Netifyd to be installed is on the network gateway. Here, it has complete visibility of conversations across all internal and external networks. By being able to "see" both sides of a gateway, Netifyd can easily determine who is talking to whom, that is, which hosts are internal LAN devices and which are remote/Internet endpoints. It can do this dynamically, which greatly simplifies configuration. However, it is also possible to run Netifyd in an injected mode where packets from a mirrored switch port are passed to a Netify device for processing. The downside to this configuration is that the network topology must be specified statically, so it can fall out of sync with the live environment if that environment changes.
A discrete capture and detection thread is started for each network interface. These threads are scheduled on dedicated CPU cores when possible. This design makes use of multi-core CPUs (which are increasingly commonplace) to parallelize the workload. Doing DPI detection of high-bandwidth networks can be very CPU intensive. As Netifyd uses a modified nDPI engine to do the work, the benchmarking data presented in this whitepaper would be relevant reading for anyone interested in packet processing performance.
Each detection thread then keeps an automatically-pruned unordered hash map of flows that it has processed. The main server thread collects these flows at a regular interval (currently every 15 seconds). The collection of flows along with a variety of other useful statistics (number of IP, TCP, UDP, fragmented, discarded packets, etc), and metrics like all visible network hosts with corresponding MAC/IP addresses, are encoded into a compressed JSON payload. The data is uploaded to the Netify Sink Cloud Servers for storage, reporting and further analysis.
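To give a feel for what a compressed JSON payload could look like, here is a small Python sketch. The actual payload schema and compression method are not described here, so the field names ("flows", "stats", "hosts") and the use of zlib are assumptions chosen purely for illustration.

```python
import json
import zlib


def build_payload(flows, stats, hosts):
    """Encode one collection interval as compressed JSON (hypothetical schema)."""
    document = {"flows": flows, "stats": stats, "hosts": hosts}
    raw = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def read_payload(blob):
    """Reverse build_payload: decompress, then parse the JSON document."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

For example, a caller could pass a list of flow dictionaries, a dictionary of packet counters, and a list of MAC/IP pairs, then upload the returned bytes.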
Privacy and Anonymity
In order to roll out compelling features like real-time alerts, malware detection, executive reports and active firewall control, we need to analyze the data coming in. To analyze data, you need to be able to read it.
Given these design requirements, our task - as it related to privacy - could be broken down into two parts.
- The association between a user's account (e.g. our CRM database) and any of the flow data coming from a Netify agent must require a private key that only the user has access to.
- Any metadata our users provide that could potentially reveal their identity or account must not be readable by anyone but themselves. Zero-knowledge.
To accomplish the first task, we placed network flow data and customer CRM data in two disparate containers. The relationship between a user account and the data from an active Netify agent requires a key known only to the account holder. This key is never transmitted to either Netify server node (Netify Informatics API or the Netify portal). Without the key, it's impossible to associate network traffic data with a Netify account.
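The exact mechanism is not spelled out here, but one generic way to make an association depend on a key that never leaves the user's side is a keyed hash (HMAC). The Python sketch below is an assumption for illustration only and should not be read as Netify's actual scheme; the function name link_token and its parameters are hypothetical.

```python
import hashlib
import hmac


def link_token(private_key: bytes, account_id: str) -> str:
    """Derive an opaque lookup token from an account ID and a user-held key.

    Only someone holding private_key can recompute the token, so a store that
    indexes flow data by this token cannot be joined to the account database
    without the key.
    """
    digest = hmac.new(private_key, account_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

In such a design the token would be computed on the user's side, so only the token, never the key, would be sent to a server.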
To accomplish the second part, we couldn't encrypt the flow data. However, we could encrypt the metadata that users of Netify provide in order to use the service. For example, an alert about a device with a MAC address of 00:13:2v:02:89:bf is pretty useless, whereas associating that MAC with a user like "Ben", belonging to a group "Development team" and using a device like "Fujitsu S710 Laptop", is useful. We reused the private key described above to encrypt all metadata provided by users so that their privacy and identity could not be compromised.