MetaSwarm Technical Overview
MetaSwarm has more than 34 patents pending on breakthrough technologies that provide the basis of the Essurance system. Collectively, these technologies encompass multiple electronic communications modalities (ECMs). That is, MetaSwarm technologies apply not only to email communications, but also to short message service (SMS) communications used by mobile phones, instant messaging (IM), and websites. The technologies include:
Behavioral Envelope (BE)
Body Link Analysis
Anonymous Data Correlation
Behavioral Envelope (BE) Construction
A Behavioral Envelope (BE) represents the combined characteristics of a message, website, or other form of electronic communication. An important feature of Behavioral Envelopes is the capability of comparing large numbers of them in minimal time. The Behavioral Envelope differs depending on the type of electronic communication:
Bulk Message Envelope—A bulk message is an email, SMS, or IM/IRC message of which many copies are sent to users. Most spam consists of bulk messages. MetaSwarm technologies find messages that have identical or substantially similar properties and create a description that defines the properties of the collection. This message collection descriptor is the Bulk Message Envelope (BME).
Bulk Website Envelope—Each website has a set of characteristics based on its structure and contents. In the same way that a BME is a collection descriptor for messages, the Bulk Website Envelope (BWE) is a collection descriptor for a website (or blog). Given the nature and diversity of websites, no two websites should be identical unless one is a mirror of the other (in which case the mirror should have been authorized). Therefore, identical and substantially similar websites raise a flag. The BWE makes it possible to compare tens of thousands of websites and so uncover fake or cloned websites.
Bulk Emitter Envelope and Bulk Receiver Envelope—Whereas the Bulk Message and Website Envelopes are content-based, the Bulk Emitter and Receiver Envelopes are media-based. They center on message propagation and the channel end points. The Bulk Emitter Envelope (BEE) profiles the original emitter mail server and any intermediate remailers and relays. The Bulk Receiver Envelope (BRE) profiles the receiving mail server or remailer. The profiles indicate whether any obfuscation has been attempted anywhere in the message transmission chain.
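As a rough illustration of how a collection descriptor can make large-scale comparison fast, the sketch below groups messages by a fingerprint of their normalized text, so that near-identical copies fall into the same group with a single dictionary lookup. The normalization rules, the choice of SHA-256, and the names `fingerprint` and `group_bulk_messages` are assumptions made for this example only; the overview does not describe how a Behavioral Envelope is actually computed.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash a normalized form of the text (hypothetical normalization)."""
    # Lowercase, mask digit runs, and collapse whitespace so that trivially
    # varied copies of the same message produce the same digest.
    normalized = re.sub(r"\d+", "#", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_bulk_messages(messages, min_copies=2):
    """Map each fingerprint to the indices of the messages that share it.

    Only groups with at least `min_copies` members are kept, since a
    bulk message is one that is sent in many copies.
    """
    groups = defaultdict(list)
    for index, body in enumerate(messages):
        groups[fingerprint(body)].append(index)
    return {fp: idxs for fp, idxs in groups.items() if len(idxs) >= min_copies}


# Made-up sample messages: the first two differ only in case, spacing, and digits.
messages = [
    "Win $1000 now!  Claim code 4821",
    "win $5000 now! claim code 77",
    "Lunch at noon tomorrow?",
]
bulk = group_bulk_messages(messages)
```

In this sketch the first two messages share a fingerprint and form one group, while the third is left out. An exact hash only catches copies that become identical after normalization; detecting "substantially similar" messages would need a similarity measure that the sketch does not attempt.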
The premise for MetaSwarm’s validation technology is to verify what is good, rather than attempt to identify what might be bad. Verifying what is good is certain and effective, and much more efficient than trying to identify what is bad when the bad is constantly changing.
MetaSwarm uses Partner Lists, notPhish tags, and Behavioral Envelope technology to validate that messages and websites are genuine; that is, that they have not been forged, faked, or altered in any way. A browser plug-in enables users to check the validity of an email message or website.
Partner List—MetaSwarm uses the Partner List to specify the good. Each MetaSwarm client, such as a bank, uses the Partner List as a repository for its valid information: its valid URLs, valid partner URLs, profiles of its websites, profiles of its email mailings, and Partner List activation dates and times. The Partner List contains information relating to the corporate or government client in general, but may also hold information on a per-mailing or mass-mailing basis.
notPhish tags—These tags are inserted into client messages and websites, and specify the applicable Partner List to be used to verify their integrity. The tags allow mass customization of email, while allowing validation of the entire mass mailing with one validation block.
notPhish plug-in—The notPhish plug-in, distributed by a client to its customers, adds a notPhish icon onto the customer’s browser and email application window. The plug-in validates messages and websites against the Partner List at the customer’s computer and uses the icon to indicate a message’s or website’s validity to the customer. Only an exact match to the Partner List is valid.
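The exact-match rule can be pictured with a minimal sketch. The `PartnerList` class, its fields, and the `is_valid` function are hypothetical simplifications for illustration; the real Partner List also holds website and mailing profiles, and the overview does not specify its format or the plug-in's checking logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartnerList:
    """Hypothetical, simplified stand-in for a client's Partner List."""
    valid_urls: frozenset
    active_from: datetime
    active_until: datetime


def is_valid(url: str, partner_list: PartnerList, now: datetime) -> bool:
    """Accept a URL only if the list is active and the URL matches exactly."""
    if not (partner_list.active_from <= now <= partner_list.active_until):
        return False
    # No fuzzy matching: anything that is not on the list is rejected.
    return url in partner_list.valid_urls


# Made-up data using the reserved ".example" top-level domain.
bank_list = PartnerList(
    valid_urls=frozenset({"https://www.bank.example/login"}),
    active_from=datetime(2024, 1, 1),
    active_until=datetime(2024, 12, 31),
)
check_time = datetime(2024, 6, 1)

genuine = is_valid("https://www.bank.example/login", bank_list, check_time)
lookalike = is_valid("https://www.bamk.example/login", bank_list, check_time)
expired = is_valid("https://www.bank.example/login", bank_list, datetime(2025, 6, 1))
```

Because the check is a plain set membership test, a lookalike URL that differs by a single character fails, which is the point of verifying what is good instead of trying to recognize what is bad.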
HyperSwarming is a powerful set of technologies that makes vast amounts of information efficiently manageable. The technologies include:
Behavioral Envelope Annotation—HyperSwarm processes annotate Behavioral Envelopes. The annotations are based on:
- Transmission characteristics that record concrete details of a message or website. For messages, these include the sender and recipient names, domains, number of relays used, timestamps, and IP addresses.
- Construction characteristics that indicate whether a message or website uses obfuscating methods, such as invisible text, scripts, or HTML tags.
- Content-based characteristics that classify the content of a message or website. MetaSwarm uses a base set of categories for annotating the content, including religious, political, pharmaceutical, pornographic, and financial. Categories and subcategories are customizable: users may define any set of tokens of interest, including Bayesian fuzzy keyword or token phrases. Because the MetaSwarm technologies are language-independent, keywords and tokens may be in any language.
- Relational characteristics that indicate how an entity (message or website) relates to other entities with respect to any of the transmission, construction, or content-based characteristics.
Swarming and HyperSwarming—The swarming process builds relationship models for a specific type of Behavioral Envelope (bulk messages, websites, message emitters, and message receivers). The swarming is based on the Behavioral Envelope annotations. So, for example, email messages with a common annotation would be clustered together.
HyperSwarming takes swarming to a higher level, building relationship models across different types of Behavioral Envelopes. HyperSwarming relates information gleaned from email messages, for example, to information gleaned from websites. So, an email message with a particular annotation may be linked to one or multiple websites with the same annotation.
HyperSwarming can build relationships around a focal point, such as a single message or website, either within one space (e.g., email or websites) or across spaces (e.g., email and websites).
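The difference between swarming and HyperSwarming can be sketched in a few lines. The data layout (a tuple of envelope kind, identifier, and annotation set), the sample annotations, and the function names `swarm` and `hyperswarm` are assumptions chosen for illustration; they are not taken from the MetaSwarm implementation.

```python
from collections import defaultdict

# Each envelope is (kind, identifier, annotations). All values are made up.
envelopes = [
    ("email", "msg-1", {"pharmaceutical", "invisible-text"}),
    ("email", "msg-2", {"pharmaceutical"}),
    ("email", "msg-3", {"financial", "political"}),
    ("website", "site-a", {"pharmaceutical"}),
    ("website", "site-b", {"financial", "invisible-text"}),
]


def swarm(envelopes, kind):
    """Cluster envelopes of a single kind by shared annotation."""
    clusters = defaultdict(set)
    for env_kind, ident, annotations in envelopes:
        if env_kind == kind:
            for annotation in annotations:
                clusters[annotation].add(ident)
    return dict(clusters)


def hyperswarm(envelopes):
    """Link envelopes of different kinds that share an annotation."""
    by_annotation = defaultdict(lambda: defaultdict(set))
    for env_kind, ident, annotations in envelopes:
        for annotation in annotations:
            by_annotation[annotation][env_kind].add(ident)
    # Keep only annotations that span more than one kind of envelope.
    return {
        annotation: dict(kinds)
        for annotation, kinds in by_annotation.items()
        if len(kinds) > 1
    }


email_clusters = swarm(envelopes, "email")
cross_links = hyperswarm(envelopes)
```

Here `email_clusters` groups the two pharmaceutical emails together, while `cross_links` ties those emails to the website carrying the same annotation. The "political" annotation appears only on an email, so it produces no cross-space link.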
SwarmTracking—SwarmTracking is the data mining technology used with HyperSwarming, providing analysis tools for following the relationships made evident through swarming and HyperSwarming. SwarmTracking analysis tools generated the following diagrams that illustrate a cluster and a subcluster of domains with a relationship to a specific spam domain.
HyperSwarming makes monitoring, filtering, and control mechanisms possible:
Monitoring—Data mining can be highly targeted and can be automated based on specific Behavioral Envelope properties or on any type of content tokens, including file types and keywords. Automated reporting and notification further reduce the time and attention required of human resources.
Filtering—Filtering can be content-based (to filter out spam, for example), or it can be based on any Behavioral Envelope property, including filtering by protocol or port usage.
Control—Web-based interfaces enable users to specify control levels, including Parental Control (to protect children against inappropriate messages and websites), Corporate Control (to protect against misuse or abuse of corporate intranets), and Access Control by User and/or Protocol (HTTP, SMTP, FTP, DNS, etc.).
Body Link Analysis
MetaSwarm was the first to realize the importance of focusing on the links in message bodies: the links that spam and phishing messages use to lure users to fraudulent sites.
Body links may route a user to any of numerous domains. However, MetaSwarm technologies reduce body link domains to base domains; that is, the domains that somebody spends money to acquire. The focus on base domains leads more directly to phishers and spammers.
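A minimal sketch of the reduction to a base domain follows. The `base_domain` function and the tiny `MULTI_LABEL_SUFFIXES` set are assumptions for illustration: a production system would need a complete suffix list (for example, the Public Suffix List) to decide where the registrable domain begins, and this sketch does not handle hosts given as IP addresses.

```python
from urllib.parse import urlsplit

# Tiny illustrative set of two-label public suffixes. This is an assumption
# for the example; it is far from complete.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def base_domain(url: str) -> str:
    """Reduce the host of a URL to its registrable (base) domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    if len(labels) <= 2:
        return host
    # Keep three labels when the last two form a known two-label suffix,
    # otherwise keep the last two labels.
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


# Made-up body links; several hosts collapse to the same base domain.
links = [
    "http://login.secure.bank.example/update",
    "http://promo.bank.example/offer?id=7",
    "https://a.b.shop.co.uk/x",
]
bases = sorted({base_domain(link) for link in links})
```

Reducing every body link this way means that many disposable subdomains collapse into the few domains that were actually registered, which is what lets the analysis point toward whoever acquired them.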
Anonymous Data Correlation
As a collection descriptor, a Behavioral Envelope represents the characteristics of a message or website while preserving the privacy of its contents. This feature makes possible anonymous data correlation, which enables the sharing of information without divulging private or confidential data.
With anonymous data correlation, one owner of a Behavioral Envelope database can search the Behavioral Envelope database of a second owner without either owner learning the contents or scope of the other’s database:
- If the search yields a match, both owners know that they share some data elements. That may prompt further investigation by either or both database owners, or the owners may choose to share additional data relating to the shared elements.
- If the search does not yield a match, then each owner remains unaware of the other’s database contents.
Anonymous data correlation provides a basis for collaboration by groups that typically do not share data.
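The match-or-no-match behavior can be illustrated with a simplified sketch in which each owner exposes only keyed digests of its envelope identifiers and the comparison reports a single yes/no answer. This is not the MetaSwarm method, which the overview does not describe. The `blind` and `has_match` functions, the shared key, and the identifiers are all hypothetical, and a shared-key digest is weaker than a true private set intersection protocol because anyone holding the key can test guessed values.

```python
import hashlib
import hmac


def blind(fingerprints, shared_key: bytes) -> set:
    """Replace each fingerprint with a keyed digest so raw values are not exchanged."""
    return {
        hmac.new(shared_key, fp.encode("utf-8"), hashlib.sha256).hexdigest()
        for fp in fingerprints
    }


def has_match(query_digests: set, other_digests: set) -> bool:
    """Report only whether any digest is shared, not which ones or how many."""
    return not query_digests.isdisjoint(other_digests)


# Made-up key and envelope identifiers for three database owners.
key = b"hypothetical-shared-key"
owner_a = blind({"envelope-17", "envelope-42"}, key)
owner_b = blind({"envelope-42", "envelope-99"}, key)
owner_c = blind({"envelope-03"}, key)

a_and_b_overlap = has_match(owner_a, owner_b)
a_and_c_overlap = has_match(owner_a, owner_c)
```

In the sketch, owners A and B learn that they have something in common and can decide whether to investigate or share more, while A and C learn nothing beyond the absence of a match.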