Enhancing PacketFlowAI: Upgrading Packet Classification Algorithm with Hyperdimensional Computing
Oct 20
10 min read
Re-posted with permission from Robert McMenemy at: Enhancing PacketFlowAI: Upgrading My Packet Classification Algorithm with Hyperdimensional Computing
Introduction
Network security is one of the most pressing issues in today’s digital landscape. The ability to classify and detect malicious network traffic in real-time is crucial to prevent cyber attacks. PacketFlowAI is an open-source project I designed for packet classification, offering an automated approach to monitoring and analysing network traffic to detect security threats such as DDoS, malware, port scanning and phishing attacks.
While the traditional machine learning models I used in PacketFlowAI have been effective, they have limitations in terms of scalability, robustness to noise and computational efficiency. To address these challenges, I’ve upgraded PacketFlowAI using Hyperdimensional Computing (HDC), a computing paradigm inspired by brain-like processing of information. This blog post provides a comprehensive overview of the mathematical foundations, architecture and practical implementation of HDC in PacketFlowAI.
What is PacketFlowAI?
PacketFlowAI is a real-time packet classification system I built to handle various types of network traffic. It categorises packets as benign or malicious based on features extracted from network data.
The project is designed to scale to large network environments, making it ideal for intrusion detection systems (IDS) and other security applications. However, as network traffic increases in complexity, traditional machine learning approaches begin to struggle with processing speed, noise tolerance and the flexibility to adapt to new attack patterns.
By transitioning to Hyperdimensional Computing (HDC), PacketFlowAI not only improves its classification accuracy but also introduces a scalable brain-inspired architecture capable of handling high-dimensional and noisy data.
The Challenges of Traditional Packet Classification Approaches
Traditional approaches to packet classification rely heavily on feature engineering followed by models like logistic regression, decision trees or even deep learning models.
However, these approaches have key challenges:
Sensitivity to Noise: Network traffic is often noisy or incomplete. Traditional models can overfit to noise or struggle when data is incomplete or corrupted.
Feature Scaling and Generalisation: As new attack patterns and features emerge traditional models often require retraining and feature redesign to incorporate these changes.
Limited Scalability: Handling large-scale, real-time network data in traditional machine learning pipelines can result in significant computational overhead.
Given these limitations, it was crucial to look for a paradigm that could tackle complex, noisy, high-dimensional data while remaining scalable and adaptable to evolving network threats. Hyperdimensional computing provides an elegant solution to these challenges.
Enter Hyperdimensional Computing (HDC)
What is Hyperdimensional Computing?
Hyperdimensional Computing (HDC) is a computing framework where data is represented by hypervectors — high-dimensional, random vectors (often binary or bipolar). This paradigm is inspired by the way the human brain processes information in distributed and high-dimensional patterns.
In HDC, hypervectors can represent anything from numbers to sequences of symbols to more complex structures such as images or network packets. These hypervectors are manipulated through simple, highly parallelizable mathematical operations, allowing for efficient and robust computing.
Why Use HDC for Packet Classification?
HDC provides several benefits that make it particularly well-suited for packet classification in network security:
Noise Robustness: Hypervectors inherently exhibit tolerance to small errors or noise in the data. Small changes in the input do not dramatically affect the high-dimensional representation.
Scalability: The operations on hypervectors — such as bundling, binding and permutation — are computationally efficient and can easily be scaled across distributed systems or hardware accelerators.
Brain-Inspired Efficiency: HDC models mimic the distributed representation of information in the brain, making them flexible for adapting to complex and variable data.
Minimal Feature Engineering: Unlike traditional approaches that require significant feature engineering, HDC automatically adapts to new features by encoding them as hypervectors.
Real-Time Capabilities: The highly parallelizable nature of HDC allows for fast processing, making it ideal for real-time network traffic analysis.
Data Representation with Hypervectors
One of the most significant shifts in this upgrade to PacketFlowAI is the move from traditional feature vectors to hypervectors. Hypervectors are high-dimensional vectors (e.g., 10,000 dimensions) used to represent the data. In the case of PacketFlowAI, both the packet features and textual data (such as packet descriptions) are converted into hypervectors.
Encoding Packet Features with Hypervectors
We take numerical and categorical features extracted from network packets and convert them into hypervectors.
Here’s how this process works:
Key Packet Features
IP Version
IP Length
TCP Source Port
TCP Destination Port
TCP Flags
For each feature, the value is mapped to a hypervector through quantisation (for numerical features) or random vector assignment (for categorical features). This allows us to represent these features in a unified, high-dimensional space.
Hypervector Encoding Process
import numpy as np

class HypervectorEncoder:
    def __init__(self, dimension, num_levels=100):
        self.dimension = dimension
        self.num_levels = num_levels
        self.level_hvs = self._generate_level_hvs()
        self.feature_hvs = {}  # reserved for per-feature ID hypervectors

    def _generate_random_hv(self):
        # Random bipolar hypervector: each element is -1 or 1.
        return np.random.choice([-1, 1], size=self.dimension)

    def _generate_level_hvs(self):
        # One random hypervector per quantisation level.
        return [self._generate_random_hv() for _ in range(self.num_levels)]

    def encode_numerical(self, feature_name, value, min_value, max_value):
        # Quantise the value into one of num_levels buckets, clamped to range.
        level = int((value - min_value) / (max_value - min_value) * (self.num_levels - 1))
        level = max(0, min(self.num_levels - 1, level))
        return self.level_hvs[level]

    def bundle(self, vectors):
        # Element-wise sum, then sign() to snap back to bipolar form.
        bundled = np.sum(vectors, axis=0)
        return np.sign(bundled)
In the above code, we:
Quantise numerical values into levels and assign each level a unique hypervector.
Map categorical values directly to random hypervectors.
Bundle all hypervectors representing different features into a single high-dimensional representation.
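A quick usage sketch of the encoder (the 10,000-dimension figure and the packet_hv_encoder name match their use later in this post; the port and flag values are purely illustrative):

packet_hv_encoder = HypervectorEncoder(dimension=10000)
sport_hv = packet_hv_encoder.encode_numerical('tcp_sport', 443, 0, 65535)
flags_hv = packet_hv_encoder.encode_numerical('tcp_flags', 0x12, 0, 255)  # illustrative SYN+ACK
packet_hv = packet_hv_encoder.bundle([sport_hv, flags_hv])
print(packet_hv.shape)  # (10000,)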
Encoding Text Data
Text data (such as packet explanations) can be represented as a sequence of characters, with each character being mapped to a hypervector. These hypervectors are combined to form a representation for the entire text.
import string

class TextHypervectorEncoder:
    def __init__(self, dimension):
        self.dimension = dimension
        self.char_hvs = self._generate_char_hvs()

    def _generate_char_hvs(self):
        # One random bipolar hypervector per character (printable ASCII here).
        return {c: np.random.choice([-1, 1], size=self.dimension) for c in string.printable}

    def encode_text(self, text):
        # Characters outside the alphabet fall back to the zero vector.
        hvs = [self.char_hvs.get(char, np.zeros(self.dimension)) for char in text]
        bundled = np.sum(hvs, axis=0)
        return np.sign(bundled)
This approach allows us to represent long strings of text, like packet explanations, in a concise and high-dimensional format that can be efficiently processed by the model.
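A minimal usage sketch (text_hv_encoder is the instance name used later in this post; the example string is invented for illustration):

text_hv_encoder = TextHypervectorEncoder(dimension=10000)
explanation_hv = text_hv_encoder.encode_text("Possible TCP SYN scan")  # hypothetical explanation text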
The Mathematics Behind Hyperdimensional Computing
Hypervectors: The Core Representation
In hyperdimensional computing, the data is encoded into hypervectors, which are vectors with thousands of dimensions (e.g., 10,000). These vectors are typically random and binary (e.g., elements are -1 or 1). Hypervectors have several key properties that make them robust and computationally efficient:
High Dimensionality: The high dimensionality (e.g., 10,000) means that random hypervectors are almost orthogonal to each other. This allows them to represent different items without interference.
Distributed Representation: Each bit in the hypervector carries a small amount of information, and the entire vector represents the whole item. This distributed nature makes the representation tolerant to errors and noise.
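This near-orthogonality is easy to check empirically. A minimal sketch, assuming the same 10,000-dimensional bipolar vectors used throughout this post:

import numpy as np

dim = 10000
a = np.random.choice([-1, 1], size=dim)
b = np.random.choice([-1, 1], size=dim)
# The cosine similarity of two random bipolar hypervectors concentrates
# around 0 with standard deviation ~1/sqrt(dim), i.e. roughly ±0.01 here.
print(np.dot(a, b) / dim)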
Bundling: Element-wise Summation
One of the core operations in hyperdimensional computing is bundling, which is the element-wise summation of hypervectors. Bundling allows us to represent multiple features as a single hypervector.
Given hypervectors H1, H2, ..., Hn, the bundled result is B = sign(H1 + H2 + ... + Hn). The sign function ensures that the resulting hypervector remains binary (-1 or 1), while the summation captures the combined information of the input hypervectors.
Binding: Element-wise Multiplication
Binding is another essential operation in hyperdimensional computing. It is used to combine two hypervectors in a way that the result is orthogonal to both inputs. This is useful for encoding associations between features.
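A minimal sketch of binding under the same bipolar conventions, showing both the near-orthogonality of the result and the fact that binding is its own inverse:

import numpy as np

a = np.random.choice([-1, 1], size=10000)
b = np.random.choice([-1, 1], size=10000)
bound = a * b  # element-wise multiplication

print(np.dot(bound, a) / 10000)      # ~0: the result is dissimilar to both inputs
print(np.array_equal(bound * b, a))  # True: multiplying by b again recovers a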
Permutation: Circular Shifting
The permutation operation introduces order to sequences of hypervectors by circularly shifting their elements. This is particularly useful for encoding sequences of features or text.
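One common way to implement permutation is a circular shift with np.roll. The sketch below (an illustration of the technique, not PacketFlowAI’s exact encoder) shifts the i-th hypervector by i positions before bundling, so the same items in a different order yield a different code:

import numpy as np

def encode_sequence(hvs):
    # Shift the hypervector at position i by i elements, then bundle.
    shifted = [np.roll(hv, i) for i, hv in enumerate(hvs)]
    return np.sign(np.sum(shifted, axis=0))

chars = [np.random.choice([-1, 1], size=10000) for _ in "SYN"]
seq_hv = encode_sequence(chars)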
Noise Tolerance and Robustness
One of the key mathematical properties of hypervectors is their tolerance to noise. Since each hypervector has thousands of dimensions, small changes (such as flipping a few bits) do not significantly alter the overall vector. This property makes HDC models inherently robust to noisy or incomplete data, which is a common challenge in network traffic analysis.
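A quick demonstration, flipping 5% of the elements of a hypervector (the 5% figure is an arbitrary choice for illustration):

import numpy as np

rng = np.random.default_rng()
hv = rng.choice([-1, 1], size=10000)
noisy = hv.copy()
flip = rng.choice(10000, size=500, replace=False)  # corrupt 5% of elements
noisy[flip] *= -1
# Similarity only drops from 1.0 to ~0.9, so the item remains recognisable.
print(np.dot(hv, noisy) / 10000)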
Model Architecture Redesign: HVModel
With the data now represented as hypervectors, the next step is to design a model that can operate directly on these high-dimensional inputs. The redesigned model, HVModel, is optimized for processing hypervectors and learning patterns in this unique data space.
Architecture Overview
The architecture of HVModel is simple yet powerful. The model consists of several fully connected layers with ReLU activations and dropout regularization to prevent overfitting.
Model Layers
Input Layer: The input to the model is a hypervector of high dimensionality (e.g., 10,000 dimensions).
Hidden Layers: Two fully connected layers with 512 and 256 neurons, respectively.
Dropout: Dropout layers are added between fully connected layers to improve generalization and prevent overfitting.
Output Layer: The output layer produces a probability distribution over the possible classes (e.g., benign vs. malicious traffic).
Model Definition
import torch
import torch.nn as nn
import torch.nn.functional as F

class HVModel(nn.Module):
    def __init__(self, input_dim, num_categories):
        super(HVModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, 512)
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 256)
        self.dropout2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(256, num_categories)

    def forward(self, x):
        x = x.float()
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies softmax
        return x
Why HVModel Works for Hypervectors
Dimensionality: The model is designed to handle high-dimensional input data, making it well-suited for hypervector-based inputs.
Non-Linear Transformations: The ReLU activation functions allow the model to learn non-linear relationships between features encoded in hypervectors.
Dropout Regularization: Dropout helps improve the model’s ability to generalize by preventing overfitting to the training data.
Updated Training and Evaluation Process
The training and evaluation processes are redesigned to accommodate hypervector-based inputs. This section walks through how we preprocess data, train the HVModel, and evaluate its performance.
Data Preprocessing
The data preprocessing step is responsible for converting raw packet data into hypervectors, which will then be used for model training and evaluation.
This involves two key steps:
Packet Feature Encoding: Convert numerical and categorical features from network packets into hypervectors.
Text Data Encoding: Convert packet explanations or metadata into hypervectors using a character-level encoding scheme.
Preprocessing Code Example
def preprocess_data(dataset):
    hv_data = []
    targets = []
    for item in dataset:
        features = extract_features(item['Packet/Tags'])
        packet_hv = encode_packet_features(features)
        hv_data.append(packet_hv)
        label = extract_label(item['Explanation'])
        targets.append(label)
    hv_data = np.stack(hv_data)
    hv_data = torch.tensor(hv_data, dtype=torch.float32)
    targets = torch.tensor(targets, dtype=torch.long)
    return hv_data, targets
In this function:
Packet features are extracted from the dataset, and each feature is converted into a hypervector.
The target labels (e.g., benign vs. malicious) are also extracted and stored for supervised training.
Model Training
Once the data is preprocessed, we train the HVModel using the train() function. This function implements the standard supervised learning loop: forward pass, loss calculation, backward pass (for gradient computation), and parameter update.
import logging

def train(model, device, train_loader, optimizer, epoch, log_interval=10):
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss = 0
    for batch_idx, (hv_features, targets) in enumerate(train_loader):
        hv_features, targets = hv_features.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(hv_features)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if batch_idx % log_interval == 0:
            logging.info(f'Train Epoch: {epoch} [{batch_idx * len(hv_features)}/{len(train_loader.dataset)} '
                         f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')
Optimizer: We use the Adam optimizer to adjust the model’s parameters based on the computed gradients.
Loss Function: Cross-entropy loss is used for classification tasks, comparing the predicted probabilities with the true labels.
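For completeness, here is a typical setup for the pieces the loop assumes; the learning rate, batch size and epoch count are illustrative defaults rather than the project’s tuned values:

from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HVModel(input_dim=10000, num_categories=2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

hv_data, targets = preprocess_data(dataset)  # dataset as loaded earlier
train_loader = DataLoader(TensorDataset(hv_data, targets), batch_size=64, shuffle=True)

for epoch in range(1, 11):
    train(model, device, train_loader, optimizer, epoch)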
Model Evaluation
After each epoch, the model is evaluated on a separate validation set to monitor its performance. The evaluation metrics include precision, recall, F1 score, and accuracy.
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(model, device, test_loader):
    model.eval()
    all_preds = []
    all_targets = []
    with torch.no_grad():
        for hv_features, targets in test_loader:
            hv_features, targets = hv_features.to(device), targets.to(device)
            outputs = model(hv_features)
            preds = outputs.argmax(dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_targets.extend(targets.cpu().numpy())
    precision = precision_score(all_targets, all_preds, average='weighted')
    recall = recall_score(all_targets, all_preds, average='weighted')
    f1 = f1_score(all_targets, all_preds, average='weighted')
    accuracy = sum(p == t for p, t in zip(all_preds, all_targets)) / len(all_preds)
    logging.info(f'Precision: {precision:.4f}, Recall: {recall:.4f}, F1 Score: {f1:.4f}, Accuracy: {accuracy:.4f}')
This function computes the performance metrics after each epoch to assess how well the model generalizes to unseen data.
Detailed Code Walkthrough
In this section, we’ll dive deeper into the key code snippets, explaining how each piece fits into the overall packet classification system.
Packet Feature Encoding
The encode_packet_features() function takes the extracted packet features and transforms them into hypervectors using the HypervectorEncoder.
def encode_packet_features(features):
    hv_list = []
    hv_list.append(packet_hv_encoder.encode_numerical('ip_version', features['ip_version'], 0, 6))
    hv_list.append(packet_hv_encoder.encode_numerical('ip_len', features['ip_len'], 0, 65535))
    hv_list.append(packet_hv_encoder.encode_numerical('tcp_sport', features['tcp_sport'], 0, 65535))
    hv_list.append(packet_hv_encoder.encode_numerical('tcp_dport', features['tcp_dport'], 0, 65535))
    hv_list.append(packet_hv_encoder.encode_numerical('tcp_flags', features['tcp_flags'], 0, 255))
    packet_hv = packet_hv_encoder.bundle(hv_list)
    return packet_hv
In this function, each feature is first encoded as a hypervector and then bundled together to create a composite hypervector representing the entire packet.
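A hypothetical call with hand-filled values, to show the expected shape of the features dict:

features = {
    'ip_version': 4,
    'ip_len': 1500,
    'tcp_sport': 443,
    'tcp_dport': 51234,
    'tcp_flags': 0x12,  # SYN+ACK
}
packet_hv = encode_packet_features(features)  # one bipolar hypervector for the packet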
Text Data Encoding
Similarly, the encode_text() function transforms packet explanations into hypervectors.
def encode_text(text):
    text_hv = text_hv_encoder.encode_text(text)
    return text_hv
This function is responsible for transforming raw textual data into hypervectors using the TextHypervectorEncoder.
Packet Processing
Once the data has been transformed into hypervectors, the packet classification model can be used to make predictions. The process_packet() function handles the classification of each packet.
def process_packet(packet, model, device):
    packet_hv = preprocess_packet(packet)
    if packet_hv is None:
        logging.info("Packet preprocessing returned None, skipping.")
        return
    packet_hv = torch.tensor(packet_hv, dtype=torch.float32).unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(packet_hv)
        prediction = torch.argmax(output, dim=1).item()
    if prediction != 0:  # class 0 is benign; any other class is treated as malicious
        redirect_packet(packet)
This function:
Preprocesses the packet into a hypervector.
Passes the hypervector through the trained model to make a prediction.
Redirects malicious packets if they are classified as such.
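In a live deployment this function would typically sit behind a capture loop. A minimal sketch using Scapy’s sniff() callback; the interface name is an assumption, and preprocess_packet/redirect_packet come from the surrounding project:

from scapy.all import sniff

model.eval()  # inference mode for the trained HVModel
sniff(iface='eth0', prn=lambda pkt: process_packet(pkt, model, device), store=False)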
Real-World Use Cases and Applications
Real-Time Intrusion Detection
PacketFlowAI can be deployed in real-time network environments where quick decision-making is critical. With its hyperdimensional computing-based design, the system is capable of processing large volumes of data efficiently, classifying traffic, and detecting malicious packets on the fly.
Handling High-Dimensional Data
In modern network security, the amount of data and the number of features extracted from packets can be overwhelming. By using hypervectors, we can represent this high-dimensional data in a condensed, yet highly informative format that retains essential properties of the data.
Scalability in Large Network Environments
Thanks to its highly parallelisable nature, hyperdimensional computing is well-suited for handling vast amounts of network traffic data. PacketFlowAI can scale across distributed systems or hardware accelerators without suffering from performance bottlenecks.
Benefits of Hyperdimensional Computing Over Traditional Methods
Noise Tolerance
Traditional models tend to overfit or produce incorrect results when exposed to noisy or incomplete data. Hyperdimensional computing, on the other hand, is inherently robust to noise, as the distributed nature of hypervectors allows the system to tolerate minor fluctuations in the data.
Scalability
Unlike traditional models that require major redesigns to incorporate new features, hyperdimensional models can easily scale. Each new feature can be encoded as a hypervector and bundled with the existing representation without requiring major changes to the model’s architecture.
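For example, supporting a hypothetical TTL feature would only require one more encode-and-bundle step; HVModel itself is untouched because the hypervector dimension stays the same:

# 'ip_ttl' is a hypothetical new feature, not one PacketFlowAI currently extracts.
ttl_hv = packet_hv_encoder.encode_numerical('ip_ttl', features['ip_ttl'], 0, 255)
packet_hv = packet_hv_encoder.bundle([packet_hv, ttl_hv])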
Efficiency and Speed
Operations on hypervectors — such as element-wise addition and multiplication — are computationally efficient and can be executed in parallel. This makes hyperdimensional computing particularly well-suited for real-time applications like packet classification in large-scale networks.
Practical Advantages in Network Security
Flexibility
HDC allows for the easy integration of new features as they become necessary, making it adaptable to emerging network threats and novel attack vectors.
Real-Time Capabilities
By using hyperdimensional computing, PacketFlowAI can process and classify network packets in real-time, a crucial requirement for modern intrusion detection systems.
Robustness to Adversarial Attacks
Adversarial attacks, which are designed to confuse machine learning models, are less effective against hyperdimensional models due to their distributed nature and robustness to small perturbations in the input data.
Future Directions for PacketFlowAI
While the integration of hyperdimensional computing into PacketFlowAI is a significant improvement, there are several avenues for future research and development:
Hybrid Models: Combining hyperdimensional computing with other AI techniques, such as deep learning or symbolic reasoning, could further enhance performance.
Hardware Acceleration: Implementing HDC models on hardware accelerators such as FPGAs or GPUs could dramatically improve the system’s processing speed.
Multimodal Data Integration: Incorporating additional data types, such as images or sensor readings, by encoding them as hypervectors would expand the applicability of PacketFlowAI beyond network security.
Conclusion
The integration of hyperdimensional computing into PacketFlowAI represents a major leap forward in network packet classification. By transitioning from traditional machine learning models to hypervector-based representations, we’ve improved the system’s robustness, scalability and efficiency.
As we look ahead, hyperdimensional computing offers exciting possibilities for network security and beyond. We encourage the community to explore the upgraded version of PacketFlowAI and contribute to its continued development.
References
Kanerva, P. (2009). Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation, 1(2), 139–159.
Rahimi, A., & Kanerva, P. (2016). A Robust and Efficient Method for Short Text Classification using Class Hypervectors. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16).