GPS Spoofing & Jamming Detection System

Comprehensive Documentation

System Overview

The GPS Spoofing & Jamming Detection System is a desktop-based forensic application designed to analyze, detect, and classify GPS spoofing and signal jamming attacks by evaluating GNSS-based telemetry logs extracted from UAV (Unmanned Aerial Vehicle) systems.

Purpose

This system analyzes telemetry-based GNSS signal logs to identify threats such as:

  • GPS Spoofing (fake coordinates injection)
  • RF Jamming (signal interference)
  • Signal Degradation
  • Satellite Anomalies

Key Capabilities

  • Forensic-Grade Analysis: Read-only processing with SHA-256 hashing and chain-of-custody tracking for courtroom admissibility.
  • Machine Learning Detection: Uses Isolation Forest, One-Class SVM, and LSTM Autoencoder to detect anomalies in GPS telemetry.
  • Comprehensive Reporting: Generates PDF/HTML reports with visualizations, timelines, and detailed analysis results.

System Architecture

The system follows a modular architecture with four main modules working sequentially:

  • GPS Signal Log Processing Module: Processes raw GPS logs
  • Feature Extraction & Statistical Analysis Module: Extracts mathematical features
  • ML-Based Detection Engine: Detects anomalies using machine learning
  • Forensic Visualization & Reporting Module: Generates visual reports

GPS Signal Log Processing Module

Processes raw GPS logs and prepares them for analysis

⬇️

Feature Extraction & Statistical Analysis Module

Extracts mathematical and statistical features from GPS data

⬇️

ML-Based Detection Engine

Uses machine learning to detect spoofing and jamming attacks

⬇️

Forensic Visualization & Reporting Module

Generates visual evidence and forensic reports

Data Flow

The system processes data through the following stages:

  1. Input: Raw GPS log files (TXT, CSV, DAT, LOG, JSON, ZIP)
  2. Processing: Cleaning, normalization, and validation
  3. Feature Engineering: Mathematical feature extraction
  4. ML Analysis: Anomaly detection and classification
  5. Output: Forensic reports with visualizations

Module Details

GPS Signal Log Processing Module

Purpose

Reads raw GPS signal logs and prepares them for ML analysis by cleaning, normalizing, and standardizing GPS-specific fields.

Key Features

  • Log Parsing & Cleaning: Extracts GPS fields (Latitude, Longitude, Altitude, Speed, Timestamp), removes empty/duplicate/corrupted entries, fixes inconsistent time formats
  • GPS Data Normalization: Converts coordinates to decimal degrees, standardizes frequency to 1Hz, applies moving average filters for noise smoothing
  • Integrity Checks: Identifies gaps in GPS timestamps, validates coordinate ranges, detects unrealistic altitude/speed values
  • Supported Formats: TXT, CSV, DAT, LOG, JSON, ZIP (GNSS data archives)

Output

Clean, ML-ready GPS dataset with metadata summary (duration, total points, missing values, file hash)
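The SHA-256 file hashing used for chain-of-custody can be sketched as follows. `sha256_of_file` is an illustrative helper name, not necessarily the system's actual function; the chunked read keeps memory flat for large logs:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a log file in fixed-size chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    # Opened read-only in binary mode: the original evidence file is never modified.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Recording this digest before and after processing lets a report prove the input was untouched.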

Feature Extraction & Statistical Analysis Module

Purpose

Extracts mathematical and statistical features from GPS logs that indicate spoofing or jamming behavior.

Feature Categories

Position-Based Features
  • Coordinate jump detection (sudden position changes >100m)
  • GPS drift per second (movement rate)
  • Distance between consecutive GPS points
  • Unrealistic speed or acceleration patterns
Signal-Condition Features
  • Satellite count spikes/drops
  • HDOP/PDOP/VDOP abnormalities (Dilution of Precision metrics)
  • SNR (Signal-to-Noise Ratio) fluctuations via C/N0
Temporal Features
  • Timestamp gaps (missing data periods)
  • Irregular log intervals
  • Out-of-order timestamps (indicating spoofing)
Statistical Feature Engineering
  • Rolling mean/variance calculations
  • Drift variance analysis
  • Speed/heading correlation changes

Output

Feature vector dataset for ML models with visualization graphs (coordinate jumps, GPS drift, speed spikes)
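The position-based features above rest on the Haversine distance between consecutive fixes. A minimal vectorized sketch, assuming `latitude`/`longitude` column names and a hypothetical `add_position_features` helper (not the system's actual API):

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in metres between two coordinate arrays."""
    r = 6371000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

def add_position_features(df: pd.DataFrame, jump_threshold_m: float = 100.0) -> pd.DataFrame:
    """Distance between consecutive GPS points plus a >100 m coordinate-jump flag."""
    out = df.copy()
    out["dist_m"] = haversine_m(df["latitude"].shift(), df["longitude"].shift(),
                                df["latitude"], df["longitude"])
    # First row is NaN (no predecessor); NaN > threshold evaluates to False.
    out["coordinate_jump"] = out["dist_m"] > jump_threshold_m
    return out
```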

ML-Based Detection Engine

Purpose

Uses machine learning to detect abnormal GPS behavior caused by spoofing (fake coordinates) or jamming (signal disturbance).

ML Approaches

  • Isolation Forest: Unsupervised anomaly detection using random partitioning
  • One-Class SVM: Identifies abnormal GPS paths by learning normal behavior boundaries
  • LSTM Autoencoder: Time-series anomaly detection using neural networks (optional)
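A minimal sketch of how the two core detectors might be fitted with scikit-learn. `train_detectors` and its default parameters are illustrative assumptions, not the system's actual code:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def train_detectors(X: np.ndarray, contamination: float = 0.05, nu: float = 0.05):
    """Fit both unsupervised detectors on (assumed-normal) real GPS feature vectors."""
    scaler = StandardScaler().fit(X)
    Xs = scaler.transform(X)
    # Isolation Forest isolates outliers via random partitioning.
    iso = IsolationForest(contamination=contamination, random_state=42).fit(Xs)
    # One-Class SVM learns a boundary around normal behavior.
    svm = OneClassSVM(nu=nu, kernel="rbf", gamma="scale").fit(Xs)
    return scaler, iso, svm
```

At inference time, `predict` returns -1 for points outside the learned normal region.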

Training Data

✓ All models are trained on REAL GPS DATA only

  • Training data source: User-uploaded GPS log files from data/processed/ directory
  • Data type: Real GPS telemetry logs (NO synthetic data generated)
  • Training files: All processed CSV files in the processed directory
  • Models learn normal GPS behavior patterns from actual flight data

Model Performance Metrics

The system provides comprehensive performance evaluation for each model:

  • Isolation Forest Metrics: Anomaly detection rate, mean/std scores, contamination parameter
  • One-Class SVM Metrics: Anomaly detection rate, mean/std scores, nu parameter
  • LSTM Autoencoder Metrics: Mean reconstruction error, error variance, median error
  • Metrics are displayed after training and included in forensic reports

Spoofing Detection

Detects patterns indicating GPS spoofing:

  • Sudden fake GPS jumps (unrealistic position changes)
  • Unrealistic direction changes (impossible turns)
  • Fabricated coordinates inconsistent with movement
  • Mismatching drift + timestamp patterns

Jamming Detection

Identifies patterns indicating signal jamming:

  • GPS instability (location fluctuations)
  • Timestamp delays caused by signal loss
  • Sudden stops in GPS updates
  • High drift variance over short intervals

Model Output

  • Anomaly Score (0-1): Probability of anomaly (higher = more suspicious)
  • Classification: Normal / Spoofed / Jammed
  • Severity Levels: None / Low / Medium / High / Critical
  • Exact Timestamps: When anomalies occurred
  • Feature Explanations: Which features triggered the classification
  • Model Performance Metrics: Evaluation scores and accuracy metrics for each model
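Raw detector scores are not naturally on a 0-1 scale. One plausible mapping to the anomaly score above (an assumption for illustration, not necessarily the system's exact method) is a sign flip followed by min-max scaling:

```python
import numpy as np

def to_anomaly_score(decision_scores: np.ndarray) -> np.ndarray:
    """Map raw detector scores to [0, 1], where higher means more suspicious.

    scikit-learn's decision_function is higher for *normal* points, so the
    sign is flipped before min-max scaling.
    """
    flipped = -np.asarray(decision_scores, dtype=float)
    lo, hi = flipped.min(), flipped.max()
    if hi == lo:  # constant scores -> everything equally normal
        return np.zeros_like(flipped)
    return (flipped - lo) / (hi - lo)
```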

Forensic Visualization & Reporting Module

Purpose

Generates visual forensic evidence and official reports summarizing detected GPS spoofing/jamming events.

Visualizations

  • Flight Path Analysis: Plot normal path vs spoofed path with GPS jumps highlighted in red
  • Interactive Map (HTML reports): Leaflet.js + OpenStreetMap view showing the full flight path (green) and highlighting spoofed points (red) and jammed points (yellow) with detailed popups
  • GPS Metrics Charts: Drift-per-second chart, speed vs time graph, anomaly score timeline
  • Feature Graphs: Coordinate jumps, GPS drift, speed spikes visualization

Forensic Report Generation

  • Automated Reports: PDF and HTML formats
  • Timeline of Anomalies: Chronological list of detected events
  • Severity Scoring: Risk assessment for each event
  • Chain-of-Custody Summary: File hashes and processing metadata
  • Suspicious GPS Segments Table: Detailed breakdown of anomalies

System Workflow

Step 1: Process GPS Logs

  1. Click "PROCESS LOGS" button in GPS Signal Log Processing tab
  2. Select your GPS log file (supports TXT, CSV, DAT, LOG, JSON, ZIP)
  3. System processes the file:
    • Computes SHA-256 hash for integrity
    • Parses and extracts GPS fields
    • Cleans and removes corrupted entries
    • Normalizes coordinates and timestamps
    • Applies frequency standardization (1Hz)
    • Validates data integrity
  4. Clean dataset is saved and passed to Feature Extraction module

Step 2: Extract Features

  1. Switch to Feature Extraction tab (data automatically received from GPS Signal Log Processing)
  2. Click "EXTRACT FEATURES" button
  3. System extracts:
    • Position-based features (jumps, drift, distances)
    • Signal-condition features (satellite count, HDOP, SNR)
    • Temporal features (gaps, intervals)
    • Statistical features (rolling means/variances)
  4. Feature dataset is saved and visualization graphs are generated
  5. Features are passed to ML Detection Engine

Step 3: Run ML Analysis

  1. Switch to ML Detection Engine tab (features automatically received from Feature Extraction)
  2. First Time: Click "TRAIN MODELS" to train ML models on your real GPS data
  3. Training information displayed:
    • Data type: REAL GPS DATA (no synthetic data)
    • Number of training files used
    • Total training samples
    • Model performance metrics (anomaly detection rates, scores, etc.)
  4. Select ML methods (Isolation Forest, One-Class SVM, LSTM Autoencoder)
  5. Click "RUN ANALYSIS" button
  6. System performs:
    • Anomaly detection using selected ML models
    • Classification (Normal/Spoofed/Jammed)
    • Severity assessment
    • Feature explanation generation
    • Model evaluation metrics calculation
  7. Results are passed to Forensic Reporting module

Step 4: Generate Reports

  1. Switch to Forensic Reporting tab (analysis results automatically received from ML Detection Engine)
  2. Click "GENERATE REPORT" or choose format (PDF/HTML)
  3. System generates:
    • Executive Summary: Total points, anomaly counts, anomaly rate
    • Interactive GPS Map (HTML only): Leaflet.js map showing the full flight path (green line) with spoofed (red) and jammed (yellow) markers and detailed popups
    • Model Performance Metrics: Evaluation metrics for each trained model (detection rates, scores, etc.)
    • Training Data Information: Data source, file count, sample count, confirmation of real GPS data
    • Visual Flight Path Analysis: Static charts and plots
    • GPS Metrics Charts: Drift, speed, anomaly score timelines
    • Anomaly Timeline: Chronological list of detected events
    • Forensic Report: Complete chain-of-custody documentation
  4. Report is saved in the reports/ directory

Usage Guide

Installation

# Install dependencies
pip install -r requirements.txt

# Run the application
python main.py

Quick Start

  1. Launch Application: Run python main.py
  2. GPS Signal Log Processing: Select and process your GPS log file
  3. Feature Extraction: Extract features from processed data
  4. ML Detection Engine: Train models (first time) and run analysis
  5. Forensic Reporting: Generate and view forensic reports

Supported File Formats

Format | Description | Use Case
TXT / LOG / DAT | Plain text GPS logs | Standard GPS log files
CSV | Comma-separated values | Spreadsheet-formatted GPS data (including decoded UAV flight logs such as DJI flight records)
JSON | JavaScript Object Notation | Structured GPS data (GNSS format)
ZIP | Compressed archive | GNSS dataset archives (NAV-PVT, NAV-DOP files)

Required GPS Data Fields

The system expects the following fields in GPS logs:

  • Latitude: GPS latitude coordinate
  • Longitude: GPS longitude coordinate
  • Timestamp: Time of GPS reading
  • Altitude: (Optional) Height above sea level
  • Speed: (Optional) Vehicle speed
  • Heading: (Optional) Direction of movement
  • Satellite Count: (Optional) Number of visible satellites
  • HDOP/PDOP/VDOP: (Optional) Dilution of Precision metrics

Note: At minimum, Latitude, Longitude, and Timestamp are required. Other fields enhance detection accuracy but are optional.

Performance & Large Datasets

Memory Optimization: The system includes advanced memory management for large GPS log files:

  • Chunked Processing: Large CSV files are read in chunks (5,000 rows at a time)
  • Row Limiting: Default maximum of 10,000 rows per file (configurable) for memory efficiency
  • Optimized Data Types: Uses float32 instead of float64 (50% memory savings)
  • Vectorized Operations: Fast distance calculations using the Haversine formula (100x+ faster than row-by-row loops)
  • Automatic Sampling: Maps show up to 2,000 normal GPS points (all anomalies always shown)
  • Encoding Detection: Automatic detection of file encoding (UTF-8, Latin-1, CP1252, etc.) for compatibility
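The chunked reading, row limiting, and dtype downcasting above can be sketched with pandas. `load_gps_csv` and its defaults mirror the stated limits (5,000-row chunks, 10,000-row cap) but are illustrative, not the system's actual implementation:

```python
import pandas as pd

def load_gps_csv(path: str, max_rows: int = 10000, chunk_size: int = 5000) -> pd.DataFrame:
    """Read a large GPS log CSV in chunks, stopping once max_rows are collected."""
    chunks, total = [], 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        # float32 halves memory versus the float64 pandas default.
        for col in chunk.select_dtypes("float64").columns:
            chunk[col] = chunk[col].astype("float32")
        chunks.append(chunk)
        total += len(chunk)
        if total >= max_rows:
            break
    return pd.concat(chunks, ignore_index=True).head(max_rows)
```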

Model Training & Evaluation

Training Process:

  • Models are trained on REAL GPS data from processed log files
  • Training data comes from all files in data/processed/processed_*.csv
  • No synthetic or dummy data is ever generated
  • Performance metrics are automatically calculated and displayed after training
  • Evaluation metrics include anomaly detection rates, score statistics, and model confidence


Technical Details

Forensic & Security Principles

  • Read-Only Processing: Original files are never modified
  • SHA-256 Hashing: All inputs and outputs are hashed for integrity verification
  • Chain-of-Custody: Complete audit trail of all processing actions
  • Forensic-Grade Reporting: Courtroom-admissible documentation

Machine Learning Models

Training Data: All models are trained on REAL GPS DATA only. Training data comes from processed GPS log files in the data/processed/ directory. No synthetic or dummy data is ever generated.

Model | Type | Purpose | Best For | Performance Metrics
Isolation Forest | Unsupervised | Anomaly Detection | General anomaly detection, fast processing | Anomaly detection rate, mean/std scores, contamination parameter
One-Class SVM | Unsupervised | Boundary Learning | Learning normal GPS behavior patterns | Anomaly detection rate, mean/std scores, nu parameter
LSTM Autoencoder | Unsupervised (Neural Network) | Time-Series Anomaly Detection | Complex temporal pattern detection (optional) | Mean reconstruction error, error variance, median error

Why All Models Are Unsupervised

All three ML models in this system use unsupervised learning approaches. This design choice is intentional and essential for GPS spoofing/jamming detection:

  • No Labeled Attack Data Required: Unsupervised models don't need pre-labeled "spoofed" or "jammed" examples. They learn normal GPS behavior patterns from clean, real GPS data.
  • Real-World Practicality: Collecting labeled attack examples in real-world scenarios is difficult, expensive, and may not cover all attack types. Unsupervised models adapt to detect any deviation from normal behavior.
  • Pattern Recognition: The models learn what "normal" GPS telemetry looks like (typical coordinate movements, speed patterns, signal quality). Anything that deviates significantly from these learned patterns is flagged as anomalous.
  • Zero-Day Attack Detection: Since the models detect anomalies rather than known attack signatures, they can potentially identify new or previously unseen attack methods.
  • Self-Learning: The system can continuously improve by training on new GPS data without requiring manual labeling of attack examples.

How It Works: During training, models are fed only normal GPS data (no labels). They build internal representations of normal patterns. During detection, any data point that significantly deviates from these learned patterns receives a high anomaly score and is classified as spoofed or jammed based on the specific features that triggered the anomaly.

Note: Model performance metrics are automatically calculated during training and included in forensic reports. These metrics help assess model accuracy and detection capabilities on your specific GPS data.

Classification Logic

The system classifies GPS data points as:

  • Normal: Anomaly score < 0.7, no suspicious features detected
  • Spoofed: High anomaly score with coordinate jumps, unrealistic speed/turns, or fabricated coordinates
  • Jammed: High anomaly score with signal issues (satellite drops, HDOP spikes, timestamp gaps, drift variance)

Severity Levels

Severity | Anomaly Score Range | Description
None | < 0.3 | Normal GPS behavior
Low | 0.3 - 0.5 | Minor anomalies, likely environmental
Medium | 0.5 - 0.75 | Moderate anomalies, requires investigation
High | 0.75 - 0.9 | Strong indicators of attack
Critical | ≥ 0.9 | Severe attack detected
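The severity bands map directly to a threshold function. This sketch follows the table's boundaries, assigning boundary values to the higher band (an assumption where the ranges are ambiguous):

```python
def severity_level(score: float) -> str:
    """Map an anomaly score in [0, 1] to the severity bands from the table above."""
    if score >= 0.9:
        return "Critical"
    if score >= 0.75:
        return "High"
    if score >= 0.5:
        return "Medium"
    if score >= 0.3:
        return "Low"
    return "None"
```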

Technology Stack

Python 3.8+, PySide6, scikit-learn, TensorFlow, pandas, numpy, matplotlib, reportlab, geopy

Data Processing Pipeline

  1. Input Validation: Check file format and required fields
  2. Hash Calculation: Compute SHA-256 hash for integrity
  3. Parsing: Extract GPS data based on file format
  4. Cleaning: Remove duplicates, corrupted entries, outliers
  5. Normalization: Convert coordinates, standardize timestamps, resample to 1Hz
  6. Smoothing: Apply moving average filters (5-point window)
  7. Validation: Check coordinate ranges, speed limits, altitude bounds
  8. Feature Extraction: Calculate mathematical and statistical features
  9. ML Analysis: Run anomaly detection models
  10. Classification: Assign labels and severity levels
  11. Reporting: Generate visualizations and forensic reports
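Steps 5-6 of the pipeline (1 Hz resampling and 5-point moving-average smoothing) can be sketched with pandas. `normalize_and_smooth` and the column names `timestamp`/`latitude`/`longitude` are assumptions for illustration:

```python
import pandas as pd

def normalize_and_smooth(df: pd.DataFrame) -> pd.DataFrame:
    """Resample GPS fixes to 1 Hz, then smooth coordinates with a 5-point moving average."""
    df = df.set_index(pd.to_datetime(df["timestamp"])).sort_index()
    num = df.select_dtypes("number")
    # Standardize frequency to 1 Hz; fill gaps introduced by resampling.
    out = num.resample("1s").mean().interpolate()
    for col in ("latitude", "longitude"):
        # Centered 5-point window, per the pipeline's smoothing step.
        out[col] = out[col].rolling(window=5, min_periods=1, center=True).mean()
    return out.reset_index()
```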

Output Files Structure

data/
├── processed/          # Cleaned GPS datasets (CSV)
├── features/           # Extracted features (CSV)
│   └── graphs/         # Feature visualization graphs
└── models/             # Trained ML models (.pkl, .h5)
reports/
├── visualizations/     # Generated charts and graphs
├── *.pdf               # PDF forensic reports
└── *.html              # HTML forensic reports

Code Walkthrough (What to Study)

Goal: This section tells you exactly which files to read, the key classes/functions, and the exact line ranges where they are implemented.

Tip: Study in the order below — it matches the runtime pipeline of the application.

1) App Entry + Module Wiring

main.py
  Line ranges: L11-L69 (MainWindow tabs + signal wiring), L70-L74 (app start)
  What it does: Creates the 4-tab GUI and wires the pipeline: GPS Log Processing → Feature Extraction → ML Detection → Forensic Reporting.

2) GPS Signal Log Processing (Input + Cleaning + Hashing)

components/gps_log_processing.py
  Line ranges: L11-L27 (ProcessingThread), L176-L193 (file picker + start thread), L194-L203 (emit processed data)
  What it does: GUI tab that lets you select a log file and processes it in a background thread. Emits features_ready with the processed dataframe + metadata.

utils/log_processor.py
  Line ranges: L21-L26 (SHA-256 hashing), L28-L93 (main file router + row limits), L155-L199 (ZIP processing without extraction), L201-L265 (CSV chunked loading + dtype optimization), L267-L288 (encoding detection), L326-L356 (normalization + resample/smoothing calls), L358-L383 (frequency standardization), L385-L396 (moving average smoothing), L402-L420 (metadata summary)
  What it does: Core ingestion pipeline: detects file type (CSV/JSON/TXT/ZIP), reads safely (encoding + chunking), enforces row limits (max_rows), cleans/normalizes/validates, saves to data/processed/processed_*.csv, and returns metadata (duration, missing values, hash).

3) Feature Extraction & Statistical Analysis

components/feature_extraction.py
  Line ranges: L10-L29 (FeatureExtractionThread), L176-L178 (receives processed dataframe), L180-L194 (start extraction), L196-L204 (emit extracted features)
  What it does: GUI tab that receives processed GPS data and launches feature extraction in a background thread. Emits features_ready containing feature_df.

utils/feature_extractor.py
  Line ranges: L30-L87 (extract_features: builds feature_df + saves CSV), L89-L163 (generate_feature_graphs), L165-L207 (position features: Haversine distance, jumps, drift), L209-L231 (signal features: satellite/HDOP/PDOP/VDOP flags), L233-L241 (temporal features: gaps/irregular intervals), L243-L253 (statistical features: rolling means/variances)
  What it does: Converts GPS telemetry into ML features. Also preserves lat/lon for mapping and saves data/features/extracted_features.csv and graph images in data/features/graphs/.

4) ML Training + ML Inference (Where the Model Trains)

Important about “accuracy”: These models are unsupervised. Without ground-truth labels (Normal/Spoofed/Jammed) you cannot compute supervised accuracy/precision/recall. The system instead reports performance metrics such as anomaly rate and score statistics.

If you want true accuracy: you need a labeled dataset and then compute confusion matrix / precision / recall on held-out labeled data.
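The labeled-data evaluation described above could look like this with scikit-learn. The `evaluate_with_labels` helper and its 0/1 label encoding are illustrative assumptions:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

def evaluate_with_labels(y_true, y_pred):
    """Supervised evaluation, only possible once ground-truth labels exist.

    y_true / y_pred: 1 = anomaly (spoofed/jammed), 0 = normal.
    """
    return {
        # Rows = true class, columns = predicted class, ordered [normal, anomaly].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }
```

In practice these metrics should be computed on held-out labeled data that the models never saw during training.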

components/ml_detection.py
  Line ranges: L10-L35 (MLDetectionThread), L203-L260 (TRAIN MODELS + display metrics + training info), L261-L280 (RUN ANALYSIS), L282-L299 (emit analysis results)
  What it does: GUI tab for training and inference. Training metrics and training-data info are displayed after training.

utils/ml_detector.py
  Line ranges: L36-L120 (train_models: trains IF + SVM + optional LSTM), L152-L204 (evaluate_models: metrics computation), L206-L236 (load_models: reads data/models/), L237-L253 (save_models: writes data/models/), L254-L481 (detect_anomalies: inference + classification + evaluation_metrics), L507-L539 (load_training_data: dataset loader from data/processed/processed_*.csv)
  Where training happens: train_models() fits the scaler + Isolation Forest + One-Class SVM (+ optional LSTM).
  Which dataset is used to train:
    • In the GUI: training uses the current run's feature_df passed from Feature Extraction (real GPS data).
    • Fallback loader: load_training_data() loads data/processed/processed_*.csv (all processed logs) if training is called without a dataframe.
  Key outputs: returns results_df with anomaly_score, classification, severity, and evaluation_metrics.

5) Visualization + Reporting (PDF/HTML + Leaflet Map)

components/forensic_reporting.py
  Line ranges: L10-L29 (ReportGenerationThread), L195-L209 (generate_report + threading)
  What it does: GUI tab that generates PDF/HTML reports from analysis_results.

utils/report_generator.py
  Line ranges: L34-L42 (generate_report router), L44-L166 (PDF report), L168-L478 (HTML report: includes Leaflet map + metrics table), L480-L482 (compute_report_hash)
  What it does: Builds the forensic report. The HTML report embeds:
    • Executive summary (counts + anomaly rate)
    • Model performance metrics table (from evaluation_metrics)
    • Leaflet.js map showing path + jam/spoof markers
    • Charts + anomaly timeline + chain-of-custody hash

utils/visualization.py
  Line ranges: L14-L57 (flight path plot), L59-L92 (drift chart), L94-L127 (speed chart), L129-L169 (anomaly score chart), L171-L186 (generate_all_visualizations)
  What it does: Generates the static PNG charts saved to reports/visualizations/ and referenced by reports.

6) Dataset ZIP Inspection (Without Extraction)

utils/visualize_dataset.py
  Line ranges: L16-L63 (ZIP stats analyzer), L135-L327 (HTML summary report generator), L328-L366 (visualize_zip_file entrypoint), L388-L406 (CLI usage)
  What it does: Lets you inspect what is inside the GNSS dataset ZIP files without extracting. Outputs HTML and charts to reports/dataset_visualizations/.
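Inspecting a ZIP without extraction relies on the standard library's `zipfile` module reading only the archive's central directory. A minimal sketch (`summarize_zip` is an illustrative name, not the module's actual function):

```python
import zipfile

def summarize_zip(path: str):
    """List a GNSS dataset ZIP's contents without extracting anything to disk."""
    with zipfile.ZipFile(path) as zf:
        return [
            {"name": info.filename,
             "size_bytes": info.file_size,           # uncompressed size
             "compressed_bytes": info.compress_size}  # size inside the archive
            for info in zf.infolist()
            if not info.is_dir()
        ]
```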

Datasets (Where the Data Lives)

GNSS Dataset (with Interference and Spoofing) Part I/Raw data/          # ZIP archives
GNSS Dataset (with Interference and Spoofing) Part III/Processed data/  # processed dataset ZIPs/JSONs
decoded_flightlog.csv   # example user-supplied decoded flight log CSV
data/processed/         # processed outputs saved by GPSLogProcessor (processed_*.csv)
data/features/          # extracted_features.csv + graphs/
data/models/            # trained ML models (isolation_forest.pkl, oneclass_svm.pkl, scaler.pkl, lstm_autoencoder.h5)
reports/                # generated PDF/HTML reports + visualization PNGs