System Overview
The GPS Spoofing & Jamming Detection System is a desktop-based forensic application designed to analyze, detect, and classify GPS spoofing and signal jamming attacks by evaluating GNSS-based telemetry logs extracted from UAV (Unmanned Aerial Vehicle) systems.
Purpose
This system analyzes telemetry-based GNSS signal logs to identify threats such as:
- GPS Spoofing (fake coordinates injection)
- RF Jamming (signal interference)
- Signal Degradation
- Satellite Anomalies
System Architecture
The system follows a modular architecture with four main modules working sequentially:
- GPS Signal Log Processing Module: Processes raw GPS logs
- Feature Extraction & Statistical Analysis Module: Extracts mathematical features
- ML-Based Detection Engine: Detects anomalies using machine learning
- Forensic Visualization & Reporting Module: Generates visual reports
Data Flow
The system processes data through the following stages:
- Input: Raw GPS log files (TXT, CSV, DAT, LOG, JSON, ZIP)
- Processing: Cleaning, normalization, and validation
- Feature Engineering: Mathematical feature extraction
- ML Analysis: Anomaly detection and classification
- Output: Forensic reports with visualizations
Module Details
GPS Signal Log Processing Module
Purpose
Reads raw GPS signal logs and prepares them for ML analysis by cleaning, normalizing, and standardizing GPS-specific fields.
Key Features
- Log Parsing & Cleaning: Extracts GPS fields (Latitude, Longitude, Altitude, Speed, Timestamp), removes empty/duplicate/corrupted entries, fixes inconsistent time formats
- GPS Data Normalization: Converts coordinates to decimal degrees, standardizes frequency to 1Hz, applies moving average filters for noise smoothing
- Integrity Checks: Identifies gaps in GPS timestamps, validates coordinate ranges, detects unrealistic altitude/speed values
- Supported Formats: TXT, CSV, DAT, LOG, JSON, ZIP (GNSS data archives)
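The normalization steps above (decimal-degree conversion, 1 Hz resampling, moving-average smoothing) can be sketched with pandas. This is a minimal illustration, not the actual code in utils/log_processor.py; the function names and the 5-point window default are assumptions based on the description:

```python
import pandas as pd

def nmea_to_decimal_degrees(value: float, hemisphere: str) -> float:
    """Convert an NMEA ddmm.mmmm coordinate to signed decimal degrees."""
    degrees = int(value // 100)
    minutes = value - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def normalize_to_1hz(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Resample a GPS log to 1 Hz and smooth noise with a moving average."""
    df = df.set_index("timestamp").sort_index()
    # Fill gaps created by resampling with linear interpolation
    resampled = df.resample("1s").mean().interpolate(method="linear")
    # Moving-average filter for noise smoothing
    smoothed = resampled.rolling(window=window, min_periods=1).mean()
    return smoothed.reset_index()
```

For example, `nmea_to_decimal_degrees(4807.038, "N")` yields roughly 48.1173 decimal degrees.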
Output
Clean, ML-ready GPS dataset with metadata summary (duration, total points, missing values, file hash)
Feature Extraction & Statistical Analysis Module
Purpose
Extracts mathematical and statistical features from GPS logs that indicate spoofing or jamming behavior.
Feature Categories
Position-Based Features
- Coordinate jump detection (sudden position changes >100m)
- GPS drift per second (movement rate)
- Distance between consecutive GPS points
- Unrealistic speed or acceleration patterns
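Coordinate-jump detection of this kind reduces to a Haversine distance check between consecutive fixes. A minimal sketch (the function names are illustrative, not the module's API; the 100 m threshold matches the figure above):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 points."""
    R = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def flag_coordinate_jumps(points, threshold_m=100.0):
    """Flag each fix whose distance to the previous fix exceeds threshold_m."""
    flags = [False]  # first fix has no predecessor
    for (la1, lo1), (la2, lo2) in zip(points, points[1:]):
        flags.append(haversine_m(la1, lo1, la2, lo2) > threshold_m)
    return flags
```

Dividing each consecutive distance by the timestamp delta gives the drift-per-second and unrealistic-speed features in the same list.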
Signal-Condition Features
- Satellite count spikes/drops
- HDOP/PDOP/VDOP abnormalities (Dilution of Precision metrics)
- SNR (Signal-to-Noise Ratio) fluctuations via C/N0
Temporal Features
- Timestamp gaps (missing data periods)
- Irregular log intervals
- Out-of-order timestamps (a possible spoofing indicator)
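The temporal checks above amount to scanning consecutive timestamp deltas. A minimal sketch, assuming a 5-second gap threshold (the threshold and function name are illustrative):

```python
from datetime import datetime, timedelta

def temporal_flags(timestamps, max_gap_s=5.0):
    """Return (gap_indices, out_of_order_indices) for a timestamp sequence."""
    gaps, out_of_order = [], []
    for i in range(1, len(timestamps)):
        delta = (timestamps[i] - timestamps[i - 1]).total_seconds()
        if delta < 0:
            out_of_order.append(i)  # time went backwards: possible spoofing
        elif delta > max_gap_s:
            gaps.append(i)          # missing data period: possible jamming
    return gaps, out_of_order
```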
Statistical Feature Engineering
- Rolling mean/variance calculations
- Drift variance analysis
- Speed/heading correlation changes
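Rolling mean/variance features like these are one-liners in pandas. A minimal sketch, with an assumed 10-sample window (not the extractor's actual configuration):

```python
import pandas as pd

def rolling_stats(series: pd.Series, window: int = 10) -> pd.DataFrame:
    """Rolling mean/variance features, e.g. for drift-variance analysis."""
    return pd.DataFrame({
        "rolling_mean": series.rolling(window, min_periods=1).mean(),
        # Sample variance; NaN for the first point (single sample) becomes 0
        "rolling_var": series.rolling(window, min_periods=1).var().fillna(0.0),
    })
```

Applied to the per-second drift series, a sudden rise in `rolling_var` is exactly the "high drift variance over short intervals" jamming indicator described later.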
Output
Feature vector dataset for ML models with visualization graphs (coordinate jumps, GPS drift, speed spikes)
ML-Based Detection Engine
Purpose
Uses machine learning to detect abnormal GPS behavior caused by spoofing (fake coordinates) or jamming (signal disturbance).
ML Approaches
- Isolation Forest: Unsupervised anomaly detection using random partitioning
- One-Class SVM: Identifies abnormal GPS paths by learning normal behavior boundaries
- LSTM Autoencoder: Time-series anomaly detection using neural networks (optional)
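The first two approaches can be sketched with scikit-learn. The toy feature data and parameter values below are illustrative, not the engine's actual configuration, and the LSTM autoencoder is omitted for brevity:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
normal = rng.normal(0.0, 1.0, size=(500, 3))   # stand-in for normal-flight feature vectors
anomalies = rng.normal(8.0, 1.0, size=(5, 3))  # stand-in for spoofed/jammed feature vectors

scaler = StandardScaler().fit(normal)
X_train = scaler.transform(normal)
X_test = scaler.transform(np.vstack([normal[:10], anomalies]))

# Isolation Forest: isolates outliers via random partitioning
iso = IsolationForest(contamination=0.01, random_state=0).fit(X_train)
# One-Class SVM: learns a boundary around normal behaviour
svm = OneClassSVM(nu=0.01, gamma="scale").fit(X_train)

iso_pred = iso.predict(X_test)  # +1 = normal, -1 = anomaly
svm_pred = svm.predict(X_test)
```

Both models are fitted on normal data only, which is why no labeled attack examples are needed; anything far from the learned distribution is predicted as -1.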
Training Data
✓ All models are trained on REAL GPS DATA only
- Training data source: User-uploaded GPS log files from the data/processed/ directory
- Data type: Real GPS telemetry logs (NO synthetic data generated)
- Training files: All processed CSV files in the processed directory
- Models learn normal GPS behavior patterns from actual flight data
Model Performance Metrics
The system provides comprehensive performance evaluation for each model:
- Isolation Forest Metrics: Anomaly detection rate, mean/std scores, contamination parameter
- One-Class SVM Metrics: Anomaly detection rate, mean/std scores, nu parameter
- LSTM Autoencoder Metrics: Mean reconstruction error, error variance, median error
- Metrics are displayed after training and included in forensic reports
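Score-statistic metrics of this kind are straightforward to compute. A minimal sketch (the helper name and the 0.7 threshold, taken from the classification rules later in this document, are illustrative):

```python
import numpy as np

def summarize_scores(scores: np.ndarray, threshold: float = 0.7) -> dict:
    """Anomaly-rate and score statistics of the kind shown after training."""
    return {
        "anomaly_rate": float((scores >= threshold).mean()),
        "mean_score": float(scores.mean()),
        "std_score": float(scores.std()),
        "median_score": float(np.median(scores)),
    }
```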
Spoofing Detection
Detects patterns indicating GPS spoofing:
- Sudden fake GPS jumps (unrealistic position changes)
- Unrealistic direction changes (impossible turns)
- Fabricated coordinates inconsistent with movement
- Mismatching drift + timestamp patterns
Jamming Detection
Identifies patterns indicating signal jamming:
- GPS instability (location fluctuations)
- Timestamp delays caused by signal loss
- Sudden stops in GPS updates
- High drift variance over short intervals
Model Output
- Anomaly Score (0-1): Probability of anomaly (higher = more suspicious)
- Classification: Normal / Spoofed / Jammed
- Severity Levels: None / Low / Medium / High / Critical
- Exact Timestamps: When anomalies occurred
- Feature Explanations: Which features triggered the classification
- Model Performance Metrics: Evaluation scores and accuracy metrics for each model
Forensic Visualization & Reporting Module
Purpose
Generates visual forensic evidence and official reports summarizing detected GPS spoofing/jamming events.
Visualizations
- Flight Path Analysis: Plot normal path vs spoofed path with GPS jumps highlighted in red
- Interactive Map (HTML reports): Leaflet.js + OpenStreetMap view showing the full flight path (green) and highlighting spoofed points (red) and jammed points (yellow) with detailed popups
- GPS Metrics Charts: Drift-per-second chart, speed vs time graph, anomaly score timeline
- Feature Graphs: Coordinate jumps, GPS drift, speed spikes visualization
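A static flight-path plot with anomalies highlighted in red can be sketched with matplotlib. This is an illustration of the idea, not the actual utils/visualization.py implementation:

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

def plot_flight_path(lats, lons, anomaly_mask, out_path="flight_path.png"):
    """Plot the flight path (green) with anomalous fixes highlighted in red."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(lons, lats, color="green", linewidth=1, label="Flight path")
    ax.scatter([lo for lo, m in zip(lons, anomaly_mask) if m],
               [la for la, m in zip(lats, anomaly_mask) if m],
               color="red", zorder=3, label="Anomalies")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend()
    fig.savefig(out_path, dpi=100)
    plt.close(fig)
    return out_path
```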
Forensic Report Generation
- Automated Reports: PDF and HTML formats
- Timeline of Anomalies: Chronological list of detected events
- Severity Scoring: Risk assessment for each event
- Chain-of-Custody Summary: File hashes and processing metadata
- Suspicious GPS Segments Table: Detailed breakdown of anomalies
System Workflow
Step 1: Process GPS Logs
- Click "PROCESS LOGS" button in GPS Signal Log Processing tab
- Select your GPS log file (supports TXT, CSV, DAT, LOG, JSON, ZIP)
- System processes the file:
  - Computes SHA-256 hash for integrity
  - Parses and extracts GPS fields
  - Cleans and removes corrupted entries
  - Normalizes coordinates and timestamps
  - Applies frequency standardization (1Hz)
  - Validates data integrity
- Clean dataset is saved and passed to Feature Extraction module
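The SHA-256 integrity hash in the step above can be computed with Python's standard hashlib; streaming the file in chunks keeps memory use flat for large logs (a minimal sketch, not the exact code in utils/log_processor.py):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally so large logs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex digest is what appears in the metadata summary and chain-of-custody records.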
Step 2: Extract Features
- Switch to Feature Extraction tab (data automatically received from GPS Signal Log Processing)
- Click "EXTRACT FEATURES" button
- System extracts:
  - Position-based features (jumps, drift, distances)
  - Signal-condition features (satellite count, HDOP, SNR)
  - Temporal features (gaps, intervals)
  - Statistical features (rolling means/variances)
- Feature dataset is saved and visualization graphs are generated
- Features are passed to ML Detection Engine
Step 3: Run ML Analysis
- Switch to ML Detection Engine tab (features automatically received from Feature Extraction)
- First Time: Click "TRAIN MODELS" to train ML models on your real GPS data
- Training information displayed:
  - Data type: REAL GPS DATA (no synthetic data)
  - Number of training files used
  - Total training samples
  - Model performance metrics (anomaly detection rates, scores, etc.)
- Select ML methods (Isolation Forest, One-Class SVM, LSTM Autoencoder)
- Click "RUN ANALYSIS" button
- System performs:
  - Anomaly detection using selected ML models
  - Classification (Normal/Spoofed/Jammed)
  - Severity assessment
  - Feature explanation generation
  - Model evaluation metrics calculation
- Results are passed to Forensic Reporting module
Step 4: Generate Reports
- Switch to Forensic Reporting tab (analysis results automatically received from ML Detection Engine)
- Click "GENERATE REPORT" or choose format (PDF/HTML)
- System generates:
  - Executive Summary: Total points, anomaly counts, anomaly rate
  - Interactive GPS Map (HTML only): Leaflet.js map showing the full flight path (green line) with spoofed (red) and jammed (yellow) markers and detailed popups
  - Model Performance Metrics: Evaluation metrics for each trained model (detection rates, scores, etc.)
  - Training Data Information: Data source, file count, sample count, confirmation of real GPS data
  - Visual Flight Path Analysis: Static charts and plots
  - GPS Metrics Charts: Drift, speed, anomaly score timelines
  - Anomaly Timeline: Chronological list of detected events
  - Forensic Report: Complete chain-of-custody documentation
- Report is saved in the reports/ directory
Usage Guide
Installation
Quick Start
- Launch Application: Run python main.py
- GPS Signal Log Processing: Select and process your GPS log file
- Feature Extraction: Extract features from processed data
- ML Detection Engine: Train models (first time) and run analysis
- Forensic Reporting: Generate and view forensic reports
Supported File Formats
| Format | Description | Use Case |
|---|---|---|
| TXT / LOG / DAT | Plain text GPS logs | Standard GPS log files |
| CSV | Comma-separated values | Spreadsheet-formatted GPS data (including decoded UAV flight logs such as DJI flight records) |
| JSON | JavaScript Object Notation | Structured GPS data (GNSS format) |
| ZIP | Compressed archive | GNSS dataset archives (NAV-PVT, NAV-DOP files) |
Required GPS Data Fields
The system expects the following fields in GPS logs:
- Latitude: GPS latitude coordinate
- Longitude: GPS longitude coordinate
- Timestamp: Time of GPS reading
- Altitude: (Optional) Height above sea level
- Speed: (Optional) Vehicle speed
- Heading: (Optional) Direction of movement
- Satellite Count: (Optional) Number of visible satellites
- HDOP/PDOP/VDOP: (Optional) Dilution of Precision metrics
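A required-field check like the one the system performs on parsed logs can be sketched in a few lines. The constant names and the lowercase/underscore normalization are illustrative assumptions, not the actual parser code:

```python
# Field sets mirror the list above; names are illustrative, not the parser's API
REQUIRED_FIELDS = {"latitude", "longitude", "timestamp"}
OPTIONAL_FIELDS = {"altitude", "speed", "heading", "satellite_count", "hdop", "pdop", "vdop"}

def validate_fields(columns):
    """Return (ok, missing_required) for a parsed log's column names."""
    present = {c.strip().lower().replace(" ", "_") for c in columns}
    missing = REQUIRED_FIELDS - present
    return len(missing) == 0, sorted(missing)
```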
Performance & Large Datasets
Memory Optimization: The system includes advanced memory management for large GPS log files:
- Chunked Processing: Large CSV files are read in chunks (5,000 rows at a time)
- Row Limiting: Default maximum of 10,000 rows per file (configurable) for memory efficiency
- Optimized Data Types: Uses float32 instead of float64 (50% memory savings)
- Vectorized Operations: Fast distance calculations using Haversine formula
- Automatic Sampling: Maps show up to 2,000 normal GPS points (all anomalies always shown)
- Encoding Detection: Automatic detection of file encoding (UTF-8, Latin-1, etc.) for compatibility
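The chunked-reading, row-limiting, and float32-downcasting steps above can be sketched with pandas. This is an illustration of the technique, not the actual utils/log_processor.py implementation:

```python
import pandas as pd

def load_csv_limited(path, chunk_size=5000, max_rows=10000):
    """Read a large CSV in chunks, downcasting floats, up to max_rows rows."""
    frames, total = [], 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        # float32 instead of float64: roughly half the memory per column
        for col in chunk.select_dtypes(include="float64").columns:
            chunk[col] = chunk[col].astype("float32")
        frames.append(chunk)
        total += len(chunk)
        if total >= max_rows:
            break  # stop reading once the row budget is reached
    return pd.concat(frames, ignore_index=True).head(max_rows)
```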
Model Training & Evaluation
Training Process:
- Models are trained on REAL GPS data from processed log files
- Training data comes from all files in data/processed/processed_*.csv
- No synthetic or dummy data is ever generated
- Performance metrics are automatically calculated and displayed after training
- Evaluation metrics include anomaly detection rates, score statistics, and model confidence
Technical Details
Forensic & Security Principles
- Read-Only Processing: Original files are never modified
- SHA-256 Hashing: All inputs and outputs are hashed for integrity verification
- Chain-of-Custody: Complete audit trail of all processing actions
- Forensic-Grade Reporting: Courtroom-admissible documentation
Machine Learning Models
Training Data: All models are trained on REAL GPS DATA only. Training data comes from processed GPS log files in the data/processed/ directory. No synthetic or dummy data is ever generated.
| Model | Type | Purpose | Best For | Performance Metrics |
|---|---|---|---|---|
| Isolation Forest | Unsupervised | Anomaly Detection | General anomaly detection, fast processing | Anomaly detection rate, mean/std scores, contamination parameter |
| One-Class SVM | Unsupervised | Boundary Learning | Learning normal GPS behavior patterns | Anomaly detection rate, mean/std scores, nu parameter |
| LSTM Autoencoder | Unsupervised (Neural Network) | Time-Series Anomaly | Complex temporal pattern detection (optional) | Mean reconstruction error, error variance, median error |
Why All Models Are Unsupervised
All three ML models in this system use unsupervised learning approaches. This design choice is intentional and essential for GPS spoofing/jamming detection:
- No Labeled Attack Data Required: Unsupervised models don't need pre-labeled "spoofed" or "jammed" examples. They learn normal GPS behavior patterns from clean, real GPS data.
- Real-World Practicality: Collecting labeled attack examples in real-world scenarios is difficult, expensive, and may not cover all attack types. Unsupervised models adapt to detect any deviation from normal behavior.
- Pattern Recognition: The models learn what "normal" GPS telemetry looks like (typical coordinate movements, speed patterns, signal quality). Anything that deviates significantly from these learned patterns is flagged as anomalous.
- Zero-Day Attack Detection: Since the models detect anomalies rather than known attack signatures, they can potentially identify new or previously unseen attack methods.
- Self-Learning: The system can continuously improve by training on new GPS data without requiring manual labeling of attack examples.
How It Works: During training, models are fed only normal GPS data (no labels). They build internal representations of normal patterns. During detection, any data point that significantly deviates from these learned patterns receives a high anomaly score and is classified as spoofed or jammed based on the specific features that triggered the anomaly.
Note: Model performance metrics are automatically calculated during training and included in forensic reports. These metrics help assess model accuracy and detection capabilities on your specific GPS data.
Classification Logic
The system classifies GPS data points as:
- Normal: Anomaly score < 0.7, no suspicious features detected
- Spoofed: High anomaly score with coordinate jumps, unrealistic speed/turns, or fabricated coordinates
- Jammed: High anomaly score with signal issues (satellite drops, HDOP spikes, timestamp gaps, drift variance)
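The classification rules above can be expressed as a small decision function. The 0.7 threshold comes from the rules themselves; the feature names and the default branch are illustrative assumptions, not the actual detector code:

```python
def classify_point(score, triggered_features):
    """Classify a GPS fix from its anomaly score and the features that fired."""
    if score < 0.7:
        return "Normal"
    # Illustrative feature names mirroring the spoofing/jamming indicators
    spoof_signs = {"coordinate_jump", "unrealistic_speed", "impossible_turn"}
    jam_signs = {"satellite_drop", "hdop_spike", "timestamp_gap", "high_drift_variance"}
    fired = set(triggered_features)
    if fired & spoof_signs:
        return "Spoofed"
    if fired & jam_signs:
        return "Jammed"
    return "Spoofed"  # illustrative default: high score with no clear signal signature
```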
Severity Levels
| Severity | Anomaly Score Range | Description |
|---|---|---|
| None | < 0.3 | Normal GPS behavior |
| Low | 0.3 - 0.5 | Minor anomalies, likely environmental |
| Medium | 0.5 - 0.75 | Moderate anomalies, requires investigation |
| High | 0.75 - 0.9 | Strong indicators of attack |
| Critical | ≥ 0.9 | Severe attack detected |
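These severity bands map directly to a small lookup over the anomaly score; a minimal sketch (the function name is illustrative):

```python
def severity_level(score: float) -> str:
    """Map an anomaly score in [0, 1] to a severity band."""
    if score < 0.3:
        return "None"      # normal GPS behavior
    if score < 0.5:
        return "Low"       # minor anomalies, likely environmental
    if score < 0.75:
        return "Medium"    # moderate anomalies, requires investigation
    if score < 0.9:
        return "High"      # strong indicators of attack
    return "Critical"      # severe attack detected
```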
Technology Stack
Data Processing Pipeline
- Input Validation: Check file format and required fields
- Hash Calculation: Compute SHA-256 hash for integrity
- Parsing: Extract GPS data based on file format
- Cleaning: Remove duplicates, corrupted entries, outliers
- Normalization: Convert coordinates, standardize timestamps, resample to 1Hz
- Smoothing: Apply moving average filters (5-point window)
- Validation: Check coordinate ranges, speed limits, altitude bounds
- Feature Extraction: Calculate mathematical and statistical features
- ML Analysis: Run anomaly detection models
- Classification: Assign labels and severity levels
- Reporting: Generate visualizations and forensic reports
Output Files Structure
Code Walkthrough (What to Study)
Goal: This section tells you exactly which files to read, the key classes/functions, and the exact line ranges where they are implemented.
Tip: Study in the order below — it matches the runtime pipeline of the application.
1) App Entry + Module Wiring
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| main.py | L11-L69 (MainWindow tabs + signal wiring), L70-L74 (app start) | Creates the 4-tab GUI and wires the pipeline: GPS Log Processing → Feature Extraction → ML Detection → Forensic Reporting |
2) GPS Signal Log Processing (Input + Cleaning + Hashing)
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| components/gps_log_processing.py | L11-L27 (ProcessingThread), L176-L193 (file picker + start thread), L194-L203 (emit processed data) | GUI tab that lets you select a log file and processes it in a background thread. Emits features_ready with the processed dataframe + metadata. |
| utils/log_processor.py | L21-L26 (SHA-256 hashing), L28-L93 (main file router + row limits), L155-L199 (ZIP processing without extraction), L201-L265 (CSV chunked loading + dtype optimization), L267-L288 (encoding detection), L326-L356 (normalization + resample/smoothing calls), L358-L383 (frequency standardization), L385-L396 (moving average smoothing), L402-L420 (metadata summary) | Core ingestion pipeline: detects file type (CSV/JSON/TXT/ZIP), reads safely (encoding + chunking), enforces row limits (max_rows), cleans/normalizes/validates, saves to data/processed/processed_*.csv, and returns metadata (duration, missing values, hash). |
3) Feature Extraction & Statistical Analysis
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| components/feature_extraction.py | L10-L29 (FeatureExtractionThread), L176-L178 (receives processed dataframe), L180-L194 (start extraction), L196-L204 (emit extracted features) | GUI tab that receives processed GPS data and launches feature extraction in a background thread. Emits features_ready containing feature_df. |
| utils/feature_extractor.py | L30-L87 (extract_features: builds feature_df + saves CSV), L89-L163 (generate_feature_graphs), L165-L207 (position features: Haversine distance, jumps, drift), L209-L231 (signal features: satellite/HDOP/PDOP/VDOP flags), L233-L241 (temporal features: gaps/irregular intervals), L243-L253 (statistical features: rolling means/variances) | Converts GPS telemetry into ML features. Also preserves lat/lon for mapping and saves data/features/extracted_features.csv and graph images in data/features/graphs/. |
4) ML Training + ML Inference (Where the Model Trains)
Important about “accuracy”: These models are unsupervised. Without ground-truth labels (Normal/Spoofed/Jammed) you cannot compute supervised accuracy/precision/recall. The system instead reports performance metrics such as anomaly rate and score statistics.
If you want true accuracy: you need a labeled dataset and then compute confusion matrix / precision / recall on held-out labeled data.
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| components/ml_detection.py | L10-L35 (MLDetectionThread), L203-L260 (TRAIN MODELS + display metrics + training info), L261-L280 (RUN ANALYSIS), L282-L299 (emit analysis results) | GUI tab for training and inference. Training metrics + training-data info are displayed after training. |
| utils/ml_detector.py | L36-L120 (train_models: trains IF + SVM + optional LSTM), L152-L204 (evaluate_models: metrics computation), L206-L236 (load_models: reads data/models/), L237-L253 (save_models: writes data/models/), L254-L481 (detect_anomalies: inference + classification + evaluation_metrics), L507-L539 (load_training_data: dataset loader from data/processed/processed_*.csv) | Where training happens: train_models() fits scaler + Isolation Forest + One-Class SVM (+ optional LSTM). Which dataset is used to train: load_training_data() reads data/processed/processed_*.csv. Inference via detect_anomalies() returns results_df with anomaly_score, classification, severity, and evaluation_metrics. |
5) Visualization + Reporting (PDF/HTML + Leaflet Map)
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| components/forensic_reporting.py | L10-L29 (ReportGenerationThread), L195-L209 (generate_report + threading) | GUI tab that generates PDF/HTML reports from analysis_results. |
| utils/report_generator.py | L34-L42 (generate_report router), L44-L166 (PDF report), L168-L478 (HTML report: includes Leaflet map + metrics table), L480-L482 (compute_report_hash) | Builds the forensic report. The HTML report embeds the interactive Leaflet map and the model metrics table. |
| utils/visualization.py | L14-L57 (flight path plot), L59-L92 (drift chart), L94-L127 (speed chart), L129-L169 (anomaly score chart), L171-L186 (generate_all_visualizations) | Generates the static PNG charts saved to reports/visualizations/ and referenced by reports. |
6) Dataset ZIP Inspection (Without Extraction)
| File | Line Range(s) | What to Study / What It Does |
|---|---|---|
| utils/visualize_dataset.py | L16-L63 (ZIP stats analyzer), L135-L327 (HTML summary report generator), L328-L366 (visualize_zip_file entrypoint), L388-L406 (CLI usage) | Lets you inspect what is inside the GNSS dataset ZIP files without extracting. Outputs HTML and charts to reports/dataset_visualizations/. |