Project Overview / Specs
This project uses machine learning to predict whether a near-Earth object (NEO) is hazardous, which helps prioritize the asteroids that may pose a risk to Earth. Early detection and classification are crucial for planetary defense and scientific research. The goal was to maximize accuracy and to use data visualization to show which features matter most for classification.
Data Description
- ID and Name
- Estimated Diameter (min/max)
- Relative Velocity
- Miss Distance (from Earth)
- Absolute Magnitude
- Orbiting Body (Earth)
- Sentry Object (Boolean)
- Target Variable: hazardous (Boolean)
Tech Stack & Libraries
- Python, pandas, numpy
- matplotlib, seaborn
- scikit-learn (StandardScaler, LogisticRegression)
Why These Features?
- Relative velocity: Faster objects are harder to deflect and may cause more damage.
- Miss distance: Objects passing closer to Earth are more likely to be hazardous.
- Diameter & magnitude: Larger objects may pose greater risk. Brighter objects (lower absolute magnitude) tend to be larger and are easier to detect.
Exploratory Data Analysis (EDA)
- Histograms for diameter, velocity, miss distance, magnitude
- Split data by hazardous/non-hazardous for comparison
- Equalized the hazardous and non-hazardous sample sizes so that class imbalance does not bias the comparison
| Feature | Median | Mean | Std |
|---|---|---|---|
| Estimated Diameter (min) | 0.0484 | — | — |
| Relative Velocity | 44190 | 48066 | 25293 |
| Miss Distance | 37846679 | 37066546 | 22352040 |
| Absolute Magnitude | 23.7 | 23.5 | 2.89 |
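The summary statistics and the per-class comparison above can be sketched with pandas. This is a minimal illustration, not the project's actual notebook: the data is synthetic, and the column names (`relative_velocity`, `miss_distance`, `absolute_magnitude`, `hazardous`) are assumptions about how the dataset is labeled.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the NEO table; values and column names are assumptions.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "relative_velocity": np.clip(rng.normal(48000, 25000, n), 200, None),
    "miss_distance": rng.uniform(7e3, 7.5e7, n),
    "absolute_magnitude": rng.normal(23.5, 2.9, n),
    "hazardous": rng.random(n) < 0.1,
})

features = ["relative_velocity", "miss_distance", "absolute_magnitude"]

# Median / mean / std with one row per feature, the same layout as the table.
summary = df[features].agg(["median", "mean", "std"]).T

# The same statistics split by class, for the hazardous vs non-hazardous comparison.
by_class = df.groupby("hazardous")[features].agg(["median", "mean", "std"])

# Downsample the larger class so both groups end up with the same number of rows.
n_min = df["hazardous"].value_counts().min()
balanced = df.groupby("hazardous").sample(n=n_min, random_state=0)

print(summary.round(2))
print(balanced["hazardous"].value_counts())
```

Downsampling the majority class is only one way to equalize sample sizes; it is used here because it keeps the sketch short.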
Preview

Scatterplot showing hazardous (blue) and non-hazardous (red) near-Earth objects by relative velocity and miss distance.
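A plot of this kind could be produced roughly as follows. The sketch uses synthetic points, and the axis assignment (velocity on x, miss distance on y) is an assumption; only the color scheme comes from the caption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic points standing in for the real NEO data.
rng = np.random.default_rng(1)
n = 300
velocity = rng.normal(48000, 25000, n)
miss_distance = rng.uniform(0, 7.5e7, n)
hazardous = rng.random(n) < 0.5

fig, ax = plt.subplots(figsize=(7, 5))
# Color scheme from the caption: hazardous in blue, non-hazardous in red.
ax.scatter(velocity[hazardous], miss_distance[hazardous], s=8, c="blue", label="hazardous")
ax.scatter(velocity[~hazardous], miss_distance[~hazardous], s=8, c="red", label="non-hazardous")
ax.set_xlabel("Relative velocity")
ax.set_ylabel("Miss distance")
ax.legend()
```

Calling `fig.savefig("neo_scatter.png")` afterwards would write the figure to disk; the file name is hypothetical.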
Model & Results
Data Preparation & Modeling
- Split data by hazardous/non-hazardous for comparative analysis
- Equalized the hazardous and non-hazardous sample sizes so that class imbalance does not bias the model
- Standardized velocity and miss distance using StandardScaler
- Train/test split: 80% train, 20% test
- Logistic Regression model trained on standardized data
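The preparation and training steps above can be sketched with scikit-learn. The data below is synthetic and already balanced, so the fitted coefficients will not match the reported ones. Fitting the scaler on the training split only and stratifying the split are choices made for this sketch, not something the steps above specify.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic, balanced stand-in data: 500 rows per class (0 = non-hazardous, 1 = hazardous).
n = 500
y = np.repeat([0, 1], n)
velocity = rng.normal(45000 + 15000 * y, 20000)  # hazardous rows are faster on average
miss_distance = rng.uniform(1e5, 7.5e7, 2 * n)
X = np.column_stack([velocity, miss_distance])

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize both features, learning the mean and std from the training split.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

accuracy = model.score(scaler.transform(X_test), y_test)
print("coefficients:", model.coef_[0], "intercept:", model.intercept_[0])
print("test accuracy:", round(accuracy, 2))
```

Learning the scaling parameters from the training split alone keeps information about the test rows out of the model.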
Model Parameters
- Coefficients (velocity, miss distance): [0.69, -0.04]
- Intercept: 0.12
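To read these parameters, it helps to plug them into the logistic function. The sketch below assumes the inputs are the standardized (z-scored) velocity and miss distance and that the positive class is hazardous = True; the function name `hazard_probability` is made up for illustration.

```python
import math

# Reported parameters, in the order (velocity, miss distance).
COEF_VELOCITY = 0.69
COEF_MISS_DISTANCE = -0.04
INTERCEPT = 0.12

def hazard_probability(z_velocity, z_miss_distance):
    """Predicted probability of 'hazardous' for standardized inputs."""
    logit = (INTERCEPT
             + COEF_VELOCITY * z_velocity
             + COEF_MISS_DISTANCE * z_miss_distance)
    return 1.0 / (1.0 + math.exp(-logit))

p_avg = hazard_probability(0.0, 0.0)   # object with average velocity and miss distance
p_fast = hazard_probability(1.0, 0.0)  # one standard deviation faster than average
odds_ratio_velocity = math.exp(COEF_VELOCITY)

print(round(p_avg, 2), round(p_fast, 2), round(odds_ratio_velocity, 2))
# → 0.53 0.69 1.99
```

Under these assumptions, one standard deviation of extra velocity roughly doubles the odds of a hazardous label, while the miss-distance coefficient is close to zero and barely moves the prediction.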
Results
- Overall accuracy: 0.70 (70%)
- Classification report:

| Class (hazardous) | Precision | Recall | F1-Score |
|---|---|---|---|
| False | 0.68 | 0.68 | 0.68 |
| True | 0.72 | 0.72 | 0.72 |
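Metrics of this kind are typically produced with `sklearn.metrics`. The toy labels below are invented purely to show the calls and do not reproduce the project's predictions.

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy labels for illustration only (0 = non-hazardous, 1 = hazardous).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)

# output_dict=True returns nested dicts keyed by the names in target_names.
report = classification_report(
    y_true, y_pred, target_names=["False", "True"], output_dict=True
)

print("accuracy:", accuracy)
for label in ("False", "True"):
    row = report[label]
    print(label, round(row["precision"], 2), round(row["recall"], 2), round(row["f1-score"], 2))
```

Reporting precision and recall per class, rather than accuracy alone, shows whether the model's errors fall more on hazardous or on non-hazardous objects.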