Project Overview / Specs

This project uses machine learning to predict whether a near-Earth object (NEO) is hazardous, which helps prioritize the asteroids that may pose a risk to Earth. Early detection and classification are crucial for planetary defense and scientific research. The goals were to maximize classification accuracy and to use data visualization to show which features matter most for classification.

Data Description

  • ID and Name
  • Estimated Diameter (min/max)
  • Relative Velocity
  • Miss Distance (from Earth)
  • Absolute Magnitude
  • Orbiting Body (Earth)
  • Sentry Object (Boolean)
  • Target Variable: hazardous (Boolean)

Tech Stack & Libraries

  • Python, pandas, numpy
  • matplotlib, seaborn
  • scikit-learn (StandardScaler, LogisticRegression)

Why These Features?

  • Relative velocity: Faster objects are harder to deflect and may cause more damage.
  • Miss distance: Objects passing closer to Earth are more likely to be hazardous.
  • Diameter & magnitude: Larger and brighter objects (a lower absolute magnitude means a brighter object) are easier to detect and may pose greater risk.

Exploratory Data Analysis (EDA)

  • Histograms for diameter, velocity, miss distance, magnitude
  • Split data by hazardous/non-hazardous for comparison
  • Equalized the hazardous and non-hazardous sample sizes to avoid class-imbalance bias

  Feature                     Median      Mean        Std
  Estimated Diameter (min)    0.0484
  Relative Velocity           44190       48066       25293
  Miss Distance               37846679    37066546    22352040
  Absolute Magnitude          23.7        23.5        2.89
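
The summary statistics above can be reproduced with a few lines of pandas. The following is a minimal sketch, not the project's actual notebook code: the column names (relative_velocity, miss_distance, absolute_magnitude, hazardous) and the four-row sample are assumptions made up for illustration.

```python
import pandas as pd

# Tiny made-up sample; the values are illustrative, NOT the project's dataset.
sample = pd.DataFrame({
    "relative_velocity": [30000.0, 45000.0, 60000.0, 90000.0],
    "miss_distance": [1.0e7, 3.0e7, 5.0e7, 7.0e7],
    "absolute_magnitude": [20.0, 22.0, 24.0, 26.0],
    "hazardous": [True, False, False, True],
})

def summarize(df, features):
    """Return one row per feature with its median, mean, and sample std."""
    return df[list(features)].agg(["median", "mean", "std"]).T

features = ["relative_velocity", "miss_distance", "absolute_magnitude"]

# Statistics over all rows, then separately for each class.
overall = summarize(sample, features)
by_class = {label: summarize(group, features)
            for label, group in sample.groupby("hazardous")}
```

Printing overall gives a table with the same Median / Mean / Std layout as above, and by_class[True] versus by_class[False] gives the hazardous / non-hazardous comparison.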

Preview

Scatterplot showing hazardous (blue) and non-hazardous (red) near-Earth objects by relative velocity and miss distance.
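
A plot like this could be drawn with matplotlib roughly as follows. This is a sketch under assumptions: the function name scatter_by_class is hypothetical, the demo points are randomly generated rather than real NEO data, and putting velocity on the x-axis is a guess, since the caption does not say which axis is which.

```python
import matplotlib.pyplot as plt
import numpy as np

def scatter_by_class(velocity, miss_distance, hazardous, ax=None):
    """Scatter both classes on one axes: hazardous in blue, non-hazardous in red."""
    velocity = np.asarray(velocity, dtype=float)
    miss_distance = np.asarray(miss_distance, dtype=float)
    hazardous = np.asarray(hazardous, dtype=bool)
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(velocity[hazardous], miss_distance[hazardous],
               s=8, alpha=0.5, color="blue", label="hazardous")
    ax.scatter(velocity[~hazardous], miss_distance[~hazardous],
               s=8, alpha=0.5, color="red", label="non-hazardous")
    ax.set_xlabel("relative velocity")
    ax.set_ylabel("miss distance")
    ax.legend()
    return ax

# Randomly generated demo points (NOT real NEO measurements).
rng = np.random.default_rng(0)
n = 200
demo_hazardous = rng.random(n) < 0.5
demo_velocity = rng.normal(48000, 25000, n)
demo_miss = rng.uniform(0, 7.5e7, n)
ax = scatter_by_class(demo_velocity, demo_miss, demo_hazardous)
```

Call plt.show() or ax.figure.savefig(...) afterwards to display or save the figure.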

Model & Results

Data Preparation & Modeling

  • Split data by hazardous/non-hazardous for comparative analysis
  • Equalized the hazardous and non-hazardous sample sizes to avoid class-imbalance bias
  • Standardized velocity and miss distance using StandardScaler
  • Train/test split: 80% train, 20% test
  • Logistic Regression model trained on standardized data
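
The steps above can be sketched end to end with scikit-learn. This is an illustrative outline rather than the project's code: the data is synthetic, the helper names (make_synthetic_neos, balance_classes, fit_model) and column names are hypothetical, and balancing by downsampling the larger class is an assumption about how the sample sizes were equalized. The scaler is fitted on the training split only, which is a design choice made here and not something the write-up specifies.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def make_synthetic_neos(n=2000, seed=0):
    """Synthetic stand-in for the real table (NOT the project's data)."""
    rng = np.random.default_rng(seed)
    hazardous = rng.random(n) < 0.1  # deliberately imbalanced
    velocity = np.abs(rng.normal(48000, 25000, n) + 20000 * hazardous)
    miss = rng.uniform(1e5, 7.5e7, n)
    return pd.DataFrame({"relative_velocity": velocity,
                         "miss_distance": miss,
                         "hazardous": hazardous})

def balance_classes(df, target="hazardous", seed=0):
    """Downsample every class to the size of the smallest class."""
    n_min = df[target].value_counts().min()
    parts = [df[df[target] == value].sample(n=n_min, random_state=seed)
             for value in df[target].unique()]
    return pd.concat(parts)

def fit_model(df, features=("relative_velocity", "miss_distance"),
              target="hazardous", seed=0):
    """80/20 split, standardize using training statistics, fit logistic regression."""
    X = df[list(features)].to_numpy()
    y = df[target].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    scaler = StandardScaler().fit(X_train)
    model = LogisticRegression().fit(scaler.transform(X_train), y_train)
    accuracy = model.score(scaler.transform(X_test), y_test)
    return model, scaler, accuracy

balanced = balance_classes(make_synthetic_neos())
model, scaler, accuracy = fit_model(balanced)
```

After fitting, model.coef_ and model.intercept_ hold the learned parameters for the two standardized features; on this synthetic data the values will differ from the ones reported for the project.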

Model Parameters

  • Coefficients (velocity, miss distance): [0.69, -0.04]
  • Intercept: 0.12
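
To see what these parameters imply, the fitted model can be evaluated by hand with the logistic function. The sketch below assumes the coefficients apply to z-scored (standardized) velocity and miss distance in the order listed, and that the positive class is hazardous = True; the function name hazard_probability is hypothetical.

```python
import math

# Parameters as reported above.
COEF_VELOCITY = 0.69
COEF_MISS_DISTANCE = -0.04
INTERCEPT = 0.12

def hazard_probability(z_velocity, z_miss_distance):
    """Predicted probability of 'hazardous' for standardized inputs (z-scores)."""
    logit = (INTERCEPT
             + COEF_VELOCITY * z_velocity
             + COEF_MISS_DISTANCE * z_miss_distance)
    return 1.0 / (1.0 + math.exp(-logit))

p_average = hazard_probability(0.0, 0.0)  # object with average velocity and distance
p_fast = hazard_probability(1.0, 0.0)     # one standard deviation faster than average
p_far = hazard_probability(0.0, 1.0)      # one standard deviation farther than average
```

Under these assumptions, an average object sits near 0.53, one standard deviation of extra velocity raises the probability to about 0.69, and one standard deviation of extra miss distance lowers it only slightly, to about 0.52. In other words, velocity carries almost all of the model's signal.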

Results

  • Overall accuracy: 0.70 (70%)
  • Classification report:
    • Precision (False/True): 0.68 / 0.72
    • Recall (False/True): 0.68 / 0.72
    • F1-Score (False/True): 0.68 / 0.72
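
For reference, the per-class metrics in a classification report follow directly from counts of correct and incorrect predictions. The sketch below computes them in plain Python on ten made-up labels; the labels are illustrative only and are not the project's test set, and per_class_metrics is a hypothetical helper, not a scikit-learn function.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration (NOT the project's predictions).
y_true = [True] * 5 + [False] * 5
y_pred = [True, True, True, True, False, False, False, False, True, True]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics_true = per_class_metrics(y_true, y_pred, positive=True)
metrics_false = per_class_metrics(y_true, y_pred, positive=False)
```

Each class gets its own precision, recall, and F1 because the "positive" label is swapped when computing the False row of the report.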