Jackson Johannessen | CS Portfolio

Project Overview / Specs

This project uses machine learning to predict whether a near-Earth object (NEO) is hazardous, which helps prioritize the asteroids that may pose a risk to Earth. Early detection and classification are crucial for planetary defense and scientific research. The goal was to maximize classification accuracy and to use data visualization to show which features matter most for classification.

Data Description

ID and Name
Estimated Diameter (min/max)
Relative Velocity
Miss Distance (from Earth)
Absolute Magnitude
Orbiting Body (Earth)
Sentry Object (Boolean)
Target Variable: hazardous (Boolean)

Tech Stack & Libraries

Python, pandas, numpy
matplotlib, seaborn
scikit-learn (StandardScaler, LogisticRegression)

Why These Features?

Relative velocity: Faster objects are harder to deflect and may cause more damage.
Miss distance: Objects passing closer to Earth are more likely to be hazardous.
Diameter & magnitude: Larger and brighter objects are easier to detect and may pose greater risk.

Exploratory Data Analysis (EDA)

Histograms for diameter, velocity, miss distance, magnitude
Split data by hazardous/non-hazardous for comparison
Equalized the hazardous and non-hazardous sample sizes to avoid class-imbalance bias

Feature	Median	Mean	Std
Estimated Diameter (min)	0.0484	—	—
Relative Velocity	44190	48066	25293
Miss Distance	37846679	37066546	22352040
Absolute Magnitude	23.7	23.5	2.89
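
Summary statistics like those in the table can be produced with pandas. The snippet below is a minimal sketch on toy values, not the NEO dataset; the column names `relative_velocity` and `hazardous` are assumptions made for illustration.

```python
import pandas as pd

# Toy values and hypothetical column names; the real dataset is not shown here.
df = pd.DataFrame({
    "relative_velocity": [1.0, 2.0, 3.0, 4.0, 10.0],
    "hazardous": [False, False, True, True, True],
})

stats = ["median", "mean", "std"]

# Median, mean, and sample standard deviation (pandas uses ddof=1 for std).
summary = df["relative_velocity"].agg(stats)

# The same statistics computed separately for each class.
by_class = df.groupby("hazardous")["relative_velocity"].agg(stats)

print(summary)
print(by_class)
```

In the table above, the relative velocity mean (48066) sits above its median (44190), which usually indicates a right-skewed distribution.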

Preview

Scatterplot of hazardous (blue) and non-hazardous (red) near-Earth objects by relative velocity and miss distance.

Model & Results

Data Preparation & Modeling

Split data by hazardous/non-hazardous for comparative analysis
Equalized the hazardous and non-hazardous sample sizes to avoid class-imbalance bias
Standardized velocity and miss distance using StandardScaler
Train/test split: 80% train, 20% test
Logistic Regression model trained on standardized data
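
The steps above can be sketched end to end with scikit-learn. This is an illustration on synthetic data, not the project's actual code: the feature distributions, class counts, downsampling approach, and random seeds are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the NEO dataset): columns are velocity and
# miss distance, with far more non-hazardous rows than hazardous ones.
n_neg, n_pos = 900, 100
X_neg = rng.normal(loc=[40000.0, 4.0e7], scale=[20000.0, 2.0e7], size=(n_neg, 2))
X_pos = rng.normal(loc=[60000.0, 3.5e7], scale=[20000.0, 2.0e7], size=(n_pos, 2))

# Balance the classes by downsampling the majority class (an assumption
# about how the sample sizes were evened out).
keep = rng.choice(n_neg, size=n_pos, replace=False)
X = np.vstack([X_neg[keep], X_pos])
y = np.concatenate([np.zeros(n_pos, dtype=int), np.ones(n_pos, dtype=int)])

# 80/20 split, stratified so both parts keep the 50/50 class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

accuracy = model.score(scaler.transform(X_test), y_test)
print(model.coef_, model.intercept_, accuracy)
```

The sketch fits the scaler after the split so that test-set statistics do not leak into training. The list above names standardization before the split, so the original ordering may differ.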

Model Parameters

Coefficients (velocity, miss distance): [0.69, -0.04]
Intercept: 0.12
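
To show what these parameters mean, the sketch below plugs them into the logistic function. It assumes the inputs are z-scores (the output of StandardScaler) and that the positive class is hazardous; the function and constant names are illustrative.

```python
import math

# Values reported above; they apply to standardized (z-score) features.
COEF_VELOCITY = 0.69
COEF_MISS_DISTANCE = -0.04
INTERCEPT = 0.12


def hazard_probability(z_velocity: float, z_miss_distance: float) -> float:
    """Predicted probability of the positive class for standardized inputs."""
    logit = (
        INTERCEPT
        + COEF_VELOCITY * z_velocity
        + COEF_MISS_DISTANCE * z_miss_distance
    )
    return 1.0 / (1.0 + math.exp(-logit))


# An object at the mean of both features (z = 0) sits just above 0.5.
p_average = hazard_probability(0.0, 0.0)

# Odds multiplier for a one-standard-deviation increase in each feature.
odds_ratio_velocity = math.exp(COEF_VELOCITY)
odds_ratio_miss_distance = math.exp(COEF_MISS_DISTANCE)

print(round(p_average, 3), round(odds_ratio_velocity, 2), round(odds_ratio_miss_distance, 2))
```

Read this way, a one-standard-deviation increase in velocity roughly doubles the odds of a hazardous prediction, while miss distance barely moves them. Velocity therefore carries nearly all of the model's signal.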

Results

Overall accuracy: 0.70 (70%)
Classification report:

Precision (False/True): 0.68 / 0.72
Recall (False/True): 0.68 / 0.72
F1-Score (False/True): 0.68 / 0.72
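
For reference, the sketch below shows how these metrics follow from a confusion matrix. The source does not give the actual confusion matrix, so the counts are hypothetical, chosen only so that the arithmetic lands on the reported figures.

```python
def class_metrics(hits: int, false_alarms: int, misses: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, NOT taken from the project's test set.
tp, fn, fp, tn = 144, 56, 56, 119

# True (hazardous) class: false positives are its false alarms.
precision_true, recall_true, f1_true = class_metrics(tp, fp, fn)

# False class: the roles swap, so false negatives become its false alarms.
precision_false, recall_false, f1_false = class_metrics(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_false, 2), round(precision_true, 2), round(accuracy, 2))
```

Precision equals recall within a class exactly when the false-positive and false-negative counts match, which is consistent with the identical precision, recall, and F1 values reported for each class.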