☄️ Hazardous Asteroid Classification - NASA JPL Asteroid by Alexander D. Rios

Abstract

The problem we address is identifying potentially hazardous asteroids for planet Earth. The scope of this project can be interesting for various entities, ranging from astronomical research centers to aerospace agencies.

The motivation behind this project is the difficulty of the topic, which lets me apply my mathematical, physical, and statistical knowledge to data analysis. It is one of the most complicated projects I have undertaken, given my limited knowledge of astronomy, but I believe I have the tools needed to deal with some of the challenges this type of project poses.

The first hypothesis I can formulate is that asteroids with orbits closer to Earth’s orbit have higher chances of representing a risk for the planet than those further away, and that this is independent of their size.

To test this hypothesis, I used both qualitative and quantitative methods, mainly statistical descriptions of the data and visualizations. Key visualizations included boxplots of the H and moid variables, plots of orbits close to Earth, and the neo vs. moid histogram.

My work also included two extensions: computing the orbital paths of the asteroids from 5 Keplerian orbital elements, and approximating the asteroid diameters. The approximation was needed because diameter, along with prefix, name, albedo, and diameter_sigma, had to be excluded from the study: these columns contained too many null values to be usable. I also discarded 25,637 records with extreme outliers, which skewed the useful data. Given the amount of data available (about 1M records), this was affordable, but I took care not to make the data more unbalanced than it already was. Finally, I filled in the missing H values with the mean, because the values of H are homogeneous.
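The diameter approximation mentioned above can be sketched as follows. This is a minimal illustration, not necessarily the calculation used in the notebook: it applies the standard relation between absolute magnitude H and the diameter of a spherical body, and because the albedo column was discarded, it assumes a fixed geometric albedo of 0.14. The function name and the default albedo are assumptions made for this example.

```python
import math


def estimate_diameter_km(H: float, albedo: float = 0.14) -> float:
    """Approximate diameter (km) of a spherical asteroid from absolute magnitude H.

    Uses the standard relation D = 1329 / sqrt(albedo) * 10**(-H / 5).
    The default albedo of 0.14 is an assumed value, not taken from the dataset.
    """
    if albedo <= 0:
        raise ValueError("albedo must be positive")
    return 1329.0 / math.sqrt(albedo) * 10 ** (-H / 5.0)


# With the assumed albedo, an H = 22 object comes out at roughly 140 m across.
d = estimate_diameter_km(22.0)
print(f"{d * 1000:.0f} m")
```

Because the result scales with 1 / sqrt(albedo), a darker object with the same H is larger, so any estimate made with a fixed albedo should be read as an order-of-magnitude figure.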

Conclusion: Based on the tools used so far, I concluded that the danger of an asteroid lies in whether it can intercept Earth's orbit. Asteroids with orbits close to Earth's orbit are potentially hazardous, while those with orbits farther away may also be, but with lower probability. I also showed that size does not contribute to an asteroid's potential danger unless it can intercept Earth's orbit. Clearly, if an asteroid were to hit Earth, a more massive one would cause greater damage than a less massive one.
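For reference, the conventional definition of a potentially hazardous asteroid combines the two quantities discussed here: an Earth MOID of at most 0.05 au and an absolute magnitude H of at most 22.0. The sketch below encodes that rule as a simple baseline. It is not the model trained in this project, and the function and constant names are chosen for this example only.

```python
MOID_LIMIT_AU = 0.05  # conventional threshold on the Earth MOID (au)
H_LIMIT = 22.0        # conventional threshold on absolute magnitude


def is_pha_by_rule(moid_au: float, H: float) -> bool:
    """Return True when an asteroid meets the conventional PHA thresholds.

    A smaller H means a brighter (and usually larger) object, so the
    condition on H is an upper bound.
    """
    return moid_au <= MOID_LIMIT_AU and H <= H_LIMIT


print(is_pha_by_rule(0.03, 20.0))  # close orbit, bright object
print(is_pha_by_rule(0.30, 18.0))  # bright object, distant orbit
```

A rule like this can serve as a sanity check for a trained classifier: predictions that disagree with it on the pha column are worth inspecting.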

Objective

Throughout human history, there have been countless discussions about the end of the world. One of the main and most plausible causes is the impact of an asteroid. Such an event could be so catastrophic that it threatens to wipe out all existing life on planet Earth.

But from our position as data scientists, what can we do? To answer this question, we have access to a dataset containing information about various asteroids known to humanity. This dataset describes specific physical and temporal characteristics of these asteroids. Based on this, several questions arise:

Is there a pattern that allows us to identify potentially hazardous asteroids?

What are the probabilities of an asteroid colliding with Earth in the coming years?

Business context

An observatory in Argentina has detected several asteroids near Earth's orbit. Additionally, it has determined that this weekend, there will be a meteor shower, which consists of debris from asteroids. Fortunately, the observatory has been collecting precise data on these asteroids for several years up to the present.

NASA has hired us to identify visual patterns in this data to help classify whether these asteroids pose a threat to our ecosystem. The goal is to take preventive actions to alter their course and avoid a potential impact, thus preventing the extinction of humanity.

Business problem

Based on the data provided by the observatory, we need to create visualizations to answer the following questions:

Are all asteroids near Earth's orbit potentially hazardous?
What type of orbit do most asteroids have?
Is there a relationship between an asteroid's hazard level and its physical size?
Between Mars and Jupiter, there is an asteroid belt. How does Jupiter's massive size affect the orbits of these asteroids?
Are there asteroids with orbits smaller than Earth's that pose a potential threat?

Analytical context

The observatory has provided us with a dataset in CSV format containing approximately one million records on asteroids. Some of the recorded characteristics include orbital eccentricity, longitude of the ascending node, absolute magnitude of the asteroid, among 43 other available features.

The internal index of the dataset is named id.

Based on this data, we need to carry out the following tasks:

Read and preview the dataset.
Detect and process missing data, determine whether it can be discarded, or otherwise, fill in the gaps.
Detect and process outliers.
Identify relevant features.
Analyze and create visualizations of the data to answer the proposed questions and identify useful patterns.
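Two of the tasks above, filling gaps and removing outliers, can be sketched with the standard library alone. This is an illustration under stated assumptions: the notebook may use different tooling and a different outlier criterion. Here missing values are represented as None, the outlier rule is an interquartile-range fence with an assumed multiplier k = 3, and the sample rows are made up for the example.

```python
import statistics


def fill_missing_with_mean(rows, column):
    """Return copies of rows where None in `column` is replaced by the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = statistics.fmean(observed)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def drop_extreme_outliers(rows, column, k=3.0):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[column] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


# Made-up sample: semi-major axis `a` (au) and absolute magnitude `H`.
rows = [{"a": 2.2 + 0.1 * i, "H": 16.0 + (i % 3)} for i in range(11)]
rows[4]["H"] = None                   # one missing magnitude
rows.append({"a": 300.0, "H": 17.0})  # one extreme semi-major axis

filled = fill_missing_with_mean(rows, "H")
cleaned = drop_extreme_outliers(filled, "a")
print(len(rows), "->", len(cleaned))
```

Filling before filtering keeps the imputed mean independent of which rows are later dropped; the opposite order is equally defensible and changes the mean slightly.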

About the dataset

This dataset was created by the researcher in Astronomy and Astrophysics, Mir Sakhawat Hossain. It is officially maintained by the Jet Propulsion Laboratory (JPL) of the California Institute of Technology, an organization supervised by NASA. This dataset contains various types of data related to asteroids.

It can be used in Machine Learning projects for both classification and regression tasks.

Column definitions

Feature	Description
id	Internal ID
spkid	Primary ID
fullname	Full designation/name of the object
pdes	Primary designation of the object
name	Object name as per the International Astronomical Union
prefix	Comet prefix
neo	Near-Earth Object (Y/N)
pha	Potentially Hazardous Asteroid (Y/N)
H	Absolute magnitude parameter
diameter	Object diameter (equivalent to a sphere) (km)
albedo	Geometric albedo
diameter_sigma	1-sigma uncertainty in the object's diameter (km)
orbit_id	Orbit solution ID
epoch	Osculation epoch in Julian day format (TDB)
epoch_mjd	Osculation epoch in Modified Julian day format (TDB)
epoch_cal	Osculation epoch in calendar date/time format (TDB)
equinox	Reference frame equinox
e	Eccentricity
a	Semi-major axis (au)
q	Perihelion distance (au)
i	Inclination. Angle relative to the x-y ecliptic plane (deg)
om	Longitude of the ascending node (deg)
w	Argument of perihelion (deg)
ma	Mean anomaly (deg)
ad	Aphelion distance (au) (also called Q)
n	Mean motion (deg/d)
tp	Time of perihelion passage (TDB)
tp_cal	Time of perihelion passage in calendar date/time format (TDB)
per	Orbital sidereal period (d)
per_y	Orbital sidereal period (years)
moid	Minimum orbit intersection distance with Earth (au)
moid_ld	Minimum orbit intersection distance with Earth (LD)
sigma_e	Eccentricity (1-sigma uncertainty)
sigma_a	Semi-major axis (1-sigma uncertainty) (au)
sigma_q	Perihelion distance (1-sigma uncertainty) (au)
sigma_i	Inclination. Angle relative to the x-y ecliptic plane (1-sigma uncertainty) (deg)
sigma_om	Longitude of the ascending node (1-sigma uncertainty) (deg)
sigma_w	Argument of perihelion (1-sigma uncertainty) (deg)
sigma_ma	Mean anomaly (1-sigma uncertainty) (deg)
sigma_ad	Aphelion distance (1-sigma uncertainty) (au)
sigma_n	Mean motion (1-sigma uncertainty) (deg/d)
sigma_tp	Time of perihelion passage (1-sigma uncertainty) (TDB)
sigma_per	Orbital sidereal period (1-sigma uncertainty) (d)
class	Orbit classification
rms	Normalized orbit fit RMS (arcsec)
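Five of the columns above (a, e, i, om, w) fix the shape and orientation of an orbit, which is enough to trace its path. The sketch below shows one way to do that with the standard formulas for an elliptical orbit in heliocentric ecliptic coordinates. It is a dependency-free illustration and not the project's own plotting code; the function name and the sample element values are assumptions made for this example.

```python
import math


def orbit_points(a, e, i_deg, om_deg, w_deg, n_points=360):
    """Return (x, y, z) positions in au along an elliptical heliocentric orbit.

    a: semi-major axis (au); e: eccentricity (0 <= e < 1);
    i_deg, om_deg, w_deg: inclination, longitude of the ascending node,
    and argument of perihelion, all in degrees.
    """
    if not 0 <= e < 1:
        raise ValueError("this sketch handles elliptical orbits only (0 <= e < 1)")
    i, om, w = (math.radians(v) for v in (i_deg, om_deg, w_deg))
    p = a * (1 - e**2)  # semi-latus rectum
    points = []
    for k in range(n_points):
        nu = 2 * math.pi * k / n_points  # true anomaly
        r = p / (1 + e * math.cos(nu))   # distance from the Sun
        u = w + nu                       # argument of latitude
        x = r * (math.cos(om) * math.cos(u) - math.sin(om) * math.sin(u) * math.cos(i))
        y = r * (math.sin(om) * math.cos(u) + math.cos(om) * math.sin(u) * math.cos(i))
        z = r * math.sin(u) * math.sin(i)
        points.append((x, y, z))
    return points


# Illustrative elements, not a real object from the dataset.
path = orbit_points(a=1.5, e=0.2, i_deg=10.0, om_deg=80.0, w_deg=30.0)
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in path]
print(min(radii), max(radii))  # compare with q = a(1 - e) and ad = a(1 + e)
```

The minimum and maximum distances along the path correspond to the q and ad columns, which gives a quick consistency check against the dataset.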

Dataset analysis and Training models

The dataset analysis and the model training were carried out in a Jupyter Notebook. You can find this file in this repository folder.

The training script for the selected model is available in this repository file.

The pipeline that preprocesses the dataset, the label encoder, and the final model were exported to a file named HAP-model.bin.

Running the project locally

Using Flask

The script to deploy the model using Flask is predict.py.

Pipfile and Pipfile.lock set up the Pipenv environment.

First, you need to install from Pipfile:

pipenv install

The virtual environment can be activated by running:

pipenv shell

Once in the virtual environment, you can run the following commands:

cd scripts
python predict.py

With the server running, you can test the model from the repository root in a second terminal:

python scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:9696/predict"
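For orientation, a request to that endpoint could look like the sketch below. It assumes the service accepts a JSON body via POST and answers with JSON, which is a common setup but is not confirmed here; the actual test.py may differ. The payload keys are taken from the dataset's column names and their values are made up, and the helper function names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(url, asteroid):
    """Build a JSON POST request carrying one asteroid record."""
    body = json.dumps(asteroid).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the response, assuming a JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = "http://localhost:9696/predict"
# Hypothetical payload: the fields the service really expects may differ.
asteroid = {"H": 20.1, "e": 0.35, "a": 1.4, "moid": 0.03}
request = build_predict_request(url, asteroid)
# result = send(request)  # uncomment once the service is running
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.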

Using Waitress as WSGI server

Once in the virtual environment, you can run the following commands:

cd scripts
waitress-serve --listen=0.0.0.0:9696 predict:app

With the server running, you can test the model from the repository root in a second terminal:

python scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:9696/predict"

Local deployment with Docker

The Dockerfile contains the Docker instructions.

To build the image, you can run the following command:

docker build -t pha-model .

To run it:

docker run -p 9696:9696 -it pha-model:latest

Visualizations with Poliastro

Streamlit app

streamlit_app.mp4

Here’s the link to my Streamlit App.
