This project analyzes NYC 311 noise complaint data to uncover spatial and temporal patterns and forecast future complaint volumes. By understanding when and where noise complaints are most likely to occur, city agencies can better allocate enforcement resources and address recurring problem areas. The goal is to help answer questions like:
- When do most noise complaints occur?
- Which boroughs report the most noise?
- Are there seasonal or time-based trends?
- Can we forecast complaint volume over time? By season? By location?
Tools used:
- Python (pandas, numpy, matplotlib, seaborn, scikit-learn) for data cleaning, exploration, and forecasting
- SQL for querying, aggregating, and joining datasets
- Jupyter Notebooks for exploratory analysis and model development
The data comes from NYC 311 Service Requests, filtered to noise complaints (sample dataset: `311_noise_complaints_2024.csv`), and includes date/time, complaint type, borough, and geolocation information.
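For a quick sense of what the data supports, a few lines of pandas can answer the first two questions. This is a minimal sketch; the column names are assumptions based on the standard NYC 311 export schema:

```python
import pandas as pd

# Column names are assumptions based on the standard NYC 311 export schema.
df = pd.read_csv("data_raw/311_noise_complaints_2024.csv",
                 parse_dates=["Created Date"])

# Which boroughs report the most noise?
print(df["Borough"].value_counts())

# When do most complaints occur? Count complaints by hour of day.
print(df["Created Date"].dt.hour.value_counts().sort_index())
```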
Project structure:

```
nyc-noise/
├── data_raw/           # Raw data (unmodified source files)
│   └── 311_noise_complaints_2024.csv
├── data_processed/     # Cleaned/aggregated data ready for analysis
├── notebooks/          # Jupyter notebooks for EDA, forecasting, mapping
│   └── nyc_311_noise_analysis.ipynb
├── src/                # Python scripts for cleaning, feature engineering
├── assets/             # Images/plots for README and reports
├── dashboards/         # Tableau/Power BI dashboards
├── reports/            # Project reports or summaries
├── sql/
│   └── init_table.sql  # Drops/creates table + loads CSV
├── scripts/
│   └── setup_db.py     # Creates database + runs init_table.sql
│
├── environment.yml     # Conda environment (alternative to requirements.txt)
├── LICENSE             # Open-source license
└── README.md           # Project overview and instructions
```
Next steps:
- Build baseline time series forecast models (seasonal naive, SARIMA); see the sketch after this list
- Evaluate model accuracy and identify high-risk time windows
- Create a Tableau dashboard for interactive exploration
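A minimal sketch of those two baselines, assuming a daily count series built from the cleaned SQL export and using statsmodels for the SARIMA fit. The column name and model orders here are illustrative, not the project's final choices:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Column name is an assumption; adjust to match the cleaned CSV schema.
df = pd.read_csv("data_processed/noise_complaints_clean_sql.csv",
                 parse_dates=["created_date"])

# Daily complaint counts, with the last 28 days held out for evaluation.
daily = df.set_index("created_date").resample("D").size()
train, test = daily[:-28], daily[-28:]

# Seasonal naive baseline: repeat the final observed week across the horizon.
seasonal_naive = pd.Series(np.tile(train[-7:].to_numpy(), 4), index=test.index)

# SARIMA baseline with a weekly seasonal component (orders are placeholders).
fit = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7)).fit(disp=False)
sarima_forecast = fit.forecast(steps=len(test))

# Compare mean absolute error of the two baselines on the holdout window.
print("seasonal naive MAE:", (test - seasonal_naive).abs().mean())
print("SARIMA MAE:", (test - sarima_forecast).abs().mean())
```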
This project uses PostgreSQL for data storage and Conda for environment management.
Follow the steps below to set up the environment, load the data, and generate the cleaned datasets.
Prerequisites:
- PostgreSQL installed and accessible via `psql`
- Conda (or Mamba) installed
- A PostgreSQL user with permission to create databases and tables
From the repo root (`nyc-noise/`), create and activate the environment:

```bash
conda env create -f environment.yml
conda activate nycnoise
```

Run the setup script to create the database, build tables, and load data:

```bash
python scripts/setup_db.py
```

This will:
- Create a database called `nyc_noise` if it does not already exist.
- Run `sql/init_table.sql` to create two tables:
  - `noise_complaints_2024` (raw, full schema)
  - `noise_complaints_clean` (slimmed, analysis-ready schema)
- Export the cleaned SQL dataset to `data_processed/noise_complaints_clean_sql.csv`.
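For reference, a minimal sketch of what a script like `scripts/setup_db.py` might do; it assumes psycopg2 is available and shells out to `psql` to run the SQL file, and the actual script may be organized differently:

```python
# Hypothetical sketch of scripts/setup_db.py -- the real script may differ.
import os
import subprocess

import psycopg2  # assumed dependency; the project may use another driver

# Default to the system username, matching the note at the end of this README.
PGUSER = os.environ.get("PGUSER", os.environ.get("USER", "postgres"))

# 1. Create the database if it does not already exist.
conn = psycopg2.connect(dbname="postgres", user=PGUSER)
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
with conn.cursor() as cur:
    cur.execute("SELECT 1 FROM pg_database WHERE datname = 'nyc_noise'")
    if cur.fetchone() is None:
        cur.execute("CREATE DATABASE nyc_noise")
conn.close()

# 2. Run the DDL/load script against the new database.
subprocess.run(
    ["psql", "-U", PGUSER, "-d", "nyc_noise", "-f", "sql/init_table.sql"],
    check=True,
)

# 3. Export the cleaned table to CSV.
conn = psycopg2.connect(dbname="nyc_noise", user=PGUSER)
with conn.cursor() as cur, open(
    "data_processed/noise_complaints_clean_sql.csv", "w"
) as f:
    cur.copy_expert("COPY noise_complaints_clean TO STDOUT WITH CSV HEADER", f)
conn.close()
```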
You can also use the Jupyter notebook to produce a parallel cleaned dataset:

```bash
jupyter notebook notebooks/nyc_311_noise_analysis.ipynb
```

The notebook will:
- Clean and transform the raw dataset with pandas (sketched below).
- Save an additional file to `data_processed/noise_complaints_clean_py.csv`.
- Export key visualizations into `assets/` for use in the README or reports.
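The cleaning step might look roughly like the following; column names are assumptions based on the standard NYC 311 export, and the notebook's actual logic may differ:

```python
import pandas as pd

# Column names follow the standard NYC 311 export; adjust to the actual file.
df = pd.read_csv("data_raw/311_noise_complaints_2024.csv",
                 parse_dates=["Created Date"])

# Keep only the fields needed for analysis and drop rows missing key values.
cols = ["Created Date", "Complaint Type", "Borough", "Latitude", "Longitude"]
clean = df[cols].dropna(subset=["Created Date", "Borough"])

# Add time features used in the temporal analysis.
clean["hour"] = clean["Created Date"].dt.hour
clean["month"] = clean["Created Date"].dt.month

clean.to_csv("data_processed/noise_complaints_clean_py.csv", index=False)
```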
Check the processed files in your repo:

```bash
head data_processed/noise_complaints_clean_sql.csv
head data_processed/noise_complaints_clean_py.csv
```

You can connect Tableau, Python, or other tools directly to these CSVs.
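For example, loading the cleaned files into pandas:

```python
import pandas as pd

# Load both cleaned datasets for a quick sanity check.
sql_clean = pd.read_csv("data_processed/noise_complaints_clean_sql.csv")
py_clean = pd.read_csv("data_processed/noise_complaints_clean_py.csv")

# The two pipelines should produce broadly similar row counts.
print(len(sql_clean), len(py_clean))
```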
Notes:
- By default, the script uses your system username as the Postgres user.
- To override, set the environment variable `PGUSER` before running the script:

```bash
PGUSER=your_pg_username python scripts/setup_db.py
```

