This project involves the development of a comprehensive database system to analyze NYC crime data. The primary objective was to integrate and process real-world public datasets into a structured format for deeper insights into crime patterns and stakeholder involvement. Utilizing PostgreSQL for data management and relational mapping, we implemented a robust ETL (Extract, Transform, Load) pipeline, designed a custom ER diagram, and developed a queryable database schema.
- Integrated two large-scale public datasets:
- NYPD Calls for Service (5.43M rows, 18 columns)
- NYPD Complaint Data Historic (8.91M rows, 35 columns)
- Preprocessed and resolved inconsistencies in raw datasets, reducing redundant information while ensuring alignment with database constraints.
- Designed an enhanced ER diagram with entities such as
Crime_Scene,Incident,NYPD,Dispatch_Duration, and relationships likeMonitor,Send,Arrive, andOccurred. - Integrated ISA hierarchies for the
NYPDentity (Transit_PoliceandPrecinct) to model specialization and total participation constraints. - Mapped the ER diagram into a normalized PostgreSQL schema.
- Implemented triggers to enforce participation and integrity constraints (e.g., ensuring
Crime_Sceneis linked to a validIncident). - Added array attributes to store multi-dimensional contact information (e.g., sub-officer details for
Transit_Police). - Integrated a text search feature for court reviews using the
TEXTdata type.
- Designed and executed 10+ complex SQL queries to analyze:
- Crime patterns by location and time.
- Victim demographics by race and gender.
- Stakeholder (hospital and court) involvement in crime responses.
- Description: Outlines the objectives, scope, and methodology for building the database.
- Key Contents:
- Problem statement and data sources.
- Initial ER diagram design.
- Project goals and expected outcomes.
- Description: Maps the initial ER diagram into a SQL schema, defining the tables and their relationships in PostgreSQL.
- Key Contents:
- SQL scripts for creating tables and relationships.
- Definitions of data types and constraints.
- Description: Includes an updated ER diagram with additional entities and relationships, reflected in an enhanced SQL schema.
- Key Contents:
- Extended ER diagram for additional requirements.
- SQL scripts for the enhanced schema with new constraints and relationships.
- Description: Implements assertions in PostgreSQL to enforce data integrity and business rules.
- Key Contents:
- SQL assertions to validate data across tables.
- Examples include enforcing valid date ranges or attribute constraints.
- Description: Enhances the schema by incorporating PostgreSQL array data types for efficient multi-dimensional data storage.
- Key Contents:
- Examples of array attributes in tables.
- Queries demonstrating manipulation and retrieval of array data.
- Database Management System: PostgreSQL
- Entities and Relationships: Modeled real-world scenarios with a revised ER diagram incorporating feedback and stakeholder considerations.
- Triggers and Constraints:
- Ensured database integrity with custom triggers for participation and ISA constraints.
- Implemented cascading updates and deletions for hierarchical data models.
