rsatija commented Jul 9, 2025

Submitting this as a PR in case others find it useful. Note that this was vibecoded; it has had a fair amount of testing and is working well, but it may have unanticipated bugs. Happy to keep updating it if it's helpful.

Summary

Enhanced the flight data extraction capabilities to include flight numbers, departure/arrival airport codes, and connecting airports for multi-segment flights.

Changes Made

New Fields Added to Flight Dataclass

  • flight_number: Extracted from itinerary URL data attributes
  • departure_airport: First airport in the itinerary
  • arrival_airport: Last airport in the itinerary
  • connecting_airports: List of intermediate airports for connecting flights
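
For readers who want to picture the shape of these fields, here is a minimal sketch. It is not the code from this PR: `FlightSketch` is a hypothetical stand-in for the real `Flight` dataclass, the existing fields are omitted, and the types and defaults are assumptions. Giving every new field a default is one way to keep existing callers working when a dataclass grows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlightSketch:
    """Hypothetical stand-in for Flight; only the four new fields are shown."""

    # Assumed types: the PR names the fields but does not state their types.
    flight_number: Optional[str] = None
    departure_airport: Optional[str] = None
    arrival_airport: Optional[str] = None
    # default_factory gives each instance its own list instead of a shared one.
    connecting_airports: List[str] = field(default_factory=list)


nonstop = FlightSketch(flight_number="F9 2503", departure_airport="JFK", arrival_airport="LAX")
print(nonstop.connecting_airports)
```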

Enhanced HTML Parsing Logic

  • Added extraction of flight numbers from data-travelimpactmodelwebsiteurl attribute
  • Implemented airport code parsing from itinerary URLs
  • Added support for multi-segment connecting flights
  • Improved regex patterns for robust extraction across different airlines
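
The parsing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual implementation: `parse_itinerary` is a hypothetical helper, the segment layout is inferred from the single example `itinerary=JFK-LAX-F9-2503-20250801`, and the comma as the separator between segments of a connecting itinerary is an assumption, as is the `"F9 2503"` output format.

```python
import re
from typing import Dict, List, Optional, Union
from urllib.parse import unquote

# Assumed segment layout, inferred from 'JFK-LAX-F9-2503-20250801':
# origin, destination, airline code, flight number, date (YYYYMMDD).
SEGMENT_RE = re.compile(r"^([A-Z]{3})-([A-Z]{3})-([A-Z0-9]{2})-(\d{1,4})-(\d{8})$")
ITINERARY_RE = re.compile(r"itinerary=([^&\s]+)")

ParsedItinerary = Dict[str, Union[str, List[str]]]


def parse_itinerary(url: str) -> Optional[ParsedItinerary]:
    """Hypothetical helper: pull flight number and airports from an itinerary URL."""
    found = ITINERARY_RE.search(unquote(url))
    if found is None:
        return None
    segments = []
    # Assumption: connecting itineraries list their segments comma-separated.
    for raw in found.group(1).split(","):
        seg = SEGMENT_RE.match(raw)
        if seg is None:
            return None  # unknown layout: give up rather than guess
        segments.append(seg.groups())
    first, last = segments[0], segments[-1]
    return {
        # Only the first segment's flight number is reported.
        "flight_number": f"{first[2]} {first[3]}",
        "departure_airport": first[0],
        "arrival_airport": last[1],
        # Every segment except the last ends at an intermediate airport.
        "connecting_airports": [seg[1] for seg in segments[:-1]],
    }


print(parse_itinerary("itinerary=JFK-LAX-F9-2503-20250801"))
```

Returning `None` on any unrecognized segment keeps a malformed attribute from producing partially filled fields.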

rsatija added 2 commits July 9, 2025 12:19
- Add flight_number field to Flight dataclass in schema.py
- Implement flight number extraction from data-travelimpactmodelwebsiteurl attribute
- Support extraction for multiple airlines including Delta, JetBlue, and Frontier
- Add debug output for Delta and Frontier flights to help with development
- Update test scripts to display flight numbers in output
- Fix extraction logic to search within individual flight items instead of entire document
- Add departure_airport and arrival_airport fields to Flight dataclass
- Extract airport codes from data-travelimpactmodelwebsiteurl attribute
- Support extraction for all airlines (Delta, JetBlue, American, Frontier, etc.)
- Update test script to display airport codes in output
- Airport codes are extracted from URL patterns like 'itinerary=JFK-LAX-F9-2503-20250801'
dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels Jul 9, 2025
Manouchehri (Contributor) commented

What's the point of departure_airport and arrival_airport? Won't it always be the same as what we provided in our query?

Manouchehri (Contributor) left a comment

TypeError: Flight.__init__() got an unexpected keyword argument 'connecting_airports'

Manouchehri (Contributor) commented

Would be nice if this could get the connecting flight numbers too. :)

Manouchehri (Contributor) commented

> TypeError: Flight.__init__() got an unexpected keyword argument 'connecting_airports'

I fixed this in #68.

rsatija (Author) commented Jul 9, 2025

Thanks! The airport codes may differ from the query because some searches span multiple airports (e.g. NYC covers several), so it's helpful to know which specific airport a particular flight uses.

Manouchehri (Contributor) commented

Doesn't get_flights only accept airport names (like JFK, EWR, and LGA)?

…field to Flight dataclass
- Improve error handling to show relevant HTML parts instead of full page
- Fix connecting airports extraction logic for multi-segment flights