Description
I am working on a web scraping project using Python's requests library. The goal is to scrape emails from numerous URLs. To handle network delays, I set the timeout parameter to timeout=(10, 10), i.e. a 10-second connect timeout and a 10-second read timeout.
However, when I run the script over multiple URLs, it sometimes gets stuck on a single request and does not appear to respect the timeout settings, so the script hangs indefinitely, especially when scraping a large number of URLs.
Here’s the code snippet I’m using:
import requests

urls = [
    "http://example.com",
    "http://anotherexample.com",
    # ... more URLs
]

HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"}

for url in urls:
    try:
        response = requests.get(url, headers=HEADERS, timeout=(10, 10))
        if response.status_code == 200:
            # Extract emails (simplified for demonstration)
            print(f"Emails from {url}: ", response.text)
    except requests.exceptions.Timeout:
        print(f"Timeout occurred for {url}")
    except requests.exceptions.RequestException as e:
        print(f"Error occurred for {url}: {e}")
Despite using the timeout parameter, the script sometimes gets stuck indefinitely and doesn’t proceed to the next URL.
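One workaround I have been considering, but have not tested at scale, is to run each request in a worker thread and enforce a hard overall deadline with concurrent.futures, since the read timeout only seems to limit the gap between received chunks rather than the total response time. Below is a rough sketch of what I mean, reusing urls and HEADERS from the snippet above; fetch, HARD_DEADLINE, and max_workers=4 are just placeholder names and values I picked for illustration.

import concurrent.futures
import requests

HARD_DEADLINE = 30  # max wall-clock seconds I am willing to wait per URL (arbitrary)

def fetch(url):
    # Same request as above; the connect/read timeouts still apply inside the worker.
    return requests.get(url, headers=HEADERS, timeout=(10, 10))

# A small pool so one stuck request does not block the next submission.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
for url in urls:
    future = pool.submit(fetch, url)
    try:
        response = future.result(timeout=HARD_DEADLINE)
        if response.status_code == 200:
            print(f"Emails from {url}: ", response.text)
    except concurrent.futures.TimeoutError:
        # The loop moves on, but the stuck worker thread keeps running in the
        # background and can still delay a clean interpreter shutdown.
        print(f"Hard deadline exceeded for {url}")
    except requests.exceptions.RequestException as e:
        print(f"Error occurred for {url}: {e}")
pool.shutdown(wait=False)

This keeps the loop moving, but it feels like a band-aid, so I would still like to understand why the read timeout alone is not enough.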
Steps Taken:
- Tried reducing the timeout values to (5, 5), but encountered the same issue.
- Ensured that the URLs are valid and accessible.
My Questions:
- Why might the timeout not work as expected in this case?
- How can I ensure that the script doesn't hang indefinitely when scraping a large number of URLs?
Any help or suggestions to resolve this issue would be greatly appreciated.
Environment:
Python version: 3.10.10
requests version: 2.32.3