
How to Scrape Craigslist with Python: Step-by-Step Guide in 2026

Zora Quinn
December 30, 2025
12 min read

You may be analyzing local housing rent trends or tracking price movements in second-hand electronics. In that kind of work, Craigslist often becomes the main source of public listings. The data is visible and updated frequently, but manual collection stops working once your sample starts to grow. When you scrape Craigslist with Python, a script may run fine at first and then fail as request volume increases. In 2026, scraping Craigslist comes down to how well your approach fits current page structures and the access limits that affect repeated requests.

This guide shows how to scrape Craigslist with Python for research and monitoring projects. It covers building search queries, sending and parsing requests, handling pagination, and saving data in formats you can analyze. It also explains what changes as datasets grow, where access restrictions usually show up, and why scripts that work at small scale often break under sustained use. If your goal is to turn Craigslist listings into reliable input for market research or price analysis, this is a solid place to start.

What Data You Can Scrape from Craigslist

When you scrape Craigslist, you work with listing data that is publicly visible on the site. This public layer supports market research, price tracking, and supply analysis, and it is sufficient for most research projects.

Core Listing Fields Available on Craigslist

Craigslist listings follow a relatively stable structure. The fields below appear consistently across search results and individual listing pages, which makes them suitable for repeated collection and longitudinal analysis.

Data field                 | Where it appears                | Why it matters
Listing title              | Search results and listing page | Grouping, filtering, and comparison
Price                      | Search results and listing page | Core metric for pricing analysis
Listing URL                | Search results                  | Reference key and deduplication
Posted date or update time | Listing page                    | Time series analysis
Category and subcategory   | Search results                  | Market segmentation
Location                   | Search results                  | Regional comparison
Description                | Listing page                    | Context and qualitative signals
Search result pages provide fast access to titles, prices, categories, locations, and links. Individual listing pages contain the full description and posting time, which are required when building complete records rather than summaries.

Common Analytical Use Cases for Craigslist Listings

  • Tracking price changes across cities or categories

  • Comparing supply levels between regions

  • Aggregating listings into a single dataset for analysis

  • Monitoring new postings as they appear

Prepare a Python Environment for Craigslist Data Collection

A Python environment for Craigslist scraping should stay simple and reliable. The goal is to send consistent requests, parse HTML safely, and store clean data for later analysis.

Use Python 3 and install only two required libraries.

pip install requests beautifulsoup4

requests manages HTTP communication with Craigslist. BeautifulSoup extracts structured data from returned HTML pages.

If requests fail or parsing returns empty pages, later steps cannot correct these problems. Always make sure your environment can send and receive responses properly before moving forward.
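A quick offline check confirms both libraries are installed and that parsing works before any request is sent. The HTML below is a made-up sample, used only to exercise the parser:

```python
# Sanity check: confirm requests and BeautifulSoup import and parse correctly.
import requests
from bs4 import BeautifulSoup

sample_html = "<html><body><span class='price'>$450</span></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
price = soup.select_one("span.price").text

print("requests version:", requests.__version__)
print("parsed price:", price)  # parsed price: $450
```

If this prints a version string and the sample price, the environment is ready for the steps below.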

Step 1: Build a Craigslist Search URL

Before sending any request, the search URL defines what data will be collected. When learning how to scrape Craigslist, this step sets the scope of the dataset.

How Craigslist Search URLs Work

Craigslist uses city subdomains and category paths that determine which listings appear.

base_url = "https://newyork.craigslist.org/search/sss"

Here, newyork sets the city and sss refers to the for-sale section.

You can filter results before scraping to reduce noise.

params = {
    "query": "laptop",
    "min_price": 300,
    "max_price": 800,
    "hasPic": 1
}
# hasPic=1 filters out listings without photos

This small filter improves data quality and keeps requests focused on complete listings.
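You can preview the final request URL before sending anything; requests builds the same query string internally when you pass params= to session.get():

```python
# Preview the full search URL that requests will construct from base_url + params.
from urllib.parse import urlencode

base_url = "https://newyork.craigslist.org/search/sss"
params = {"query": "laptop", "min_price": 300, "max_price": 800, "hasPic": 1}

full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
# https://newyork.craigslist.org/search/sss?query=laptop&min_price=300&max_price=800&hasPic=1
```

Pasting this URL into a browser is a fast way to confirm the filters return the listings you expect before automating anything.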

Step 2: Send HTTP Requests with Python

HTTP requests determine whether pages load consistently or start failing after repeated access.

Use a Stable Session with Retries

Creating a session makes network behavior more predictable and prevents repeated connection setup. Retries handle temporary errors without stopping the scraper.

import requests
from requests.adapters import HTTPAdapter, Retry

session = requests.Session()
retries = Retry(
    total=3,              # retry up to three times
    backoff_factor=1.5,   # waits grow to about 1.5s, 3s, 6s
    status_forcelist=[429, 500, 502, 503, 504]
)
session.mount("https://", HTTPAdapter(max_retries=retries))

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9"
}

response = session.get(base_url, headers=headers, params=params, timeout=10)

A modest retry delay keeps the scraper polite and reduces the risk of immediate repeated failures.

Validate Each Response

Even a successful status code does not always mean the response contains HTML content. Add checks to avoid silent failures.

if response.status_code != 200:
    print("Request failed with status:", response.status_code)
elif not response.text:
    print("Empty response body")

This creates the first layer of stability in your workflow.

Step 3: Parse Craigslist Search Results

Once a response is received, parsing extracts the data you need.

Locate Listing Elements

Craigslist pages repeat similar structures for each listing. Selecting by structure rather than text keeps your scraper more robust.

from bs4 import BeautifulSoup

def parse_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    listings = soup.select("li.result-row, li.cl-search-result")
    rows = []
    for item in listings:
        title_tag = item.select_one("a.result-title, a.cl-app-anchor")
        price_tag = item.select_one("span.result-price, span.price")
        post_id = item.get("data-pid")
        location_tag = item.select_one("span.result-hood")

        rows.append({
            "post_id": post_id,
            "title": title_tag.text.strip() if title_tag else None,
            "price": price_tag.text.strip() if price_tag else None,
            "url": title_tag["href"] if title_tag and "href" in title_tag.attrs else None,
            "location": location_tag.text.strip(" ()") if location_tag else None
        })
    return rows

results = parse_listings(response.text)
all_results = list(results)  # seed result set for later steps

If the list is empty, check whether the request succeeded or the page layout changed. This produces clean structured rows that can be reused in later steps.
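The price field comes back as display text such as "$1,200". A small normalization helper (an addition to the steps above, not part of Craigslist's markup) keeps the values numeric for later analysis:

```python
import re

def parse_price(raw):
    """Convert a Craigslist price string like '$1,200' to an integer, or None."""
    if not raw:
        return None
    digits = re.sub(r"[^\d]", "", raw)  # strip '$', ',', whitespace
    return int(digits) if digits else None

print(parse_price("$1,200"))  # 1200
print(parse_price("$45"))     # 45
print(parse_price(None))      # None
```

Applying this during parsing, or as a cleanup pass afterwards, avoids string-to-number surprises when you sort or aggregate by price.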

Step 4: Visit Individual Listing Pages

Search result pages contain summaries only. To collect full descriptions or posting times, open each detail page.

import random, time

# Visit a limited number of listings to demonstrate detail collection
for i, listing in enumerate(all_results[:10]):  # limit to the first ten
    if not listing.get("url"):
        continue
    try:
        print(f"Fetching detail page {i+1}: {listing['url']}")
        detail_response = session.get(listing["url"], headers=headers, timeout=10)
        detail_response.raise_for_status()

        detail_soup = BeautifulSoup(detail_response.text, "html.parser")

        description = detail_soup.select_one("#postingbody")
        posted_elem = detail_soup.select_one("time.date.timeago")

        listing["description"] = description.get_text(strip=True, separator=" ") if description else None
        listing["posted_date"] = posted_elem["datetime"] if posted_elem and posted_elem.has_attr("datetime") else None
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {listing['url']}: {e}")
        listing["description"] = None
        listing["posted_date"] = None
    finally:
        # Pause after every attempt, including failures, to keep pacing even
        time.sleep(random.uniform(1.0, 2.0))

This loop enriches multiple records with detail information while keeping requests controlled and polite.

Step 5: Handle Pagination Across Multiple Pages

Pagination enlarges your dataset but also increases the chance of unstable requests.

Control Pagination Flow

Craigslist uses an offset parameter to move through pages. Add short, random pauses between requests to reduce access frequency.

import time, random

page_size = 120
offset = 0
max_pages = 25  # safety cap so the loop cannot run indefinitely

for _ in range(max_pages):
    params["s"] = offset
    page = session.get(base_url, headers=headers, params=params, timeout=10)
    page.raise_for_status()

    page_rows = parse_listings(page.text)
    if not page_rows:
        print("No more listings found.")
        break

    all_results.extend(page_rows)
    offset += page_size
    print(f"Collected {len(all_results)} total listings so far.")
    time.sleep(random.uniform(1.5, 3.5))

This pagination loop continuously expands all_results and ensures each request is spaced out to prevent rate limits or temporary blocks.
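New postings can shift results between page requests, so the same listing may appear at two different offsets. Deduplicating by post_id (falling back to the URL) keeps the dataset clean; the sample rows below are illustrative:

```python
def deduplicate(rows):
    """Drop repeated listings, keeping the first occurrence of each post_id or URL."""
    seen = set()
    unique = []
    for row in rows:
        key = row.get("post_id") or row.get("url")
        if key and key in seen:
            continue  # already collected on an earlier page
        if key:
            seen.add(key)
        unique.append(row)
    return unique

sample = [
    {"post_id": "101", "title": "Laptop A"},
    {"post_id": "102", "title": "Laptop B"},
    {"post_id": "101", "title": "Laptop A"},  # shifted onto a later page
]
print(len(deduplicate(sample)))  # 2
```

Running `all_results = deduplicate(all_results)` once after pagination finishes is usually enough.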

Step 6: Save Scraped Craigslist Data to CSV or JSON

Collected data needs to be stored in formats that are reusable for later analysis.

Save to CSV

Before writing, check whether the dataset is empty to avoid errors.

import csv

if all_results:
    with open("craigslist_results.csv", "w", newline="", encoding="utf-8") as f:
        fieldnames = ["post_id", "title", "price", "url", "location", "description", "posted_date"]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(all_results)
    print("Saved results to craigslist_results.csv")
else:
    print("No data to save.")

Save to JSON

JSON preserves structure and works well for programmatic use.

import json

if all_results:
    with open("craigslist_results.json", "w", encoding="utf-8") as f:
        json.dump(all_results, f, ensure_ascii=False, indent=2)
    print("Saved results to craigslist_results.json")

Clean storage keeps your dataset reusable without running the scraper again.
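Once stored, the dataset can be reloaded and analyzed without touching the scraper again. A minimal sketch of an analysis pass over JSON in the same shape the scraper saves (the three records here are illustrative, not real listings):

```python
import json
import statistics

# Illustrative records matching the fields the scraper writes out.
records_json = '''[
  {"post_id": "1", "title": "Laptop A", "price": "$450"},
  {"post_id": "2", "title": "Laptop B", "price": "$600"},
  {"post_id": "3", "title": "Laptop C", "price": "$520"}
]'''

records = json.loads(records_json)
prices = [int(r["price"].lstrip("$").replace(",", "")) for r in records if r["price"]]
print("median price:", statistics.median(prices))  # median price: 520
```

In a real run you would replace records_json with `open("craigslist_results.json").read()`, then group by category or location before computing statistics.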

👉 Final check

  • Small mistakes are handled without breaking the workflow

  • Each step can be run and verified independently

  • Failures point back to URL construction, request behavior, or parsing logic

  • The process remains usable as data volume grows

Follow Best Practices for Stable and Responsible Scraping

When web scraping Craigslist becomes unreliable, the issue is rarely parsing logic. Incomplete pages, failed pagination, or inconsistent results usually appear after repeated requests and longer runtimes. These signals point to network-level limits rather than errors in how data is extracted.

Basic request discipline helps early on. Predictable pacing, short delays during pagination, retrying failures with backoff, and collecting only publicly visible listing data reduce risk in small runs. As Craigslist scraping becomes sustained, however, stability turns into a network problem. Concentrating all requests on a single IP increases identity risk and shortens run time. This is where a dedicated Craigslist proxy service becomes necessary. IPcook addresses this layer by distributing request identity across residential IPs, keeping long-running collection more consistent without changing scraper logic.

Why IPcook fits large-scale Craigslist scraping

  • Residential IP rotation that reduces identity concentration

  • Geographic targeting aligned with city-based Craigslist subdomains

  • Support for HTTP and SOCKS5 in common Python workflows

  • More stable pagination and detail page access during long runs

If your scraper works briefly but degrades as volume grows, the limitation is likely request identity rather than code quality. IPcook makes it easier to validate stability early and scale Craigslist data collection with fewer interruptions.
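Routing an existing session through a proxy requires no changes to the request, parsing, or pagination code above. requests reads a proxies mapping on the session; the endpoint and credentials below are placeholders, to be replaced with the values from your provider's dashboard:

```python
import requests

# Placeholder proxy endpoint and credentials; no request is sent here.
proxy_url = "http://username:password@proxy.example.com:8000"

session = requests.Session()
session.proxies = {"http": proxy_url, "https": proxy_url}

# From here on, every session.get() call routes through the proxy.
print(session.proxies["https"])
```

Because the proxy is set once on the session, the scraper logic stays identical whether you run directly or through rotated IPs.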

Conclusion

Scraping Craigslist effectively requires more than working code. It depends on stable access over time. By structuring requests carefully, validating responses, and spacing pagination, you can maintain reliable data collection as volume grows.

If your scripts start failing during long runs, the issue usually lies in request identity rather than logic. With IPcook’s residential proxy service and geo-targeted sessions, large-scale Craigslist scraping becomes sustainable and efficient. IPcook also offers a 100 MB free residential proxy trial, making it easy to test scraping stability before scaling up. For larger workloads, residential proxy traffic is available from as low as $0.50 per GB, giving you full control over cost and performance.

Try IPcook today and start collecting Craigslist data more reliably.

