
You may be analyzing local housing rent trends or tracking price movements in second-hand electronics. In that kind of work, Craigslist often becomes the main source of public listings. The data is visible and updated frequently, but manual collection stops working once your sample starts to grow. When you scrape Craigslist with Python, a script may run fine at first and then fail as request volume increases. In 2026, scraping Craigslist comes down to how well your approach fits current page structures and the access limits that affect repeated requests.
This guide shows how to scrape Craigslist with Python for research and monitoring projects. It covers building search queries, sending and parsing requests, handling pagination, and saving data in formats you can analyze. It also explains what changes as datasets grow, where access restrictions usually show up, and why scripts that work at small scale often break under sustained use. If your goal is to turn Craigslist listings into reliable input for market research or price analysis, this is a solid place to start.
When you scrape Craigslist, you work with listing data that is publicly visible on the site. This public layer supports market research, price tracking, and supply analysis, and it is sufficient for most research projects.
Core Listing Fields Available on Craigslist
Craigslist listings follow a relatively stable structure. The fields below appear consistently across search results and individual listing pages, which makes them suitable for repeated collection and longitudinal analysis.
| Data field | Where it appears | Why it matters |
| --- | --- | --- |
| Listing title | Search results and listing page | Grouping, filtering, and comparison |
| Price | Search results and listing page | Core metric for pricing analysis |
| Listing URL | Search results | Reference key and deduplication |
| Posted date or update time | Listing page | Time series analysis |
| Category and subcategory | Search results | Market segmentation |
| Location | Search results | Regional comparison |
| Description | Listing page | Context and qualitative signals |
Search result pages provide fast access to titles, prices, categories, locations, and links. Individual listing pages contain the full description and posting time, which are required when building complete records rather than summaries.
Common Analytical Use Cases for Craigslist Listings
- Tracking price changes across cities or categories
- Comparing supply levels between regions
- Aggregating listings into a single dataset for analysis
- Monitoring new postings as they appear
A Python environment for Craigslist scraping should stay simple and reliable. The goal is to send consistent requests, parse HTML safely, and store clean data for later analysis.
Use Python 3 and install only two required libraries.
```bash
pip install requests beautifulsoup4
```

requests manages HTTP communication with Craigslist. BeautifulSoup extracts structured data from returned HTML pages.
If requests fail or parsing returns empty pages, later steps cannot correct these problems. Always make sure your environment can send and receive responses properly before moving forward.
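A quick offline sanity check confirms that both libraries import and parse correctly before any live request is made. The HTML snippet below is made up for the test; it only mimics the kind of markup the scraper will see later:

```python
from bs4 import BeautifulSoup
import requests  # imported only to confirm the install succeeded

# Parse a known snippet offline to confirm BeautifulSoup works as expected
sample = "<html><body><li class='result'><span class='price'>$300</span></li></body></html>"
soup = BeautifulSoup(sample, "html.parser")
price = soup.select_one("li.result span.price").text
print("Parsed test price:", price)  # → Parsed test price: $300
```

If this fails, fix the environment first; every later step depends on it.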
Before sending any request, the search URL defines what data will be collected. When learning how to scrape Craigslist, this step sets the scope of the dataset: Craigslist uses city subdomains and category paths that determine which listings appear.
```python
base_url = "https://newyork.craigslist.org/search/sss"
```

Here, newyork sets the city and sss refers to the for-sale section.
You can filter results before scraping to reduce noise.
```python
params = {
    "query": "laptop",
    "min_price": 300,
    "max_price": 800,
    "hasPic": 1,  # hasPic=1 filters out listings without photos
}
```

This small filter improves data quality and keeps requests focused on complete listings.
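When these parameters are passed to requests via `params=`, they are appended to the base URL as a query string. A short sketch of the URL that results, using the standard library:

```python
from urllib.parse import urlencode

# Same values as above; requests builds an equivalent query string from params=
base_url = "https://newyork.craigslist.org/search/sss"
params = {"query": "laptop", "min_price": 300, "max_price": 800, "hasPic": 1}

full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
# → https://newyork.craigslist.org/search/sss?query=laptop&min_price=300&max_price=800&hasPic=1
```

Printing the effective URL this way is a quick check that the scope of a collection run matches what you intended.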
HTTP requests determine whether pages load consistently or start failing after repeated access.
Creating a session makes network behavior more predictable and prevents repeated connection setup. Retries handle temporary errors without stopping the scraper.
```python
import requests
from requests.adapters import HTTPAdapter, Retry

session = requests.Session()
retries = Retry(
    total=3,              # retry up to three times
    backoff_factor=1.5,   # waits grow to about 1.5s, 3s, 6s
    status_forcelist=[429, 500, 502, 503, 504],
)
session.mount("https://", HTTPAdapter(max_retries=retries))

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

response = session.get(base_url, headers=headers, params=params, timeout=10)
```

A modest retry delay keeps the scraper polite and reduces the risk of immediate repeated failures.
Even a successful status code does not always mean the response contains HTML content. Add checks to avoid silent failures.
```python
content_type = response.headers.get("Content-Type", "")
if response.status_code != 200:
    print("Request failed with status:", response.status_code)
elif not response.text:
    print("Empty response body")
elif "html" not in content_type:
    print("Unexpected content type:", content_type)
```

This creates the first layer of stability in your workflow.
Once a response is received, parsing extracts the data you need.
Craigslist pages repeat similar structures for each listing. Selecting by structure rather than text keeps your scraper more robust.
```python
from bs4 import BeautifulSoup

def parse_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    # Cover both the legacy and current Craigslist result markup
    listings = soup.select("li.result-row, li.cl-search-result")
    rows = []
    for item in listings:
        title_tag = item.select_one("a.result-title, a.cl-app-anchor")
        price_tag = item.select_one("span.result-price, span.price")
        post_id = item.get("data-pid")
        location_tag = item.select_one("span.result-hood")
        rows.append({
            "post_id": post_id,
            "title": title_tag.text.strip() if title_tag else None,
            "price": price_tag.text.strip() if price_tag else None,
            "url": title_tag["href"] if title_tag and "href" in title_tag.attrs else None,
            "location": location_tag.text.strip(" ()") if location_tag else None,
        })
    return rows

results = parse_listings(response.text)
all_results = list(results)  # seed result set for later steps
```

If the list is empty, check whether the request succeeded or the page layout changed. This produces clean structured rows that can be reused in later steps.
Search result pages contain summaries only. To collect full descriptions or posting times, open each detail page.
```python
import random
import time

# Visit a limited number of listings to demonstrate detail collection
for i, listing in enumerate(all_results[:10]):  # limit to the first ten
    if not listing.get("url"):
        continue
    try:
        print(f"Fetching detail page {i + 1}: {listing['url']}")
        detail_response = session.get(listing["url"], headers=headers, timeout=10)
        detail_response.raise_for_status()
        detail_soup = BeautifulSoup(detail_response.text, "html.parser")
        description = detail_soup.select_one("#postingbody")
        posted_elem = detail_soup.select_one("time.date.timeago")
        listing["description"] = description.get_text(strip=True, separator=" ") if description else None
        listing["posted_date"] = posted_elem["datetime"] if posted_elem and posted_elem.has_attr("datetime") else None
        time.sleep(random.uniform(1.0, 2.0))
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {listing['url']}: {e}")
        listing["description"] = None
        listing["posted_date"] = None
```

This loop enriches multiple records with detail information while keeping requests controlled and polite.
Pagination enlarges your dataset but also increases the chance of unstable requests.
Craigslist uses an offset parameter to move through pages. Add short, random pauses between requests to reduce access frequency.
```python
import random
import time

page_size = 120
offset = page_size  # the first page was already collected above

while True:
    params["s"] = offset
    page = session.get(base_url, headers=headers, params=params, timeout=10)
    page.raise_for_status()
    page_rows = parse_listings(page.text)
    if not page_rows:
        print("No more listings found.")
        break
    all_results.extend(page_rows)
    offset += page_size
    print(f"Collected {len(all_results)} total listings so far.")
    time.sleep(random.uniform(1.5, 3.5))
```

This pagination loop continuously expands all_results and ensures each request is spaced out to prevent rate limits or temporary blocks.
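Because listings shift between result pages as new posts appear, the same listing can be collected more than once during pagination. A small deduplication pass over the post_id field parsed earlier keeps one row per listing; the sample rows below are hypothetical:

```python
def deduplicate(rows):
    """Keep the first row seen for each post_id; rows without an id pass through."""
    seen = set()
    unique = []
    for row in rows:
        pid = row.get("post_id")
        if pid is not None:
            if pid in seen:
                continue
            seen.add(pid)
        unique.append(row)
    return unique

# Hypothetical rows showing a repeated post_id across pages
sample = [
    {"post_id": "100", "title": "laptop"},
    {"post_id": "100", "title": "laptop (seen again on the next page)"},
    {"post_id": "101", "title": "monitor"},
]
print(len(deduplicate(sample)))  # → 2
```

Running `all_results = deduplicate(all_results)` after the pagination loop keeps the dataset clean before storage.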
Collected data needs to be stored in formats that are reusable for later analysis.
Before writing, check whether the dataset is empty to avoid errors.
```python
import csv

if all_results:
    with open("craigslist_results.csv", "w", newline="", encoding="utf-8") as f:
        fieldnames = ["post_id", "title", "price", "url", "location", "description", "posted_date"]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(all_results)
    print("Saved results to craigslist_results.csv")
else:
    print("No data to save.")
```

JSON preserves structure and works well for programmatic use.
```python
import json

if all_results:
    with open("craigslist_results.json", "w", encoding="utf-8") as f:
        json.dump(all_results, f, ensure_ascii=False, indent=2)
    print("Saved results to craigslist_results.json")
```

Clean storage keeps your dataset reusable without running the scraper again.
👉 Final check
- Small mistakes are handled without breaking the workflow
- Each step can be run and verified independently
- Failures point back to URL construction, request behavior, or parsing logic
- The process remains usable as data volume grows
When web scraping Craigslist becomes unreliable, the issue is rarely parsing logic. Incomplete pages, failed pagination, or inconsistent results usually appear after repeated requests and longer runtimes. These signals point to network-level limits rather than errors in how data is extracted.
Basic request discipline helps early on. Predictable pacing, short delays during pagination, retrying failures with backoff, and collecting only publicly visible listing data reduce risk in small runs. As Craigslist scraping becomes sustained, however, stability turns into a network problem. Concentrating all requests on a single IP increases identity risk and shortens run time. This is where a dedicated Craigslist proxy service becomes necessary. IPcook addresses this layer by distributing request identity across residential IPs, keeping long-running collection more consistent without changing scraper logic.
Why IPcook fits large scale Craigslist scraping
- Residential IP rotation that reduces identity concentration
- Geographic targeting aligned with city based Craigslist subdomains
- Support for HTTP and SOCKS5 in common Python workflows
- More stable pagination and detail page access during long runs
If your scraper works briefly but degrades as volume grows, the limitation is likely request identity rather than code quality. IPcook makes it easier to validate stability early and scale Craigslist data collection with fewer interruptions.
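Routing an existing requests session through a proxy endpoint requires no changes to the parsing or pagination code. The sketch below uses requests' standard proxies mapping; the gateway host, port, and credentials are placeholders, not real provider values:

```python
import requests

# Placeholder endpoint and credentials; substitute the values from your provider
PROXY_URL = "http://USERNAME:PASSWORD@proxy-gateway.example.com:8000"

session = requests.Session()
session.proxies = {"http": PROXY_URL, "https": PROXY_URL}

# Every session.get(...) call now routes through the proxy endpoint
print(session.proxies["https"])
```

For SOCKS5, the same mapping works with a socks5:// scheme once the optional requests[socks] extra is installed. Because only the session changes, the scraper logic from earlier sections runs unmodified.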
Scraping Craigslist effectively requires more than working code. It depends on stable access over time. By structuring requests carefully, validating responses, and spacing pagination, you can maintain reliable data collection as volume grows.
If your scripts start failing during long runs, the issue usually lies in request identity rather than logic. With IPcook’s residential proxy service and geo-targeted sessions, large-scale Craigslist scraping becomes sustainable and efficient. IPcook also offers a 100 MB free residential proxy trial, making it easy to test scraping stability before scaling up. For larger workloads, residential proxy traffic is available from as low as $0.5 per GB, giving you full control over cost and performance.
Try IPcook today and start collecting Craigslist data more reliably.