
You are looking at a LinkedIn post with an active comment section. The discussion below the post contains questions, objections, and context that do not appear in engagement metrics. When feedback needs to be examined across multiple posts, copying comments by hand quickly becomes unmanageable. That is often when the question comes up: how can you scrape LinkedIn comments and turn them into structured data that can be analyzed?
This article demonstrates a runnable Python workflow for scraping LinkedIn comments from public posts and exporting them to CSV or JSON files. It examines how dynamic comment threads load, how comments and nested replies appear in the rendered page, and how extracted data can be structured for analysis. One constraint is worth stating early: as volume increases, access conditions and session behavior tend to limit collection before parsing logic does. The scope remains limited to publicly visible comment data, with implementation details discussed alongside the operational realities of scaling.
When you look at ways to collect comments from LinkedIn, they may seem similar at first. The differences start to show once you move beyond a few posts or repeat the process regularly.
Manual copying works when you only need to review a small number of posts and want to avoid any access risk.
Browser tools and extensions reduce setup time but depend on fixed page structures and offer little control when the page behavior changes.
Python automation uses libraries like Selenium to control a browser, load dynamic comment threads, and extract both comments and replies into structured formats. A LinkedIn post comment scraper built this way holds up better as collection needs grow.
Third-party data providers deliver collected LinkedIn data through APIs or datasets but reduce transparency into sourcing methods, data freshness, and completeness.
Once you need to collect comments from more than a handful of posts, browser automation usually gives you the most control over structure and repeatability. That is why the rest of this guide focuses on scraping automation with Python. If you have not yet built a stable LinkedIn scraping setup, start with how to scrape LinkedIn with Python to understand session handling and setup before moving to comment collection.
When you open a LinkedIn post, the comment section does not load as a complete dataset. What appears on screen is assembled through dynamic content rendering, a pattern common when scraping dynamic content from modern websites. Because comments are delivered after the initial page render, the HTML available at load time rarely contains the full discussion even though comments are visible in the browser.
Comments load gradually through JavaScript and depend on page behavior. A small set appears with the initial render, while additional comments only load after scrolling or other user interaction. The trigger for loading more comments is not fixed. It may respond to scrolling in one case and require explicit interaction in another, which explains why scripts based on a single request or static assumptions often stop early.
LinkedIn comment structure adds further complexity. Top level comments and replies do not share the same position in the page. Replies are nested under parent comments and often load only after a thread expands. These loading patterns account for common collection issues where comments appear on screen but extracted results are incomplete, missing replies, or empty due to page state rather than parsing logic.
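Preserving the parent-reply relationship is easier when each extracted record carries an explicit flag and a parent reference. A minimal sketch of that idea, using a hypothetical nested dict per thread rather than LinkedIn's actual markup:

```python
def flatten_thread(comment, parent_author=None):
    """Flatten one nested comment thread into flat records.

    `comment` is a hypothetical structure for illustration:
    {"author": ..., "text": ..., "replies": [...]}
    """
    records = [{
        "author": comment["author"],
        "text": comment["text"],
        "is_reply": parent_author is not None,
        "parent_author": parent_author,
    }]
    # Recurse into replies, tagging each with its parent's author
    for reply in comment.get("replies", []):
        records.extend(flatten_thread(reply, parent_author=comment["author"]))
    return records

thread = {
    "author": "Alice",
    "text": "Great post",
    "replies": [{"author": "Bob", "text": "Agreed", "replies": []}],
}
rows = flatten_thread(thread)
# rows[0] is Alice's top-level comment; rows[1] is Bob's reply with parent_author "Alice"
```

Flat records with a parent reference export cleanly to CSV while still allowing the thread to be reconstructed later.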
This section presents a complete Python workflow for collecting comments from public LinkedIn posts. Each step is intentionally narrow.
Use a recent Python 3 release and a browser automation library. Minor version mismatches can prevent the browser from launching before any page interaction occurs.
Install Selenium and confirm that a compatible browser driver is available.
```shell
pip install selenium
```

Run a minimal check that starts and exits a browser session cleanly.

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com")
driver.quit()
```

If this fails, resolve the environment issue before proceeding.
LinkedIn credentials should not be embedded in the script. Repeated automated authentication attempts tend to trigger extra verification and reduce session reliability.
Open a browser session and authenticate manually.
```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com/login")
```

After login completes, keep the same browser session active. The remaining steps rely on this authenticated context.
Navigate directly to the post URL that contains the comment thread.
```python
post_url = "https://www.linkedin.com/posts/example-post-123"
driver.get(post_url)
```

The only requirement here is that the post renders and the comment section is visible on screen. If comments are not visible, later steps will not succeed.
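When post URLs come from a list or a spreadsheet, a quick sanity check catches malformed entries before the browser navigates. A small sketch; the accepted path prefixes are an assumption about typical public post URLs, not an exhaustive list:

```python
from urllib.parse import urlparse

def looks_like_linkedin_post(url):
    """Heuristic check that a URL points at a LinkedIn post page."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc.endswith("linkedin.com")
        and parsed.path.startswith(("/posts/", "/feed/update/"))
    )

is_post = looks_like_linkedin_post("https://www.linkedin.com/posts/example-post-123")
# is_post is True; a non-LinkedIn URL would return False
```

Skipping invalid URLs up front avoids burning session time on pages that cannot contain comments.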
This is the most fragile part of the workflow. LinkedIn does not expose all comments at once. Additional comments and replies appear after scrolling or interaction. A bounded scrolling loop can trigger loading while avoiding endless attempts.
```python
import time

last_height = driver.execute_script("return document.body.scrollHeight")
attempts = 0
max_attempts = 15

while attempts < max_attempts:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # No new content loaded; count this as a stalled attempt
        attempts += 1
    else:
        # Page grew, so reset the stall counter and keep scrolling
        last_height = new_height
        attempts = 0
```

This approach triggers loading behavior. Full coverage is not guaranteed. As collection volume increases, instability usually appears here first.
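The same stop logic can be factored into a helper that takes the page interactions as callables, which makes the stall-counting behavior testable without a live browser. A sketch, with a fake page standing in for the driver:

```python
def scroll_until_stable(get_height, scroll, wait, max_stalls=15):
    """Scroll until the page height stops changing for max_stalls rounds."""
    last_height = get_height()
    stalls = 0
    rounds = 0
    while stalls < max_stalls:
        scroll()
        wait()
        new_height = get_height()
        if new_height == last_height:
            stalls += 1          # no growth: count a stalled round
        else:
            last_height = new_height
            stalls = 0           # page grew: reset the stall counter
        rounds += 1
    return rounds

class FakePage:
    """Simulates a page that loads two more batches, then stops growing."""
    def __init__(self):
        self.heights = [100, 200, 300]
        self.index = 0

    def get_height(self):
        return self.heights[min(self.index, len(self.heights) - 1)]

    def scroll(self):
        self.index += 1

page = FakePage()
rounds = scroll_until_stable(page.get_height, page.scroll,
                             wait=lambda: None, max_stalls=3)
# Two growth rounds plus three stalled rounds: rounds == 5
```

With a live session, `get_height` and `scroll` would wrap the `driver.execute_script` calls shown above and `wait` would call `time.sleep`.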
After comments are visible in the rendered page, extraction can begin. Main comments and replies should be treated as separate elements to preserve their relationship.
```python
from selenium.webdriver.common.by import By

# These selectors reflect LinkedIn's rendered markup at the time of writing
# and may change without notice
comment_elements = driver.find_elements(By.CSS_SELECTOR, "div.comments-comment-item")

comments_data = []
for element in comment_elements:
    try:
        author = element.find_element(
            By.CSS_SELECTOR, "a.comments-post-meta__actor-link").text
        text = element.find_element(By.CSS_SELECTOR, "span[dir='ltr']").text
        comments_data.append({
            "author": author,
            "text": text,
            "timestamp": "",   # populate if a timestamp element is present
            "likes": 0,        # populate if a reaction count is present
            "is_reply": "reply" in (element.get_attribute("class") or "").lower()
        })
    except Exception:
        # Skip elements that do not match the expected structure
        continue
```

The priority is consistency. A small, repeatable schema holds up better than exhaustive field coverage.
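Dynamic loading can surface the same comment more than once across scroll rounds, so deduplicating before export is a cheap safeguard. A sketch keyed on author plus text:

```python
def dedupe_comments(records):
    """Remove duplicate comment records while preserving first-seen order."""
    seen = set()
    unique = []
    for record in records:
        key = (record["author"], record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

sample = [
    {"author": "Alice", "text": "Great post"},
    {"author": "Alice", "text": "Great post"},  # duplicate from re-rendered DOM
    {"author": "Bob", "text": "Agreed"},
]
unique = dedupe_comments(sample)
# unique keeps two records: Alice once, then Bob
```

Keying on author and text will also collapse genuinely identical comments from the same person; adding a timestamp to the key avoids that once the field is populated.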
Persist the collected records so the browser session does not need to be repeated.
CSV output
```python
import csv

if comments_data:
    with open("linkedin_comments.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=comments_data[0].keys())
        writer.writeheader()
        writer.writerows(comments_data)
```

JSON output
```python
import json

with open("linkedin_comments.json", "w", encoding="utf-8") as f:
    json.dump(comments_data, f, ensure_ascii=False, indent=2)
```

Visible LinkedIn comments have now been converted into structured data. Running this once and running it repeatedly are different conditions. As scope grows, access behavior and session stability tend to matter more than parsing logic.
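A quick way to confirm an export is usable downstream is to read it back and check that every record carries the expected fields. A sketch for the JSON file, run here against sample data:

```python
import json

def verify_json_export(path, required_fields=("author", "text", "is_reply")):
    """Load an exported JSON file and confirm every record has the schema."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return bool(records) and all(
        all(field in record for field in required_fields) for record in records
    )

# Round-trip with sample data standing in for real scraped records
sample = [{"author": "Alice", "text": "Great post", "is_reply": False}]
with open("linkedin_comments.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)
```

Running this check at the end of each batch catches empty or partially written files before they reach analysis.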
Comment scraping often appears stable during early tests. Pages load normally, comment threads expand, and extracted data matches what is visible in the browser. As the same workflow runs repeatedly or expands across more posts, results begin to degrade even though the scraping logic itself remains unchanged.
Common failure signals follow a predictable pattern:
Comment threads stop loading before completion
Pages return partial or empty data while comments remain visible on screen
Replies disappear while parent comments remain
Sessions expire more frequently or access becomes restricted
These issues are rarely caused by selectors or parsing logic. The underlying cause is access behavior: repeated requests accumulate recognizable patterns over time, and stability depends on IP distribution, session consistency, and controlled request pacing. Under these conditions, residential proxies can help by spreading access across real user IPs, so repeated runs are less likely to accumulate as a single recognizable pattern.
Stability improves when access behavior remains predictable across repeated runs.
Key controls include:
Regulating request pace to avoid burst patterns
Limiting each run instead of collecting entire threads at once
Keeping browser sessions consistent across interactions
Avoiding identity changes within the same collection cycle
Reducing IP reuse across repeated access
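Of the controls above, request pacing is the easiest to implement in code: randomized delays avoid the fixed cadence that a constant sleep produces. A minimal sketch; the bounds are illustrative, not a recommendation tuned for LinkedIn specifically:

```python
import random
import time

def paced_wait(base_seconds=3.0, jitter_seconds=2.0):
    """Sleep for a randomized interval to avoid a fixed request cadence."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    time.sleep(delay)
    return delay

# Tiny values here just to demonstrate the bounds
delay = paced_wait(base_seconds=0.01, jitter_seconds=0.01)
# delay always falls between base and base + jitter
```

Replacing the fixed `time.sleep(3)` in the scrolling loop with a call like this keeps the overall pace similar while removing the uniform rhythm.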
For longer or batched collection, access infrastructure becomes part of the workflow. Residential IP distribution combined with session persistence and geographic diversity helps maintain consistent access as patterns accumulate. IPcook supports this approach through residential IP rotation with session continuity, allowing repeated comment collection without altering existing scraping logic.
Key IPcook capabilities supporting stable LinkedIn comment scraping include:
Pricing starts at $3.2 per GB and drops to $0.5 per GB at scale
55M+ real residential IPs with rotation at request or time level
Sticky sessions configurable up to 24 hours to preserve thread continuity
City- and country-level geo targeting for localized access behavior
Non-expiring traffic suitable for repeated or extended collection cycles
HTTP(S) and SOCKS5 support for Python-based scraping tools
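For Python tooling, a proxy is typically supplied as a URL carrying scheme, credentials, host, and port. A sketch that assembles one; the host, port, and credentials below are placeholders, not real provider endpoints:

```python
def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Assemble a proxy URL in the form scheme://user:pass@host:port."""
    if username and password:
        return f"{scheme}://{username}:{password}@{host}:{port}"
    return f"{scheme}://{host}:{port}"

proxy = build_proxy_url("proxy.example.com", 8000, "user", "pass")
proxies = {"http": proxy, "https": proxy}  # shape expected by requests
```

The same URL string can be passed to a headless browser launch flag or a `requests` session, so proxy configuration stays separate from scraping logic.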
You can start with 100 MB of IPcook's free residential proxy traffic right now to verify access consistency, response completeness, and comment thread coverage under real LinkedIn scraping conditions before scaling further.
Scraping LinkedIn comments reliably depends as much on access conditions as on code logic. This guide shows how to scrape LinkedIn comments from public posts using Python while managing dynamic loading, nested replies, and structured exports. The process works consistently on a small scale, but as runs grow longer or repeat across posts, stability begins to hinge on how sessions persist and traffic behaves over time.
For teams that monitor discussions continuously or collect at scale, the access environment matters more than code changes. Using realistic request pacing and rotating IPs keeps comment loading stable across sessions. IPcook provides affordable and reliable residential proxies that mimic real user browsing, support large-scale monitoring, and help maintain consistent results across repeated runs.