How to Scrape LinkedIn Comments (Tools, Code & Ethics)

Junaid Khalid

16 min read

You're launching a product, analyzing competitors, or researching your target audience's pain points. LinkedIn comments are a goldmine of unfiltered feedback, questions, and objections.

But manually copying hundreds of comments is tedious. You need the data systematically extracted, organized, and ready for analysis.

This is where LinkedIn comment scraping comes in. But before you start extracting data, you need to understand three critical dimensions: what's technically possible, what's legally permissible, and what's ethically sound.

This guide covers all three. You'll learn the tools, the code, and the boundaries. By the end, you'll know how to extract LinkedIn comment data responsibly (or why you might choose alternative approaches entirely).

What is LinkedIn Comment Scraping?

Comment scraping means programmatically extracting comments from LinkedIn posts and converting them into structured data you can analyze.
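In practice, "structured" means each comment becomes a record with consistent fields. A hypothetical example of what a single scraped comment might look like once extracted:

comment_record = {
    'author': 'Jane Doe',                          # commenter's display name
    'comment': 'Does this integrate with Slack?',  # the comment text itself
    'timestamp': '2d',                             # LinkedIn's relative timestamp
    'likes': 12                                    # reaction count, if visible
}

These are the same fields the Python tutorial later in this guide extracts.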

Why scrape LinkedIn comments?

Research and competitive analysis: Understand what questions prospects ask on competitor posts. Identify unmet needs. Discover language patterns your target audience uses.

Example: A SaaS founder scrapes comments on competitors' feature announcement posts to discover what users are requesting, complaining about, or praising.

Sentiment analysis on engagement: Track sentiment trends across your industry. Are people excited about new regulations or concerned? Which topics generate positive vs. negative reactions?

Example: A marketing agency scrapes comments on industry news posts to gauge professional sentiment about remote work policies, informing their content strategy.

Lead generation and prospecting: Identify engaged commenters on strategic posts. Someone who writes thoughtful comments on industry content is an active LinkedIn user likely to respond to connection requests.

Example: A business consultant scrapes comments from top leadership coaches' posts, then reaches out to engaged commenters with personalized connection requests.

Influencer identification: Find active participants in your niche. Who comments frequently? Whose comments get the most replies? These are micro-influencers worth building relationships with.

Content strategy insights: Analyze which comment types get the most engagement. Do questions outperform statements? Do people prefer data-driven comments or personal stories?


Is It Legal to Scrape LinkedIn Comments?

The legality of web scraping is a gray area. The answer depends on how you scrape, what you scrape, and what you do with the data.

LinkedIn's Terms of Service on scraping: The User Agreement (Section 8.2) explicitly prohibits automated access:

"You agree that you will not... use bots or other automated methods to access the Services, add or download contacts, send or redirect messages, or perform other activities through the Services."

This is unambiguous. LinkedIn's official position is: don't scrape our platform.

Notable legal cases (hiQ Labs v. LinkedIn):

The most important scraping precedent came from hiQ Labs v. LinkedIn (2019-2022). Key developments:

  • hiQ's position: They argued that scraping publicly visible data (including comments) should be legal, as it's already publicly accessible
  • LinkedIn's position: Even public data is protected by the Computer Fraud and Abuse Act (CFAA) when accessed via scraping
  • Initial ruling (2019): The Ninth Circuit sided with hiQ, holding that scraping publicly accessible data doesn't violate the CFAA
  • Final outcome (2022): After the Supreme Court vacated and remanded the case in 2021, the Ninth Circuit reaffirmed its ruling; the parties ultimately settled, leaving the legal question partially unresolved

Current legal consensus (as of 2025):

  • Scraping public data is in a legal gray area, not clearly illegal but not clearly legal
  • Scraping data behind login walls is riskier legally
  • The issue is more about LinkedIn's Terms of Service (a contract violation) than criminal law
  • LinkedIn actively pursues legal action against commercial scraping operations

Ethical scraping practices:

Even if something is technically legal or in a gray area, ethical considerations matter:

  1. Respect user privacy: Comments are public, but users didn't consent to mass data extraction. Avoid exposing personally identifiable information (PII) in research outputs.

  2. Don't sell or misuse scraped data: Extracting comments for market research differs from selling email lists scraped from profiles.

  3. Respect rate limits: Excessive scraping burdens LinkedIn's servers. Implement delays between requests.

  4. Consider alternatives first: Does the LinkedIn API provide what you need? Can you collect data manually for smaller projects? Scraping should be a last resort, not the first approach.

  5. Be transparent: If publishing research based on scraped LinkedIn data, disclose your methodology.

When to use the official LinkedIn API instead:

LinkedIn provides APIs for legitimate business use cases:

  • LinkedIn Marketing API: For advertising and sponsored content analytics
  • LinkedIn Consumer API: For authorized integrations (limited access)
  • LinkedIn Sales Navigator: For lead generation within their ecosystem

API limitations for comment data:

  • Most LinkedIn APIs don't provide broad comment access
  • Comment data access requires special partnerships
  • Rate limits are strict
  • Approval process is lengthy and selective

For most use cases (competitive research, sentiment analysis), LinkedIn won't grant API access. This forces a choice between manual collection and scraping.

Our recommendation: For commercial projects or ongoing data collection, the legal and reputational risks of scraping may outweigh benefits. For one-time research projects with small datasets, scraping public comments carries lower risk but isn't risk-free.


Methods to Scrape LinkedIn Comments

Four approaches exist, ranging from completely manual to fully automated.

Method 1: Manual Copy-Paste (Small Scale)

What it involves: Manually copying comments from posts and pasting them into a spreadsheet.

When to use it:

  • Analyzing 50-100 comments from a handful of posts
  • One-time research projects
  • When you want complete safety from ToS violations
  • Academic research requiring careful context preservation

Pros:

  • Zero technical skills required
  • No scraping software needed
  • Complete control over what data you collect
  • No risk of account restrictions

Cons:

  • Extremely time-consuming
  • Doesn't scale beyond small projects
  • Manual errors likely
  • Difficult to maintain consistency

Process:

  1. Open target LinkedIn post
  2. Scroll to load all comments (click "Load more comments")
  3. Copy each comment into spreadsheet columns: commenter name, comment text, timestamp, likes
  4. Repeat for each post

Time estimate: 3-5 minutes per post (10-15 comments per post)


Method 2: Browser Extensions & Tools

What it involves: Using third-party browser extensions or desktop applications designed for LinkedIn scraping.

Popular tools:

  • Phantombuster (cloud-based automation)
  • Octoparse (no-code scraping)
  • Browser extensions like "LinkedIn Comment Extractor"

When to use it:

  • Moderate volume (100-500 comments)
  • Non-technical users
  • When you need occasional scraping capability
  • Situations where paying for a tool is acceptable

Pros:

  • No coding required
  • Point-and-click interface
  • Handles authentication
  • Often includes CSV export

Cons:

  • Costs $50-150/month for robust tools
  • Still violates LinkedIn ToS
  • Limited customization
  • May stop working if LinkedIn changes their site structure

Method 3: Python + Selenium/BeautifulSoup

What it involves: Writing custom Python scripts using libraries like Selenium (browser automation) or BeautifulSoup (HTML parsing).

When to use it:

  • Large-scale projects (1,000+ comments)
  • Need for customization
  • Technical users comfortable with code
  • Ongoing data collection needs

Pros:

  • Complete control over scraping logic
  • Free (just your time investment)
  • Can integrate with data analysis pipelines
  • Customizable for specific use cases

Cons:

  • Requires programming knowledge
  • Time-consuming to set up initially
  • Breaks when LinkedIn changes their HTML structure
  • Still violates LinkedIn ToS
  • Risk of IP bans if not careful

We'll cover this method in detail in the next section.


Method 4: Third-Party Scraping Services

What it involves: Hiring companies that provide "LinkedIn data as a service" or freelancers who scrape on your behalf.

When to use it:

  • You need large datasets but lack technical skills
  • One-time project where outsourcing makes sense
  • Want to distance your personal LinkedIn account from scraping

Pros:

  • No technical work required
  • Vendors handle infrastructure
  • Typically deliver cleaned, structured data

Cons:

  • Expensive ($500-5,000+ depending on scale)
  • Quality varies significantly
  • You're still responsible for how you use the data
  • Legal liability unclear
  • No guarantee of data freshness or accuracy

Comparison table:

| Method | Cost | Skill Level | Scale | Risk Level | Time Investment |
|--------|------|-------------|-------|------------|-----------------|
| Manual Copy-Paste | Free | None | 50-100 comments | Very Low | Very High |
| Browser Tools | $50-150/mo | Low | 100-500 comments | Medium | Low |
| Python Scripts | Free | High | 1,000+ comments | Medium-High | High (setup), Low (ongoing) |
| Scraping Services | $500-5,000 | None | Unlimited | High | Very Low |

How to Scrape LinkedIn Comments with Python (Tutorial)

This section provides a technical walkthrough for developers comfortable with Python.

Important disclaimer: This tutorial is for educational purposes. Scraping LinkedIn violates their Terms of Service. Use this knowledge responsibly and understand the risks.

Prerequisites: Installing Selenium & Chrome Driver

What you'll need:

  • Python 3.8 or higher
  • Chrome browser
  • Basic familiarity with Python and command line

Installation steps:

# Install required Python packages
pip install selenium beautifulsoup4 pandas

# Install webdriver-manager for automatic ChromeDriver management
pip install webdriver-manager

Why Selenium? LinkedIn's content is dynamically loaded via JavaScript. Libraries that only fetch and parse static HTML (like requests with BeautifulSoup) can't see that content. Selenium automates a real browser, so JavaScript-rendered content is accessible.


Step 1: Authenticate with LinkedIn

Selenium needs to authenticate as you to access LinkedIn content.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

# Set up Chrome driver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# Navigate to LinkedIn login
driver.get('https://www.linkedin.com/login')

# Wait for manual login
input("Log in to LinkedIn in the browser window, then press Enter here...")

print("Logged in successfully!")

Important: This script requires you to manually log in. Storing credentials in code is insecure and may trigger LinkedIn's security alerts.
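One workaround, if you run the script repeatedly, is to save your session cookies after the first manual login and reload them on later runs. A minimal sketch, with the cookie file name as an assumption:

import pickle

# After a successful manual login, save the session cookies
def save_cookies(driver, path='linkedin_cookies.pkl'):
    with open(path, 'wb') as f:
        pickle.dump(driver.get_cookies(), f)

# On later runs, load cookies instead of logging in again
def load_cookies(driver, path='linkedin_cookies.pkl'):
    driver.get('https://www.linkedin.com')  # must be on the domain before adding cookies
    with open(path, 'rb') as f:
        for cookie in pickle.load(f):
            driver.add_cookie(cookie)
    driver.refresh()

Note that saved sessions expire, so you'll still need to log in manually from time to time.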


Step 2: Navigate to Target Post

# URL of the LinkedIn post you want to scrape comments from
post_url = "https://www.linkedin.com/posts/username_activity-123456789"

driver.get(post_url)
time.sleep(3)  # Wait for page to load

Step 3: Scroll to Load All Comments

LinkedIn uses "lazy loading" - comments appear as you scroll. You need to scroll repeatedly to load all comments.

from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException

def scroll_to_load_comments(driver, max_scrolls=20):
    """Scroll down repeatedly to load all comments"""

    body = driver.find_element(By.TAG_NAME, 'body')

    for i in range(max_scrolls):
        # Scroll down
        body.send_keys(Keys.PAGE_DOWN)
        time.sleep(2)  # Wait for content to load

        # Try to click "Show more comments" button if it exists
        try:
            show_more_button = driver.find_element(
                By.XPATH,
                "//button[contains(@class, 'comments-comments-list__show-all-link')]"
            )
            show_more_button.click()
            time.sleep(2)
        except (NoSuchElementException, ElementClickInterceptedException):
            pass  # Button doesn't exist, isn't clickable, or was already clicked

    print(f"Finished scrolling after {max_scrolls} scrolls")

# Execute scrolling
scroll_to_load_comments(driver)

Note: max_scrolls=20 limits scrolling to prevent infinite loops. Adjust based on expected comment volume.
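If you'd rather not guess at max_scrolls, a variant is to stop once the number of loaded comments stops growing. This sketch assumes the same comment container class used in Step 4:

def scroll_until_stable(driver, max_scrolls=50, patience=3):
    """Scroll until the number of loaded comments stops increasing"""
    body = driver.find_element(By.TAG_NAME, 'body')
    last_count, stable_rounds = 0, 0

    for _ in range(max_scrolls):
        body.send_keys(Keys.PAGE_DOWN)
        time.sleep(2)

        # Count currently loaded comment containers (selector is an assumption)
        count = len(driver.find_elements(By.CSS_SELECTOR, 'article.comments-comment-item'))
        if count == last_count:
            stable_rounds += 1
            if stable_rounds >= patience:
                break  # Nothing new loaded for several rounds
        else:
            stable_rounds = 0
            last_count = count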


Step 4: Extract Comment Text, Author, Timestamp

from bs4 import BeautifulSoup

def extract_comments(driver):
    """Extract all comment data from the loaded page"""

    # Get page HTML
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    # Find all comment containers
    # Note: LinkedIn's HTML structure changes frequently
    # These selectors may need updating
    comment_elements = soup.find_all('article', class_='comments-comment-item')

    comments_data = []

    for comment in comment_elements:
        try:
            # Extract author name
            author = comment.find('span', class_='comments-post-meta__name-text').get_text(strip=True)

            # Extract comment text
            comment_text = comment.find('span', class_='comments-comment-item__main-content').get_text(strip=True)

            # Extract timestamp
            timestamp = comment.find('time', class_='comments-comment-item__timestamp')
            timestamp_text = timestamp.get_text(strip=True) if timestamp else "N/A"

            # Extract like count (if available)
            like_element = comment.find('button', {'aria-label': lambda x: x and 'Like' in x})
            likes = 0
            if like_element:
                like_text = like_element.get_text(strip=True)
                # Parse like count from text like "5 Likes"
                likes = int(''.join(filter(str.isdigit, like_text))) if any(char.isdigit() for char in like_text) else 0

            comments_data.append({
                'author': author,
                'comment': comment_text,
                'timestamp': timestamp_text,
                'likes': likes
            })

        except Exception as e:
            print(f"Error parsing comment: {e}")
            continue

    return comments_data

# Extract all comments
comments = extract_comments(driver)
print(f"Extracted {len(comments)} comments")

Important: LinkedIn frequently changes their HTML class names and structure to discourage scraping. This code will likely need adjustments when LinkedIn updates their interface.
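One way to soften that breakage is to try several candidate selectors and use the first one that matches. The class names below are illustrative guesses, not guaranteed to match LinkedIn's current markup:

def find_first_text(element, candidates):
    """Return text from the first matching (tag, class) candidate, else None"""
    for tag, class_name in candidates:
        match = element.find(tag, class_=class_name)
        if match:
            return match.get_text(strip=True)
    return None

# Example usage inside the parsing loop, with a hypothetical fallback class
author = find_first_text(comment, [
    ('span', 'comments-post-meta__name-text'),
    ('span', 'comments-post-meta__name'),  # hypothetical older variant
])

When LinkedIn ships a redesign, you add the new class name to the candidate list instead of rewriting the parser.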


Step 5: Export to CSV/JSON

import pandas as pd
import json

def export_comments(comments_data, format='csv'):
    """Export comments to CSV or JSON"""

    if format == 'csv':
        df = pd.DataFrame(comments_data)
        df.to_csv('linkedin_comments.csv', index=False)
        print("Comments exported to linkedin_comments.csv")

    elif format == 'json':
        with open('linkedin_comments.json', 'w', encoding='utf-8') as f:
            json.dump(comments_data, f, indent=2, ensure_ascii=False)
        print("Comments exported to linkedin_comments.json")

# Export in both formats
export_comments(comments, format='csv')
export_comments(comments, format='json')

# Close the browser
driver.quit()

Full Python Code Example

Here's the complete script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from bs4 import BeautifulSoup
import pandas as pd
import time

def setup_driver():
    """Initialize Chrome driver"""
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    return driver

def login_to_linkedin(driver):
    """Navigate to LinkedIn and wait for manual login"""
    driver.get('https://www.linkedin.com/login')
    input("Log in to LinkedIn, then press Enter...")

def scroll_to_load_comments(driver, max_scrolls=20):
    """Scroll to load all comments"""
    body = driver.find_element(By.TAG_NAME, 'body')

    for i in range(max_scrolls):
        body.send_keys(Keys.PAGE_DOWN)
        time.sleep(2)

        try:
            show_more = driver.find_element(
                By.XPATH,
                "//button[contains(text(), 'more comment')]"
            )
            show_more.click()
            time.sleep(2)
        except (NoSuchElementException, ElementClickInterceptedException):
            pass  # Button not present or not clickable

def extract_comments(driver):
    """Extract comment data"""
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    comments = []
    comment_elements = soup.find_all('article', class_='comments-comment-item')

    for comment in comment_elements:
        try:
            author = comment.find('span', class_='comments-post-meta__name-text').get_text(strip=True)
            text = comment.find('span', class_='comments-comment-item__main-content').get_text(strip=True)
            timestamp = comment.find('time')
            timestamp_text = timestamp.get_text(strip=True) if timestamp else "N/A"

            # Extract like count (the analysis section below expects a 'likes' column)
            like_element = comment.find('button', {'aria-label': lambda x: x and 'Like' in x})
            like_text = like_element.get_text(strip=True) if like_element else ''
            likes = int(''.join(filter(str.isdigit, like_text))) if any(c.isdigit() for c in like_text) else 0

            comments.append({
                'author': author,
                'comment': text,
                'timestamp': timestamp_text,
                'likes': likes
            })
        except AttributeError:
            continue  # Skip comments missing expected fields

    return comments

def main():
    """Main scraping workflow"""
    # Setup
    driver = setup_driver()
    login_to_linkedin(driver)

    # Navigate to post
    post_url = input("Enter LinkedIn post URL: ")
    driver.get(post_url)
    time.sleep(3)

    # Load and extract comments
    print("Loading comments...")
    scroll_to_load_comments(driver)

    print("Extracting comments...")
    comments = extract_comments(driver)

    # Export
    df = pd.DataFrame(comments)
    df.to_csv('linkedin_comments.csv', index=False)
    print(f"Exported {len(comments)} comments to linkedin_comments.csv")

    driver.quit()

if __name__ == "__main__":
    main()

To run this script:

  1. Save as scrape_linkedin_comments.py
  2. Run: python scrape_linkedin_comments.py
  3. Log in when prompted
  4. Enter the LinkedIn post URL
  5. Wait for extraction to complete

Analyzing Scraped LinkedIn Comment Data

Once you have comment data, here's how to extract insights.

Sentiment Analysis with NLP Tools

# Requires: pip install textblob
from textblob import TextBlob
import pandas as pd

def analyze_sentiment(comment_text):
    """Analyze sentiment of a comment"""
    blob = TextBlob(comment_text)
    polarity = blob.sentiment.polarity  # -1 (negative) to 1 (positive)

    if polarity > 0.1:
        return 'Positive'
    elif polarity < -0.1:
        return 'Negative'
    else:
        return 'Neutral'

# Load your scraped comments
df = pd.read_csv('linkedin_comments.csv')

# Add sentiment column
df['sentiment'] = df['comment'].apply(analyze_sentiment)

# Analyze distribution
print(df['sentiment'].value_counts())

Example output:

Positive    45
Neutral     32
Negative    8

This reveals overall sentiment on a post or across multiple posts.
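For a quick visual, the distribution plots directly from pandas, assuming matplotlib is installed (pip install matplotlib):

import matplotlib.pyplot as plt

# Plot the sentiment distribution as a bar chart
df['sentiment'].value_counts().plot(kind='bar')
plt.title('Comment Sentiment Distribution')
plt.ylabel('Number of comments')
plt.tight_layout()
plt.savefig('sentiment_distribution.png')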


Identifying Top Commenters & Influencers

# Find most active commenters
top_commenters = df['author'].value_counts().head(10)
print("Most Active Commenters:")
print(top_commenters)

# Find comments with most engagement (likes)
top_comments = df.nlargest(10, 'likes')[['author', 'comment', 'likes']]
print("\nMost Liked Comments:")
print(top_comments)

This identifies micro-influencers worth connecting with or partnering with.
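To combine frequency and engagement into a single view, aggregate per author. The composite score below is an arbitrary illustration, not an established metric:

# Aggregate comment count and total likes per author
author_stats = df.groupby('author').agg(
    comment_count=('comment', 'count'),
    total_likes=('likes', 'sum')
)

# Illustrative composite score: activity plus engagement
author_stats['score'] = author_stats['comment_count'] + author_stats['total_likes']
print(author_stats.sort_values('score', ascending=False).head(10))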


Engagement Pattern Analysis

# Analyze comment length vs engagement
df['word_count'] = df['comment'].apply(lambda x: len(x.split()))

# Correlation between length and likes
correlation = df[['word_count', 'likes']].corr()
print("Correlation between comment length and likes:")
print(correlation)

# Average engagement by comment length bracket
df['length_category'] = pd.cut(df['word_count'], bins=[0, 20, 50, 100, 500], labels=['Short', 'Medium', 'Long', 'Very Long'])
engagement_by_length = df.groupby('length_category')['likes'].mean()
print("\nAverage likes by comment length:")
print(engagement_by_length)

This reveals whether your audience prefers concise comments or detailed responses.


Keyword & Topic Clustering

# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Extract key topics using TF-IDF
vectorizer = TfidfVectorizer(max_features=20, stop_words='english')
tfidf_matrix = vectorizer.fit_transform(df['comment'])

# Get top keywords
feature_names = vectorizer.get_feature_names_out()
print("Top Keywords in Comments:")
print(feature_names)

# Cluster comments into topics
kmeans = KMeans(n_clusters=5, random_state=42)
df['topic_cluster'] = kmeans.fit_predict(tfidf_matrix)

# Analyze clusters
for cluster_id in range(5):
    cluster_comments = df[df['topic_cluster'] == cluster_id]
    print(f"\nCluster {cluster_id} ({len(cluster_comments)} comments):")
    print(cluster_comments['comment'].head(3))

This automatically groups comments by theme, revealing what topics dominate the conversation.
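To put a rough label on each cluster, inspect the highest-weighted TF-IDF terms in its centroid:

# Print the top terms closest to each cluster centroid
terms = vectorizer.get_feature_names_out()
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    top_indices = centroid.argsort()[::-1][:5]
    print(f"Cluster {cluster_id}: {', '.join(terms[i] for i in top_indices)}")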


Alternatives to Scraping: LinkedIn API

Before scraping, explore official alternatives.

Official LinkedIn API capabilities:

  • LinkedIn Marketing API: Access campaign performance, sponsored content analytics
  • LinkedIn Consumer API: Limited access for member profile data (with user authorization)
  • LinkedIn Lead Gen Forms API: Access leads from LinkedIn ad campaigns

Limitations for comment data:

  • No public API for extracting comments from posts
  • Comment access requires special partnership agreements
  • Most API access is restricted to LinkedIn's advertising customers

When API access is required:

  • Building a product that integrates with LinkedIn
  • Running ongoing data collection for commercial purposes
  • Need for reliable, long-term data access

How to apply for API access:

  1. Create a LinkedIn App via LinkedIn Developers
  2. Submit detailed use case explanation
  3. Wait for approval (can take weeks to months)
  4. Most applications are rejected unless you're an established company with clear value proposition

Reality: For comment analysis, LinkedIn won't grant API access in most cases. This forces a choice among manual collection, scraping (with its risks), and alternative approaches.


Best Practices & Safety Tips

If you proceed with scraping despite the risks, follow these guidelines to minimize issues.

Use rate limiting to avoid detection:

import random
import time

def polite_scrape(urls):
    """Scrape with delays to avoid detection"""
    for url in urls:
        # Random delay between 3-7 seconds
        delay = random.uniform(3, 7)
        time.sleep(delay)

        # Your scraping logic here
        scrape_post(url)

LinkedIn monitors request frequency. Human-like delays (3-10 seconds between actions) reduce detection risk.
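If you do get rate limited, a simple exponential backoff keeps the script from hammering the site. This sketch reuses the scrape_post placeholder from the snippet above:

def scrape_with_backoff(url, max_retries=3):
    """Retry a failed scrape with exponentially growing waits"""
    for attempt in range(max_retries):
        try:
            return scrape_post(url)  # Placeholder from the snippet above
        except Exception:
            wait = (2 ** attempt) * 60  # Wait 1, then 2, then 4 minutes
            print(f"Attempt {attempt + 1} failed; waiting {wait}s before retrying")
            time.sleep(wait)
    return None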

Respect user privacy & GDPR compliance:

  • Don't collect personally identifiable information (PII) beyond what's publicly visible
  • If scraping from EU users, understand GDPR implications
  • Don't sell or share scraped data containing personal information
  • Anonymize data when possible for research outputs (see the sketch below)
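A minimal sketch of that anonymization step, pseudonymizing commenter names with a salted hash before sharing results (assumes the pandas DataFrame from the analysis section; the salt value is a placeholder you'd replace and keep private):

import hashlib

SALT = 'replace-with-a-private-salt'  # Placeholder; keep this value secret

def pseudonymize(name):
    """Replace a commenter name with a stable, non-reversible token"""
    digest = hashlib.sha256((SALT + name).encode('utf-8')).hexdigest()
    return f"commenter_{digest[:10]}"

# Apply to the scraped dataset before sharing outputs
df['author'] = df['author'].apply(pseudonymize)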

Don't sell or misuse scraped data:

  • Using scraped comments for market research (internal use) carries different ethical weight than selling scraped data
  • Spamming commenters with cold outreach is both unethical and ineffective

Consider using official APIs when possible:

  • Always check if an official API meets your needs before scraping
  • APIs provide reliable, legal data access
  • Scraping should be a last resort, not the default approach

Warning signs you're scraping too aggressively:

  • LinkedIn sends security alerts
  • Your account gets temporarily restricted
  • Comments stop loading (rate limited)
  • LinkedIn adds CAPTCHA challenges

If you see any of these, stop immediately and wait 24-48 hours before resuming at lower volume.


Should You Scrape LinkedIn Comments? (Our Verdict)

When scraping makes sense:

  • One-time academic research projects with small datasets
  • Personal learning and experimentation
  • Competitive research where alternative data sources don't exist
  • Situations where you're willing to accept potential account restrictions

When to avoid scraping:

  • Commercial products or services depending on scraped data
  • Ongoing, large-scale data collection
  • Situations where your LinkedIn account is valuable and account restrictions would be costly
  • Any use case where legal liability is a concern

Better alternatives to consider:

1. Manual collection for small projects: If you need data from 5-10 posts, manual copy-paste is safer and takes roughly as long as setting up a script would.

2. Survey your audience instead: Want to understand pain points? Ask directly via LinkedIn polls or surveys rather than inferring from scraped comments.

3. Use LigoAI for engagement instead of extraction: If your goal is understanding audience sentiment to improve your engagement strategy, using LigoAI to engage thoughtfully generates similar insights through direct interaction - without the legal and ethical risks of scraping.

4. LinkedIn Sales Navigator: For lead generation, Sales Navigator provides filtered prospect lists legally and reliably.

5. Social listening tools: Platforms like Brandwatch or Sprout Social offer LinkedIn monitoring capabilities through official partnerships.

The reality of scraping in 2025: LinkedIn's anti-scraping measures are more sophisticated than ever. Detection algorithms identify bot-like behavior. Legal precedents remain unclear. The risks have increased while alternatives have improved.

For most professionals, the risk-reward calculation doesn't favor scraping. The exceptions are researchers with one-time projects and technical users willing to accept potential account consequences.


Start Engaging Strategically Instead

If your goal is understanding LinkedIn conversations to improve your own engagement, there's a safer approach: participate actively using AI assistance.

Why engagement beats extraction:

  • You generate insights while building relationships
  • Zero legal or ethical concerns
  • LinkedIn rewards active participants with increased visibility
  • More sustainable long-term strategy

Using LigoAI to understand your audience:

Instead of scraping comments to analyze sentiment, use LigoAI to engage with posts in your niche. As you engage, you'll naturally understand:

  • What topics resonate with your audience
  • Which questions come up repeatedly
  • What language and tone works best
  • Who the active participants are

You gather the same insights through participation rather than observation. And you build your professional brand in the process.

The bottom line: Scraping is technically possible but legally risky and ethically questionable. For most use cases, better alternatives exist that don't jeopardize your LinkedIn account or create legal exposure.

Choose the approach that aligns with your values and risk tolerance. When in doubt, engage rather than extract.


About the Author

Junaid Khalid

I've helped 50,000+ professionals build a personal brand on LinkedIn through my content and products, and have directly consulted dozens of businesses on building a Founder Brand and Employee Advocacy Program to grow their business via LinkedIn.