TikTok Post Data Extractor

Extract TikTok post data at scale—captions, hashtags, video URLs, engagement metrics, and author insights—in a clean, analysis-ready format. Built for teams that need dependable TikTok post data extraction for trend tracking, influencer research, and performance reporting.

Created by Bitbash to showcase our approach to scraping and automation.

Introduction

TikTok Post Data Extractor collects detailed post-level information from TikTok profiles and returns structured data you can plug into dashboards, reports, or ML pipelines. It solves the headache of manually compiling post metrics and metadata, especially when you need consistent fields across many creators or campaigns. This is for analysts, growth marketers, researchers, and developers who need reliable TikTok post data extraction without busywork.

Analytics & Monitoring Workflow

Accepts one or more TikTok profile @handles as input for batch collection
Extracts post text, hashtags, timestamps, media URLs, and engagement counters
Includes author profile fields and aggregated author statistics when available
Designed for repeatable monitoring runs to compare performance over time
Outputs JSON that’s easy to export to CSV/Excel or load into databases
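
As a rough illustration of the JSON-to-CSV step, the sketch below flattens nested records into dotted column names (matching the names used in the field table in this README) and writes them with the standard library. The helpers `flatten` and `records_to_csv` are hypothetical names chosen for this example; they are not taken from the project's code.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. stats.playCount."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Lists (e.g. challenges) are kept as a JSON string in one cell.
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Return CSV text for a list of post records."""
    rows = [flatten(record) for record in records]
    # Union of all columns in first-seen order, so sparse rows still align.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping list-valued fields as JSON strings is one possible choice; a separate table per list field may suit relational databases better.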

Features

Feature	Description
Batch profile processing	Collect post data from multiple TikTok @handles in one run for faster analysis.
Post metadata extraction	Captures post ID, publish time, description text, and language signals for downstream analytics.
Hashtag & mention parsing	Extracts hashtags and structured text ranges so you can analyze trends and topics accurately.
Engagement metrics	Retrieves likes, views, comments, shares, and saves (when available) for performance reporting.
Media URL collection	Provides video play/download URLs and cover images to support content review and archiving workflows.
Author enrichment	Adds author identity fields plus authorStats summaries for influencer evaluation.
Export-ready output	Produces structured JSON that can be exported into CSV/Excel or used directly in BI pipelines.
Resilient crawling	Includes retry logic and safe request pacing patterns to improve stability across runs.

What Data This Scraper Extracts

Field Name	Field Description
id	Unique post identifier used for deduplication and joins.
desc	Post caption text as displayed on the post.
createTime	Publish time as a UNIX timestamp (seconds) for time-series analysis.
textLanguage	Language code for the caption text (e.g., en), when available.
contents	Parsed caption segments and structured hashtag/mention ranges (when available).
textExtra	Structured entities extracted from the caption (e.g., hashtags) with start/end offsets.
challenges	Detected hashtags/topics linked to the post (title, id, and related media fields).
stats.playCount	View count for the post (when available).
stats.diggCount	Like count for the post (when available).
stats.commentCount	Comment count for the post (when available).
stats.shareCount	Share count for the post (when available).
stats.collectCount	Save/collection count for the post (when available).
video.playAddr	Primary playback URL(s) and video identifiers for the post media.
video.downloadAddr	Download URL (if exposed) for archiving or offline review.
video.cover	Cover image URL for quick previews and thumbnails.
video.duration	Video length in seconds for content profiling.
music.title	Audio title attached to the post (e.g., original sound).
music.authorName	Audio author/creator name as shown on TikTok.
author.uniqueId	Creator username / handle for attribution and joins.
author.nickname	Display name of the creator.
author.signature	Creator bio snippet (when available).
author.verified	Verification status flag (when available).
authorStats.followerCount	Total followers for the creator at collection time.
authorStats.heartCount	Total likes/heart count shown on profile (when available).
authorStats.videoCount	Total videos on the creator profile (when available).
scrapedAt	Collection timestamp added by the project for auditing and freshness.
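
For time-series work, the two timestamp fields can be converted to timezone-aware datetimes. The following is a minimal sketch, assuming `createTime` is UNIX seconds and `scrapedAt` is an ISO 8601 string ending in `Z`, as in the example output in this README. The function names are hypothetical.

```python
from datetime import datetime, timezone


def published_at(post):
    """Convert createTime (UNIX seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(post["createTime"], tz=timezone.utc)


def scraped_at(post):
    """Parse scrapedAt into an aware UTC datetime."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(post["scrapedAt"].replace("Z", "+00:00"))


def post_age_hours(post):
    """Hours between publish time and collection time."""
    return (scraped_at(post) - published_at(post)).total_seconds() / 3600
```

Post age at collection time is useful context when comparing engagement counters across posts, since older posts have had longer to accumulate views.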

Example Output

[
  {
    "id": "7526156529721003286",
    "desc": "Can you answer all the questions ? #fyp #foru #fypviralシ #videoviral #challenge #brainteaser",
    "createTime": 1752319876,
    "textLanguage": "en",
    "author": {
      "uniqueId": "moona_writes3",
      "nickname": "The Storyteller's Corner",
      "verified": false,
      "signature": "• Follow and like my page 💐💐"
    },
    "authorStats": {
      "followerCount": 17200,
      "heartCount": 231700,
      "videoCount": 38
    },
    "stats": {
      "playCount": 409,
      "diggCount": 6,
      "commentCount": 0,
      "shareCount": 0,
      "collectCount": 1
    },
    "challenges": [
      { "id": "229207", "title": "fyp" },
      { "id": "108264", "title": "foru" },
      { "id": "1666593428398085", "title": "fypviralシ" }
    ],
    "video": {
      "duration": 27,
      "cover": "https://p16-.../origin.image",
      "playAddr": "https://v16-.../video.mp4"
    },
    "music": {
      "title": "original sound",
      "authorName": "The Storyteller's Corner"
    },
    "scrapedAt": "2025-12-18T00:00:00.000Z"
  }
]
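
When `textExtra` is not available for a post, hashtags can still be recovered from `desc`. The sketch below is an assumption-laden stand-in rather than the project's parser: it uses a simple regular expression, and the output keys (`hashtag`, `start`, `end`) are hypothetical names, not TikTok's own field names. Offsets are Python string indices, which may not match the offsets TikTok reports.

```python
import re

# \w is Unicode-aware in Python 3, so tags with non-Latin letters also match.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_ranges(desc):
    """Return each hashtag in a caption with its start/end string offsets."""
    return [
        {"hashtag": match.group(1), "start": match.start(), "end": match.end()}
        for match in HASHTAG.finditer(desc)
    ]


def hashtags(desc):
    """Return just the hashtag names, in caption order."""
    return [item["hashtag"] for item in hashtag_ranges(desc)]
```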

Directory Structure Tree

tiktok-post-data-extractor/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── pipelines/
│   │   ├── profile_queue.py
│   │   ├── post_collector.py
│   │   └── transforms.py
│   ├── extractors/
│   │   ├── tiktok_profile.py
│   │   ├── tiktok_posts.py
│   │   └── parsing_text_extra.py
│   ├── http/
│   │   ├── client.py
│   │   ├── retries.py
│   │   └── headers.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   ├── exporters.py
│   │   └── schema_normalizer.py
│   ├── config/
│   │   ├── settings.py
│   │   └── logging.yml
│   └── utils/
│       ├── time_utils.py
│       ├── validators.py
│       └── fingerprints.py
├── data/
│   ├── inputs.sample.json
│   └── sample.output.json
├── tests/
│   ├── test_parsing_text_extra.py
│   ├── test_schema_normalizer.py
│   └── test_post_transforms.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md

Use Cases

Marketing analysts use it to benchmark TikTok post performance so they can spot winning content patterns and iterate faster.
Influencer managers use it to evaluate creators using engagement + follower context so they can shortlist partners with measurable ROI.
Trend researchers use it to track hashtags and viral formats over time so they can predict emerging topics earlier.
News & media teams use it to monitor public-facing TikTok posts around events so they can capture sentiment shifts quickly.
Data scientists use it to build labeled datasets from post text + metrics so they can train models for performance prediction or topic clustering.
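
Several of these use cases rest on a per-post engagement figure. One common definition, used here as an assumption and not as the project's own metric, divides total interactions by views. The function name `engagement_rate` is hypothetical.

```python
INTERACTION_FIELDS = ("diggCount", "commentCount", "shareCount", "collectCount")


def engagement_rate(post):
    """Interactions per view, or None when the view count is missing or zero."""
    stats = post.get("stats") or {}
    views = stats.get("playCount") or 0
    if views == 0:
        return None
    # Missing counters are treated as zero for the purpose of this ratio.
    interactions = sum(stats.get(field) or 0 for field in INTERACTION_FIELDS)
    return interactions / views
```

For the record in Example Output this gives 7 interactions over 409 views. Returning `None` for posts without views avoids a division by zero and keeps such posts distinguishable from posts with genuinely low engagement.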

FAQs

What inputs are supported? Provide one or more TikTok profile @handles. The project batches profiles, then collects recent posts for each handle and normalizes them into a consistent schema.
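
A minimal sketch of how handle input might be cleaned and batched before collection. The function names and the batch size are hypothetical, and the project's actual input handling may differ.

```python
def normalize_handles(raw_handles):
    """Strip whitespace and a leading @, drop blanks and duplicates."""
    seen = set()
    result = []
    for raw in raw_handles:
        handle = raw.strip().lstrip("@")
        if handle and handle not in seen:
            seen.add(handle)
            result.append(handle)
    return result


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


handles = normalize_handles([" @moona_writes3", "moona_writes3", ""])
for batch in batches(handles, 10):
    print(batch)
```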

How many posts does it collect per profile? By default it targets a window of recent posts (typically 30 or more per profile, depending on availability). You can adjust limits in configuration to balance depth vs. speed.

Why do some fields show up as missing or zero? Some metrics and media URLs depend on visibility, region, A/B delivery, or content restrictions. The extractor keeps a stable schema and gracefully leaves fields empty when TikTok doesn’t expose them.
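
The idea of a stable schema can be sketched as follows. This is an illustration only: using `None` as the empty value and the name `normalize_stats` are assumptions, not confirmed details of the project's `schema_normalizer.py`.

```python
EXPECTED_STATS = (
    "playCount",
    "diggCount",
    "commentCount",
    "shareCount",
    "collectCount",
)


def normalize_stats(post):
    """Return a copy of the post whose stats always has every expected key."""
    raw = post.get("stats") or {}
    normalized = dict(post)
    # A missing counter becomes None so it is not confused with a real zero.
    normalized["stats"] = {name: raw.get(name) for name in EXPECTED_STATS}
    return normalized
```

Keeping missing values distinct from zero matters downstream: a post with `shareCount` of 0 was shared by nobody, while a post with `shareCount` of `None` simply had no share count exposed.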

How do I reduce blocking and improve stability? Use reliable proxies, keep concurrency conservative, and enable retries with backoff. If you run frequent monitoring, schedule runs with spacing and store previous post IDs to avoid re-collecting the same items.
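
The two suggestions above, retries with backoff and skipping already-collected post IDs, can be sketched as follows. This is a generic pattern under stated assumptions, not the project's `retries.py`: the function names, the default attempt count, and the delay values are all hypothetical.

```python
import random
import time


def with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Real code should catch only the network errors it expects.
            if attempt == attempts - 1:
                raise
            # Exponential backoff plus up to 0.5 s of jitter: ~1 s, ~2 s, ~4 s.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


def new_posts_only(posts, seen_ids):
    """Return posts whose id is not in seen_ids, and record them as seen."""
    fresh = [post for post in posts if post["id"] not in seen_ids]
    seen_ids.update(post["id"] for post in fresh)
    return fresh
```

Passing `sleep` as a parameter keeps the retry helper testable without real waiting. Between runs, `seen_ids` would need to be persisted, for example to a file or database.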

Performance Benchmarks and Results

Primary Metric: ~25–45 profiles/hour at ~30 posts/profile under conservative concurrency, depending on network and proxy quality.

Reliability Metric: 92–97% successful profile runs across mixed account sizes when retries + pacing are enabled.

Efficiency Metric: Memory footprint typically stays in the ~250–450 MB range for mid-size batches, achieved by streaming outputs and avoiding full in-memory media hydration.

Quality Metric: 95%+ field completeness for core analytics fields (post id, caption, hashtags, createTime, views/likes/comments) on public profiles, with optional fields varying by post visibility and region.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★

