Extract TikTok post data at scale (captions, hashtags, video URLs, engagement metrics, and author insights) in a clean, analysis-ready format. Built for teams that need dependable post data for trend tracking, influencer research, and performance reporting.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
TikTok Post Data Extractor collects detailed post-level information from TikTok profiles and returns structured data you can plug into dashboards, reports, or ML pipelines. It solves the headache of manually compiling post metrics and metadata, especially when you need consistent fields across many creators or campaigns. This is for analysts, growth marketers, researchers, and developers who need reliable TikTok post data extraction without busywork.
- Accepts one or more TikTok profile @handles as input for batch collection
- Extracts post text, hashtags, timestamps, media URLs, and engagement counters
- Includes author profile fields and aggregated author statistics when available
- Designed for repeatable monitoring runs to compare performance over time
- Outputs JSON that’s easy to export to CSV/Excel or load into databases
| Feature | Description |
|---|---|
| Batch profile processing | Collect post data from multiple TikTok @handles in one run for faster analysis. |
| Post metadata extraction | Captures post ID, publish time, description text, and language signals for downstream analytics. |
| Hashtag & mention parsing | Extracts hashtags and structured text ranges so you can analyze trends and topics accurately. |
| Engagement metrics | Retrieves likes, views, comments, shares, and saves (when available) for performance reporting. |
| Media URL collection | Provides video play/download URLs and cover images to support content review and archiving workflows. |
| Author enrichment | Adds author identity fields plus authorStats summaries for influencer evaluation. |
| Export-ready output | Produces structured JSON that can be exported into CSV/Excel or used directly in BI pipelines. |
| Resilient crawling | Includes retry logic and safe request pacing patterns to improve stability across runs. |
| Field Name | Field Description |
|---|---|
| id | Unique post identifier used for deduplication and joins. |
| desc | Post caption text as displayed on the post. |
| createTime | Publish time as a UNIX timestamp (seconds) for time-series analysis. |
| textLanguage | Language code detected for the caption text (e.g., `en`), when available. |
| contents | Parsed caption segments and structured hashtag/mention ranges (when available). |
| textExtra | Structured entities extracted from the caption (e.g., hashtags) with start/end offsets. |
| challenges | Detected hashtags/topics linked to the post (title, id, and related media fields). |
| stats.playCount | View count for the post (when available). |
| stats.diggCount | Like count for the post (when available). |
| stats.commentCount | Comment count for the post (when available). |
| stats.shareCount | Share count for the post (when available). |
| stats.collectCount | Save/collection count for the post (when available). |
| video.playAddr | Primary playback URL(s) and video identifiers for the post media. |
| video.downloadAddr | Download URL (if exposed) for archiving or offline review. |
| video.cover | Cover image URL for quick previews and thumbnails. |
| video.duration | Video length in seconds for content profiling. |
| music.title | Audio title attached to the post (e.g., original sound). |
| music.authorName | Audio author/creator name as shown on TikTok. |
| author.uniqueId | Creator username / handle for attribution and joins. |
| author.nickname | Display name of the creator. |
| author.signature | Creator bio snippet (when available). |
| author.verified | Verification status flag (when available). |
| authorStats.followerCount | Total followers for the creator at collection time. |
| authorStats.heartCount | Total likes/heart count shown on profile (when available). |
| authorStats.videoCount | Total videos on the creator profile (when available). |
| scrapedAt | Collection timestamp added by the project for auditing and freshness. |
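
To make the idea of hashtag entities with start/end offsets concrete, here is a minimal sketch that derives such ranges directly from a caption string. It is an illustration only, not the project's parser: the helper name `find_hashtags` and the output keys `tag`, `start`, and `end` are assumptions, and the offsets are plain Python string indices, which may differ from the units TikTok uses in `textExtra`.

```python
import re

# Hypothetical helper for illustration; the real textExtra shape may differ.
HASHTAG_RE = re.compile(r"#(\w+)")


def find_hashtags(desc: str) -> list[dict]:
    """Return each hashtag in the caption with its start/end string offsets."""
    return [
        {"tag": match.group(1), "start": match.start(), "end": match.end()}
        for match in HASHTAG_RE.finditer(desc)
    ]


caption = "Can you answer all the questions ? #fyp #foru #fypviralシ"
for entity in find_hashtags(caption):
    # Slicing the caption with the offsets recovers the original "#tag" text.
    print(entity, caption[entity["start"]:entity["end"]])
```

Keeping offsets alongside the tag text makes it possible to highlight or strip hashtags from the caption without re-parsing it.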
[
  {
    "id": "7526156529721003286",
    "desc": "Can you answer all the questions ? #fyp #foru #fypviralシ #videoviral #challenge #brainteaser",
    "createTime": 1752319876,
    "textLanguage": "en",
    "author": {
      "uniqueId": "moona_writes3",
      "nickname": "The Storyteller's Corner",
      "verified": false,
      "signature": "• Follow and like my page 💐💐"
    },
    "authorStats": {
      "followerCount": 17200,
      "heartCount": 231700,
      "videoCount": 38
    },
    "stats": {
      "playCount": 409,
      "diggCount": 6,
      "commentCount": 0,
      "shareCount": 0,
      "collectCount": 1
    },
    "challenges": [
      { "id": "229207", "title": "fyp" },
      { "id": "108264", "title": "foru" },
      { "id": "1666593428398085", "title": "fypviralシ" }
    ],
    "video": {
      "duration": 27,
      "cover": "https://p16-.../origin.image",
      "playAddr": "https://v16-.../video.mp4"
    },
    "music": {
      "title": "original sound",
      "authorName": "The Storyteller's Corner"
    },
    "scrapedAt": "2025-12-18T00:00:00.000Z"
  }
]
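
Since the output is nested JSON, exporting to CSV needs a flattening step. The sketch below shows one possible way to do that with the standard library. The function names `flatten_post` and `to_csv` and the column names are hypothetical choices for this example, not part of the project, and the inline record is a trimmed copy of the sample above.

```python
import csv
import io
from datetime import datetime, timezone

# Trimmed copy of the sample record; real runs would load the JSON output instead.
SAMPLE_POSTS = [
    {
        "id": "7526156529721003286",
        "createTime": 1752319876,
        "author": {"uniqueId": "moona_writes3"},
        "authorStats": {"followerCount": 17200},
        "stats": {"playCount": 409, "diggCount": 6, "commentCount": 0,
                  "shareCount": 0, "collectCount": 1},
        "challenges": [{"id": "229207", "title": "fyp"},
                       {"id": "108264", "title": "foru"}],
    }
]


def flatten_post(post: dict) -> dict:
    """Turn one nested post record into a flat row (column names are illustrative)."""
    stats = post.get("stats") or {}
    author = post.get("author") or {}
    created = post.get("createTime")
    return {
        "id": post.get("id"),
        "author": author.get("uniqueId"),
        # createTime is documented as UNIX seconds; convert to ISO 8601 in UTC.
        "created_utc": (
            datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
            if created is not None else ""
        ),
        "views": stats.get("playCount"),
        "likes": stats.get("diggCount"),
        "comments": stats.get("commentCount"),
        "shares": stats.get("shareCount"),
        "saves": stats.get("collectCount"),
        "followers": (post.get("authorStats") or {}).get("followerCount"),
        "hashtags": " ".join(
            c["title"] for c in post.get("challenges") or [] if c.get("title")
        ),
    }


def to_csv(posts: list[dict]) -> str:
    """Serialize flattened rows to CSV text; returns an empty string for no posts."""
    rows = [flatten_post(p) for p in posts]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(SAMPLE_POSTS))
```

Missing metrics are written as empty cells rather than zeros, so a spreadsheet user can still tell "not exposed" apart from a genuine count of 0.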
tiktok-post-data-extractor/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── pipelines/
│   │   ├── profile_queue.py
│   │   ├── post_collector.py
│   │   └── transforms.py
│   ├── extractors/
│   │   ├── tiktok_profile.py
│   │   ├── tiktok_posts.py
│   │   └── parsing_text_extra.py
│   ├── http/
│   │   ├── client.py
│   │   ├── retries.py
│   │   └── headers.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   ├── exporters.py
│   │   └── schema_normalizer.py
│   ├── config/
│   │   ├── settings.py
│   │   └── logging.yml
│   └── utils/
│       ├── time_utils.py
│       ├── validators.py
│       └── fingerprints.py
├── data/
│   ├── inputs.sample.json
│   └── sample.output.json
├── tests/
│   ├── test_parsing_text_extra.py
│   ├── test_schema_normalizer.py
│   └── test_post_transforms.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md
- Marketing analysts use it to benchmark TikTok post performance so they can spot winning content patterns and iterate faster.
- Influencer managers use it to evaluate creators using engagement + follower context so they can shortlist partners with measurable ROI.
- Trend researchers use it to track hashtags and viral formats over time so they can predict emerging topics earlier.
- News & media teams use it to monitor public-facing TikTok posts around events so they can capture sentiment shifts quickly.
- Data scientists use it to build labeled datasets from post text + metrics so they can train models for performance prediction or topic clustering.
What inputs are supported? Provide one or more TikTok profile @handles. The project batches profiles, then collects recent posts for each handle and normalizes them into a consistent schema.
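
As a rough illustration of preparing handle input for batching, the sketch below cleans a list of handles and splits it into fixed-size batches. Everything here is an assumption for the example: the helper names `normalize_handles` and `batched`, the batch size, and the handle pattern (letters, digits, underscores, and periods, 2 to 24 characters) are not taken from the project.

```python
import re
from typing import Iterable, Iterator

# Assumed handle pattern for illustration; adjust if TikTok's rules differ.
HANDLE_RE = re.compile(r"^[A-Za-z0-9._]{2,24}$")


def normalize_handles(raw: Iterable[str]) -> list[str]:
    """Strip whitespace and a leading '@', drop invalid entries and duplicates."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for item in raw:
        handle = item.strip().lstrip("@")
        if not HANDLE_RE.match(handle) or handle in seen:
            continue
        seen.add(handle)
        cleaned.append(handle)
    return cleaned


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


handles = normalize_handles([" @moona_writes3", "moona_writes3", "", "@bad handle"])
for batch in batched(handles, size=10):
    print(batch)
```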
How many posts does it collect per profile? By default it targets a recent window (commonly ~30+ posts per profile depending on availability). You can adjust limits in configuration to balance depth vs. speed.
Why do some fields show up as missing or zero? Some metrics and media URLs depend on visibility, region, A/B delivery, or content restrictions. The extractor keeps a stable schema and gracefully leaves fields empty when TikTok doesn’t expose them.
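
One way to keep a stable schema when fields are absent is to fill a fixed template and use `None` for anything that was not exposed. The sketch below shows that idea under stated assumptions: the template `SCHEMA_DEFAULTS` covers only a few of the documented fields, and `normalize` is a hypothetical function, so the project's own `schema_normalizer.py` may work differently.

```python
# Partial template for illustration; only a subset of the documented fields.
SCHEMA_DEFAULTS = {
    "id": None,
    "desc": "",
    "createTime": None,
    "stats": {
        "playCount": None,
        "diggCount": None,
        "commentCount": None,
        "shareCount": None,
        "collectCount": None,
    },
    "video": {"playAddr": None, "downloadAddr": None, "cover": None, "duration": None},
}


def normalize(post: dict, defaults: dict = SCHEMA_DEFAULTS) -> dict:
    """Return a record with exactly the template's keys, filling gaps with defaults."""
    result = {}
    for key, default in defaults.items():
        value = post.get(key)
        if isinstance(default, dict):
            # Recurse into nested sections; a missing section becomes all defaults.
            result[key] = normalize(value if isinstance(value, dict) else {}, default)
        else:
            result[key] = default if value is None else value
    return result


partial = {"id": "7526156529721003286", "stats": {"diggCount": 0}}
print(normalize(partial))
```

Using `None` for unexposed values keeps a genuine count of 0 distinguishable from a metric that was simply not available.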
How do I reduce blocking and improve stability? Use reliable proxies, keep concurrency conservative, and enable retries with backoff. If you run frequent monitoring, schedule runs with spacing and store previous post IDs to avoid re-collecting the same items.
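
The retry-with-backoff and post-ID deduplication advice can be sketched with the standard library alone. This is a minimal example under assumptions, not the project's `retries.py`: the names `with_retries` and `new_posts`, the attempt count, and the delay values are all illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # In real code, catch only transient network errors.
            if attempt == attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")


def new_posts(posts: list[dict], seen_ids: set[str]) -> list[dict]:
    """Return posts whose id has not been seen, and record them as seen."""
    fresh = []
    for post in posts:
        if post["id"] in seen_ids:
            continue
        seen_ids.add(post["id"])
        fresh.append(post)
    return fresh


seen: set[str] = {"100"}
print(new_posts([{"id": "100"}, {"id": "101"}, {"id": "101"}], seen))
```

Persisting `seen_ids` between scheduled runs (for example in a small file or database table) is what lets a monitoring job skip posts it has already collected.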
Primary Metric: ~25–45 profiles/hour at ~30 posts/profile under conservative concurrency, depending on network and proxy quality.
Reliability Metric: 92–97% successful profile runs across mixed account sizes when retries + pacing are enabled.
Efficiency Metric: Typical memory footprint stays within roughly 250–450 MB for mid-size batches, achieved by streaming outputs and avoiding full in-memory media hydration.
Quality Metric: 95%+ field completeness for core analytics fields (post id, caption, hashtags, createTime, views/likes/comments) on public profiles, with optional fields varying by post visibility and region.
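
For readers who want to check field completeness on their own output, here is one possible way to compute it. The definition used (share of posts where a field is present and not `None`) is an assumption for this sketch, as are the names `CORE_FIELDS`, `get_path`, and `completeness`; the source does not state how its figure was measured.

```python
# Core fields named in the quality metric, written as dotted paths (hashtags omitted).
CORE_FIELDS = [
    "id",
    "desc",
    "createTime",
    "stats.playCount",
    "stats.diggCount",
    "stats.commentCount",
]


def get_path(record: dict, dotted: str):
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def completeness(posts: list[dict], fields: list[str] = CORE_FIELDS) -> dict[str, float]:
    """Fraction of posts with a non-None value for each field (0.0 for no posts)."""
    total = len(posts)
    return {
        field: (sum(get_path(p, field) is not None for p in posts) / total if total else 0.0)
        for field in fields
    }


posts = [
    {"id": "1", "desc": "a", "createTime": 1,
     "stats": {"playCount": 0, "diggCount": 1, "commentCount": 0}},
    {"id": "2", "desc": "b", "createTime": 2, "stats": {"playCount": None}},
]
for field, share in completeness(posts).items():
    print(f"{field}: {share:.0%}")
```

Note that a count of 0 is treated as present, so only values that are missing or `None` lower the score.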
