Deliverables
- Dynamic page collection with Selenium and Puppeteer
- Collection status, failure reasons, and retry queue management
- Separate storage for large files and their metadata
- Operational logs, collection reports, and admin review flows
Service
Build data collection pipelines for dynamic pages, large files, and long-running jobs.
Beyond simple scraping, collection systems need retries, resume logic, status tracking, storage separation, and monitoring.
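As a rough illustration of the status tracking and retry queue described above, this minimal in-memory sketch keeps a status, attempt count, and failure reason for each URL, and requeues failed jobs until a retry limit is reached. It rests on several assumptions. The `Job` and `RetryQueue` names, the `fetch` callable (which would wrap a Selenium or Puppeteer page load in practice), and the three-attempt limit are all hypothetical. None of it is taken from the delivered systems.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Job:
    # Hypothetical job record: one per collected URL.
    url: str
    status: Status = Status.PENDING
    attempts: int = 0
    failure_reason: Optional[str] = None


@dataclass
class RetryQueue:
    # Hypothetical retry limit; tune per source in a real pipeline.
    max_attempts: int = 3
    queue: deque = field(default_factory=deque)
    jobs: Dict[str, Job] = field(default_factory=dict)

    def add(self, url: str) -> None:
        # Ignore URLs that are already tracked so each page is collected once.
        if url not in self.jobs:
            job = Job(url)
            self.jobs[url] = job
            self.queue.append(job)

    def run(self, fetch: Callable[[str], None]) -> None:
        while self.queue:
            job = self.queue.popleft()
            job.attempts += 1
            try:
                fetch(job.url)
                job.status = Status.DONE
                job.failure_reason = None
            except Exception as exc:
                # Record why the attempt failed so reports and admin review can show it.
                job.failure_reason = repr(exc)
                if job.attempts < self.max_attempts:
                    self.queue.append(job)  # requeue for another attempt
                else:
                    job.status = Status.FAILED  # give up and keep the last reason
```

Persisting `jobs` to a database instead of a dict would let an interrupted run resume where it stopped. That part is left out of this sketch.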
Delivered
Evidence Collection Crawler
A high-volume collection pipeline that gathered about 19TB of video over two months to train an AI model that classifies illegal content.
Proof
Collected about 19TB of video data over two months
Live
Tech Collection
A live AI-curated aggregator that crawls technical blogs and summarizes and classifies their posts with ChatGPT. Caching improved its TPS by 15x.
Proof
Built an automated pipeline that crawls, summarizes, classifies, and serves posts

A public-service renewal project that delivered order integration APIs for telecom operators, order-status notifications via Kakao AlimTalk, and a 32x speedup in Oracle queries.
Proof
Delivered real-time order integration APIs for external telecom operators