李在明
Experience
2 Years
Job Type
On the job
Educational
College
Location
China, Hangzhou
Personal Advantage
Job Preference
No Preference yet
Experience
Data acquisition engineer
2024-09 - 2025-04
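杭州大坝科技有限公司, Operations Department
Python, Python web scraping, JS reverse engineering, AES / GuoMi (Chinese national standard) cryptography, app unpacking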
Content
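Project: "来咯乐卟", an O2O local-services platform
Role: Data crawler development engineer
Core technical work:
Local-services data collection system
• Built an automated crawling framework in Python, using Selenium for dynamically rendered pages and Requests for API endpoints.
• Reverse-engineered key encrypted parameters with Chrome DevTools and simulated request headers (User-Agent/Cookie) to get past basic anti-scraping checks.
• Used AsyncIO coroutines to raise collection throughput, with a request-delay strategy to keep the system running stably.
• Maintained a base database of 30,000+ merchants covering core local-services categories such as dining and entertainment.
Automated processing of operations data
• Captured traffic with Fiddler to analyze the communication protocol of the mini-program admin backend and reconstruct the data-upload API logic.
• Wrote Python automation scripts to batch-process merchants' image and text materials, simulating form submission with Requests; uploading became 5 times as efficient as doing it by hand.
• Designed a retry-on-error mechanism to handle network fluctuations and monitored script execution through a logging system.
Business data decision support
• Built dashboards in Power BI and designed a merchant quality evaluation framework (foot traffic, ratings, category popularity, and other dimensions).
• Developed data-cleaning rules to handle duplicate and erroneous records in the raw data, making the analysis results more reliable.
• Regularly produced a TOP 100 list of high-quality merchants to support the operations department's field-sales plans.
• Hands-on experience with Weibo advertising; familiar with mainstream placements such as feed ads, search ads, and splash-screen ads.
• Proficient with Python scraping (Selenium + Scrapy); can quickly collect material related to Weibo trending topics and competitors' ad creatives, processing 2,000+ records per day.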
Data acquisition engineer
2025-05 - 2025-10
旅脉, Data Collection
Python, JS reverse engineering, MySQL
Content
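Designed and developed a modular Python system for crawling, cleaning, and storing data, built to collect job and company information from 40+ recruitment sites (Boss直聘, Alibaba, Tencent, ByteDance, and others). The system processes more than 100,000 records per day with data accuracy above 98%.
Core technologies and responsibilities
1. Multi-platform crawler development:
• Developed custom crawlers for 40+ recruitment sites using DrissionPage and the requests library.
• Implemented countermeasures against anti-scraping defenses, including dynamic proxy IP rotation, browser fingerprint simulation, and request header randomization.
• For JavaScript-encrypted data, implemented decryption for AES-CBC, RC4, the Chinese national standard SM4, and other ciphers using execjs and a Node.js environment.
2. Data processing and cleaning:
• Used pandas and openpyxl to process Excel configuration files, mapping and standardizing fields from different sources.
• Developed a recognition system for cities, industries, and job titles based on fuzzy matching (fuzzywuzzy).
• Built a deduplication mechanism that prevents duplicate records from being inserted, improving data quality.
3. AI-enhanced features:
• Integrated a BERT-based job-type prediction model to identify and classify job types automatically.
• Developed an academic-major matching model using jieba word segmentation and machine learning algorithms, improving data processing accuracy.
4. System architecture and performance optimization:
• Adopted a modular design that decouples crawling, cleaning, and storage for easier maintenance and extension.
• Implemented multithreaded and asynchronous processing with ThreadPoolExecutor and asyncio to raise crawl throughput.
• Configured a MySQL connection pool (dbutils.PooledDB) to optimize database access.
• Designed a batch processing and storage scheme that substantially increased system throughput.
5. Technology stack:
• Programming languages: Python 3.x, JavaScript (Node.js)
• Core frameworks: DrissionPage, requests, BeautifulSoup4, pandas
• Database: MySQL
• AI frameworks: transformers (Hugging Face), PyTorch
• Other: crypto-js, execjs, fuzzywuzzy, jieba
Project results:
• Built a stable, efficient data collection system supporting 100,000+ records per day.
• Achieved data accuracy above 98% and system availability above 99%.
• The system is highly extensible: support for new sites can be added easily, which greatly speeds up crawler development.
• Provided a high-quality data foundation for recruitment data analysis and talent sourcing.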
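The field mapping, fuzzy city matching, and deduplication steps described in this role can be sketched roughly as follows. This is a simplified illustration, not the system's real code: it swaps in the standard library's `difflib` for fuzzywuzzy so that it runs without third-party packages, keeps the mapping in a dict rather than an Excel file, and all names (`FIELD_MAPS`, `KNOWN_CITIES`, `normalize`, `match_city`, `record_key`, `clean`) and sample values are hypothetical.

```python
import difflib
import hashlib

# Hypothetical per-site field mappings (the described system loads these from Excel).
FIELD_MAPS = {
    "site_a": {"jobName": "title", "cityName": "city", "brandName": "company"},
    "site_b": {"position": "title", "location": "city", "corp": "company"},
}

KNOWN_CITIES = ["Hangzhou", "Shanghai", "Beijing", "Shenzhen"]  # illustrative only


def normalize(record, source):
    """Rename source-specific keys to the shared schema, dropping unknown keys."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def match_city(raw, cutoff=0.6):
    """Return the closest known city, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(raw.strip().title(), KNOWN_CITIES, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def record_key(rec):
    """Case-insensitive fingerprint used to detect duplicates across sources."""
    raw = "|".join(str(rec.get(f, "")).strip().lower() for f in ("title", "company", "city"))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def clean(batches):
    """Normalize, resolve cities, and deduplicate (source, records) pairs."""
    seen, out = set(), []
    for source, records in batches:
        for rec in records:
            row = normalize(rec, source)
            row["city"] = match_city(row.get("city", ""))
            key = record_key(row)
            if key not in seen:  # keep only the first copy of each posting
                seen.add(key)
                out.append(row)
    return out


if __name__ == "__main__":
    batches = [
        ("site_a", [{"jobName": "Data Engineer", "cityName": "hangzhou ", "brandName": "Acme"}]),
        ("site_b", [{"position": "data engineer", "location": "Hangzho", "corp": "ACME"}]),
    ]
    print(clean(batches))
```

Computing the fingerprint after normalization means the same posting scraped from two sites collapses into one row even when the spelling of the city or the letter case differs.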
Education
No Education yet