Senior Data Scientist
Posted 50 days ago
Job Description
As a Senior Data Scientist at Protege, you will optimize large datasets for AI model training and lead strategy on data quality and diversity, working closely with cross-functional teams.
Responsibilities:
- Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
- Develop frameworks to assess data diversity, duplication, and informativeness. Design statistical approaches to de-risk training datasets
- Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, with an emphasis on working effectively with both large foundation-model developers and smaller startups
- Provide leadership on data quality strategy and shape internal best practices
- Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance. Help build data scorecards
- Contribute to research and development of tools that automate data preprocessing and validation
Requirements:
- PhD, or an equivalent Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field
- Strong understanding of AI model training pipelines, including pre-processing and evaluation
- Experience working with large, unstructured datasets, especially text
- Background in statistical analysis, bias detection, and data validation
- Able to identify high-impact problems and drive independent solutions