AI Data Engineer
As a key member of our team, you will focus on building high-quality multimodal (image and text) datasets that advance state-of-the-art generative AI models. In this role, you will be expected to:
  • Build high-quality multimodal (image and text) datasets through a variety of channels, including web crawling, manual human annotation, and automatic data generation with deep learning models.
  • Create UIs and systems for data acquisition, preprocessing, labeling, filtering, viewing, and storage.
  • Collaborate closely with our AI researchers, actively participating in deep learning model training and evaluation.
  • Collaborate with a group of data annotators, set the requirements for data annotation, and closely monitor annotation quality.
To succeed in this role, you should have:
  • A minimum of three years of professional experience in software development, data science, or a related field
  • Proficiency in Python, with hands-on experience in PyTorch
  • Proven expertise in building well-balanced and diverse datasets
  • Strong knowledge of developing web crawlers for data extraction, using Python or other programming languages
  • Ability to implement a simple UI for data visualization with Streamlit, Gradio, or other available tools
  • Experience using multimodal deep learning models such as CLIP, BLIP, and Stable Diffusion
  • (Bonus) Passion for building innovative AI models and products
Empowering everyone with best-in-class generative AI
