AI Data Engineer
engineering — Singapore
As a key member of our team, you will focus primarily on building high-quality multimodal (image and text) datasets that help advance state-of-the-art generative AI models.
Responsibilities
- Build high-quality multimodal (image and text) datasets through a variety of channels, including web crawling, manual annotation by human annotators, and automatic data generation with deep learning models.
- Build user interfaces and systems for data acquisition, preprocessing, labeling, filtering, viewing, and storage.
- Work closely with our AI researchers and take an active part in training and evaluating deep learning models.
- Work with a team of data annotators: define the annotation requirements and closely monitor annotation quality.
Qualifications
- A minimum of three years of professional experience in software development, data science, or a related field
- Proficiency in Python, with hands-on experience in PyTorch
- Proven expertise in building well-balanced and diverse datasets
- Strong experience developing web crawlers to extract data from the internet, using Python or other programming languages
- Ability to implement simple data-visualization UIs with Streamlit, Gradio, or similar tools
- Experience using multimodal deep learning models such as CLIP, BLIP, and Stable Diffusion
- (Bonus) Passionate about building innovative AI models and products