Data Analyst

Scipher Medicine

IT, Data Science
Boston, MA, USA

About Us: Scipher Medicine is a precision immunology company that uses AI and network science to match patients with the most effective therapies. Its flagship product, PrismRA, is a blood-based test that predicts whether a patient with rheumatoid arthritis (RA) is likely to respond to anti-TNF therapy. The company is also developing the Spectra platform, which can be used to discover and validate new drug targets for a variety of autoimmune diseases. Scipher has made significant progress in a short period of time: PrismRA is already used by clinicians in the US and Europe, and Spectra has the potential to transform drug development for autoimmune diseases.

Job Description: We're seeking a proactive and highly skilled Data Analyst with a background in Computer Science, Data Engineering, or a related field, along with expertise in Google Cloud Platform (GCP) and Amazon Web Services (AWS). As our Data Analyst, you will work directly with Data Scientists, taking a hands-on role in designing, developing, and maintaining our data ingestion and ETL pipelines. This role demands a solid understanding of data automation, ETL operations, and data orchestration, particularly within the GCP and AWS environments. Additionally, you'll support model development across various domains by ensuring data pipelines are optimized for performance and reliability.

Key Responsibilities:

  • Design, develop, and maintain data ingestion and ETL pipelines on the GCP and AWS platforms.
  • Collaborate closely with data scientists to translate Python-based data processing and analysis pipelines into scalable PySpark ETL processes suitable for deployment on cloud platforms such as AWS or GCP.
  • Use industry best practices for design, testing, and documentation of code and infrastructure.
  • Conduct thorough data analysis and preprocessing to ensure data quality and reliability.
  • Support the implementation of a wide range of models by ensuring data pipelines are optimized for performance.
  • Perform feature selection and engineering to enhance data processing workflows.
  • Utilize best practices in data engineering and ETL development to achieve desired outcomes.
  • Evaluate and benchmark data pipeline performance using appropriate metrics and validation techniques.
  • Stay updated with the latest advancements in data engineering and cloud technologies.

Requirements:

  • Bachelor's degree in Computer Science, Data Engineering, or a related quantitative field with a minimum of 2 years' experience, OR a Master's degree in one of the above or a related quantitative field.
  • 1+ years of experience in data engineering (including academic experience).
  • Proven experience in developing and deploying data pipelines.
  • Strong understanding of ETL operations and data automation.
  • Excellent programming skills in languages such as Python.
  • Hands-on experience with GCP services like Cloud Storage, BigQuery, and Dataflow, as well as AWS services like S3, Redshift, Glue, and Lambda; experience with Docker for containerization.
  • Familiarity with DevOps practices and tools, including CI/CD pipelines.
  • Experience with version control systems like Git and platforms like GitHub.
  • Understanding of coding best practices, including code reviews, testing, and documentation.
  • Ability to work effectively in a collaborative, team-oriented environment.
  • Exceptional problem-solving and analytical abilities.
  • Excellent communication and presentation skills, with the ability to convey complex concepts to non-technical stakeholders.
  • Familiarity with clinical trial data and related concepts is a plus.

Skills:

  • Data Engineering: ETL operations, data ingestion, data orchestration.
  • Proficiency in data modeling and database design.
  • Strong programming skills, especially in languages such as Python.
  • Knowledge of statistical analysis, data preprocessing, and feature engineering.
  • Strong analytical and problem-solving abilities.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Familiarity with cloud platforms and tools, including GCP (Cloud Storage, BigQuery, Dataflow), AWS (S3, Redshift, Glue, Lambda), and Docker.
  • DevOps: CI/CD pipelines, version control systems (Git), and coding best practices.

Scipher is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.