OUR HIRING PROCESS:

We will review your application against our job requirements. We do not employ machine learning technologies during this phase as we believe every human deserves attention from another human. We do not think machines can evaluate your application quite like our seasoned recruiting professionals; every person is unique. We promise to give your candidacy a fair and detailed assessment.

We may then invite you to submit a video interview for the review of the hiring manager. This video interview is often followed by a test or short project that allows us to determine whether you will be a good fit for the team.

At this point, we will invite you to interview with our hiring manager and/or the interview team. Please note: We do not conduct interviews via text message, Telegram, etc., and we never hire anyone into our organization without having met you face-to-face (or via Zoom). You will be invited to come to a live meeting or Zoom, where you will meet our INFUSE team.

From there on, it's decision time! If you are still excited to join INFUSE and we like you as much, we will have a conversation about your offer. We do not make offers without giving you the opportunity to speak with us live.

INFUSE is committed to complying with applicable data privacy and security laws and regulations.
For more information, please see our Privacy Policy.

INKHUB is ingesting 10 million raw PDFs to build the internet's richest catalog of marketing-grade B2B content: tagged, summarized, and searchable by topic, company, or intent.

We're looking for an applied ML engineer to own the semantic ingestion pipeline, from raw PDFs to tagged, summarized, and embedded assets.

What You'll Do

Own the ETL pipeline from raw PDFs (S3-ingested) to structured resources
Finalize our summarization + classification flow using open-source models with GPT-4o fallback
Apply filtering logic (3 years old, 100 pages, etc.) to enforce resource quality
Map each asset to the specific topic taxonomy (10+ per topic across ~9,000 topics)
Generate dense embeddings using sentence-transformers
Load and query embeddings using Milvus or pgvector
Implement freshness logic to identify and index only new or updated content based on file diffing, crawl timestamp, or document hash
Build a QA/eval harness: format compliance, recall@5, drift monitoring
Expose /v1/semantic-search via FastAPI, with filtering and rank fusion
Collaborate closely with our Tech Lead on UX integration and snippet generation

Your Toolbox

Python, PyTorch, sentence-transformers, OpenAI APIs, or similar pretrained LLMs
FastAPI, Milvus or pgvector, PyPDF/Tika, Airflow or Lambda for orchestration
Docker, GPU scheduling, Athena/Redshift SQL

You Might Be a Fit If...

You've built ML pipelines that touched real users, not just notebooks
You've worked on semantic search, embeddings, or large-scale tagging
You've wrestled with unstructured data and love turning chaos into clarity
You like working fast, iterating with feedback, and tracking metrics that matter

Why This Role Matters

Your models decide what gets found, how it's tagged, and which content and companies stand out. You'll help define what relevance and freshness mean for over a million resources and 50,000+ company pages, and make sure INKHUB stays ahead of the curve.


INFUSE

Semantic Backend Engineer (Contract, Remote)

3 weeks ago   IT & Telecoms   Vilnius Full-time   €