Voice-Vision
A Streamlit-based web app that accepts speech and image inputs in a single interface. Voice transcripts are analyzed alongside vision-model output, so users can narrate a scenario, upload a photo, and receive a combined analysis in real time.
- Real-time speech capture, transcription, and NLP post-processing.
- Image ingestion pipeline that runs uploaded images through a vision model loaded from downloaded weights.
- Unified Streamlit UI that fuses voice insights with visual analysis.
- Automated model-weight downloader to keep local artifacts in sync.
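
To illustrate the last feature, here is a minimal sketch of how a weight downloader could keep local artifacts in sync. It is not the project's actual implementation: the manifest layout (file name mapped to `url` and `sha256`), the `sync_weights` and `fetch_url` names, and the use of SHA-256 checksums are all assumptions made for the example.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical manifest shape: {"file name": {"url": ..., "sha256": ...}}
Manifest = Dict[str, Dict[str, str]]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_url(url: str) -> bytes:
    """Default fetcher: download the whole payload into memory."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def sync_weights(
    manifest: Manifest,
    target_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[str]:
    """Download every manifest entry whose local copy is missing or stale.

    Returns the names of the files that were downloaded or replaced.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    updated: List[str] = []
    for name, entry in manifest.items():
        path = target_dir / name
        if path.exists() and sha256_of(path) == entry["sha256"]:
            continue  # local artifact already matches the manifest
        data = fetch(entry["url"])
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            raise ValueError(f"checksum mismatch for {name}")
        path.write_bytes(data)
        updated.append(name)
    return updated
```

Calling `sync_weights(manifest, Path("weights"))` at app startup would skip files that already match and fetch only missing or changed ones. The `fetch` parameter is injectable so the logic can be exercised without network access; large weight files would likely call for streaming to disk rather than reading the payload into memory.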