Tomofun, the company behind the Furbo Pet Camera, reduced its deployment costs by 83% by migrating its pet behavior detection model from GPU-based instances to AWS Inferentia2-powered EC2 Inf2 instances, achieving high throughput for real-time inference without altering the core logic of its existing model. The migration relied on lightweight wrapper classes that preserved the model's existing interface while letting inference run on Inferentia2, so Tomofun could capture the chip's cost efficiency and performance at scale with minimal code changes.
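The wrapper-class approach can be sketched as follows. This is a minimal illustration of the pattern, not Tomofun's actual code: the class and function names (`DetectionModel`, `NeuronWrapper`, `compile_for_inf2`) are hypothetical, and the compile step is a placeholder for a hardware-specific compiler such as the one in the AWS Neuron SDK.

```python
class DetectionModel:
    """Stand-in for the original GPU-era model; its logic is unchanged."""
    def predict(self, frame):
        # Original inference logic would run here.
        return {"label": "bark", "score": 0.97}


class NeuronWrapper:
    """Lightweight wrapper that keeps the original predict() interface
    while delegating execution to a compiled backend, so calling code
    does not need to change when the hardware does."""
    def __init__(self, compiled_model):
        self._compiled = compiled_model

    def predict(self, frame):
        # Callers see the same interface; only the backend differs.
        return self._compiled(frame)


def compile_for_inf2(model):
    """Placeholder for a hardware-specific compilation step (e.g. tracing
    the model with a Neuron-aware compiler). Here it simply forwards calls."""
    return lambda frame: model.predict(frame)


# The serving code is unaware that the backend changed.
model = NeuronWrapper(compile_for_inf2(DetectionModel()))
result = model.predict(frame=None)
```

The value of the pattern is that the swap is confined to construction time: everything downstream keeps calling `predict()` exactly as before, which is how the migration avoided touching the model's core logic.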
For teams in enterprise AI and SaaS, the key takeaway is that purpose-built AI chips like Inferentia2 can cut inference costs dramatically without compromising performance. For your own projects, consider assessing whether specialized AI infrastructure offers similar cost and scalability benefits for large-scale, real-time inference workloads.