Today is October 1st, so why don't we do a bit of ML? I've heard it's all the rage.
I asked Perplexity for a couple of ideas and picked one of the typical ones. After all, every one of these startups does exactly the same thing engineering-wise, except that some do it badly and others do it worse; only the input, the model, and the output change.
Goal: self-host.
But even the bravest among us is afraid of himself.
Your project idea sounds innovative and practical! Here’s a breakdown of the proposed pipeline, along with some insights and suggestions for each step:
Project Pipeline Overview
Record an Audio or Video Conversation:
- Description: Capture audio or video conversations using a recording tool.
- Considerations: Ensure you have permission to record conversations, especially if they involve multiple participants. You could use libraries like pyaudio for audio or opencv for video recording in Python.
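Whatever tool captures the audio, the raw frames have to end up in a file that Whisper can read. Below is a minimal sketch that uses only the standard library `wave` module. It assumes 16-bit mono PCM at 16 kHz, which is what PyAudio's `stream.read()` returns when the stream is opened with `format=pyaudio.paInt16, channels=1, rate=16000`. The PyAudio side appears only in a comment. `save_wav` and `wav_duration_seconds` are hypothetical helpers, not part of any library.

```python
import wave

# Assumption: 16-bit signed PCM, i.e. 2 bytes per sample (pyaudio.paInt16).
SAMPLE_WIDTH_BYTES = 2

# Hypothetical capture loop with PyAudio (not executed here):
#   p = pyaudio.PyAudio()
#   stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
#                   input=True, frames_per_buffer=1024)
#   frames = [stream.read(1024) for _ in range(int(16000 / 1024 * seconds))]


def save_wav(path, frames, rate=16000, channels=1, sampwidth=SAMPLE_WIDTH_BYTES):
    """Write a list of raw PCM byte chunks to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def wav_duration_seconds(path):
    """Read the WAV header back and return the recording length in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


if __name__ == "__main__":
    import os
    import tempfile

    # Stand-in for one second of microphone input: 10 chunks of 1600 silent samples.
    fake_frames = [b"\x00\x00" * 1600 for _ in range(10)]
    path = os.path.join(tempfile.mkdtemp(), "conversation.wav")
    save_wav(path, fake_frames)
    print(wav_duration_seconds(path))  # 1.0
```

16 kHz mono is chosen here because it is the sample rate Whisper works with internally, so no resampling is needed later.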
Remove Silences from the Video:
- Description: Remove silent parts from the recorded video to create a more concise version.
- Tools: Use video editing libraries like moviepy or ffmpeg to analyze audio tracks and trim silences. This step will enhance the viewing experience by focusing on relevant content.
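ffmpeg's `silencedetect` audio filter can report silent ranges directly. To show what that kind of detection does, here is a plain-Python sketch of an energy-based detector:

1. Slice the samples into short frames.
2. Compute each frame's RMS level.
3. Keep the runs of frames above a threshold, merging pauses shorter than `min_silence_s` so normal gaps between words survive.

The threshold, frame length, and helper names are assumptions to tune per recording, not values taken from any library. The returned `(start, end)` ranges are the parts to keep when cutting the video with ffmpeg or moviepy.

```python
import array
import math
import sys
import wave


def read_mono_pcm16(path):
    """Load a mono 16-bit WAV file and return (rate, samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores PCM samples little-endian
    return rate, samples


def speech_intervals(samples, rate, frame_ms=30, threshold=500.0, min_silence_s=0.5):
    """Return (start, end) times in seconds of the non-silent parts of `samples`.

    threshold: RMS level below which a frame counts as silence (assumed value).
    min_silence_s: gaps shorter than this are merged into the surrounding speech.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))
    intervals = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            continue  # silent frame: drop it
        start, end = i / rate, (i + len(frame)) / rate
        if intervals and start - intervals[-1][1] < min_silence_s:
            # Short pause: extend the previous interval instead of cutting.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    # 1 s of "speech", 1 s of silence, 1 s of "speech" at a toy 1 kHz rate.
    demo = [1000] * 1000 + [0] * 1000 + [1000] * 1000
    print(speech_intervals(demo, rate=1000, frame_ms=100))
    # [(0.0, 1.0), (2.0, 3.0)]
```

A fixed RMS threshold is the simplest option. Noisy recordings usually need a threshold relative to the file's own noise floor, or a proper voice activity detector.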
Add Timestamped Captions with Whisper:
- Description: Utilize OpenAI's Whisper model to generate captions with timestamps from the audio track.
- Implementation: You can use the Whisper API or run the model locally if you have the resources. Ensure that the generated captions are synced accurately with the audio timestamps for better usability.
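In the `openai-whisper` Python package, `model.transcribe(...)` returns a dict whose `"segments"` list carries a `start` and `end` (in seconds) and the `text` for each segment. Here is a minimal sketch of turning those segments into an SRT caption file. `segments_to_srt` and `srt_timestamp` are hypothetical helper names, and the `whisper` CLI can also write SRT files by itself.

```python
# With openai-whisper installed (not executed here):
#   import whisper
#   result = whisper.load_model("base").transcribe("conversation.wav")
#   srt = segments_to_srt(result["segments"])


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)  # work in integer milliseconds to avoid float drift
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments ({"start", "end", "text"}) into SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates caption blocks with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 3661.5, "text": " A very long answer."},
    ]
    print(segments_to_srt(demo))
```

If Whisper runs on the silence-trimmed file, these timestamps refer to the trimmed video, not the original recording.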
Use the Captions to Create Summaries and Chapters, and Use the Timestamps to Jump Back to the Info:
- Description: Use the generated captions to create summaries and chapters, allowing users to navigate through the conversation easily.
- Implementation:
  - Summarization: Implement a summarization model (like BART or T5) to condense the conversation into key points.
  - Chapter Creation: Identify natural breaks or topics in the conversation to create chapters based on timestamps.
  - Search Functionality: Develop a search feature that allows users to find specific parts of the conversation using keywords, leveraging timestamps for quick navigation.
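Chapters and search only need the same timestamped segments. The sketch below assumes Whisper-style segment dicts:

- Chapters: a long pause (`min_gap_s`, an arbitrary 3 seconds) stands in as a crude proxy for a "natural break". Real topic detection would replace it.
- Search: a case-insensitive substring match that returns the start timestamps to seek to.

`split_chapters` and `search` are hypothetical names. Summarization is not shown, but each chapter's `text` could be passed to a model such as BART, for example through Hugging Face's `transformers.pipeline("summarization")`.

```python
def split_chapters(segments, min_gap_s=3.0):
    """Group Whisper-style segments into chapters, splitting on long pauses."""
    chapters = []
    for seg in segments:
        text = seg["text"].strip()
        if chapters and seg["start"] - chapters[-1]["end"] < min_gap_s:
            # Short gap: same chapter, so extend it.
            chapters[-1]["end"] = seg["end"]
            chapters[-1]["text"] += " " + text
        else:
            # Long gap (or first segment): start a new chapter.
            chapters.append({"start": seg["start"], "end": seg["end"], "text": text})
    return chapters


def search(segments, keyword):
    """Return the start timestamps (seconds) of segments mentioning `keyword`."""
    needle = keyword.casefold()
    return [seg["start"] for seg in segments if needle in seg["text"].casefold()]


if __name__ == "__main__":
    segs = [
        {"start": 0.0, "end": 4.0, "text": " Today we talk about Rust."},
        {"start": 4.5, "end": 9.0, "text": " Rust has ownership."},
        {"start": 15.0, "end": 20.0, "text": " Now, BitTorrent."},
    ]
    for ch in split_chapters(segs):
        print(ch["start"], ch["text"])
    print(search(segs, "rust"))  # [0.0, 4.5]
```

Returning start timestamps keeps the "jump back to the info" part trivial: the UI only has to seek the player to that second.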
Additional Suggestions
- User Interface: Consider building a user-friendly interface where users can upload videos, view summaries, and navigate through chapters easily. A web application using frameworks like Flask or Django could work well.
- Multi-language Support: If your target audience is diverse, consider adding multi-language support for captions and summaries.
- Data Privacy: Ensure that you comply with data privacy regulations when handling audio/video recordings, especially if they contain personal information.
Testing and Feedback:
In previous chapters we built other things:
- The Ugly Org - a DB in Rust for my friend George Hotz and comma.ai
  - a better DB than geohotz's, in Rust
  - https://github.com/vrnvu/rust-minikeyvalue/
- the ugly org - BitTorrent in no time
  - a fully functional BitTorrent client and tracker
  - https://github.com/vrnvu/rust-bittorrent
- the ugly organization - lifts front-end
  - React and Tailwind
  - https://github.com/vrnvu/lifts/
- the ugly organization - we write ugly code
  - setting up an AWS cloud from scratch with Terraform, cron projects with CI/CD, managing security...
  - https://github.com/orgs/uglyorganization/repositories
- and a few others that turned out to be too simple, too easy... those stay in review.