A research team at Seoul National University of Science and Technology has unveiled 'Bllossom-V 3.1', the first Korean-specialized large multimodal model (LMM), on HuggingFace.
This model, capable of processing both text and images, supports Korean and English, and was developed based on high-quality data built through a project funded by the Ministry of Science and ICT.
In particular, the team used Korean-English parallel visual corpus data to strengthen Korean-language performance, and a hierarchical connection method enables stable support for both languages.
Professor Kyung-Tae Lim's Multimodal Language Processing (MLP) Lab at Seoul National University of Science and Technology (SeoulTech) announced on September 4th that it had released 'Bllossom-V 3.1', the first Korean-specialized vision-language model, on HuggingFace.
The model was built by giving 'Bllossom', a language model jointly developed by SeoulTech and TeddySum, additional training for image processing. It supports both Korean and English and can handle images as well as text. The release of Bllossom-V 3.1 is significant as the first Korean-specialized LMM showcased on HuggingFace.
The data that played a key role in developing Bllossom-V 3.1 was produced through the 'Document Generation and Information Retrieval Data' project, hosted by the Ministry of Science and ICT and managed by the National Information Society Agency (NIA). Media Group Saramgwasup (hereafter Saramgwasup), a company specializing in multimodal data, served as the project's overall manager and built the high-quality, specialized data together with EuclidSoft.
Bllossom-V 3.1 also completed large-scale Korean and English pre-training using the Layer Aligning method jointly developed by SeoulTech and TeddySum, enabling stable support for both languages. It has drawn praise for significantly improving Korean performance without sacrificing English performance by applying the MVIF Korean-English parallel visual corpus data built directly by the research team. The massive computing resources required for pre-training the vision-language model were supported by the Artificial Intelligence Industry Convergence Business Unit (AICA).
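The release does not spell out how the Layer Aligning method works. Purely as an illustration of the general idea of connecting a vision encoder to a language model through a layer-wise alignment, the sketch below shows one common pattern; the class name, dimensions, and weighted-mixing scheme are assumptions for illustration, not the actual Bllossom-V design.

```python
import torch
import torch.nn as nn

class LayerAlignedConnector(nn.Module):
    """Hypothetical sketch: project hidden states from several vision-encoder
    layers into the language model's embedding space and merge them, loosely
    illustrating a layer-wise alignment between the two modalities."""

    def __init__(self, vision_dim: int, llm_dim: int, num_tap_layers: int):
        super().__init__()
        # One projection per tapped vision-encoder layer.
        self.projections = nn.ModuleList(
            [nn.Linear(vision_dim, llm_dim) for _ in range(num_tap_layers)]
        )
        # Learned weights for mixing the aligned layer features.
        self.layer_weights = nn.Parameter(torch.zeros(num_tap_layers))

    def forward(self, tapped_states: list[torch.Tensor]) -> torch.Tensor:
        # tapped_states: one (batch, patches, vision_dim) tensor per tapped layer.
        aligned = torch.stack(
            [proj(h) for proj, h in zip(self.projections, tapped_states)], dim=0
        )
        weights = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        # Weighted sum -> (batch, patches, llm_dim) visual tokens for the LLM.
        return (weights * aligned).sum(dim=0)
```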
Yoon-Ki Han, CEO of Saramgwasup, who oversaw the construction of the model's training data, said, “I am very proud to have contributed to the creation of the first Korean-English vision-language open-source model through the construction of high-quality data.” He added, “We will continue to contribute to the creation of open data that can be used in various ways.”
The Bllossom-V 3.1 model is available on HuggingFace.
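For readers who want to try the model, the following is a minimal sketch of how a vision-language checkpoint is typically loaded from HuggingFace with the transformers library. The repository id is a placeholder, since the release text does not include the exact link, and the loading classes assume a standard vision-to-text checkpoint.

```python
# Minimal usage sketch for a HuggingFace vision-language model.
# The repo id below is a hypothetical placeholder; substitute the actual
# Bllossom-V 3.1 repository name from the HuggingFace hub.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

repo_id = "org-name/Bllossom-V-3.1"  # placeholder, not the real repo id
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForVision2Seq.from_pretrained(repo_id)

image = Image.open("document.png")
prompt = "이 문서의 내용을 요약해 주세요."  # "Please summarize this document."
inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```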
About Media Group Saramgwasup
Media Group Saramgwasup, founded by creators with distinctive artistic sensibilities, has a strong track record in digitalization fields built on visual data, artificial intelligence (AI), big data, autonomous driving, and virtual reality (VR). On this foundation, the company is pursuing entry into global markets through diverse business expansion, including in-house technology and service development. Placing people at the heart of its culture and business, with data as the bridge, it is growing into a company that fosters growth and happiness for both individuals and organizations.