1. transformers
A powerful Python library from Hugging Face that makes natural language processing (NLP) tasks easy to perform.
It provides a wide range of pretrained models, along with features that make those models easy and efficient to use.
Models can be loaded and used through a simple API.
2. Pipeline
Pipeline is a high-level API provided by the Hugging Face transformers library.
It combines a pretrained model with its tokenizer so that a specific NLP task can be performed quickly.
NLP tasks can be run in just a few lines of code, with no complex setup.
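Under the hood, a pipeline chains three steps: tokenize the input, run the model to get raw logits, and post-process the logits into a readable result. As a rough sketch of the post-processing step for a classification pipeline (the `softmax` helper and the logit values below are made up for illustration, not taken from the library):

```python
import math

def softmax(logits):
    # turn raw model logits into probabilities that sum to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A classification pipeline roughly does:
#   1. tokenizer(text)        -> input IDs
#   2. model(**inputs).logits -> raw scores, e.g. [-3.1, 3.4] (hypothetical)
#   3. softmax + argmax       -> label and score
logits = [-3.1, 3.4]          # hypothetical [NEGATIVE, POSITIVE] logits
probs = softmax(logits)
label = ["NEGATIVE", "POSITIVE"][probs.index(max(probs))]
print(label, round(max(probs), 4))
```

This is why pipeline results come back as a label plus a score between 0 and 1.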
2-1. Text Generation (text-generation)
Automatically generates text based on a given prompt.
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
print(generator("Once upon a time", max_length=50))
# Output
[{'generated_text': 'Once upon a time the gods were not to be seen with fear, but to be considered as having been called, for in the past the deities of old, and even as old were not believed to have ascended to heaven. The same gods and goddess'}]
2-2. Text Classification (text-classification)
Classifies text into categories, for example by sentiment.
from transformers import pipeline
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
output = classifier("I love this movie!")
print(output)
# Output
[{'label': 'POSITIVE', 'score': 0.9998775720596313}]
2-3. Question Answering (question-answering)
Extracts the answer to a user's question from a given context.
from transformers import pipeline
qa = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")
context = "Hugging Face is creating amazing tools for NLP."
answer = qa(question="What is Hugging Face creating?", context=context)
print(answer)
# Output
{'score': 0.7493830323219299, 'start': 25, 'end': 38, 'answer': 'amazing tools'}
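The `start` and `end` fields in the result are character offsets into the context string, so the answer text can be recovered by slicing:

```python
context = "Hugging Face is creating amazing tools for NLP."
# offsets taken from the result above
start, end = 25, 38
answer = context[start:end]
print(answer)  # amazing tools
```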
2-4. Summarization (summarization)
Condenses long text into a short summary.
from transformers import pipeline
summarizer = pipeline("summarization")
text = (
    "Artificial intelligence has rapidly evolved over the past few decades, transforming "
    "numerous industries by enabling machines to perform tasks that were once considered exclusive "
    "to human beings, such as understanding natural language, recognizing images, and making decisions "
    "based on data. With advancements in machine learning algorithms, deep learning models, and the "
    "availability of massive datasets, AI has made significant strides in areas like healthcare, "
    "finance, autonomous vehicles, and entertainment, bringing both immense potential for innovation "
    "and challenges regarding ethics, privacy, and job displacement."
)
summary = summarizer(text)
print(summary)
# Output
[{'summary_text': 'AI has made significant strides in areas like healthcare, finance,
autonomous vehicles, and entertainment . AI is able to perform tasks that were once considered
exclusive to human beings, such as understanding natural language .'}]
2-5. Translation (translation)
Translates text from one language to another.
from transformers import pipeline
translator = pipeline("translation_en_to_fr")
translated_text = translator("Hello, how are you?")
print(translated_text)
# Output
[{'translation_text': 'Bonjour, comment êtes-vous?'}]
2-6. Named Entity Recognition (ner)
Recognizes and extracts named entities such as people, places, and organizations from text.
from transformers import pipeline
ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
result = ner("Hugging Face Inc. is based in New York City.")
print(result)
# Output
[{'entity': 'I-ORG', 'score': 0.999185, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2},
 {'entity': 'I-ORG', 'score': 0.98826337, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7},
 {'entity': 'I-ORG', 'score': 0.99680424, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12},
 {'entity': 'I-ORG', 'score': 0.9992124, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16},
 {'entity': 'I-LOC', 'score': 0.99912935, 'index': 9, 'word': 'New', 'start': 30, 'end': 33},
 {'entity': 'I-LOC', 'score': 0.9991788, 'index': 10, 'word': 'York', 'start': 34, 'end': 38},
 {'entity': 'I-LOC', 'score': 0.99941075, 'index': 11, 'word': 'City', 'start': 39, 'end': 43}]
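Notice that 'Hugging' comes back split into the WordPiece subtokens 'Hu' and '##gging', each tagged separately. Passing `aggregation_strategy="simple"` to the ner pipeline merges such pieces into whole entities; the merging idea itself looks roughly like this sketch (`merge_subwords` is a hypothetical helper, not part of the library):

```python
def merge_subwords(tokens):
    # glue WordPiece continuation pieces (prefixed with '##')
    # back onto the previous token
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return words

print(merge_subwords(["Hu", "##gging", "Face", "Inc"]))
# ['Hugging', 'Face', 'Inc']
```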
2-7. Text Generation (Conversational Chatbot) (conversational)
The model responds to a conversation based on the user's input.
from transformers import pipeline
conversation = pipeline("text-generation", model="microsoft/DialoGPT-medium")
# Provide the dialogue history and prompt a reply
dialogue_history = "User: Hello, who are you?\nBot: My name is chatbot\nUser: What is your name?\nBot:"
response = conversation(dialogue_history, max_length=100)
print(response[0]['generated_text'])
# Output
User: Hello, who are you?
Bot: My name is chatbot
User: What is your name?
Bot: My name is chatbot
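For multi-turn chat, each new exchange is appended to the history string and the whole thing is fed back to the model. A tiny helper for building that history (`add_turn` is a hypothetical name, shown only to illustrate the pattern):

```python
def add_turn(history, speaker, text):
    # append one turn to the plain-text dialogue history
    return history + f"{speaker}: {text}\n"

history = ""
history = add_turn(history, "User", "Hello, who are you?")
history = add_turn(history, "Bot", "My name is chatbot")
# pass `history` back to the generator for the next reply
print(history, end="")
```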
2-8. Sentiment Analysis (sentiment-analysis)
Classifies the sentiment of text as positive or negative.
from transformers import pipeline
sentiment_analyzer = pipeline("sentiment-analysis")
result = sentiment_analyzer("I am so happy with this product!")
print(result)
# Output
[{'label': 'POSITIVE', 'score': 0.9998794794082642}]
2-9. fill-mask
Fills in a masked token: given a sentence with some words masked out, the model predicts the word that best fits each masked position.
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
# Predict the masked word in the sentence
result = fill_mask("The capital of France is [MASK].")
print(result)
# Output
[{'score': 0.41678884625434875, 'token': 3000, 'token_str': 'paris', 'sequence': 'the capital of france is paris.'},
 {'score': 0.07141605019569397, 'token': 22479, 'token_str': 'lille', 'sequence': 'the capital of france is lille.'},
 {'score': 0.06339244544506073, 'token': 10241, 'token_str': 'lyon', 'sequence': 'the capital of france is lyon.'},
 {'score': 0.04444721341133118, 'token': 16766, 'token_str': 'marseille', 'sequence': 'the capital of france is marseille.'},
 {'score': 0.030297040939331055, 'token': 7562, 'token_str': 'tours', 'sequence': 'the capital of france is tours.'}]
2-10. feature-extraction
Extracts feature vectors from input text
(used to vectorize sentences into high-dimensional representations of the text's features).
from transformers import pipeline
feature_extractor = pipeline("feature-extraction", model="bert-base-uncased")
# Extract feature vectors from the text
text = "The quick brown fox jumps over the lazy dog"
features = feature_extractor(text)
print(features)
# Output (truncated) - a nested list of shape [1, num_tokens, 768], one 768-dimensional vector per token
[[[-0.28680315613746643, 0.19666296243667603, -0.030298739671707153, ...]]]
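The pipeline returns one vector per token (hidden size 768 for bert-base-uncased). A common way to get a single fixed-size sentence vector is to average the token vectors (mean pooling). A minimal sketch, using a dummy 3-token, 4-dimensional output standing in for `features[0]` (`mean_pool` is a hypothetical helper):

```python
def mean_pool(token_vectors):
    # average token embeddings into one fixed-size sentence vector
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

# dummy stand-in for features[0] (3 tokens, 4 dimensions)
dummy = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
print(mean_pool(dummy))
# [2.0, 2.0, 2.0, 2.0]
```

The resulting vector can then be used for similarity search, clustering, or as input to a downstream classifier.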
2-11. zero-shot-classification
Performs a new classification task using a pretrained model.
Lets the model predict the class (category) of text for a new, unlabeled classification task without any additional training.
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# Define the sentence and the candidate classes
sequence = "I had a great time at the concert last night."
candidate_labels = ["positive", "negative", "neutral"]
# Classify the sentence against the given classes
result = classifier(sequence, candidate_labels)
print(result)
# Output
{'sequence': 'I had a great time at the concert last night.',
'labels': ['positive', 'neutral', 'negative'],
'scores': [0.8989167809486389, 0.06067398935556412, 0.04040921851992607]}
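Zero-shot classification works by treating the input as an NLI premise and each candidate label as a hypothesis like "This example is positive.", then normalizing the per-label entailment scores, which is why the scores above sum to roughly 1. A sketch of that final normalization step (`normalize_scores` and the logit values are made up for illustration):

```python
import math

def normalize_scores(entailment_logits):
    # softmax over per-label entailment logits (single-label mode)
    m = max(entailment_logits.values())
    exps = {k: math.exp(v - m) for k, v in entailment_logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# hypothetical entailment logits, one per candidate label
scores = normalize_scores({"positive": 3.2, "neutral": 0.5, "negative": 0.1})
best = max(scores, key=scores.get)
print(best)  # positive
```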