AI Chatbots: Researchers Develop an Innovative Method to Prevent Toxic Responses

April 13, 2024, by Hector Russo

MIT researchers have developed a new, more efficient method for keeping AI chatbots from producing toxic responses, using a machine learning model to automate safety testing.

Traditionally, keeping chatbot responses safe and appropriate has relied on a process known as "red teaming."

In this process, human testers deliberately try to provoke harmful responses from AI systems. However, because the possible interactions are so complex and varied, traditional methods have shown limitations.

AI chatbots and red teaming (image: DALL-E)

A New Approach to Improving AI Chatbot Safety

A team from MIT's Improbable AI Lab and the MIT-IBM Watson AI Lab has led a new approach that uses machine learning to make these tests more effective.

In traditional red teaming, human testers write prompts designed to trigger toxic responses from AI chatbots, that is, hateful or harmful output. However, given the sheer variety of possible toxic outputs, anticipating and testing every toxic prompt is nearly impossible. Although indispensable, traditional red teaming faces significant challenges in scale, effectiveness, and resource use.

The researchers have developed a machine learning model that automates the generation of red-teaming prompts. The model uses curiosity-driven exploration techniques to generate a wide range of prompts that elicit toxic responses more effectively.

Novelty and diversity in the prompts are rewarded, which pushes the model to explore and generate new prompts rather than repeat known ones.
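To make the idea of rewarding novelty concrete, here is a minimal sketch of a reward that adds a novelty bonus to a toxicity score. It is an illustration only, not the researchers' implementation: the function names, the word-overlap (Jaccard) similarity, the `novelty_weight` parameter, and the `toy_toxicity` stand-in classifier are all assumptions made for this example.

```python
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased, whitespace-separated word set (a crude text representation)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(prompt: str, history: List[str]) -> float:
    """1.0 for a prompt sharing no words with past prompts, 0.0 for a repeat."""
    if not history:
        return 1.0
    p = tokens(prompt)
    return 1.0 - max(jaccard(p, tokens(h)) for h in history)


def curiosity_reward(
    prompt: str,
    response: str,
    history: List[str],
    toxicity_score: Callable[[str], float],
    novelty_weight: float = 0.5,  # assumed weighting, not taken from the article
) -> float:
    """Toxicity of the elicited response plus a bonus for an unfamiliar prompt."""
    return toxicity_score(response) + novelty_weight * novelty(prompt, history)


def toy_toxicity(response: str) -> float:
    # Hypothetical stand-in for a real toxicity classifier.
    return 1.0 if "insult" in response.lower() else 0.0


history = ["tell me a rude joke"]
repeat = curiosity_reward("tell me a rude joke", "an insult", history, toy_toxicity)
fresh = curiosity_reward("write an angry product review", "an insult", history, toy_toxicity)
print(repeat, fresh)  # the unfamiliar prompt scores higher than the repeated one
```

With both prompts eliciting an equally toxic response, the repeated prompt earns only the toxicity score, while the unfamiliar one also collects the novelty bonus. That difference is what would steer a prompt generator away from reusing known attacks.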

Benefits over Traditional Methods

The new method has several advantages, including a greater diversity of tested prompts and greater efficiency, which allows AI models to be updated and improved more often. It has also proven more effective than other machine learning approaches and human testers at identifying potentially toxic chatbot responses.

Practical Applications and the Future

The success of this approach has significant implications for deploying safer, more reliable AI systems. Future research will aim to expand the types of prompts the model can generate and to explore integrating company-specific policies or social norms into the training process.

This advance is a significant step toward AI safety and shows a continuing commitment to the ethical development of the technology.

The research was funded by a combination of academic and corporate partnerships, reflecting the growing importance of AI safety in both the public and private sectors.


About the Author

Hector Russo

He has lived in Dallas, Texas, for 32 years and has worked in information technology since long before that. Ivy Worldwide once included him in its list of the Top 25 technology influencers. He is currently also an IT manager at a major energy company and a member of the panel that selects the best vehicles of the year for the U.S. Hispanic market through the Hispanic Motor Press Awards.
