Machine Learning over
Large Volumes of Data

Class 7

Pablo Ariel Duboue, PhD

Universidad Nacional de Córdoba,
Facultad de Matemática, Astronomía y Física
figure escudo.png

1 Seventh Class: Statistical Clustering

1.1 Previous class

Questions
Reminder
K-Means review
K-Means, graphically
figure K_Means_Example_Step_1.png figure K_Means_Example_Step_2.png figure K_Means_Example_Step_3.png figure K_Means_Example_Step_4.png
(Wikipedia)
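The four K-Means steps pictured above (pick initial centroids, assign each point to the nearest centroid, recompute the centroids, repeat) can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and random initialization from the data; the function name `kmeans` and its parameters are hypothetical, not taken from the slides.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal K-Means sketch over a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the centroids are stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment
```

Calling `kmeans(points, 2)` returns the final centroids and, for each point, the index of the cluster it was assigned to.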

1.2 Big Data

What is Big Data?
The value is in the data
Computers as humanizers
The democratization of computing
The unexpected results of abundance
Data Science
Big Data concepts
Steps of the Big Data process
  1. Data acquisition
  2. Data cleaning
  3. Data analysis
  4. Use in prediction
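The four steps can be illustrated end to end on a toy scale. The sketch below is only an assumption about what each step might look like: the data in `RAW` is made up for illustration, the function names are hypothetical, and a least-squares line stands in for whatever analysis a real project would use.

```python
import csv
import io

# Made-up toy data (not from the slides); two rows are deliberately dirty.
RAW = """x,y
1,2.1
2,3.9
3,
4,8.2
five,10.0
5,9.8
"""


def acquire(text):
    """Step 1: read raw records, here from an in-memory CSV string."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: keep only the rows whose two fields parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["x"]), float(row["y"])))
        except (TypeError, ValueError):
            continue  # Drop rows with missing or non-numeric values.
    return cleaned


def analyze(pairs):
    """Step 3: fit y = a*x + b by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict(model, x):
    """Step 4: use the fitted model on an unseen input."""
    a, b = model
    return a * x + b


model = analyze(clean(acquire(RAW)))
print(predict(model, 6))
```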

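1.3 EM Algorithm

Bayes revisited
\[ h_{ML} = \arg\max_{h \in H} p(D \mid h) \]
\[ h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i \mid h) \]
\[ h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(d_i - \mu)^2} \]
ML estimator
\[
\begin{aligned}
h_{ML} &= \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(d_i - \mu)^2} \\
       &= \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(d_i - h(x_i))^2} \\
       &= \arg\min_{h \in H} \sum_{i=1}^{m} (d_i - h(x_i))^2
\end{aligned}
\]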
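The last step, from the Gaussian likelihood to the sum of squared errors, passes through the logarithm. A sketch of the intermediate steps, assuming independent noise with a fixed variance σ² and using the fact that the logarithm is monotonic, so it does not change the maximizer:

\[
\begin{aligned}
h_{ML} &= \arg\max_{h \in H} \sum_{i=1}^{m} \left[ \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2}(d_i - h(x_i))^2 \right] \\
       &= \arg\max_{h \in H} \sum_{i=1}^{m} -\frac{1}{2\sigma^2}(d_i - h(x_i))^2 \\
       &= \arg\min_{h \in H} \sum_{i=1}^{m} (d_i - h(x_i))^2
\end{aligned}
\]

The first term inside the sum does not depend on h and can be dropped; removing the negative constant factor turns the maximization into a minimization.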
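EM example
\[ \mu_{ML} = \arg\min_{\mu} \sum_{i=1}^{m} (x_i - \mu)^2 \]
Computing the E[z_{ij}]
\[ E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{2} p(x = x_i \mid \mu = \mu_n)} = \frac{e^{-\frac{1}{2\sigma^2}(x_i - \mu_j)^2}}{\sum_{n=1}^{2} e^{-\frac{1}{2\sigma^2}(x_i - \mu_n)^2}} \]
Computing the \mu_j
\[ \mu_j = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]} \]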
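The two updates above alternate until the means stop moving. The following is a minimal sketch, assuming a mixture of Gaussians with a known, shared σ and equal mixing weights, so that only the means are estimated; the name `em_means` and its parameters are hypothetical. It does not guard against numerical underflow when a point is very far from every mean.

```python
import math


def em_means(xs, initial_means, sigma=1.0, iterations=50):
    """EM sketch for the means of a Gaussian mixture with known, shared sigma."""
    mu = list(initial_means)
    for _ in range(iterations):
        # E step: E[z_ij] is proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)).
        expectations = []
        for x in xs:
            weights = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in mu]
            total = sum(weights)
            expectations.append([w / total for w in weights])
        # M step: each mu_j becomes the E[z_ij]-weighted mean of the x_i.
        mu = [
            sum(e[j] * x for e, x in zip(expectations, xs))
            / sum(e[j] for e in expectations)
            for j in range(len(mu))
        ]
    return mu


# Illustrative values only: two groups of points around 0 and around 5.
print(em_means([-1.0, 0.0, 1.0, 4.0, 5.0, 6.0], [0.5, 4.5]))
```

Unlike K-Means, every point contributes to every mean, weighted by E[z_ij], so the assignment is soft rather than hard.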

1.4 Thoughtland

Thoughtland
Thoughtland example
figure website.png
Input
@relation auto_mpg
@attribute mpg numeric
@attribute cylinders numeric
@attribute displacement numeric
@attribute horsepower numeric
@attribute weight numeric
@attribute acceleration numeric
@attribute modelyear numeric
@attribute origin numeric
@data
18.0,8,307.0,130.0,3504.,12.0,70,1
14.0,8,455.0,225.0,3086.,10.0,70,1
24.0,4,113.0,95.00,2372.,15.0,70,3
22.0,6,198.0,95.00,2833.,15.5,70,1
...
+400 more instances
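The input above is in ARFF format: a `@relation` name, one `@attribute` line per column, and comma-separated rows after `@data`. As a rough sketch of how such a file can be read, the hypothetical helper below handles only the all-numeric case shown here; it does not support nominal attributes, quoted names, or missing values marked with `?`, and it is not how Thoughtland itself loads data.

```python
SAMPLE = """@relation auto_mpg
@attribute mpg numeric
@attribute cylinders numeric
@attribute displacement numeric
@attribute horsepower numeric
@attribute weight numeric
@attribute acceleration numeric
@attribute modelyear numeric
@attribute origin numeric
@data
18.0,8,307.0,130.0,3504.,12.0,70,1
14.0,8,455.0,225.0,3086.,10.0,70,1
24.0,4,113.0,95.00,2372.,15.0,70,3
22.0,6,198.0,95.00,2833.,15.5,70,1
"""


def parse_numeric_arff(text):
    """Parse a small ARFF document whose attributes are all numeric."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # Skip blank lines and ARFF comments.
        lower = line.lower()
        if in_data:
            rows.append([float(value) for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.lower() not in ("numeric", "real", "integer"):
                raise ValueError(f"unsupported attribute type: {kind}")
            attributes.append(name)
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_numeric_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```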
Output
There are four components and eight dimensions. Components One, Two and Three are small. Components One, Two and Three are very dense. Components Four, Three and One are all far from each other. The rest are all at a good distance from each other.
There are four components and eight dimensions. Components One, Two and Three are small. Components One, Two and Three are very dense. Components Four and Three are far from each other. The rest are all at a good distance from each other.
There are five components and eight dimensions. Components One, Two and Three are small and Component Four is giant. Components One, Two and Three are very dense. Components One and Four are at a good distance from each other. Components Two and Three are also at a good distance from each other. Components Two and Five are also at a good distance from each other. The rest are all far from each other.
Architecture
figure architecture.png
Machine Learning
figure points_j48.png
Clustering