Cloudera's four-day workshop covers data science and machine learning workflows at scale using
Apache Spark 2 and other key components of the Hadoop ecosystem. It emphasizes
the use of data science and machine learning methods to address real-world business challenges.
Using scenarios and datasets from a fictional technology company, students discover insights to
support critical business decisions and develop data products to transform the business. The
material is presented through a sequence of brief lectures, interactive demonstrations, extensive
hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are
conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science
Workbench (CDSW) environment.
This workshop is designed for data scientists who currently use Python or R to work with smaller
datasets on a single machine and who need to scale up their analyses and machine learning models
to large datasets on distributed clusters. Data engineers and developers with some knowledge of
data science and machine learning may also find this workshop useful.
Workshop participants should have a basic understanding of Python or R and some experience
exploring and analyzing data and developing statistical or machine learning models. Knowledge
of Hadoop or Spark is not required.
None. Laptops will be provided.
Upon completion of the training, you will receive a Training Certificate of Completion.
SoftSource Solutions Conference Room
10 Ubi Crescent #04-62/63 Ubi TechPark (Lobby D) Singapore 408564