Oct 2020 / Data analysis workflows with R and Python

News

  • Homework is posted (if you are interested in credits). See “Material” below.

Part of the Scientific Computing in Practice lecture series at Aalto University.

Audience: Researchers who are using or will soon be using R and Python for data analysis, who know how to program in these languages, but do not necessarily know the best practices for data analysis. The course material is available in both R and Python, but this is not a course on the basics of scientific programming. If you wish to brush up on your scientific programming skills, we recommend our Sept 2020 / Python for Scientific Computing course.

About the course: We provide a practical introduction and advice for data analysis in R and Python. We will learn how to organize your workflow, how to organize your data for efficient analysis, how to obtain statistics and fit models to your data, and how to think about scaling your workflow.

The course is suited for people who are just starting out with data analysis and would like to build a good workflow from the beginning.

The exercises can be done with either R or Python.

The course consists of four three-hour sessions held online via Zoom. In these sessions we’ll learn the core concepts of data analysis. We will do exercises during the sessions (available via the course’s GitHub repository).

Time, date (Europe/Helsinki timezone): (convert 12:00 to your timezone)

  • Fri 2.10., 13:00-15:00 Installation help session

  • Mon 5.10., 12:00-15:00 (Understanding data analysis workflows)

  • Wed 7.10., 12:00-15:00 (Data preparation)

  • Mon 12.10., 12:00-15:00 (Modeling)

  • Wed 14.10., 12:00-15:00 (Modeling)

  • Fri 16.10., 12:00-15:00 (Scaling your analysis)

  • Please connect to all sessions 10 minutes early: icebreakers and introductions already start then.

Practical information

Place: This is an online course via Zoom (link sent to registered participants). The course is also streamed via Twitch (the CodeRefinery channel) so that anyone may follow along without registration. There is a HackMD link (collaboratively edited notes), which is used for asking questions during the course. The actual material is here.

Cost: Free of charge for FGCI consortium members including Aalto employees and students.

Registration: Registration is open (note: the Zoom meeting is full, so further registrants will only receive information about the stream).

Instructors and organizers:

  • Simo Tuomisto, M. Sc., Aalto Scientific Computing / Department of Computer Science

  • Richard Darst, Aalto Scientific Computing (coordination)

Credits: Credits are available for Aalto students, and a course certificate can be provided on request for others. Full course hours correspond roughly to 1 ECTS. Students who wish to get a certificate should hand in the special assignment and participate in at least 3 of 4 lectures.

Instructions on how to obtain the special assignment can be found in the Material section.

Preparation

Software installation is done via conda. Installation instructions are provided here.

There will be an installation help session on Fri 2.10. between 13:00-15:00. There we can help you install the required software.

Additional course info at: scip -at- aalto.fi


Prerequisites include basic programming in Python or R.

Preparation: Online workshops can be a productive format, but it takes some effort to get ready. Browse these resources:


Community standards

This is a large course, and we will have many diverse groups attending it. Everyone will be both a teacher and a learner, helping to make the course a success. Since this is a large and interactive course that we are still prototyping, there will be some rough edges and not everything will go perfectly. Please learn from our mistakes, too!

This course consists of lectures, hands-on exercises, and demos. It is designed to cover a range of basic to advanced topics: there should be something for everyone.

The main point of this course is the exercises, and they will happen in breakout rooms where we expect people to work together and help each other. We expect everyone to help each other as best they can, with respect for different levels of knowledge, while also being aware of their own limitations. No one is better than anyone else; we just have different existing skills and backgrounds.

If anything is wrong, tell us. If you need to contact us privately, you can message the host on Zoom or contact us outside the course. This could be something as simple as “speak louder / text on screen is unreadable”, or as serious as someone creating a harmful learning environment.

Material

Full course material can be found here.

Exercises are provided via the course’s GitHub repository.

The special assignment can be found here. To download it, right-click this link and save it as a file in the course folder. The deadline is 4 December. Please send your results to the course email address listed above (scip -at- aalto.fi) and make sure the answers are easy to find.

News and notes

Week 40:

  • Please see the installation instructions (link above). You need to install Anaconda before the first day, or else you will quickly fall behind. This class is so large that we won’t be able to help you catch up. You should also verify your installation (this is part of the installation instructions). On the Friday before the course, we have an installation help session where you can get help.

  • Please remember to join the meeting 10 minutes early. Our icebreakers and introductions already start then. If you are on time, you are late!

Homework

See “Material” above.