Data Analyst Learning Journey — Learning programming
I love telling meaningful stories that lead to an actionable plan or an impactful decision. I strive to become a better data analyst, and I would like to share my journey.
I graduated with an Accountancy major and a Data Intelligence minor. The minor included four courses in data cleaning, statistical modeling, and data mining techniques, all taught in SAS. To be completely honest, the courses were very useful; however, they were short and fast-paced, and the knowledge faded quickly.
On my journey to become a data analyst, I aspire to master the following: SQL, Python, and statistical modeling. This article focuses on programming, with tips and techniques for learning SQL and Python:
SQL
SQL (Structured Query Language) is a programming language used to store and manipulate data in relational databases. Several versions (dialects) of SQL are in use in the industry today. Whichever one you choose, they share a similar syntax, with some differences in use.
Based on my own experience, I have listed tips for learning SQL and some resources below.
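To make this concrete, here is a minimal sketch of storing and querying data with SQL. It uses SQLite through Python's built-in sqlite3 module only because that needs no installation; the `orders` table and its rows are made up for illustration, and other SQL versions may differ in small details.

```python
import sqlite3

# In-memory SQLite database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ann", 120.0), ("Bob", 75.5), ("Ann", 30.0)],
)

# Data manipulation: total spend per customer, largest first.
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(totals)
conn.close()
```

The same CREATE TABLE, INSERT, and SELECT statements should look familiar in most relational databases, which is why the fundamentals transfer well between versions.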
Tips on learning SQL:
1. Structure your learning
- Search and select learning resources: Selecting learning resources is an important step. It is essential to select resources that match your level. If you are a complete beginner, I would recommend websites or learning resources that let you learn the fundamentals through practice and offer a “Hint” feature. For intermediate or more advanced learners, the focus can shift to harder practice problems and query performance improvement. For beginners, I would recommend searching for “SQL learning roadmap”. I have also included some materials for reference below.
- Limit yourself to 2 main learning resources at a time, if possible: With the internet, you can easily find many resources to start learning or continue your practice. However, this is also a disadvantage, as you might get overwhelmed and not be sure where to start or which to choose. This is choice paralysis, so limit yourself to 2 main resources and plan your study around them.
- Be curious and proactive in learning: Throughout the course of learning, it is very important to be proactive. This can mean asking and researching questions such as: Is there a way to write leaner code for this query? Is there an alternative method? How will this code be processed, and is every part of it intentional?
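As an example of asking "is there an alternative method?", the sketch below writes the same question (which customers have placed at least one order?) in two ways and checks that the answers match. It again assumes SQLite via Python's sqlite3 module, and the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Ann has two orders, Bob has none, Cy has one.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders (customer_id, amount) VALUES (1, 120.0), (1, 30.0), (3, 75.5);
    """
)

# Method 1: filter customers with a subquery.
SUBQUERY_SQL = (
    "SELECT name FROM customers "
    "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
)
# Method 2: join the tables. DISTINCT removes the duplicate row that the
# join produces for Ann's two orders.
JOIN_SQL = (
    "SELECT DISTINCT c.name FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id ORDER BY c.name"
)

with_subquery = conn.execute(SUBQUERY_SQL).fetchall()
with_join = conn.execute(JOIN_SQL).fetchall()
print(with_subquery == with_join)

# Ask SQLite how it plans to run a query; the wording of the plan
# varies between SQLite versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL):
    print(step)
conn.close()
```

Comparing results first and looking at the query plan second is one way to stay curious without guessing which version is "better".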
2. Practice and plan your practice
- Set up a realistic schedule: Having a plan for your study is important. Motivation matters; however, depending on motivation alone is not sufficient. Consistency in your learning is especially important, so set a realistic learning and practice schedule. What counts as realistic depends on your current situation: it can work around your full-time job, use weekends, etc. Example schedule: learn materials/theory 2 days of the week (15–30 minutes each), practice queries 2–3 days of the week (30–45 minutes each).
- Which problems and difficulties should each session cover, and how long should a session be?: This depends on your current level. For beginners, it is more a question of how many problems or concepts you expect to learn in each session. For intermediate learners and above, the decision is how many medium or hard problems to practice. Session length is also important: find the optimal time you need to learn and to process the material afterwards. I set my sessions at around 1–2 hours each.
- How long should you spend on a problem before asking for help?: Solving problems alone is an amazing way to learn; however, when you are stuck, do not hesitate to ask for help or use the “Hint” system. Part of the learning process is to take in new ideas and see how other people solve problems. Set a time limit for being stuck before asking for help, for example 10 minutes of being stuck and doing research. Note down each session’s difficulties for revision.
- Review past lessons and new techniques: Keep a notebook or record to track the concepts, commands, or structures you have learned. Having a review session is helpful for knowledge retention.
3. Tactics review & Tactics change
- There is always a dip in the learning curve. It is normal to get frustrated; however, this can be a sign that your learning tactics should be improved.
- Tactics review: Use a session to review your difficulties or frustration points and brainstorm. Are you missing knowledge, experience, logical thinking, or problem identification skills?
- Tactics change: Learning can be hard, and you may need to change tactics to adapt to the learning curve and the needs of different phases.
- Avoid changing tactics too often: it distracts your focus and does not let the tactics fully form or take effect. I recommend working with a tactic for at least 2-3 weeks.
SQL learning resources:
Learning:
- Beginner: https://www.databasestar.com/sql-roadmap/
- Intermediate-Advanced: https://use-the-index-luke.com/
Practice:
- Beginner: https://pgexercises.com/questions/basic/
- Intermediate-Advanced:
LeetCode (also has options for beginner learning and practice; however, some of them might require a premium account).
Codewars
Mental support:
- The Dip by Seth Godin: This is a self-improvement and strategy book that I really love. It reminds me of why I am learning and helps me overcome the difficulties, “the dip”, that I am in.
Python
Python is a programming language widely used to manipulate data, with a rich set of open-source libraries and an active, supportive community.
My Python learning journey was very interesting, with some plateaus and some periods of significant improvement. From my experience, I would call it incremental, slow-burn learning. I noted down the following tips based on my journey to become a data analyst:
Tips on learning Python:
1. Equal structure and practice
- Search for and limit your main learning sources, but proactively read and learn from other sources: As with SQL, there are many learning resources for Python, and the Python community is extremely robust and supportive. Selecting two main learning resources is a must; however, you should also take advantage of the learning opportunities the community offers. For example, there are many articles on Medium to choose from. Be cautious, practice critical thinking, and do some more research after reading.
- Fundamentals of Python: The fundamentals can be learned on the Python.org website. The resource is long; however, it is very comprehensive and includes all the fundamentals for what you want to do later on.
- Python libraries for data analysis: pandas, NumPy, Matplotlib (or other visualization libraries: Bokeh, Plotly)
- Set up the practice environment right away: Although learning the fundamentals is important, it is equally essential to set up the practice environment as soon as possible. This is especially useful for practicing through personal projects or following the labs/exercises in the learning materials. I would recommend choosing an IDE (Integrated Development Environment) to start. Some of the common ones: Visual Studio Code, PyCharm, Spyder, etc. You can watch IDE setup videos on YouTube for reference. In addition, learning to use the following tools can be useful: GitHub and Git (version control for your code), pip (library installation), venv (virtual environments, optional)
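Once the environment is set up, a small script can confirm that it works. This is only a sketch using the standard library: it reports whether Python is running inside a venv and whether the libraries mentioned above can be imported. The function names `in_virtual_env` and `check_libraries` are my own, not part of any tool.

```python
import importlib.util
import sys

# Libraries named in the tips above; edit this tuple to match your own setup.
LIBRARIES = ("pandas", "numpy", "matplotlib")


def in_virtual_env() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def check_libraries(names=LIBRARIES) -> dict:
    """Map each library name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Inside a virtual environment:", in_virtual_env())
    for name, found in check_libraries().items():
        status = "installed" if found else "missing"
        print(f"{name}: {status}")
```

If a library shows as missing, installing it with pip (for example `pip install pandas`) inside the activated environment and running the script again is a quick way to confirm the fix.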
2. Personal projects and open source
- Practice right away, and give it equal time to learning fundamentals: Books that serve as learning material usually have lab and exercise sections at the end of each chapter. These are essential for getting hands-on practice. They are also an opportunity to get a feel for debugging and problem solving when your code doesn’t work.
- Learn code refactoring and PEP 8 for better, more professional code: As a self-taught coder, I especially feel the pressure to prove myself and improve my code. Part of my strategy is learning to write lean code through refactoring and to follow the PEP 8 guidelines. PEP 8 is the style guide for Python code, which helps to enhance readability. As a programmer, it is important to write clear and readable code not only for other team members, but also for yourself months or years later.
- Participate in open-source projects on GitHub: Open-source projects or libraries can be a great way to learn more techniques, develop your skills, and write cleaner code, as readable code is a must for collaboration. In addition, they give you chances to meet and collaborate with community members and learn from them.
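Here is a small, made-up example of what refactoring toward PEP 8 can look like. Both functions do the same thing; the second one uses descriptive snake_case names, a named constant instead of a bare number, spacing around operators, and a docstring. The function and its growth rate are hypothetical.

```python
# Before: works, but the names say nothing and the spacing is cramped.
def calc(l):
    r=[]
    for i in range(len(l)):
        if l[i]>0:
            r.append(l[i]*1.1)
    return r


# After: same behavior, easier to read months later.
GROWTH_RATE = 1.1  # hypothetical rate, named so its meaning is visible


def apply_growth(amounts, rate=GROWTH_RATE):
    """Return each positive amount multiplied by rate."""
    return [amount * rate for amount in amounts if amount > 0]


sample = [100, -5, 0, 20]
# Refactoring should not change results, so compare old and new.
print(calc(sample) == apply_growth(sample))
```

Keeping the old version around just long enough to compare results is a simple safety net while you practice refactoring.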
3. Plan your project and practice (have some structure, plan)
- Have a project timeline: For personal projects to improve your skills, I would recommend having a project proposal and a project timeline. The project proposal could cover the purpose or goal of the project, the data sources, and the range of the data. The goal of the proposal and timeline is for you (and maybe others, if you share on GitHub) to fully understand the scope of the project, the dataset, and what to expect each week. They also help you gauge the project’s progress, ensuring continuous learning without pauses that are too long.
- Plan the project and its structure: This usually depends on the type of project; however, it would generally include the following steps: problem statement identification, deliverables identification, data sourcing, data cleaning, data exploration, data modeling, delivering deliverables, review of deliverables and code refactoring/performance check, and presentation of insights.
- Learn from others: As the Python community is so robust, I encourage you to learn from others by checking their projects and code, participating in open source, etc. Note: You can star a project on GitHub to follow it and come back to the projects that interest you.
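The first few project steps listed above (data sourcing, data cleaning, data exploration) can be sketched as one small function per step. This is only an illustration with made-up data, and it uses the standard library so it runs without installing anything; in a real project, pandas would likely handle these steps.

```python
import csv
import io
import statistics

# Made-up sample standing in for a downloaded CSV file.
RAW_CSV = """region,temperature
North,30.0
South,
East,28.0
North,32.0
"""


def load_rows(text):
    """Data sourcing: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Data cleaning: drop rows with no temperature, convert the rest to float."""
    cleaned = []
    for row in rows:
        value = (row["temperature"] or "").strip()
        if value:
            cleaned.append({"region": row["region"], "temperature": float(value)})
    return cleaned


def explore(rows):
    """Data exploration: basic summary statistics."""
    temps = [row["temperature"] for row in rows]
    return {"count": len(temps), "mean": statistics.mean(temps), "max": max(temps)}


summary = explore(clean_rows(load_rows(RAW_CSV)))
print(summary)
```

Keeping each step in its own function makes the later review and refactoring steps easier, because each piece can be checked and improved on its own.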
Python learning resources:
Learning
- Beginner:
https://www.python.org/about/gettingstarted/
https://docs.python.org/3/tutorial/
- Other:
https://peps.python.org/pep-0008/ (PEP 8)
Python for Data Analysis by Wes McKinney
Practice
Data sources can easily be found in open data portals from governments, on Kaggle, and on similar sites. I generally decide on the topic first, then search for data sources, since data availability varies. However, note that sometimes you might also change your problem statement depending on the type of data found.
I hope the information above helps you start learning programming to become a data analyst, or suggests some tips to improve your learning. Please don’t hesitate to reach out to me if you have questions, want to discuss, or want to collaborate on a project.